Text to Speech & Translation App With Serverless Functions, Web Speech API & Google Translate

javascript Jan 22, 2025

In this project, we're going to build an application that does text-to-speech using the Web Speech API and also translates the text to a different language using the Google Translate API. We'll use Tailwind CSS for the UI. We'll deploy the project to Vercel and use a serverless function for the translation, because it requires a Google API key and you never want that exposed on the client.

If you'd like to use a framework like React, that's fine, but I want to get back to doing more vanilla JavaScript projects. There are a million ways to create and structure this project; this is just one of them. I want to keep it simple. Let's get started!

Creating The File Structure

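Let's start by creating our files and folders. This is a pretty simple structure: the client files (index.html, styles.css, and script.js) live in a public folder, and later we'll add an api folder for the serverless function.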

The HTML

Let's open the index.html file and add the following:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <script src="https://cdn.tailwindcss.com"></script>
    <link rel="stylesheet" href="styles.css" />
    <title>TTS & Translation App</title>
  </head>
  <body class="bg-gray-100 flex items-center justify-center h-screen">
    <div class="bg-white shadow-md rounded-lg p-6 max-w-md w-full">
      <h1 class="text-2xl font-semibold text-gray-800 text-center mb-4">
        TTS & Translation
      </h1>

      <!-- Text Input -->
      <textarea
        class="w-full h-32 p-3 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500 resize-none"
        placeholder="Type your text here..."
      ></textarea>

      <!-- Voice Selection -->
      <div class="mt-4">
        <label
          for="voiceSelect"
          class="block text-sm font-medium text-gray-700 mb-2"
          >Select Voice</label
        >
        <select
          id="voiceSelect"
          class="w-full p-3 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
        >
          <option value="default">Default Voice</option>
          <option value="voice1">Voice 1</option>
          <option value="voice2">Voice 2</option>
        </select>
      </div>

      <!-- Play Button -->
      <button
        class="mt-6 w-full bg-blue-500 hover:bg-blue-600 text-white font-medium py-3 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
        id="playButton"
      >
        Play Text
      </button>
    </div>
    <script src="./script.js"></script>
  </body>
</html>

This is pretty simple. We are using the Tailwind CDN to include Tailwind CSS. If you want to use the CLI, that's fine too. In fact, that is probably what I would do in a real project, but I wanted to keep it simple because Tailwind is not the focus.

Then we just have a container with a text area and a voice selection dropdown, followed by the play button. The options in the dropdown are only placeholders for now. We are also including our script at the bottom.

TTS

The text-to-speech part is really simple because we can just use the built-in Web Speech API in the browser. No need for any libraries or external APIs.

Add the following code to the public/script.js file:

const voiceSelect = document.querySelector('#voiceSelect');
const playButton = document.querySelector('#playButton');
const textInput = document.querySelector('textarea');

// Load available voices
let voices = [];
function loadVoices() {
  voices = speechSynthesis.getVoices();
  voiceSelect.innerHTML = voices
    .map(
      (voice, index) =>
        `<option value="${index}">${voice.name} (${voice.lang})</option>`
    )
    .join('');
}

// Trigger loading voices when they become available
speechSynthesis.onvoiceschanged = loadVoices;
loadVoices();

// Play TTS
playButton.addEventListener('click', () => {
  const utterance = new SpeechSynthesisUtterance(textInput.value);
  const selectedVoice = voices[voiceSelect.value];
  if (selectedVoice) utterance.voice = selectedVoice;
  speechSynthesis.speak(utterance);
});

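We first get the UI elements from the DOM. Then we get the voices that the API gives us with speechSynthesis.getVoices() and add them to the dropdown by creating an option for each voice, which replaces the placeholder options from the HTML. Some browsers load their voices asynchronously, so the first call can return an empty list. That's why we also listen for the voiceschanged event (through the onvoiceschanged handler) and reload the voices when it fires.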

Then we are adding an event listener to the play button that creates a new SpeechSynthesisUtterance object with the text from the text area and the selected voice from the dropdown. Then we just call speechSynthesis.speak with the utterance object. This will play the text-to-speech.

You can see how simple this is. The Web Speech API is really powerful and easy to use.
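
Beyond the voice, an utterance also has rate, pitch, and volume properties. As a minimal sketch (the helper names clamp and applySpeechOptions and the defaults are my own, not part of the Web Speech API), you could clamp user-supplied values to the ranges the API allows before assigning them:

```javascript
// Hypothetical helper: keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical helper: copies rate, pitch and volume onto an utterance-like object.
// Ranges follow the Web Speech API: rate 0.1 to 10, pitch 0 to 2, volume 0 to 1.
function applySpeechOptions(utterance, { rate = 1, pitch = 1, volume = 1 } = {}) {
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  return utterance;
}

// In the browser you would pass a real SpeechSynthesisUtterance:
// const utterance = applySpeechOptions(new SpeechSynthesisUtterance('Hello'), { rate: 0.9 });
// speechSynthesis.speak(utterance);
```

The helper only sets plain properties, so it can be tried with an ordinary object outside the browser as well.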

Add Language Select

Before we implement the translation, let's add a language select dropdown to the UI. Add the following code to the index.html file right above the voice select dropdown:

<!-- Language Selection -->
<div class="mt-4">
  <label
    for="languageSelect"
    class="block text-sm font-medium text-gray-700 mb-2"
    >Select Language</label
  >
  <select
    id="languageSelect"
    class="w-full p-3 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
  ></select>
</div>

Update Client Side JS

Now go back into the script.js file, select the language dropdown, and add the languages you want to support to an array:

const voiceSelect = document.querySelector('#voiceSelect');
const playButton = document.querySelector('#playButton');
const textInput = document.querySelector('textarea');
const languageSelect = document.querySelector('#languageSelect');

// Array of supported languages with their ISO codes
const languages = [
  { code: 'en', name: 'English' },
  { code: 'es', name: 'Spanish' },
  { code: 'fr', name: 'French' },
  { code: 'de', name: 'German' },
  { code: 'it', name: 'Italian' },
  { code: 'ja', name: 'Japanese' },
  { code: 'zh-CN', name: 'Chinese (Simplified)' },
];

// Rest of the code...

Then add the languages to the dropdown:

// Populate language dropdown
languages.forEach(({ code, name }) => {
  const option = document.createElement('option');
  option.value = code;
  option.textContent = name;
  languageSelect.appendChild(option);
});

Vercel CLI Setup

This part can be a little tricky because the Google Translate API requires an API key, and you never want to expose that key on the client side, where it can easily be stolen. Instead, a serverless function will make the request to the API and return the translated text to the client. The problem is that we are developing locally and need a way to run that function on our machine. There are a few ways to handle this. We could create a local stand-in for now and replace it with the serverless function later, or we could use the Vercel CLI, which can run serverless functions locally. That's what we are going to do.

If you don't have a Vercel account, you will need to create one. You can simply log in with GitHub at vercel.com. You will get a free hobby account, which is more than enough for this project.

Now we need to install the CLI. We will install it globally so we can use it from the terminal.

Open your terminal and run the following command:

npm install -g vercel

This installs the Vercel CLI globally. You can verify that it installed correctly by running vercel -v.

Then run the following command to log in to your account:

vercel login

I'm selecting the "Login with GitHub" option. It will open the browser and ask you to authorize the Vercel CLI. Once you do that, you can close the browser and go back to the terminal.

Now run the following command to set up a Vercel project and start the local development server:

vercel dev

You will be asked the following questions. Here are the answers you should use:

Set up and develop [project]? yes
Which scope should contain your project? Whatever you named your scope whether hobby or paid
Link to existing project? no
What’s your project’s name? tts-translate-app
In which directory is your code located? ./
Want to modify these settings? no

Now it will run your project on http://localhost:3000. You can open that in your browser and you will see your project at that address.

Create Serverless Function

The way this works is we can create a folder in the root called api and then create a file in that folder with the name of the function. So we are going to create a file called translate.js in the api folder.

Add the following code for now to the api/translate.js file:

export default async function handler(req, res) {
  return res.status(200).json({ message: 'Hello World' });
}

The handler is a function that takes two arguments, req and res. They work much like the request and response objects in Express, if you are familiar with that. We made it async because we will be awaiting a fetch call in it later. For now we are just returning a simple JSON object.
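
If you'd like to see what the handler does without going through the dev server, here is a rough sketch that calls it with a stand-in response object. The createMockRes helper is made up for this illustration and only imitates the two methods we use (status and json); the real object Vercel passes in has much more on it.

```javascript
// Same handler body as above, without `export default`, so the file can be run directly with Node.
async function handler(req, res) {
  return res.status(200).json({ message: 'Hello World' });
}

// Hypothetical stand-in for the response object.
// It records the status code and payload, and returns itself so calls can be chained.
function createMockRes() {
  const res = { statusCode: null, body: null };
  res.status = (code) => {
    res.statusCode = code;
    return res;
  };
  res.json = (payload) => {
    res.body = payload;
    return res;
  };
  return res;
}

// Call the handler with a fake request and log what it wrote to the response.
handler({ method: 'GET' }, createMockRes()).then((res) => {
  console.log(res.statusCode, res.body);
});
```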

Now hit the route http://localhost:3000/api/translate in your browser and you should see the JSON object. This is mimicking what the serverless function will do when we deploy it.

Now that we have the serverless function set up, we can start working on the translation logic.

Replace the contents of the file with the following:

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  // Fall back to an empty object so a request without a body gets a 400 instead of crashing
  const { text, target } = req.body || {};

  if (!text || !target) {
    return res.status(400).json({ error: 'Missing text or target language' });
  }

  const apiKey = process.env.GOOGLE_TRANSLATE_API_KEY; // Securely access your API key
  const apiUrl = `https://translation.googleapis.com/language/translate/v2?key=${apiKey}`;

  try {
    const response = await fetch(apiUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        q: text,
        target,
        format: 'text',
      }),
    });

    if (!response.ok) {
      const errorText = await response.text();
      return res.status(response.status).json({ error: errorText });
    }

    const data = await response.json();
    res.status(200).json(data);
  } catch (error) {
    console.error('Error in API call:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
}

First off, we need this to be a POST request, so any other method will get a 405 status code. Then we are getting the text and target language from the request body. If either of those is missing, we return a 400 status code.

We are getting the API key from an environment variable. This is the best way to store sensitive information like API keys. You can set this in the Vercel dashboard. We are then constructing the URL for the Google Translate API.

We are then making a POST request to the API with the text, target language, and format. If the response is not ok, we return the error text. If it is ok, we return the data.
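
The checks at the top of the handler can also be pulled out into a small pure function, which makes them easy to try on their own. This is only a sketch of that idea: validateTranslateRequest is a name I made up, and it returns an object describing the problem instead of writing to res.

```javascript
// Hypothetical helper mirroring the handler's checks.
// Returns null when the request is acceptable, otherwise { status, error }.
function validateTranslateRequest(method, body) {
  if (method !== 'POST') {
    return { status: 405, error: 'Method not allowed' };
  }

  // body can be undefined if no JSON was sent, so default to an empty object
  const { text, target } = body || {};

  if (!text || !target) {
    return { status: 400, error: 'Missing text or target language' };
  }

  return null;
}

// Inside the handler it could be used like this:
// const problem = validateTranslateRequest(req.method, req.body);
// if (problem) return res.status(problem.status).json({ error: problem.error });
```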

Getting The API Key

You need to get your API key from the Google Cloud Console. Log in and create a new project. Once the project exists, enable the Google Translate API (listed in the console as the Cloud Translation API) by searching for it in the search bar at the top of the page. Then go to the credentials section, click on "Create Credentials", and create an API key. Once you have the key, we can move to the next step.

Local Environment Variables

We need to set the environment variable for the API key. We can do this by creating a .env file in the root of the project. Add the following to the file:

GOOGLE_TRANSLATE_API_KEY=YOUR_API_KEY

Also, create a .gitignore file and add the following:

node_modules
.vercel
.env

Create a .vercelignore file and add the following:

node_modules
.env

Now we need to install the dotenv package to read the environment variables. Run the following command:

npm install -D dotenv

Add the following to the top of the api/translate.js file:

if (process.env.NODE_ENV !== 'production') {
  (async () => {
    const dotenv = await import('dotenv');
    dotenv.config();
  })();
}

This will allow us to use the environment variables from the .env file locally in development. When in production, we add the environment variables in the Vercel dashboard and they are accessed the same way.
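
One thing that is easy to trip over is a missing or misspelled variable name. In that case process.env.GOOGLE_TRANSLATE_API_KEY is simply undefined and the request to Google fails with a less obvious error. A small guard like the following could make that failure clearer. Treat it as a sketch: requireEnv is a hypothetical helper, and it takes the environment object as a parameter so it can be tried without touching your real variables.

```javascript
// Hypothetical helper: read a required variable or fail with a clear message.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// In api/translate.js this could replace the direct lookup.
// Calling it inside the try block lets the existing catch turn the error into a 500 response:
// const apiKey = requireEnv('GOOGLE_TRANSLATE_API_KEY');
```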

Test The Serverless Function

Now that we have the serverless function set up, we can test it with a tool like Postman or Insomnia. I will use Postman. With vercel dev still running, open a new tab and create a new POST request to http://localhost:3000/api/translate. Set the body type to raw JSON and add the following:

{
  "text": "Hello, how are you?",
  "target": "es"
}

You should get back the following:

{
  "data": {
    "translations": [
      {
        "translatedText": "¿Hola, cómo estás?",
        "detectedSourceLanguage": "en"
      }
    ]
  }
}

We now know that our serverless function is working correctly.
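
The translated string sits fairly deep in that response. As a small sketch of how it could be unpacked defensively (the function name extractTranslation is my own), optional chaining avoids a TypeError if the shape is ever different from what we expect:

```javascript
// Hypothetical helper: pull the first translated string out of a response
// shaped like the JSON above, or return null if it is not there.
function extractTranslation(payload) {
  return payload?.data?.translations?.[0]?.translatedText ?? null;
}

const sample = {
  data: {
    translations: [
      { translatedText: '¿Hola, cómo estás?', detectedSourceLanguage: 'en' },
    ],
  },
};

console.log(extractTranslation(sample)); // ¿Hola, cómo estás?
console.log(extractTranslation({ error: 'Bad Request' })); // null
```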

Update Client Side JS

Now we need to go into the public/script.js file and add the translation logic by adding the following function:

// Translate the text by calling the API
async function translateText(text, targetLang) {
  try {
    const response = await fetch('/api/translate', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, target: targetLang }),
    });

    if (!response.ok) {
      throw new Error(`Error ${response.status}: ${await response.text()}`);
    }

    const data = await response.json();
    return data.data.translations[0].translatedText;
  } catch (error) {
    console.error('Translation error:', error);
    alert('Failed to translate the text. Please try again later.');
    return text;
  }
}

This function takes in the text and the target language and makes a POST request to the serverless function. If the response is ok, it returns the translated text. If there is an error, it logs the error, shows an alert, and returns the original text, so the app will still speak what you typed.
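
Because every call to the serverless function results in a request to Google, you might not want to translate the same sentence twice when someone presses play repeatedly. The following is one possible sketch of a small in-memory cache wrapped around any translate function. createCachedTranslator is a hypothetical helper, not something from the Web Speech or Google APIs, and the cache only lives as long as the page is open.

```javascript
// Hypothetical wrapper: remembers results per (language, text) pair.
function createCachedTranslator(translateFn) {
  const cache = new Map();

  return async function cachedTranslate(text, targetLang) {
    const key = `${targetLang}:${text}`;
    if (cache.has(key)) {
      return cache.get(key);
    }
    const translated = await translateFn(text, targetLang);
    cache.set(key, translated);
    return translated;
  };
}

// Usage with the function above:
// const cachedTranslateText = createCachedTranslator(translateText);
// const result = await cachedTranslateText('Hello', 'es');
```

Keep in mind that translateText returns the original text when something goes wrong, so with this sketch a failed lookup would be cached too. You may want to skip caching when the result equals the input.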

Now we need to not only call the TTS function when the play button is clicked, but also call the translation function and pass the translated text to the TTS function. So let's move the TTS logic into a separate function and call that function when the play button is clicked. Add this function above the play listener:

// Text-to-Speech function
function playText(text, voiceIndex) {
  const utterance = new SpeechSynthesisUtterance(text);
  if (voices[voiceIndex]) {
    utterance.voice = voices[voiceIndex];
  }
  speechSynthesis.speak(utterance);
}

Now change the play listener to the following:

// Translate and play button event
playButton.addEventListener('click', async () => {
  const text = textInput.value.trim();
  const targetLang = languageSelect.value;
  const selectedVoiceIndex = voiceSelect.value;

  if (!text) {
    alert('Please enter some text.');
    return;
  }

  try {
    // Translate the text
    const translatedText = await translateText(text, targetLang);

    // Play the translated text
    playText(translatedText, selectedVoiceIndex);
  } catch (error) {
    console.error('Error during processing:', error);
    alert('An error occurred. Please try again.');
  }
});

We are now calling the translateText function with the text and target language. If that is successful, we call the playText function with the translated text and the selected voice index.

That's it. Reload the page and try to translate and play some text. You should hear the translated text spoken in the selected voice.
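
One thing you may notice is that a voice made for English will not pronounce Spanish or Japanese very well. As a rough sketch of one way to deal with that, you could look for a voice whose lang matches the target language. The helper name findVoiceIndexForLang is made up, and the matching rule (compare only the part before the hyphen) is a simplifying assumption; for example, it treats zh-CN and zh-TW as the same.

```javascript
// Hypothetical helper: returns the index of the first voice whose language
// matches the target code, or -1 if none does.
// Voices are objects with a `lang` property such as 'en-US' or 'es-ES'.
function findVoiceIndexForLang(voices, targetLang) {
  // Reduce codes like 'es-ES' or 'es_ES' to their base language, 'es'
  const base = (code) => code.toLowerCase().replace('_', '-').split('-')[0];
  return voices.findIndex((voice) => base(voice.lang) === base(targetLang));
}

// Example with plain objects standing in for SpeechSynthesisVoice:
const sampleVoices = [
  { name: 'Voice A', lang: 'en-US' },
  { name: 'Voice B', lang: 'es-ES' },
];

console.log(findVoiceIndexForLang(sampleVoices, 'es')); // 1
console.log(findVoiceIndexForLang(sampleVoices, 'ja')); // -1
```

In the click handler you could pass the returned index to playText and fall back to the selected voice when the result is -1.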

Deploy To Vercel

Now that we have everything working locally, we can deploy to Vercel. This is really easy. Just run the following command:

vercel

Of course, you can also deploy through GitHub by connecting a repository to the project. You may want to push the code to a GitHub repo anyway.

Add Environment Variable To Vercel

Now that we have deployed the project, we need to add the environment variable to Vercel. Go to the Vercel dashboard and click on the project. Then go to the settings tab and find the environment variables section. Add the GOOGLE_TRANSLATE_API_KEY variable with your API key.

Environment variables only apply to new deployments, so go to the deployments tab, click the three dots, and choose "Redeploy". Once it is deployed, you can test it out.

You now have your deployed TTS and translation app. You can share the link with anyone and they can use it. You can also add your own domain if you want.
