Building a Text-to-Speech App Using the JavaScript Web Speech API

Have you ever wondered how audiobooks work? I vividly recall my first encounter with one, marveling at the computer's ability to turn written text into spoken words. I suspect many of us feel the same way when we come across applications that convert text to speech. Thanks to modern browsers, you can now build your own text-to-speech application using the JavaScript Web Speech API. Isn't that remarkable?

About Web Speech API

API stands for Application Programming Interface. It acts as an intermediary between software applications, allowing them to work together and share data. The Web Speech API consists of two parts: the SpeechRecognition interface and the SpeechSynthesis interface. The SpeechRecognition interface recognizes speech from a device’s audio input and lets your application respond accordingly.

The SpeechSynthesis interface, on the other hand, reads text content aloud through the device’s speech synthesizer. Each available voice is represented by a SpeechSynthesisVoice object, and users can choose which voice to use.
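Not every browser exposes both parts of the API, so a small feature check can be handy before wiring anything up. The snippet below is only a sketch and is not part of the original project: supportsSpeechSynthesis is a hypothetical helper name, and it takes the global object as a parameter so it can also be tried outside a browser.

```javascript
// Hypothetical helper (not part of the original project): reports whether
// the given global object exposes the speech synthesis part of the API.
function supportsSpeechSynthesis(globalObject) {
  return (
    typeof globalObject === "object" &&
    globalObject !== null &&
    "speechSynthesis" in globalObject &&
    typeof globalObject.SpeechSynthesisUtterance === "function"
  );
}

// In a browser, pass `window`. Outside a browser this block is skipped.
if (typeof window !== "undefined" && !supportsSpeechSynthesis(window)) {
  console.warn("This browser does not support speech synthesis.");
}
```

If the check fails, the app could hide the Speak button or show a short message instead of throwing an error later.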

We will be working with the SpeechSynthesis interface that converts text to speech. Let's go!

Prerequisites

To follow along with this article, you will need:

An up-to-date version of your favorite web browser (preferably Google Chrome or Mozilla Firefox).
Basic understanding of HTML and CSS.
Good knowledge of JavaScript.
A working audio output device (speakers or headphones).

Project Directory

Create a new directory for the project and create two files in it, index.html and index.js. Do not forget to link them by loading index.js from index.html with a <script> tag.

Getting the HTML template ready

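In this project, we need just a few HTML elements:

A select element for choosing a voice, with the name voice.
Two range inputs named rate and pitch, for the speed and pitch of the speech.
A textarea with the name text.
Two buttons, one labeled Stop and the other labeled Speak, with the ids stop and speak.

The names and ids matter: the script selects the elements by them, and it uses the names of the range inputs and the textarea as property names on the utterance.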

Not to worry, you can access the entire HTML template here.

NOTE: The styling was done with Tailwind CSS. If you are not familiar with it, do not worry; the styling does not affect how the app works.

Getting the JavaScript ready

Declaring variables

It is good practice to declare and initialize variables before using them. So we start by declaring the msg variable, which holds a new instance of the SpeechSynthesisUtterance object. An utterance represents a piece of text to be spoken, together with settings such as voice, pitch, and rate.

const msg = new SpeechSynthesisUtterance();

The voices variable starts as an empty array. It will later hold the voices returned by the speechSynthesis.getVoices() method.

let voices = [];

We also need to grab the HTML elements and assign them to variables:

const voicesDropdown = document.querySelector("[name=voice]");
const options = document.querySelectorAll("[type=range], [name=text]");
const speakBtn = document.querySelector("#speak");
const stopBtn = document.querySelector("#stop");

Arguably the most important HTML element is the textarea, which contains the text to be converted to speech. We assign its current value to the text property of msg.

msg.text = document.querySelector("[name=text]").value;

Populating voices

We want to get all the available voices from the getVoices() method and store them in the voices array, which we initialized as empty earlier. Because populateVoices is registered as an event listener on speechSynthesis, this refers to speechSynthesis inside the function. We then map each voice to an option element built from its name and lang properties. The map method returns a new array, so we join its items into a single string with the join method. That string, with one option per voice (the voice name as its value, the name and language as its label), becomes the innerHTML of the voicesDropdown element.

// Push all available voices into the voices array
function populateVoices() {
  voices = this.getVoices();

  const voicesOption = voices
    .map((voice) => {
      return `<option value="${voice.name}">${voice.name} (${voice.lang})</option>`;
    })
    .join("");
  voicesDropdown.innerHTML = voicesOption;
}
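One thing to keep in mind is that getVoices() can return an empty list until the voices have been loaded, while in some browsers the list is already there when the script runs. The sketch below covers both cases. It is an illustration only: whenVoicesReady is a hypothetical helper that is not in the original project, and it takes the synthesizer as a parameter so it can be tried with a stand-in object.

```javascript
// Hypothetical helper (not part of the original project): calls
// onReady(voices) as soon as the synthesizer reports at least one voice.
// `synth` is expected to look like window.speechSynthesis.
function whenVoicesReady(synth, onReady) {
  const available = synth.getVoices();
  if (available.length > 0) {
    // The list was already loaded when we asked for it.
    onReady(available);
    return;
  }
  // Otherwise wait for the first voiceschanged event.
  synth.addEventListener(
    "voiceschanged",
    () => onReady(synth.getVoices()),
    { once: true }
  );
}

// Browser-only usage; skipped where speechSynthesis does not exist.
if (typeof speechSynthesis !== "undefined") {
  whenVoicesReady(speechSynthesis, (list) => {
    console.log(`${list.length} voices available`);
  });
}
```

In this project, the callback could do the same work as populateVoices: store the list in voices and rebuild the dropdown.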

Toggle start/stop speech

The SpeechSynthesis interface has methods such as speak(), pause(), resume(), and cancel(). They all operate on the utterance queue, which lines up the texts to be spoken. In this project, we use only speak() and cancel(). The speak() method adds an utterance to the queue, where it is spoken once all the utterances ahead of it have been spoken. The cancel() method removes all utterances from the queue and stops any speech in progress. Our toggle function therefore always cancels first, so that a new voice, pitch, or rate takes effect immediately, and then speaks again only if startOver is true.

// toggle function to stop the ongoing speech and, optionally, start a new one
function toggle(startOver = true) {
  speechSynthesis.cancel(); // clear the queue and stop any speech in progress
  if (startOver) {
    speechSynthesis.speak(msg);
  }
}
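The pause() and resume() methods are not used in this project, but they could be combined into a single button in much the same way. The following is a rough sketch under a few assumptions: togglePause is a hypothetical helper, and the #pause button is not part of the HTML template, so you would have to add it yourself.

```javascript
// Hypothetical helper (not part of the original project): pauses ongoing
// speech or resumes paused speech. `synth` is expected to look like
// window.speechSynthesis.
function togglePause(synth) {
  // Check `paused` first, because `speaking` can stay true while paused.
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken
}

// Browser-only usage; `#pause` is an assumed extra button.
if (typeof document !== "undefined" && typeof speechSynthesis !== "undefined") {
  const pauseBtn = document.querySelector("#pause");
  if (pauseBtn) {
    pauseBtn.addEventListener("click", () => togglePause(speechSynthesis));
  }
}
```

The returned string is only there to make the helper easy to check; the button handler above ignores it.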

Find matching voices

We want to identify the voice that matches the one selected in the dropdown and switch to it. The setVoice function uses the find method to look up the voice whose name equals the selected value.

// find a voice that's equal to the voice selected and change to it
function setVoice() {
  msg.voice = voices.find((voice) => voice.name === this.value);
  toggle(); //call the function for when a new voice is selected
}
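The find method returns undefined when no voice matches, so it can be useful to fall back to another voice in that case. Here is a minimal sketch of that idea. The pickVoice helper, its fallbackLang parameter, and the voice names in the usage example are all made up for illustration and are not part of the original project.

```javascript
// Hypothetical helper (not part of the original project): returns the voice
// with the selected name, else the first voice whose language starts with
// fallbackLang, else the first voice in the list, else null.
function pickVoice(voices, selectedName, fallbackLang = "en") {
  const exact = voices.find((voice) => voice.name === selectedName);
  if (exact) {
    return exact;
  }
  const sameLang = voices.find((voice) => voice.lang.startsWith(fallbackLang));
  return sameLang || voices[0] || null;
}

// Usage with made-up voice data:
const sampleVoices = [
  { name: "Voice One", lang: "fr-FR" },
  { name: "Voice Two", lang: "en-GB" },
];
console.log(pickVoice(sampleVoices, "Missing Voice").name);
```

Inside setVoice, the assignment could then read msg.voice = pickVoice(voices, this.value).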

Altering pitch and rate

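The setOption function updates the utterance whenever a range input or the textarea changes. It uses the name attribute of the element that changed as the property name on msg, so an input named pitch sets msg.pitch and the textarea named text sets msg.text.

// Set the pitch, rate, or text of the utterance to the changed value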
function setOption() {
  msg[this.name] = this.value;
  toggle();
}
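The values of range inputs arrive as strings, and the utterance only accepts a limited range for each setting: rate goes from 0.1 to 10 and pitch from 0 to 2, with 1 as the default for both. If your sliders might go outside those limits, the values could be normalized first. The sketch below shows one way to do it; normalizeOption and OPTION_RANGES are hypothetical names that are not in the original project.

```javascript
// Allowed ranges for the numeric utterance settings.
const OPTION_RANGES = {
  rate: { min: 0.1, max: 10 },
  pitch: { min: 0, max: 2 },
};

// Hypothetical helper (not part of the original project): converts a raw
// input value to a number inside the allowed range for that option.
function normalizeOption(name, rawValue) {
  const range = OPTION_RANGES[name];
  if (!range) {
    return rawValue; // for example "text", which is passed through unchanged
  }
  const number = Number(rawValue);
  if (Number.isNaN(number)) {
    return 1; // fall back to the default for both rate and pitch
  }
  return Math.min(range.max, Math.max(range.min, number));
}
```

Inside setOption, the assignment could then read msg[this.name] = normalizeOption(this.name, this.value).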

Adding event listeners

The last step is to attach event listeners to the HTML elements, specifying the events to listen for and the functions to call.

The voiceschanged event fires when the list of SpeechSynthesisVoice objects returned by the speechSynthesis.getVoices() method has changed.

speechSynthesis.addEventListener("voiceschanged", populateVoices);
voicesDropdown.addEventListener("change", setVoice); //listen for a change in the selected voice

options.forEach((option) => option.addEventListener("change", setOption));

speakBtn.addEventListener("click", toggle);
stopBtn.addEventListener("click", () => toggle(false)); //change the arg to false so it doesn't start over.
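Event listeners are not limited to the page elements. The utterance itself fires events such as start, end, and error, which could be used to show the user what is happening. The following is only a sketch of that idea: watchUtterance and its onStatus callback are hypothetical and not part of the original project.

```javascript
// Hypothetical helper (not part of the original project): reports status
// changes of an utterance through the onStatus callback. `utterance` is
// expected to look like a SpeechSynthesisUtterance.
function watchUtterance(utterance, onStatus) {
  utterance.addEventListener("start", () => onStatus("speaking"));
  utterance.addEventListener("end", () => onStatus("finished"));
  utterance.addEventListener("error", (event) =>
    onStatus(`error: ${event.error}`)
  );
}

// Possible usage in this project, where msg is the utterance:
// watchUtterance(msg, (status) => console.log(status));
```

A status string like this could, for example, be written into a small label next to the Speak button.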

Here is the full code:

"use strict";

const msg = new SpeechSynthesisUtterance();
let voices = [];

const voicesDropdown = document.querySelector("[name=voice]");
const options = document.querySelectorAll("[type=range], [name=text]");
const speakBtn = document.querySelector("#speak");
const stopBtn = document.querySelector("#stop");

msg.text = document.querySelector("[name=text]").value;

// Push all available voices into the voices array
function populateVoices() {
  voices = this.getVoices();

  const voicesOption = voices
    .map((voice) => {
      return `<option value="${voice.name}">${voice.name} (${voice.lang})</option>`;
    })
    .join("");
  voicesDropdown.innerHTML = voicesOption;
}

// find a voice that's equal to the voice selected and change to it
function setVoice() {
  msg.voice = voices.find((voice) => voice.name === this.value);
  toggle(); //call the function for when a new voice is selected
}

// toggle function to stop the ongoing speech and, optionally, start a new one
function toggle(startOver = true) {
  speechSynthesis.cancel(); // clear the queue and stop any speech in progress
  if (startOver) {
    speechSynthesis.speak(msg);
  }
}

// Set the pitch, rate, or text of the utterance to the changed value
function setOption() {
  msg[this.name] = this.value;
  toggle();
}

speechSynthesis.addEventListener("voiceschanged", populateVoices);
voicesDropdown.addEventListener("change", setVoice); //listen for a change in the selected voice

options.forEach((option) => option.addEventListener("change", setOption));

speakBtn.addEventListener("click", toggle);
stopBtn.addEventListener("click", () => toggle(false)); //change the arg to false so it doesn't start over

Final Results

The full code can be found in my GitHub repository.

Check out the live project here.