Building Real-Time Language Translation with AssemblyAI and DeepL in JavaScript

In a comprehensive tutorial, AssemblyAI offers insights into creating a real-time language translation service using JavaScript. The tutorial leverages AssemblyAI for real-time speech-to-text transcription and DeepL for translating the transcribed text into various languages.

Introduction to Real-Time Translation

Translations play a critical role in communication and accessibility across different languages. For instance, a tourist in a foreign country may struggle to communicate if they don’t understand the local language. AssemblyAI’s Streaming Speech-to-Text service can transcribe speech in real time, and the resulting text can then be translated with DeepL, making communication seamless.

Setting Up the Project

The tutorial begins with setting up a Node.js project. Essential dependencies are installed, including Express.js for creating a simple server, dotenv for managing environment variables, and the official libraries for AssemblyAI and DeepL.

mkdir real-time-translation
cd real-time-translation
npm init -y
npm install express dotenv assemblyai deepl-node

API keys for AssemblyAI and DeepL are stored in a .env file to keep them secure and avoid exposing them in the frontend.
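
A small startup check can catch a missing key before the server accepts any requests. The sketch below is illustrative only: the variable names `ASSEMBLYAI_API_KEY` and `DEEPL_API_KEY` and the helpers `findMissingEnvVars` and `assertEnv` are assumptions, not names taken from the tutorial.

```javascript
// Hypothetical variable names; use whatever names your .env file defines.
const REQUIRED_ENV_VARS = ["ASSEMBLYAI_API_KEY", "DEEPL_API_KEY"];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Throws a descriptive error so the server can fail fast at startup.
function assertEnv(env, required = REQUIRED_ENV_VARS) {
  const missing = findMissingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv(process.env)` right after `require("dotenv").config()` would stop the process with a readable message instead of failing later on the first API call.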

Creating the Backend

The backend is designed to keep the API keys out of the browser. It generates temporary tokens that let the frontend connect to AssemblyAI's streaming service, and it forwards translation requests to DeepL on the frontend's behalf. Routes are defined to serve the frontend and to handle token generation and text translation.

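const express = require("express");
const deepl = require("deepl-node");
const { AssemblyAI } = require("assemblyai");
require("dotenv").config();

// Assumed environment variable names; match them to the keys in your .env file.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
const translator = new deepl.Translator(process.env.DEEPL_API_KEY);

const app = express();
const port = 3000;

app.use(express.static("public"));
app.use(express.json());

// Serve the frontend page.
app.get("/", (req, res) => {
  res.sendFile(__dirname + "/public/index.html");
});

// Issue a temporary AssemblyAI token that expires after 300 seconds.
app.get("/token", async (req, res) => {
  const token = await client.realtime.createTemporaryToken({ expires_in: 300 });
  res.json({ token });
});

// Translate English text into the requested target language with DeepL.
app.post("/translate", async (req, res) => {
  const { text, target_lang } = req.body;
  const translation = await translator.translateText(text, "en", target_lang);
  res.json({ translation });
});

app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});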
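
The `/translate` route passes the request body straight to DeepL, so a malformed request would surface as an unhandled error. Below is a minimal sketch of input validation, assuming the four language codes offered by the page's language dropdown; the helper name `validateTranslateRequest` and the error messages are hypothetical.

```javascript
// Language codes offered by the page's <select>; adjust to match your UI.
const SUPPORTED_TARGETS = new Set(["es", "fr", "de", "zh"]);

// Checks a parsed JSON body and returns either the cleaned fields or an error.
function validateTranslateRequest(body) {
  if (body === null || typeof body !== "object") {
    return { ok: false, error: "Request body must be a JSON object." };
  }
  const { text, target_lang } = body;
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, error: "Field 'text' must be a non-empty string." };
  }
  if (!SUPPORTED_TARGETS.has(target_lang)) {
    return { ok: false, error: "Field 'target_lang' is not a supported language." };
  }
  return { ok: true, text: text.trim(), target_lang };
}
```

Inside the route handler, a failed check could be answered with `res.status(400).json({ error })` before any call to DeepL is made.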

Frontend Development

The frontend consists of an HTML page with text areas for displaying the transcription and translation, and a button to start and stop recording. The AssemblyAI SDK and RecordRTC library are utilized for real-time audio recording and transcription.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Voice Recorder with Transcription</title>
    <script src="https://cdn.tailwindcss.com"></script>
  </head>
  <body>
    <div class="min-h-screen flex flex-col items-center justify-center bg-gray-100 p-4">
      <div class="w-full max-w-6xl bg-white shadow-md rounded-lg p-4 flex flex-col md:flex-row space-y-4 md:space-y-0 md:space-x-4">
        <div class="flex-1">
          <label for="transcript" class="block text-sm font-medium text-gray-700">Transcript</label>
          <textarea id="transcript" rows="20" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm"></textarea>
        </div>
        <div class="flex-1">
          <label for="translation" class="block text-sm font-medium text-gray-700">Translation</label>
          <select id="translation-language" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm">
            <option value="es">Spanish</option>
            <option value="fr">French</option>
            <option value="de">German</option>
            <option value="zh">Chinese</option>
          </select>
          <textarea id="translation" rows="18" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm"></textarea>
        </div>
      </div>
      <button id="record-button" class="mt-4 px-6 py-2 bg-blue-500 text-white rounded-md shadow">Record</button>
    </div>
    <script src="https://www.unpkg.com/assemblyai@latest/dist/assemblyai.umd.min.js"></script>
    <script src="https://www.WebRTC-Experiment.com/RecordRTC.js"></script>
    <script src="main.js"></script>
  </body>
</html>

Real-Time Transcription and Translation

The main.js file handles the audio recording, transcription, and translation. The AssemblyAI real-time transcription service processes the audio, and the DeepL API translates the final transcriptions into the selected language.
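
Only final transcripts are sent for translation, since interim results may still change as more audio arrives. Here is a minimal sketch of that decision, assuming the message fields used in main.js (`message_type`, `text`); the helper name `toTranslatePayload` is hypothetical, and `PartialTranscript` appears only as an example of a non-final message.

```javascript
// Returns the JSON payload for POST /translate, or null when the message
// should not be translated (it is not final, or its text is empty).
function toTranslatePayload(message, targetLang) {
  if (message.message_type !== "FinalTranscript") return null;
  if (!message.text) return null;
  return { text: message.text, target_lang: targetLang };
}

// Interim result: nothing is sent.
toTranslatePayload({ message_type: "PartialTranscript", text: "hel" }, "es"); // null

// Final result: payload ready for JSON.stringify.
toTranslatePayload({ message_type: "FinalTranscript", text: "Hello." }, "es");
// { text: "Hello.", target_lang: "es" }
```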

const recordBtn = document.getElementById("record-button");
const transcript = document.getElementById("transcript");
const translationLanguage = document.getElementById("translation-language");
const translation = document.getElementById("translation");

let isRecording = false;
let recorder;
let rt;

const run = async () => {
  if (isRecording) {
    // Stop: close the streaming session and the recorder, then reset the UI.
    if (rt) {
      await rt.close(false);
      rt = null;
    }
    if (recorder) {
      recorder.stopRecording();
      recorder = null;
    }
    recordBtn.innerText = "Record";
    transcript.innerText = "";
    translation.innerText = "";
  } else {
    recordBtn.innerText = "Loading...";

    // Fetch a temporary token from the backend so the API key stays on the server.
    const response = await fetch("/token");
    const data = await response.json();
    rt = new assemblyai.RealtimeService({ token: data.token });

    // Latest text for each audio segment, keyed by the segment's start time.
    const texts = {};
    let translatedText = "";

    rt.on("transcript", async (message) => {
      // Rebuild the full transcript from all segments in time order.
      let msg = "";
      texts[message.audio_start] = message.text;
      const keys = Object.keys(texts);
      keys.sort((a, b) => a - b);
      for (const key of keys) {
        if (texts[key]) {
          msg += ` ${texts[key]}`;
        }
      }
      transcript.innerText = msg;

      // Only final transcripts are sent to the backend for translation.
      if (message.message_type === "FinalTranscript") {
        const response = await fetch("/translate", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            text: message.text,
            target_lang: translationLanguage.value,
          }),
        });
        const data = await response.json();
        translatedText += ` ${data.translation.text}`;
        translation.innerText = translatedText;
      }
    });

    rt.on("error", async (error) => {
      console.error(error);
      await rt.close();
    });

    rt.on("close", (event) => {
      console.log(event);
      rt = null;
    });

    await rt.connect();

    // Capture microphone audio and stream it to AssemblyAI in 250 ms slices.
    navigator.mediaDevices
      .getUserMedia({ audio: true })
      .then((stream) => {
        recorder = new RecordRTC(stream, {
          type: "audio",
          mimeType: "audio/webm;codecs=pcm",
          recorderType: StereoAudioRecorder,
          timeSlice: 250,
          desiredSampRate: 16000,
          numberOfAudioChannels: 1,
          bufferSize: 16384,
          audioBitsPerSecond: 128000,
          ondataavailable: async (blob) => {
            if (rt) {
              rt.sendAudio(await blob.arrayBuffer());
            }
          },
        });
        recorder.startRecording();
        recordBtn.innerText = "Stop Recording";
      })
      .catch((err) => console.error(err));
  }
  isRecording = !isRecording;
};

recordBtn.addEventListener("click", () => {
  run();
});
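
The transcript handler keeps one entry per `audio_start`, so a later message for the same audio segment overwrites the earlier one, and then joins the entries in time order. That step can be isolated as a pure function. This is a sketch with a hypothetical name (`renderTranscript`), not code from the tutorial, and it joins segments with single spaces rather than a leading space.

```javascript
// Stores the latest text for the message's segment and returns the full
// transcript. `texts` maps each audio_start value to its most recent text.
function renderTranscript(texts, message) {
  texts[message.audio_start] = message.text;
  return Object.keys(texts)
    .map(Number)
    .sort((a, b) => a - b) // numeric sort; the default sort compares strings
    .map((key) => texts[key])
    .filter(Boolean) // skip segments whose text is still empty
    .join(" ");
}

const texts = {};
renderTranscript(texts, { audio_start: 1200, text: "how are" });
renderTranscript(texts, { audio_start: 0, text: "Hello there." });
console.log(renderTranscript(texts, { audio_start: 1200, text: "How are you?" }));
// prints "Hello there. How are you?"
```

Keeping this logic free of DOM access makes it straightforward to test outside the browser.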

Conclusion

This tutorial demonstrates how to build a real-time language translation service using AssemblyAI and DeepL in JavaScript. Such a tool can significantly enhance communication and accessibility for users in different linguistic contexts. For more detailed instructions, visit the original AssemblyAI tutorial.

