How to Transcribe YouTube Videos and Generate Subtitles Using Node.js


A recent AssemblyAI tutorial shows developers how to transcribe YouTube videos and generate SRT subtitles using Node.js. The guide covers the transcription process, demonstrates how to create subtitles, and uses the LeMUR framework to prompt a video with a Large Language Model (LLM).

Step 1: Set Up Your Development Environment

To get started, developers need to install Node.js 18 or higher. After setting up the project directory and initializing a new Node.js project, the package.json file should be configured to use ES Module syntax by setting "type": "module".

Next, install the necessary NPM modules:


  • assemblyai: The AssemblyAI JavaScript SDK, used to interact with the AssemblyAI API.

  • youtube-dl-exec: A wrapper for the yt-dlp CLI tool, used to retrieve and download YouTube video information.

  • tsx: Allows execution of TypeScript code without additional setup.

Additionally, Python 3.7 or above is required for youtube-dl-exec. An AssemblyAI API key is also necessary; it can be configured as an environment variable on the machine.
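One way to fail fast when the key is missing is sketched below. The variable name ASSEMBLYAI_API_KEY and the helper readApiKey are assumptions made for illustration; the tutorial summary only states that the key is configured as an environment variable.

```typescript
// Hypothetical helper: reads the API key from an environment-like object.
// ASSEMBLYAI_API_KEY is an assumed variable name, not one confirmed by the article.
function readApiKey(
  env: Record<string, string | undefined>,
  name = "ASSEMBLYAI_API_KEY",
): string {
  const value = env[name]?.trim();
  if (!value) {
    // Stop before any network call is attempted with an empty key.
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}
```

In the script, this could be called as readApiKey(process.env) before the AssemblyAI client is created.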

Step 2: Retrieve the Audio of a YouTube Video

To transcribe a video, a public URL to the audio track is needed. YouTube stores audio and video separately, and the youtube-dl-exec module can retrieve this information. The following script retrieves the audio URL from a YouTube video:

import { youtubeDl } from "youtube-dl-exec";

const youtubeVideoUrl = "https://www.youtube.com/watch?v=wtolixa9XTg";

console.log("Retrieving audio URL from YouTube video");
// Request the video's metadata as a single JSON document.
const videoInfo = await youtubeDl(youtubeVideoUrl, {
  dumpSingleJson: true,
  preferFreeFormats: true,
  addHeader: ["referer:youtube.com", "user-agent:googlebot"],
});

// Search the formats in reverse order for the first audio-only m4a track.
const audioUrl = videoInfo.formats.reverse().find(
  (format) => format.resolution === "audio only" && format.ext === "m4a",
)?.url;

if (!audioUrl) {
  throw new Error("No audio only format found");
}
console.log("Audio URL retrieved successfully");
console.log("Audio URL:", audioUrl);

With the audio URL, the audio can be transcribed using AssemblyAI.
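The transcription call itself is not shown in this summary, so the following is only a minimal sketch of what it might look like. The helper transcribeAudio and the two interfaces are hypothetical; they describe just the parts of the client that the later code relies on (an id, a status, and the transcript text), so the sketch can be exercised without network access.

```typescript
// Minimal shapes for the parts of the client this sketch touches.
// These interfaces are assumptions for illustration, not the SDK's own types.
interface TranscriptLike {
  id: string;
  status: string;
  text?: string | null;
  error?: string;
}

interface TranscriberLike {
  transcripts: {
    transcribe(params: { audio: string }): Promise<TranscriptLike>;
  };
}

// Hypothetical helper: submits the audio URL and rejects failed transcripts.
async function transcribeAudio(
  client: TranscriberLike,
  audioUrl: string,
): Promise<TranscriptLike> {
  console.log("Transcribing audio");
  const transcript = await client.transcripts.transcribe({ audio: audioUrl });
  if (transcript.status === "error") {
    throw new Error("Transcription failed: " + (transcript.error ?? "unknown error"));
  }
  console.log("Transcription complete");
  return transcript;
}
```

In the script, the client would presumably be created with new AssemblyAI({ apiKey }) from the assemblyai package and passed in, giving the aaiClient and transcript values that the next step uses.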

Step 3: Save the Transcript and Subtitles

Once the transcription is complete, the transcript text can be saved to a file. The following code saves the transcript and generates SRT subtitles:

import { writeFile } from "fs/promises";

// aaiClient and transcript come from the transcription step.
console.log("Saving transcript to file");
await writeFile("./transcript.txt", transcript.text!);
console.log("Transcript saved to file transcript.txt");

console.log("Retrieving transcript as SRT subtitles");
const subtitles = await aaiClient.transcripts.subtitles(transcript.id, "srt");
await writeFile("./subtitles.srt", subtitles);
console.log("Subtitles saved to file subtitles.srt");

To generate WebVTT subtitles, replace "srt" with "vtt" and save the file with the .vtt extension.
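For readers unfamiliar with the two formats, the sketch below illustrates one visible difference between them: SRT cue timestamps separate milliseconds with a comma, while WebVTT uses a period (a WebVTT file also begins with a WEBVTT header line). AssemblyAI generates both formats on its side, so none of this is needed to use the API; the helpers formatCueTimestamp and subtitlePath are hypothetical and exist only for illustration.

```typescript
type SubtitleFormat = "srt" | "vtt";

// Formats a non-negative millisecond offset as a cue timestamp.
// SRT separates milliseconds with a comma; WebVTT uses a period.
function formatCueTimestamp(ms: number, format: SubtitleFormat): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor((ms % 3_600_000) / 60_000);
  const seconds = Math.floor((ms % 60_000) / 1_000);
  const millis = ms % 1_000;
  const separator = format === "srt" ? "," : ".";
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(millis, 3)}`;
}

// Hypothetical helper: picks the output path matching the subtitle format.
function subtitlePath(format: SubtitleFormat): string {
  return `./subtitles.${format}`;
}

console.log(formatCueTimestamp(3_723_004, "srt")); // 01:02:03,004
console.log(formatCueTimestamp(3_723_004, "vtt")); // 01:02:03.004
```

Keeping the format in a single variable and deriving the file name from it avoids saving WebVTT content under an .srt name by mistake.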

Step 4: Run the Script

To execute the script, use the following command:

npx tsx index.ts

The transcript text and subtitles will be saved to disk. How long the process takes depends on the length of the YouTube video.

Bonus: Prompt a YouTube Video Using LeMUR

AssemblyAI’s LeMUR framework allows developers to build generative AI features. By writing prompts for the LLM, developers can generate responses based on the transcript. For instance, a prompt to summarize the video using bullet points can be implemented as follows:

console.log("Prompting LeMUR to summarize the video");

const prompt = "Summarize this video using bullet points";
const lemurResponse = await aaiClient.lemur.task({
  transcript_ids: [transcript.id],
  prompt,
  final_model: "default",
});
console.log(prompt + ": " + lemurResponse.response);

For further customization, final_model can be set to one of the supported models listed in the LeMUR documentation.

Next Steps

In this tutorial, developers learned to retrieve audio from a YouTube video, transcribe the audio, generate subtitles, and summarize the video using LeMUR. For more capabilities, check out AssemblyAI’s Audio Intelligence models and LeMUR.
