How to Create a Discord Voice Bot Using ChatGPT

Discord, a popular instant messaging and social media platform, is widely favored by online communities, streamers, and gamers. One of its most cherished features is its voice channels, which allow members to connect over voice and video. Another significant advantage of Discord, especially for developers, is its customizability: bots can be created to add new functionality. This tutorial, based on a guide from AssemblyAI, walks you through developing a Discord bot that can join voice channels, transcribe audio, generate intelligent responses via ChatGPT, and convert these responses back to speech.

Set Up the Bot

To build the Discord bot, you will use Node.js along with three third-party services: AssemblyAI for speech-to-text, OpenAI for intelligent responses, and ElevenLabs for text-to-speech conversion. The tutorial assumes familiarity with JavaScript and Node.js, including setting up a Node.js project, installing dependencies, and writing basic asynchronous code.

First, ensure you have Node.js (version 18 or higher) installed and access to a Discord server with administrator rights. Create a project directory and initialize a Node.js project:

mkdir discord-voice-bot && cd discord-voice-bot
npm init -y

Install the required dependencies:

npm install discord.js libsodium-wrappers ffmpeg-static @discordjs/opus @discordjs/voice dotenv assemblyai elevenlabs-node openai

Store the API keys and the bot token in a .env file so they stay out of the source code:

OPENAI_API_KEY=
ASSEMBLYAI_API_KEY=
ELEVENLABS_API_KEY=
DISCORD_TOKEN=
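
Because the bot depends on all four values, it can help to fail fast at startup when one is missing. The sketch below is one possible check and is not part of the original tutorial; the helper name findMissingEnvVars is hypothetical, and in the bot it would receive process.env after dotenv has loaded the .env file.

```javascript
// The four variables listed in the .env file above.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "ELEVENLABS_API_KEY",
  "DISCORD_TOKEN",
];

// Hypothetical helper: returns the names of required variables that are
// absent or blank in the given environment object.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Example with a hand-written environment: one value is blank, one is absent.
const missing = findMissingEnvVars({
  OPENAI_API_KEY: "sk-example",
  ASSEMBLYAI_API_KEY: "",
  DISCORD_TOKEN: "example-token",
});
console.log(missing); // [ 'ASSEMBLYAI_API_KEY', 'ELEVENLABS_API_KEY' ]
```

In the bot's entry point, a check like this could run right after require("dotenv").config() and stop the process if the returned list is not empty.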

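Set up a Discord developer account, create an application, and enable the permissions the bot needs: it reads message text, so the Message Content intent must be enabled, and it must be allowed to connect and speak in voice channels. Save the bot token in the .env file, then add the bot to your server using the generated URL.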
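
For reference, the generated URL is an ordinary OAuth2 authorization link. The sketch below shows roughly how such a link is put together, assuming the bot needs the View Channels, Send Messages, Connect, and Speak permissions. The function name buildInviteUrl and the sample application ID are placeholders, and the URL produced in the Discord Developer Portal remains the authoritative one.

```javascript
// Discord permission flags, expressed as bit positions.
const PERMISSIONS = {
  VIEW_CHANNEL: 1 << 10,
  SEND_MESSAGES: 1 << 11,
  CONNECT: 1 << 20,
  SPEAK: 1 << 21,
};

// Hypothetical helper: builds a bot invite link for the given application ID.
function buildInviteUrl(clientId, permissionFlags = Object.values(PERMISSIONS)) {
  // Combine the individual flags into a single permissions integer.
  const permissions = permissionFlags.reduce((bits, flag) => bits | flag, 0);
  const url = new URL("https://discord.com/oauth2/authorize");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("scope", "bot");
  url.searchParams.set("permissions", String(permissions));
  return url.toString();
}

// The ID below is a made-up placeholder, not a real application.
console.log(buildInviteUrl("123456789012345678"));
```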

Develop the Discord Voice Bot Functions

The bot will join a voice channel, record audio, transcribe it using AssemblyAI, generate responses via ChatGPT, and convert these responses to speech using ElevenLabs.

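Join the Voice Channel

Listen for a !join message and connect the bot to the sender's voice channel:

require("dotenv").config();
const { Client, Events, GatewayIntentBits } = require("discord.js");
const { joinVoiceChannel, VoiceConnectionStatus } = require("@discordjs/voice");

// Intents for guild messages, their text content, and voice state updates.
const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
    GatewayIntentBits.GuildVoiceStates,
  ],
});

client.on(Events.MessageCreate, async (message) => {
  if (message.content.toLowerCase() === "!join") {
    // The voice channel the message author is currently in, if any.
    const channel = message.member.voice.channel;
    if (channel) {
      const connection = joinVoiceChannel({
        channelId: channel.id,
        guildId: message.guild.id,
        adapterCreator: message.guild.voiceAdapterCreator,
        selfDeaf: false, // stay undeafened so the bot can receive audio
      });

      // Start listening once the voice connection is ready.
      connection.on(VoiceConnectionStatus.Ready, () => {
        message.reply(`Joined voice channel: ${channel.name}!`);
        listenAndRespond(connection, message);
      });
    } else {
      message.reply("You need to join a voice channel first!");
    }
  }
});

client.login(process.env.DISCORD_TOKEN);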

Record and Transcribe Audio

Capture audio streams from voice channels and transcribe them using AssemblyAI:

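const { EndBehaviorType } = require("@discordjs/voice");
const { AssemblyAI } = require("assemblyai");
const prism = require("prism-media");

const assemblyAI = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

async function listenAndRespond(connection, message) {
  let transcription = "";

  // One real-time session per utterance; the sample rate matches Discord's 48 kHz audio.
  const transcriber = assemblyAI.realtime.transcriber({ sampleRate: 48000 });
  transcriber.on("transcript", (transcript) => {
    // Keep only finalized segments, not partial results.
    if (transcript.message_type === "FinalTranscript") {
      transcription += transcript.text + " ";
    }
  });
  await transcriber.connect();

  // End the stream after one second of silence so the "end" handler fires.
  const audioStream = connection.receiver.subscribe(message.author.id, {
    end: { behavior: EndBehaviorType.AfterSilence, duration: 1000 },
  });

  // Decode Discord's Opus packets to mono PCM before sending them on.
  const opusDecoder = new prism.opus.Decoder({ rate: 48000, channels: 1 });
  audioStream.pipe(opusDecoder).on("data", (chunk) => {
    transcriber.sendAudio(chunk);
  });

  audioStream.on("end", async () => {
    await transcriber.close();
    const chatGPTResponse = await getChatGPTResponse(transcription);
    const audioPath = await convertTextToSpeech(chatGPTResponse);
    playAudio(connection, audioPath);
  });
}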
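
The real-time transcriber reports partial results while someone is still speaking and a final result once a segment is complete, which is why the handler above appends only FinalTranscript messages. The following sketch isolates that accumulation logic so it can be tried without a live connection. The helper createTranscriptBuffer is hypothetical, and the message objects are hand-written stand-ins that mirror only the message_type and text fields used above.

```javascript
// Hypothetical helper: collects finalized transcript segments into one string.
function createTranscriptBuffer() {
  const segments = [];
  return {
    // Store a message only if it is final and carries non-empty text.
    handle(transcript) {
      if (transcript.message_type === "FinalTranscript" && transcript.text) {
        segments.push(transcript.text.trim());
      }
    },
    // Join the collected segments with single spaces.
    text() {
      return segments.join(" ");
    },
    // Clear the buffer before the next utterance.
    reset() {
      segments.length = 0;
    },
  };
}

// Stand-in messages: one partial, two finals with text, one empty final.
const buffer = createTranscriptBuffer();
buffer.handle({ message_type: "PartialTranscript", text: "hello" });
buffer.handle({ message_type: "FinalTranscript", text: "Hello there." });
buffer.handle({ message_type: "FinalTranscript", text: "" });
buffer.handle({ message_type: "FinalTranscript", text: "How are you?" });
console.log(buffer.text()); // "Hello there. How are you?"
```

Compared with appending to a shared string, a buffer like this avoids the trailing space and makes it explicit when the text is cleared between utterances.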

Generate Responses with ChatGPT

Use OpenAI’s GPT-3.5 Turbo model to generate intelligent responses:

const { OpenAI } = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getChatGPTResponse(text) {
  // gpt-3.5-turbo is a chat model, so it is called through the chat completions API.
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: text }],
    max_tokens: 100,
  });
  return response.choices[0].message.content.trim();
}
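
A voice assistant often benefits from a short system instruction and some memory of earlier turns. As a rough sketch, the helper below assembles a messages array in the role/content shape that chat models accept. The name buildMessages, the system prompt text, and the maxTurns limit are illustrative assumptions rather than part of the original tutorial.

```javascript
// Assumed system instruction; adjust the wording to suit the bot.
const SYSTEM_PROMPT =
  "You are a helpful voice assistant in a Discord channel. " +
  "Keep answers brief because they will be read aloud.";

// Hypothetical helper: builds the messages array for one request.
// `history` holds earlier { role, content } objects in chronological order.
function buildMessages(history, userText, maxTurns = 5) {
  // One turn is a user message plus an assistant reply, so keep 2 * maxTurns entries.
  const recent = maxTurns > 0 ? history.slice(-maxTurns * 2) : [];
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...recent,
    { role: "user", content: userText.trim() },
  ];
}

const history = [
  { role: "user", content: "What is Discord?" },
  { role: "assistant", content: "A chat platform with text and voice channels." },
];
const messages = buildMessages(history, "  Can bots join voice channels?  ");
console.log(messages.map((m) => m.role).join(", ")); // system, user, assistant, user
```

The returned array could be passed as the messages field of the chat completion request, with each new exchange appended to history afterwards.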

Convert Text to Speech with ElevenLabs

Convert ChatGPT responses to speech using ElevenLabs:

const ElevenLabs = require("elevenlabs-node");
const voice = new ElevenLabs({ apiKey: process.env.ELEVENLABS_API_KEY });

async function convertTextToSpeech(text) {
  // Write the synthesized speech to a timestamped MP3 file.
  const fileName = `${Date.now()}.mp3`;
  const response = await voice.textToSpeech({ fileName, textInput: text });
  // Return the file path on success, or null if synthesis failed.
  return response.status === "ok" ? fileName : null;
}
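
The earlier code calls playAudio(connection, audioPath), which is not shown in this article. One way it could look, as a sketch: the @discordjs/voice functions createAudioPlayer and createAudioResource are passed in as a parameter so the logic can be read and tested on its own, and the null check covers the case where convertTextToSpeech returns null. The factory name createPlayAudio is hypothetical.

```javascript
// Hypothetical factory. In the bot, `voiceLib` would be require("@discordjs/voice"):
//   const playAudio = createPlayAudio(require("@discordjs/voice"));
function createPlayAudio(voiceLib) {
  const { createAudioPlayer, createAudioResource } = voiceLib;

  return function playAudio(connection, audioPath) {
    // convertTextToSpeech returns null when synthesis fails, so there is nothing to play.
    if (!audioPath) return null;

    const player = createAudioPlayer();
    const resource = createAudioResource(audioPath);

    // Log playback problems instead of letting them surface as unhandled errors.
    player.on("error", (error) => {
      console.error("Playback failed:", error.message);
    });

    // Route the player's output to the voice connection, then start playback.
    connection.subscribe(player);
    player.play(resource);
    return player;
  };
}
```

Returning the player leaves room for the caller to react when playback finishes, for example by removing the temporary MP3 file.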

Conclusion

This tutorial demonstrated how to create a sophisticated Discord voice bot integrating AssemblyAI for speech transcription, OpenAI’s GPT-3.5 Turbo model for intelligent responses, and ElevenLabs for speech synthesis. This project showcases the potential of modern AI and voice technologies for creating interactive, accessible, and engaging applications.

