NVIDIA Introduces NIM Microservices for Enhanced Speech and Translation Capabilities


Lawrence Jengar
Sep 19, 2024 02:54

NVIDIA NIM microservices offer advanced speech and translation features, enabling seamless integration of AI models into applications for a global audience.


NVIDIA has unveiled its NIM microservices for speech and translation, part of the NVIDIA AI Enterprise suite, according to the NVIDIA Technical Blog. These microservices enable developers to self-host GPU-accelerated inferencing for both pretrained and customized AI models across clouds, data centers, and workstations.

Advanced Speech and Translation Features

The new microservices leverage NVIDIA Riva to provide automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) capabilities. This integration aims to enhance global user experience and accessibility by incorporating multilingual voice capabilities into applications.

Developers can use these microservices to build customer service bots, interactive voice assistants, and multilingual content platforms, achieving high-performance AI inference at scale with minimal development effort.

Interactive Browser Interface

Users can perform basic inference tasks such as transcribing speech, translating text, and generating synthetic voices directly through their browsers using the interactive interfaces available in the NVIDIA API catalog. This provides a convenient starting point for exploring the capabilities of the speech and translation NIM microservices.

The tools are flexible enough to be deployed anywhere from local workstations to cloud and data center infrastructure, scaling to match diverse deployment needs.

Running Microservices with NVIDIA Riva Python Clients

The NVIDIA Technical Blog details how to clone the nvidia-riva/python-clients GitHub repository and use the provided scripts to run simple inference tasks on the NVIDIA API catalog Riva endpoint. Users need an NVIDIA API key to access these commands.
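
As a concrete starting point, the sketch below (not the blog's exact commands) uses the nvidia-riva-client Python package, which the repository's scripts are built on, to transcribe a local audio file against the hosted endpoint. The endpoint URI, function ID, and API key are placeholders to be replaced with values from the API catalog, and offline recognition is shown here for brevity:

```python
# Minimal sketch: transcribe a local audio file against the hosted Riva
# endpoint using the nvidia-riva-client package
# (`pip install nvidia-riva-client`). The URI, function ID, and API key
# below are placeholders -- copy the real values from the NVIDIA API catalog.
import riva.client

auth = riva.client.Auth(
    uri="grpc.nvcf.nvidia.com:443",                    # placeholder endpoint
    use_ssl=True,
    metadata_args=[
        ["function-id", "<asr-function-id>"],          # from the API catalog
        ["authorization", "Bearer <NVIDIA_API_KEY>"],  # your API key
    ],
)

asr = riva.client.ASRService(auth)
config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
# Fill in encoding and sample-rate fields from the file's WAV header.
riva.client.add_audio_file_specs_to_config(config, "sample.wav")

with open("sample.wav", "rb") as fh:
    response = asr.offline_recognize(fh.read(), config)
print(response.results[0].alternatives[0].transcript)
```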

Examples provided include transcribing audio files in streaming mode, translating text from English to German, and generating synthetic speech. These tasks demonstrate the practical applications of the microservices in real-world scenarios.
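
A hedged sketch of the other two tasks, reusing the connection pattern from the previous snippet; note that each hosted service has its own function ID, so in practice you would build a separate Auth per service, and the voice name is a placeholder:

```python
# Sketch continued: English-to-German translation and speech synthesis.
# Assumes `auth` objects with the appropriate function IDs, as above.
nmt = riva.client.NeuralMachineTranslationClient(auth)
result = nmt.translate(
    texts=["NIM microservices simplify speech AI deployment."],
    model="",               # let the endpoint pick its default NMT model
    source_language="en",
    target_language="de",
)
print(result.translations[0].text)

tts = riva.client.SpeechSynthesisService(auth)
audio = tts.synthesize(
    text="Hello from NVIDIA Riva!",
    voice_name="<voice-name>",   # placeholder; available voices are in the docs
    language_code="en-US",
    sample_rate_hz=44100,
)
with open("output.raw", "wb") as fh:
    fh.write(audio.audio)        # raw PCM; add a WAV header before playback
```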

Deploying Locally with Docker

For those with advanced NVIDIA data center GPUs, the microservices can be run locally using Docker. Detailed instructions are available for setting up ASR, NMT, and TTS services. An NGC API key is required to pull NIM microservices from NVIDIA’s container registry and run them on local systems.
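
Once a NIM container is serving locally, the same Python client can simply point at it. A minimal sketch, assuming the container exposes Riva's default gRPC port 50051 on localhost and needs no API-key metadata:

```python
# Sketch: target a locally deployed NIM instead of the hosted endpoint.
# Assumes the container is up and serving gRPC on localhost:50051.
import riva.client

local_auth = riva.client.Auth(uri="localhost:50051", use_ssl=False)
asr = riva.client.ASRService(local_auth)

config = riva.client.RecognitionConfig(
    language_code="en-US",
    enable_automatic_punctuation=True,
)
riva.client.add_audio_file_specs_to_config(config, "sample.wav")

with open("sample.wav", "rb") as fh:
    response = asr.offline_recognize(fh.read(), config)
print(response.results[0].alternatives[0].transcript)
```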

Integrating with a RAG Pipeline

The blog also covers how to connect ASR and TTS NIM microservices to a basic retrieval-augmented generation (RAG) pipeline. This setup enables users to upload documents into a knowledge base, ask questions verbally, and receive answers in synthesized voices.
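
Reduced to a sketch, the loop is: transcribe the spoken question, query the RAG app, synthesize the answer. The query_rag helper below is a hypothetical stand-in for whatever query API the RAG web app exposes, and both NIMs are assumed reachable at one local endpoint for simplicity (a multi-container setup would use one port per service):

```python
# Sketch of the voice loop: speech in, RAG query, speech out.
# `query_rag` is a hypothetical placeholder, not an API from the blog.
import riva.client

auth = riva.client.Auth(uri="localhost:50051", use_ssl=False)
asr = riva.client.ASRService(auth)
tts = riva.client.SpeechSynthesisService(auth)

def query_rag(question: str) -> str:
    """Hypothetical helper: send the question to the RAG app, return the answer."""
    raise NotImplementedError("wire this to your RAG endpoint")

# 1. Transcribe the spoken question.
config = riva.client.RecognitionConfig(
    language_code="en-US", enable_automatic_punctuation=True,
)
riva.client.add_audio_file_specs_to_config(config, "question.wav")
with open("question.wav", "rb") as fh:
    result = asr.offline_recognize(fh.read(), config)
question = result.results[0].alternatives[0].transcript

# 2. Query the knowledge base via the RAG app.
answer = query_rag(question)

# 3. Synthesize the answer as speech.
audio = tts.synthesize(
    text=answer,
    voice_name="<voice-name>",  # placeholder
    language_code="en-US",
    sample_rate_hz=44100,
)
with open("answer.raw", "wb") as fh:
    fh.write(audio.audio)       # raw PCM payload
```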

Instructions include setting up the environment, launching the ASR and TTS NIMs, and configuring the RAG web app to query large language models by text or voice. This integration showcases the potential of combining speech microservices with advanced AI pipelines for enhanced user interactions.

Getting Started

Developers interested in adding multilingual speech AI to their applications can start by exploring the speech NIM microservices. These tools offer a seamless way to integrate ASR, NMT, and TTS into various platforms, providing scalable, real-time voice services for a global audience.

For more information, visit the NVIDIA Technical Blog.

Image source: Shutterstock
