Hugging Face Introduces Inference-as-a-Service with NVIDIA NIM for AI Developers


Timothy Morano

Jul 30, 2024 06:37

Hugging Face and NVIDIA collaborate to offer Inference-as-a-Service, enhancing AI model efficiency and accessibility for developers.


Hugging Face, a leading AI community platform, is now offering developers Inference-as-a-Service powered by NVIDIA’s NIM microservices, according to the NVIDIA Blog. The service aims to boost token efficiency by up to five times with popular AI models and to provide immediate access to NVIDIA DGX Cloud.

Enhanced AI Model Efficiency

This new service, announced at the SIGGRAPH conference, allows developers to rapidly deploy leading large language models, including the Llama 3 family and Mistral AI models. These models are optimized using NVIDIA NIM microservices running on NVIDIA DGX Cloud.

Developers can prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production seamlessly. Enterprise Hub users can leverage serverless inference for increased flexibility, minimal infrastructure overhead, and optimized performance.
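
As a rough sketch of what this serverless workflow can look like in code, the snippet below uses the huggingface_hub library’s InferenceClient; the model ID and token placeholder are illustrative assumptions, not details confirmed by the announcement.

    from huggingface_hub import InferenceClient

    # Illustrative model ID; any chat-capable model hosted on the
    # Hugging Face Hub can be targeted the same way.
    client = InferenceClient(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        token="hf_...",  # your Hugging Face access token (placeholder)
    )

    # Serverless inference: no GPU or server to provision on the caller's side.
    response = client.chat_completion(
        messages=[{"role": "user", "content": "Summarize what NIM microservices do."}],
        max_tokens=200,
    )
    print(response.choices[0].message.content)

Because the client abstracts the endpoint, the same call pattern carries over from prototyping to production deployment.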

Streamlined AI Development

Inference-as-a-Service complements the existing Train on DGX Cloud service, which is already available on Hugging Face. This integration gives developers a centralized hub where they can compare open-source models and experiment with, test, and deploy cutting-edge models on NVIDIA-accelerated infrastructure.

The tools are easily accessible through the “Train” and “Deploy” drop-down menus on Hugging Face model cards, enabling users to get started with just a few clicks.

NVIDIA NIM Microservices

NVIDIA NIM is a collection of AI microservices, including NVIDIA AI foundation models and open-source community models, optimized for inference using industry-standard APIs. NIM processes tokens more efficiently, improving the utilization of the underlying NVIDIA DGX Cloud infrastructure and increasing the speed of critical AI applications.
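
Since NIM exposes industry-standard APIs, a NIM endpoint can typically be called with a standard OpenAI-compatible client. The sketch below assumes a hosted NIM endpoint; the base URL, model name, and API key are illustrative assumptions rather than confirmed details.

    from openai import OpenAI

    # Assumed hosted NIM endpoint and credentials (placeholders,
    # not confirmed by the announcement).
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key="nvapi-...",
    )

    # NIM endpoints follow the familiar chat-completions API shape.
    completion = client.chat.completions.create(
        model="meta/llama3-70b-instruct",  # assumed model identifier
        messages=[{"role": "user", "content": "Hello from a NIM client."}],
        max_tokens=64,
    )
    print(completion.choices[0].message.content)

Using a standard API shape means existing application code can switch to NIM-backed inference by changing only the base URL and credentials.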

For example, the 70-billion-parameter version of Llama 3 delivers up to 5x higher throughput when accessed as a NIM, compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

Accessible AI Acceleration

The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure. The platform supports every step of AI development, from prototype to production, without requiring long-term AI infrastructure commitments.

Hugging Face’s Inference-as-a-Service on NVIDIA DGX Cloud, powered by NIM microservices, offers easy access to compute resources optimized for AI deployment. This enables users to experiment with the latest AI models in an enterprise-grade environment.

More Announcements at SIGGRAPH

At the SIGGRAPH conference, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework. These aim to accelerate developers’ ability to build highly accurate virtual worlds for the next evolution of AI.

For more information, visit the official NVIDIA Blog.

Image source: Shutterstock
