Google Cloud Run Integrates NVIDIA L4 GPUs for Enhanced AI Inference Deployments


Luisa Crawford

Aug 22, 2024 07:50

Google Cloud Run now supports NVIDIA L4 GPUs and NVIDIA NIM microservices for serverless AI inference deployments, optimizing performance and scalability for AI applications.


Google Cloud Run has announced the integration of NVIDIA L4 Tensor Core GPUs and NVIDIA NIM microservices, adding capabilities for serverless AI inference deployments, according to the NVIDIA Technical Blog. This collaboration aims to address the challenges enterprises face when deploying AI-enabled applications, including performance optimization, scalability, and infrastructure complexity.

Enhancing AI Inference Deployments

Google Cloud’s fully managed serverless container runtime, Cloud Run, now supports NVIDIA L4 Tensor Core GPUs in preview. This allows enterprises to run real-time AI applications on demand without the hassle of managing infrastructure. The integration of NVIDIA NIM microservices further simplifies the optimization and deployment of AI models, maximizing application performance and reducing complexity.

Real-Time AI-Enabled Applications

Cloud Run abstracts infrastructure management by dynamically allocating resources based on incoming traffic, ensuring efficient scaling and resource utilization. Support for NVIDIA L4 GPUs represents a significant upgrade from the previous CPU-only offering, providing up to 120x higher AI video performance than CPU-based solutions and 2.7x more generative AI inference performance than the previous GPU generation.
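
Scaling on Cloud Run is demand-driven, but it can be bounded explicitly for GPU workloads. Below is a minimal sketch, in Python over the gcloud CLI, of how such bounds might be set; the service name inference-service is hypothetical, and exact flag availability should be checked against the current gcloud release.

    import subprocess

    # Bound Cloud Run's traffic-driven autoscaling for a GPU service.
    # "inference-service" is a hypothetical service name.
    subprocess.run(
        [
            "gcloud", "run", "services", "update", "inference-service",
            "--region", "us-central1",
            "--min-instances", "0",   # scale to zero when idle
            "--max-instances", "4",   # cap costly GPU instances
            "--concurrency", "16",    # requests per instance before scaling out
        ],
        check=True,
    )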

Notably, companies like Let’s Enhance, Wombo, Writer, Descript, and AppLovin are leveraging NVIDIA L4 GPUs to power their generative AI applications, delivering enhanced user experiences.

Performance-Optimized Serverless AI Inference

Optimizing AI model performance is crucial for resource efficiency and cost management. NVIDIA NIM offers a set of optimized cloud-native microservices that simplify and accelerate AI model deployment. These pre-optimized, containerized models integrate seamlessly into applications, reducing development time and maximizing resource efficiency.

NVIDIA NIM on Cloud Run allows high-performance AI applications to be deployed with optimized inference engines that unlock the full potential of NVIDIA L4 GPUs, delivering high throughput and low latency without requiring specialized expertise in inference performance optimization.
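
Because NIM microservices expose an OpenAI-compatible HTTP API, a deployed service can be queried with a few lines of Python. The sketch below assumes a Llama3-8B-Instruct NIM is already running on Cloud Run; the service URL is a placeholder for the one Cloud Run assigns at deploy time.

    import requests

    # Placeholder for the URL Cloud Run assigns to the deployed service.
    SERVICE_URL = "https://nim-llama3-xxxxx-uc.a.run.app"

    # NIM serves an OpenAI-compatible chat completions endpoint.
    resp = requests.post(
        f"{SERVICE_URL}/v1/chat/completions",
        json={
            "model": "meta/llama3-8b-instruct",
            "messages": [{"role": "user", "content": "What is NVIDIA NIM?"}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])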

Deploying Llama3-8B-Instruct NIM Microservice

Deploying models like Llama3-8B-Instruct with Cloud Run on NVIDIA L4 GPUs is straightforward. Users install the Google Cloud SDK, then clone the repository, set environment variables, edit the Dockerfile, build the container, and deploy it using the provided scripts, as sketched below.
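
The build and deploy steps can be driven from Python over the gcloud CLI. This is only an outline of the flow described above: the project ID, image tag, and service name are placeholders, the repository's own scripts handle the model-specific details, and the GPU flags shipped under gcloud beta during the preview, so they may differ in current releases.

    import subprocess

    PROJECT_ID = "my-project"  # hypothetical project ID
    IMAGE = f"gcr.io/{PROJECT_ID}/nim-llama3-8b-instruct"
    SERVICE = "nim-llama3"

    # Build the NIM container image with Cloud Build.
    subprocess.run(["gcloud", "builds", "submit", "--tag", IMAGE], check=True)

    # Deploy to Cloud Run with one NVIDIA L4 GPU attached
    # (preview; us-central1 only at the time of writing).
    subprocess.run(
        [
            "gcloud", "beta", "run", "deploy", SERVICE,
            "--image", IMAGE,
            "--region", "us-central1",
            "--gpu", "1",
            "--gpu-type", "nvidia-l4",
        ],
        check=True,
    )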

Getting Started

The integration of the NVIDIA AI platform, including NVIDIA NIM and NVIDIA L4 GPUs, with Google Cloud Run addresses key challenges in AI application deployment. This combination accelerates deployment, boosts performance, and ensures operational efficiency and cost-effectiveness.

Developers can prototype with NVIDIA NIM microservices through the NVIDIA API catalog, then download NIM containers for further development on Google Cloud Run. For enterprise-grade security and support, a 90-day NVIDIA AI Enterprise license is available.
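
Prototyping through the API catalog also goes over an OpenAI-compatible interface, so the standard OpenAI Python client can point at NVIDIA's hosted endpoint. A minimal sketch, assuming an API key from the catalog is set in the NVIDIA_API_KEY environment variable:

    import os
    from openai import OpenAI

    # The API catalog serves NIM models behind an OpenAI-compatible endpoint.
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],  # key obtained from the catalog
    )

    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=64,
    )
    print(completion.choices[0].message.content)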

Currently, Cloud Run with NVIDIA L4 GPU support is in preview in the us-central1 Google Cloud region. More information and demos are available at the launch event livestream and sign-up page.

Image source: Shutterstock
