Anyscale Introduces Multi-Tenant Serve Applications with Containerized Runtime Environments


In a recent update, Anyscale, a leading AI application platform, announced the introduction of multi-tenant serve applications utilizing runtime environments as containers. This development aims to enhance resource management and operational efficiency, according to Anyscale.

Advancements in Multi-Application Support

In a recent conversation, Sam Chan, Technical Program Manager at Anyscale, and Cindy Zhang discussed the advancements and challenges of multi-application serve clusters whose applications have different dependencies. Multi-application support allows different applications to run on the same cluster, each using its own runtime environment as a container. This approach helps manage resources more effectively and reduces operational complexity, enabling independent upgrades for different applications.

Zhang highlighted the previous limitation: users had to bundle all model dependencies into one large Docker image, leading to bloated images and mixed dependencies. This was particularly challenging for customers with multiple research teams working on separate models. The new feature allows each team to deploy its code in its own container, offering cleaner isolation and easier maintenance.

The Role of Runtime Environments as Containers

The new feature, “runtime environments as containers,” permits specifying a different Docker image for each application. When Ray needs to start a replica for an app, it will initiate a container from that app’s image and run the worker process inside. This ensures clean isolation between applications and enhances the efficiency of resource sharing.
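
As a rough illustration of what per-application images could look like, the sketch below builds a multi-application Serve config as a plain Python dictionary. The application names, import paths, and image URIs are hypothetical, and the `container` and `image` keys are an assumption about the experimental runtime environment field, which may be spelled differently depending on the Ray version.

```python
# Hypothetical sketch: one Serve config, two applications, two images.
# App names, import paths and image URIs are invented for illustration;
# the "container"/"image" keys are an assumption about the experimental
# runtime_env field and may differ across Ray versions.

def make_app(name, import_path, image):
    """Return one application entry that carries its own container image."""
    return {
        "name": name,
        "route_prefix": f"/{name}",
        "import_path": import_path,
        "runtime_env": {"container": {"image": image}},
    }


serve_config = {
    "applications": [
        make_app("whisper", "whisper_app:entrypoint",
                 "registry.example.com/team-a/whisper:latest"),
        make_app("resnet", "resnet_app:entrypoint",
                 "registry.example.com/team-b/resnet:latest"),
    ]
}

# Map each application name to the image its replicas would start from.
images = {
    app["name"]: app["runtime_env"]["container"]["image"]
    for app in serve_config["applications"]
}
```

Each team owns only its own entry, so one team's image can be rebuilt or upgraded without touching the other's.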

Zhang explained that this feature unlocks Ray’s multi-tenancy capabilities, allowing multiple applications to share resources more efficiently on the same cluster. For instance, eight applications can be squeezed onto a single large VM with eight GPUs, each Ray Serve application configured to use one GPU. This granular utilization of GPU capacity minimizes underutilized resources and simplifies operational management by maintaining a single Ray cluster.
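
The arithmetic behind the eight-GPU example can be sketched in a few lines of Python. The application names are hypothetical; `num_gpus` is the standard Ray resource option, used here only inside a plain dictionary, so nothing is actually scheduled.

```python
# Hypothetical sketch of the eight-apps-on-eight-GPUs arithmetic.
NODE_GPUS = 8  # one large VM with eight GPUs, as in the example above

# Each application asks for one GPU per replica (app names are made up).
apps = [
    {"name": f"app{i}", "ray_actor_options": {"num_gpus": 1}}
    for i in range(8)
]


def gpus_requested(app_list):
    """Total GPUs requested across all applications."""
    return sum(a["ray_actor_options"]["num_gpus"] for a in app_list)


requested = gpus_requested(apps)
fits_on_one_node = requested <= NODE_GPUS
idle_gpus = NODE_GPUS - requested
```

With one GPU per application, all eight fit on the single VM and no GPU sits idle, which is the packing the example describes.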

Technical Implementation and Challenges

Under the hood, Ray is integrated with Podman to pull images and spin up containers. When a new Ray worker needs to start, Ray calls out to Podman to pull the relevant image and start the container. Ray then runs the Ray worker code inside that container.
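
To make the two Podman steps concrete, here is a minimal sketch that only builds the command lines as argument lists. It is not Ray's actual invocation: `podman pull` and `podman run --rm` are real Podman subcommands, but the image URI and the worker command are placeholders, and any additional flags Ray passes are omitted.

```python
# Hypothetical sketch: the pull and run steps as argument lists.
# This is NOT Ray's actual invocation; the image URI and worker command
# are placeholders, and extra flags are deliberately left out.

def podman_pull_cmd(image):
    """Command that fetches the image if it is not cached locally."""
    return ["podman", "pull", image]


def podman_run_cmd(image, worker_cmd):
    """Command that starts a container from the image and runs the worker."""
    return ["podman", "run", "--rm", image, *worker_cmd]


image = "registry.example.com/team-a/whisper:latest"  # hypothetical URI
steps = [
    podman_pull_cmd(image),
    podman_run_cmd(image, ["python", "worker.py"]),  # placeholder command
]
```

Each list could be handed to `subprocess.run` on a machine with Podman installed; the sketch stops short of executing anything.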

However, the feature is still experimental. Zhang cautioned that there might be startup delays the first time an image needs to be pulled, and the feature hasn’t been tested at a large scale. Additionally, other runtime environment fields, such as Python environments or working directories, are not currently supported with container runtime environments.
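
One way to catch the unsupported combination early is a small pre-flight check on the runtime environment before deploying. The helper below is a hypothetical sketch, not part of Ray; it assumes the container settings live under a `container` key and treats every other key, for example `pip` or `working_dir`, as incompatible with it.

```python
# Hypothetical pre-flight check; not a Ray API. It assumes the container
# settings live under a "container" key in the runtime_env dictionary.

def check_runtime_env(runtime_env):
    """Raise ValueError if `container` is combined with any other field."""
    if "container" in runtime_env:
        extras = sorted(k for k in runtime_env if k != "container")
        if extras:
            raise ValueError(
                "container cannot be combined with: " + ", ".join(extras)
            )
    return runtime_env


# A container-only runtime environment passes through unchanged.
ok_env = check_runtime_env({"container": {"image": "example/image:latest"}})
```

Passing a dictionary that also sets `working_dir` would raise a `ValueError` naming the offending field.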

Future Plans

Looking ahead, Anyscale plans to refine the user experience around combining containers with other runtime environment fields, such as specific environment variables for each application. The company is actively gathering user feedback to determine which fields to include and is planning more scalability testing.

For those interested in exploring this new feature, Anyscale provides a detailed guide to get started with multiple Ray Serve applications and runtime environments as containers.
