Together AI and NVIDIA Join Forces to Enhance Llama 3.1 Models on DGX Cloud

Together AI has announced a strategic collaboration with NVIDIA to enhance the capabilities of Llama 3.1 models for enterprises by leveraging NVIDIA’s DGX Cloud. This partnership aims to empower businesses and developers to utilize openly available models, enabling optimized AI inference on NVIDIA’s advanced infrastructure.

Optimized AI Inference for Enterprises

The collaboration introduces the Together Inference Engine to NVIDIA AI Foundry customers, offering a robust platform for running Llama 3.1 models on the NVIDIA DGX Cloud. According to Together AI, this integration allows enterprises to achieve superior performance, accuracy, and cost-efficiency at production scale.

“Enterprises want to leverage the power of openly available AI models like Llama 3.1, customized to their specific needs,” said Alexis Bjorlin, vice president of DGX Cloud at NVIDIA. “By collaborating with Together AI, we’re introducing the highly optimized Together Inference Engine to DGX Cloud, offering companies efficient and scalable AI inference capabilities.”

Innovative Technology and Benefits

The Together Inference Engine is built on several technological advancements, including FlashAttention-3 kernels, custom-built speculators based on RedPajama, and advanced quantization techniques. These innovations optimize enterprise workloads for NVIDIA Tensor Core GPUs, facilitating the development and deployment of generative AI applications with unmatched efficiency.

With this collaboration, NVIDIA AI Foundry customers can utilize the latest NVIDIA AI architecture, optimized for faster deployment. Enterprises have the flexibility to fine-tune models with proprietary data, ensuring higher accuracy and performance while maintaining data ownership.

Impact on Open-Source AI

This partnership marks a significant milestone for open-source AI with the launch of Llama 3.1 405B, the largest openly available foundation model. It offers comprehensive capabilities in general knowledge, steerability, math, tool use, and multilingual translation, rivaling top closed-source models while providing safety tools for responsible development.

At Together AI, the focus remains on advancing open research and trust between researchers, developers, and enterprises. The company has pioneered methods like FlashAttention-3, Mixture of Agents, Medusa, Sequoia, Hyena, Mamba, and CocktailSGD, driving faster innovation and time-to-market for AI solutions.

Real-World Applications

Enterprises such as Zomato, DuckDuckGo, and the Washington Post are already leveraging Together Inference for their generative AI applications. With the NVIDIA collaboration, businesses with sophisticated workloads can deploy open-source models on DGX Cloud with enhanced performance, scalability, and security.

This partnership is set to accelerate the adoption of open-source AI, providing developers and enterprises with the tools needed to build advanced AI solutions efficiently and effectively.

