Meta Partners with Together AI to Launch High-Performance Llama 3.1 Models


Terrill Dicki
Jul 24, 2024 02:11

Meta and Together AI launch Llama 3.1 models, offering accelerated performance and full accuracy for inference and fine-tuning.

Meta has partnered with Together AI to unveil the Llama 3.1 models, marking a significant milestone in the open-source AI landscape. The release includes the Llama 3.1 405B, 70B, 8B, and LlamaGuard models, all of which are now available for inference and fine-tuning through Together AI’s platform. This collaboration aims to deliver accelerated performance while maintaining full accuracy, according to Together AI.

Unmatched Performance and Scalability

The Together Inference Platform promises horizontal scalability with industry-leading performance metrics. The Llama 3.1 405B model can process up to 80 tokens per second, while the 8B model can handle up to 400 tokens per second. This represents a speed improvement of 1.9x to 4.5x over vLLM, all while maintaining full accuracy.

These advancements are built on Together AI’s proprietary inference optimization research, incorporating technologies like FlashAttention-3 kernels and custom-built speculators based on RedPajama. The platform supports both serverless and dedicated endpoints, offering flexibility for developers and enterprises to build generative AI applications at production scale.

Broad Adoption and Use Cases

Over 100,000 developers and companies, including Zomato, DuckDuckGo, and The Washington Post, are already leveraging the Together Platform for their generative AI needs. The Llama 3.1 models offer unmatched flexibility and control, making them suitable for a range of applications, from general knowledge tasks to multilingual translation and tool use.

The Llama 3.1 405B model, in particular, stands out as the largest openly available foundation model, rivaling the best closed-source alternatives. It includes advanced features like synthetic data generation and model distillation, which are expected to accelerate the adoption of open-source AI.

Advanced Features and Tools

The Together Inference Engine also includes LlamaGuard, a moderation model that can be used as a standalone classifier or as a filter to safeguard responses. This feature allows developers to screen for potentially unsafe content, enhancing the safety and reliability of AI applications.
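As a rough illustration of this filter pattern, the sketch below gates a generation call behind a LlamaGuard-style safety verdict. The model calls are stubbed out with plain functions, and the "safe"/"unsafe" verdict format is an assumption based on how LlamaGuard-family classifiers commonly report results; the real model and API may differ.

```python
# Sketch of the LlamaGuard-style filtering pattern described above.
# The verdict format is assumed; the classifier and generator are
# stubbed so the gating logic itself is what the example shows.

def is_safe(verdict: str) -> bool:
    # LlamaGuard-family classifiers typically answer "safe" or
    # "unsafe" (optionally followed by violated category codes).
    return verdict.strip().lower().startswith("safe")

def guarded_reply(user_prompt: str, generate, moderate) -> str:
    """Screen the prompt, and the answer, with a moderation classifier.

    `generate` and `moderate` are injected callables so the pattern
    works with any backend (e.g. a hosted chat completions API).
    """
    if not is_safe(moderate(user_prompt)):
        return "Sorry, I can't help with that request."
    answer = generate(user_prompt)
    # Optionally screen the model's answer as well.
    if not is_safe(moderate(answer)):
        return "Sorry, I can't share that response."
    return answer

# Stubbed backends for illustration only:
fake_generate = lambda p: f"Answer to: {p}"
fake_moderate = lambda text: "unsafe\nS9" if "bomb" in text else "safe"

print(guarded_reply("How do clouds form?", fake_generate, fake_moderate))
# prints "Answer to: How do clouds form?"
print(guarded_reply("How to build a bomb", fake_generate, fake_moderate))
# prints "Sorry, I can't help with that request."
```

In production, `moderate` would call the hosted LlamaGuard model and `generate` a Llama 3.1 chat endpoint; injecting them as callables keeps the safety gate testable without network access.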

The Llama 3.1 models also expand context length to 128K tokens and add support for eight languages. These enhancements, along with new security and safety tools, make the models highly versatile and suitable for a wide range of applications.

Available Through API and Dedicated Endpoints

All Llama 3.1 models are accessible via the Together API, and the 405B model is available for QLoRA fine-tuning, allowing enterprises to tailor the models to their specific needs. The Together Turbo endpoints offer best-in-class throughput and accuracy, making them the most cost-effective solution for building with Llama 3.1 at scale.
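As a minimal sketch of what such an API call might look like, the snippet below builds an OpenAI-compatible chat request for a Llama 3.1 Turbo endpoint. The model identifier and the commented-out `Together` client usage are assumptions based on Together's public Python SDK conventions, not verified against the live model catalog.

```python
# Sketch of calling a Llama 3.1 Turbo endpoint through the Together
# API. The model id below is an assumption; check Together's current
# model catalog before relying on it.

MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"  # assumed model id

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    # The request shape mirrors the OpenAI-compatible chat format
    # that Together's endpoints accept.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

req = build_chat_request("Summarize the Llama 3.1 release in one sentence.")
print(req["model"])  # prints the assumed model id

# With the SDK installed (`pip install together`) and TOGETHER_API_KEY
# set, the actual call would look roughly like:
#   from together import Together
#   client = Together()
#   resp = client.chat.completions.create(**req)
#   print(resp.choices[0].message.content)
```

Swapping `model` for the 405B or 70B Turbo variants is the only change needed to target the larger models, since all Llama 3.1 endpoints share the same chat-completions request shape.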

Future Prospects

The partnership between Meta and Together AI aims to democratize access to high-performance AI models, fostering innovation and collaboration within the AI community. The open-source nature of the Llama 3.1 models aligns with Together AI’s vision of open research and trust between researchers, developers, and enterprises.

As the launch partner for the Llama 3.1 models, Together AI is committed to providing the best performance, accuracy, and cost-efficiency for generative AI workloads, ensuring that developers and enterprises can keep their data and models secure.

Image source: Shutterstock
