AI21 Labs Unveils Jamba 1.5 LLMs with Hybrid Architecture for Enhanced Reasoning


Jessie A Ellis
Aug 23, 2024 01:33

AI21 Labs introduces Jamba 1.5, a new family of large language models leveraging a hybrid architecture for superior reasoning and long-context handling.


AI21 Labs has introduced the Jamba 1.5 model family, a state-of-the-art collection of large language models (LLMs) engineered to excel in a variety of generative AI tasks, according to the NVIDIA Technical Blog.

Hybrid Architecture Delivers Superior Performance

The Jamba 1.5 family employs a hybrid approach combining Mamba and transformer architectures, complemented by a mixture-of-experts (MoE) module. This architecture excels at managing long contexts with minimal computational overhead while ensuring high accuracy on reasoning tasks. The MoE module increases the model's capacity without escalating computational requirements by utilizing only a subset of the available parameters during token generation.
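The routing idea behind an MoE layer can be sketched in a few lines: a router scores the experts for each token, and only the top-scoring experts' weights are used. The expert count, top-k value, and dimensions below are illustrative assumptions, not Jamba 1.5's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # assumed number of experts, for illustration only
TOP_K = 2         # experts activated per token (assumption)
D_MODEL = 16      # hidden size (assumption)

# Each "expert" is a simple linear layer here; real experts are MLPs.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                  # router score for each expert
    top = np.argsort(logits)[-TOP_K:]    # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS weight matrices are touched for this token,
    # which is why capacity grows without a matching rise in compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top)), top

token = rng.standard_normal(D_MODEL)
out, active = moe_forward(token)
```

The key property is visible in `active`: regardless of how many experts exist, each token pays the compute cost of only `TOP_K` of them.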

Each Jamba block, configured with eight layers and an attention-to-Mamba ratio of 1:7, fits into a single NVIDIA H100 80 GB GPU. The model's architecture balances memory usage and computational efficiency, making it suitable for various enterprise applications.
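The stated configuration pins down the layer mix: eight layers per block at a 1:7 attention-to-Mamba ratio means one attention layer and seven Mamba layers per block. A quick back-of-the-envelope check:

```python
# Layer composition of one Jamba block, per the figures in the article.
LAYERS_PER_BLOCK = 8
ATTN_RATIO, MAMBA_RATIO = 1, 7  # attention-to-Mamba ratio of 1:7

attn_layers = LAYERS_PER_BLOCK * ATTN_RATIO // (ATTN_RATIO + MAMBA_RATIO)
mamba_layers = LAYERS_PER_BLOCK * MAMBA_RATIO // (ATTN_RATIO + MAMBA_RATIO)

print(attn_layers, mamba_layers)  # 1 attention layer, 7 Mamba layers
```

Keeping attention layers rare is what limits the quadratic attention cost and KV-cache growth over long contexts.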

The Jamba 1.5 models also boast an extensive 256K-token context window, enabling the processing of approximately 800 pages of text. This capability improves the accuracy of responses by retaining more relevant information over longer contexts.
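The "256K tokens ≈ 800 pages" claim implies roughly 320 tokens per page, consistent with a few hundred English words per page at typical tokenization rates (the words-per-token figure is an assumption, not from AI21):

```python
# Rough arithmetic behind the context-window-to-pages conversion.
CONTEXT_TOKENS = 256_000
PAGES = 800

tokens_per_page = CONTEXT_TOKENS // PAGES
words_per_page = int(tokens_per_page * 0.75)  # ~0.75 words/token is an assumption

print(tokens_per_page, words_per_page)  # 320 tokens ≈ 240 words per page
```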

Enhancing AI Interactivity with Function Calling and JSON Support

One of the standout features of the Jamba 1.5 models is their robust function-calling capability with JSON data-interchange support. This functionality allows the models to execute complex actions and handle sophisticated queries, enhancing the interactivity and relevance of AI applications.
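The pattern works by having the model emit a JSON object naming a function and its arguments, which the application then dispatches. The sketch below shows that loop with a hypothetical `get_loan_terms` tool; the tool name, its logic, and the model's JSON output are all illustrative assumptions, not AI21's actual wire format.

```python
import json

def get_loan_terms(amount: float, years: int) -> dict:
    """Hypothetical business function the model can request."""
    rate = 0.05  # assumed flat annual rate, for illustration only
    return {"amount": amount, "years": years, "annual_rate": rate}

# Registry mapping callable names to application functions.
TOOLS = {"get_loan_terms": get_loan_terms}

# Pretend this JSON string came back from the model as a function-call request.
model_output = '{"name": "get_loan_terms", "arguments": {"amount": 250000.0, "years": 30}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])  # dispatch the requested call
print(result)
```

In a real deployment, `result` would be serialized back into the conversation so the model can ground its final answer in the function's output.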

For instance, businesses can deploy these models for real-time, high-precision tasks such as generating loan term sheets for financial services or acting as shopping assistants in retail environments.

Maximizing Accuracy with Retrieval-Augmented Generation

The Jamba 1.5 models are optimized for retrieval-augmented generation (RAG), which improves their ability to deliver contextually relevant responses. The 256K-token context window allows for managing large volumes of information without continuous chunking, making it ideal for scenarios requiring comprehensive data analysis.
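The core RAG loop is: score stored documents against the query, then prepend the best matches to the prompt. The sketch below uses a trivial keyword-overlap retriever on made-up documents; real deployments would use embeddings and a vector store, so this is purely illustrative.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs sharing the most words with the query (bag-of-words)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

# Toy knowledge base, invented for this example.
docs = [
    "Loan applications require proof of income and a credit check.",
    "Store hours are 9am to 5pm on weekdays.",
    "Interest rates on loans vary with the applicant's credit score.",
]

question = "what affects loan interest rates"
context = retrieve(question, docs)

# Retrieved passages are prepended so the model answers from them.
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
print(prompt)
```

A long context window helps here precisely because more retrieved passages fit into `prompt` without aggressive chunking or re-ranking.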

RAG is particularly beneficial in environments with extensive and scattered knowledge bases, enabling the models to retrieve and provide more relevant information efficiently.

Get Started

The Jamba 1.5 models are now available on the NVIDIA API catalog, joining over 100 popular AI models supported by NVIDIA NIM microservices. These microservices simplify the deployment of performance-optimized models for various enterprise applications.
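NIM microservices expose an OpenAI-compatible chat-completions API, so calling a catalog model reduces to an authenticated HTTP POST. The sketch below only constructs the request (nothing is sent), and the base URL and model identifier are assumptions; check the model card on ai.nvidia.com for the exact values.

```python
import json

BASE_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed endpoint
MODEL_ID = "ai21labs/jamba-1.5-large"                              # assumed identifier

def build_request(prompt: str, api_key: str) -> tuple[dict, str]:
    """Assemble headers and a JSON body for an OpenAI-style chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    })
    return headers, body

headers, body = build_request("Summarize this term sheet.", "YOUR-NVAPI-KEY")
print(body)
```

Sending `body` to `BASE_URL` with any HTTP client (e.g. `requests.post`) would return a standard chat-completion response once a valid API key is supplied.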

NVIDIA collaborates with leading model builders to support a wide range of models, including Llama 3.1 405B, Mixtral 8x22B, Phi-3, and Nemotron 340B Reward. For more information and to explore these models, visit ai.nvidia.com.

Image source: Shutterstock
