NVIDIA Introduces NeMo Retriever for Enhanced RAG Pipelines


Jessie A Ellis
Jul 24, 2024 02:55

NVIDIA unveils NeMo Retriever NIMs, enhancing text retrieval for RAG pipelines with new embedding and reranking models, promising better efficiency and accuracy.


Enterprises are increasingly looking to leverage their vast data reserves to improve operational efficiency, reduce costs, and boost productivity. NVIDIA’s latest offering, the NeMo Retriever, aims to facilitate this by enabling developers to build and deploy advanced retrieval-augmented generation (RAG) pipelines. According to the NVIDIA Technical Blog, the NeMo Retriever collection introduces four new community-based NeMo Retriever NIMs designed for text embedding and reranking.

New Models for Enhanced Text Retrieval

NVIDIA has announced the release of three NeMo Retriever Embedding NIMs and one NeMo Retriever Reranking NIM. These models are:


  • NV-EmbedQA-E5-v5: Optimized for text question-answering retrieval.

  • NV-EmbedQA-Mistral7B-v2: A multilingual model fine-tuned for text embedding and accurate question answering.

  • Snowflake-Arctic-Embed-L: An optimized model for text embedding.

  • NV-RerankQA-Mistral-4B-v3: Fine-tuned for text reranking and accurate question answering.

Understanding the Retrieval Pipeline

The retrieval pipeline employs embedding models to generate vector representations of text for semantic encoding, which are stored in a vector database. When a user queries the database, the question is encoded into a vector and matched against the stored vectors to retrieve relevant information. Reranking models then score the relevance of the retrieved text chunks, ensuring the most accurate information is presented.
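The pipeline described above can be sketched with a toy in-memory vector store and cosine similarity. The bag-of-words "embedding" here is purely a stand-in for a real embedding NIM such as NV-EmbedQA-E5-v5, which would return dense semantic vectors instead:

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. A real pipeline would call an
# embedding model (e.g. NV-EmbedQA-E5-v5) to get dense semantic vectors.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": documents stored alongside their embeddings.
docs = [
    "NeMo Retriever provides embedding and reranking models",
    "Helm charts simplify Kubernetes deployments",
    "Reranking models score the relevance of retrieved chunks",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 2):
    # Encode the question into a vector and match against stored vectors.
    qv = embed(query)
    scored = sorted(index, key=lambda it: cosine(qv, it[1]), reverse=True)
    return [d for d, _ in scored[:k]]

print(retrieve("which models score the relevance of chunks"))
```

In production the top-k chunks returned here would then be passed to a reranking model for a second, more accurate relevance scoring.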

Embedding models offer speed and cost-efficiency, while reranking models provide higher accuracy. By combining these models, enterprises can achieve a balance between performance and cost, using embedding models to identify relevant chunks and reranking models to refine the results.
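The balance described above is commonly realized as a two-stage retrieval: recall a broad candidate set cheaply with embedding similarity, then apply the costlier but more accurate reranker only to those candidates. A minimal sketch, where both scoring functions are placeholders for the embedding and reranking NIMs:

```python
from typing import Callable, List

def two_stage_retrieve(
    query: str,
    docs: List[str],
    embed_score: Callable[[str, str], float],   # cheap, approximate
    rerank_score: Callable[[str, str], float],  # expensive, accurate
    recall_k: int = 10,
    final_k: int = 3,
) -> List[str]:
    # Stage 1: cheap embedding similarity over the whole corpus.
    candidates = sorted(
        docs, key=lambda d: embed_score(query, d), reverse=True
    )[:recall_k]
    # Stage 2: accurate reranking over the small candidate set only.
    return sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )[:final_k]

# Toy scorers standing in for the embedding and reranking models.
embed_score = lambda q, d: len(set(q.split()) & set(d.split()))
rerank_score = lambda q, d: float(q in d)  # reward an exact phrase match

docs = ["apple pie recipe", "apple tree care", "banana bread", "pie crust tips"]
print(two_stage_retrieve("apple pie", docs, embed_score, rerank_score,
                         recall_k=3, final_k=1))
```

Because the expensive scorer only ever sees `recall_k` documents, its cost stays fixed as the corpus grows, which is the point of the hybrid design.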

NeMo Retriever NIMs: Cost and Stability

Cost

NeMo Retriever NIMs are designed to reduce time-to-market and operational costs. These containerized solutions, equipped with industry-standard APIs and Helm charts, facilitate easy and scalable model deployment. Utilizing the NVIDIA AI Enterprise software suite, NIMs maximize model inference efficiency, thereby lowering deployment costs.
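Because NIMs expose industry-standard (OpenAI-style) HTTP APIs, calling a deployed embedding NIM is a plain REST request. A hedged sketch, assuming a NIM serving NV-EmbedQA-E5-v5 at a local port 8000; the exact endpoint path, model identifier, and NIM-specific parameters such as `input_type` may differ by container version, so check the container's documentation:

```python
import json
import urllib.request

NIM_URL = "http://localhost:8000/v1/embeddings"  # assumed local NIM endpoint

def build_request(texts, model="nvidia/nv-embedqa-e5-v5"):
    """Build an OpenAI-style embeddings request payload."""
    payload = {
        "input": texts,
        "model": model,
        "input_type": "query",  # assumed NIM extension for asymmetric models
    }
    return json.dumps(payload).encode("utf-8")

def embed_texts(texts):
    req = urllib.request.Request(
        NIM_URL,
        data=build_request(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: one embedding vector per input string.
    return [item["embedding"] for item in body["data"]]

# embed_texts(["What is NeMo Retriever?"])  # requires a running NIM container
```

The same payload shape works against a Helm-deployed NIM in a cluster; only the URL changes, which is what makes the prototype-to-production path straightforward.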

Stability

The NIMs are part of the NVIDIA AI Enterprise license, which ensures API stability, security patches, quality assurance, and support, providing a seamless transition from prototype to production for AI-driven enterprises.

Selecting NIMs for Your Pipeline

When designing a retrieval pipeline, developers need to balance accuracy, latency, data ingestion throughput, and production throughput. NVIDIA offers guidelines for selecting the appropriate NIMs based on these factors:


  • Maximize throughput and minimize latency: Use NV-EmbedQA-E5-v5 for optimized, lightweight embedding model inference.

  • Optimize for low-volume, low-velocity databases: Use NV-EmbedQA-Mistral7B-v2 for both ingestion and production to balance throughput and accuracy with low latency.

  • Optimize for high-volume, high-velocity data: Combine NV-EmbedQA-E5-v5 for document ingestion with NV-RerankQA-Mistral-4B-v3 for reranking to enhance retrieval accuracy.
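The guidance above can be captured in a small selection helper. This is purely illustrative; the mapping mirrors the bullets in this article, not an official NVIDIA API:

```python
def select_nims(volume: str, velocity: str) -> dict:
    """Map a data profile ("low"/"high" volume and velocity) to the
    NeMo Retriever NIMs recommended in the guidance above."""
    if volume == "high" and velocity == "high":
        # Cheap embedding for fast ingestion; reranker recovers accuracy.
        return {"embedding": "NV-EmbedQA-E5-v5",
                "reranking": "NV-RerankQA-Mistral-4B-v3"}
    if volume == "low" and velocity == "low":
        # Larger embedding model balances throughput and accuracy.
        return {"embedding": "NV-EmbedQA-Mistral7B-v2", "reranking": None}
    # Default: maximize throughput and minimize latency.
    return {"embedding": "NV-EmbedQA-E5-v5", "reranking": None}

print(select_nims("high", "high"))
```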

Performance benchmarks such as NQ, HotpotQA, FiQA, and TechQA show that NeMo Retriever NIMs achieve significant improvements in embedding and reranking performance, making them suitable for various enterprise retrieval use cases.

Getting Started

Developers can explore the NVIDIA NeMo Retriever NIMs in the API catalog and access NVIDIA’s generative AI examples on GitHub. NVIDIA also offers labs to try the AI Chatbot with RAG workflow through NVIDIA LaunchPad, allowing for customization and deployment of NIMs across various data environments.

Image source: Shutterstock
