Optimizing IVF-PQ Performance with RAPIDS cuVS: Key Tuning Techniques


Tony Kim

Jul 18, 2024 19:39

Learn how to optimize the IVF-PQ algorithm for vector search performance using RAPIDS cuVS, with practical tips on tuning hyper-parameters and improving recall.


The first part of this series presented an overview of the IVF-PQ algorithm, explaining how it builds on the IVF-Flat algorithm and uses Product Quantization (PQ) to compress the index and support larger datasets. Part two shifts the focus to the practical aspects of tuning IVF-PQ performance, which is crucial for achieving good results, especially on billion-scale datasets.

Tuning Parameters for Index Building

IVF-PQ shares some parameters with IVF-Flat, such as the coarse-level indexing and search hyper-parameters, but it introduces additional parameters that control compression. One critical parameter is n_lists, which determines the number of partitions (inverted lists) into which the input dataset is clustered. Performance is influenced by both the number of lists probed and their sizes. Experiments suggest that n_lists values in the range of 10K to 50K yield good performance across recall levels, though the best setting can vary by dataset.
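As a reference point, the snippet below is a minimal sketch of building an IVF-PQ index with the cuVS Python API (cuvs.neighbors.ivf_pq); the dataset shape and the n_lists value are illustrative, not recommendations.

    import cupy as cp
    from cuvs.neighbors import ivf_pq

    # Illustrative dataset: 1M random 96-dimensional float32 vectors.
    dataset = cp.random.random_sample((1_000_000, 96), dtype=cp.float32)

    # n_lists sets how many inverted lists (clusters) the dataset is split into;
    # the 10K-50K range mentioned above is a reasonable starting point here.
    index_params = ivf_pq.IndexParams(n_lists=10_000, metric="sqeuclidean")
    index = ivf_pq.build(index_params, dataset)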

Another crucial parameter is pq_dim, which controls the compression ratio. A good tuning technique is to start at one quarter of the dataset's feature count and increase in steps, as sketched below. Figure 2 in the original blog post illustrates significant drops in QPS, which can be attributed to factors such as increased compute work and higher shared memory requirements per CUDA block.
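A hypothetical sweep following that technique, reusing the dataset and imports from the sketch above (benchmarking code is omitted):

    # Start pq_dim at ~1/4 of the feature count and grow in steps, rebuilding
    # the index each time; record recall and QPS at each setting to pick the knee.
    n_dim = dataset.shape[1]                      # 96 in the sketch above
    step = max(1, n_dim // 4)
    for pq_dim in range(step, n_dim + 1, step):   # 24, 48, 72, 96
        params = ivf_pq.IndexParams(n_lists=10_000, pq_dim=pq_dim)
        index = ivf_pq.build(params, dataset)
        # ... run a benchmark query set here and measure recall and QPS ...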

The pq_bits parameter, ranging from 4 to 8, controls the number of bits used in each individual PQ code, which affects the codebook size and recall. Reducing pq_bits can improve search speed by allowing the look-up table (LUT) to fit in shared memory, although this comes at the cost of recall.
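A rough back-of-envelope model shows why pq_bits matters for shared memory. This assumes one LUT entry per (subspace, code) pair, which is the standard PQ layout; the exact layout cuVS uses may differ.

    # Approximate LUT footprint: one distance entry per (subspace, PQ code) pair.
    # Shared memory per CUDA block is on the order of tens to ~200 KB, so a few
    # bits fewer per code can make the difference between fitting and spilling.
    def lut_bytes(pq_dim: int, pq_bits: int, bytes_per_entry: int = 4) -> int:
        return pq_dim * (2 ** pq_bits) * bytes_per_entry

    print(lut_bytes(96, 8))  # 98304 bytes (~96 KB) with float32 entries
    print(lut_bytes(96, 5))  # 12288 bytes (~12 KB), far easier to fit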

Additional Parameters

The codebook_kind parameter determines how the codebooks for the second-level quantizer are constructed: either one per subspace or one per cluster. The choice between these options can impact training time, GPU shared memory utilization, and recall. Parameters such as kmeans_n_iters and kmeans_trainset_fraction are also important, though they rarely need adjustment; the sketch below shows where they live.
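The snippet below sketches where these knobs sit in the cuVS Python API; the values shown are illustrative, not tuned recommendations.

    # Second-level quantizer and k-means training knobs in IndexParams
    # (values are illustrative):
    params = ivf_pq.IndexParams(
        n_lists=10_000,
        codebook_kind="subspace",      # or "cluster": one codebook per IVF list
        kmeans_n_iters=20,             # k-means training iterations
        kmeans_trainset_fraction=0.5,  # fraction of the dataset used to train
    )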

Tuning Parameters for Search

The n_probes parameter, discussed in the previous blog post on IVF-Flat, is essential for balancing search accuracy and throughput. IVF-PQ provides additional parameters, internal_distance_dtype and lut_dtype, which control the representation of distances or similarities during the search and the datatype used to store the LUT, respectively. Adjusting these parameters can significantly impact performance, especially for datasets with large dimensionality.
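A sketch of those search-side knobs, assuming the cuVS Python API and reusing the index built above; the dtype choices trade accuracy for speed and LUT footprint.

    import numpy as np

    # Illustrative query batch, same dtype and dimensionality as the dataset.
    queries = cp.random.random_sample((10_000, 96), dtype=cp.float32)

    search_params = ivf_pq.SearchParams(
        n_probes=50,                         # inverted lists scanned per query
        internal_distance_dtype=np.float16,  # precision of distance computation
        lut_dtype=np.float16,                # LUT storage; smaller fits shared mem
    )
    distances, neighbors = ivf_pq.search(search_params, index, queries, k=10)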

Improving Recall with Refinement

When tuning parameters is not enough to achieve the desired recall, refinement offers a promising alternative. This separate operation, performed after the ANN search, recomputes exact distances for the selected candidates and re-ranks them. Refinement can significantly improve recall, as demonstrated in Figure 4 of the original blog post, though it requires access to the source dataset.
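A minimal sketch, assuming the cuvs.neighbors.refine function and reusing the objects from the earlier snippets: oversample candidates with the ANN search, then re-rank them with exact distances.

    from cuvs.neighbors import refine

    # Ask the ANN search for 2x the neighbors actually needed, then let the
    # refinement step recompute exact distances and keep the best k of them.
    k, refine_ratio = 10, 2
    _, candidates = ivf_pq.search(search_params, index, queries, k * refine_ratio)
    distances, neighbors = refine(dataset, queries, candidates, k=k)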

Summary

This series on accelerating vector search with inverted-file indexes covers two cuVS algorithms: IVF-Flat and IVF-PQ. IVF-PQ extends IVF-Flat with PQ compression, enabling faster searches and the ability to handle billion-scale datasets within limited GPU memory. By fine-tuning the parameters for index building and search, data practitioners can achieve the best results efficiently. The RAPIDS cuVS library offers a range of vector search algorithms to cater to various use cases, from exact searches to low-accuracy, high-QPS ANN methods.

For practical tuning of IVF-PQ parameters, refer to the IVF-PQ notebook on GitHub. For more details on the provided APIs, see the cuVS documentation.

