NVIDIA’s RAPIDS cuDF Enhances pandas Performance by 30x on Large Datasets


Felix Pinkston
Aug 10, 2024 02:42

NVIDIA releases RAPIDS cuDF with unified memory, boosting pandas performance by up to 30x on large and text-heavy datasets.


NVIDIA has unveiled new features in RAPIDS cuDF, significantly improving the performance of the pandas library when handling large and text-heavy datasets. According to the NVIDIA Technical Blog, the enhancements enable data scientists to accelerate their workloads by up to 30x.

RAPIDS cuDF and pandas

RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries, and cuDF is its Python GPU DataFrame library designed for data loading, joining, aggregating, and filtering. pandas, a widely used data analysis and manipulation library for Python, has struggled with processing speed and efficiency as dataset sizes grow, particularly on CPU-only systems.

At GTC 2024, NVIDIA announced that RAPIDS cuDF could accelerate pandas nearly 150x without requiring code changes. Google later revealed that RAPIDS cuDF is available by default on Google Colab, making it more accessible to data scientists.
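
In practice, the zero-code-change path works by loading the cudf.pandas accelerator before pandas is imported; existing pandas code then runs on the GPU where supported and falls back to the CPU otherwise. The sketch below illustrates the pattern; the file name and column names are placeholders, not part of NVIDIA's announcement.

```python
# Enable the cuDF accelerator before importing pandas.
# In a notebook the equivalent is:  %load_ext cudf.pandas
# From the command line:            python -m cudf.pandas script.py
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # now GPU-accelerated where cuDF supports the operation

# Illustrative workload; "transactions.csv" and its columns are placeholders.
df = pd.read_csv("transactions.csv")
top_customers = (
    df.groupby("customer_id")["amount"]
      .sum()
      .sort_values(ascending=False)
      .head(10)
)
print(top_customers)
```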

Tackling Limitations

User feedback on the initial release of cuDF highlighted several limitations, particularly with the size and type of datasets that could benefit from acceleration:

  • To maximize acceleration, datasets needed to fit within GPU memory, limiting the data size and complexity of operations that could be performed.
  • Text-heavy datasets faced constraints, with the original cuDF release supporting only up to 2.1 billion characters in a column.

To address these issues, the latest release of RAPIDS cuDF includes:

  • Optimized CUDA unified memory, enabling speedups of up to 30x on larger datasets and more complex workloads.
  • Expanded string support, from 2.1 billion characters in a column to 2.1 billion rows of tabular text data.

Accelerated Data Processing with Unified Memory

cuDF relies on CPU fallback to ensure a seamless experience. When memory requirements exceed GPU capacity, cuDF transfers data into CPU memory and uses pandas for processing. However, to avoid frequent CPU fallback, datasets should ideally fit within GPU memory.

With CUDA unified memory, cuDF can now scale pandas workloads beyond GPU memory. Unified memory provides a single address space spanning CPUs and GPUs, enabling virtual memory allocations larger than available GPU memory and migrating data as needed. This helps maximize performance, although datasets should still be sized to fit in GPU memory for peak acceleration.
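
When using cuDF directly (rather than through the pandas accelerator), one way to opt into managed, i.e. unified, memory is to configure RMM, RAPIDS' memory manager, before allocating data. The snippet below is a minimal sketch of that configuration; the DataFrame contents are illustrative only.

```python
import numpy as np
import rmm
import cudf

# Back cuDF allocations with CUDA managed (unified) memory so they can
# exceed physical GPU memory and migrate between host and device on demand.
# A pool allocator reduces per-allocation overhead.
rmm.reinitialize(managed_memory=True, pool_allocator=True)

# Subsequent cuDF allocations draw from the managed pool.
gdf = cudf.DataFrame({"x": np.arange(10_000_000)})
print(gdf["x"].sum())
```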

Benchmarks show that using cuDF for data joins on a 10 GB dataset with a GPU that has 16 GB of memory can achieve up to 30x speedups compared to CPU-only pandas. This is a significant improvement, especially for processing datasets larger than 4 GB, which previously faced performance issues due to GPU memory constraints.
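
The join pattern behind that benchmark is ordinary pandas code; with the cuDF accelerator loaded, the merge executes on the GPU, and unified memory handles tables that exceed device memory. A minimal sketch, with placeholder file names and join key rather than the benchmark's actual data:

```python
# Run with the accelerator, e.g.:  python -m cudf.pandas join_example.py
import pandas as pd

# Placeholder inputs; the benchmark in the article joins a ~10 GB dataset
# on a GPU with 16 GB of memory, but any two tables with a shared key
# follow the same pattern.
orders = pd.read_parquet("orders.parquet")
customers = pd.read_parquet("customers.parquet")

joined = orders.merge(customers, on="customer_id", how="inner")
print(len(joined))
```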

Processing Tabular Text Data at Scale

The original cuDF release's limit of 2.1 billion characters per column posed challenges for large datasets. With the new release, cuDF can now handle up to 2.1 billion rows of tabular text data, making pandas a viable tool for data preparation in generative AI pipelines.

These improvements make pandas code execution much faster, especially for text-heavy datasets like product reviews, customer service logs, and datasets with substantial location or user ID data.
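
As a rough illustration of that kind of workload, the sketch below applies standard pandas string operations to a review column; with the accelerator active, these run on the GPU. The file and column names are assumptions made for the example, not data referenced by NVIDIA.

```python
# Assumes the cuDF accelerator is active, e.g. via `%load_ext cudf.pandas`
# in a notebook or `python -m cudf.pandas script.py` on the command line.
import pandas as pd

# "reviews.parquet" and its columns are illustrative placeholders.
reviews = pd.read_parquet("reviews.parquet")

# Typical text-preparation steps: normalize case, trim whitespace,
# flag reviews mentioning a keyword, and measure review length.
reviews["text"] = reviews["text"].str.lower().str.strip()
flagged = reviews[reviews["text"].str.contains("refund", na=False)]
reviews["n_chars"] = reviews["text"].str.len()

print(len(flagged), reviews["n_chars"].mean())
```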

Get Started

All these features are available with RAPIDS 24.08, which can be downloaded from the RAPIDS Installation Guide. Note that the unified memory feature is only supported on Linux-based systems.

Image source: Shutterstock
