Polars Launches GPU Engine with RAPIDS cuDF for Enhanced Data Processing


Jessie A Ellis
Sep 17, 2024 15:38

Polars releases GPU engine powered by RAPIDS cuDF, boosting data processing speeds up to 13x on NVIDIA GPUs. Now available in open beta.

Polars has announced the release of its new GPU engine, powered by RAPIDS cuDF, which significantly enhances data processing speeds on NVIDIA GPUs. This advancement allows data scientists to process hundreds of millions of rows of data in seconds on a single machine, according to the NVIDIA Technical Blog.

Growing Data Challenges

Traditional data processing libraries such as pandas are single-threaded and often become impractical when handling datasets beyond a few million rows. While distributed data processing systems can manage billions of rows, they introduce complexity and overhead for smaller datasets. This presents a gap in tools that can efficiently process tens of millions to a few hundred million rows of data, a common need in industries such as finance, retail, and manufacturing for tasks like model development, demand forecasting, and logistics.

Polars, a rapidly growing Python library designed for data scientists and engineers, aims to address these challenges. It employs advanced query optimizations to minimize unnecessary data movement and processing, enabling smooth handling of hundreds of millions of rows on a single machine. Polars offers an appealing solution for medium-scale data processing, bridging the gap between single-threaded tools and complex distributed systems.

Bringing NVIDIA Accelerated Computing to Polars

Polars leverages multi-threaded execution, advanced memory optimizations, and lazy evaluation to deliver significant out-of-the-box acceleration compared to other CPU-only data manipulation tools. However, as data processing demands grow across various industries, higher performance is required. This is where accelerated computing becomes essential.

cuDF, part of the NVIDIA RAPIDS suite of CUDA-X libraries, is a GPU-accelerated DataFrame library that harnesses the massive parallelism of GPUs to significantly enhance data processing performance. By partnering with NVIDIA, the Polars team has integrated the speed of cuDF with Polars’ efficiency, resulting in performance boosts of up to 13x compared to CPU-based Polars. This integration allows users to maintain an interactive experience even as their data processing workloads scale to hundreds of millions or billions of rows.

The Polars GPU engine is built directly into the Polars Lazy API. Users can access GPU acceleration for their workflows by installing polars[gpu] via pip and passing engine="gpu" to the collect operation. This approach retains efficient execution and minimal memory usage through Polars’ query optimizer, keeps full compatibility with Polars’ ecosystem of data visualization, I/O, and machine learning libraries, and requires no other changes to existing Polars code.

    # Shell: install Polars with GPU support
    pip install polars[gpu] --extra-index-url=https://pypi.nvidia.com

    # Python: `transactions` is assumed to be an existing pl.LazyFrame
    import polars as pl

    (transactions
     .group_by("CUST_ID")
     .agg(pl.col("AMOUNT").sum())
     .sort(by="AMOUNT", descending=True)
     .head()
     .collect(engine="gpu"))

Conclusion

The Polars GPU engine powered by RAPIDS cuDF is now available in open beta, offering data scientists and engineers a powerful tool for medium-scale data processing. By accelerating Polars workflows up to 13x on NVIDIA GPUs, the engine efficiently handles datasets of hundreds of millions of rows without the overhead of distributed systems. The Polars GPU engine is seamlessly integrated into the Polars API, making it easily accessible to all users.

Getting Started with the Polars GPU Engine

For more information and to get started with the Polars GPU engine, visit the official NVIDIA Technical Blog.
