NVIDIA Achieves Record Performance in Latest MLPerf Training Benchmarks


The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks, according to the NVIDIA Blog.

Unprecedented Performance in Large Language Models

NVIDIA more than tripled its performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to its previous record-setting submission. This feat was achieved using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, a significant increase from the 3,584 H100 GPUs used last year. This scalability showcases the extensive full-stack engineering efforts by NVIDIA.

The scalability of the NVIDIA AI platform enables faster training of massive AI models like GPT-3 175B, translating into significant business opportunities. For instance, NVIDIA's recent earnings call highlighted that LLM service providers could turn a single dollar invested into seven dollars over four years by running the Llama 3 70B model on NVIDIA HGX H200 servers.
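As a back-of-the-envelope illustration, that claim reduces to a simple revenue-per-dollar calculation. Every input below (server cost, throughput, utilization, token pricing) is a hypothetical placeholder chosen for illustration, not a figure from NVIDIA's earnings call:

```python
# Illustrative revenue-multiple sketch for the "$1 in, $7 out" claim.
# All inputs are assumptions for illustration only.
SECONDS_PER_YEAR = 365 * 24 * 3600

def revenue_multiple(server_cost_usd: float, tokens_per_sec: float,
                     usd_per_million_tokens: float,
                     years: float = 4.0, utilization: float = 0.7) -> float:
    """Revenue earned per dollar invested over the server's service life."""
    tokens_served = tokens_per_sec * utilization * years * SECONDS_PER_YEAR
    revenue = tokens_served / 1e6 * usd_per_million_tokens
    return revenue / server_cost_usd

# A hypothetical HGX H200 node serving Llama 3 70B:
print(round(revenue_multiple(server_cost_usd=300_000, tokens_per_sec=30_000,
                             usd_per_million_tokens=0.80), 1))  # ~7.1
```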

NVIDIA H200 GPU: Pushing Boundaries

The NVIDIA H200 Tensor Core GPU, built on the Hopper architecture, offers 141GB of HBM3e memory and over 40% more memory bandwidth than the H100 GPU. In its MLPerf Training debut, the H200 extended the H100's performance by up to 47%, pushing the boundaries of AI training capabilities.
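Those generational gains line up with the published datasheet figures for the two parts (SXM variants; worth verifying against NVIDIA's current spec sheets):

```python
# Approximate datasheet peak specs (SXM variants):
# H100 ~3.35 TB/s HBM3 / 80GB, H200 ~4.8 TB/s HBM3e / 141GB.
h100_bw, h200_bw = 3.35, 4.8     # memory bandwidth, TB/s
h100_mem, h200_mem = 80, 141     # memory capacity, GB

print(f"bandwidth: +{h200_bw / h100_bw - 1:.0%}")   # ~+43%, i.e. "over 40%"
print(f"capacity:  +{h200_mem / h100_mem - 1:.0%}")  # ~+76%
```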

Software Optimizations Drive Performance Gains

NVIDIA also reported a 27% performance boost in its 512 H100 GPU configuration compared to the previous year, thanks to numerous software stack optimizations. This improvement underscores the impact of continuous software enhancements on performance, even with existing hardware.

The submission highlighted nearly perfect scaling, with performance increasing proportionally as the number of GPUs rose from 3,584 to 11,616.
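The article's own figures make that efficiency easy to estimate: GPU count grew about 3.24x while performance "more than tripled". Assuming a roughly 3.2x speedup (an assumption consistent with, but not stated in, the text):

```python
# Scaling efficiency implied by the figures above.
gpus_prev, gpus_now = 3_584, 11_616
speedup = 3.2                     # assumed, per "more than tripled"
scale = gpus_now / gpus_prev      # ~3.24x more GPUs
efficiency = speedup / scale      # ~0.99 -> near-linear scaling
print(f"{scale:.2f}x GPUs, {efficiency:.0%} scaling efficiency")
```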

Excellence in LLM Fine-Tuning

LLM fine-tuning, a critical workload for enterprises customizing pretrained large language models, was also a highlight. NVIDIA excelled in this area, scaling from eight to 1,024 GPUs and completing the benchmark in a record 1.5 minutes.
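For context, this MLPerf benchmark applies low-rank adaptation (LoRA), a parameter-efficient fine-tuning technique, to a pretrained Llama 2 70B model. A minimal PyTorch sketch of the LoRA idea (illustrative only, unrelated to NVIDIA's actual submission code):

```python
# Minimal LoRA sketch: freeze the pretrained weight W and learn a
# low-rank update, so the effective weight is W + (alpha / r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # pretrained path plus the learned low-rank correction
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))  # only lora_a / lora_b receive gradients
```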

Accelerating Stable Diffusion and GNN Training

NVIDIA achieved up to an 80% increase in Stable Diffusion v2 training performance at the same system scales as the previous round. Additionally, the H200 GPU delivered a 47% boost in single-node graph neural network (GNN) training compared to the H100, demonstrating the powerful performance and efficiency of NVIDIA GPUs for various AI applications.

Broad Ecosystem Support

The breadth of the NVIDIA AI ecosystem was evident, with 10 partners, including ASUS, Dell Technologies, and Lenovo, submitting their own impressive benchmark results. This widespread participation underscores the industry's trust in NVIDIA's AI platform.

MLCommons continues to play a vital role in AI computing by enabling peer-reviewed comparisons of AI and HPC platforms. This is crucial for guiding important purchasing decisions in a rapidly evolving field.

Looking ahead, the NVIDIA Blackwell platform promises next-level AI performance for trillion-parameter generative AI models, both in training and inference.



Image source: Shutterstock
