NVIDIA’s Blackwell Platform Breaks New Records in MLPerf Inference v4.1


Joerg Hiller

Aug 29, 2024 07:18

NVIDIA’s Blackwell architecture sets new benchmarks in MLPerf Inference v4.1, showcasing significant performance improvements in LLM inference.


NVIDIA’s new Blackwell architecture has set unprecedented benchmarks in the latest MLPerf Inference v4.1, according to the NVIDIA Technical Blog. The platform, introduced at NVIDIA GTC 2024, features a superchip built from 208 billion transistors and manufactured on a TSMC 4NP process tailored for NVIDIA, making it the largest GPU ever built.

NVIDIA Blackwell Shines in MLPerf Inference Debut

In its inaugural round of MLPerf Inference submissions, NVIDIA’s Blackwell architecture delivered remarkable results on the Llama 2 70B LLM benchmark, achieving up to 4x higher tokens per second per GPU compared with the previous-generation H100 GPU. This performance leap was enabled by the new second-generation Transformer Engine, which leverages Blackwell Tensor Core technology and TensorRT-LLM innovations.
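FP4 is a 4-bit floating-point format; the article does not detail how the Transformer Engine applies it, but the basic mechanics of block-scaled low-precision quantization can be illustrated. The sketch below rounds weights onto the FP4 (E2M1) value grid with one scale per block; it is a simplified illustration of the general technique, not NVIDIA’s implementation.

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quant_dequant(weights, block_size=32):
    """Round each block of weights onto the FP4 grid with one scale per block.

    Returns the dequantized weights, i.e. the values a matmul would
    effectively see after the quantize/dequantize round trip.
    Assumes weights.size is divisible by block_size.
    """
    flat = weights.reshape(-1, block_size)
    # One scale per block: map the block's max magnitude to FP4's max (6.0).
    scales = np.abs(flat).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0                       # guard all-zero blocks
    scaled = flat / scales
    # Round-to-nearest in magnitude onto the grid, then restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * FP4_GRID[idx]
    return (quantized * scales).reshape(weights.shape)

w = np.random.randn(4, 64).astype(np.float32)
print("max abs error:", np.abs(w - fp4_quant_dequant(w)).max())
```

Per-block scaling is what makes a format with only eight magnitudes usable: each block’s dynamic range is mapped onto the grid independently, so outliers in one block do not crush the precision of another.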

According to the MLPerf results, Blackwell’s FP4 Transformer Engine executed approximately 50% of the workload in FP4, reaching a delivered math throughput of 5.2 petaflops. The Blackwell-based submissions were made in the closed division, meaning the models ran unmodified while still meeting the benchmark’s high accuracy standards.
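To see how running part of the workload in FP4 shapes delivered throughput, a back-of-the-envelope model helps: if a fraction of the FLOPs runs at the FP4 rate and the remainder at a slower precision, total time is the sum of the per-precision times. The peak rates below are hypothetical placeholders, not official Blackwell figures.

```python
# Back-of-the-envelope: delivered rate when a fraction of FLOPs runs in FP4.
# Total time is the sum of per-precision times, so the delivered rate is a
# work-weighted harmonic mean of the two peak rates.
def delivered_petaflops(frac_fp4, peak_fp4, peak_other):
    time_per_flop = frac_fp4 / peak_fp4 + (1.0 - frac_fp4) / peak_other
    return 1.0 / time_per_flop

# Hypothetical peak rates in petaflops (placeholders, not official specs):
# with ~50% of the work in FP4, the delivered figure sits between the peaks.
print(delivered_petaflops(0.5, peak_fp4=9.0, peak_other=4.5))  # -> 6.0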

NVIDIA H200 Tensor Core GPU’s Outstanding Performance

The NVIDIA H200 GPU, an upgrade to the Hopper architecture, also delivered exceptional results across all benchmarks. Equipped with HBM3e memory, the H200 brings significant increases in memory capacity and bandwidth, which benefit memory-sensitive applications.
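Memory bandwidth matters for LLM inference because token-by-token decoding must stream essentially all of the model weights from memory for every generated token, which makes single-stream generation largely bandwidth-bound. A rough roofline-style upper bound, with illustrative numbers (the ~4.8 TB/s value approximates the H200’s HBM3e bandwidth):

```python
# Roofline-style upper bound for decode throughput of a memory-bound LLM:
# every generated token must read each weight byte from HBM at least once.
def max_tokens_per_second(bandwidth_tb_s, params_billion, bytes_per_param):
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Illustrative: a 70B-parameter model at 1 byte/param (FP8) on ~4.8 TB/s
# of memory bandwidth (approximate H200 HBM3e figure).
print(max_tokens_per_second(4.8, 70, 1.0))  # ~68.6 tokens/s, single-stream bound
```

This ignores the KV cache and assumes perfect bandwidth utilization, but it shows why adding memory bandwidth translates so directly into LLM serving performance.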

For example, the H200 achieved notable gains on the Llama 2 70B benchmark, improving 14% over the previous round purely through software enhancements in TensorRT-LLM. Additionally, the H200’s performance rose a further 12% when its thermal design power (TDP) was increased to 1,000 watts.

Jetson AGX Orin’s Giant Leap in Edge AI

NVIDIA’s Jetson AGX Orin demonstrated impressive performance improvements in generative AI at the edge, achieving up to 6.2x more throughput and 2.4x better latency on the GPT-J 6B-parameter LLM benchmark. This was made possible through numerous software optimizations, including the use of INT4 Activation-aware Weight Quantization (AWQ) and in-flight batching.
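AWQ rests on the observation that a small fraction of weight channels matters disproportionately because the activations feeding them are large; scaling those channels up before rounding to INT4, and folding the inverse scale into the activations, preserves accuracy where it counts. The sketch below is a minimal illustration of that idea, not TensorRT-LLM’s actual implementation.

```python
import numpy as np

def awq_int4_quantize(w, act_scale):
    """AWQ-style INT4 weight quantization (illustrative, simplified).

    w:         weight matrix of shape [out_features, in_features]
    act_scale: per-input-channel activation magnitude statistics
    """
    s = np.sqrt(act_scale)                    # protection factor for salient channels
    w_scaled = w * s                          # scale weights up where activations are large
    q_scale = np.abs(w_scaled).max(axis=1, keepdims=True) / 7.0  # INT4 range [-8, 7]
    q = np.clip(np.round(w_scaled / q_scale), -8, 7).astype(np.int8)
    return q, q_scale, s                      # at runtime, 1/s is folded into activations

w = np.random.randn(16, 64).astype(np.float32)
act_scale = np.abs(np.random.randn(64)).astype(np.float32) + 0.1
q, q_scale, s = awq_int4_quantize(w, act_scale)
w_deq = (q * q_scale) / s                     # effective weights the matmul sees
print("mean abs error:", np.abs(w - w_deq).mean())
```

In-flight batching complements quantization on the serving side: new requests are admitted into a running batch as earlier sequences finish, so the GPU stays busy instead of idling until the longest sequence in a batch completes.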

The Jetson AGX Orin platform is uniquely positioned to run complex models like GPT-J, vision transformers, and Stable Diffusion at the edge, providing real-time, actionable insights from sensor data such as images and videos.

Conclusion

In summary, NVIDIA’s Blackwell architecture has set new standards in MLPerf Inference v4.1, achieving up to 4x the performance of its predecessor, the H100. The H200 GPU continues to deliver top-tier performance across multiple benchmarks, while Jetson AGX Orin showcases significant advancements in edge AI.

NVIDIA’s continuous innovation across the technology stack ensures it remains at the forefront of AI inference performance, from large-scale data centers to low-power edge devices.

Image source: Shutterstock
