AMD Unveils ROCm 6.2: Boosting AI and HPC Performance with New Enhancements

AMD has announced the release of ROCm 6.2, a major update aimed at enhancing the performance, efficiency, and scalability of AI and high-performance computing (HPC) applications. According to AMD.com, this release includes several key improvements that solidify ROCm’s position as a leading platform for AI and HPC development.

Extending vLLM Support

ROCm 6.2 expands vLLM support to improve the efficiency and scalability of AI models on AMD Instinct™ Accelerators. Designed for large language models (LLMs), vLLM addresses key inferencing challenges such as efficient multi-GPU computation, reduced memory usage, and minimized computational bottlenecks. This update enables various upstream vLLM features like multi-GPU execution and FP8 KV cache, making it easier for developers to tackle complex AI tasks.
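To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how much space a key/value cache takes at 16-bit versus 8-bit precision, and how that divides across GPUs. The model shape is hypothetical and chosen only for illustration, and the even split across GPUs is an assumption. In upstream vLLM these two features are usually selected through the `tensor_parallel_size` and `kv_cache_dtype` engine arguments; check the documentation of your ROCm build before relying on them.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor of 2: one tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def per_gpu_bytes(total_bytes, tensor_parallel_size):
    """Per-GPU share, assuming KV heads split evenly across GPUs."""
    return total_bytes / tensor_parallel_size


# Hypothetical model shape; not taken from any specific model.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128,
             seq_len=8192, batch_size=16)

GIB = 1024 ** 3
fp16_total = kv_cache_bytes(**shape, bytes_per_elem=2)
fp8_total = kv_cache_bytes(**shape, bytes_per_elem=1)

for tp in (1, 2, 4):
    print(f"GPUs={tp}: "
          f"FP16 {per_gpu_bytes(fp16_total, tp) / GIB:.1f} GiB, "
          f"FP8 {per_gpu_bytes(fp8_total, tp) / GIB:.1f} GiB per GPU")
```

With this shape the cache takes 16 GiB at 16-bit precision and 8 GiB at FP8, so halving the element size frees room for longer contexts or larger batches on the same hardware.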

Bitsandbytes Quantization

The inclusion of the Bitsandbytes quantization library in ROCm 6.2 significantly boosts memory efficiency and performance on AMD Instinct™ GPU accelerators. Its 8-bit optimizers reduce memory usage during AI training, allowing developers to work with larger models on limited hardware. Its LLM.int8() quantization reduces the memory needed to deploy models for inference, making advanced AI capabilities more accessible and cost-effective.
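The core idea behind 8-bit quantization can be shown in a few lines. The sketch below is a simplified illustration of absmax quantization, not Bitsandbytes' own implementation: each row of values is scaled so that its largest magnitude maps to 127, then rounded to an integer code. LLM.int8() builds on this kind of scheme and additionally keeps outlier features in higher precision, which this sketch leaves out. The function names and sample weights are hypothetical.

```python
def quantize_absmax_int8(row):
    """Map floats to integer codes in [-127, 127] using one scale per row."""
    absmax = max(abs(x) for x in row)
    # Guard against an all-zero row, where any scale works.
    scale = absmax / 127 if absmax > 0 else 1.0
    codes = [round(x / scale) for x in row]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, for illustration only.
weights = [0.02, -0.5, 0.31, 1.27, -0.004]
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("codes:", codes)
print("max round-trip error:", max_err)
```

Each value now needs one byte instead of four, at the cost of a round-trip error of at most half the scale. Note how the smallest weight collapses to code 0: values far below the row maximum lose the most precision.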

New Offline Installer Creator

The new ROCm Offline Installer Creator simplifies the installation process for systems without internet access. It creates a single installer file that includes all necessary dependencies, making deployment straightforward. This tool integrates functionalities into a unified interface, automates post-installation tasks, and ensures correct and consistent installations, improving overall system stability.

Omnitrace and Omniperf Profiler Tools

ROCm 6.2 introduces the Omnitrace and Omniperf profiler tools, both in beta, to support AI and HPC development. Omnitrace provides a holistic view of system performance across CPUs, GPUs, NICs, and network fabrics, while Omniperf offers detailed GPU kernel analysis for fine-tuning. These tools help developers identify and resolve performance bottlenecks, ensuring efficient resource utilization and faster AI training and HPC simulations.

Broader FP8 Support

ROCm 6.2 extends FP8 support across its ecosystem, enhancing AI inferencing by addressing the memory bottlenecks and high latency associated with higher-precision formats. The update includes FP8 GEMM support in PyTorch and JAX, FP8-specific collective operations in RCCL, and FP8-based fused Flash Attention in MIOpen. These enhancements enable more efficient training and inference processes, maximizing throughput and reducing latency.
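To show what an 8-bit floating-point format trades away for its smaller size, the following sketch enumerates every finite value of one FP8 encoding and rounds arbitrary numbers to the nearest one. The article does not say which FP8 encoding ROCm uses, so this assumes the OCP E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7); hardware may use a variant with a different bias and range. The function names are hypothetical.

```python
def e4m3_values():
    """Enumerate the finite values of FP8 E4M3 (assumed OCP layout, bias 7)."""
    values = set()
    for bits in range(256):
        sign = -1.0 if bits & 0x80 else 1.0
        exp = (bits >> 3) & 0xF
        man = bits & 0x7
        if exp == 0xF and man == 0x7:
            continue  # NaN encoding; this layout has no infinities
        if exp == 0:
            magnitude = (man / 8) * 2.0 ** -6          # subnormal numbers
        else:
            magnitude = (1 + man / 8) * 2.0 ** (exp - 7)
        values.add(sign * magnitude)
    return sorted(values)


E4M3 = e4m3_values()


def round_to_e4m3(x):
    """Nearest representable value, saturating at the format's extremes.

    Ties go to the smaller value, a simplification of real rounding modes.
    """
    return min(E4M3, key=lambda v: abs(v - x))


for x in (0.3, 3.14159, 1000.0):
    print(x, "->", round_to_e4m3(x))
```

With only three mantissa bits, neighbouring values are spaced about 6 to 12 percent apart, and anything above the largest finite value (448 in this layout) saturates. That coarse grid is why FP8 pipelines usually pair the format with per-tensor or per-block scaling factors.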

AMD continues to demonstrate its commitment to providing robust, competitive, and innovative solutions for the AI and HPC community with the ROCm 6.2 release. Developers now have the tools and support needed to push the boundaries of what’s possible, fostering confidence in ROCm as the open platform of choice for next-generation computational tasks.

Discover the range of new features introduced in ROCm 6.2 by reviewing the release notes.

Image source: Shutterstock
