AMD ROCm 6.1 Enhances AI and HPC Performance with New Capabilities


AMD has unveiled ROCm 6.1, the latest iteration of its open-source software platform designed to maximize the performance of AMD Instinct™ accelerators. According to AMD.com, the update brings a host of new features and enhancements aimed at AI and high-performance computing (HPC) developers.

Enhanced GPU Support and Ecosystem Expansion

ROCm 6.1 significantly expands its support for AMD Instinct™ and Radeon™ GPUs. The update includes optimizations across various computational domains and extends ecosystem support to keep up with rapid advancements in AI frameworks. These enhancements aim to improve the stability and performance of applications, enabling developers to push the boundaries of AI and HPC.

New Video Decoding Capabilities

A new ROCm library, rocDecode, introduces high-performance video decoding directly on the GPU, using the Video Core Next (VCN) engines built into AMD GPUs. rocDecode decodes compressed video directly into video memory, which minimizes data transfers over the PCIe bus and removes a common bottleneck in video processing. Because the decoded frames are already on the GPU, real-time post-processing steps such as video scaling, color conversion, and augmentation can run there as well; these steps are essential for advanced analytics, inferencing, and machine learning training.
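A rough calculation shows why keeping decoded frames in video memory matters. The sketch below is not rocDecode code and calls no ROCm API; it only compares the bytes that would cross the PCIe bus if a compressed stream is uploaded versus frames already decoded on the CPU. The 1080p resolution, NV12 layout (12 bits per pixel), 30 frames per second, and 8 Mbit/s bitrate are illustrative assumptions, not figures from AMD.

```python
def raw_frame_bytes(width, height, bits_per_pixel=12):
    """Size of one decoded frame; 12 bits per pixel assumes an NV12 (4:2:0) layout."""
    return width * height * bits_per_pixel // 8


def pcie_bytes_per_second(width, height, fps, bitrate_bps):
    """Return (decoded_bytes_per_s, compressed_bytes_per_s) for one video stream."""
    decoded = raw_frame_bytes(width, height) * fps
    compressed = bitrate_bps // 8
    return decoded, compressed


# Assumed stream: 1080p, 30 fps, 8 Mbit/s compressed bitrate.
decoded, compressed = pcie_bytes_per_second(1920, 1080, 30, 8_000_000)
print(f"decoded frames uploaded from CPU: {decoded / 1e6:.1f} MB/s")
print(f"compressed stream uploaded:       {compressed / 1e6:.1f} MB/s")
print(f"ratio: {decoded / compressed:.1f}x")
```

Under these assumptions the decoded frames are roughly two orders of magnitude larger than the compressed stream, which is the traffic that on-GPU decoding avoids.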

Advanced Model Inference with MIGraphX

MIGraphX, the AMD graph inference engine, receives significant updates in ROCm 6.1. The engine now supports Flash Attention, which enhances the memory efficiency of transformer-based models like BERT, GPT, and Stable Diffusion. Additionally, a new Torch-MIGraphX library integrates MIGraphX capabilities directly into PyTorch workflows, supporting a range of data types including FP32, FP16, and INT8.
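The memory saving behind Flash Attention comes from processing keys and values in blocks while maintaining a running softmax, so the full score matrix is never stored at once. The following is a minimal pure-Python sketch of that idea for a single query vector; it is not MIGraphX's kernel, and the function names and block size are hypothetical choices made for illustration.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: computes and stores every score before the softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q, keys, values, block=2):
    """Streams over key/value blocks, keeping only a running max, sum, and output."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new running maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -0.3], [0.4, 0.9], [-0.7, 0.2], [0.0, 1.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block=2))
```

Both functions produce the same result up to floating-point rounding; the tiled version only ever holds one block of scores, which is what allows real kernels to keep the working set in fast on-chip memory.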

Improved Deep Learning with MIOpen

MIOpen, AMD’s open-source deep-learning primitives library, also sees notable improvements. ROCm 6.1 introduces Find 2.0 fusion plans to optimize inference tasks and updates convolution kernels for the NHWC format, enhancing performance in various applications. These updates aim to reduce memory bandwidth demands and GPU launch overheads, both of which are crucial for efficient deep learning operations.
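For readers unfamiliar with the layout names, NCHW and NHWC describe the order in which a batch of images is flattened in memory. The sketch below uses the standard row-major offset formulas for each layout; it is not MIOpen code, and the helper names are hypothetical. It shows that in NHWC all channels of one pixel sit next to each other, while in NCHW they are a whole image plane apart.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when channels come before height and width."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset of the same element when channels are the innermost dimension."""
    return ((n * H + h) * W + w) * C + c


C, H, W = 3, 2, 2
# Offsets of the three channel values of pixel (h=0, w=1) in image n=0.
nchw = [nchw_offset(0, c, 0, 1, C, H, W) for c in range(C)]
nhwc = [nhwc_offset(0, c, 0, 1, C, H, W) for c in range(C)]
print("NCHW offsets:", nchw)  # stride of H*W between channels
print("NHWC offsets:", nhwc)  # stride of 1 between channels
```

Contiguous channels are convenient for kernels that read every channel of a pixel together, which is one common reason a library might tune convolution kernels specifically for NHWC.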

Composable Kernel and hipSPARSELt Enhancements

The Composable Kernel (CK) library in ROCm 6.1 now supports stochastic rounding, replacing the traditional FP8 rounding logic. Stochastic rounding rounds a value up or down at random, with probabilities chosen so that the result is correct on average; this improves model convergence and offers a more accurate approach to handling data within machine learning models. Additionally, hipSPARSELt introduces support for structured sparsity matrices, enhancing the flexibility and performance of Sparse Matrix-Matrix Multiplication (SPMM) operations.
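The difference between round-to-nearest and stochastic rounding is easy to see on a coarse grid. The sketch below is not Composable Kernel code: it uses a uniform grid with a hypothetical `step` parameter as a simplified stand-in for a low-precision format (real FP8 values are not uniformly spaced), and a seeded Python random generator in place of whatever random source a GPU kernel would use.

```python
import math
import random


def round_nearest(x, step):
    """Deterministic rounding to the closest multiple of step."""
    return round(x / step) * step


def round_stochastic(x, step, rng):
    """Round up with probability equal to the fractional position between grid points."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)
x, step = 0.3, 1.0
samples = [round_stochastic(x, step, rng) for _ in range(100_000)]
print("nearest:        ", round_nearest(x, step))
print("stochastic mean:", sum(samples) / len(samples))
```

Round-to-nearest always maps 0.3 to 0.0 on this grid, so many small updates of that size would be lost entirely. Stochastic rounding returns 1.0 about 30% of the time, so the average stays close to the true value, which is the property that helps low-precision training converge.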

Advanced Tensor Operations with hipTensor

hipTensor, AMD’s dedicated C++ library for accelerating tensor operations, introduces support for 4D tensor permutation and contraction. This update broadens the scope of operations that can be optimized by hipTensor, essential for complex computational tasks such as neural network training and advanced simulations.
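To make the two operations concrete, here is a small pure-Python sketch of a 4D permutation and a 4D contraction on row-major flat lists. It does not use or mirror the hipTensor API; the function names are hypothetical, and the choice of which two modes are contracted is an illustrative assumption.

```python
from itertools import product


def permute4d(data, shape, perm):
    """Reorder axes so that new axis i is old axis perm[i]; returns (data, shape)."""
    new_shape = tuple(shape[p] for p in perm)
    # Row-major strides of the source tensor.
    strides = [1] * 4
    for i in range(2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    out = []
    for idx in product(*(range(s) for s in new_shape)):
        src = sum(idx[i] * strides[perm[i]] for i in range(4))
        out.append(data[src])
    return out, new_shape


def contract4d(a, a_shape, b, b_shape):
    """C[i,j,m,n] = sum over k,l of A[i,j,k,l] * B[k,l,m,n]."""
    I, J, K, L = a_shape
    K2, L2, M, N = b_shape
    assert (K, L) == (K2, L2), "contracted extents must match"
    c = [0.0] * (I * J * M * N)
    for i, j, m, n in product(range(I), range(J), range(M), range(N)):
        total = 0.0
        for k, l in product(range(K), range(L)):
            total += a[((i * J + j) * K + k) * L + l] * b[((k * L + l) * M + m) * N + n]
        c[((i * J + j) * M + m) * N + n] = total
    return c, (I, J, M, N)


# Permute a (1, 2, 2, 3) tensor so the last axis becomes the second axis.
print(permute4d(list(range(12)), (1, 2, 2, 3), (0, 3, 1, 2)))
# Contract a (1, 1, 2, 2) tensor with a (2, 2, 1, 1) tensor.
print(contract4d([1, 2, 3, 4], (1, 1, 2, 2), [5, 6, 7, 8], (2, 2, 1, 1)))
```

The nested loops make the cost visible: every output element of the contraction sums over all contracted index pairs, which is the work a GPU library parallelizes.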

Overall, the ROCm 6.1 update aims to provide developers with powerful tools to unlock their innovative potential. Each enhancement is designed to improve performance, streamline workflows, and help developers achieve their goals more efficiently.

