AMD Radeon PRO GPUs and ROCm Software Expand LLM Inference Capabilities

AMD
has
announced
advancements
in
its
Radeon
PRO
GPUs
and
ROCm
software,
enabling
small
enterprises
to
leverage
Large
Language
Models
(LLMs)
like
Meta’s
Llama
2
and
3,
including
the
newly
released
Llama
3.1,
according
to

AMD.com.

New
Capabilities
for
Small
Enterprises

With
dedicated
AI
accelerators
and
substantial
on-board
memory,
AMD’s
Radeon
PRO
W7900
Dual
Slot
GPU
offers
market-leading
performance
per
dollar,
making
it
feasible
for
small
firms
to
run
custom
AI
tools
locally.
This
includes
applications
such
as
chatbots,
technical
documentation
retrieval,
and
personalized
sales
pitches.
The
specialized
Code
Llama
models
further
enable
programmers
to
generate
and
optimize
code
for
new
digital
products.

The
latest
release
of
AMD’s
open
software
stack,
ROCm
6.1.3,
supports
running
AI
tools
on
multiple
Radeon
PRO
GPUs.
This
enhancement
allows
small
and
medium-sized
enterprises
(SMEs)
to
handle
larger
and
more
complex
LLMs,
supporting
more
users
simultaneously.

Expanding
Use
Cases
for
LLMs

While
AI
techniques
are
already
prevalent
in
data
analysis,
computer
vision,
and
generative
design,
the
potential
use
cases
for
AI
extend
far
beyond
these
areas.
Specialized
LLMs
like
Meta’s
Code
Llama
enable
app
developers
and
web
designers
to
generate
working
code
from
simple
text
prompts
or
debug
existing
code
bases.
The
parent
model,
Llama,
offers
extensive
applications
in
customer
service,
information
retrieval,
and
product
personalization.

Small
enterprises
can
utilize
retrieval-augmented
generation
(RAG)
to
make
AI
models
aware
of
their
internal
data,
such
as
product
documentation
or
customer
records.
This
customization
results
in
more
accurate
AI-generated
outputs
with
less
need
for
manual
editing.

Local
Hosting
Benefits

Despite
the
availability
of
cloud-based
AI
services,
local
hosting
of
LLMs
offers
significant
advantages:

Data
Security:
Running
AI
models
locally
eliminates
the
need
to
upload
sensitive
data
to
the
cloud,
addressing
major
concerns
about
data
sharing.
Lower
Latency:
Local
hosting
reduces
lag,
providing
instant
feedback
in
applications
like
chatbots
and
real-time
support.
Control
Over
Tasks:
Local
deployment
allows
technical
staff
to
troubleshoot
and
update
AI
tools
without
relying
on
remote
service
providers.
Sandbox
Environment:
Local
workstations
can
serve
as
sandbox
environments
for
prototyping
and
testing
new
AI
tools
before
full-scale
deployment.

AMD’s
AI
Performance

For
SMEs,
hosting
custom
AI
tools
need
not
be
complex
or
expensive.
Applications
like
LM
Studio
facilitate
running
LLMs
on
standard
Windows
laptops
and
desktop
systems.
LM
Studio
is
optimized
to
run
on
AMD
GPUs
via
the
HIP
runtime
API,
leveraging
the
dedicated
AI
Accelerators
in
current
AMD
graphics
cards
to
boost
performance.

Professional
GPUs
like
the
32GB
Radeon
PRO
W7800
and
48GB
Radeon
PRO
W7900
offer
sufficient
memory
to
run
larger
models,
such
as
the
30-billion-parameter
Llama-2-30B-Q8.
ROCm
6.1.3
introduces
support
for
multiple
Radeon
PRO
GPUs,
enabling
enterprises
to
deploy
systems
with
multiple
GPUs
to
serve
requests
from
numerous
users
simultaneously.

Performance
tests
with
Llama
2
indicate
that
the
Radeon
PRO
W7900
offers
up
to
38%
higher
performance-per-dollar
compared
to
NVIDIA’s
RTX
6000
Ada
Generation,
making
it
a
cost-effective
solution
for
SMEs.

With
the
evolving
capabilities
of
AMD’s
hardware
and
software,
even
small
enterprises
can
now
deploy
and
customize
LLMs
to
enhance
various
business
and
coding
tasks,
avoiding
the
need
to
upload
sensitive
data
to
the
cloud.

Image
source:
Shutterstock

AMD Radeon PRO GPUs and ROCm Software Expand LLM Inference Capabilities

New Capabilities for Small Enterprises

Expanding Use Cases for LLMs

Local Hosting Benefits

AMD’s AI Performance

New
Capabilities
for
Small
Enterprises

Expanding
Use
Cases
for
LLMs

Local
Hosting
Benefits

AMD’s
AI
Performance