AMD Radeon PRO GPUs and ROCm Software Expand LLM Inference Capabilities


Felix Pinkston
Aug 31, 2024 01:52

AMD’s Radeon PRO GPUs and ROCm software enable small enterprises to leverage advanced AI tools, including Meta’s Llama models, for various business applications.


AMD has announced advancements in its Radeon PRO GPUs and ROCm software, enabling small enterprises to leverage large language models (LLMs) like Meta’s Llama 2 and 3, including the newly released Llama 3.1, according to AMD.com.

New Capabilities for Small Enterprises

With dedicated AI accelerators and substantial on-board memory, AMD’s Radeon PRO W7900 Dual Slot GPU offers market-leading performance per dollar, making it feasible for small firms to run custom AI tools locally. This includes applications such as chatbots, technical documentation retrieval, and personalized sales pitches. The specialized Code Llama models further enable programmers to generate and optimize code for new digital products.

The latest release of AMD’s open software stack, ROCm 6.1.3, supports running AI tools on multiple Radeon PRO GPUs. This enhancement allows small and medium-sized enterprises (SMEs) to handle larger and more complex LLMs, supporting more users simultaneously.
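
To verify that all cards are visible before deployment, a minimal sketch using a ROCm build of PyTorch (an assumption; any HIP-aware framework would do) enumerates the GPUs. On ROCm, PyTorch exposes AMD hardware through its familiar torch.cuda namespace:

```python
# Minimal sketch: list the Radeon PRO GPUs visible to a ROCm build of
# PyTorch. On ROCm, AMD GPUs appear under torch.cuda (backed by HIP),
# so the same code runs unchanged on NVIDIA and AMD hardware.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No ROCm-visible GPUs found")
```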

Expanding Use Cases for LLMs

While AI techniques are already prevalent in data analysis, computer vision, and generative design, the potential use cases for AI extend far beyond these areas. Specialized LLMs like Meta’s Code Llama enable app developers and web designers to generate working code from simple text prompts or debug existing code bases. The parent model, Llama, offers extensive applications in customer service, information retrieval, and product personalization.
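
To make this concrete, here is a hedged sketch of prompting a local Code Llama checkpoint with llama-cpp-python, which can be built against ROCm/HIP for AMD GPUs; the GGUF file name is a placeholder, not a path from the article:

```python
# Illustrative only: prompt a local Code Llama model for working code.
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b-instruct.Q8_0.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload all layers to the GPU
)
out = llm(
    "Write a Python function that deduplicates a list while preserving order.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```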

Small enterprises can utilize retrieval-augmented generation (RAG) to make AI models aware of their internal data, such as product documentation or customer records. This customization results in more accurate AI-generated outputs with less need for manual editing.
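
The retrieval step is the heart of RAG. The sketch below ranks internal documents against a user query and prepends the best match to the prompt; TF-IDF stands in for whatever embedding model a real deployment would use, and the documents are invented examples:

```python
# Illustrative RAG retrieval: pick the most relevant internal document
# and splice it into the LLM prompt as context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Refunds are processed within 14 business days of receipt.",
    "Firmware 2.1 adds fan-curve controls to the admin console.",
    "Enterprise licenses include priority support and SSO.",
]
query = "How long do refunds take?"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)
best = cosine_similarity(vectorizer.transform([query]), doc_vecs).argmax()

prompt = f"Context: {docs[best]}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # feed this augmented prompt to the locally hosted LLM
```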

Local Hosting Benefits

Despite the availability of cloud-based AI services, local hosting of LLMs offers significant advantages:


  • Data Security: Running AI models locally eliminates the need to upload sensitive data to the cloud, addressing major concerns about data sharing.

  • Lower Latency: Local hosting reduces lag, providing instant feedback in applications like chatbots and real-time support.

  • Control Over Tasks: Local deployment allows technical staff to troubleshoot and update AI tools without relying on remote service providers.

  • Sandbox Environment: Local workstations can serve as sandbox environments for prototyping and testing new AI tools before full-scale deployment.

AMD’s AI Performance

For SMEs, hosting custom AI tools need not be complex or expensive. Applications like LM Studio facilitate running LLMs on standard Windows laptops and desktop systems. LM Studio is optimized to run on AMD GPUs via the HIP runtime API, leveraging the dedicated AI accelerators in current AMD graphics cards to boost performance.
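
Once a model is loaded, LM Studio can expose it through a local OpenAI-compatible server, so applications talk to it over plain HTTP. The sketch below assumes the server’s default address (http://localhost:1234; confirm the port in LM Studio’s server settings):

```python
# Hedged sketch: query a model served locally by LM Studio.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",  # default local endpoint
    json={
        "model": "local-model",  # LM Studio serves whichever model is loaded
        "messages": [{"role": "user", "content": "Draft a product FAQ entry."}],
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```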

Professional GPUs like the 32GB Radeon PRO W7800 and 48GB Radeon PRO W7900 offer sufficient memory to run larger models, such as the 30-billion-parameter Llama-2-30B-Q8. ROCm 6.1.3 introduces support for multiple Radeon PRO GPUs, enabling enterprises to deploy systems with multiple GPUs to serve requests from numerous users simultaneously.
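
A rough capacity check shows why the 48GB card comfortably fits an 8-bit 30-billion-parameter model; the ~15% overhead figure for KV cache and activations is an assumption, and real usage varies by runtime and context length:

```python
# Back-of-envelope VRAM estimate for an 8-bit (Q8) quantized model.
params = 30e9          # 30-billion-parameter model
bytes_per_param = 1.0  # ~1 byte per weight at 8-bit quantization
overhead = 1.15        # assumed KV cache / activation overhead (~15%)

vram_gb = params * bytes_per_param * overhead / 1e9
print(f"Estimated VRAM: {vram_gb:.1f} GB")  # ~34.5 GB, within the W7900's 48 GB
```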

Performance tests with Llama 2 indicate that the Radeon PRO W7900 offers up to 38% higher performance-per-dollar compared to NVIDIA’s RTX 6000 Ada Generation, making it a cost-effective solution for SMEs.

With the evolving capabilities of AMD’s hardware and software, even small enterprises can now deploy and customize LLMs to enhance various business and coding tasks, avoiding the need to upload sensitive data to the cloud.

Image source: Shutterstock
