NVIDIA Unveils DoRA: A Superior Fine-Tuning Method for AI Models


NVIDIA Unveils DoRA: A Superior Fine-Tuning Method for AI Models

NVIDIA
has
announced
the
development
of
a
new
fine-tuning
method
called
DoRA
(Weight-Decomposed
Low-Rank
Adaptation),
which
offers
a
high-performing
alternative
to
the
widely
used
Low-Rank
Adaptation
(LoRA).
According
to
the

NVIDIA
Technical
Blog
,
DoRA
enhances
both
the
learning
capacity
and
stability
of
LoRA
without
introducing
any
additional
inference
overhead.

Advantages
of
DoRA

DoRA
has
demonstrated
significant
performance
improvements
across
various
large
language
models
(LLMs)
and
vision
language
models
(VLMs).
For
instance,
in
common-sense
reasoning
tasks,
DoRA
outperformed
LoRA
with
improvements
such
as
+3.7
points
on
Llama
7B
and
+4.4
points
on
Llama
3
8B.
Additionally,
DoRA
showed
better
results
in
multi-turn
benchmarks,
image/video-text
understanding,
and
visual
instruction
tuning
tasks.

This
innovative
method
has
been
accepted
as
an
oral
paper
at
ICML
2024,
marking
its
credibility
and
potential
impact
in
the
field
of
machine
learning.

Mechanics
of
DoRA

DoRA
operates
by
decomposing
the
pretrained
weight
into
its
magnitude
and
directional
components,
fine-tuning
both.
The
method
leverages
LoRA
for
directional
adaptation,
ensuring
efficient
fine-tuning.
After
the
training
process,
DoRA
merges
the
fine-tuned
components
back
into
the
pretrained
weight,
avoiding
any
additional
latency
during
inference.

Visualizations
of
the
magnitude
and
directional
differences
between
DoRA
and
pretrained
weights
reveal
that
DoRA
makes
substantial
directional
adjustments
with
minimal
changes
in
magnitude,
closely
resembling
full
fine-tuning
(FT)
learning
patterns.

Performance
Across
Models

In
various
performance
benchmarks,
DoRA
consistently
outperforms
LoRA.
For
example,
in
large
language
models,
DoRA
significantly
enhances
commonsense
reasoning
abilities
and
conversation/instruction-following
capabilities.
In
vision
language
models,
DoRA
shows
superior
results
in
image-text
and
video-text
understanding,
as
well
as
visual
instruction
tuning
tasks.

Large
Language
Models

Comparative
studies
highlight
that
DoRA
surpasses
LoRA
in
commonsense
reasoning
benchmarks
and
multi-turn
benchmarks.
In
tests,
DoRA
achieved
higher
average
scores
across
various
datasets,
indicating
its
robust
performance.

Vision
Language
Models

DoRA
also
excels
in
vision
language
models,
outperforming
LoRA
in
tasks
like
image-text
understanding,
video-text
understanding,
and
visual
instruction
tuning.
The
method’s
efficacy
is
evident
in
higher
average
scores
across
multiple
benchmarks.

Compression-Aware
LLMs

DoRA
can
be
integrated
into
the
QLoRA
framework,
enhancing
the
accuracy
of
low-bit
pretrained
models.
Collaborative
efforts
with
Answer.AI
on
the
QDoRA
project
showed
that
QDoRA
outperforms
both
FT
and
QLoRA
on
Llama
2
and
Llama
3
models.

Text-to-Image
Generation

DoRA’s
application
extends
to
text-to-image
personalization
with
DreamBooth,
yielding
significantly
better
results
than
LoRA
in
challenging
datasets
like
3D
Icon
and
Lego
sets.

Implications
and
Future
Applications

DoRA
is
poised
to
become
a
default
choice
for
fine-tuning
AI
models,
compatible
with
LoRA
and
its
variants.
Its
efficiency
and
effectiveness
make
it
a
valuable
tool
for
adapting
foundation
models
to
various
applications,
including
NVIDIA
Metropolis,
NVIDIA
NeMo,
NVIDIA
NIM,
and
NVIDIA
TensorRT.

For
more
detailed
information,
visit
the

NVIDIA
Technical
Blog
.

Image
source:
Shutterstock

Comments are closed.