Llama-3 Fine-Tuning Achieves 90% of GPT-4’s Performance at Lower Cost


Luisa Crawford

Jul 14, 2024 02:46

Llama-3 fine-tuning demonstrates significant performance gains, achieving 90% of GPT-4’s accuracy at a fraction of the cost, according to together.ai.

The success of Llama-3 has been remarkable, showcasing that open-source models are closing the gap with their closed-source counterparts, according to together.ai. By leveraging proprietary data, customers have been able to fine-tune smaller open-source (OSS) models like Llama-3 to achieve higher accuracy than top-tier closed-source models.

Fine-Tuning Process

Together AI’s platform allows users to fine-tune Llama-3-8B on proprietary data, creating custom models that outperform larger OSS alternatives like Llama-3-70B and are comparable to leading closed-source models like GPT-4, all at a fraction of the cost. A detailed guide demonstrates how a fine-tuned Llama-3 8B model improved from 47% accuracy to 65%, surpassing Llama-3-70B’s 64% and nearing GPT-4’s 71% accuracy.

The fine-tuning process involves several steps, including dataset transformation, uploading and verifying datasets, starting a fine-tuning job, and running evaluations to compare the results. The initial step requires downloading the Math Instruct dataset from HuggingFace, cleaning it up, and transforming it into a JSONL file format suitable for Together’s platform.
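As a rough illustration of this first step, the sketch below pulls the dataset with the HuggingFace datasets library and saves the raw records locally. The dataset ID (TIGER-Lab/MathInstruct) and its field names are assumptions for illustration, not details confirmed by the guide.

```python
# Sketch: download the Math Instruct dataset and persist the raw JSON.
# Assumes the dataset ID is "TIGER-Lab/MathInstruct" and that the
# "datasets" library is installed (pip install datasets).
import json

from datasets import load_dataset

dataset = load_dataset("TIGER-Lab/MathInstruct", split="train")

# Inspect one record; fields are assumed to be "instruction" and "output".
print(dataset[0])

# Save the raw records to disk for the transformation step below.
with open("math_instruct_raw.json", "w") as f:
    json.dump([dict(row) for row in dataset], f)
```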

Dataset Transformation

The transformation process involves loading the original JSON data, defining the Llama-3 prompt format, and converting the data into the correct format. This formatted dataset is then validated using Together’s SDK before being uploaded for fine-tuning.
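A minimal sketch of that transformation follows. The special tokens come from Meta’s published Llama-3 chat template; the record field names ("instruction", "output") and the {"text": ...} JSONL schema are assumptions based on Together’s completion-format fine-tuning conventions, and check_file is the validation utility shipped in recent versions of Together’s Python SDK.

```python
# Sketch: convert raw records into Llama-3 chat format, write JSONL,
# then validate the file with Together's SDK utility.
import json

from together.utils import check_file

# Llama-3 chat template with its header / end-of-turn special tokens.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "{answer}<|eot_id|>"
)

with open("math_instruct_raw.json") as f:
    records = json.load(f)

# One {"text": ...} object per line, as expected for completion-style
# fine-tuning data (assumed schema).
with open("math_instruct_formatted.jsonl", "w") as out:
    for row in records:
        text = LLAMA3_TEMPLATE.format(
            question=row["instruction"], answer=row["output"]
        )
        out.write(json.dumps({"text": text}) + "\n")

# Validate before uploading; the report flags formatting problems.
report = check_file("math_instruct_formatted.jsonl")
assert report["is_check_passed"], report
```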

Uploading and Fine-Tuning

Once the dataset is prepared, it is uploaded to Together AI via the Python SDK. The fine-tuning job is then created using the Llama-3-8B base model, specifying the dataset, number of epochs, and other parameters. Users can monitor the fine-tuning job through Together AI’s dashboard.
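The flow looks roughly like the sketch below, assuming the current Together Python SDK (pip install together) with TOGETHER_API_KEY set in the environment; the base-model identifier and hyperparameter values are illustrative rather than prescriptive.

```python
# Sketch: upload the training file and start a fine-tuning job.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Upload the formatted file; the returned object carries a file ID.
train_file = client.files.upload(file="math_instruct_formatted.jsonl")

# Create the fine-tuning job against the Llama-3-8B base model.
# Epoch count and learning rate here are placeholder values.
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3-8B",
    n_epochs=3,
    learning_rate=1e-5,
)

# The job ID can be polled from code or watched on the web dashboard.
print(job.id)
```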

Evaluation and Results

After fine-tuning, the model’s performance is evaluated on 1,000 math problems. The fine-tuned Llama-3-8B model’s accuracy is compared to the base Llama-3-8B, Llama-3-70B, and GPT-4. The fine-tuned model achieved 65.2% accuracy, outperforming the base model’s 47.2% and Llama-3-70B’s 64.2%, and coming close to GPT-4’s 71.4% accuracy.
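A bare-bones version of such an evaluation loop is sketched below. The fine-tuned model name is a hypothetical placeholder for the ID the fine-tuning job returns, the eval_set structure is assumed, and substring-match grading is a simplification of whatever scoring the guide actually uses.

```python
# Sketch: score a fine-tuned model on a held-out set of math problems.
from together import Together

client = Together()

# Placeholder for the deployed fine-tuned model's ID (hypothetical).
FINE_TUNED_MODEL = "your-account/Meta-Llama-3-8B-math-ft"

def model_answer(question: str) -> str:
    """Query the model for one problem and return its reply text."""
    response = client.chat.completions.create(
        model=FINE_TUNED_MODEL,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def accuracy(eval_set: list[dict]) -> float:
    """Fraction of problems whose reference answer appears in the reply.

    eval_set: list of {"question": ..., "answer": ...} dicts
    (1,000 problems in the guide). Substring grading is a simplification.
    """
    correct = sum(
        1 for ex in eval_set if ex["answer"] in model_answer(ex["question"])
    )
    return correct / len(eval_set)
```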

The results indicate that the fine-tuned Llama-3-8B model outperformed the base model by nearly 20 percentage points, surpassed the top OSS model Llama-3-70B, and achieved over 90% of GPT-4’s accuracy. Additionally, the fine-tuned model is faster, 50 times cheaper than GPT-4, and offers full ownership of the model and weights.

Conclusion

This fine-tuning approach demonstrates that small open-source models like Llama-3-8B can be customized to perform specific tasks with high accuracy, speed, and cost-efficiency. Users can leverage their proprietary data to fine-tune a model and either host it on Together AI or run it independently, maintaining full control and ownership.

The Llama-3-8B model trained on math problems outperformed leading OSS models and approached GPT-4’s performance, with a total fine-tuning cost of less than $100 on Together AI.

Image source: Shutterstock
