NVIDIA FLARE Enhances Federated XGBoost for Efficient Machine Learning

According to the NVIDIA Technical Blog, NVIDIA has introduced significant enhancements to Federated XGBoost with its Federated Learning Application Runtime Environment (FLARE). This integration aims to make federated learning more practical and productive, particularly in machine learning tasks such as regression, classification, and ranking.

Key Features of Federated XGBoost

XGBoost, a machine learning algorithm known for its scalability and effectiveness, has been widely used for various data science tasks. The introduction of Federated XGBoost in version 1.7.0 allowed multiple institutions to train XGBoost models collaboratively without sharing data. The subsequent version 2.0.0 further enhanced this capability to support vertical federated learning, in which participants hold different features of the same samples.

Since 2023, NVIDIA FLARE has provided built-in integration with these Federated XGBoost features, including horizontal histogram-based and tree-based XGBoost as well as vertical XGBoost. Support for Private Set Intersection (PSI) for sample alignment has also been added, making it possible to conduct federated learning without extensive coding.
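
To make the terminology concrete, the sketch below shows the kind of local histogram-based XGBoost training each participating site runs; in the federated setting, FLARE and XGBoost's federated communicator coordinate the exchange of gradient histograms so raw rows never leave a site. The file names and parameters here are hypothetical, not taken from the article.

import xgboost as xgb

# Hypothetical per-site data: in horizontal federated XGBoost, each site
# holds its own rows but shares the same feature columns.
dtrain = xgb.DMatrix("site1_train.csv?format=csv&label_column=0")
dvalid = xgb.DMatrix("site1_valid.csv?format=csv&label_column=0")

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",  # histogram-based training, the mode federated here
    "eval_metric": "auc",
}
booster = xgb.train(params, dtrain, num_boost_round=100,
                    evals=[(dvalid, "valid")])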

Running Multiple Experiments Concurrently

One of the standout features of NVIDIA FLARE is its ability to run multiple concurrent XGBoost training experiments. This capability allows data scientists to test various hyperparameters or feature combinations simultaneously, thereby reducing the overall training time. NVIDIA FLARE manages the communication multiplexing, eliminating the need to open new ports for each job.

Figure 1. Two concurrent XGBoost jobs, each with a unique set of features. Each job has two clients, shown as two visible curves
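
As a rough illustration of running two experiments side by side, the sketch below launches two hypothetical FLARE job folders with the nvflare simulator CLI; in a production deployment, jobs would instead be submitted to a running FLARE system, which multiplexes them over existing connections. The job names, workspace paths, and client counts are assumptions, not from the article.

import subprocess

# Two hypothetical job folders, e.g. differing only in hyperparameters
# or feature sets; names are illustrative.
jobs = ["jobs/xgboost_features_a", "jobs/xgboost_features_b"]

# Run both jobs concurrently, each in its own simulator workspace
# with two simulated clients.
procs = [
    subprocess.Popen(
        ["nvflare", "simulator", job, "-w", f"/tmp/ws_{i}", "-n", "2", "-t", "2"]
    )
    for i, job in enumerate(jobs)
]
for p in procs:
    p.wait()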

Fault-Tolerant XGBoost Training

In cross-region or cross-border training scenarios, network reliability can be a significant issue. NVIDIA FLARE addresses this with its fault-tolerant features, which automatically handle message retries during network interruptions. This ensures resilience and maintains data integrity throughout the training process.

Figure 2. XGBoost communication is routed through the NVIDIA FLARE Communicator layer
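
FLARE's retry handling lives inside its Communicator layer and is not user-facing code; the snippet below is only a conceptual sketch of the retry-with-backoff pattern such a layer applies, with every name invented for illustration.

import time

# Conceptual sketch of retry with exponential backoff; an illustration
# of the pattern, not FLARE's actual implementation.
def send_with_retries(send_fn, message, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return send_fn(message)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # give up after exhausting the retry budget
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying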

Federated Experiment Tracking

Monitoring training and evaluation metrics is crucial, especially in distributed settings like federated learning. NVIDIA FLARE integrates with various experiment tracking systems, including MLflow, Weights & Biases, and TensorBoard, to provide comprehensive monitoring capabilities. Users can choose between decentralized and centralized tracking configurations based on their needs.

Figure 3. Metrics streaming to the FL server or clients, delivered to different experiment tracking systems

Adding tracking to an experiment is straightforward and requires minimal code changes. For instance, integrating MLflow tracking involves just three lines of code:

from nvflare.client.tracking import MLflowWriter

mlflow = MLflowWriter()  # mirrors mlflow's logging API; metrics are streamed by FLARE
mlflow.log_metric("loss", running_loss / 2000, global_step)  # values come from the local training loop
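
The other supported tracking syntaxes follow the same drop-in pattern. For example, assuming FLARE's TensorBoard-style SummaryWriter in the same nvflare.client.tracking module, the equivalent would look like:

from nvflare.client.tracking import SummaryWriter

writer = SummaryWriter()  # TensorBoard-style syntax, also streamed via FLARE
writer.add_scalar("loss", running_loss / 2000, global_step=global_step)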

Summary

NVIDIA FLARE 2.4.x offers robust support for Federated XGBoost, making federated learning more efficient and reliable. For more detailed information, refer to the NVIDIA FLARE 2.4 branch on GitHub and the NVIDIA FLARE 2.4 documentation.

Image source: Shutterstock
