NVIDIA Utilizes Synthetic Data to Enhance Multi-Camera Tracking Accuracy

Large-scale, use-case-specific synthetic data is becoming increasingly significant in real-world computer vision and AI workflows. By leveraging digital twins, NVIDIA is revolutionizing the creation of physics-based virtual replicas of environments such as factories and retail spaces, enabling precise simulations of real-world settings, according to the NVIDIA Technical Blog.

Enhancing AI with Synthetic Data

NVIDIA Isaac Sim, built on NVIDIA Omniverse, is a comprehensive application designed to facilitate the design, simulation, testing, and training of AI-enabled robots. The Omni.Replicator.Agent (ORA) extension in Isaac Sim is specifically used for generating synthetic data to train computer vision models, including the TAO PeopleNet Transformer and TAO ReIdentificationNet Transformer.

This approach is part of NVIDIA’s broader strategy to improve multi-target, multi-camera tracking (MTMC) vision AI applications. By generating high-quality synthetic data and fine-tuning base models for specific use cases, NVIDIA aims to enhance the accuracy and robustness of these models.

Overview of ReIdentificationNet

ReIdentificationNet (ReID) is a network used in MTMC and Real-Time Location System (RTLS) applications to track and identify objects across different camera views. It extracts embeddings from detected object crops, capturing essential information such as appearance, texture, color, and shape. This enables the identification of similar objects across multiple cameras.
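As a rough illustration of how such embeddings support cross-camera association, the sketch below compares L2-normalized embedding vectors with cosine similarity. The embedding values, array shapes beyond the 256-dim output, and the matching threshold are made-up placeholders rather than values from NVIDIA's pipeline.

```python
import numpy as np

def match_across_cameras(query_embs, gallery_embs, threshold=0.7):
    """Associate detections from one camera (queries) with detections from
    another camera (gallery) by cosine similarity of their ReID embeddings.

    query_embs, gallery_embs: arrays of shape (N, 256) and (M, 256).
    Returns a list of (query_index, gallery_index or None, similarity).
    """
    # L2-normalize so a dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = q @ g.T                       # (N, M) similarity matrix

    matches = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if row[j] >= threshold:          # threshold is an illustrative value
            matches.append((i, j, float(row[j])))
        else:
            matches.append((i, None, float(row[j])))
    return matches

# Toy usage with random 256-dim embeddings standing in for real detections.
rng = np.random.default_rng(0)
cam_a = rng.normal(size=(4, 256))
cam_b = rng.normal(size=(6, 256))
print(match_across_cameras(cam_a, cam_b))
```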

Accurate ReID models are crucial for multi-camera tracking, as they help associate objects across different camera views and maintain continuous tracking. The accuracy of these models can be significantly improved by fine-tuning them with synthetic data generated from ORA.

Model Architecture and Pretraining

The ReIdentificationNet model uses RGB image crops of size 256 x 128 as inputs and outputs an embedding vector of size 256 for each image crop. The model supports ResNet-50 and Swin transformer backbones, with the Swin variant being a human-centric foundational model pretrained on approximately 3 million image crops.
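To make that input/output contract concrete, here is a minimal PyTorch sketch that takes a 256 x 128 person crop and produces a 256-dimensional embedding. The ResNet-50 trunk with a linear projection head is purely a stand-in for illustration; it is not the pretrained TAO ReIdentificationNet or its Swin variant, and torchvision is assumed to be available.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Stand-in embedding network: ResNet-50 trunk + 256-dim projection head.
# This mirrors only the input/output shapes described above, not the
# actual TAO ReIdentificationNet architecture or weights.
class ReIDStandIn(nn.Module):
    def __init__(self, emb_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()           # keep the 2048-dim pooled features
        self.backbone = backbone
        self.head = nn.Linear(2048, emb_dim)  # project to the 256-dim embedding

    def forward(self, x):
        return self.head(self.backbone(x))

# How a real crop (a PIL image) would be resized to the expected
# 256 x 128 (height x width) RGB input.
preprocess = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.ToTensor(),
])

model = ReIDStandIn().eval()
dummy_crop = torch.rand(1, 3, 256, 128)       # stands in for a preprocessed crop
with torch.no_grad():
    embedding = model(dummy_crop)
print(embedding.shape)                        # torch.Size([1, 256])
```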

For pretraining, NVIDIA adopted a self-supervised learning technique called SOLIDER, built on DINO (self-DIstillation with NO labels). SOLIDER uses prior knowledge of human-image crops to generate pseudo-semantic labels, which are used to train human representations enriched with semantic information. The pretraining dataset includes a combination of NVIDIA proprietary datasets and Open Images V5.

Fine-tuning the ReID Model

Fine-tuning involves training the pretrained model on various supervised person re-identification datasets, which include both synthetic and real NVIDIA proprietary datasets. This process helps mitigate issues like ID switches, which occur when the system incorrectly associates IDs due to high visual similarity between different individuals or changes in appearance over time.

To fine-tune the ReID model, NVIDIA recommends generating synthetic data using ORA, ensuring that the model learns the unique characteristics and nuances of the specific environment. This leads to more reliable identification and tracking.

Simulation and Data Generation

Isaac Sim and the ORA extension are used to generate synthetic data for training the ReID model. Best practices for configuring the simulation include considering factors such as character count, character uniqueness, camera placement, and character behavior.

Character count and uniqueness are crucial for ReIdentificationNet, as the model benefits from a higher number of unique identities. Camera placement is also important, as cameras should be positioned to cover the entire floor area where characters are expected to be detected and tracked. Character behavior can be customized in Isaac Sim ORA to provide flexibility and variety in their movement.
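The sketch below simply encodes those best practices as a plain Python dictionary with a trivial sanity check. The field names are hypothetical and chosen only to mirror the factors listed above; they are not the actual ORA configuration schema.

```python
# Hypothetical, simplified representation of the simulation settings discussed
# above. Field names are illustrative only; they are NOT the real
# Omni.Replicator.Agent (ORA) configuration schema.
sim_config = {
    "characters": {
        "count": 20,                 # more characters -> more identity variety
        "unique_assets": 20,         # distinct appearances; ideally equals count
        "behavior": "random_walk",   # varied movement across the scene
    },
    "cameras": [
        # Positions chosen so the views jointly cover the floor area
        # where people must be detected and tracked.
        {"name": "cam_01", "position": [0.0, 0.0, 3.0], "look_at": [5.0, 5.0, 0.0]},
        {"name": "cam_02", "position": [10.0, 0.0, 3.0], "look_at": [5.0, 5.0, 0.0]},
    ],
}

def sanity_check(cfg):
    chars = cfg["characters"]
    if chars["unique_assets"] < chars["count"]:
        print("Warning: fewer unique appearances than characters; "
              "ReID training benefits from more unique identities.")
    if len(cfg["cameras"]) < 2:
        print("Warning: multi-camera tracking needs at least two camera views.")

sanity_check(sim_config)
```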

Training and Evaluation

Once the synthetic data is generated, it is prepared and sampled for training the TAO ReIdentificationNet model. Training tricks such as using ID loss, triplet loss, center loss, random erasing augmentation, warmup learning rate, BNNeck, and label smoothing can enhance the accuracy of the ReID model during the fine-tuning process.
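As a rough sketch of how two of those tricks combine during fine-tuning, the snippet below pairs an ID (classification) loss with label smoothing and a triplet loss in PyTorch. The margin, smoothing factor, and weights are illustrative values, and tricks such as BNNeck, center loss, and warmup scheduling are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative combination of ID loss (cross-entropy with label smoothing)
# and triplet loss, two of the training tricks mentioned above.
# Margin, smoothing, and loss weights are example values, not NVIDIA's settings.
id_loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)
triplet_loss_fn = nn.TripletMarginLoss(margin=0.3)

def reid_loss(logits, labels, anchor_emb, positive_emb, negative_emb,
              id_weight=1.0, triplet_weight=1.0):
    """logits: (B, num_identities) classifier outputs for the ID loss.
    anchor/positive/negative_emb: (B, 256) embeddings for the triplet loss,
    where the positive shares the anchor's identity and the negative does not."""
    id_loss = id_loss_fn(logits, labels)
    tri_loss = triplet_loss_fn(anchor_emb, positive_emb, negative_emb)
    return id_weight * id_loss + triplet_weight * tri_loss

# Toy usage with random tensors standing in for a mini-batch.
B, num_ids = 8, 100
loss = reid_loss(
    logits=torch.randn(B, num_ids),
    labels=torch.randint(0, num_ids, (B,)),
    anchor_emb=torch.randn(B, 256),
    positive_emb=torch.randn(B, 256),
    negative_emb=torch.randn(B, 256),
)
print(loss.item())
```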

Evaluation scripts are used to verify the accuracy of the ReID model before and after fine-tuning. Metrics such as rank-1 accuracy and mean average precision (mAP) are used to evaluate the model’s performance. Fine-tuning with synthetic data has been shown to significantly boost accuracy scores, as demonstrated by NVIDIA’s internal tests.
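For reference, a minimal sketch of the two reported metrics on a query/gallery split is shown below. It assumes L2-normalized embeddings and uses made-up toy data; real ReID evaluation scripts typically also filter out same-camera matches, which is skipped here.

```python
import numpy as np

def rank1_and_map(query_embs, query_ids, gallery_embs, gallery_ids):
    """Compute rank-1 accuracy and mean average precision (mAP) for ReID.
    Embeddings are assumed L2-normalized, so dot product = cosine similarity."""
    sims = query_embs @ gallery_embs.T        # (num_query, num_gallery)
    rank1_hits, average_precisions = [], []

    for sim_row, qid in zip(sims, query_ids):
        order = np.argsort(-sim_row)          # gallery indices, best match first
        matches = (gallery_ids[order] == qid) # boolean relevance per rank

        rank1_hits.append(bool(matches[0]))   # rank-1: is the top match correct?

        if matches.any():                     # average precision for this query
            hit_ranks = np.flatnonzero(matches)
            precisions = (np.arange(len(hit_ranks)) + 1) / (hit_ranks + 1)
            average_precisions.append(precisions.mean())

    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

# Toy usage with random embeddings and identity labels (placeholder data).
rng = np.random.default_rng(1)
q = rng.normal(size=(5, 256)); q /= np.linalg.norm(q, axis=1, keepdims=True)
g = rng.normal(size=(20, 256)); g /= np.linalg.norm(g, axis=1, keepdims=True)
r1, mAP = rank1_and_map(q, rng.integers(0, 4, 5), g, rng.integers(0, 4, 20))
print(f"rank-1: {r1:.3f}  mAP: {mAP:.3f}")
```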

Deployment and Conclusion

After fine-tuning, the ReID model can be exported to ONNX format for deployment in MTMC or RTLS applications. This workflow enables developers to enhance ReID models’ accuracy without the need for extensive labeling efforts, leveraging the flexibility of ORA and the developer-friendly TAO API.
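A minimal sketch of such an export with torch.onnx.export is shown below. The ResNet-50-based model is again only a hypothetical stand-in for the fine-tuned network, and the output file name is illustrative; TAO provides its own export tooling, so treat this as a generic illustration of the ONNX export step rather than the TAO workflow itself.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for the fine-tuned ReID network: ResNet-50 trunk + 256-dim head
# (a placeholder, not the actual TAO ReIdentificationNet).
backbone = models.resnet50(weights=None)
backbone.fc = nn.Linear(2048, 256)
model = backbone.eval()

dummy_input = torch.rand(1, 3, 256, 128)       # matches the 256 x 128 RGB crop input

torch.onnx.export(
    model,
    dummy_input,
    "reid_finetuned.onnx",                     # illustrative output file name
    input_names=["input"],
    output_names=["embedding"],
    dynamic_axes={"input": {0: "batch"}, "embedding": {0: "batch"}},
    opset_version=17,
)
print("Exported ONNX model with a 256-dim embedding output.")
```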

Image source: Shutterstock
