Jockey: Leveraging Twelve Labs APIs and LangGraph for Advanced Video Processing


Jockey, an open-source conversational video agent, has been significantly enhanced through the integration of Twelve Labs APIs and LangGraph. This combination aims to provide more intelligent and efficient video processing capabilities, according to a recent LangChain Blog post.

Overview of Twelve Labs APIs

Twelve Labs offers state-of-the-art video understanding APIs that extract rich insights and information directly from video content. These advanced video foundation models (VFMs) work natively with video, bypassing intermediary representations like pre-generated captions. This allows for a more accurate and contextual understanding of video content, including visuals, audio, on-screen text, and temporal relationships.

The APIs support various functionalities, such as video search, classification, summarization, and question answering. They can be integrated into applications for content discovery, video editing automation, interactive video FAQs, and AI-generated highlight reels. With enterprise-grade security and scalability, Twelve Labs APIs open up new possibilities for video-powered applications.
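As a rough illustration of what a video-search call looks like, the sketch below assembles a request payload for such an API. The field names, the `build_search_request` helper, and the defaults are illustrative assumptions, not the documented Twelve Labs schema:

```python
import json

# Hypothetical request builder for a video search API.
# The field names below are illustrative assumptions, not an official schema.
def build_search_request(index_id: str, query: str, modalities=None) -> dict:
    """Assemble a JSON-serializable search payload.

    `modalities` selects which signals to search over (visuals, audio,
    on-screen text); defaults to visual-only.
    """
    return {
        "index_id": index_id,
        "query_text": query,
        "search_options": modalities or ["visual"],
        "page_limit": 10,
    }

payload = build_search_request("idx-123", "goal celebrations", ["visual", "audio"])
print(json.dumps(payload, indent=2))
```

In a real integration, a payload like this would be sent to the provider's search endpoint with an API key; consult the Twelve Labs documentation for the actual parameters.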

LangGraph v0.1 and LangGraph Cloud Launch

LangChain has introduced LangGraph v0.1, a framework designed for building agentic and multi-agent applications with enhanced control and precision. Unlike its predecessor, LangChain AgentExecutor, LangGraph provides a flexible API for custom cognitive architectures, allowing developers to control code flow, prompts, and LLM calls. It also supports human-agent collaboration through a built-in persistence layer, enabling human approval before task execution and ‘time travel’ for editing and resuming agent actions.
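The persistence ideas above can be sketched with a toy checkpoint store in plain Python. This is a conceptual analogue, not the actual LangGraph API: state is snapshotted after every step, a human gate can halt execution, and an earlier checkpoint can be restored to "time travel" and resume differently:

```python
import copy

class CheckpointStore:
    """Toy persistence layer: snapshots agent state after every step so a
    run can be inspected, gated by a human, or rewound ('time travel')."""

    def __init__(self):
        self.checkpoints = []

    def save(self, state):
        self.checkpoints.append(copy.deepcopy(state))

    def rewind(self, step):
        # Return a copy of the state as it was after the given step.
        return copy.deepcopy(self.checkpoints[step])


def run_agent(steps, store, state=None, approve=lambda s: True):
    """Execute steps one by one; `approve` is the human-in-the-loop gate."""
    state = state if state is not None else {"log": []}
    for step in steps:
        if not approve(state):
            break  # a human declined; stop before executing this step
        state["log"].append(step)
        store.save(state)
    return state


store = CheckpointStore()
final = run_agent(["plan", "search", "edit"], store)
earlier = store.rewind(0)                        # state just after "plan"
resumed = run_agent(["summarize"], store, state=earlier)
```

LangGraph's checkpointer serves the same role for real agent graphs, persisting each node transition so runs can be paused for approval or replayed from any point.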

To complement this framework, LangChain has also launched LangGraph Cloud, currently in closed beta. This service provides scalable infrastructure for deploying LangGraph agents, managing horizontally-scaling servers and task queues to handle numerous concurrent users and store large states. LangGraph Cloud integrates with LangGraph Studio for visualizing and debugging agent trajectories, facilitating rapid iteration and feedback for developers.

How Jockey Leverages LangGraph and Twelve Labs APIs

Jockey, in its latest v1.1 release, now utilizes LangGraph for enhanced scalability and functionality. Originally built on LangChain, Jockey’s new architecture offers more efficient and precise control over complex video workflows. This transition marks a significant advancement, enabling better management of video processing tasks.

Jockey combines Large Language Models (LLMs) with Twelve Labs’ specialized video APIs through LangGraph’s flexible framework. The intricate network of nodes within the LangGraph UI illustrates Jockey’s decision-making process, including components like the supervisor, planner, video-editing, video-search, and video-text-generation nodes. This granular control optimizes token usage and guides node responses, resulting in more efficient video processing.

The data-flow diagram of Jockey shows how information moves through the system, from initial query input to complex video processing steps. This involves retrieving videos from the Twelve Labs APIs, segmenting content as needed, and presenting final results to the user.
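That flow — query in, retrieval, optional segmentation, results out — can be sketched as a simple pipeline. The helper functions below are toy stand-ins for the real API calls, and the clip values are invented for illustration:

```python
# Toy stand-ins for the real retrieval and editing calls.
def retrieve_clips(query: str):
    """Pretend search: return (video_id, start_sec, end_sec) hits."""
    return [("vid-1", 0.0, 12.5), ("vid-2", 30.0, 41.0)]

def segment(clip, max_len: float):
    """Split a clip into chunks no longer than max_len seconds."""
    video_id, start, end = clip
    chunks, t = [], start
    while t < end:
        chunks.append((video_id, t, min(t + max_len, end)))
        t += max_len
    return chunks

def answer(query: str, max_len: float = 10.0):
    """Retrieve, segment as needed, and package results for the user."""
    clips = retrieve_clips(query)
    segments = [c for clip in clips for c in segment(clip, max_len)]
    return {"query": query, "segments": segments}

result = answer("show the highlights")
```

In Jockey, each stage of this pipeline corresponds to a node in the LangGraph graph rather than a plain function call, which is what allows the supervisor to reorder or skip stages per query.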

Jockey Architecture Overview

Jockey’s architecture is designed to handle complex video-related tasks through a multi-agent system comprising the Supervisor, Planner, and Workers. The Supervisor acts as the central coordinator, routing tasks between nodes and managing the workflow. The Planner creates detailed plans for complex requests, while the Workers execute tasks using specialized tools like video search, text generation, and editing.
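A minimal sketch of that Supervisor/Planner/Worker loop, in plain Python standing in for the LangGraph node graph (the routing heuristic and worker names are illustrative, not Jockey's actual logic):

```python
def planner(task: str):
    """Break a request into worker-addressable steps (toy heuristic)."""
    steps = []
    if "find" in task:
        steps.append(("video-search", task))
    if "clip" in task or "edit" in task:
        steps.append(("video-editing", task))
    steps.append(("video-text-generation", task))  # always summarize
    return steps

# Workers execute one specialized capability each.
WORKERS = {
    "video-search": lambda t: f"searched for: {t}",
    "video-editing": lambda t: f"edited clips for: {t}",
    "video-text-generation": lambda t: f"summary of: {t}",
}

def supervisor(task: str):
    """Route each planned step to the matching worker, collect results."""
    return [WORKERS[name](payload) for name, payload in planner(task)]

out = supervisor("find and clip the best rally")
```

In the real system, the Supervisor is itself an LLM-driven node that decides routing dynamically, rather than following a fixed keyword heuristic like this one.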

This architecture allows Jockey to adapt dynamically to different queries, from simple text responses to complex video manipulation tasks. LangGraph’s framework helps manage the state between nodes, optimize token usage, and provide granular control over each step in the video processing workflow.

Customizing Jockey

Jockey’s modular design facilitates customization and extension. Developers can modify prompts, extend the state for more complex scenarios, or add new workers to address specific use cases. This flexibility makes Jockey a versatile foundation for building advanced video AI applications.

For example, developers can create prompts that instruct Jockey to identify specific scenes from videos without changing the core system. More substantial customizations can involve extending state management or adding new specialized workers for tasks like advanced video effects or video generation.
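Adding a new specialized worker can be as simple as registering another function alongside the existing ones. The registry pattern and names below are a sketch of the idea, not Jockey's actual extension API:

```python
# Registry of workers keyed by capability name (illustrative).
WORKERS = {
    "video-search": lambda task: f"search results for {task!r}",
}

def register_worker(name: str, fn):
    """Plug a new specialized worker into the registry."""
    WORKERS[name] = fn

# A hypothetical custom worker for a new use case: a slow-motion effect.
def slow_motion_worker(task: str) -> str:
    return f"applied slow-motion effect for {task!r}"

register_worker("video-effects", slow_motion_worker)
print(WORKERS["video-effects"]("final goal"))
```

In LangGraph terms, registering a worker corresponds to adding a node to the graph and wiring the Supervisor's routing to reach it.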

Conclusion

Jockey represents a powerful fusion of LangGraph’s agent framework and Twelve Labs’ video understanding APIs, opening new possibilities for intelligent video processing and interaction. Developers can explore Jockey’s capabilities by visiting the Jockey GitHub repository or accessing the LangGraph documentation for more details.
