AssemblyAI Unveils Enhanced PII Redaction and Entity Detection Features


Jessie A Ellis


Jul 26, 2024 05:52

AssemblyAI introduces advanced PII Redaction in 47 languages and adds 16 new entity types to its Entity Detection model, which the company says reaches 99% accuracy in major languages.


AssemblyAI has announced significant upgrades to its PII Redaction and Entity Detection features, aimed at enhancing data security and extracting key insights from audio transcripts. According to AssemblyAI, the latest updates include support for PII Text Redaction across 47 languages and the addition of 16 new entity types to its Entity Detection model, bringing the total to 44.

Enhanced PII Redaction Capabilities

The updated PII Text Redaction feature now supports 47 languages, ensuring comprehensive protection of personally identifiable information (PII) across diverse regions. This upgrade allows users to identify and remove sensitive data such as addresses, phone numbers, and credit card details from their transcripts. Additionally, users can generate transcripts with PII removed or use the tool to “beep out” sensitive information in audio files.

An example of how to use the API for PII redaction is provided by AssemblyAI:

import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Redact person names, organizations, and occupations, using the hash substitution policy.
config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Print the redacted transcript speaker by speaker, then as a single block of text.
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

print(transcript.text)

Users can refer to AssemblyAI’s documentation for more detailed examples and an in-depth dive into the updates.
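
The substitution setting in the example above controls what replaces each redacted span. As a rough local illustration of the idea only, the sketch below applies a hash-style and a label-style substitution to a plain string. The PiiSpan record, the redact function, the sample sentence, and the character offsets are all hypothetical; this is not AssemblyAI's implementation, and the exact output the service produces may differ.

```python
from dataclasses import dataclass


@dataclass
class PiiSpan:
    """Hypothetical record for one detected PII span (not an AssemblyAI type)."""
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str   # policy name, e.g. "person_name"


def redact(text: str, spans: list[PiiSpan], substitution: str = "hash") -> str:
    """Replace each span with '#' characters or with its bracketed label.

    Assumes the spans do not overlap.
    """
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(text[cursor:span.start])  # keep the text before the span
        if substitution == "hash":
            out.append("#" * (span.end - span.start))
        elif substitution == "entity_name":
            out.append(f"[{span.label.upper()}]")
        else:
            raise ValueError(f"unknown substitution: {substitution}")
        cursor = span.end
    out.append(text[cursor:])  # keep the text after the last span
    return "".join(out)


# Made-up sentence and offsets, for illustration only.
sample = "Peter Smith works at Acme as a forecaster."
spans = [
    PiiSpan(0, 11, "person_name"),
    PiiSpan(21, 25, "organization"),
    PiiSpan(31, 41, "occupation"),
]
hashed = redact(sample, spans, "hash")
labeled = redact(sample, spans, "entity_name")
print(hashed)
print(labeled)
```

A hash-style substitution hides what kind of information was removed, while a label-style substitution keeps the transcript readable for downstream analysis; which one fits depends on how the redacted text will be used.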

Expanded Entity Detection

The Entity Detection model has been upgraded with 16 new entity types, allowing for the automatic identification and categorization of critical information in transcripts. This brings the total number of supported entity types to 44, which includes names, organizations, addresses, and more. According to AssemblyAI, the model reaches 99% accuracy in major languages, making it a robust tool for extracting valuable insights from audio data.

An example of how to use the API for Entity Detection is also provided:

import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Enable Entity Detection for this transcription request.
config = aai.TranscriptionConfig(entity_detection=True)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Print each detected entity with its type and position in the audio.
for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")
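
With 44 entity types, a flat printout like the loop above can get long. One possible next step is to group the detected entities by type. The sketch below does this offline with a stand-in Entity class that carries only the four fields the loop above reads. The class, the helper functions, and the sample values are hypothetical, and treating start and end as millisecond offsets is an assumption to check against AssemblyAI's documentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in with the four fields the loop above reads; not the SDK class."""
    text: str
    entity_type: str
    start: int  # assumed: milliseconds from the start of the audio
    end: int    # assumed: milliseconds from the start of the audio


def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def group_by_type(entities):
    """Map each entity type to its (text, start time) mentions in audio order."""
    grouped = defaultdict(list)
    for entity in sorted(entities, key=lambda e: e.start):
        grouped[entity.entity_type].append((entity.text, format_ms(entity.start)))
    return dict(grouped)


# Made-up values for illustration only; not output from the sample audio.
entities = [
    Entity("Acme Corp", "organization", 62_350, 63_100),
    Entity("Toronto", "location", 2_548, 3_130),
    Entity("Jane Doe", "person_name", 60_120, 61_000),
    Entity("Lisbon", "location", 125_004, 125_900),
]
summary = group_by_type(entities)
for entity_type, mentions in summary.items():
    print(entity_type, mentions)
```

If the objects in transcript.entities expose the same four attributes, the same group_by_type helper should work on them directly, though the entity type may come back as an enum value rather than a plain string.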

Additional Resources

AssemblyAI has also shared several new blog posts and tutorials to help users get the most out of their products. Topics include using Claude 3.5 Sonnet with audio data, understanding Microsoft’s Florence-2 image model, and creating a real-time language translation service with AssemblyAI and DeepL in JavaScript.

For more information on these updates and to explore additional resources, visit AssemblyAI’s official blog.

