About¶
Audioneex is an audio content recognition (ACR) engine providing audio fingerprinting technology specifically designed for real-time applications. It is general purpose, based on content-agnostic algorithms and runs on all kinds of machines, from big servers to mobile and embedded devices.
What it is for¶
ACR systems can be used in a variety of scenarios, such as broadcast monitoring, over-the-air (OTA) identification, content synchronization, second screen, audio surveillance, etc. Audio content identification and management technology finds applications in a wide range of industries. Following are a few examples of the most common use cases:
User engagement
Copyright management
Advertisement
Piracy detection
Law enforcement
Audience metering
Surveillance & Safety
Audioneex provides the core technology for such applications in the form of a C++ cross-platform API that can be integrated as a backend component in web services, mobile and desktop apps, embedded systems and more.
Features¶
Highly efficient fingerprinting - The fingerprints generation is much faster than real-time even on low-end hardware and the resulting fingerprints are very small in size. On average, one hour of audio will be encoded in less than 1 MB (in uncompressed form).
Fast recognitions - It only requires a few seconds of audio to perform an identification (3-4 seconds on average for moderately distorted audio) making it suitable for real-time applications.
Cross-platform API - Implemented in standard C++ to guarantee the high performances that only native code can achieve, while providing portability to any platform with a modern C++ compiler (version 11 and above).
Content-agnostic recognition - The core algorithms are independent of the nature of the audio to be recognized, allowing the identification of basically any kind of content, from music, to TV and Radio shows, movies, commercials, news, and even generic sounds.
Flexible recognition system - By providing several parameters that can be set through the API, the engine allows fine-tuning of performances based on the kind of application at hand.
Mobile & IoT-ready - Its efficient algorithms (and a tiny binary of just a few hundred kB) make it suitable for devices with limited resources, such as mobile and embedded platforms, for on-device ACR.
Database-neutral - Designed to be independent of specific storage solutions, it does not lock you into a pre-determined technology and can be used with many databases by rewriting the drivers.
Architecture¶
The architecture is extremely modular, with three main interfaces that abstract
access to most of the functionality of the engine: Recognizer
, Indexer
,
DataStore
and AudioProvider
.

The Recognizer
can be considered as the front-end to all of the identification
functionality. It deals with the collection of audio data from the clients,
dispatching of the audio to the Fingerprinter
for fingerprint extraction,
dispatching of the extracted fingerprints to the Matcher
to initiate the search
of the best candidates, analysis of the results returned by the matching process
and production of the final results.
The Indexer
provides access to functionality concerning the generation
of the reference fingerprints. It deals with the collection of audio data from
client applications, initiating the fingerprinting process, processing the
resulting fingerprints into a format suitable for quick searches and storing the
data into the appropriate structures. In the context of the Audioneex engine,
all these processes collectively are referred to as “indexing”, and the outcome
is the generation of the “reference database”, which is the first step to take
before using the engine for any recognition operation.
The Datastore
interface provides an abstraction layer over the data storage
functionality by exposing a specification that clients can follow in order to
interface the engine with different data stores.
This approach allows decoupling from vendor-specific solutions, with the
consequence of providing a lot of flexibility in the choice of this crucial
component depending on the application.
For example, a web service may need to use a database based on a client-server
architecture, whereas an embedded system may require an in-process database for
on-device recognitions. Data access drivers for two popular high-performance
databases are provided out-of-the-box and can be used straight away.
The AudioProvider
interface is used to connect the engine to a source of audio
data by means of a callback mechanism. It is simply a way to create “listener”
objects to be registered with the engine so that it can get the audio to be
processed.