Ferment Engine

Perception and prediction engine.

Native media perception runtime — audio, visual, language, search — for custom pipelines. Live on mobile, deep on server.

Born backstage.

It started there, as a near-zero-latency perceptor for live shows. The stage taught machines to listen.

Signals, by family.

Audio

  • Beat, downbeat and tempo grid
  • Structure, instruments, highlights, genres
  • Speech, singing and music segments
  • Open-vocabulary stems
  • Drop, breakdown and energy prediction

Visual

  • Shot and scene boundaries
  • Face detection, identity, clustering, masking
  • Open-vocabulary object detection, tracking, masking
  • Optical flow and motion energy
  • Saliency

Language

  • Transcription
  • Understanding

Search

  • Multimodal semantic search
  • Cross-encoder reranking
  • Unified index

Proof, in production.

Everything the Cuts app does begins here: every beat found, every word timed, every scene cut. The Engine isn't a roadmap. It ships every day inside an app you can hold.

Cuts — production build

Engineering your own media pipeline?