Ferment Engine
Perception and prediction engine.
Native media perception runtime — audio, visual, language, search — for custom pipelines. Live on mobile, deep on server.
Born backstage.
It started there: a near-zero-latency perceptor for live shows. The stage taught machines to listen.
Signals, by family.
Audio
- Beat, downbeat and tempo grid
- Structure, instruments, highlights, genres
- Speech, singing and music segments
- Open-vocabulary stems
- Drop, breakdown and energy prediction
Visual
- Shot and scene boundaries
- Face detection, identity, clustering, masking
- Open-vocabulary object detection, tracking, masking
- Optical flow and motion energy
- Saliency
Language
- Transcription
- Understanding
Search
- Multimodal semantic search
- Cross-encoder reranking
- Unified index
Proof, in production.
Everything Cuts does begins here — every beat found, every word timed, every scene cut. Engine isn't a roadmap. It ships, every day, inside an app you can hold.
Cuts — production build