Vision Query Ingest
Overview
Vision Query lets you describe what you’re looking for when ingesting footage, and the engine will only keep clips that match your description. Instead of ingesting everything and sorting later, you tell the AI what matters up front.
This uses the same OpenCLIP model that powers the Driver System — your text query is encoded into a 768-dimensional CLIP vector and compared against every clip’s visual embedding using cosine similarity.
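The comparison step can be illustrated with a minimal sketch. It assumes the query and clip embeddings are already available as plain lists of floats; the 4-dimensional vectors, the clip names, and the `cosine_similarity` helper are hypothetical stand-ins for illustration, not the engine's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as "no match"
    return dot / (norm_a * norm_b)

# Toy 4-dimensional stand-ins for the 768-dimensional embeddings.
query_vec = [0.9, 0.1, 0.0, 0.4]
clip_vecs = {
    "clip_001": [0.8, 0.2, 0.1, 0.5],  # points in nearly the same direction
    "clip_002": [0.0, 0.9, 0.4, 0.0],  # points in a very different direction
}

# One similarity score per clip; higher means a closer visual match.
scores = {name: cosine_similarity(query_vec, vec) for name, vec in clip_vecs.items()}
```

Because cosine similarity depends only on direction, not vector length, a clip scores highly when its embedding points the same way as the query embedding.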
How to Use
- Open the 📥 INGEST dialog from the Video Bin toolbar
- Select your footage folder
- In the Vision Query section, type a description of the clips you want to keep, for example:
  - `"two men fighting"`
  - `"sunset over water"`
  - `"close-up of face"`
  - `"car drifting on a track"`
- Adjust the Strictness slider (optional)
- Click RUN INGEST
The engine will analyze all clips as usual, but at the end of each video’s ingest, it compares every clip’s embedding against your query and drops the ones that don’t match.
Strictness Slider
The Strictness slider controls how closely clips must match your description:
| Slider Value | Behavior |
|---|---|
| 0.10 (leftmost) | Very loose — keeps most clips that are even vaguely related |
| 0.26 (default) | Balanced — keeps clips with clear visual relevance |
| 0.40 (rightmost) | Very strict — only keeps clips that strongly match |
How It Works
Under the hood, Vision Query uses a two-stage filtering approach:
- Adaptive threshold: The engine computes a per-batch cutoff from the distribution of similarity scores for your query (mean + 0.5 × standard deviation across all clips), so clips are selected relative to the rest of the batch.
- Hard floor: Your Strictness slider value acts as an absolute minimum — clips below this score are never included, regardless of the adaptive threshold.
This means the filter adapts to your footage. If many clips match your query well, the adaptive threshold rises and keeps only the best. If few clips match, it still finds the closest ones (as long as they meet the hard floor).
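The two stages can be sketched as follows. This is a rough illustration under stated assumptions, not the engine's implementation: it assumes a clip must pass both stages (so the effective cutoff is the larger of the adaptive threshold and the hard floor) and that the standard deviation is the population form. The `select_clips` function and the sample scores are hypothetical.

```python
import statistics

def select_clips(scores, strictness=0.26, k=0.5):
    """Return (kept clip names, effective cutoff).

    scores: mapping of clip name to similarity score for the query.
    Assumption: a clip is kept only if it clears both the adaptive
    threshold (mean + k * std) and the Strictness hard floor.
    """
    values = list(scores.values())
    if not values:
        return [], strictness
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    adaptive = mean + k * std
    cutoff = max(adaptive, strictness)
    kept = [name for name, score in scores.items() if score >= cutoff]
    return kept, cutoff

# Hypothetical similarity scores for five clips.
sample_scores = {"a": 0.31, "b": 0.28, "c": 0.12, "d": 0.10, "e": 0.09}

kept_default, cutoff_default = select_clips(sample_scores)                # hard floor 0.26 dominates
kept_loose, cutoff_loose = select_clips(sample_scores, strictness=0.10)   # adaptive threshold dominates
kept_strict, cutoff_strict = select_clips(sample_scores, strictness=0.30)
```

In this sample the adaptive threshold is about 0.23, so lowering the slider to 0.10 does not admit the weakly matching clips, while raising it to 0.30 drops clip `b`.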
Example Workflow
Scenario: You have 10 hours of nature documentary footage and want to build a library of only the ocean shots.
- Click 📥 INGEST → select your footage folder
- Vision Query: `"ocean waves, underwater, beach, coral reef"`
- Strictness: `0.22` (slightly loose to catch variety)
- Run ingest
Result: Instead of 3,000 clips covering forests, mountains, and oceans, your library contains only the ~400 clips that visually match ocean/water content. Ready for editing immediately.
Combining with Collections
Vision Query and Collections serve different filtering purposes:
- Vision Query filters at ingest time — clips that don’t match never enter your library
- Collections filter at render time — all clips are in the library, but renders use only clips from a specific collection
You can combine both: ingest with a vision query to build a focused library, then further organize with collections for different projects.