Skill

SkillsContent & Creative › Video & audio

videodb

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.

Freerisk: medium
videodbwebhookpythonsquareffmpeg

Tools: read grep glob bash(python:*)

The full skill

— name: videodb description: See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture. origin: ECC allowed-tools: Read Grep Glob Bash(python:*) argument-hint: "[task description]" — # VideoDB Skill **Perception + memory + actions for video, live streams, and desktop sessions.** ## When to use ### Desktop Perception – Start/stop a **desktop session** capturing **screen, mic, and system audio** – Stream **live context** and store **episodic session memory** – Run **real-time alerts/triggers** on what's spoken and what's happening on screen – Produce **session summaries**, a searchable timeline, and **playable evidence links** ### Video ingest + stream – Ingest a **file or URL** and return a **playable web stream link** – Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio** ### Index + search (timestamps + evidence) – Build **visual**, **spoken**, and **keyword** indexes – Search and return exact moments with **timestamps** and **playable evidence** – Auto-create **clips** from search results ### Timeline editing + generation – Subtitles: **generate**, **translate**, **burn-in** – Overlays: **text/image/branding**, motion captions – Audio: **background music**, **voiceover**, **dubbing** – Programmatic composition and exports via **timeline operations** ### Live streams (RTSP) + monitoring – Connect **RTSP/live feeds** – Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows ## How it works ### Common inputs – Local **file path**, public **URL**, or **RTSP URL** – Desktop capture request: **start / stop / summarize session** – Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules ### Common outputs – **Stream URL** – Search results with **timestamps** and **evidence links** – Generated assets: subtitles, audio, images, clips – **Event/alert payloads** for live streams – Desktop **session summaries** and memory entries ### Running Python code Before running any VideoDB code, change to the project directory and load environment variables: “`python from dotenv import load_dotenv load_dotenv(".env") import videodb conn = videodb.connect() “` This reads `VIDEO_DB_API_KEY` from: 1. Environment (if already exported) 2. Project's `.env` file in current directory If the key is missing, `videodb.connect()` raises `AuthenticationError` automatically. Do NOT write a script file when a short inline command works. When writing inline Python (`python -c "…"`), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead: “`bash python << 'EOF' from dotenv import load_dotenv load_dotenv(".env") import videodb conn = videodb.connect() coll = conn.get_collection() print(f"Videos: {len(coll.get_videos())}") EOF “` ### Setup When the user asks to "setup videodb" or similar: ### 1. Install SDK “`bash pip install "videodb[capture]" python-dotenv “` If `videodb[capture]` fails on Linux, install without the capture extra: “`bash pip install videodb python-dotenv “` ### 2. Configure API key The user must set `VIDEO_DB_API_KEY` using **either** method: – **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key` – **Project `.env` file**: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file Get a free API key at [console.videodb.io](https://console.videodb.io) (50 free uploads, no credit card). **Do NOT** read, write, or handle the API key yourself. Always let the user set it. ### Quick Reference ### Upload media “`python # URL video = coll.upload(url="https://example.com/video.mp4") # YouTube video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID") # Local file video = coll.upload(file_path="/path/to/video.mp4") “` ### Transcript + subtitle “`python # force=True skips the error if the video is already indexed video.index_spoken_words(force=True) text = video.get_transcript_text() stream_url = video.add_subtitle() “` ### Search inside videos “`python from videodb.exceptions import InvalidRequestError video.index_spoken_words(force=True) # search() raises InvalidRequestError when no results are found. # Always wrap in try/except and treat "No results found" as empty. try: results = video.search("product demo") shots = results.get_shots() stream_url = results.compile() except InvalidRequestError as e: if "No results found" in str(e): shots = [] else: raise “` ### Scene search “`python import re from videodb import SearchType, IndexType, SceneExtractionType from videodb.exceptions import InvalidRequestError # index_scenes() has no force parameter — it raises an error if a scene # index already exists. Extract the existing index ID from the error. try: scene_index_id = video.index_scenes( extraction_type=SceneExtractionType.shot_based, prompt="Describe the visual content in this scene.", ) except Exception as e: match = re.search(r"id\s+([a-f0-9]+)", str(e)) if match: scene_index_id = match.group(1) else: raise # Use score_threshold to filter low-relevance noise (recommended: 0.3+) try: results = video.search( query="person writing on a whiteboard", search_type=SearchType.semantic, index_type=IndexType.scene, scene_index_id=scene_index_id, score_threshold=0.3, ) shots = results.get_shots() stream_url = results.compile() except InvalidRequestError as e: if "No results found" in str(e): shots = [] else: raise “` ### Timeline editing **Important:** Always validate timestamps before building a timeline: – `start` must be >= 0 (negative values are silently accepted but produce broken output) – `start` must be < `end` – `end` must be <= `video.length` “`python from videodb.timeline import Timeline from videodb.asset import VideoAsset, TextAsset, TextStyle timeline = Timeline(conn) timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30)) timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36))) stream_url = timeline.generate_stream() “` ### Transcode video (resolution / quality change) “`python from videodb import TranscodeMode, VideoConfig, AudioConfig # Change resolution, quality, or aspect ratio server-side job_id = conn.transcode( source="https://example.com/video.mp4", callback_url="https://example.com/webhook", mode=TranscodeMode.economy, video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"), audio_config=AudioConfig(mute=False), ) “` ### Reframe aspect ratio (for social platforms) **Warning:** `reframe()` is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices: – Always limit to a short segment using `start`/`end` when possible – For full-length videos, use `callback_url` for async processing – Trim the video on a `Timeline` first, then reframe the shorter result “`python from videodb import ReframeMode # Always prefer reframing a short segment: reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart) # Async reframe for full-length videos (returns None, result via webhook): video.reframe(target="vertical", callback_url="https://example.com/webhook") # Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9) reframed = video.reframe(start=0, end=60, target="square") # Custom dimensions reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720}) “` ### Generative media “`python image = coll.generate_image( prompt="a sunset over mountains", aspect_ratio="16:9", ) “` ## Error handling “`python from videodb.exceptions import AuthenticationError, InvalidRequestError try: conn = videodb.connect() except AuthenticationError: print("Check your VIDEO_DB_API_KEY") try: video = coll.upload(url="https://example.com/video.mp4") except InvalidRequestError as e: print(f"Upload failed: {e}") “` ### Common pitfalls | Scenario | Error message | Solution | |———-|————–|———-| | Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed | | Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` | | Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) | | Reframe times out | Blocks indefinitely on long videos | Use `start`/`end` to limit segment, or pass `callback_url` for async | | Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` | | `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits | ## Examples ### Canonical prompts – "Start desktop capture and alert when a password field appears." – "Record my session and produce an actionable summary when it ends." – "Ingest this file and return a playable stream link." – "Index this folder and find every scene with people, return timestamps." – "Generate subtitles, burn them in, and add light background music." – "Connect this RTSP URL and alert when a person enters the zone." ### Screen Recording (Desktop Capture) Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only. #### Quick Start 1. **Choose state dir**: `STATE_DIR="${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}"` 2. **Start listener**: `VIDEODB_EVENTS_DIR="$STATE_DIR" python scripts/ws_listener.py –clear "$STATE_DIR" &` 3. **Get WebSocket ID**: `cat "$STATE_DIR/videodb_ws_id"` 4. **Run capture code** (see reference/capture.md for the full workflow) 5. **Events written to**: `$STATE_DIR/videodb_events.jsonl` Use `–clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session. #### Query Events “`python import json import os import time from pathlib import Path events_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb")) events_file = events_dir / "videodb_events.jsonl" events = [] if events_file.exists(): with events_file.open(encoding="utf-8") as handle: for line in handle: try: events.append(json.loads(line)) except json.JSONDecodeError: continue transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"] cutoff = time.time() – 300 recent_visual = [ e for e in events if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff ] “` ## Additional docs Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed. – [reference/api-reference.md](reference/api-reference.md) – Complete VideoDB Python SDK API reference – [reference/search.md](reference/search.md) – In-depth guide to video search (spoken word and scene-based) – [reference/editor.md](reference/editor.md) – Timeline editing, assets, and composition – [reference/streaming.md](reference/streaming.md) – HLS streaming and instant playback – [reference/generative.md](reference/generative.md) – AI-powered media generation (images, video, audio) – [reference/rtstream.md](reference/rtstream.md) – Live stream ingestion workflow (RTSP/RTMP) – [reference/rtstream-reference.md](reference/rtstream-reference.md) – RTStream SDK methods and AI pipelines – [reference/capture.md](reference/capture.md) – Desktop capture workflow – [reference/capture-reference.md](reference/capture-reference.md) – Capture SDK and WebSocket events – [reference/use-cases.md](reference/use-cases.md) – Common video processing patterns and examples **Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing). ### When to use what | Problem | VideoDB solution | |———|—————–| | Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` | | Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` | | Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` | | Need to overlay audio/music on video | `AudioAsset` on a `Timeline` | | Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` | | Need to combine/trim clips | `VideoAsset` on a `Timeline` | | Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` | ## Provenance Reference material for this skill is vendored locally under `skills/videodb/reference/`. Use the local copies above instead of following external repository links at runtime.