add-tracing
Add OpenTelemetry tracing spans to Clojure code following Metabase tracing conventions. Use when instrumenting backend code with trace coverage.
---
name: add-tracing
description: Add OpenTelemetry tracing spans to Clojure code following Metabase tracing conventions. Use when instrumenting backend code with trace coverage.
---
# Add Tracing Spans to Clojure Code
This skill helps you add OpenTelemetry (OTel) tracing spans to the Metabase backend codebase using the custom `tracing/with-span` macro.
## Reference Files
- `src/metabase/tracing/core.clj` - `with-span` macro, group registry, SDK lifecycle, `best-effort-sanitize-sql`, Pyroscope integration
- `src/metabase/task/impl.clj` - `defjob` macro that wraps Quartz jobs with root spans
- `.clj-kondo/config/modules/config.edn` - Module boundary configuration
## Module Architecture
The tracing module has a deliberately minimal API surface. **Only 2 namespaces are public** (listed in `:api` in the module config):
| Namespace | Role | Status |
|---|---|---|
| `tracing.core` | Primary API: `with-span`, groups, SDK lifecycle, Pyroscope, MDC, `best-effort-sanitize-sql` | **Public API** |
| `tracing.init` | Side-effect loader for `quartz` and `settings` | **Public API** (init convention) |
| `tracing.attributes` | `best-effort-sanitize-sql` implementation (re-exported via `tracing.core`) | Internal |
| `tracing.settings` | Setting definitions (`MB_TRACING_*` env vars) | Internal |
| `tracing.quartz` | Quartz JDBC proxy + JobListener | Internal |
**Rules:**
- Only require `[metabase.tracing.core :as tracing]` from outside the module. `tracing/best-effort-sanitize-sql` and all other public functions are available from this single namespace.
- Do not add new API namespaces. Add new public functions to `tracing.core` instead.
- Do not require internal namespaces (`tracing.attributes`, `tracing.settings`, `tracing.quartz`) from outside the module.
- `:uses :any` on the `core` module does NOT bypass the target module's `:api` check; internal namespaces are still enforced.
### Cyclic Dependency Avoidance
`tracing/core.clj` is required by many modules across the codebase. It **must NOT** compile-time require `tracing.settings`, as this creates transitive cyclic load dependencies (e.g., `settings/core -> tracing/settings -> tracing/core -> events/impl -> events/core`).
Instead, `tracing/core.clj` uses `requiring-resolve` for settings access:
```clojure
;; CORRECT: lazy runtime resolution, no compile-time dependency
((requiring-resolve 'metabase.tracing.settings/tracing-enabled))
;; WRONG: creates cyclic load dependency
(require '[metabase.tracing.settings :as settings])
(settings/tracing-enabled)
```
External library namespaces (clj-otel API, SDK, exporters) are safe to require normally — they don't participate in Metabase namespace cycles.
**Important:** `requiring-resolve` must use **literal quoted symbols**. Kondo hooks validate that `required-namespaces` are all simple symbols, so dynamic construction fails:
```clojure
;; CORRECT: literal quoted symbol
(requiring-resolve 'metabase.tracing.settings/tracing-endpoint)
;; WRONG: kondo hook rejects this: "Assert failed: (every? simple-symbol? required-namespaces)"
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-endpoint"))
```
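To see the lazy-resolution pattern in isolation, the sketch below applies `requiring-resolve` to a namespace that ships with Clojure. `lazy-upper-case` is a hypothetical helper, not part of Metabase; `clojure.string` merely stands in for `metabase.tracing.settings` so the snippet runs without the Metabase codebase.

```clojure
;; Hypothetical helper: clojure.string stands in for metabase.tracing.settings.
;; The target namespace is loaded (if needed) when the function is called,
;; not when this file is compiled, so no compile-time dependency is created.
(defn lazy-upper-case
  [s]
  ;; The argument to requiring-resolve is a literal quoted symbol.
  ((requiring-resolve 'clojure.string/upper-case) s))

(lazy-upper-case "tracing")
```

The outer pair of parentheses calls the resolved var, which is the same shape as the `tracing-enabled` call shown earlier in this section.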
## Quick Checklist
When adding tracing spans:
- [ ] Module has `tracing` in its `:uses` set in `.clj-kondo/config/modules/config.edn`
- [ ] Added `[metabase.tracing.core :as tracing]` to ns requires (alphabetically sorted)
- [ ] Span wraps a meaningful I/O boundary (not pure computation)
- [ ] Group matches the domain (check `src/metabase/tracing/core.clj` for registered groups; add a new one if none fit)
- [ ] Span name follows dot-notation convention (`"domain.subsystem.operation"`)
- [ ] Attributes use namespaced keywords (`:search/query-length`, `:db/id`)
- [ ] No sensitive data in attributes (use `best-effort-sanitize-sql` for HoneySQL, never raw SQL)
- [ ] No new tracing namespaces created (add to `tracing.core` instead)
- [ ] No `DO_NOT_ADD_NEW_FILES_HERE.txt` violations in the target directory
- [ ] Run `clj-kondo --lint <files>` to verify 0 errors, 0 warnings
- [ ] Add or update tests in the corresponding `test/` path (see "5. Add tests" below)
- [ ] Run tests: `clojure -X:dev:test :only <test-ns>`
## The `with-span` Macro
```clojure
(tracing/with-span group span-name attrs & body)
```
- **group** - A keyword selecting which trace group this span belongs to (e.g., `:tasks`, `:sync`)
- **span-name** - A string identifying the span in traces (e.g., `"search.execute"`)
- **attrs** - A map of span attributes (e.g., `{:db/id 42}`)
- **body** - The code to execute inside the span
**When disabled:** near-zero overhead. The only cost is a single atom deref plus a boolean check; the body then runs directly.
**When enabled:** creates OTel span AND injects `trace_id`/`span_id` into Log4j2 MDC for log-to-trace correlation.
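The enabled/disabled behavior can be illustrated with a small stand-in macro. This is a sketch under stated assumptions, not the Metabase implementation: `enabled-groups`, `recorded-spans`, `run-in-span`, and `with-span-sketch` are all hypothetical names, and the "span" here is just a map appended to an atom, where the real macro creates an OTel span and populates the MDC.

```clojure
;; All names below are hypothetical stand-ins, not Metabase APIs.
(def enabled-groups (atom #{}))   ; set of group keywords that are switched on
(def recorded-spans (atom []))    ; stands in for an OTel exporter

(defn run-in-span
  "Runs thunk and records a span map, even if thunk throws."
  [span-name attrs thunk]
  (let [start (System/nanoTime)]
    (try
      (thunk)
      (finally
        (swap! recorded-spans conj
               {:name        span-name
                :attrs       attrs
                :duration-ns (- (System/nanoTime) start)})))))

(defmacro with-span-sketch
  "Disabled path: one atom deref and a set lookup, then the body runs directly.
   Enabled path: the body is wrapped in a thunk and timed."
  [group span-name attrs & body]
  `(if (contains? @enabled-groups ~group)
     (run-in-span ~span-name ~attrs (fn [] ~@body))
     (do ~@body)))

;; Group is off: the body runs untouched and nothing is recorded.
(with-span-sketch :search "search.execute" {} (+ 1 2))

;; Group is on: same return value, plus one recorded span.
(reset! enabled-groups #{:search})
(with-span-sketch :search "search.execute" {:search/query-length 5} (+ 1 2))
```

Because the check is emitted by the macro, the attribute map in this sketch is only evaluated on the enabled path; whether the real `with-span` behaves the same way should be confirmed in `src/metabase/tracing/core.clj`.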
## Trace Groups
Groups are registered in `src/metabase/tracing/core.clj`. Check that file for the current list. The general rule: **match the group to the domain, not the call site.** If code runs inside a Quartz job but is logically search work, use `:search`, not `:tasks`.
To add a new group:
```clojure
;; In src/metabase/tracing/core.clj
(register-group! :my-domain "Description of what this covers")
```
Users enable groups via `MB_TRACING_GROUPS=tasks,search,sync` (comma-separated, or `"all"`).
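As a rough illustration of how a comma-separated group list might be turned into a set of keywords, consider the sketch below. `parse-groups` is a hypothetical function, not the Metabase parser; trimming whitespace, lower-casing, and silently dropping unregistered names are all assumptions made for the example.

```clojure
(require '[clojure.string :as str])

;; Hypothetical parser for a value such as "tasks,search,sync" or "all".
;; `registered` is the set of known group keywords.
(defn parse-groups
  [s registered]
  (let [tokens (->> (str/split (or s "") #",")
                    (map (comp str/lower-case str/trim))
                    (remove str/blank?))]
    (if (some #{"all"} tokens)
      ;; "all" anywhere in the list enables every registered group
      (set registered)
      ;; otherwise keep only names that match a registered group (assumption)
      (into #{} (comp (map keyword) (filter (set registered))) tokens))))

(parse-groups "tasks, search" #{:tasks :search :sync})
(parse-groups "all" #{:tasks :search :sync})
```

The sketch returns an empty set for a missing value; the documented default for `MB_TRACING_GROUPS` is `all`, so a real parser would need to handle that case separately.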
## Naming Conventions
### Span Names
Use dot-separated hierarchical names: `"domain.subsystem.operation"`. The domain prefix should match the group name:
```
search.execute               - `:search` group
sync.fingerprint.table       - `:sync` group
task.session-cleanup.delete  - `:tasks` group
db-app.collection-items      - `:db-app` group
```
### Attributes
Use namespaced keywords. The namespace groups related attributes:
```clojure
:db/id                ;; Database ID (integer)
:db/engine            ;; Database engine name (string)
:db/statement         ;; Sanitized SQL (string, via best-effort-sanitize-sql)
:search/engine        ;; Search engine name (string)
:search/query-length  ;; Query string length (integer)
:sync/table           ;; Table name (string)
:sync/step            ;; Sync step name (string)
:task/name            ;; Task name (string)
:http/method          ;; HTTP method (string)
:http/url             ;; Request URL (string)
```
Invent new namespaced attributes as needed (e.g., `:pulse/id`, `:transform/count`). Keep values as primitives (strings, numbers, booleans) — no maps or collections.
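The two attribute rules (namespaced keys, primitive values) are easy to check mechanically. The helper below is a hypothetical sketch for use in a REPL or a test; `attr-problems` is not part of `tracing.core`.

```clojure
;; Hypothetical checker: returns a seq of [key problem] pairs, empty when
;; every key is a namespaced keyword and every value is a primitive.
(defn attr-problems
  [attrs]
  (for [[k v] attrs
        :let  [problem (cond
                         (not (qualified-keyword? k))
                         "key is not a namespaced keyword"

                         (not (or (string? v) (number? v) (boolean? v)))
                         "value is not a string, number, or boolean")]
        :when problem]
    [k problem]))

(attr-problems {:db/id 42 :db/engine "postgres"})   ; no problems
(attr-problems {:id 42 :sync/tables ["a" "b"]})     ; both entries flagged
```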
## Step-by-Step: Adding a Span
### 1. Check module boundaries
Look up the module for your namespace in `.clj-kondo/config/modules/config.edn`. If `tracing` is not in the module's `:uses` set, add it (keep alphabetically sorted):
```edn
my-module
{:team "MyTeam"
 :uses #{analytics config tracing util}}
```
### 2. Add the require
```clojure
(ns metabase.my-module.thing
  (:require
   [metabase.tracing.core :as tracing]
   [metabase.util :as u]))
```
`best-effort-sanitize-sql` is available from `tracing.core` — no additional require needed.
### 3. Identify the I/O boundary
Only wrap code at **meaningful I/O boundaries**:
**DO trace:**
- External API calls (embedding APIs, metabot, webhooks)
- Database queries (both app DB and user DB)
- Network requests (HTTP calls to external services)
- Heavy batch processing (batch indexing, batch embedding)
- Top-level orchestration functions that coordinate multiple sub-operations
**DO NOT trace:**
- Pure computation (sorting, filtering, mapping)
- Simple single-row lookups (`t2/select-one :model/Setting :key k`)
- Every function in a call chain (only boundaries matter)
- Trivial operations (string formatting, hash calculations)
### 4. Wrap with `with-span`
```clojure
;; Simple span (no attributes needed)
(tracing/with-span :search "search.init-index" {}
  (do-expensive-thing))
;; Span with static attributes
(tracing/with-span :sync "sync.fingerprint.table"
  {:db/id      (:db_id table)
   :sync/table (:name table)}
  (fingerprint-fields! table fields))
;; Span with computed attributes
(tracing/with-span :search "search.execute"
  {:search/engine       (name (:search-engine ctx))
   :search/query-length (count (:search-string ctx))}
  (search.engine/results ctx))
;; Span with sanitized SQL (for dynamic HoneySQL queries)
(let [hsql {:delete-from [(t2/table-name :model/Session)]
            :where       [:< :created_at oldest-allowed]}]
  (tracing/with-span :tasks "task.session-cleanup.delete"
    {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))
;; Sub-spans breaking a function into I/O phases
(let [embedding (tracing/with-span :search "search.semantic.embedding"
                  {:search.semantic/provider (:provider model)}
                  (get-embedding model search-string))
      results   (tracing/with-span :search "search.semantic.db-query" {}
                  (into [] xform reducible))]
  (process results))
;; Per-item iteration spans
(doseq [e (search.engine/active-engines)]
  (tracing/with-span :search "search.ingestion.update" {:search/engine (name e)}
    (search.engine/update! e batch)))
```
### 5. Add tests
Create or update tests in the corresponding `test/` path. Follow the patterns in existing tracing tests:
- **Reference tests:** `test/metabase/tracing/quartz_test.clj`, `test/metabase/server/middleware/trace_test.clj`
- Use `tracing/init-enabled-groups!` / `tracing/shutdown-groups!` with `try`/`finally` to manage group lifecycle
- Test both enabled and disabled paths (verify that the body still runs, with no span wrapping, when the group is off)
- Use `reify` mocks for Java interfaces (Connection, PreparedStatement, JobListener, etc.)
- Add `(set! *warn-on-reflection* true)` and type-hint proxy/reify calls to avoid reflection warnings
```clojure
(deftest my-span-enabled-test
  (testing "when group is enabled, span is created"
    (try
      (tracing/init-enabled-groups! "my-group" "INFO")
      ;; … test that span behavior occurs …
      (finally
        (tracing/shutdown-groups!)))))
(deftest my-span-disabled-test
  (testing "when group is disabled, code runs without tracing"
    (tracing/shutdown-groups!)
    ;; … test that code still works, no wrapping applied …
    ))
```
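For the `reify` mock guidance, a minimal sketch of a type-hinted mock might look like the following. `mock-connection` is a hypothetical helper written for this example; it implements only `close` and `isClosed`, so calling any other `java.sql.Connection` method on it would throw `AbstractMethodError`.

```clojure
;; Hypothetical test helper: a partial java.sql.Connection mock.
;; The return type hint lets callers use interop without reflection.
(defn mock-connection
  ^java.sql.Connection [closed?]
  (reify java.sql.Connection
    (close [_] (reset! closed? true))   ; record that close was called
    (isClosed [_] @closed?)))

(let [closed? (atom false)
      conn    (mock-connection closed?)]
  (.close conn)
  (.isClosed conn))
```

In a real test file, `(set! *warn-on-reflection* true)` would sit directly under the `ns` form so that any missing hints surface as warnings when the file is loaded.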
### 6. Lint and run tests
```bash
# Lint modified source and test files; expect 0 errors, 0 warnings
clj-kondo --lint path/to/modified/file.clj path/to/test/file.clj
# Run tests (requires Java 21+)
clojure -X:dev:test :only my-ns.test-ns
```
Expect: all tests pass, 0 failures, 0 errors, no reflection warnings from your files.
## Sanitizing SQL for Attributes
When including SQL in span attributes, **always** use `tracing/best-effort-sanitize-sql`. This converts HoneySQL maps to parameterized SQL strings in which values become `?` placeholders, so literal values stay out of the trace.
```clojure
(let [hsql {:delete-from [:core_session]
            :where       [:< :created_at some-timestamp]}]
  (tracing/with-span :tasks "task.cleanup.delete"
    {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))
;; Trace attribute: db/statement = "DELETE FROM core_session WHERE created_at < ?"
```
**Rules:**
- Never put raw SQL strings or user-provided values in attributes
- Use `best-effort-sanitize-sql` only for app DB (HoneySQL) queries
- For external/user DB queries, trace only timing and counts, not SQL content
## Defjob and Root Spans
The `defjob` macro in `metabase.task.impl` automatically wraps every Quartz job with a `:tasks` root span:
```clojure
(task/defjob ^{DisallowConcurrentExecution true} SessionCleanup [_]
  (cleanup-sessions!))
;; Automatically creates span: "task.SessionCleanup" {:task/name "SessionCleanup"}
```
You do NOT need a root span inside `defjob` bodies. Add **child spans** for I/O inside the job.
For code on plain `Thread`s (not Quartz), add the root span manually:
```clojure
(defn init! []
  (tracing/with-span :search "search.task.init" {}
    (search/init-index!)))
```
## What NOT to Do
### Span Usage Mistakes
```clojure
;; WRONG: pure computation, no I/O
(tracing/with-span :search "search.format-results" {}
  (map format-result results))
;; WRONG: trivial single-row lookup
(tracing/with-span :db-app "db-app.get-setting" {}
  (t2/select-one :model/Setting :key "my-setting"))
;; WRONG: raw SQL in attributes (data leak)
(tracing/with-span :tasks "task.cleanup" {:db/statement raw-sql-string}
  (execute! raw-sql-string))
;; WRONG: wrong group (search work should use :search, not :tasks)
(tracing/with-span :tasks "search.execute" {} …)
;; WRONG: redundant nesting (do-search already has a span)
(tracing/with-span :search "search.process" {}
  (let [results (do-search ctx)]
    (tracing/with-span :search "search.format" {}
      (format-results results))))
```
### Architecture Mistakes
```clojure
;; WRONG: creating a new tracing namespace
(ns metabase.tracing.my-feature …)
;; WRONG: requiring internal tracing namespaces from outside the module
(ns metabase.my-module.thing
  (:require [metabase.tracing.attributes :as trace-attrs]     ;; internal!
            [metabase.tracing.settings :as tracing.settings]  ;; internal!
            [metabase.tracing.quartz :as tracing.quartz]))    ;; internal!
;; WRONG: adding a compile-time require of tracing.settings to tracing/core.clj
;; This creates cyclic load dependencies because tracing/core is widely required
(ns metabase.tracing.core
  (:require [metabase.tracing.settings :as settings])) ;; causes cycle!
;; WRONG: dynamic symbol construction with requiring-resolve (kondo rejects it)
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-enabled"))
```
## Configuration
All settings are env-var-only (defined in `src/metabase/tracing/settings.clj`):
```bash
# Core
MB_TRACING_ENABLED=true              # Enable tracing (default: false)
MB_TRACING_ENDPOINT=host:4317        # OTLP collector endpoint (default: http://localhost:4317)
MB_TRACING_GROUPS=tasks,search,sync  # Comma-separated groups or "all" (default: all)
MB_TRACING_SERVICE_NAME=metabase     # Service name in traces (default: hostname)
MB_TRACING_LOG_LEVEL=DEBUG           # Log threshold for traced threads: TRACE/DEBUG/INFO (default: INFO)
# Batch span processor tuning
MB_TRACING_MAX_QUEUE_SIZE=2048       # Max spans queued for export; drops when full (default: 2048)
MB_TRACING_EXPORT_TIMEOUT_MS=10000   # Max wait for batch export to complete (default: 10000)
MB_TRACING_SCHEDULE_DELAY_MS=5000    # Delay between consecutive batch exports (default: 5000)
```
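To make the defaults in the tuning block concrete, the sketch below resolves the three batch-processor values from an environment map. It is illustrative only: `tuning-defaults` and `tuning-config` are hypothetical names, and Metabase itself reads these values through the setting definitions in `src/metabase/tracing/settings.clj`, not through code like this.

```clojure
;; Defaults copied from the configuration block above.
(def tuning-defaults
  {"MB_TRACING_MAX_QUEUE_SIZE"    2048
   "MB_TRACING_EXPORT_TIMEOUT_MS" 10000
   "MB_TRACING_SCHEDULE_DELAY_MS" 5000})

;; Hypothetical resolver: `env` is any map of variable name to string value,
;; for example (System/getenv). Missing or empty values fall back to defaults.
(defn tuning-config
  [env]
  (into {}
        (map (fn [[var-name default]]
               (let [^String raw (get env var-name)]
                 [var-name (if (seq raw) (Long/parseLong raw) default)])))
        tuning-defaults))

(tuning-config {"MB_TRACING_MAX_QUEUE_SIZE" "4096"})
```

Passing the environment in as a map keeps the function testable; a non-numeric value would throw `NumberFormatException` in this sketch.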