Skills › Data & Databases › Data engineering & pipelines
bloblang-authoring
This skill should be used when users need to create or debug Bloblang transformation scripts. Trigger when users ask about transforming data, mapping fields, parsing JSON/CSV/XML, converting timestamps, filtering arrays, or mention "bloblang", "blobl", "mapping processor", or describe any data transformation need like "convert this to that" or "transform my JSON".
The full skill
—
name: bloblang-authoring
description: This skill should be used when users need to create or debug Bloblang transformation scripts. Trigger when users ask about transforming data, mapping fields, parsing JSON/CSV/XML, converting timestamps, filtering arrays, or mention "bloblang", "blobl", "mapping processor", or describe any data transformation need like "convert this to that" or "transform my JSON".
—
# Redpanda Connect Bloblang Script Generator
Create working, tested Bloblang transformation scripts from natural language descriptions.
## Objective
Generate a Bloblang (blobl) script that correctly transforms the user's input data according to their requirements.
The script MUST be tested before presenting it.
## Setup
This skill requires `rpk` `rpk connect`, `python3`, and `jq`.
See the [SETUP](SETUP.md) for installation instructions.
## Tools
### Script format-bloblang.sh
Generates category-organized Bloblang reference files in XML format.
**Run once at the start of each session** before searching for functions/methods.
“`bash
# Usage:
./resources/scripts/format-bloblang.sh
“`
– No arguments
– Generates category files organized by type (e.g., `functions-General.xml`, `methods-String_Manipulation.xml`)
– Outputs generated files to a versioned directory
– Outputs the directory path to stdout (capture in `BLOBLREF_DIR` variable for later use)
– Each XML file contains structured function/method definitions with parameters, descriptions, and examples
#### Functions
Generated function files have `functions-<Category>.xml` names and contain functions relevant to that category.
– `functions-Encoding.xml` – Schema registry headers
– `functions-Environment.xml` – Environment vars, files, timestamps, hostname
– `functions-Fake_Data_Generation.xml` – Fake data generation
– `functions-General.xml` – Bytes, counter, deleted, ksuid, nanoid, uuid, random, range, snowflake
– `functions-Message_Info.xml` – Batch index, content, error, metadata, span links, tracing IDs
– etc.
**The `function` XML tag format:**
– `name` attribute – function name
– `params` attribute – comma-separated list of parameters with types, format `<name>:<type>` or empty string if no parameters
– body – description of function purpose and usage
– `example` XML subtag
– `summary` attribute (optional) – brief description of the example
– body – code block demonstrating usage
Example function definition:
“`xml
<function name="random_int" params="seed:query expression, min:integer, max:integer">
Generates a pseudo-random non-negative 64-bit integer.
Use this for creating random IDs, sampling data, or generating test values.
Provide a seed for reproducible randomness, or use a dynamic seed like `timestamp_unix_nano()` for unique values per mapping instance.
Optional `min` and `max` parameters constrain the output range (both inclusive).
For dynamic ranges based on message data, use the modulo operator instead: `random_int() % dynamic_max + dynamic_min`.
<example>
root.first = random_int()
root.second = random_int(1)
root.third = random_int(max:20)
root.fourth = random_int(min:10, max:20)
root.fifth = random_int(timestamp_unix_nano(), 5, 20)
root.sixth = random_int(seed:timestamp_unix_nano(), max:20)
</example>
<example summary="Use a dynamic seed for unique random values per mapping instance.">
root.random_id = random_int(timestamp_unix_nano())
root.sample_percent = random_int(seed: timestamp_unix_nano(), min: 0, max: 100)
</example>
</function>
“`
#### Methods
Generated method files have `methods-<Category>.xml` names and contain methods relevant to that category.
– `methods-Encoding_and_Encryption.xml` – Base64, compression, hashing, encryption
– `methods-General.xml` – Basic operations, type checking
– `methods-GeoIP.xml` – GeoIP lookups
– `methods-JSON_Web_Tokens.xml` – JWT operations
– `methods-Number_Manipulation.xml` – Arithmetic, rounding, formatting
– `methods-Object___Array_Manipulation.xml` – Filtering, mapping, sorting, merging
– `methods-Parsing.xml` – JSON, CSV, XML, protocol buffer parsing
– `methods-Regular_Expressions.xml` – Regex matching and replacement
– `methods-SQL.xml` – SQL operations
– `methods-String_Manipulation.xml` – Case, trimming, splitting, formatting
– `methods-Timestamp_Manipulation.xml` – Parsing, formatting, timezone conversion
– `methods-Type_Coercion.xml` – Type conversions
– etc.
**The `method` XML tag format:**
– `name` attribute – function name
– `params` attribute – comma-separated list of parameters with types, format `<name>:<type>` or empty string if no parameters
– body – description of function purpose and usage
– `example` XML subtag
– `summary` attribute (optional) – brief description of the example
– body – code block demonstrating usage
Example method definition:
“`xml
<method name="ts_format" params="format:string, tz:string">
Formats a timestamp into a string using the specified format layout.
<example>
root.formatted = this.timestamp.ts_format("2006-01-02T15:04:05Z07:00")
</example>
</method>
“`
### Grep Search
Lists Available functions and methods without loading full files.
“`bash
# List all available functions and methods by name
grep -hE '<(function|method) name=' "$BLOBLREF_DIR"
# Search by keyword (searches names, descriptions, params, examples)
grep -i "timestamp" "$BLOBLREF_DIR"
# Search by parameter name (e.g., find all with "format" parameter)
grep 'params="[^"]*format' "$BLOBLREF_DIR"
“`
– Requires `BLOBLREF_DIR` set to the directory output by `format-bloblang.sh`
### Script test-blobl.sh
Tests a Bloblang script against input data.
Executes the transformation and returns results or errors.
Can be run repeatedly during iteration.
“`bash
# Usage:
./resources/scripts/test-blobl.sh <target-directory>
“`
– Requires `data.json` (input) and `script.blobl` (transformation) in the target directory
– Returns transformed data or error messages
## Bloblang
**Bloblang** (blobl) is Redpanda Connect's native mapping language for transforming message data.
It's designed for readability and safely reshaping documents of any structure.
### Core Concepts
**Assignment**: Create new documents by assigning values to paths.
– `root` = the new document being created
– `this` = the input document being read
“`bloblang
# Copy entire input
root = this
# Create specific fields
root.id = this.thing.id
root.type = "processed"
# In: {"thing":{"id":"abc123"}}
# Out: {"id":"abc123","type":"processed"}
“`
**Field Paths**: Use dot notation for nested fields. Use quotes for special characters:
“`bloblang
root.user.name = this.customer.full_name
root."foo.bar".baz = this."field with spaces"
“`
**Literals**: Numbers, booleans, strings, null, arrays, and objects:
“`bloblang
root = {
"count": 42,
"active": true,
"items": ["a", "b", "c"],
"nested": {"key": "value"}
}
“`
### Functions and Methods
**Functions** generate values (no target needed):
“`bloblang
root.id = uuid_v4()
root.timestamp = now()
root.hostname = hostname()
“`
**Methods** transform values (called on a target with `.`):
“`bloblang
root.upper = this.name.uppercase()
root.formatted = this.date.ts_parse("2006-01-02").ts_format("Mon Jan 2")
root.sorted = this.items.sort()
“`
Methods can be chained:
“`bloblang
root.clean = this.text.trim().lowercase().replace_all("_", "-")
“`
Methods require a target (called with `.`), while functions do not.
Check the XML reference files to determine correct usage:
“`bloblang
# Bad: floor() is a method, not a function
root.rounded = floor(this.value) # Error: floor is not a function
# Good: Call floor() as a method on a value
root.rounded = this.value.floor()
# Bad: uuid_v4() is a function, not a method
root.id = this.uuid_v4() # Error: uuid_v4 is not a method
# Good: Call uuid_v4() as a function
root.id = uuid_v4()
“`
**Discovering Available Functions & Methods**
Bloblang provides hundreds of functions and methods organized into categories.
Start with these **foundational categories** that cover common use cases:
– `functions-General.xml` – Core utility functions (uuid_v4, timestamp, random, etc.)
– `functions-Message_Info.xml` – Message metadata access (hostname, env, content_type, etc.)
– `methods-General.xml` – Universal transformations (type conversions, existence checks, etc.)
For specialized needs, consult **domain-specific categories**: strings (uppercase, trim, regexp), timestamps (ts_parse, ts_format), arrays (map_each, filter), objects (keys, values), encoding (base64, json), and more.
**Discovery tools**:
– Run `format-bloblang.sh` to generate category-organized XML reference files in a versioned directory
– Use grep patterns to search function/method names, descriptions, parameters, and examples across categories
– Read specific category XML files for structured definitions with complete function signatures, parameter details, and usage examples
### Control Flow
**Conditionals** (if/else):
“`bloblang
root.category = if this.score >= 80 {
"high"
} else if this.score >= 50 {
"medium"
} else {
"low"
}
“`
**Pattern Matching** (match):
“`bloblang
root.sound = match this.animal {
"cat" => "meow"
"dog" => "woof"
"cow" => "moo"
_ => "unknown" # Catch-all
}
“`
**Coalescing** (try multiple paths with `|`):
“`bloblang
# Use first non-null value from alternative fields
root.content = this.article.body | this.comment.text | "no content"
# Try different nested paths
root.id = this.data.(primary_id | secondary_id | backup_id)
“`
Note: Use `|` for alternative field paths (missing fields), use `.catch()` for operation failures (parse errors, type mismatches).
### Common Operations
**Deletion**:
“`bloblang
root = this
root.password = deleted() # Remove field
# Or filter entire message
root = if this.spam { deleted() }
“`
**Variables** (reuse values without adding to output):
“`bloblang
let user_id = this.user.id
let enriched = this.user.name + " (" + $user_id + ")"
root.display_name = $enriched
root.user_id = $user_id
“`
**IMPORTANT**: Variables must be declared at the top level, not inside `if`, `match`, or other blocks.
“`bloblang
# Bad: Will cause "expected }" parse error
root.age = if this.birthdate != null {
let parsed = this.birthdate.ts_parse("2006-01-02") # let not allowed here!
$parsed.ts_unix()
}
# Good: Declare variables at top level
let parsed = this.birthdate.ts_parse("2006-01-02").catch(null)
root.age = if $parsed != null {
$parsed.ts_unix()
} else {
null
}
“`
**Named mappings**: (reusable scripts)
“`bloblang
map extract_user {
root.id = this.user_id
root.name = this.full_name
root.email = this.contact.email
}
root.customer = this.customer_data.apply("extract_user")
root.vendor = this.vendor_data.apply("extract_user")
“`
**Error Handling** (provide fallback values):
“`bloblang
# Catch errors from any point in the chain
root.count = this.items.length().catch(0)
root.parsed = this.data.parse_json().catch({})
# Catch missing/null values
root.name = this.user.name.or("anonymous")
# Multi-format parsing with catch chains
# Store value in variable for reliable access in catch fallbacks
let date_str = this.date
root.parsed = $date_str.ts_parse("2006-01-02").catch(
$date_str.ts_parse("2006/01/02")
).catch(null)
“`
**IMPORTANT**: When using `.catch()` with fallback expressions that reference `this.field`, store the field in a variable first.
Context references in catch chains can be unreliable:
“`bloblang
# Risky: Context may not be preserved in catch
root.parsed = this.date.ts_parse("2006-01-02").catch(
this.date.ts_parse("2006/01/02") # this.date might not work here
)
# Safe: Store in variable first
let date_str = this.date
root.parsed = $date_str.ts_parse("2006-01-02").catch(
$date_str.ts_parse("2006/01/02") # variable reference is reliable
)
“`
**Metadata**:
“`bloblang
# Read metadata with @ or metadata()
root.topic = @kafka_topic
root.partition = @kafka_partition
# Set metadata
meta output_key = this.id
meta content_type = "application/json"
“`
### Common Edge Case Patterns
**Safe field access with fallbacks**
“`bloblang
# Bad: Will fail if user or name is missing
root.name = this.user.name
# Good: Provides fallback chain
root.name = this.user.name.or("anonymous")
root.name = this.(user.name | profile.display_name | "unknown")
“`
**Safe collection operations**
“`bloblang
# Bad: Will fail on empty array
root.first = this.items[0]
# Good: Handles empty arrays
root.first = if this.items.length() > 0 { this.items[0] } else { null }
root.first = this.items[0].catch(null)
“`
**Safe parsing with error recovery**
“`bloblang
# Bad: Will fail on invalid JSON
root.data = this.payload.parse_json()
# Good: Provides fallback on parse failure
root.data = this.payload.parse_json().catch({})
root.data = this.payload.parse_json().catch(this.payload) # Keep original on failure
“`
**Safe type coercion**
“`bloblang
# Bad: Assumes field is already a string
root.id = this.user_id.uppercase()
# Good: Converts to string first
root.id = this.user_id.string().uppercase()
root.count = this.total.number().catch(0)
“`
**IMPORTANT**: Arithmetic operations on null values fail silently.
Always check for null or use `.catch()` to provide fallbacks:
“`bloblang
# Bad: Fails silently if price is null
root.total = this.price * this.quantity
# Good: Check for null before operations
root.total = if this.price != null && this.quantity != null {
this.price * this.quantity
} else {
null
}
# Also good: Use catch to handle null gracefully
root.total = (this.price * this.quantity).catch(null)
“`
## Workflow
1. **Understand** – Analyze input structure, desired output, and required transformations
– **Ambiguous requirements**: If transformation goal is unclear, ask clarifying questions before proceeding (e.g., "Should missing fields be omitted or set to null?", "How should arrays with mixed types be handled?")
– **Missing sample data**: If user doesn't provide input example, request it explicitly – never proceed with assumptions
– **Complex multistep transformations**: Break down into logical phases (parse â transform â filter â format) and confirm approach with user
2. **Discover** – Generate category files to versioned directory (capture `BLOBLREF_DIR` from script output), identify relevant categories, read specific category XML files to find actual Bloblang functions/methods (NEVER guess)
3. **Develop** – Write valid Bloblang syntax using discovered functions (root for output, this for input, chain methods, handle nulls)
4. **Validate** – Test script with sample input data, verify output matches expectations, iterate on errors until working
– **Test edge cases**: Missing fields, null values, invalid formats, empty collections
– **Iterate**: Fix syntax errors first (variable placement, method chains), then logic errors
5. **Deliver** – Write the working script and example input to files (`script.blobl`, `data.json`), present the tested output, document any assumptions
**Critical: Never present untested code. All scripts must be validated before showing to user.**