bloblang-authoring

This skill should be used when users need to create or debug Bloblang transformation scripts. Trigger when users ask about transforming data, mapping fields, parsing JSON/CSV/XML, converting timestamps, filtering arrays, or mention "bloblang", "blobl", "mapping processor", or describe any data transformation need like "convert this to that" or "transform my JSON".


The full skill

---
name: bloblang-authoring
description: This skill should be used when users need to create or debug Bloblang transformation scripts. Trigger when users ask about transforming data, mapping fields, parsing JSON/CSV/XML, converting timestamps, filtering arrays, or mention "bloblang", "blobl", "mapping processor", or describe any data transformation need like "convert this to that" or "transform my JSON".
---

# Redpanda Connect Bloblang Script Generator

Create working, tested Bloblang transformation scripts from natural language descriptions.

## Objective

Generate a Bloblang (blobl) script that correctly transforms the user's input data according to their requirements. The script MUST be tested before presenting it.

## Setup

This skill requires `rpk`, `rpk connect`, `python3`, and `jq`. See the [SETUP](SETUP.md) for installation instructions.

## Tools

### Script format-bloblang.sh

Generates category-organized Bloblang reference files in XML format. **Run once at the start of each session** before searching for functions/methods.

```bash
# Usage:
./resources/scripts/format-bloblang.sh
```

- No arguments
- Generates category files organized by type (e.g., `functions-General.xml`, `methods-String_Manipulation.xml`)
- Outputs generated files to a versioned directory
- Outputs the directory path to stdout (capture in `BLOBLREF_DIR` variable for later use)
- Each XML file contains structured function/method definitions with parameters, descriptions, and examples

#### Functions

Generated function files have `functions-<Category>.xml` names and contain functions relevant to that category.

- `functions-Encoding.xml` - Schema registry headers
- `functions-Environment.xml` - Environment vars, files, timestamps, hostname
- `functions-Fake_Data_Generation.xml` - Fake data generation
- `functions-General.xml` - Bytes, counter, deleted, ksuid, nanoid, uuid, random, range, snowflake
- `functions-Message_Info.xml` - Batch index, content, error, metadata, span links, tracing IDs
- etc.
**The `function` XML tag format:**

- `name` attribute - function name
- `params` attribute - comma-separated list of parameters with types, format `<name>:<type>` or empty string if no parameters
- body - description of function purpose and usage
- `example` XML subtag
  - `summary` attribute (optional) - brief description of the example
  - body - code block demonstrating usage

Example function definition:

```xml
<function name="random_int" params="seed:query expression, min:integer, max:integer">
Generates a pseudo-random non-negative 64-bit integer. Use this for creating random IDs, sampling data, or generating test values. Provide a seed for reproducible randomness, or use a dynamic seed like `timestamp_unix_nano()` for unique values per mapping instance.

Optional `min` and `max` parameters constrain the output range (both inclusive). For dynamic ranges based on message data, use the modulo operator instead: `random_int() % dynamic_max + dynamic_min`.
<example>
root.first = random_int()
root.second = random_int(1)
root.third = random_int(max:20)
root.fourth = random_int(min:10, max:20)
root.fifth = random_int(timestamp_unix_nano(), 5, 20)
root.sixth = random_int(seed:timestamp_unix_nano(), max:20)
</example>
<example summary="Use a dynamic seed for unique random values per mapping instance.">
root.random_id = random_int(timestamp_unix_nano())
root.sample_percent = random_int(seed: timestamp_unix_nano(), min: 0, max: 100)
</example>
</function>
```

#### Methods

Generated method files have `methods-<Category>.xml` names and contain methods relevant to that category.
- `methods-Encoding_and_Encryption.xml` - Base64, compression, hashing, encryption
- `methods-General.xml` - Basic operations, type checking
- `methods-GeoIP.xml` - GeoIP lookups
- `methods-JSON_Web_Tokens.xml` - JWT operations
- `methods-Number_Manipulation.xml` - Arithmetic, rounding, formatting
- `methods-Object___Array_Manipulation.xml` - Filtering, mapping, sorting, merging
- `methods-Parsing.xml` - JSON, CSV, XML, protocol buffer parsing
- `methods-Regular_Expressions.xml` - Regex matching and replacement
- `methods-SQL.xml` - SQL operations
- `methods-String_Manipulation.xml` - Case, trimming, splitting, formatting
- `methods-Timestamp_Manipulation.xml` - Parsing, formatting, timezone conversion
- `methods-Type_Coercion.xml` - Type conversions
- etc.

**The `method` XML tag format:**

- `name` attribute - method name
- `params` attribute - comma-separated list of parameters with types, format `<name>:<type>` or empty string if no parameters
- body - description of method purpose and usage
- `example` XML subtag
  - `summary` attribute (optional) - brief description of the example
  - body - code block demonstrating usage

Example method definition:

```xml
<method name="ts_format" params="format:string, tz:string">
Formats a timestamp into a string using the specified format layout.
<example>
root.formatted = this.timestamp.ts_format("2006-01-02T15:04:05Z07:00")
</example>
</method>
```

### Grep Search

Lists available functions and methods without loading full files.

```bash
# -r is required because $BLOBLREF_DIR is a directory, not a file

# List all available functions and methods by name
grep -rhE '<(function|method) name=' "$BLOBLREF_DIR"

# Search by keyword (searches names, descriptions, params, examples)
grep -ri "timestamp" "$BLOBLREF_DIR"

# Search by parameter name (e.g., find all with "format" parameter)
grep -r 'params="[^"]*format' "$BLOBLREF_DIR"
```

- Requires `BLOBLREF_DIR` set to the directory output by `format-bloblang.sh`

### Script test-blobl.sh

Tests a Bloblang script against input data.
Executes the transformation and returns results or errors. Can be run repeatedly during iteration.

```bash
# Usage:
./resources/scripts/test-blobl.sh <target-directory>
```

- Requires `data.json` (input) and `script.blobl` (transformation) in the target directory
- Returns transformed data or error messages

## Bloblang

**Bloblang** (blobl) is Redpanda Connect's native mapping language for transforming message data. It's designed for readability and safely reshaping documents of any structure.

### Core Concepts

**Assignment**: Create new documents by assigning values to paths.

- `root` = the new document being created
- `this` = the input document being read

```bloblang
# Copy entire input
root = this

# Create specific fields
root.id = this.thing.id
root.type = "processed"

# In:  {"thing":{"id":"abc123"}}
# Out: {"id":"abc123","type":"processed"}
```

**Field Paths**: Use dot notation for nested fields. Use quotes for special characters:

```bloblang
root.user.name = this.customer.full_name
root."foo.bar".baz = this."field with spaces"
```

**Literals**: Numbers, booleans, strings, null, arrays, and objects:

```bloblang
root = {
  "count": 42,
  "active": true,
  "items": ["a", "b", "c"],
  "nested": {"key": "value"}
}
```

### Functions and Methods

**Functions** generate values (no target needed):

```bloblang
root.id = uuid_v4()
root.timestamp = now()
root.hostname = hostname()
```

**Methods** transform values (called on a target with `.`):

```bloblang
root.upper = this.name.uppercase()
root.formatted = this.date.ts_parse("2006-01-02").ts_format("Mon Jan 2")
root.sorted = this.items.sort()
```

Methods can be chained:

```bloblang
root.clean = this.text.trim().lowercase().replace_all("_", "-")
```

Methods require a target (called with `.`), while functions do not.
Check the XML reference files to determine correct usage:

```bloblang
# Bad: floor() is a method, not a function
root.rounded = floor(this.value)  # Error: floor is not a function

# Good: Call floor() as a method on a value
root.rounded = this.value.floor()

# Bad: uuid_v4() is a function, not a method
root.id = this.uuid_v4()  # Error: uuid_v4 is not a method

# Good: Call uuid_v4() as a function
root.id = uuid_v4()
```

**Discovering Available Functions & Methods**

Bloblang provides hundreds of functions and methods organized into categories. Start with these **foundational categories** that cover common use cases:

- `functions-General.xml` - Core utility functions (uuid_v4, random_int, counter, deleted, etc.)
- `functions-Message_Info.xml` - Message-level information (content, metadata, error, batch_index, etc.)
- `methods-General.xml` - Universal transformations (type conversions, existence checks, etc.)

For specialized needs, consult **domain-specific categories**: strings (uppercase, trim, regexp), timestamps (ts_parse, ts_format), arrays (map_each, filter), objects (keys, values), encoding (base64, json), and more.
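The function-versus-method check described above can be sketched with grep. This is a minimal illustration, not part of the skill's scripts: `blobl_kind` is a hypothetical helper, and the temporary directory with two hand-written sample files stands in for the real directory printed by `format-bloblang.sh`.

```bash
# Stand-in for the directory printed by format-bloblang.sh (sample data only).
BLOBLREF_DIR="$(mktemp -d)"

# Two abbreviated sample definitions in the documented tag format.
cat > "$BLOBLREF_DIR/functions-General.xml" <<'EOF'
<function name="uuid_v4" params="">
Sample description for illustration.
</function>
EOF

cat > "$BLOBLREF_DIR/methods-Number_Manipulation.xml" <<'EOF'
<method name="floor" params="">
Sample description for illustration.
</method>
EOF

# blobl_kind NAME: print "function" or "method" depending on which tag
# defines NAME. Hypothetical helper; prints nothing if NAME is not found.
blobl_kind() {
  grep -rhoE "<(function|method) name=\"$1\"" "$BLOBLREF_DIR" \
    | sed -E 's/^<(function|method) .*/\1/' \
    | sort -u
}

blobl_kind floor     # prints: method
blobl_kind uuid_v4   # prints: function
```

A name reported as `method` must be called on a target (`this.value.floor()`), while a name reported as `function` is called bare (`uuid_v4()`).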
**Discovery tools**:

- Run `format-bloblang.sh` to generate category-organized XML reference files in a versioned directory
- Use grep patterns to search function/method names, descriptions, parameters, and examples across categories
- Read specific category XML files for structured definitions with complete function signatures, parameter details, and usage examples

### Control Flow

**Conditionals** (if/else):

```bloblang
root.category = if this.score >= 80 {
  "high"
} else if this.score >= 50 {
  "medium"
} else {
  "low"
}
```

**Pattern Matching** (match):

```bloblang
root.sound = match this.animal {
  "cat" => "meow"
  "dog" => "woof"
  "cow" => "moo"
  _ => "unknown"  # Catch-all
}
```

**Coalescing** (try multiple paths with `|`):

```bloblang
# Use first non-null value from alternative fields
root.content = this.article.body | this.comment.text | "no content"

# Try different nested paths
root.id = this.data.(primary_id | secondary_id | backup_id)
```

Note: Use `|` for alternative field paths (missing fields), use `.catch()` for operation failures (parse errors, type mismatches).

### Common Operations

**Deletion**:

```bloblang
root = this
root.password = deleted()  # Remove field

# Or filter entire message
root = if this.spam { deleted() }
```

**Variables** (reuse values without adding to output):

```bloblang
let user_id = this.user.id
let enriched = this.user.name + " (" + $user_id + ")"
root.display_name = $enriched
root.user_id = $user_id
```

**IMPORTANT**: Variables must be declared at the top level, not inside `if`, `match`, or other blocks.

```bloblang
# Bad: Will cause "expected }" parse error
root.age = if this.birthdate != null {
  let parsed = this.birthdate.ts_parse("2006-01-02")  # let not allowed here!
  $parsed.ts_unix()
}

# Good: Declare variables at top level
let parsed = this.birthdate.ts_parse("2006-01-02").catch(null)
root.age = if $parsed != null { $parsed.ts_unix() } else { null }
```

**Named mappings** (reusable scripts):

```bloblang
map extract_user {
  root.id = this.user_id
  root.name = this.full_name
  root.email = this.contact.email
}

root.customer = this.customer_data.apply("extract_user")
root.vendor = this.vendor_data.apply("extract_user")
```

**Error Handling** (provide fallback values):

```bloblang
# Catch errors from any point in the chain
root.count = this.items.length().catch(0)
root.parsed = this.data.parse_json().catch({})

# Catch missing/null values
root.name = this.user.name.or("anonymous")

# Multi-format parsing with catch chains
# Store value in variable for reliable access in catch fallbacks
let date_str = this.date
root.parsed = $date_str.ts_parse("2006-01-02").catch(
  $date_str.ts_parse("2006/01/02")
).catch(null)
```

**IMPORTANT**: When using `.catch()` with fallback expressions that reference `this.field`, store the field in a variable first.
Context references in catch chains can be unreliable:

```bloblang
# Risky: Context may not be preserved in catch
root.parsed = this.date.ts_parse("2006-01-02").catch(
  this.date.ts_parse("2006/01/02")  # this.date might not work here
)

# Safe: Store in variable first
let date_str = this.date
root.parsed = $date_str.ts_parse("2006-01-02").catch(
  $date_str.ts_parse("2006/01/02")  # variable reference is reliable
)
```

**Metadata**:

```bloblang
# Read metadata with @ or metadata()
root.topic = @kafka_topic
root.partition = @kafka_partition

# Set metadata
meta output_key = this.id
meta content_type = "application/json"
```

### Common Edge Case Patterns

**Safe field access with fallbacks**

```bloblang
# Bad: Will fail if user or name is missing
root.name = this.user.name

# Good: Provides fallback chain
root.name = this.user.name.or("anonymous")
root.name = this.(user.name | profile.display_name | "unknown")
```

**Safe collection operations**

```bloblang
# Bad: Will fail on empty array
root.first = this.items[0]

# Good: Handles empty arrays
root.first = if this.items.length() > 0 { this.items[0] } else { null }
root.first = this.items[0].catch(null)
```

**Safe parsing with error recovery**

```bloblang
# Bad: Will fail on invalid JSON
root.data = this.payload.parse_json()

# Good: Provides fallback on parse failure
root.data = this.payload.parse_json().catch({})
root.data = this.payload.parse_json().catch(this.payload)  # Keep original on failure
```

**Safe type coercion**

```bloblang
# Bad: Assumes field is already a string
root.id = this.user_id.uppercase()

# Good: Converts to string first
root.id = this.user_id.string().uppercase()
root.count = this.total.number().catch(0)
```

**IMPORTANT**: Arithmetic operations on null values fail silently.
Always check for null or use `.catch()` to provide fallbacks:

```bloblang
# Bad: Fails silently if price is null
root.total = this.price * this.quantity

# Good: Check for null before operations
root.total = if this.price != null && this.quantity != null {
  this.price * this.quantity
} else {
  null
}

# Also good: Use catch to handle null gracefully
root.total = (this.price * this.quantity).catch(null)
```

## Workflow

1. **Understand** - Analyze input structure, desired output, and required transformations
   - **Ambiguous requirements**: If transformation goal is unclear, ask clarifying questions before proceeding (e.g., "Should missing fields be omitted or set to null?", "How should arrays with mixed types be handled?")
   - **Missing sample data**: If user doesn't provide input example, request it explicitly - never proceed with assumptions
   - **Complex multistep transformations**: Break down into logical phases (parse → transform → filter → format) and confirm approach with user
2. **Discover** - Generate category files to versioned directory (capture `BLOBLREF_DIR` from script output), identify relevant categories, read specific category XML files to find actual Bloblang functions/methods (NEVER guess)
3. **Develop** - Write valid Bloblang syntax using discovered functions (root for output, this for input, chain methods, handle nulls)
4. **Validate** - Test script with sample input data, verify output matches expectations, iterate on errors until working
   - **Test edge cases**: Missing fields, null values, invalid formats, empty collections
   - **Iterate**: Fix syntax errors first (variable placement, method chains), then logic errors
5. **Deliver** - Write the working script and example input to files (`script.blobl`, `data.json`), present the tested output, document any assumptions

**Critical: Never present untested code. All scripts must be validated before showing to user.**
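The Validate and Deliver steps can be sketched as a short shell session. This is an illustration under stated assumptions rather than a prescribed procedure: the scratch directory, the `workdir` and `test_script` variable names, and the guard around the script path are choices made here; the sample input and mapping are taken from the Core Concepts example above.

```bash
# Scratch directory holding the two files test-blobl.sh expects.
workdir="$(mktemp -d)"

# Sample input document.
cat > "$workdir/data.json" <<'EOF'
{"thing":{"id":"abc123"}}
EOF

# Mapping under test.
cat > "$workdir/script.blobl" <<'EOF'
root.id = this.thing.id
root.type = "processed"
EOF

# Assumes the current directory is the skill root; adjust the path otherwise.
test_script="./resources/scripts/test-blobl.sh"

if [ -x "$test_script" ]; then
  # Prints the transformed data or an error message.
  "$test_script" "$workdir"
else
  echo "test-blobl.sh not found; inputs prepared in $workdir" >&2
fi
```

Rerunning the script after each change to `script.blobl`, and swapping in edge-case inputs (missing fields, nulls, empty arrays) for `data.json`, covers the iteration loop described in step 4.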