DNAnexus Integration
"DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."
Tools: dxpy
The full skill
---
name: dnanexus-integration
description: "DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."
---
# DNAnexus Integration
## Overview
DNAnexus is a cloud platform for biomedical data analysis and genomics. This skill covers building and deploying apps/applets, managing data objects, running workflows, and using the dxpy Python SDK for genomics pipeline development and execution.
## When to Use This Skill
This skill should be used when:
– Creating, building, or modifying DNAnexus apps/applets
– Uploading, downloading, searching, or organizing files and records
– Running analyses, monitoring jobs, creating workflows
– Writing scripts using dxpy to interact with the platform
– Setting up dxapp.json, managing dependencies, using Docker
– Processing FASTQ, BAM, VCF, or other bioinformatics files
– Managing projects, permissions, or platform resources
## Core Capabilities
The skill is organized into five main areas, each with detailed reference documentation:
### 1. App Development
**Purpose**: Create executable programs (apps/applets) that run on the DNAnexus platform.
**Key Operations**:
– Generate app skeleton with `dx-app-wizard`
– Write Python or Bash apps with proper entry points
– Handle input/output data objects
– Deploy with `dx build` or `dx build --app`
– Test apps on the platform
**Common Use Cases**:
– Bioinformatics pipelines (alignment, variant calling)
– Data processing workflows
– Quality control and filtering
– Format conversion tools
**Reference**: See `references/app-development.md` for:
– Complete app structure and patterns
– Python entry point decorators
– Input/output handling with dxpy
– Development best practices
– Common issues and solutions
### 2. Data Operations
**Purpose**: Manage files, records, and other data objects on the platform.
**Key Operations**:
– Upload/download files with `dxpy.upload_local_file()` and `dxpy.download_dxfile()`
– Create and manage records with metadata
– Search for data objects by name, properties, or type
– Clone data between projects
– Manage project folders and permissions
**Common Use Cases**:
– Uploading sequencing data (FASTQ files)
– Organizing analysis results
– Searching for specific samples or experiments
– Backing up data across projects
– Managing reference genomes and annotations
**Reference**: See `references/data-operations.md` for:
– Complete file and record operations
– Data object lifecycle (open/closed states)
– Search and discovery patterns
– Project management
– Batch operations
### 3. Job Execution
**Purpose**: Run analyses, monitor execution, and orchestrate workflows.
**Key Operations**:
– Launch jobs with `applet.run()` or `app.run()`
– Monitor job status and logs
– Create subjobs for parallel processing
– Build and run multi-step workflows
– Chain jobs with output references
**Common Use Cases**:
– Running genomics analyses on sequencing data
– Parallel processing of multiple samples
– Multi-step analysis pipelines
– Monitoring long-running computations
– Debugging failed jobs
**Reference**: See `references/job-execution.md` for:
– Complete job lifecycle and states
– Workflow creation and orchestration
– Parallel execution patterns
– Job monitoring and debugging
– Resource management
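Chaining jobs with output references, listed under Key Operations in this section, relies on a placeholder link that the platform resolves once the upstream job finishes. The sketch below builds that placeholder by hand to show its shape. The helper name `job_output_ref`, the `bam` input name, and the IDs are hypothetical; in real code, `DXJob.get_output_ref()` from dxpy should produce the same structure.

```python
def job_output_ref(job_id, field):
    """Build a job-based object reference to one output field of a job."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


# Input for a hypothetical downstream step that consumes the
# "aligned_reads" output of an upstream alignment job.
downstream_input = {"bam": job_output_ref("job-xxxx", "aligned_reads")}
print(downstream_input)
```

Passing such a dict as the input of a second `run()` call lets the downstream job be queued before the upstream one has finished.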
### 4. Python SDK (dxpy)
**Purpose**: Programmatic access to the DNAnexus platform through Python.
**Key Operations**:
– Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
– Use high-level functions for common tasks
– Make direct API calls for advanced operations
– Create links and references between objects
– Search and discover platform resources
**Common Use Cases**:
– Automation scripts for data management
– Custom analysis pipelines
– Batch processing workflows
– Integration with external tools
– Data migration and organization
**Reference**: See `references/python-sdk.md` for:
– Complete dxpy class reference
– High-level utility functions
– API method documentation
– Error handling patterns
– Common code patterns
### 5. Configuration and Dependencies
**Purpose**: Configure app metadata and manage dependencies.
**Key Operations**:
– Write dxapp.json with inputs, outputs, and run specs
– Install system packages (execDepends)
– Bundle custom tools and resources
– Use assets for shared dependencies
– Integrate Docker containers
– Configure instance types and timeouts
**Common Use Cases**:
– Defining app input/output specifications
– Installing bioinformatics tools (samtools, bwa, etc.)
– Managing Python package dependencies
– Using Docker images for complex environments
– Selecting computational resources
**Reference**: See `references/configuration.md` for:
– Complete dxapp.json specification
– Dependency management strategies
– Docker integration patterns
– Regional and resource configuration
– Example configurations
## Quick Start Examples
### Upload and Analyze Data
```python
import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")
```
### Search and Download Files
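```python
import dxpy

# Find BAM files from a specific experiment
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    name_mode="glob",  # without this, the name is matched exactly
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file under its platform name
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    dxpy.download_dxfile(file_obj.get_id(), file_obj.describe()["name"])
```
# DNAnexus App Development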
## Overview
Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.
## Applets vs Apps
– **Applets**: Data objects that live inside projects. Good for development and testing.
– **Apps**: Versioned, shareable executables that don't live inside projects. Can be published for others to use.
Both are created identically until the final build step. Applets can be converted to apps later.
## Creating an App/Applet
### Using dx-app-wizard
Generate a skeleton app directory structure:
```bash
dx-app-wizard
```
This creates:
– `dxapp.json` – Configuration file
– `src/` – Source code directory
– `resources/` – Bundled dependencies
– `test/` – Test files
### Building and Deploying
Build an applet:
```bash
dx build
```
Build an app:
```bash
dx build --app
```
The build process:
1. Validates dxapp.json configuration
2. Bundles source code and resources
3. Deploys to the platform
4. Returns the applet/app ID
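When builds are driven from a script, it can help to assemble the command in one place. The following is a minimal sketch: the helper name `dx_build_command` is hypothetical, and it only constructs the argument list for the two commands shown in this section without running them.

```python
import shlex


def dx_build_command(app_dir=".", as_app=False):
    """Return the argv list for building an applet (default) or an app."""
    cmd = ["dx", "build"]
    if as_app:
        cmd.append("--app")
    cmd.append(app_dir)
    return cmd


print(shlex.join(dx_build_command("my-app")))
print(shlex.join(dx_build_command("my-app", as_app=True)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `dx` client is installed and logged in.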
## App Directory Structure
```
my-app/
├── dxapp.json          # Metadata and configuration
├── src/
│   ├── my-app.py       # Main executable (Python)
│   └── my-app.sh       # Or Bash script
├── resources/          # Bundled files and dependencies
│   ├── tools/
│   └── data/
└── test/               # Test data and scripts
    └── test.json
```
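For experimenting with this layout without the wizard, the structure can be reproduced locally. This is only an illustrative sketch: `create_skeleton` is a hypothetical helper, the files it writes are empty placeholders, and it is not a substitute for `dx-app-wizard`, which also fills in `dxapp.json` and the source template.

```python
from pathlib import Path
import tempfile


def create_skeleton(root, name="my-app"):
    """Create the app directory layout under `root` and return its path."""
    app_dir = Path(root) / name
    for sub in ("src", "resources/tools", "resources/data", "test"):
        (app_dir / sub).mkdir(parents=True, exist_ok=True)
    (app_dir / "dxapp.json").write_text("{}\n")   # placeholder config
    (app_dir / "src" / f"{name}.py").touch()      # placeholder entry script
    (app_dir / "test" / "test.json").touch()      # placeholder test input
    return app_dir


with tempfile.TemporaryDirectory() as tmp:
    app = create_skeleton(tmp)
    for path in sorted(p.relative_to(app).as_posix() for p in app.rglob("*")):
        print(path)
```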
## Python App Structure
### Entry Points
Python apps use the `@dxpy.entry_point()` decorator to define functions:
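```python
import dxpy


@dxpy.entry_point('main')
def main(input1, input2):
    # Process inputs (placeholder logic; replace with real processing)
    result1 = input1
    result2 = input2

    # Return outputs keyed by the names declared in outputSpec
    return {
        "output1": result1,
        "output2": result2
    }


dxpy.run()
```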
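One way to picture what the decorator does is as a registry that maps entry point names to functions, with the runner looking up the requested name and calling it with the job input as keyword arguments. The sketch below is a toy model of that idea using only the standard library. It is not dxpy's implementation, and `ENTRY_POINTS`, `entry_point`, and `run` are hypothetical stand-ins.

```python
ENTRY_POINTS = {}


def entry_point(name):
    """Register the decorated function under `name` (toy version)."""
    def decorator(func):
        ENTRY_POINTS[name] = func
        return func
    return decorator


def run(name, job_input):
    """Call the registered function with the job input as keyword arguments."""
    return ENTRY_POINTS[name](**job_input)


@entry_point("main")
def main(input1, input2):
    # Input names become parameter names; the returned dict holds the outputs.
    return {"output1": input1 * 2, "output2": input2.upper()}


print(run("main", {"input1": 21, "input2": "ok"}))
```

The model suggests why parameter names in the entry point function need to match the input names declared for the app.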
### Input/Output Handling
**Inputs**: DNAnexus data objects are represented as dicts containing links:
```python
@dxpy.entry_point('main')
def main(reads_file):
    # Convert link to handler
    reads_dxfile = dxpy.DXFile(reads_file)

    # Download to local filesystem
    dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")

    # Process file…
```
**Outputs**: Return primitive types directly; convert file outputs to links:
```python
# Inside the entry point function: upload the result file
output_file = dxpy.upload_local_file("output.fastq")

return {
    "trimmed_reads": dxpy.dxlink(output_file)
}
```
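The link dicts mentioned above are plain dictionaries keyed by `$dnanexus_link`. The sketch below hand-rolls the two common shapes, a bare object ID and a project-qualified ID, purely to make the structure visible. `make_link` and `link_to_id` are hypothetical helpers and the IDs are placeholders; in an app, `dxpy.dxlink()` should be used to build links.

```python
def make_link(object_id, project_id=None):
    """Build a link dict for a data object, optionally pinned to a project."""
    if project_id is None:
        return {"$dnanexus_link": object_id}
    return {"$dnanexus_link": {"project": project_id, "id": object_id}}


def link_to_id(link):
    """Extract the object ID from either form of link."""
    target = link["$dnanexus_link"]
    return target["id"] if isinstance(target, dict) else target


simple = make_link("file-xxxx")
qualified = make_link("file-xxxx", project_id="project-xxxx")
print(simple, link_to_id(simple))
print(qualified, link_to_id(qualified))
```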
## Bash App Structure
Bash apps use a simpler shell script approach:
```bash
#!/bin/bash
set -e -x -o pipefail

main() {
    # Download inputs
    dx download "$reads_file" -o reads.fastq

    # Process
    process_reads reads.fastq > output.fastq

    # Upload outputs
    trimmed_reads=$(dx upload output.fastq --brief)

    # Set job output
    dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
}
```
## Common Development Patterns
### 1. Bioinformatics Pipeline
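Download → Process → Upload pattern:
```python
import subprocess
import dxpy

# Download input
dxpy.download_dxfile(input_file_id, "input.fastq")
# Run analysis ("tool" is a placeholder command)
subprocess.check_call(["tool", "input.fastq", "output.bam"])
# Upload result
output_file = dxpy.upload_local_file("output.bam")
```
# DNAnexus App Configuration and Dependencies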
## Overview
This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.
## dxapp.json Structure
The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.
### Minimal Example
```json
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```
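If several apps share the same boilerplate, the minimal configuration can also be generated from a script. The following sketch assumes nothing beyond the fields in the example above; `minimal_dxapp` is a hypothetical helper, and the values simply mirror that example.

```python
import json


def minimal_dxapp(name, title, summary, entry_file):
    """Return a dict mirroring the minimal dxapp.json example."""
    return {
        "name": name,
        "title": title,
        "summary": summary,
        "dxapi": "1.0.0",
        "version": "1.0.0",
        "inputSpec": [],
        "outputSpec": [],
        "runSpec": {
            "interpreter": "python3",
            "file": entry_file,
            "distribution": "Ubuntu",
            "release": "24.04",
        },
    }


config = minimal_dxapp(
    "my-app", "My Analysis App", "Performs analysis on input files", "src/my-app.py"
)
print(json.dumps(config, indent=2))
```

Writing the result with `json.dump(config, open("dxapp.json", "w"), indent=2)` would yield a file equivalent to the example.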
## Metadata Fields
### Required Fields
```json
{
  "name": "my-app",        // Unique identifier (lowercase, numbers, hyphens, underscores)
  "title": "My App",       // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"         // API version
}
```
### Optional Metadata
```json
{
  "version": "1.0.0",                // Semantic version (required for apps)
  "description": "Extended description…",
  "developerNotes": "Implementation notes…",
  "categories": [                    // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {                       // Arbitrary metadata
    "contactEmail": "developer@example.com",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}
```
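The rules stated in the Required Fields and Optional Metadata comments can be checked locally before building. This is a rough sketch under stated assumptions: `check_metadata` is a hypothetical helper, the name pattern is inferred from the "lowercase, numbers, hyphens, underscores" comment, and the version pattern accepts only plain `MAJOR.MINOR.PATCH`. The platform's own validation during `dx build` remains authoritative.

```python
import re

REQUIRED_FIELDS = ("name", "title", "summary", "dxapi")
NAME_PATTERN = re.compile(r"[a-z0-9_-]+")       # assumption from the comment above
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")  # simplified semantic version


def check_metadata(config, building_app=False):
    """Return a list of problems found in a dxapp.json-style dict."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in config
    ]
    name = config.get("name")
    if isinstance(name, str) and not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    version = config.get("version")
    if version is None:
        if building_app:
            problems.append("version is required for apps")
    elif not VERSION_PATTERN.fullmatch(str(version)):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    return problems


good = {"name": "my-app", "title": "My App", "summary": "One line", "dxapi": "1.0.0"}
print(check_metadata(good))
print(check_metadata(good, building_app=True))
print(check_metadata({"name": "My App"}))
```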
## Input Specification
Define input parameters:
“`json
{
"inputSpec": [
{
"name": "reads",
"label": "Input reads",
"class": "file",