Skill

DNAnexus Integration

"DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."

Free · Risk: medium
Tags: dnanexus, integration, python, docker

Tools: dxpy

The full skill

---
name: dnanexus-integration
description: "DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."
---

# DNAnexus Integration

## Overview

DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.

## When to Use This Skill

This skill should be used when:
- Creating, building, or modifying DNAnexus apps/applets
- Uploading, downloading, searching, or organizing files and records
- Running analyses, monitoring jobs, creating workflows
- Writing scripts using dxpy to interact with the platform
- Setting up dxapp.json, managing dependencies, using Docker
- Processing FASTQ, BAM, VCF, or other bioinformatics files
- Managing projects, permissions, or platform resources

## Core Capabilities

The skill is organized into five main areas, each with detailed reference documentation:

### 1. App Development

**Purpose**: Create executable programs (apps/applets) that run on the DNAnexus platform.

**Key Operations**:
- Generate app skeleton with `dx-app-wizard`
- Write Python or Bash apps with proper entry points
- Handle input/output data objects
- Deploy with `dx build` or `dx build --app`
- Test apps on the platform

**Common Use Cases**:
- Bioinformatics pipelines (alignment, variant calling)
- Data processing workflows
- Quality control and filtering
- Format conversion tools

**Reference**: See `references/app-development.md` for:
- Complete app structure and patterns
- Python entry point decorators
- Input/output handling with dxpy
- Development best practices
- Common issues and solutions

### 2. Data Operations

**Purpose**: Manage files, records, and other data objects on the platform.

**Key Operations**:
- Upload/download files with `dxpy.upload_local_file()` and `dxpy.download_dxfile()`
- Create and manage records with metadata
- Search for data objects by name, properties, or type
- Clone data between projects
- Manage project folders and permissions

**Common Use Cases**:
- Uploading sequencing data (FASTQ files)
- Organizing analysis results
- Searching for specific samples or experiments
- Backing up data across projects
- Managing reference genomes and annotations

**Reference**: See `references/data-operations.md` for:
- Complete file and record operations
- Data object lifecycle (open/closed states)
- Search and discovery patterns
- Project management
- Batch operations

### 3. Job Execution

**Purpose**: Run analyses, monitor execution, and orchestrate workflows.

**Key Operations**:
- Launch jobs with `applet.run()` or `app.run()`
- Monitor job status and logs
- Create subjobs for parallel processing
- Build and run multi-step workflows
- Chain jobs with output references

**Common Use Cases**:
- Running genomics analyses on sequencing data
- Parallel processing of multiple samples
- Multi-step analysis pipelines
- Monitoring long-running computations
- Debugging failed jobs

**Reference**: See `references/job-execution.md` for:
- Complete job lifecycle and states
- Workflow creation and orchestration
- Parallel execution patterns
- Job monitoring and debugging
- Resource management

### 4. Python SDK (dxpy)

**Purpose**: Programmatic access to the DNAnexus platform through Python.

**Key Operations**:
- Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
- Use high-level functions for common tasks
- Make direct API calls for advanced operations
- Create links and references between objects
- Search and discover platform resources

**Common Use Cases**:
- Automation scripts for data management
- Custom analysis pipelines
- Batch processing workflows
- Integration with external tools
- Data migration and organization

**Reference**: See `references/python-sdk.md` for:
- Complete dxpy class reference
- High-level utility functions
- API method documentation
- Error handling patterns
- Common code patterns

### 5. Configuration and Dependencies

**Purpose**: Configure app metadata and manage dependencies.

**Key Operations**:
- Write dxapp.json with inputs, outputs, and run specs
- Install system packages (execDepends)
- Bundle custom tools and resources
- Use assets for shared dependencies
- Integrate Docker containers
- Configure instance types and timeouts

**Common Use Cases**:
- Defining app input/output specifications
- Installing bioinformatics tools (samtools, bwa, etc.)
- Managing Python package dependencies
- Using Docker images for complex environments
- Selecting computational resources

**Reference**: See `references/configuration.md` for:
- Complete dxapp.json specification
- Dependency management strategies
- Docker integration patterns
- Regional and resource configuration
- Example configurations

## Quick Start Examples

### Upload and Analyze Data

```python
import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")
```

### Search and Download Files

```python
import dxpy

# Find BAM files from a specific experiment
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    name_mode="glob",  # wildcard matching; the default name_mode is "exact"
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file under its platform name
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    filename = file_obj.describe()["name"]
    dxpy.download_dxfile(file_result["id"], filename)
```

# DNAnexus App Development

## Overview

Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.

## Applets vs Apps

- **Applets**: Data objects that live inside projects. Good for development and testing.
- **Apps**: Versioned, shareable executables that don't live inside projects. Can be published for others to use.

Both are created identically until the final build step. Applets can be converted to apps later.
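
Because applets and apps diverge only at the final build step, the difference can be sketched as a single flag. The helper below is a minimal illustration, not part of the DNAnexus toolkit: `build_command` and its parameters are hypothetical names, and it only assembles the `dx build` argument list rather than invoking the CLI.

```python
from typing import List


def build_command(src_dir: str, as_app: bool = False) -> List[str]:
    """Return the `dx build` argument list for an applet or an app.

    Hypothetical helper: the only difference between the two targets
    is the --app flag added for apps.
    """
    cmd = ["dx", "build"]
    if as_app:
        # Apps use the same command with --app appended
        cmd.append("--app")
    cmd.append(src_dir)
    return cmd


applet_cmd = build_command("my-app")
app_cmd = build_command("my-app", as_app=True)
print(" ".join(applet_cmd))
print(" ".join(app_cmd))
```

Passing either list to `subprocess.run(cmd, check=True)` would run the build, assuming the `dx` toolkit is installed and you are logged in to the platform.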

## Creating an App/Applet

### Using dx-app-wizard

Generate a skeleton app directory structure:

```bash
dx-app-wizard
```

This creates:
- `dxapp.json` - Configuration file
- `src/` - Source code directory
- `resources/` - Bundled dependencies
- `test/` - Test files

### Building and Deploying

Build an applet:

```bash
dx build
```

Build an app:

```bash
dx build --app
```

The build process:
1. Validates dxapp.json configuration
2. Bundles source code and resources
3. Deploys to the platform
4. Returns the applet/app ID

## App Directory Structure

```
my-app/
├── dxapp.json          # Metadata and configuration
├── src/
│   ├── my-app.py       # Main executable (Python)
│   └── my-app.sh       # Or Bash script
├── resources/          # Bundled files and dependencies
│   ├── tools/
│   └── data/
└── test/               # Test data and scripts
    └── test.json
```

## Python App Structure

### Entry Points

Python apps use the `@dxpy.entry_point()` decorator to define functions:

```python
import dxpy

@dxpy.entry_point('main')
def main(input1, input2):
    # Process inputs (placeholder logic; replace with real processing)
    result1 = input1
    result2 = input2

    # Return outputs keyed by the names declared in outputSpec
    return {
        "output1": result1,
        "output2": result2
    }

dxpy.run()
```

### Input/Output Handling

**Inputs**: DNAnexus data objects are represented as dicts containing links:

```python
import dxpy

@dxpy.entry_point('main')
def main(reads_file):
    # Convert link to handler
    reads_dxfile = dxpy.DXFile(reads_file)

    # Download to local filesystem
    dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")

    # Process file...
```

**Outputs**: Return primitive types directly, convert file outputs to links:

```python
# Inside the entry point function: upload the result file
output_file = dxpy.upload_local_file("output.fastq")

return {
    "trimmed_reads": dxpy.dxlink(output_file)
}
```

## Bash App Structure

Bash apps use a simpler shell script approach:

```bash
#!/bin/bash
set -e -x -o pipefail

main() {
    # Download inputs
    dx download "$reads_file" -o reads.fastq

    # Process
    process_reads reads.fastq > output.fastq

    # Upload outputs
    trimmed_reads=$(dx upload output.fastq --brief)

    # Set job output
    dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
}
```

## Common Development Patterns

### 1. Bioinformatics Pipeline

Download → Process → Upload pattern:

```python
import subprocess

import dxpy

# Download input
dxpy.download_dxfile(input_file_id, "input.fastq")

# Run analysis ("tool" is a placeholder for the actual command)
subprocess.check_call(["tool", "input.fastq", "output.bam"])

# Upload result
output_file = dxpy.upload_local_file("output.bam")
```

# DNAnexus App Configuration and Dependencies

## Overview

This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.

## dxapp.json Structure

The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.

### Minimal Example

```json
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```

## Metadata Fields

### Required Fields

```json
{
  "name": "my-app",          // Unique identifier (lowercase, numbers, hyphens, underscores)
  "title": "My App",         // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"           // API version
}
```

### Optional Metadata

```json
{
  "version": "1.0.0",        // Semantic version (required for apps)
  "description": "Extended description…",
  "developerNotes": "Implementation notes…",
  "categories": [            // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {               // Arbitrary metadata
    "contactEmail": "[email protected]",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}
```

## Input Specification

Define input parameters:

```json
{
  "inputSpec": [
    {
      "name": "reads",
      "label": "Input reads",
      "class": "file",