gget

Fast CLI/Python queries to 20+ bioinformatics databases. Use for quick lookups: gene info, BLAST searches, AlphaFold structures, enrichment analysis. Best for interactive exploration, simple queries. For batch processing or advanced BLAST use biopython; for multi-database Python workflows use bioservices.

The full skill

---
name: gget
description: "Fast CLI/Python queries to 20+ bioinformatics databases. Use for quick lookups: gene info, BLAST searches, AlphaFold structures, enrichment analysis. Best for interactive exploration, simple queries. For batch processing or advanced BLAST use biopython; for multi-database Python workflows use bioservices."
license: BSD-2-Clause license
metadata:
    skill-author: K-Dense Inc.
---

# gget

## Overview

gget is a command-line bioinformatics tool and Python package providing unified access to 20+ genomic databases and analysis methods. Query gene information, sequence analysis, protein structures, expression data, and disease associations through a consistent interface. All gget modules work both as command-line tools and as Python functions.

**Important**: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary.

## Installation

Install gget in a clean virtual environment to avoid conflicts:

```bash
# Using uv (recommended)
uv pip install gget

# Or using pip
pip install --upgrade gget

# In Python/Jupyter
import gget
```

## Quick Start

Basic usage pattern for all modules:

```bash
# Command-line
gget <module> [arguments] [options]

# Python
gget.module(arguments, options)
```

Most modules return:
- **Command-line**: JSON (default) or CSV with `-csv` flag
- **Python**: DataFrame or dictionary

Common flags across modules:
- `-o/--out`: Save results to file
- `-q/--quiet`: Suppress progress information
- `-csv`: Return CSV format (command-line only)

## Module Categories

### 1. Reference & Gene Information

#### gget ref - Reference Genome Downloads

Retrieve download links and metadata for Ensembl reference genomes.

**Parameters**:
- `species`: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
- `-w/--which`: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
- `-r/--release`: Ensembl release number (default: latest)
- `-l/--list_species`: List available vertebrate species
- `-liv/--list_iv_species`: List available invertebrate species
- `-ftp`: Return only FTP links
- `-d/--download`: Download files (requires curl)

**Examples**:
```bash
# List available species
gget ref --list_species

# Get all reference files for human
gget ref homo_sapiens

# Download only GTF annotation for mouse
gget ref -w gtf -d mouse
```

```python
# Python
gget.ref("homo_sapiens")
gget.ref("mus_musculus", which="gtf", download=True)
```

#### gget search - Gene Search

Locate genes by name or description across species.

**Parameters**:
- `searchwords`: One or more search terms (case-insensitive)
- `-s/--species`: Target species (e.g., 'homo_sapiens', 'mouse')
- `-r/--release`: Ensembl release number
- `-t/--id_type`: Return 'gene' (default) or 'transcript'
- `-ao/--andor`: 'or' (default) finds ANY searchword; 'and' requires ALL
- `-l/--limit`: Maximum results to return

**Returns**: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL

**Examples**:
```bash
# Search for GABA-related genes in human
gget search -s human gaba gamma-aminobutyric

# Find specific gene, require all terms
gget search -s mouse -ao and pax7 transcription
```

```python
# Python
gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
```

#### gget info - Gene/Transcript Information

Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.

**Parameters**:
- `ens_ids`: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
- `-n/--ncbi`: Disable NCBI data retrieval
- `-u/--uniprot`: Disable UniProt data retrieval
- `-pdb`: Include PDB identifiers (increases runtime)

**Returns**: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript

**Examples**:
```bash
# Get info for multiple genes
gget info ENSG00000034713 ENSG00000104853 ENSG00000170296

# Include PDB IDs
gget info ENSG00000034713 -pdb
```

```python
# Python
gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
```

#### gget seq - Sequence Retrieval

Fetch nucleotide or amino acid sequences for genes and transcripts.

**Parameters**:
- `ens_ids`: One or more Ensembl identifiers
- `-t/--translate`: Fetch amino acid sequences instead of nucleotide
- `-iso/--isoforms`: Return all transcript variants (gene IDs only)

**Returns**: FASTA format sequences

**Examples**:
```bash
# Get nucleotide sequences
gget seq ENSG00000034713 ENSG00000104853

# Get all protein isoforms
gget seq -t -iso ENSG00000034713
```

```python
# Python
gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
```

### 2. Sequence Analysis & Alignment

#### gget blast - BLAST Searches

BLAST nucleotide or amino acid sequences against standard databases.
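
Because this module accepts either nucleotide or amino acid input, it can help to sanity-check a sequence locally before submitting it. The sketch below is a rough heuristic only: `guess_sequence_type` and `NUCLEOTIDE_ALPHABET` are hypothetical names, and this is not gget's own detection logic. A peptide made up solely of the letters A, C, G, T, U, and N would be misclassified.

```python
# Characters treated as nucleotide codes in this sketch (an assumption).
NUCLEOTIDE_ALPHABET = set("ACGTUN")


def guess_sequence_type(sequence: str) -> str:
    """Return 'nucleotide' or 'protein' based on the characters present."""
    residues = set(sequence.upper())
    if not residues:
        raise ValueError("empty sequence")
    # Only nucleotide codes present -> call it nucleotide, otherwise protein.
    return "nucleotide" if residues <= NUCLEOTIDE_ALPHABET else "protein"


print(guess_sequence_type("ATCGATCGATCGATCG"))
print(guess_sequence_type("MKWMFKEDHSLEHRCVESAKIRAKYPDRV"))
```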

**Parameters**:
- `sequence`: Sequence string or path to FASTA/.txt file
- `-p/--program`: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
- `-db/--database`:
  - Nucleotide: nt, refseq_rna, pdbnt
  - Protein: nr, swissprot, pdbaa, refseq_protein
- `-l/--limit`: Max hits (default: 50)
- `-e/--expect`: E-value cutoff (default: 10.0)
- `-lcf/--low_comp_filt`: Enable low complexity filtering
- `-mbo/--megablast_off`: Disable MegaBLAST (blastn only)

**Examples**:
```bash
# BLAST protein sequence
gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# BLAST from file with specific database
gget blast sequence.fasta -db swissprot -l 10
```

```python
# Python
gget.blast("MKWMFK...", database="swissprot", limit=10)
```

#### gget blat - BLAT Searches

Locate genomic positions of sequences using UCSC BLAT.

**Parameters**:
- `sequence`: Sequence string or path to FASTA/.txt file
- `-st/--seqtype`: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
- `-a/--assembly`: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)

**Returns**: genome, query size, alignment positions, matches, mismatches, alignment percentage

**Examples**:
```bash
# Find genomic location in human
gget blat ATCGATCGATCGATCG

# Search in different assembly
gget blat -a mm39 ATCGATCGATCGATCG
```

```python
# Python
gget.blat("ATCGATCGATCGATCG", assembly="mouse")
```

#### gget muscle - Multiple Sequence Alignment

Align multiple nucleotide or amino acid sequences using Muscle5.
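
Alignment input is commonly prepared as a FASTA file. A minimal sketch for writing named sequences to such a file follows; `write_fasta` is a hypothetical helper that is not part of gget, and the record names and the 60-character line width are illustrative assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def write_fasta(records: Mapping[str, str], path: str, width: int = 60) -> Path:
    """Write `records` (name -> sequence) to `path` in FASTA format."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        # Wrap long sequences so each line holds at most `width` residues.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    target = Path(path)
    target.write_text("\n".join(lines) + "\n")
    return target


# Demo: write two made-up records to a temporary directory and show the file.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = write_fasta(
        {"seq1": "ATCGATCGATCGATCG", "seq2": "ATCGATCGATCGATGG"},
        os.path.join(tmp, "sequences.fasta"),
    )
    print(fasta_path.read_text())
```

The resulting path could then be handed to an aligner that expects a FASTA file.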

**Parameters**:
- `fasta`: Sequences or path to FASTA/.txt file
- `-s5/--super5`: Use Super5 algorithm for faster processing (large datasets)

**Returns**: Aligned sequences in ClustalW format or aligned FASTA (.afa)

**Examples**:
```bash
# Align sequences from file
gget muscle sequences.fasta -o aligned.afa

# Use Super5 for large dataset
gget muscle large_dataset.fasta -s5
```

```python
# Python
gget.muscle("sequences.fasta", save=True)
```

#### gget diamond - Local Sequence Alignment

Perform fast local protein or translated DNA alignment using DIAMOND.

**Parameters**:
- Query: Sequences (string/list) or FASTA file path
- `--reference`: Reference sequences (string/list) or FASTA file path (required)
- `--sensitivity`: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
- `--threads`: CPU threads (default: 1)
- `--diamond_db`: Save database for reuse
- `--translated`: Enable nucleotide-to-amino acid alignment

**Returns**: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores

**Examples**:
```bash
# Align against reference
gget diamond GGETISAWESQME -ref reference.fasta --threads 4

# Save database for reuse
gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd
```

```python
# Python
gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)
```

### 3. Structural & Protein Analysis

#### gget pdb - Protein Structures

Query RCSB Protein Data Bank for structure and metadata.

**Parameters**:
- `pdb_id`: PDB identifier (e.g., '7S7U')
- `-r/--resource`: Data type (pdb, entry, pubmed, assembly, entity types)
- `-i/--identifier`: Assembly, entity, or chain ID

**Returns**: PDB format (structures) or JSON (metadata)

**Examples**:
```bash
# Download PDB structure
gget pdb 7S7U -o 7S7U.pdb

# Get metadata
gget pdb 7S7U -r entry
```

```python
# Python
gget.pdb("7S7U", save=True)
```

#### gget alphafold - Protein Structure Prediction

Predict 3D protein structures using simplified AlphaFold2.

**Setup Required**:
```bash
# Install OpenMM first
uv pip install openmm

# Then setup AlphaFold
gget setup alphafold
```

**Parameters**:
- `sequence`: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
- `-mr/--multimer_recycles`: Recycling iterations (default: 3; recommend 20 for accuracy)
- `-mfm/--multimer_for_monomer`: Apply multimer model to single proteins
- `-r/--relax`: AMBER relaxation for top-ranked model
- `plot`: Python-only; generate interactive 3D visualization (default: True)
- `show_sidechains`: Python-only; include side chains (default: True)

**Returns**: PDB structure file, JSON alignment error data, optional 3D visualization

**Examples**:
```bash
# Predict single protein structure
gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# Predict multimer with higher accuracy
gget alphafold sequence1.fasta -mr 20 -r
```

```python
# Python with visualization
gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)

# Multimer prediction
gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
```

#### gget elm - Eukaryotic Linear Motifs

Predict Eukaryotic Linear Motifs in protein sequences.

**Setup Required**:
```bash
gget setup elm
```

**Parameters**:
- `sequence`: Amino acid sequence or UniProt Acc
- `-u/--uniprot`: Indicates sequence is UniProt Acc
- `-e/--expand`: Include protein names, organisms, references
- `-s/--sensitivity`: DIAMOND alignment sensitivity (default: "very-sensitive")
- `-t/--threads`: Number of threads (default: 1)

**Returns**: Two outputs:
1. **ortholog_df**: Linear motifs from orthologous proteins
2. **regex_df**: Motifs directly matched in input sequence

**Examples**:
```bash
# Predict motifs from sequence
gget elm LIAQSIGQASFV -o results

# Use UniProt accession with expanded info
gget elm --uniprot Q02410 -e
```

```python
# Python
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
```

### 4. Expression & Disease Data

#### gget archs4 - Gene Correlation & Tissue Expression

Query ARCHS4 database for correlated genes or tissue expression data.

**Parameters**:
- `gene`: Gene symbol or Ensembl ID (with `--ensembl` flag)
- `-w/--which`: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
- `-s/--species`: 'human' (default) or 'mouse' (tissue data only)
- `-e/--ensembl`: Input is Ensembl ID

**Returns**:
- **Correlation mode**: Gene symbols, Pearson correlation coefficients
- **Tissue mode**: Tissue identifiers, min/Q1/median/Q3/max expression values

**Examples**:
```bash
# Get correlated genes
gget archs4 ACE2

# Get tissue expression
gget archs4 -w tissue ACE2
```

```python
# Python
gget.archs4("ACE2", which="tissue")
```

#### gget cellxgene - Single-Cell RNA-seq Data

Query CZ CELLxGENE Discover Census for single-cell data.

**Setup Required**:
```bash
gget setup cellxgene
```

**Parameters**:
- `--gene` (-g): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
- `--tissue`: Tissue type(s)
- `--cell_type`: Specific cell type(s)
- `--species` (-s): 'homo_sapiens' (default) or 'mus_musculus'
- `--census_version` (-cv): Version ("stable", "latest", or dated)
- `--ensembl` (-e): Use Ensembl IDs
- `--meta_only` (-mo): Return metadata only
- Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

**Returns**: AnnData object with count matrices and metadata (or metadata-only dataframes)

**Examples**:
```bash
# Get single-cell data for specific genes and cell types
gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad

# Metadata only
gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv
```

```python
# Python
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
```

#### gget enrichr - Enrichment Analysis

Perform ontology enrichment analysis on gene lists using Enrichr.

**Parameters**:
- `genes`: Gene symbols or Ensembl IDs
- `-db/--database`: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
- `-s/--species`: human (default), mouse, fly, yeast, worm, fish
- `-bkg_l/--background_list`: Background genes for comparison
- `-ko/--kegg_out`: Save KEGG pathway images with highlighted genes
- `plot`: Python-only; generate graphical results

**Database Shortcuts**:
- 'pathway' → KEGG_2021_Human
- 'transcription' → ChEA_2016
- 'ontology' → GO_Biological_Process_2021
- 'diseases_drugs' → GWAS_Catalog_2019
- 'celltypes' → PanglaoDB_Augmented_2021

**Examples**:
```bash
# Enrichment analysis for ontology
gget enrichr -db ontology ACE2 AGT AGTR1

# Save KEGG pathways
gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/
```

```python
# Python with plot
gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)
```

#### gget bgee - Orthology & Expression

Retrieve orthology and gene expression data from Bgee database.

**Parameters**:
- `ens_id`: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when `type=expression`
- `-t/--type`: 'orthologs' (default) or 'expression'

**Returns**:
- **Orthologs mode**: Matching genes across species with IDs, names, taxonomic info
- **Expression mode**: Anatomical entities, confidence scores, expression status

**Examples**:
```bash
# Get orthologs
gget bgee ENSG00000169194

# Get expression data
gget bgee ENSG00000169194 -t expression

# Multiple genes
gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression
```

```python
# Python
gget.bgee("ENSG00000169194", type="orthologs")
```

#### gget opentargets - Disease & Drug Associations

Retrieve disease and drug associations from OpenTargets.

**Parameters**:
- Ensembl gene ID (required)
- `-r/--resource`: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
- `-l/--limit`: Cap results count
- Filter arguments (vary by resource):
  - drugs: `--filter_disease`
  - pharmacogenetics: `--filter_drug`
  - expression/depmap: `--filter_tissue`, `--filter_anat_sys`, `--filter_organ`
  - interactions: `--filter_protein_a`, `--filter_protein_b`, `--filter_gene_b`

**Examples**:
```bash
# Get associated diseases
gget opentargets ENSG00000169194 -r diseases -l 5

# Get associated drugs
gget opentargets ENSG00000169194 -r drugs -l 10

# Get tissue expression
gget opentargets ENSG00000169194 -r expression --filter_tissue brain
```

```python
# Python
gget.opentargets("ENSG00000169194", resource="diseases", limit=5)
```

#### gget cbio - cBioPortal Cancer Genomics

Plot cancer genomics heatmaps using cBioPortal data.

**Two subcommands**:

**search** - Find study IDs:
```bash
gget cbio search breast lung
```

**plot** - Generate heatmaps:

**Parameters**:
- `-s/--study_ids`: Space-separated cBioPortal study IDs (required)
- `-g/--genes`: Space-separated gene names or Ensembl IDs (required)
- `-st/--stratification`: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
- `-vt/--variation_type`: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
- `-f/--filter`: Filter by column value (e.g., 'study_id:msk_impact_2017')
- `-dd/--data_dir`: Cache directory (default: ./gget_cbio_cache)
- `-fd/--figure_dir`: Output directory (default: ./gget_cbio_figures)
- `-dpi`: Resolution (default: 100)
- `-sh/--show`: Display plot in window
- `-nc/--no_confirm`: Skip download confirmations

**Examples**:
```bash
# Search for studies
gget cbio search esophag ovary

# Create heatmap
gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences
```

```python
# Python
gget.cbio_search(["esophag", "ovary"])
gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
```

#### gget cosmic - COSMIC Database

Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.

**Important**: License fees apply for commercial use. Requires COSMIC account credentials.

**Parameters**:
- `searchterm`: Gene name, Ensembl ID, mutation notation, or sample ID
- `-ctp/--cosmic_tsv_path`: Path to downloaded COSMIC TSV file (required for querying)
- `-l/--limit`: Maximum results (default: 100)

**Database download flags**:
- `-d/--download_cosmic`: Activate download mode
- `-gm/--gget_mutate`: Create version for gget mutate
- `-cp/--cosmic_project`: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
- `-cv/--cosmic_version`: COSMIC version
- `-gv/--grch_version`: Human reference genome (37 or 38)
- `--email`, `--password`: COSMIC credentials

**Examples**:
```bash
# First download database (replace the placeholder credentials with your own)
gget cosmic -d --email user@example.com --password xxx -cp cancer

# Then query
gget cosmic EGFR -ctp cosmic_data.tsv -l 10
```

```python
# Python
gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)
```

### 5. Additional Tools

#### gget mutate - Generate Mutated Sequences

Generate mutated nucleotide sequences from mutation annotations.
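
To make the mutation annotation format concrete, the sketch below applies a single substitution such as `c.4G>T` to a sequence in plain Python. It only illustrates how such an annotation reads (1-based position, reference base, alternate base). `apply_substitution` is a hypothetical helper that handles simple substitutions only; it is not gget's implementation and does not reproduce the flanking-sequence handling of gget mutate.

```python
import re

# Matches simple substitutions such as "c.4G>T" (1-based position).
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return `sequence` with the single-base substitution applied."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported mutation annotation: {mutation!r}")
    position, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # convert the 1-based position to a 0-based index
    if sequence[index] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + alt + sequence[index + 1:]


print(apply_substitution("ATCGCTAAGCT", "c.4G>T"))  # ATCTCTAAGCT
```

Checking the reference base before substituting catches annotations that were written against a different sequence or coordinate system.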

**Parameters**:
- `sequences`: FASTA file path or direct sequence input (string/list)
- `-m/--mutations`: CSV/TSV file or DataFrame with mutation data (required)
- `-mc/--mut_column`: Mutation column name (default: 'mutation')
- `-sic/--seq_id_column`: Sequence ID column (default: 'seq_ID')
- `-mic/--mut_id_column`: Mutation ID column
- `-k/--k`: Length of flanking sequences (default: 30 nucleotides)

**Returns**: Mutated sequences in FASTA format

**Examples**:
```bash
# Single mutation
gget mutate ATCGCTAAGCT -m "c.4G>T"

# Multiple sequences with mutations from file
gget mutate sequences.fasta -m mutations.csv -o mutated.fasta
```

```python
# Python
import pandas as pd
mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
```

#### gget gpt - OpenAI Text Generation

Generate natural language text using OpenAI's API.

**Setup Required**:
```bash
gget setup gpt
```

**Important**: Free tier limited to 3 months after account creation. Set monthly billing limits.

**Parameters**:
- `prompt`: Text input for generation (required)
- `api_key`: OpenAI authentication (required)
- Model configuration: temperature, top_p, max_tokens, frequency_penalty, presence_penalty
- Default model: gpt-3.5-turbo (configurable)

**Examples**:
```bash
gget gpt "Explain CRISPR" --api_key your_key_here
```

```python
# Python
gget.gpt("Explain CRISPR", api_key="your_key_here")
```

#### gget setup - Install Dependencies

Install/download third-party dependencies for specific modules.

**Parameters**:
- `module`: Module name requiring dependency installation
- `-o/--out`: Output folder path (elm module only)

**Modules requiring setup**:
- `alphafold` - Downloads ~4GB of model parameters
- `cellxgene` - Installs cellxgene-census (may not support latest Python)
- `elm` - Downloads local ELM database
- `gpt` - Configures OpenAI integration

**Examples**:
```bash
# Setup AlphaFold
gget setup alphafold

# Setup ELM with custom directory
gget setup elm -o /path/to/elm_data
```

```python
# Python
gget.setup("alphafold")
```

## Common Workflows

### Workflow 1: Gene Discovery to Sequence Analysis

Find and analyze genes of interest:

```python
import gget

# 1. Search for genes
results = gget.search(["GABA", "receptor"], species="homo_sapiens")

# 2. Get detailed information
gene_ids = results["ensembl_id"].tolist()
info = gget.info(gene_ids[:5])

# 3. Retrieve sequences
sequences = gget.seq(gene_ids[:5], translate=True)
```

### Workflow 2: Sequence Alignment and Structure

Align sequences and predict structures:

```python
import gget

# Query protein sequence (the example sequence used in the blast and alphafold sections)
my_sequence = "MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR"

# 1. Align multiple sequences
alignment = gget.muscle("sequences.fasta")

# 2. Find similar sequences
blast_results = gget.blast(my_sequence, database="swissprot", limit=10)

# 3. Predict structure
structure = gget.alphafold(my_sequence, plot=True)

# 4. Find linear motifs
ortholog_df, regex_df = gget.elm(my_sequence)
```

### Workflow 3: Gene Expression and Enrichment

Analyze expression patterns and functional enrichment:

```python
import gget

# 1. Get tissue expression
tissue_expr = gget.archs4("ACE2", which="tissue")

# 2. Find correlated genes
correlated = gget.archs4("ACE2", which="correlation")

# 3. Get single-cell data
adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")

# 4. Perform enrichment analysis
gene_list = correlated["gene_symbol"].tolist()[:50]
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)
```

### Workflow 4: Disease and Drug Analysis

Investigate disease associations and therapeutic targets:

```python
import gget

# 1. Search for genes
genes = gget.search(["breast cancer"], species="homo_sapiens")

# 2. Get disease associations
diseases = gget.opentargets("ENSG00000169194", resource="diseases")

# 3. Get drug associations
drugs = gget.opentargets("ENSG00000169194", resource="drugs")

# 4. Query cancer genomics data
study_ids = gget.cbio_search(["breast"])
gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")

# 5. Search COSMIC for mutations
cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")
```

### Workflow 5: Comparative Genomics

Compare proteins across species:

```python
import gget

# 1. Get orthologs
orthologs = gget.bgee("ENSG00000169194", type="orthologs")

# 2. Get sequences for comparison
human_seq = gget.seq("ENSG00000169194", translate=True)
mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)

# 3. Align sequences
alignment = gget.muscle([human_seq, mouse_seq])

# 4. Compare structures
human_structure = gget.pdb("7S7U")
mouse_structure = gget.alphafold(mouse_seq)
```

### Workflow 6: Building Reference Indices

Prepare reference data for downstream analysis (e.g., kallisto|bustools):

```bash
# 1. List available species
gget ref --list_species

# 2. Download reference files (multiple types are passed as a comma-separated list)
gget ref -w gtf,cdna -d homo_sapiens

# 3. Build kallisto index
kallisto index -i transcriptome.idx transcriptome.fasta

# 4. Download genome for alignment
gget ref -w dna -d homo_sapiens
```

## Best Practices

### Data Retrieval
- Use `--limit` to control result sizes for large queries
- Save results with `-o/--out` for reproducibility
- Check database versions/releases for consistency across analyses
- Use `--quiet` in production scripts to reduce output

### Sequence Analysis
- For BLAST/BLAT, start with default parameters, then adjust sensitivity
- Use `gget diamond` with `--threads` for faster local alignment
- Save DIAMOND databases with `--diamond_db` for repeated queries
- For multiple sequence alignment, use `-s5/--super5` for large datasets

### Expression and Disease Data
- Gene symbols are case-sensitive in cellxgene (e.g., 'PAX7' vs 'Pax7')
- Run `gget setup` before first use of alphafold, cellxgene, elm, gpt
- For enrichment analysis, use database shortcuts for convenience
- Cache cBioPortal data with `-dd` to avoid repeated downloads

### Structure Prediction
- AlphaFold multimer predictions: use `-mr 20` for higher accuracy
- Use `-r` flag for AMBER relaxation of final structures
- Visualize results in Python with `plot=True`
- Check PDB database first before running AlphaFold predictions

### Error Handling
- Database structures change; update gget regularly: `uv pip install --upgrade gget`
- Process max ~1000 Ensembl IDs at once with gget info
- For large-scale analyses, implement rate limiting for API queries
- Use virtual environments to avoid dependency conflicts

## Output Formats

### Command-line
- Default: JSON
- CSV: Add `-csv` flag
- FASTA: gget seq, gget mutate
- PDB: gget pdb, gget alphafold
- PNG: gget cbio plot

### Python
- Default: DataFrame or dictionary
- JSON: Add `json=True` parameter
- Save to file: Add `save=True` or specify `out="filename"`
- AnnData: gget cellxgene

## Resources

This skill includes reference documentation for detailed module information:

### references/
- `module_reference.md` - Comprehensive parameter reference for all modules
- `database_info.md` - Information about queried databases and their update frequencies
- `workflows.md` - Extended workflow examples and use cases

For additional help:
- Official documentation: https://pachterlab.github.io/gget/
- GitHub issues: https://github.com/pachterlab/gget/issues
- Citation: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
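
The Error Handling advice in Best Practices (at most ~1000 Ensembl IDs per gget info call, plus rate limiting for large-scale analyses) can be sketched as follows. This is a minimal illustration and not part of gget: `chunked` and `run_in_batches` are hypothetical helper names, the one-second default pause is an arbitrary assumption, and the query function is passed in as an argument so the batching logic can be exercised without network access.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def run_in_batches(
    ids: Sequence[str],
    query: Callable[[List[str]], object],
    size: int = 1000,
    pause_seconds: float = 1.0,
) -> list:
    """Call `query` once per batch, sleeping between consecutive calls."""
    results = []
    for index, batch in enumerate(chunked(ids, size)):
        if index > 0:
            time.sleep(pause_seconds)  # crude rate limit (assumed value)
        results.append(query(batch))
    return results


# Stand-in query that only reports each batch size. The IDs are made up.
fake_ids = [f"ENSG{n:011d}" for n in range(2500)]
print(run_in_batches(fake_ids, query=len, size=1000, pause_seconds=0.0))
# [1000, 1000, 500]
```

With gget installed, `gget.info` could be passed as `query`, since the examples in this document call it with a list of IDs. How the per-batch results are combined depends on the return type of the chosen module.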