Open Targets Database

"Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification."

Free, risk: low
Tags: opentargets, database, python, graphql, chembl, uniprot

Tools: requests

The full skill

---
name: opentargets-database
description: "Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification."
---

# Open Targets Database

## Overview

The Open Targets Platform is a comprehensive resource for systematic identification and prioritization of potential therapeutic drug targets. It integrates publicly available datasets including human genetics, omics, literature, and chemical data to build and score target-disease associations.

**Key capabilities:**

- Query target (gene) annotations including tractability, safety, expression
- Search for disease-target associations with evidence scores
- Retrieve evidence from multiple data types (genetics, pathways, literature, etc.)
- Find known drugs for diseases and their mechanisms
- Access drug information including clinical trial phases and adverse events
- Evaluate target druggability and therapeutic potential

**Data access:** The platform provides a GraphQL API, web interface, data downloads, and Google BigQuery access. This skill focuses on the GraphQL API for programmatic access.
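
As a rough illustration of programmatic access, the sketch below posts a GraphQL query using only the Python standard library. The helper names (`build_payload`, `extract_data`, `run_graphql`) are hypothetical and are not part of this skill's scripts. The endpoint URL is the one listed in the API reference section, and the handling of `data` and `errors` assumes the usual GraphQL response shape.

```python
import json
import urllib.request

# Endpoint as listed in this skill's API reference section.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"


def build_payload(query, variables=None):
    """Return the JSON body of a GraphQL POST request as UTF-8 bytes."""
    return json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")


def extract_data(payload):
    """Return the "data" member of a parsed response, raising if "errors" is set.

    Assumption: errors follow the common GraphQL shape, a list of objects
    that each carry a "message" key.
    """
    if payload.get("errors"):
        messages = "; ".join(str(err.get("message", err)) for err in payload["errors"])
        raise RuntimeError(f"GraphQL error: {messages}")
    return payload.get("data")


def run_graphql(query, variables=None, timeout=30):
    """POST a query to API_URL and return its "data" member (hypothetical helper)."""
    request = urllib.request.Request(
        API_URL,
        data=build_payload(query, variables),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_data(json.load(response))
```

Calling `run_graphql(query, variables)` performs the network request, while `build_payload` and `extract_data` are pure functions that can be checked offline. Raising on the `errors` member is a deliberate choice here, since a GraphQL server may report a failed query inside an otherwise successful HTTP response.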
## When to Use This Skill

This skill should be used when:

- **Target discovery:** Finding potential therapeutic targets for a disease
- **Target assessment:** Evaluating tractability, safety, and druggability of genes
- **Evidence gathering:** Retrieving supporting evidence for target-disease associations
- **Drug repurposing:** Identifying existing drugs that could be repurposed for new indications
- **Competitive intelligence:** Understanding clinical precedence and drug development landscape
- **Target prioritization:** Ranking targets based on genetic evidence and other data types
- **Mechanism research:** Investigating biological pathways and gene functions
- **Biomarker discovery:** Finding genes differentially expressed in disease
- **Safety assessment:** Identifying potential toxicity concerns for drug targets

## Core Workflow

### 1. Search for Entities

Start by finding the identifiers for targets, diseases, or drugs of interest.

**For targets (genes):**

```python
from scripts.query_opentargets import search_entities

# Search by gene symbol or name
results = search_entities("BRCA1", entity_types=["target"])
```

# Open Targets Platform API Reference

## API Endpoint

```
https://api.platform.opentargets.org/api/v4/graphql
```

Interactive GraphQL playground with documentation:

```
https://api.platform.opentargets.org/api/v4/graphql/browser
```

## Access Methods

The Open Targets Platform provides multiple access methods:

1. **GraphQL API** - Best for single entity queries and flexible data retrieval
2. **Web Interface** - Interactive platform at https://platform.opentargets.org
3. **Data Downloads** - FTP at https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/
4. **Google BigQuery** - For large-scale systematic queries

## Authentication

No authentication is required for the GraphQL API. All data is freely accessible.

## Rate Limits

For systematic queries involving multiple targets or diseases, use dataset downloads or BigQuery instead of repeated API calls.
The API is optimized for single-entity and exploratory queries.

## GraphQL Query Structure

GraphQL queries consist of:

1. Query operation with optional variables
2. Field selection (request only needed fields)
3. Nested entity traversal

### Basic Python Example

```python
import requests

# Define the query
query_string = """
query target($ensemblId: String!){
  target(ensemblId: $ensemblId){
    id
    approvedSymbol
    biotype
    geneticConstraint {
      constraintType
      exp
      obs
      score
    }
  }
}
"""

# Define variables
variables = {"ensemblId": "ENSG00000169083"}

# Make the request
base_url = "https://api.platform.opentargets.org/api/v4/graphql"
response = requests.post(base_url, json={"query": query_string, "variables": variables})
response.raise_for_status()  # Fail early on HTTP errors

# Parse the JSON response body
data = response.json()
print(data)
```

## Available Query Endpoints

### /target

Retrieve gene annotations, tractability assessments, and disease associations.

**Common fields:**

- `id` - Ensembl gene ID
- `approvedSymbol` - HGNC gene symbol
- `approvedName` - Full gene name
- `biotype` - Gene type (protein_coding, etc.)
- `tractability` - Druggability assessment
- `safetyLiabilities` - Safety information
- `expressions` - Baseline expression data
- `knownDrugs` - Approved/clinical drugs
- `associatedDiseases` - Disease associations with evidence

### /disease

Retrieve disease/phenotype data, known drugs, and clinical information.

**Common fields:**

- `id` - EFO disease identifier
- `name` - Disease name
- `description` - Disease description
- `therapeuticAreas` - High-level disease categories
- `synonyms` - Alternative names
- `knownDrugs` - Drugs indicated for disease
- `associatedTargets` - Target associations with evidence

### /drug

Retrieve compound details, mechanisms of action, and pharmacovigilance data.

**Common fields:**

- `id` - ChEMBL identifier
- `name` - Drug name
- `drugType` - Small molecule, antibody, etc.
- `maximumClinicalTrialPhase` - Development stage
- `indications` - Disease indications
- `mechanismsOfAction` - Target mechanisms
- `adverseEvents` - Pharmacovigilance data

### /search

Search across all entities (targets, diseases, drugs).

**Parameters:**

- `queryString` - Search term
- `entityNames` - Filter by entity type(s)
- `page` - Pagination

### /associationDiseaseIndirect

Retrieve target-disease associations including indirect evidence from disease descendants in ontology.

**Key fields:**

- `rows` - Association records with scores
- `aggregations` - Aggregated statistics

## Example Queries

### Query 1: Get target information with disease associations

```python
query = """
query targetInfo($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    approvedName
    tractability {
      label
      modality
      value
    }
    associatedDiseases(page: {size: 10}) {
      rows {
        disease {
          name
        }
        score
        datatypeScores {
          componentId
          score
        }
      }
    }
  }
}
"""

variables = {"ensemblId": "ENSG00000157764"}
```

### Query 2: Search for diseases

```python
query = """
query searchDiseases($queryString: String!) {
  search(queryString: $queryString, entityNames: ["disease"]) {
    # ... (the rest of this query is missing from the source)
  }
}
"""
```

# Evidence Types and Data Sources

## Overview

Evidence represents any event or set of events that identifies a target as a potential causal gene or protein for a disease. Evidence is standardized and mapped to:

- **Ensembl gene IDs** for targets
- **EFO (Experimental Factor Ontology)** for diseases/phenotypes

Evidence is organized into **data types** (broader categories) and **data sources** (specific databases/studies).

## Evidence Data Types

### 1. Genetic Association

Evidence from human genetics linking genetic variants to disease phenotypes.
#### Data Sources:

**GWAS (Genome-Wide Association Studies)**
- Population-level common variant associations
- Filtered with Locus-to-Gene (L2G) scores >0.05
- Includes fine-mapping and colocalization data
- Sources: GWAS Catalog, FinnGen, UK Biobank, EBI GWAS

**Gene Burden Tests**
- Rare variant association analyses
- Aggregate effects of multiple rare variants in a gene
- Particularly relevant for Mendelian and rare diseases

**ClinVar Germline**
- Clinical variant interpretations
- Classifications: pathogenic, likely pathogenic, VUS, benign
- Expert-reviewed variant-disease associations

**Genomics England PanelApp**
- Expert gene-disease ratings
- Green (confirmed), amber (probable), red (no evidence)
- Focus on rare diseases and cancer

**Gene2Phenotype**
- Curated gene-disease relationships
- Allelic requirements and inheritance patterns
- Clinical validity assessments

**UniProt Literature & Variants**
- Literature-based gene-disease associations
- Expert-curated from scientific publications

**Orphanet**
- Rare disease gene associations
- Expert-reviewed and maintained

**ClinGen**
- Clinical genome resource classifications
- Gene-disease validity assertions

### 2. Somatic Mutations

Evidence from cancer genomics identifying driver genes and therapeutic targets.

#### Data Sources:

**Cancer Gene Census**
- Expert-curated cancer genes
- Tier classifications (1 = strong evidence, 2 = emerging)
- Mutation types and cancer types

**IntOGen**
- Computational driver gene predictions
- Aggregated from large cohort studies
- Statistical significance of mutations

**ClinVar Somatic**
- Somatic clinical variant interpretations
- Oncogenic/likely oncogenic classifications

**Cancer Biomarkers**
- FDA/EMA approved biomarkers
- Clinical trial biomarkers
- Prognostic and predictive markers

### 3. Known Drugs

Evidence from clinical precedence showing drugs targeting genes for disease indications.
#### Data Source:

**ChEMBL**
- Approved drugs (Phase 4)
- Clinical candidates (Phase 1-3)
- Withdrawn drugs
- Drug-target-indication triplets with mechanism of action

**Clinical Trial Information:**
- `phase`: Maximum clinical trial phase (1, 2, 3, 4)
- `status`: Active, terminated, completed, withdrawn
- `mechanismOfAction`: How drug affects target

### 4. Affected Pathways

Evidence linking genes to disease through pathway perturbations and functional screens.

#### Data Sources:

**CRISPR Screens**
- Genome-scale knockout screens
- Cancer dependency and essentiality data

**Project Score (Cancer Dependency Map)**
- CRISPR-Cas9 fitness screens across cancer cell lines
- Gene essentiality profiles

**SLAPenrich**
- Pathway enrichment analysis
- Somatic mutation pathway impacts

**PROGENy**
- Pathway activity inference
- Signaling pathway perturbations

**Reactome**
- Expert-curated pathway annotations
- Biological pathway representations

**Gene Signatures**
- Expression-based signatures
- Pathway activity patterns

### 5. RNA Expression

Evidence from differential gene expression in disease vs. control tissues.

#### Data Source:

**Expression Atlas**
- Differential expression data
- Baseline expression across tissues/conditions
- RNA-Seq and microarray studies
- Log2 fold-change and p-values

### 6. Animal Models

Evidence from in vivo studies showing phenotypes associated with gene perturbations.

#### Data Source:

**IMPC (International Mouse Phenotyping Consortium)**
- Systematic mouse knockout phenotypes
- Phenotype-disease mappings via ontologies
- Standardized phenotyping procedures

### 7. Literature

Evidence from text-mining of biomedical literature.
#### Data Source:

**Europe PMC**
- Co-occurrence of genes and diseases in abstracts
- Normalized citation counts
- Weighted by publication type and recency

## Evidence Scoring

Each evidence source has its own scoring methodology:

### Score Ranges
- Most scores normalized to 0-1 range
- Higher scores indicate stronger evidence
- Scores are NOT confidence levels but relative strength indicators

### Common Scoring Approaches:

**Binary Classifications:**
- ClinVar: Pathogenic (1.0), Likely pathogenic (0.99), etc.
- Gene2Phenotype: Confirmed/probable ratings
- PanelApp: Green/amber/red classifications

**Statistical Measures:**
- GWAS: L2G scores incorporating multiple lines of evidence
- Gene Burden: Statistical significance of variant aggregation
- Expression: Adjusted p-values and fold-changes

**Clinical Precedence:**
- Known Drugs: Phase weights (Phase 4 = 1.0, Phase 3 = 0.8, etc.)
- Clinical status modifiers

**Computational Predictions:**
- IntOGen: Q-values from driver mutation analysis
- PROGENy/SLAPenrich: Pathway activity/enrichment scores

## Evidence Interpretation Guidelines

### Strengths by Data Type

**Genetic Association** - Strongest human genetic evidence
- Direct link between genetic variation and disease
- Mendelian diseases: high confidence
- GWAS: requires L2G to identify causal gene
- Consider ancestry and population-specific effects

**Somatic Mutations** - Direct evidence in cancer
- Strong for oncology indications
- Driver mutations indicate therapeutic potential
- Consider cancer type specificity

**Known Drugs** - Clinical validation
- Highest confidence: approved drugs (Phase 4)
- Consider mechanism relevance to new indication
- Phase 1-2: early evidence, higher risk

**Affected Pathways** - Mechanistic insights
- Supports biological plausibility
- May not predict clinical success
- Useful for hypothesis generation

**RNA Expression** - Observational evidence
- Correlation, not causation
- May reflect disease consequence vs.
cause
- Useful for biomarker identification

**Animal Models** - Translational evidence
- Strong for understanding biology
- Variable translation to human disease
- Most useful when phenotype matches human disease

**Literature** - Exploratory signal
- Text-mining captures research focus
- May reflect publication bias
- Requires manual literature review for validation

### Important Considerations

1. **Multiple evidence types strengthen confidence** - Convergent evidence from different data types provides stronger support
2. **Under-studied diseases score lower** - Novel or rare diseases may have strong evidence but lower aggregate scores due to limited research
3. **Association scores are not probabilities** - Scores rank relative evidence strength, not success probability
4. **Context matters** - Evidence strength depends on:
   - Disease mechanism understanding
   - Target biology and druggability
   - Clinical precedence in related indications
   - Safety considerations
5. **Data source reliability varies** - Weight expert-curated sources (ClinGen, Gene2Phenotype) higher than computational predictions

## Using Evidence in Queries

### Filtering by Data Type

```python
query = """
query evidenceByType($ensemblId: String!, $efoId: String!, $dataTypes: [String!]) {
  disease(efoId: $efoId) {
    evi