Opentargets Database
"Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification."
Tools: requests
The full skill
---
name: opentargets-database
description: "Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification."
---
# Open Targets Database
## Overview
The Open Targets Platform is a comprehensive resource for systematic identification and prioritization of potential therapeutic drug targets. It integrates publicly available datasets including human genetics, omics, literature, and chemical data to build and score target-disease associations.
**Key capabilities:**
– Query target (gene) annotations including tractability, safety, expression
– Search for disease-target associations with evidence scores
– Retrieve evidence from multiple data types (genetics, pathways, literature, etc.)
– Find known drugs for diseases and their mechanisms
– Access drug information including clinical trial phases and adverse events
– Evaluate target druggability and therapeutic potential
**Data access:** The platform provides a GraphQL API, web interface, data downloads, and Google BigQuery access. This skill focuses on the GraphQL API for programmatic access.
## When to Use This Skill
Use this skill for:
– **Target discovery:** Finding potential therapeutic targets for a disease
– **Target assessment:** Evaluating tractability, safety, and druggability of genes
– **Evidence gathering:** Retrieving supporting evidence for target-disease associations
– **Drug repurposing:** Identifying existing drugs that could be repurposed for new indications
– **Competitive intelligence:** Understanding clinical precedence and drug development landscape
– **Target prioritization:** Ranking targets based on genetic evidence and other data types
– **Mechanism research:** Investigating biological pathways and gene functions
– **Biomarker discovery:** Finding genes differentially expressed in disease
– **Safety assessment:** Identifying potential toxicity concerns for drug targets
## Core Workflow
### 1. Search for Entities
Start by finding the identifiers for targets, diseases, or drugs of interest.
**For targets (genes):**
```python
from scripts.query_opentargets import search_entities

# Search by gene symbol or name
results = search_entities("BRCA1", entity_types=["target"])
```
# Open Targets Platform API Reference
## API Endpoint
```
https://api.platform.opentargets.org/api/v4/graphql
```
Interactive GraphQL playground with documentation:
```
https://api.platform.opentargets.org/api/v4/graphql/browser
```
## Access Methods
The Open Targets Platform provides multiple access methods:
1. **GraphQL API** – Best for single entity queries and flexible data retrieval
2. **Web Interface** – Interactive platform at https://platform.opentargets.org
3. **Data Downloads** – FTP at https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/
4. **Google BigQuery** – For large-scale systematic queries
## Authentication
No authentication is required for the GraphQL API. All data is freely accessible.
## Rate Limits
For systematic queries involving multiple targets or diseases, use dataset downloads or BigQuery instead of repeated API calls. The API is optimized for single-entity and exploratory queries.
## GraphQL Query Structure
GraphQL queries consist of:
1. Query operation with optional variables
2. Field selection (request only needed fields)
3. Nested entity traversal
### Basic Python Example
```python
import requests

# Define the query
query_string = """
query target($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    biotype
    geneticConstraint {
      constraintType
      exp
      obs
      score
    }
  }
}
"""

# Define variables
variables = {"ensemblId": "ENSG00000169083"}

# Make the request
base_url = "https://api.platform.opentargets.org/api/v4/graphql"
response = requests.post(
    base_url,
    json={"query": query_string, "variables": variables},
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors

# Parse the JSON body; results are under "data", problems under "errors"
data = response.json()
print(data)
```
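When several queries are sent from the same script, the request logic above can be wrapped in a small helper. The following is a minimal sketch, not part of the Open Targets tooling: the names `run_query`, `GraphQLError`, and the injectable `post` parameter are hypothetical and exist only for this illustration. It relies on the standard GraphQL convention that failures are reported in a top-level `errors` array.

```python
OPENTARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"


class GraphQLError(RuntimeError):
    """Raised when the response body carries a GraphQL `errors` array."""


def run_query(query, variables=None, post=None, timeout=30):
    """POST a GraphQL query and return the `data` object of the response.

    `post` is a hypothetical hook that defaults to `requests.post`; passing
    a stand-in makes the helper testable without network access.
    """
    if post is None:
        import requests  # imported lazily so offline tests do not need it
        post = requests.post

    payload = {"query": query, "variables": variables or {}}
    response = post(OPENTARGETS_URL, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP-level failures

    body = response.json()
    if body.get("errors"):
        # Join all error messages so the caller sees every reported problem
        messages = "; ".join(
            err.get("message", "unknown error") for err in body["errors"]
        )
        raise GraphQLError(messages)
    return body["data"]
```

A call such as `run_query(query_string, {"ensemblId": "ENSG00000169083"})` would then return only the `data` portion of the response, raising an exception if the server reports a problem.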
## Available Query Endpoints
### /target
Retrieve gene annotations, tractability assessments, and disease associations.
**Common fields:**
– `id` – Ensembl gene ID
– `approvedSymbol` – HGNC gene symbol
– `approvedName` – Full gene name
– `biotype` – Gene type (protein_coding, etc.)
– `tractability` – Druggability assessment
– `safetyLiabilities` – Safety information
– `expressions` – Baseline expression data
– `knownDrugs` – Approved/clinical drugs
– `associatedDiseases` – Disease associations with evidence
### /disease
Retrieve disease/phenotype data, known drugs, and clinical information.
**Common fields:**
– `id` – EFO disease identifier
– `name` – Disease name
– `description` – Disease description
– `therapeuticAreas` – High-level disease categories
– `synonyms` – Alternative names
– `knownDrugs` – Drugs indicated for disease
– `associatedTargets` – Target associations with evidence
### /drug
Retrieve compound details, mechanisms of action, and pharmacovigilance data.
**Common fields:**
– `id` – ChEMBL identifier
– `name` – Drug name
– `drugType` – Small molecule, antibody, etc.
– `maximumClinicalTrialPhase` – Development stage
– `indications` – Disease indications
– `mechanismsOfAction` – Target mechanisms
– `adverseEvents` – Pharmacovigilance data
### /search
Search across all entities (targets, diseases, drugs).
**Parameters:**
– `queryString` – Search term
– `entityNames` – Filter by entity type(s)
– `page` – Pagination
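As an illustration of how the three parameters fit together, the sketch below builds a request payload for a search. The helper name `build_search_payload` is hypothetical, and the response fields (`total`, `hits`, `entity`) and the `index`/`size` shape of `page` are assumptions that should be checked against the schema in the GraphQL browser before use.

```python
SEARCH_QUERY = """
query searchEntities(
  $queryString: String!, $entityNames: [String!], $index: Int!, $size: Int!
) {
  search(
    queryString: $queryString
    entityNames: $entityNames
    page: {index: $index, size: $size}
  ) {
    total
    hits {
      id
      entity
      name
    }
  }
}
"""


def build_search_payload(term, entity_names=None, index=0, size=10):
    """Return the JSON body for a search request (assumed schema, see above)."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "queryString": term,
            # None leaves the search unrestricted across entity types
            "entityNames": entity_names,
            "index": index,
            "size": size,
        },
    }


payload = build_search_payload("breast cancer", entity_names=["disease"])
```

The payload can be sent with `requests.post(url, json=payload)` against the API endpoint listed earlier; raising `index` by one fetches the next page of hits.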
### /associationDiseaseIndirect
Retrieve target-disease associations including indirect evidence from disease descendants in ontology.
**Key fields:**
– `rows` – Association records with scores
– `aggregations` – Aggregated statistics
## Example Queries
### Query 1: Get target information with disease associations
```python
query = """
query targetInfo($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    approvedName
    tractability {
      label
      modality
      value
    }
    # Pagination takes both a zero-based page index and a page size
    associatedDiseases(page: {index: 0, size: 10}) {
      rows {
        disease {
          name
        }
        score
        datatypeScores {
          # Aliased on the assumption that the schema names this field `id`
          componentId: id
          score
        }
      }
    }
  }
}
"""
variables = {"ensemblId": "ENSG00000157764"}
```
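The nested response from Query 1 is easier to inspect once it is flattened into one row per disease. The sketch below assumes the response `data` object has exactly the shape requested in Query 1; the function name `flatten_associations` is hypothetical, and the example values are placeholders, not real Open Targets scores.

```python
def flatten_associations(data):
    """Turn a Query 1 `data` dict into flat rows, highest overall score first."""
    target = data["target"]
    rows = []
    for row in target["associatedDiseases"]["rows"]:
        flat = {
            "target": target["approvedSymbol"],
            "disease": row["disease"]["name"],
            "score": row["score"],
        }
        # Add one column per data type that contributed to the association
        for component in row.get("datatypeScores", []):
            key = component.get("componentId", component.get("id"))
            flat[key] = component["score"]
        rows.append(flat)
    return sorted(rows, key=lambda r: r["score"], reverse=True)


# Placeholder response shaped like Query 1's result; all values are made up.
example_data = {
    "target": {
        "approvedSymbol": "GENE1",
        "associatedDiseases": {
            "rows": [
                {
                    "disease": {"name": "disease A"},
                    "score": 0.42,
                    "datatypeScores": [
                        {"componentId": "literature", "score": 0.30},
                    ],
                },
                {
                    "disease": {"name": "disease B"},
                    "score": 0.81,
                    "datatypeScores": [
                        {"componentId": "genetic_association", "score": 0.90},
                        {"componentId": "known_drug", "score": 0.50},
                    ],
                },
            ]
        },
    }
}

for flat_row in flatten_associations(example_data):
    print(flat_row)
```

The resulting list of dicts can be passed directly to `csv.DictWriter` or a DataFrame constructor if a tabular view is wanted.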
### Query 2: Search for diseases
```python
query = """
query searchDiseases($queryString: String!) {
  search(queryString: $queryString, entityNames: ["disease"]) {
    hits { id name }  # assumed selection; the original query is truncated here
  }
}
"""
```
# Evidence Types and Data Sources
## Overview
Evidence represents any event or set of events that identifies a target as a potential causal gene or protein for a disease. Evidence is standardized and mapped to:
– **Ensembl gene IDs** for targets
– **EFO (Experimental Factor Ontology)** for diseases/phenotypes
Evidence is organized into **data types** (broader categories) and **data sources** (specific databases/studies).
## Evidence Data Types
### 1. Genetic Association
Evidence from human genetics linking genetic variants to disease phenotypes.
#### Data Sources:
**GWAS (Genome-Wide Association Studies)**
– Population-level common variant associations
– Filtered with Locus-to-Gene (L2G) scores >0.05
– Includes fine-mapping and colocalization data
– Sources: GWAS Catalog, FinnGen, UK Biobank, EBI GWAS
**Gene Burden Tests**
– Rare variant association analyses
– Aggregate effects of multiple rare variants in a gene
– Particularly relevant for Mendelian and rare diseases
**ClinVar Germline**
– Clinical variant interpretations
– Classifications: pathogenic, likely pathogenic, VUS, benign
– Expert-reviewed variant-disease associations
**Genomics England PanelApp**
– Expert gene-disease ratings
– Green (confirmed), amber (probable), red (no evidence)
– Focus on rare diseases and cancer
**Gene2Phenotype**
– Curated gene-disease relationships
– Allelic requirements and inheritance patterns
– Clinical validity assessments
**UniProt Literature & Variants**
– Literature-based gene-disease associations
– Expert-curated from scientific publications
**Orphanet**
– Rare disease gene associations
– Expert-reviewed and maintained
**ClinGen**
– Clinical genome resource classifications
– Gene-disease validity assertions
### 2. Somatic Mutations
Evidence from cancer genomics identifying driver genes and therapeutic targets.
#### Data Sources:
**Cancer Gene Census**
– Expert-curated cancer genes
– Tier classifications (1 = strong evidence, 2 = emerging)
– Mutation types and cancer types
**IntOGen**
– Computational driver gene predictions
– Aggregated from large cohort studies
– Statistical significance of mutations
**ClinVar Somatic**
– Somatic clinical variant interpretations
– Oncogenic/likely oncogenic classifications
**Cancer Biomarkers**
– FDA/EMA approved biomarkers
– Clinical trial biomarkers
– Prognostic and predictive markers
### 3. Known Drugs
Evidence from clinical precedence showing drugs targeting genes for disease indications.
#### Data Source:
**ChEMBL**
– Approved drugs (Phase 4)
– Clinical candidates (Phase 1-3)
– Withdrawn drugs
– Drug-target-indication triplets with mechanism of action
**Clinical Trial Information:**
– `phase`: Maximum clinical trial phase (1, 2, 3, 4)
– `status`: Active, terminated, completed, withdrawn
– `mechanismOfAction`: How drug affects target
### 4. Affected Pathways
Evidence linking genes to disease through pathway perturbations and functional screens.
#### Data Sources:
**CRISPR Screens**
– Genome-scale knockout screens
– Cancer dependency and essentiality data
**Project Score (Cancer Dependency Map)**
– CRISPR-Cas9 fitness screens across cancer cell lines
– Gene essentiality profiles
**SLAPenrich**
– Pathway enrichment analysis
– Somatic mutation pathway impacts
**PROGENy**
– Pathway activity inference
– Signaling pathway perturbations
**Reactome**
– Expert-curated pathway annotations
– Biological pathway representations
**Gene Signatures**
– Expression-based signatures
– Pathway activity patterns
### 5. RNA Expression
Evidence from differential gene expression in disease vs. control tissues.
#### Data Source:
**Expression Atlas**
– Differential expression data
– Baseline expression across tissues/conditions
– RNA-Seq and microarray studies
– Log2 fold-change and p-values
### 6. Animal Models
Evidence from in vivo studies showing phenotypes associated with gene perturbations.
#### Data Source:
**IMPC (International Mouse Phenotyping Consortium)**
– Systematic mouse knockout phenotypes
– Phenotype-disease mappings via ontologies
– Standardized phenotyping procedures
### 7. Literature
Evidence from text-mining of biomedical literature.
#### Data Source:
**Europe PMC**
– Co-occurrence of genes and diseases in abstracts
– Normalized citation counts
– Weighted by publication type and recency
## Evidence Scoring
Each evidence source has its own scoring methodology:
### Score Ranges
– Most scores normalized to 0-1 range
– Higher scores indicate stronger evidence
– Scores are NOT confidence levels but relative strength indicators
### Common Scoring Approaches:
**Categorical Classifications:**
– ClinVar: Pathogenic (1.0), Likely pathogenic (0.99), etc.
– Gene2Phenotype: Confirmed/probable ratings
– PanelApp: Green/amber/red classifications
**Statistical Measures:**
– GWAS: L2G scores incorporating multiple lines of evidence
– Gene Burden: Statistical significance of variant aggregation
– Expression: Adjusted p-values and fold-changes
**Clinical Precedence:**
– Known Drugs: Phase weights (Phase 4 = 1.0, Phase 3 = 0.8, etc.)
– Clinical status modifiers
**Computational Predictions:**
– IntOGen: Q-values from driver mutation analysis
– PROGENy/SLAPenrich: Pathway activity/enrichment scores
## Evidence Interpretation Guidelines
### Strengths by Data Type
**Genetic Association** – Strongest human genetic evidence
– Direct link between genetic variation and disease
– Mendelian diseases: high confidence
– GWAS: requires L2G to identify causal gene
– Consider ancestry and population-specific effects
**Somatic Mutations** – Direct evidence in cancer
– Strong for oncology indications
– Driver mutations indicate therapeutic potential
– Consider cancer type specificity
**Known Drugs** – Clinical validation
– Highest confidence: approved drugs (Phase 4)
– Consider mechanism relevance to new indication
– Phase 1-2: early evidence, higher risk
**Affected Pathways** – Mechanistic insights
– Supports biological plausibility
– May not predict clinical success
– Useful for hypothesis generation
**RNA Expression** – Observational evidence
– Correlation, not causation
– May reflect disease consequence vs. cause
– Useful for biomarker identification
**Animal Models** – Translational evidence
– Strong for understanding biology
– Variable translation to human disease
– Most useful when phenotype matches human disease
**Literature** – Exploratory signal
– Text-mining captures research focus
– May reflect publication bias
– Requires manual literature review for validation
### Important Considerations
1. **Multiple evidence types strengthen confidence** – Convergent evidence from different data types provides stronger support
2. **Under-studied diseases score lower** – Novel or rare diseases may have strong evidence but lower aggregate scores due to limited research
3. **Association scores are not probabilities** – Scores rank relative evidence strength, not success probability
4. **Context matters** – Evidence strength depends on:
– Disease mechanism understanding
– Target biology and druggability
– Clinical precedence in related indications
– Safety considerations
5. **Data source reliability varies** – Weight expert-curated sources (ClinGen, Gene2Phenotype) higher than computational predictions
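The first consideration, convergent evidence, can be checked programmatically by listing which data types contribute to an association. This is a minimal sketch under stated assumptions: the function name `supporting_data_types` is hypothetical, the input is assumed to be a list of `{"componentId": ..., "score": ...}` records as returned in `datatypeScores`, and the identifiers and scores in the example are illustrative only.

```python
def supporting_data_types(datatype_scores, min_score=0.0):
    """Return the data types scoring above `min_score`, strongest first."""
    supported = [c for c in datatype_scores if c["score"] > min_score]
    supported.sort(key=lambda c: c["score"], reverse=True)
    return [c.get("componentId", c.get("id")) for c in supported]


# Illustrative records; not real Open Targets scores.
example_scores = [
    {"componentId": "literature", "score": 0.2},
    {"componentId": "genetic_association", "score": 0.7},
    {"componentId": "rna_expression", "score": 0.0},
]

supporting = supporting_data_types(example_scores)
print(f"{len(supporting)} supporting data types: {supporting}")
```

A count like this is only a screening aid. As the considerations above note, the scores rank relative evidence strength, so an association backed by one strong curated source can still outweigh several weak signals.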
## Using Evidence in Queries
### Filtering by Data Type
“`python
query = """
query evidenceByType($ensemblId: String!, $efoId: String!, $dataTypes: [String!]) {
disease(efoId: $efoId) {
evi