Pdb Database
"Access RCSB PDB for 3D protein/nucleic acid structures. Search by text/sequence/structure, download coordinates (PDB/mmCIF), retrieve metadata, for structural biology and drug discovery."
Tools: requests
The full skill
---
name: pdb-database
description: "Access RCSB PDB for 3D protein/nucleic acid structures. Search by text/sequence/structure, download coordinates (PDB/mmCIF), retrieve metadata, for structural biology and drug discovery."
---
# PDB Database
## Overview
RCSB PDB is the US data center of the Worldwide Protein Data Bank (wwPDB), the global archive of 3D structural data for biological macromolecules. Use it to search for structures, retrieve coordinates and metadata, and run sequence and structure similarity searches across 200,000+ experimentally determined structures and computed models.
## When to Use This Skill
This skill should be used when:
- Searching for protein or nucleic acid 3D structures by text, sequence, or structural similarity
- Downloading coordinate files in PDB, mmCIF, or BinaryCIF formats
- Retrieving structural metadata, experimental methods, or quality metrics
- Performing batch operations across multiple structures
- Integrating PDB data into computational workflows for drug discovery, protein engineering, or structural biology research
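The coordinate-download use case in the list above can be sketched as follows. This is a minimal example, not part of the skill itself: the URL patterns on `files.rcsb.org` and `models.rcsb.org`, and the helper names `coordinate_url` and `download_coordinates`, are assumptions introduced for illustration.

```python
from urllib.parse import quote

# Assumed download URL patterns; they are not stated elsewhere in this document.
_URL_TEMPLATES = {
    "pdb": "https://files.rcsb.org/download/{pdb_id}.pdb",
    "cif": "https://files.rcsb.org/download/{pdb_id}.cif",
    "bcif": "https://models.rcsb.org/{pdb_id}.bcif",
}


def coordinate_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the download URL for one entry in the requested format."""
    if fmt not in _URL_TEMPLATES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return _URL_TEMPLATES[fmt].format(pdb_id=quote(pdb_id.strip().upper()))


def download_coordinates(pdb_id: str, fmt: str = "cif", timeout: float = 30) -> bytes:
    """Fetch the coordinate file and return its raw bytes."""
    import requests  # imported lazily so URL building works without requests

    response = requests.get(coordinate_url(pdb_id, fmt), timeout=timeout)
    response.raise_for_status()
    return response.content
```

Calling `download_coordinates("4HHB", "pdb")` would return the file contents as bytes, which can then be written to disk or passed to a parser.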
## Core Capabilities
### 1. Searching for Structures
Find PDB entries using various search criteria:
**Text Search:** Search by protein name, keywords, or descriptions
```python
from rcsbapi.search import TextQuery

query = TextQuery("hemoglobin")
results = list(query())
print(f"Found {len(results)} structures")
```
**Attribute Search:** Query specific properties (organism, resolution, method, etc.)
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism

# Find human protein structures
query = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
results = list(query())
```
**Sequence Similarity:** Find structures similar to a given sequence
```python
from rcsbapi.search import SequenceQuery

query = SequenceQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEKMSKDGKKKKKKSKTKCVIM",
    evalue_cutoff=0.1,
    identity_cutoff=0.9
)
results = list(query())
```
**Structure Similarity:** Find structures with similar 3D geometry
```python
from rcsbapi.search import StructSimilarityQuery

query = StructSimilarityQuery(
    structure_search_type="entry",
    entry_id="4HHB"  # Hemoglobin
)
results = list(query())
```
**Combining Queries:** Use logical operators to build complex searches
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism, rcsb_entry_info

# High-resolution human proteins
query1 = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
query2 = AttributeQuery(
    attribute=rcsb_entry_info.resolution_combined,
    operator="less",
    value=2.0
)
combined_query = query1 & query2  # AND operation
results = list(combined_query())
```
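For readers who want to see what a combined query might look like without the Python client, the sketch below builds an equivalent request body as plain JSON. It assumes the raw Search API accepts `terminal` and `group` nodes at `https://search.rcsb.org/rcsbsearch/v2/query`; the helper names (`attribute_node`, `and_group`, `build_request`) are hypothetical and exist only for this illustration.

```python
import json

# Assumed endpoint for the raw Search API (not given in this document).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def attribute_node(attribute, operator, value):
    """One attribute condition, expressed as a terminal node."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
        },
    }


def and_group(*nodes):
    """Combine nodes so that every condition must match."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


def build_request(query, return_type="entry"):
    """Wrap a query tree in the top-level request body."""
    return {"query": query, "return_type": return_type}


payload = build_request(
    and_group(
        attribute_node(
            "rcsb_entity_source_organism.scientific_name",
            "exact_match",
            "Homo sapiens",
        ),
        attribute_node("rcsb_entry_info.resolution_combined", "less", 2.0),
    )
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary could be sent with `requests.post(SEARCH_URL, json=payload)`. Seeing the tree explicitly makes it easier to debug which condition excludes an expected hit.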
### 2. Retrieving Structure Data
Access detailed information about specific PDB entries:
**Basic Entry Information:**
```python
from rcsbapi.data import Schema, fetch

# Get entry-level data
entry_data = fetch("4HHB", schema=Schema.ENTRY)
print(entry_data["struct"]["title"])
```
# RCSB PDB API Reference
This document provides detailed information about the RCSB Protein Data Bank APIs, including advanced usage patterns, data schemas, and best practices.
## API Overview
RCSB PDB provides multiple programmatic interfaces:
1. **Data API** – Retrieve PDB data when you have an identifier
2. **Search API** – Find identifiers matching specific search criteria
3. **ModelServer API** – Access macromolecular model subsets
4. **VolumeServer API** – Retrieve volumetric data subsets
5. **Sequence Coordinates API** – Obtain alignments between structural and sequence databases
6. **Alignment API** – Perform structure alignment computations
## Data API
### Core Data Objects
The Data API organizes information hierarchically:
- **core_entry**: PDB entries or Computed Structure Models (CSM IDs start with AF_ or MA_)
- **core_polymer_entity**: Protein, DNA, and RNA entities
- **core_nonpolymer_entity**: Ligands, cofactors, ions
- **core_branched_entity**: Oligosaccharides
- **core_assembly**: Biological assemblies
- **core_polymer_entity_instance**: Individual chains
- **core_chem_comp**: Chemical components
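Because computed models share the `core_entry` object with experimental entries, workflows often need to tell the two apart. The sketch below does so using only the prefix rule stated above (CSM IDs start with `AF_` or `MA_`). The function names and the sample identifiers other than `4HHB` are illustrative placeholders, not real records.

```python
# Prefixes that mark Computed Structure Models, as stated in the list above.
CSM_PREFIXES = ("AF_", "MA_")


def is_computed_model(entry_id: str) -> bool:
    """Return True when the identifier carries a CSM prefix."""
    return entry_id.strip().upper().startswith(CSM_PREFIXES)


def partition_ids(entry_ids):
    """Split identifiers into (experimental, computed) lists, keeping order."""
    experimental, computed = [], []
    for entry_id in entry_ids:
        target = computed if is_computed_model(entry_id) else experimental
        target.append(entry_id)
    return experimental, computed


# "AF_EXAMPLE" and "MA_EXAMPLE" are placeholders, not real identifiers.
experimental, computed = partition_ids(["4HHB", "AF_EXAMPLE", "MA_EXAMPLE"])
print(experimental, computed)
```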
### REST API Endpoints
Base URL: `https://data.rcsb.org/rest/v1/`
**Entry Data:**
```
GET https://data.rcsb.org/rest/v1/core/entry/{entry_id}
```
**Polymer Entity:**
```
GET https://data.rcsb.org/rest/v1/core/polymer_entity/{entry_id}/{entity_id}
```
**Assembly:**
```
GET https://data.rcsb.org/rest/v1/core/assembly/{entry_id}/{assembly_id}
```
**Examples:**
```bash
# Get entry data for hemoglobin
curl https://data.rcsb.org/rest/v1/core/entry/4HHB
# Get first polymer entity
curl https://data.rcsb.org/rest/v1/core/polymer_entity/4HHB/1
# Get biological assembly 1
curl https://data.rcsb.org/rest/v1/core/assembly/4HHB/1
```
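The same REST calls can be made from Python with `requests`. The following is a minimal sketch: it assumes the slash-separated path form (`/core/polymer_entity/{entry_id}/{entity_id}`), and the helper names are hypothetical.

```python
BASE_URL = "https://data.rcsb.org/rest/v1/core"


def entry_url(entry_id: str) -> str:
    """URL for entry-level data."""
    return f"{BASE_URL}/entry/{entry_id}"


def polymer_entity_url(entry_id: str, entity_id) -> str:
    """URL for one polymer entity (assumes the slash-separated path form)."""
    return f"{BASE_URL}/polymer_entity/{entry_id}/{entity_id}"


def assembly_url(entry_id: str, assembly_id) -> str:
    """URL for one biological assembly."""
    return f"{BASE_URL}/assembly/{entry_id}/{assembly_id}"


def get_json(url: str, timeout: float = 30) -> dict:
    """GET a Data API URL and return the decoded JSON body."""
    import requests  # imported lazily so URL building works without requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface 404s for unknown identifiers
    return response.json()
```

For example, `get_json(entry_url("4HHB"))` would return the entry record as a nested dictionary.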
### GraphQL API
Endpoint: `https://data.rcsb.org/graphql`
The GraphQL API enables flexible data retrieval, allowing you to grab any piece of data from any level of the hierarchy in a single query.
**Example Query:**
```graphql
{
  entry(entry_id: "4HHB") {
    struct {
      title
    }
    exptl {
      method
    }
    rcsb_entry_info {
      resolution_combined
      deposited_atom_count
      polymer_entity_count
    }
    rcsb_accession_info {
      deposit_date
      initial_release_date
    }
  }
}
```
**Python Example:**
```python
import requests

# Query one polymer entity (entity 1 of entry 4HHB)
query = """
{
  polymer_entity(entry_id: "4HHB", entity_id: "1") {
    rcsb_polymer_entity {
      pdbx_description
      formula_weight
    }
    entity_poly {
      pdbx_seq_one_letter_code
      pdbx_strand_id
    }
    rcsb_entity_source_organism {
      ncbi_taxonomy_id
      scientific_name
    }
  }
}
"""
response = requests.post(
    "https://data.rcsb.org/graphql",
    json={"query": query},
    timeout=30
)
response.raise_for_status()  # Fail early on HTTP errors
data = response.json()
```
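For batch retrieval, several entries can be requested in one GraphQL call. The sketch below assumes the schema exposes a plural `entries(entry_ids: ...)` field and that the endpoint accepts standard GraphQL variables. `build_graphql_payload` and `fetch_entries` are hypothetical helper names.

```python
GRAPHQL_URL = "https://data.rcsb.org/graphql"

# Assumes a plural `entries` field that takes a list of entry IDs.
ENTRIES_QUERY = """
query ($ids: [String!]!) {
  entries(entry_ids: $ids) {
    rcsb_id
    struct { title }
    rcsb_entry_info { resolution_combined }
  }
}
"""


def build_graphql_payload(entry_ids):
    """Build the JSON body for a batched entry query."""
    ids = [entry_id.strip() for entry_id in entry_ids]
    return {"query": ENTRIES_QUERY, "variables": {"ids": ids}}


def fetch_entries(entry_ids, timeout: float = 30):
    """POST the batched query and return the list of entry records."""
    import requests  # imported lazily so payload building works without it

    response = requests.post(
        GRAPHQL_URL, json=build_graphql_payload(entry_ids), timeout=timeout
    )
    response.raise_for_status()
    body = response.json()
    if body.get("errors"):  # GraphQL reports query errors in the body
        raise RuntimeError(body["errors"])
    return body["data"]["entries"]
```

Passing identifiers as variables rather than formatting them into the query string avoids quoting problems and keeps the query text constant across calls.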
### Common Data Fields
**Entry Level:**
- `struct.title` – Structure title/description
- `exptl[].method` – Experimental method (X-RAY DIFFRACTION, NMR, ELECTRON MICROSCOPY, etc.)
- `rcsb_entry_info.resolution_combined` – Resolution in Ångströms
- `rcsb_entry_info.deposited_atom_count` – Total number of atoms
- `rcsb_accession_info.deposit_date` – Deposition date
- `rcsb_accession_info.initial_release_date` – Release date
**Polymer Entity Level:**
- `entity_poly.pdbx_seq_one_letter_code` – Primary sequence
- `rcsb_polymer_entity.formula_weight` – Molecular weight
- `rcsb_entity_source_organism.scientific_name` – Source organism
- `rcsb_entity_source_organism.ncbi_taxonomy_id` – NCBI taxonomy ID
**Assembly Level:**
- `rcsb_assembly_info.polymer_entity_count` – Number of polymer entities
- `rcsb_assembly_info.assembly_id` – Assembly identifier
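The dotted field paths above map directly onto the nested JSON that the Data API returns. The helper below is a minimal sketch for resolving such a path, including the `[]` list notation. `get_path` is a hypothetical name, and the sample record holds placeholder values rather than real PDB data.

```python
def get_path(data, path):
    """Resolve a dotted path such as 'exptl[].method' against nested JSON.

    Always returns a list of matches; missing keys yield an empty list.
    """
    current = [data]
    for part in path.split("."):
        expand = part.endswith("[]")  # '[]' means: iterate over list items
        key = part[:-2] if expand else part
        next_values = []
        for item in current:
            if not isinstance(item, dict) or key not in item:
                continue
            value = item[key]
            if expand and isinstance(value, list):
                next_values.extend(value)
            else:
                next_values.append(value)
        current = next_values
    return current


# Placeholder record shaped like an entry response; values are illustrative.
record = {
    "struct": {"title": "EXAMPLE TITLE"},
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
}
print(get_path(record, "struct.title"))
print(get_path(record, "exptl[].method"))
```

Returning a list in every case keeps calling code uniform, whether a field is a scalar or repeats once per experiment.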
## Search API
### Query Types
The Search API supports seven primary query types:
1. **TextQuery** – Full-text search
2. **AttributeQuery** – Property-based search
3. **SequenceQuery** – Sequence similarity search
4. **SequenceMotifQuery** – Motif pattern search
5. **StructSimilarityQuery** – 3D structure similarity
6. **StructMotifQuery** – Structural motif search
7. **ChemSimilarityQuery** – Chemical similarity search
### AttributeQuery Operators
Available operators for AttributeQuery:
- `exact_match` – Exact string match
- `contains_words` – Contains all words
- `contains_phrase` – Contains exact phrase
- `equals` – Numerical equality
- `greater` – Greater than (numerical)
- `greater_or_equal` – Greater than or equal
- `less` – Less than (numerical)
- `less_or_equal` – Less than or equal
- `range` – Numerical range (closed interval)
- `exists` – Field has a value
- `in` – Value in list
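The operators above differ mainly in the shape of the value they expect. The sketch below shows one way to normalize values per operator when building raw Search API parameters by hand. It assumes `range` takes a `from`/`to` object with inclusive bounds and that `exists` takes no value; `terminal_parameters` is a hypothetical helper, not part of any library.

```python
def terminal_parameters(attribute, operator, value=None):
    """Build the `parameters` object for one attribute condition."""
    if operator == "exists":
        # Assumed: `exists` only tests for presence, so no value is sent.
        return {"attribute": attribute, "operator": operator}
    if operator == "range":
        low, high = value  # expects a (low, high) pair
        value = {
            "from": low,
            "to": high,
            "include_lower": True,  # closed interval, as described above
            "include_upper": True,
        }
    elif operator == "in":
        value = list(value)  # any iterable of allowed values
    elif value is None:
        raise ValueError(f"operator {operator!r} requires a value")
    return {"attribute": attribute, "operator": operator, "value": value}


params = terminal_parameters(
    "rcsb_polymer_entity.formula_weight", "range", (10000, 50000)
)
print(params)
```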
### Common Searchable Attributes
**Resolution and Quality:**
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entry_info

# High-resolution structures
query = AttributeQuery(
    attribute=rcsb_entry_info.resolution_combined,
    operator="less",
    value=2.0
)
```
**Experimental Method:**
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import exptl

query = AttributeQuery(
    attribute=exptl.method,
    operator="exact_match",
    value="X-RAY DIFFRACTION"
)
```
**Organism:**
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism

query = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
```
**Molecular Weight:**
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_polymer_entity

query = AttributeQuery(
    attribute=rcsb_polymer_entity.formula_weight,
    operator="range",
    value=(10000, 50000)  # 10-50 kDa
)
```
**Release Date:**
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_accession_info

# Structures released in 2024
query = AttributeQuery(
    attribute=rcsb_accession_info.initial_release_date,
    operator="range",
    value=("2024-01-01", "2024-12-31")
)
```
### Sequence Similarity Search
Search for structures with similar sequences using MMseqs2:
```python
from rcsbapi.search import SequenceQuery
# Basic sequence search
qu