PDB Database

"Access RCSB PDB for 3D protein/nucleic acid structures. Search by text/sequence/structure, download coordinates (PDB/mmCIF), retrieve metadata, for structural biology and drug discovery."

Free | Risk: low
Tags: pdb, database, python, graphql

Tools: requests

The full skill

---
name: pdb-database
description: "Access RCSB PDB for 3D protein/nucleic acid structures. Search by text/sequence/structure, download coordinates (PDB/mmCIF), retrieve metadata, for structural biology and drug discovery."
---

# PDB Database

## Overview

RCSB PDB is the worldwide repository for 3D structural data of biological macromolecules. Search for structures, retrieve coordinates and metadata, perform sequence and structure similarity searches across 200,000+ experimentally determined structures and computed models.

## When to Use This Skill

This skill should be used when:

- Searching for protein or nucleic acid 3D structures by text, sequence, or structural similarity
- Downloading coordinate files in PDB, mmCIF, or BinaryCIF formats
- Retrieving structural metadata, experimental methods, or quality metrics
- Performing batch operations across multiple structures
- Integrating PDB data into computational workflows for drug discovery, protein engineering, or structural biology research

## Core Capabilities

### 1. Searching for Structures

Find PDB entries using various search criteria:

**Text Search:** Search by protein name, keywords, or descriptions

```python
from rcsbapi.search import TextQuery

query = TextQuery("hemoglobin")
results = list(query())
print(f"Found {len(results)} structures")
```

**Attribute Search:** Query specific properties (organism, resolution, method, etc.)
```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism

# Find human protein structures
query = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
results = list(query())
```

**Sequence Similarity:** Find structures similar to a given sequence

```python
from rcsbapi.search import SequenceQuery

query = SequenceQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEKMSKDGKKKKKKSKTKCVIM",
    evalue_cutoff=0.1,
    identity_cutoff=0.9
)
results = list(query())
```

**Structure Similarity:** Find structures with similar 3D geometry

```python
from rcsbapi.search import StructSimilarityQuery

query = StructSimilarityQuery(
    structure_search_type="entry",
    entry_id="4HHB"  # Hemoglobin
)
results = list(query())
```

**Combining Queries:** Use logical operators to build complex searches

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism, rcsb_entry_info

# High-resolution human proteins
query1 = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
query2 = AttributeQuery(
    attribute=rcsb_entry_info.resolution_combined,
    operator="less",
    value=2.0
)
combined_query = query1 & query2  # AND operation
results = list(combined_query())
```

### 2. Retrieving Structure Data

Access detailed information about specific PDB entries:

**Basic Entry Information:**

```python
from rcsbapi.data import Schema, fetch

# Get entry-level data
entry_data = fetch("4HHB", schema=Schema.ENTRY)
print(entry_data["struct"]["title"])
```

# RCSB PDB API Reference

This document provides detailed information about the RCSB Protein Data Bank APIs, including advanced usage patterns, data schemas, and best practices.
## API Overview

RCSB PDB provides multiple programmatic interfaces:

1. **Data API** - Retrieve PDB data when you have an identifier
2. **Search API** - Find identifiers matching specific search criteria
3. **ModelServer API** - Access macromolecular model subsets
4. **VolumeServer API** - Retrieve volumetric data subsets
5. **Sequence Coordinates API** - Obtain alignments between structural and sequence databases
6. **Alignment API** - Perform structure alignment computations

## Data API

### Core Data Objects

The Data API organizes information hierarchically:

- **core_entry**: PDB entries or Computed Structure Models (CSM IDs start with AF_ or MA_)
- **core_polymer_entity**: Protein, DNA, and RNA entities
- **core_nonpolymer_entity**: Ligands, cofactors, ions
- **core_branched_entity**: Oligosaccharides
- **core_assembly**: Biological assemblies
- **core_polymer_entity_instance**: Individual chains
- **core_chem_comp**: Chemical components

### REST API Endpoints

Base URL: `https://data.rcsb.org/rest/v1/`

**Entry Data:**

```
GET https://data.rcsb.org/rest/v1/core/entry/{entry_id}
```

**Polymer Entity:**

```
GET https://data.rcsb.org/rest/v1/core/polymer_entity/{entry_id}/{entity_id}
```

**Assembly:**

```
GET https://data.rcsb.org/rest/v1/core/assembly/{entry_id}/{assembly_id}
```

**Examples:**

```bash
# Get entry data for hemoglobin
curl https://data.rcsb.org/rest/v1/core/entry/4HHB

# Get first polymer entity
curl https://data.rcsb.org/rest/v1/core/polymer_entity/4HHB/1

# Get biological assembly 1
curl https://data.rcsb.org/rest/v1/core/assembly/4HHB/1
```

### GraphQL API

Endpoint: `https://data.rcsb.org/graphql`

The GraphQL API enables flexible data retrieval: a single query can request any field from any level of the hierarchy (entry, entity, assembly, and so on).
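As a rough illustration of pulling several hierarchy levels in one request, the sketch below builds a batched GraphQL payload that asks for entry-level fields and the polymer entities nested under each entry. The root field `entries(entry_ids: ...)`, the `rcsb_id` field, and the nested `polymer_entities` field are assumptions about the schema that are not stated in this document, and `build_payload` and `fetch_entries` are hypothetical helper names; check them against the live schema before relying on them.

```python
import json

GRAPHQL_URL = "https://data.rcsb.org/graphql"

# One query that walks from entries down to their polymer entities.
# ASSUMPTION: `entries`, `rcsb_id`, and `polymer_entities` are schema names
# not shown elsewhere in this document.
BATCH_QUERY = """
query ($ids: [String!]!) {
  entries(entry_ids: $ids) {
    rcsb_id
    struct { title }
    rcsb_entry_info { resolution_combined }
    polymer_entities {
      rcsb_polymer_entity { pdbx_description }
    }
  }
}
"""


def build_payload(entry_ids):
    """Return the JSON body for a batched GraphQL request (hypothetical helper)."""
    # Normalize IDs so " 4hhb " and "4HHB" produce the same request.
    ids = [entry_id.strip().upper() for entry_id in entry_ids]
    return {"query": BATCH_QUERY, "variables": {"ids": ids}}


def fetch_entries(entry_ids, timeout=30):
    """POST the payload and return the list of entries (hypothetical helper)."""
    import requests  # imported lazily so building a payload needs no third-party package

    response = requests.post(GRAPHQL_URL, json=build_payload(entry_ids), timeout=timeout)
    response.raise_for_status()
    body = response.json()
    # GraphQL reports query problems in an "errors" key, often with HTTP 200.
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]["entries"]


payload = build_payload(["4hhb", " 1CRN "])
print(json.dumps(payload["variables"]))  # → {"ids": ["4HHB", "1CRN"]}
```

Calling `fetch_entries(["4HHB", "1CRN"])` would then send one network request for both entries. Passing the IDs as GraphQL variables rather than formatting them into the query string keeps the query text constant and avoids quoting mistakes.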
**Example Query:**

```graphql
{
  entry(entry_id: "4HHB") {
    struct {
      title
    }
    exptl {
      method
    }
    rcsb_entry_info {
      resolution_combined
      deposited_atom_count
      polymer_entity_count
    }
    rcsb_accession_info {
      deposit_date
      initial_release_date
    }
  }
}
```

**Python Example:**

```python
import requests

query = """
{
  polymer_entity(entity_id: "4HHB_1") {
    rcsb_polymer_entity {
      pdbx_description
      formula_weight
    }
    entity_poly {
      pdbx_seq_one_letter_code
      pdbx_strand_id
    }
    rcsb_entity_source_organism {
      ncbi_taxonomy_id
      scientific_name
    }
  }
}
"""

response = requests.post(
    "https://data.rcsb.org/graphql",
    json={"query": query}
)
response.raise_for_status()  # Fail early on HTTP errors
data = response.json()
```

### Common Data Fields

**Entry Level:**

- `struct.title` - Structure title/description
- `exptl[].method` - Experimental method (X-RAY DIFFRACTION, NMR, ELECTRON MICROSCOPY, etc.)
- `rcsb_entry_info.resolution_combined` - Resolution in Ångströms
- `rcsb_entry_info.deposited_atom_count` - Total number of atoms
- `rcsb_accession_info.deposit_date` - Deposition date
- `rcsb_accession_info.initial_release_date` - Release date

**Polymer Entity Level:**

- `entity_poly.pdbx_seq_one_letter_code` - Primary sequence
- `rcsb_polymer_entity.formula_weight` - Molecular weight (in kDa)
- `rcsb_entity_source_organism.scientific_name` - Source organism
- `rcsb_entity_source_organism.ncbi_taxonomy_id` - NCBI taxonomy ID

**Assembly Level:**

- `rcsb_assembly_info.polymer_entity_count` - Number of polymer entities
- `rcsb_assembly_info.assembly_id` - Assembly identifier

## Search API

### Query Types

The Search API supports seven primary query types:

1. **TextQuery** - Full-text search
2. **AttributeQuery** - Property-based search
3. **SequenceQuery** - Sequence similarity search
4. **SequenceMotifQuery** - Motif pattern search
5. **StructSimilarityQuery** - 3D structure similarity
6. **StructMotifQuery** - Structural motif search
7. **ChemSimilarityQuery** - Chemical similarity search

### AttributeQuery Operators

Available operators for AttributeQuery:

- `exact_match` - Exact string match
- `contains_words` - Contains all words
- `contains_phrase` - Contains exact phrase
- `equals` - Numerical equality
- `greater` - Greater than (numerical)
- `greater_or_equal` - Greater than or equal
- `less` - Less than (numerical)
- `less_or_equal` - Less than or equal
- `range` - Numerical range (closed interval)
- `exists` - Field has a value
- `in` - Value in list

### Common Searchable Attributes

**Resolution and Quality:**

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entry_info

# High-resolution structures
query = AttributeQuery(
    attribute=rcsb_entry_info.resolution_combined,
    operator="less",
    value=2.0
)
```

**Experimental Method:**

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import exptl

query = AttributeQuery(
    attribute=exptl.method,
    operator="exact_match",
    value="X-RAY DIFFRACTION"
)
```

**Organism:**

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism

query = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
```

**Molecular Weight:**

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_polymer_entity

query = AttributeQuery(
    attribute=rcsb_polymer_entity.formula_weight,
    operator="range",
    value=(10, 50)  # 10-50 kDa; formula_weight is reported in kDa
)
```

**Release Date:**

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_accession_info

# Structures released in 2024
query = AttributeQuery(
    attribute=rcsb_accession_info.initial_release_date,
    operator="range",
    value=("2024-01-01", "2024-12-31")
)
```

### Sequence Similarity Search

Search for structures with similar sequences using MMseqs2:

```python
from rcsbapi.search import SequenceQuery

# Basic sequence search
qu