
medchem

Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering.

Free, risk: low

Tags: medchem, python, chembl, pandas, rdkit

Tools: medchem, datamol, pandas, rdkit, tqdm

The full skill

---
name: medchem
description: Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering.
license: Apache-2.0 license
metadata:
    skill-author: K-Dense Inc.
---

# Medchem

## Overview

Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. Apply hundreds of well-established and novel molecular filters, structural alerts, and medicinal chemistry rules to efficiently triage and prioritize compound libraries at scale. Rules and filters are context-specific; use them as guidelines combined with domain expertise.

## When to Use This Skill

This skill should be used when:

- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
- Filtering molecules by structural alerts or PAINS patterns
- Prioritizing compounds for lead optimization
- Assessing compound quality and medicinal chemistry properties
- Detecting reactive or problematic functional groups
- Calculating molecular complexity metrics

## Installation

```bash
uv pip install medchem
```

## Core Capabilities

### 1. Medicinal Chemistry Rules

Apply established drug-likeness rules to molecules using the `medchem.rules` module.

**Available Rules:**

- Rule of Five (Lipinski)
- Rule of Oprea
- Rule of CNS
- Rule of leadlike (soft and strict)
- Rule of three
- Rule of Reos
- Rule of drug
- Rule of Veber
- Golden triangle
- PAINS filters

**Single Rule Application:**

```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```

**Multiple Rules with RuleFilters:**

```python
import datamol as dm
import medchem as mc

# Example input: any list of SMILES strings
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "c1ccccc1"]

# Load molecules
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft"
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True
)
```

**Result Format:** Results are returned as dictionaries with pass/fail status and detailed information for each rule.

### 2. Structural Alert Filters

Detect potentially problematic structural patterns using the `medchem.structural` module.

**Available Filters:**

1. **Common Alerts** - General structural alerts derived from ChEMBL curation and literature
2. **NIBR Filters** - Novartis Institutes for BioMedical Research filter set
3. **Lilly Demerits** - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)

**Common Alerts:**

```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization
# (mol_list is a list of RDKit molecules, e.g. built with dm.to_mol)
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True
)
```

**NIBR Filters:**

```python
import medchem as mc

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```

**Lilly Demerits:**

```python
import medchem as mc

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)

# Each result includes demerit score and whether it passes (≤100 demerits)
```

### 3. Functional API for High-Level Operations

The `medchem.functional` module provides convenient functions for common workflows.

**Quick Filtering:**

```python
import medchem as mc

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1
)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1
)
```

### 4. Chemical Groups Detection

Identify specific chemical groups and functional groups using `medchem.groups`.

**Available Groups:**

- Hinge binders
- Phosphate binders
- Michael acceptors
- Reactive groups
- Custom SMARTS patterns

**Usage:**

```python
import medchem as mc

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
# …
```

# Medchem API Reference

Comprehensive reference for all medchem modules and functions.

## Module: medchem.rules

### Class: RuleFilters

Filter molecules based on multiple medicinal chemistry rules.

**Constructor:**

```python
RuleFilters(rule_list: List[str])
```

**Parameters:**

- `rule_list`: List of rule names to apply. See available rules below.

**Methods:**

```python
__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> Dict
```

- `mols`: List of RDKit molecule objects
- `n_jobs`: Number of parallel jobs (-1 uses all cores)
- `progress`: Show progress bar
- **Returns**: Dictionary with results for each rule

**Example:**

```python
rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_cns"])
# …
```

```python
#!/usr/bin/env python3
"""
Batch molecular filtering using medchem library.

This script provides a production-ready workflow for filtering compound libraries
using medchem rules, structural alerts, and custom constraints.

Usage:
    python filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
    python filter_molecules.py input.sdf --rules rule_of_drug --lilly --complexity 400 --output results.csv
    python filter_molecules.py smiles.txt --nibr --pains --n-jobs -1 --output clean.csv
"""

import argparse
import sys
from pathlib import Path
from typing import List, Dict, Optional, Tuple
import json

try:
    import pandas as pd
    import datamol as dm
    import medchem as mc
    from rdkit import Chem
    from tqdm import tqdm
except ImportError as e:
    print(f"Error: Missing required package: {e}")
    print("Install dependencies: pip install medchem datamol pandas tqdm")
    sys.exit(1)


def load_molecules(input_file: Path, smiles_column: str = "smiles") -> Tuple[pd.DataFrame, List[Chem.Mol]]:
    """
    Load molecules from various file formats.

    Supports:
    - CSV/TSV with SMILES column
    - SDF files
    - Plain text files with one SMILES per line

    Returns:
        Tuple of (DataFrame with metadata, list of RDKit molecules)
    """
    suffix = input_file.suffix.lower()

    if suffix == ".sdf":
        print(f"Loading SDF file: {input_file}")
        supplier = Chem.SDMolSupplier(str(input_file))
        mols = [mol for mol in supplier if mol is not None]

        # Create DataFrame from SDF properties
        data = []
        for mol in mols:
            props = mol.GetPropsAsDict()
            props["smiles"] = Chem.MolToSmiles(mol)
            data.append(props)
        df = pd.DataFrame(data)

    elif suffix in [".csv", ".tsv"]:
        print(f"Loading CSV/TSV file: {input_file}")
        sep = "\t" if suffix == ".tsv" else ","
        df = pd.read_csv(input_file, sep=sep)

        if smiles_column not in df.columns:
            print(f"Error: Column '{smiles_column}' not found in file")
            print(f"Available columns: {', '.join(df.columns)}")
            sys.exit(1)

        print("Converting SMILES to molecules...")
        mols = [dm.to_mol(smi) for smi in tqdm(df[smiles_column], desc="Parsing")]

    elif suffix == ".txt":
        print(f"Loading text file: {input_file}")
        with open(input_file) as f:
            smiles_list = [line.strip() for line in f if line.strip()]

        df = pd.DataFrame({"smiles": smiles_list})
        print("Converting SMILES to molecules...")
        mols = [dm.to_mol(smi) for smi in tqdm(smiles_list, desc="Parsing")]

    else:
        print(f"Error: Unsupported file format: {suffix}")
        print("Supported formats: .csv, .tsv, .sdf, .txt")
        sys.exit(1)

    # Filter out invalid molecules
    valid_indices = [i for i, mol in enumerate(mols) if mol is not None]
    if len(valid_indices) < len(mols):
        n_invalid = len(mols) - len(valid_indices)
        print(f"Warning: {n_invalid} invalid molecules removed")
        df = df.iloc[valid_indices].reset_index(drop=True)
        mols = [mols[i] for i in valid_indices]

    print(f"Loaded {len(mols)} valid molecules")
    return df, mols


def apply_rule_filters(mols: List[Chem.Mol], rules: List[str], n_jobs: int) -> pd.DataFrame:
    """Apply medicinal chemistry rule filters."""
    print(f"\nApplying rule filters: {', '.join(rules)}")

    rfilter = mc.rules.RuleFilters(rule_list=rules)
    results = rfilter(mols=mols, n_jobs=n_jobs, progress=True)

    # Convert to DataFrame
    df_results = pd.DataFrame(results)

    # Add summary column
    df_results["passes_all_rules"] = df_results.all(axis=1)

    return df_results


def apply_structural_alerts(mols: List[Chem.Mol], alert_type: str, n_jobs: int) -> pd.DataFrame:
    """Apply structural alert filters."""
    print(f"\nApplying {alert_type} structural alerts...")

    if alert_type == "common":
        alert_filter = mc.structural.CommonAlertsFilters()
        results = alert_filter(mols=mols, n_jobs=n_jobs, progress=True)

        df_results = pd.DataFrame({
            "has_common_alerts": [r["has_alerts"] for r in results],
            "num_common_alerts": [r["num_alerts"] for r in results],
            "common_alert_details": [", ".join(r["alert_details"]) for r in results],
        })
    # …
```

# Medchem Rules and Filters Catalog

Comprehensive catalog of all available medicinal chemistry rules, structural alerts, and filters in medchem.

## Table of Contents

1. [Drug-Likeness Rules](#drug-likeness-rules)
2. [Lead-Likeness Rules](#lead-likeness-rules)
3. [Fragment Rules](#fragment-rules)
4. [CNS Rules](#cns-rules)
5. [Structural Alert Filters](#structural-alert-filters)
6. [Chemical Group Patterns](#chemical-group-patterns)

---

## Drug-Likeness Rules

### Rule of Five (Lipinski)

**Reference:** Lipinski et al., Adv Drug Deliv Rev (1997) 23:3-25

**Purpose:** Predict oral bioavailability

**Criteria:**

- Molecular Weight ≤ 500 Da
- LogP ≤ 5
- Hydrogen Bond Donors ≤ 5
- Hydrogen Bond Acceptors ≤ 10

**Usage:**

```python
mc.rules.basic_rules.rule_of_five(mol)
```

**Notes:**

- One of the most widely used filters in drug discovery
- About 90% of orally active drugs comply with these rules
- Exceptions exist, especially for natural products and antibiotics

---

### Rule of Veber

**Reference:** Veber et al., J Med Chem (2002) 45:2615-2623

**Purpose:** Additional criteria for oral bioavailability

**Criteria:**

- Rotatable Bonds ≤ 10
- Topological Polar Surface Area (TPSA) ≤ 140 Å²

**Usage:**

```python
mc.rules.basic_rules.rule_of_veber(mol)
```

**Notes:**

- Complements Rule of Five
- TPSA correlates with cell permeability
- Rotatable bonds affect molecular flexibility

---
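
**Worked example (illustrative):**

To make the Rule of Five and Rule of Veber criteria listed above concrete, here is a minimal sketch that checks them against precomputed descriptor values. It does not call medchem: `DescriptorProfile`, `passes_rule_of_five`, and `passes_veber` are hypothetical names introduced only for this example, and the sketch assumes a strict reading in which every listed threshold must hold. The aspirin values are approximate literature figures. In practice the descriptors would come from RDKit or medchem, whose definitions (for example, how acceptors are counted) may differ slightly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorProfile:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area in Å²


def passes_rule_of_five(p: DescriptorProfile) -> bool:
    """Strict Rule of Five: all four thresholds from the catalog must hold."""
    return (
        p.mol_weight <= 500
        and p.logp <= 5
        and p.h_bond_donors <= 5
        and p.h_bond_acceptors <= 10
    )


def passes_veber(p: DescriptorProfile) -> bool:
    """Rule of Veber: at most 10 rotatable bonds and TPSA of at most 140 Å²."""
    return p.rotatable_bonds <= 10 and p.tpsa <= 140


# Approximate descriptor values for aspirin (assumed, not computed here)
aspirin = DescriptorProfile(
    mol_weight=180.16,
    logp=1.2,
    h_bond_donors=1,
    h_bond_acceptors=4,
    rotatable_bonds=3,
    tpsa=63.6,
)

print(passes_rule_of_five(aspirin), passes_veber(aspirin))  # prints: True True
```

Keeping the two checks as separate functions mirrors how the catalog treats them as complementary rules: a compound can satisfy the Rule of Five while still failing Veber on flexibility or polar surface area.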