medchem
Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering.
Tools: medchem,datamol,pandas,rdkit,tqdm
---
name: medchem
description: Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering.
license: Apache-2.0 license
metadata:
  skill-author: K-Dense Inc.
---
# Medchem
## Overview
Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. Apply hundreds of well-established and novel molecular filters, structural alerts, and medicinal chemistry rules to efficiently triage and prioritize compound libraries at scale. Rules and filters are context-specific: use them as guidelines combined with domain expertise.
## When to Use This Skill
This skill should be used when:
- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
- Filtering molecules by structural alerts or PAINS patterns
- Prioritizing compounds for lead optimization
- Assessing compound quality and medicinal chemistry properties
- Detecting reactive or problematic functional groups
- Calculating molecular complexity metrics
## Installation
```bash
uv pip install medchem
```
## Core Capabilities
### 1. Medicinal Chemistry Rules
Apply established drug-likeness rules to molecules using the `medchem.rules` module.
**Available Rules:**
- Rule of Five (Lipinski)
- Rule of Oprea
- Rule of CNS
- Rule of leadlike (soft and strict)
- Rule of three
- Rule of Reos
- Rule of drug
- Rule of Veber
- Golden triangle
- PAINS filters
**Single Rule Application:**
```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```
**Multiple Rules with RuleFilters:**
```python
import datamol as dm
import medchem as mc

# Example input: any list of SMILES strings (aspirin, caffeine)
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]

# Load molecules
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft",
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True,
)
```
**Result Format:**
Results are returned as dictionaries with pass/fail status and detailed information for each rule.
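To make the idea of combining several rules concrete, here is a minimal pure-Python sketch that does not use medchem. It assumes each molecule has already been reduced to a dict of precomputed descriptors (the keys `mw`, `logp`, `hbd`, `hba`, `rotb`, `tpsa` are hypothetical names) and that each rule is a predicate over that dict. It mirrors the shape of per-rule pass/fail output plus an overall flag; it is not medchem's actual return type.

```python
from typing import Callable, Dict, List

# Hypothetical descriptor record: precomputed properties for one molecule.
Descriptors = Dict[str, float]
Rule = Callable[[Descriptors], bool]

# Hypothetical rule registry keyed by name, echoing medchem's rule names.
RULES: Dict[str, Rule] = {
    "rule_of_five": lambda d: (
        d["mw"] <= 500 and d["logp"] <= 5 and d["hbd"] <= 5 and d["hba"] <= 10
    ),
    "rule_of_veber": lambda d: d["rotb"] <= 10 and d["tpsa"] <= 140,
}


def apply_rules(records: List[Descriptors], rule_list: List[str]) -> List[Dict[str, bool]]:
    """Return one dict per molecule: a pass/fail flag per rule plus 'pass_all'."""
    unknown = [name for name in rule_list if name not in RULES]
    if unknown:
        raise ValueError(f"Unknown rules: {unknown}")
    results = []
    for record in records:
        row = {name: RULES[name](record) for name in rule_list}
        row["pass_all"] = all(row[name] for name in rule_list)
        results.append(row)
    return results


# Two hypothetical molecules; the values are illustrative, not measured.
records = [
    {"mw": 180.2, "logp": 1.3, "hbd": 1, "hba": 4, "rotb": 3, "tpsa": 63.6},
    {"mw": 620.0, "logp": 6.1, "hbd": 2, "hba": 9, "rotb": 12, "tpsa": 150.0},
]
for row in apply_rules(records, ["rule_of_five", "rule_of_veber"]):
    print(row)
```

Rejecting unknown rule names up front keeps a typo in `rule_list` from silently producing an empty result.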
### 2. Structural Alert Filters
Detect potentially problematic structural patterns using the `medchem.structural` module.
**Available Filters:**
1. **Common Alerts** – General structural alerts derived from ChEMBL curation and literature
2. **NIBR Filters** – Novartis Institutes for BioMedical Research filter set
3. **Lilly Demerits** – Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)
**Common Alerts:**
```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization
mol_list = [dm.to_mol(s) for s in ["CCO", "c1ccccc1O"]]  # example input
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True,
)
```
**NIBR Filters:**
```python
import datamol as dm
import medchem as mc

mol_list = [dm.to_mol(s) for s in ["CCO", "c1ccccc1O"]]  # example input

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```
**Lilly Demerits:**
```python
import datamol as dm
import medchem as mc

mol_list = [dm.to_mol(s) for s in ["CCO", "c1ccccc1O"]]  # example input

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)
# Each result includes the demerit score and whether it passes (≤100 demerits)
```
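The demerit idea itself is simple to sketch: each matched alert contributes a penalty, the penalties are summed, and a molecule is rejected once the total exceeds the cutoff (100 in the description above). The snippet below assumes pattern matching has already happened and uses invented alert names and penalty values; it does not reproduce Lilly's 275 rules or medchem's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical demerit table: alert name -> penalty. Real Lilly rules differ.
DEMERITS: Dict[str, int] = {
    "hypothetical_reactive_group": 100,
    "hypothetical_moderate_alert": 50,
    "hypothetical_weak_alert": 20,
}

DEMERIT_CUTOFF = 100  # reject when the total is strictly greater than this


@dataclass
class DemeritResult:
    matched: List[str]
    total: int
    passes: bool


def score_demerits(matched_alerts: List[str]) -> DemeritResult:
    """Sum penalties for already-matched alerts (in this sketch, repeats count each time)."""
    total = sum(DEMERITS[name] for name in matched_alerts)
    return DemeritResult(
        matched=list(matched_alerts),
        total=total,
        passes=total <= DEMERIT_CUTOFF,
    )


print(score_demerits(["hypothetical_weak_alert", "hypothetical_moderate_alert"]))
print(score_demerits(["hypothetical_reactive_group", "hypothetical_weak_alert"]))
```

A single severe alert can reach the cutoff on its own, while several mild alerts only reject a molecule in combination, which is what distinguishes a demerit system from a plain pass/fail alert list.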
### 3. Functional API for High-Level Operations
The `medchem.functional` module provides convenient functions for common workflows.
**Quick Filtering:**
```python
import datamol as dm
import medchem as mc

mol_list = [dm.to_mol(s) for s in ["CCO", "c1ccccc1O"]]  # example input

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1,
)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1,
)
```
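If each functional call returns one boolean per input molecule (an assumption here; check the return type of the installed version), several filters can be combined with a logical AND and used to select the surviving molecules. The sketch below uses SMILES strings in place of molecules and hand-written lists in place of filter output, so it runs without medchem.

```python
from itertools import compress
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def combine_masks(*masks: Sequence[bool]) -> List[bool]:
    """Element-wise AND of equally long boolean masks."""
    if not masks:
        raise ValueError("at least one mask is required")
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same length")
    return [all(flags) for flags in zip(*masks)]


def select(items: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Keep the items whose mask entry is True."""
    return list(compress(items, mask))


# Stand-ins: SMILES strings for molecules, hand-written masks for filter output.
smiles = ["CCO", "c1ccccc1", "CC(=O)Cl", "CCN"]
nibr_ok = [True, True, False, True]     # hypothetical nibr_filter output
alerts_ok = [True, False, False, True]  # hypothetical common-alerts output

keep = combine_masks(nibr_ok, alerts_ok)
print(select(smiles, keep))  # ['CCO', 'CCN']
```

The length check matters in practice: if invalid molecules were dropped before one filter but not another, the masks no longer line up with the input list.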
### 4. Chemical Groups Detection
Identify specific chemical groups and functional groups using `medchem.groups`.
**Available Groups:**
- Hinge binders
- Phosphate binders
- Michael acceptors
- Reactive groups
- Custom SMARTS patterns
**Usage:**
```python
import medchem as mc

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
```

# Medchem API Reference
Comprehensive reference for all medchem modules and functions.
## Module: medchem.rules
### Class: RuleFilters
Filter molecules based on multiple medicinal chemistry rules.
**Constructor:**
```python
RuleFilters(rule_list: List[str])
```
**Parameters:**
- `rule_list`: List of rule names to apply. See available rules below.

**Methods:**
```python
__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> Dict
```
- `mols`: List of RDKit molecule objects
- `n_jobs`: Number of parallel jobs (-1 uses all cores)
- `progress`: Show progress bar
- **Returns**: Dictionary with results for each rule
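The `n_jobs` convention (`-1` for all cores) is common across the Python scientific stack. The sketch below shows one way such a parameter can be resolved and used to map a per-molecule check over a list with the standard library. It is a generic illustration, not medchem's internal scheduler, and `check` is a hypothetical stand-in for a real rule.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def resolve_n_jobs(n_jobs: int) -> int:
    """Map n_jobs to a worker count: -1 = all CPUs, -2 = all but one, etc."""
    cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs must be non-zero")
    if n_jobs < 0:
        return max(1, cpus + 1 + n_jobs)
    return n_jobs


def parallel_map(func: Callable[[T], R], items: Sequence[T], n_jobs: int = 1) -> List[R]:
    """Apply func to every item, preserving input order."""
    workers = resolve_n_jobs(n_jobs)
    if workers == 1:
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def check(smiles: str) -> bool:
    # Hypothetical per-molecule check: a stand-in for a real rule.
    return len(smiles) <= 10


print(parallel_map(check, ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O"], n_jobs=-1))
```

Threads keep the sketch short and portable; CPU-bound work written in pure Python usually benefits more from a process pool, at the cost of pickling each molecule.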
**Example:**
```python
import medchem as mc

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_cns"])
```

```python
#!/usr/bin/env python3
"""
Batch molecular filtering using medchem library.

This script provides a production-ready workflow for filtering compound libraries
using medchem rules, structural alerts, and custom constraints.

Usage:
    python filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
    python filter_molecules.py input.sdf --rules rule_of_drug --lilly --complexity 400 --output results.csv
    python filter_molecules.py smiles.txt --nibr --pains --n-jobs -1 --output clean.csv
"""

import argparse
import sys
from pathlib import Path
from typing import List, Dict, Optional, Tuple
import json

try:
    import pandas as pd
    import datamol as dm
    import medchem as mc
    from rdkit import Chem
    from tqdm import tqdm
except ImportError as e:
    print(f"Error: Missing required package: {e}")
    print("Install dependencies: pip install medchem datamol pandas tqdm")
    sys.exit(1)


def load_molecules(input_file: Path, smiles_column: str = "smiles") -> Tuple[pd.DataFrame, List[Chem.Mol]]:
    """
    Load molecules from various file formats.

    Supports:
    - CSV/TSV with SMILES column
    - SDF files
    - Plain text files with one SMILES per line

    Returns:
        Tuple of (DataFrame with metadata, list of RDKit molecules)
    """
    suffix = input_file.suffix.lower()

    if suffix == ".sdf":
        print(f"Loading SDF file: {input_file}")
        supplier = Chem.SDMolSupplier(str(input_file))
        mols = [mol for mol in supplier if mol is not None]

        # Create DataFrame from SDF properties
        data = []
        for mol in mols:
            props = mol.GetPropsAsDict()
            props["smiles"] = Chem.MolToSmiles(mol)
            data.append(props)
        df = pd.DataFrame(data)

    elif suffix in [".csv", ".tsv"]:
        print(f"Loading CSV/TSV file: {input_file}")
        sep = "\t" if suffix == ".tsv" else ","
        df = pd.read_csv(input_file, sep=sep)

        if smiles_column not in df.columns:
            print(f"Error: Column '{smiles_column}' not found in file")
            print(f"Available columns: {', '.join(df.columns)}")
            sys.exit(1)

        print("Converting SMILES to molecules...")
        mols = [dm.to_mol(smi) for smi in tqdm(df[smiles_column], desc="Parsing")]

    elif suffix == ".txt":
        print(f"Loading text file: {input_file}")
        with open(input_file) as f:
            smiles_list = [line.strip() for line in f if line.strip()]
        df = pd.DataFrame({"smiles": smiles_list})

        print("Converting SMILES to molecules...")
        mols = [dm.to_mol(smi) for smi in tqdm(smiles_list, desc="Parsing")]

    else:
        print(f"Error: Unsupported file format: {suffix}")
        print("Supported formats: .csv, .tsv, .sdf, .txt")
        sys.exit(1)

    # Filter out invalid molecules, keeping DataFrame rows aligned with mols
    valid_indices = [i for i, mol in enumerate(mols) if mol is not None]
    if len(valid_indices) < len(mols):
        n_invalid = len(mols) - len(valid_indices)
        print(f"Warning: {n_invalid} invalid molecules removed")
        df = df.iloc[valid_indices].reset_index(drop=True)
        mols = [mols[i] for i in valid_indices]

    print(f"Loaded {len(mols)} valid molecules")
    return df, mols


def apply_rule_filters(mols: List[Chem.Mol], rules: List[str], n_jobs: int) -> pd.DataFrame:
    """Apply medicinal chemistry rule filters."""
    print(f"\nApplying rule filters: {', '.join(rules)}")

    rfilter = mc.rules.RuleFilters(rule_list=rules)
    results = rfilter(mols=mols, n_jobs=n_jobs, progress=True)

    # Convert to DataFrame
    df_results = pd.DataFrame(results)

    # Add summary column
    df_results["passes_all_rules"] = df_results.all(axis=1)

    return df_results


def apply_structural_alerts(mols: List[Chem.Mol], alert_type: str, n_jobs: int) -> pd.DataFrame:
    """Apply structural alert filters."""
    print(f"\nApplying {alert_type} structural alerts...")

    if alert_type == "common":
        alert_filter = mc.structural.CommonAlertsFilters()
        results = alert_filter(mols=mols, n_jobs=n_jobs, progress=True)

        df_results = pd.DataFrame({
            "has_common_alerts": [r["has_alerts"] for r in results],
            "num_common_alerts": [r["num_alerts"] for r in results],
            "common_alert_details": [", ".join(r["alert_details
```

# Medchem Rules and Filters Catalog
Comprehensive catalog of all available medicinal chemistry rules, structural alerts, and filters in medchem.
## Table of Contents
1. [Drug-Likeness Rules](#drug-likeness-rules)
2. [Lead-Likeness Rules](#lead-likeness-rules)
3. [Fragment Rules](#fragment-rules)
4. [CNS Rules](#cns-rules)
5. [Structural Alert Filters](#structural-alert-filters)
6. [Chemical Group Patterns](#chemical-group-patterns)
---
## Drug-Likeness Rules
### Rule of Five (Lipinski)
**Reference:** Lipinski et al., Adv Drug Deliv Rev (1997) 23:3-25
**Purpose:** Predict oral bioavailability
**Criteria:**
- Molecular Weight ≤ 500 Da
- LogP ≤ 5
- Hydrogen Bond Donors ≤ 5
- Hydrogen Bond Acceptors ≤ 10
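The four thresholds above translate directly into a check over precomputed descriptors. This is a minimal sketch, assuming molecular weight, logP, and donor/acceptor counts have already been computed elsewhere (for example with RDKit). It counts violations because Lipinski's original paper raised its alert only when two or more parameters were out of range, whereas a strict filter rejects any violation. How medchem's `rule_of_five` handles a single violation is not stated here, so the `max_violations` parameter is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Ro5Descriptors:
    mw: float    # molecular weight, Da
    logp: float  # calculated logP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def ro5_violations(d: Ro5Descriptors) -> int:
    """Count how many of the four Rule of Five criteria are violated."""
    return sum([
        d.mw > 500,
        d.logp > 5,
        d.hbd > 5,
        d.hba > 10,
    ])


def passes_ro5(d: Ro5Descriptors, max_violations: int = 0) -> bool:
    """Strict by default; max_violations=1 mirrors Lipinski's two-violation alert."""
    return ro5_violations(d) <= max_violations


# Illustrative descriptor values, not computed from a real structure.
borderline = Ro5Descriptors(mw=540.0, logp=4.2, hbd=2, hba=8)
print(ro5_violations(borderline))                # 1
print(passes_ro5(borderline))                    # False
print(passes_ro5(borderline, max_violations=1))  # True
```

Values exactly at a threshold (for example MW of 500 Da) pass, matching the "≤" in the criteria.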
**Usage:**
```python
mc.rules.basic_rules.rule_of_five(mol)
```
**Notes:**
- One of the most widely used filters in drug discovery
- About 90% of orally active drugs comply with these rules
- Exceptions exist, especially for natural products and antibiotics
---
### Rule of Veber
**Reference:** Veber et al., J Med Chem (2002) 45:2615-2623
**Purpose:** Additional criteria for oral bioavailability
**Criteria:**
- Rotatable Bonds ≤ 10
- Topological Polar Surface Area (TPSA) ≤ 140 Å²
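The Veber criteria can be sketched the same way, assuming the rotatable-bond count and TPSA (in Å²) are precomputed. The inclusive comparisons follow the thresholds listed above; the example values are illustrative rather than taken from real compounds.

```python
def passes_veber(rotatable_bonds: int, tpsa: float) -> bool:
    """Veber criteria: at most 10 rotatable bonds and a TPSA of at most 140 Å²."""
    return rotatable_bonds <= 10 and tpsa <= 140.0


# Illustrative (rotatable bonds, TPSA) pairs.
examples = {
    "compact": (3, 63.6),
    "flexible": (14, 90.0),
    "polar": (6, 155.0),
}
for name, (rotb, tpsa) in examples.items():
    print(name, passes_veber(rotb, tpsa))
```

Because the two criteria are independent, a molecule can fail on flexibility alone ("flexible") or on polarity alone ("polar").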
**Usage:**
```python
mc.rules.basic_rules.rule_of_veber(mol)
```
**Notes:**
- Complements Rule of Five
- TPSA correlates with cell permeability
- Rotatable bonds affect molecular flexibility
---