API Reference
This page contains the complete API reference for the DOCKTOPUS package.
Main Classes
Docking
- class docktopus.docking.Docking(engine, work_dir, model_runner_class=None, **engine_params)
Bases: object
Main orchestrator class for molecular docking workflows.
This class provides a unified interface for running molecular docking simulations using various docking engines. It handles input preparation, docking execution, and results processing in a streamlined workflow.
The class supports multiple docking engines:
- GNINA: Deep learning-based docking with CNN scoring
- Vina: Traditional molecular docking with empirical scoring
- GalaxyDock2 HEME: Specialized docking for heme-containing proteins
- RFAA: RoseTTAFold All-Atom protein-ligand structure prediction
- Parameters:
engine (str)
work_dir (str)
- work_dir
Base directory for all workflow outputs
- Type:
Path
- logger
Logger instance for workflow events
- Type:
logging.Logger
- preprocessor
Instance for molecular preparation
- Type:
DataPreprocessor
- engine
Docking engine instance (type depends on engine parameter)
- Example Usage:
>>> from docktopus import Docking
>>> # Initialize Vina docking
>>> dock = Docking(
...     engine="vina",
...     work_dir="./test-data",
...     box_center=[54.426, 78.117, 10.330],
...     box_size=[15, 15, 15],
...     seed=1000,
...     cpu=4,
...     exhaustiveness=8,
...     num_modes=3
... )
>>> # Prepare ligands from SMILES (raw string, because the SMILES contains a backslash)
>>> smiles = [r"CCNC(=O)c1ccc2c(c1)NC(=O)/C2=C(\Nc1ccc(CN(C)C)cc1)c1ccccc1"]
>>> ligands = dock.prepare_ligands(smiles)
>>> # Prepare receptor
>>> receptor = "test-data/1W0F-cyp.pdb"
>>> target = dock.prepare_receptor(receptor_pdb=receptor)
>>> # Run docking
>>> results = dock.dock(
...     receptor=target,
...     ligands=ligands,
...     smiles=smiles
... )
- __init__(engine, work_dir, model_runner_class=None, **engine_params)
Initialize the docking workflow orchestrator.
- Parameters:
engine (str) – Name of docking engine to use. Supported values:
- 'gnina': GNINA deep learning docking engine
- 'vina': AutoDock Vina docking engine
- 'galaxydock2-heme': GalaxyDock2 HEME specialized engine
- 'rfaa': RoseTTAFold All-Atom (RFAA) engine
work_dir (str) – Base directory for all workflow outputs. Will be created if it doesn’t exist.
model_runner_class – ModelRunner class for the RFAA engine (required if engine='rfaa')
**engine_params – Additional parameters passed to the specific docking engine. Common parameters for non-RFAA engines include:
- box_center: (x,y,z) coordinates of docking box center
- box_size: (x,y,z) dimensions of docking box
- exhaustiveness: Search exhaustiveness (higher = more thorough)
- num_modes: Number of binding modes to generate
- cpu: Number of CPU cores to use
- seed: Random seed for reproducibility
Engine-specific requirements:
- GNINA: requires the path to the gnina executable.
- GalaxyDock2 HEME: requires the path to the driver script.
- RFAA: RoseTTAFold All-Atom requires a ModelRunner object, which you need to import directly from the rf2aa.run_inference module shipped with RFAA. If you intend to use RFAA, your driver script should be in the top directory of the RFAA repository.
- Raises:
ValueError – If unsupported engine is specified or required parameters are missing
FileNotFoundError – If required executables or files are not found
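The engine requirements listed for __init__ can be checked before any work starts. The following is a minimal sketch, not DOCKTOPUS's own code: REQUIRED_PARAMS and check_engine_config are hypothetical names, and the required-parameter sets (gnina_path, gdock_dir, model_runner_class) are assumptions drawn from the engine constructors documented on this page.

```python
# Hypothetical pre-flight check; names and rules are illustrative assumptions,
# not part of the docktopus package.
REQUIRED_PARAMS = {
    "gnina": {"gnina_path"},
    "vina": set(),
    "galaxydock2-heme": {"gdock_dir"},
    "rfaa": {"model_runner_class"},
}


def check_engine_config(engine, **params):
    """Raise ValueError for an unknown engine or a missing required parameter."""
    if engine not in REQUIRED_PARAMS:
        supported = ", ".join(sorted(REQUIRED_PARAMS))
        raise ValueError(f"Unsupported engine {engine!r}; expected one of: {supported}")
    # A parameter counts as missing when it is absent or explicitly None.
    missing = sorted(name for name in REQUIRED_PARAMS[engine] if params.get(name) is None)
    if missing:
        raise ValueError(f"Engine {engine!r} is missing: {', '.join(missing)}")
    return True
```

Failing early with ValueError matches the Raises section above and avoids creating output directories for a run that cannot start.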
Example
>>> # Initialize GNINA docking
>>> docking = Docking(
...     engine='gnina',
...     work_dir='./docking_results',
...     box_center=(10.0, 20.0, 30.0),
...     box_size=(20.0, 20.0, 20.0),
...     gnina_path="/home/username/gnina"
... )
>>> # Initialize RFAA docking
>>> from rf2aa.run_inference import ModelRunner
>>> docking = Docking(
...     engine='rfaa',
...     work_dir='./rfaa_results',
...     model_runner_class=ModelRunner
... )
- dock(ligands, receptor, smiles=None, **kwargs)
Orchestrate docking for single or multiple ligands.
This method automatically determines whether to run single or batch docking based on the input type. It performs pre-docking checks and handles both file-based and SMILES-based ligand inputs. The method assumes the provided ligand and receptor files are already protonated at the correct pH. To prepare them, use the prepare_ligands and prepare_receptor methods.
- Parameters:
ligands – Ligand input specification. Can be:
- str: Path to single ligand file
- list: List of ligand file paths for batch docking
- str: SMILES string (single molecule)
- list: List of SMILES strings (batch processing)
receptor (str) – Path to receptor structure file
smiles (Optional[str]) – SMILES string corresponding to the ligand(s). Required for RFAA engine, optional for others.
**kwargs – Additional arguments passed to dock_single or dock_many methods. Common parameters include:
- output_prefix: Optional prefix for output files
- prepare_inputs: Whether to preprocess input files (default: True)
- Returns:
Docking results. For single ligand: dictionary containing scores and output file paths. For multiple ligands: list of result dictionaries.
- Return type:
Union[Dict, List[Dict]]
- Raises:
ValueError – If ligands parameter is not a string or list
FileNotFoundError – If input files are not found
RuntimeError – If docking engine fails
Example
>>> # Single ligand docking
>>> result = docking.dock(
...     ligands='ligand.sdf',
...     receptor='protein.pdb'
... )
>>> print(f"Docking metrics: {result}")
>>> # Batch docking with multiple ligands
>>> results = docking.dock(
...     ligands=['lig1.sdf', 'lig2.sdf', 'lig3.sdf'],
...     receptor='protein.pdb'
... )
>>> for i, result in enumerate(results):
...     print(f"Ligand {i+1}: {result['scores'][0]['affinity']}")
>>> # SMILES-based docking
>>> result = docking.dock(
...     ligands='CC(=O)OC1=CC=CC=C1C(=O)O',
...     receptor='protein.pdb',
...     smiles='CC(=O)OC1=CC=CC=C1C(=O)O'
... )
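The choice between single and batch docking described for dock() comes down to the type of the ligands argument. A minimal sketch of that dispatch rule follows; choose_mode is a hypothetical helper written for illustration and may differ from how the package decides internally.

```python
def choose_mode(ligands):
    """Return 'single' for one ligand (str) and 'batch' for a list of ligands."""
    if isinstance(ligands, str):
        return "single"
    if isinstance(ligands, list):
        # Every entry must itself be a string (file path or SMILES).
        if not all(isinstance(item, str) for item in ligands):
            raise ValueError("every ligand must be a file path or SMILES string")
        return "batch"
    # Mirrors the documented ValueError for inputs that are neither str nor list.
    raise ValueError(f"ligands must be str or list, got {type(ligands).__name__}")
```

Checking for str before list matters only for readability here, but it keeps the single-ligand case (the common one) on the first branch.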
- dock_many(receptor_file, ligand_files, smiles, box_center=None, box_size=(30.0, 30.0, 30.0))
Perform docking for multiple receptor-ligand pairs.
This method executes batch docking for multiple ligands against a single receptor. It processes each ligand individually and collects results, with error handling to ensure that failures of individual ligands don’t stop the entire batch.
- Parameters:
receptor_file (str) – Path to receptor structure file
ligand_files (List[str]) – List of paths to ligand structure files
smiles (Optional[str]) – SMILES string corresponding to the ligands. Required for RFAA engine, optional for others.
box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
- Returns:
List of docking result dictionaries. Each dictionary has the same structure as returned by dock_single(). Failed dockings will have an “error” key with the error message.
- Return type:
List[Dict]
- Raises:
FileNotFoundError – If receptor file is not found
ValueError – If ligand_files is not a list
Example
>>> ligand_files = ['lig1.sdf', 'lig2.sdf', 'lig3.sdf']
>>> results = docking.dock_many(
...     receptor_file='protein.pdb',
...     ligand_files=ligand_files,
...     smiles=None  # positional in the signature; only RFAA needs a value
... )
>>> for i, result in enumerate(results):
...     if 'error' in result:
...         print(f"Ligand {i+1} failed: {result['error']}")
...     else:
...         print(f"Ligand {i+1} score: {result['scores'][0]['affinity']}")
Note
Each ligand is processed independently
Failed dockings are logged but don’t stop the batch
Results maintain the same order as input ligand_files
Error handling ensures robust batch processing
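The batch behaviour in the note above (independent ligands, failures recorded rather than raised, input order preserved) can be sketched in a few lines. run_batch and fake_dock are hypothetical stand-ins, and the extra ligand_file key in the error entry is an assumption; only the "error" key is documented.

```python
def run_batch(ligand_files, dock_one):
    """Dock each ligand in turn; a failure becomes an 'error' entry, not an exception."""
    results = []
    for ligand_file in ligand_files:
        try:
            results.append(dock_one(ligand_file))
        except Exception as exc:
            # One bad ligand must not stop the batch; record it in place.
            results.append({"ligand_file": ligand_file, "error": str(exc)})
    return results


def fake_dock(path):
    """Stand-in for a real docking call, used only to exercise run_batch."""
    if path == "bad.sdf":
        raise RuntimeError("docking failed")
    return {"output_file": path.replace(".sdf", "_docked.sdf"), "scores": []}


results = run_batch(["lig1.sdf", "bad.sdf", "lig3.sdf"], fake_dock)
```

Because each result is appended in loop order, results[i] always corresponds to ligand_files[i], including for failed entries.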
- dock_single(receptor_file, ligand_file, smiles=None, output_prefix=None, box_center=None, box_size=(30.0, 30.0, 30.0))
Perform docking for a single receptor-ligand pair.
This method executes the actual docking calculation using the configured docking engine. It handles the specific requirements of each engine and returns comprehensive results including scores and output file paths.
- Parameters:
receptor_file (str) – Path to receptor structure file
ligand_file (str) – Path to ligand structure file
smiles (Optional[str]) – SMILES string corresponding to the ligand. Required for RFAA engine, optional for others.
output_prefix (Optional[str]) – Optional prefix for output files. If not provided, uses the ligand filename stem.
box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
- Returns:
- Dictionary containing docking results with the following keys:
output_file: Path to the main output file (docked structure)
log_file: Path to the docking log file
scores: List of dictionaries containing scores for each pose
valid: (RFAA only) Boolean indicating if the result passed validation
metrics: (RFAA only) Additional quality metrics
- Return type:
Dict
- Raises:
FileNotFoundError – If input files are not found
RuntimeError – If docking calculation fails
ValueError – If required parameters are missing
Example
>>> result = docking.dock_single(
...     receptor_file='protein.pdb',
...     ligand_file='ligand.sdf',
...     output_prefix='my_docking'
... )
>>> print(f"Best pose score: {result['scores'][0]}")
>>> print(f"Output structure: {result['output_file']}")
Note
The exact content of the scores list depends on the docking engine:
- GNINA: affinity, intramol, cnn_pose, cnn_affinity
- Vina: affinity
- GalaxyDock2 HEME: Energy
- RFAA: ligand_mean_pae, mean_plddts
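For the Vina and GNINA engines, affinity is reported in kcal/mol, so a more negative value means stronger predicted binding. A small sketch of picking the best pose from a scores list under that convention follows; best_pose is a hypothetical helper and the numbers are made-up illustration values, not docking output.

```python
def best_pose(scores, key="affinity"):
    """Return the pose dictionary with the lowest value under `key`."""
    scored = [pose for pose in scores if key in pose]
    if not scored:
        raise ValueError(f"no pose carries a {key!r} score")
    return min(scored, key=lambda pose: pose[key])


# Illustrative values only.
example_scores = [
    {"pose": 1, "affinity": -7.9},
    {"pose": 2, "affinity": -8.4},
    {"pose": 3, "affinity": -6.1},
]
top = best_pose(example_scores)
```

The key argument exists because other engines report different fields; whether lower is better for those fields should be checked per engine before reusing this helper.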
- prepare_ligands(ligands)
Prepare ligand structures from SMILES strings for docking.
This method generates 3D conformers from SMILES strings and prepares them for docking using the appropriate preprocessor methods. The preparation process depends on the docking engine being used.
- Parameters:
ligands (Union[str, List[str]]) – SMILES string or list of SMILES strings representing the molecules to prepare.
- Returns:
Path(s) to prepared ligand file(s). Returns a single path if input was a single SMILES, or a list of paths if input was a list of SMILES.
- Return type:
Union[str, List[str]]
- Raises:
ValueError – If ligands parameter is not a string or list of strings
RuntimeError – If conformer generation or preparation fails
Example
>>> # Prepare single ligand
>>> prepared_file = docking.prepare_ligands('CC(=O)OC1=CC=CC=C1C(=O)O')
>>> print(f"Prepared ligand saved to: {prepared_file}")
>>> # Prepare multiple ligands
>>> smiles_list = [
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'c1ccccc1',
...     'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O'
... ]
>>> prepared_files = docking.prepare_ligands(smiles_list)
>>> print(f"Prepared {len(prepared_files)} ligands")
Note
The preparation process includes:
1. 3D conformer generation from SMILES using Open Babel
2. Hydrogen addition at physiological pH (7.4)
3. Format conversion to engine-specific requirements
For the RFAA engine, you don't need to use this function.
- prepare_receptor(receptor_pdb)
Prepare receptor structure for docking.
This method handles receptor preparation including protonation and format conversion as required by the specific docking engine. Some engines (like RFAA) may not require receptor preparation.
- Parameters:
receptor_pdb (str) – Path to receptor structure file (typically PDB format)
- Returns:
Path to prepared receptor file, or None if no preparation is required (e.g., for RFAA engine).
- Return type:
Optional[str]
- Raises:
FileNotFoundError – If receptor file is not found
RuntimeError – If receptor preparation fails
Example
>>> prepared_receptor = docking.prepare_receptor('protein.pdb')
>>> if prepared_receptor:
...     print(f"Prepared receptor saved to: {prepared_receptor}")
... else:
...     print("No receptor preparation required")
Note
Preparation steps may include:
- Hydrogen addition at physiological pH (7.4)
- Format conversion (e.g., PDB to PDBQT for Vina)
- Structure cleaning and validation
DataPreprocessor
- class docktopus.preprocessor.DataPreprocessor(work_dir)
Bases: object
Handles molecular preparation and format conversion for docking workflows.
This class provides methods for preparing molecular structures for docking simulations, including protonation, format conversion, and 3D conformer generation. It uses Open Babel for molecular manipulation and supports various input/output formats. This is a helper class called internally by the Docking class.
The preprocessor handles:
- Hydrogen addition at specified pH values
- Format conversion between molecular file formats
- Protein and ligand preparation for specific docking engines
- 3D conformer generation from SMILES strings
- Parameters:
work_dir (str)
- work_dir
Directory where processed files are stored
- Type:
Path
- __init__(work_dir)
Initialize the preprocessor with a working directory.
- Parameters:
work_dir (str) – Directory where processed files will be stored. Will be created if it doesn’t exist.
Example
>>> preprocessor = DataPreprocessor('./molecular_data')
- convert_format(input_file, output_file, remove_hydrogens=False)
Convert between molecular file formats.
This method converts molecular structures between different file formats using Open Babel. It can optionally remove hydrogens during conversion.
- Parameters:
input_file (str) – Path to input file
output_file (str) – Path to output file
remove_hydrogens (bool, optional) – Whether to remove hydrogens during conversion. Defaults to False.
- Returns:
Path to the converted file
- Return type:
str
- Raises:
FileNotFoundError – If input file doesn’t exist
RuntimeError – If format conversion fails
Example
>>> # Convert SDF to MOL2
>>> mol2_file = preprocessor.convert_format(
...     'ligand.sdf',
...     'ligand.mol2'
... )
>>> # Convert PDB to PDBQT (removing hydrogens)
>>> pdbqt_file = preprocessor.convert_format(
...     'protein.pdb',
...     'protein.pdbqt',
...     remove_hydrogens=True
... )
Note
Input and output formats are determined by file extensions
Common formats: SDF, PDB, MOL2, PDBQT, SMILES
Removing hydrogens can be useful for certain docking engines
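Since input and output formats are determined by file extensions, a caller can derive the format name from a path before calling convert_format. The sketch below shows one way to do that with the standard library; format_from_path is a hypothetical helper and not part of DataPreprocessor.

```python
from pathlib import Path


def format_from_path(path):
    """Return the lower-case extension without the dot, e.g. 'sdf' for 'lig.SDF'."""
    suffix = Path(path).suffix
    if not suffix:
        raise ValueError(f"cannot infer a format: {path!r} has no extension")
    # Drop the leading dot and normalise case so 'SDF' and 'sdf' compare equal.
    return suffix[1:].lower()
```

Raising on a missing extension surfaces the problem before any conversion is attempted, rather than leaving it to the conversion tool to fail.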
- generate_conformers(smiles, output_file)
Generate a 3D conformer from a SMILES string and write to an SDF file.
This method generates a single 3D conformer from a SMILES string using Open Babel’s 3D coordinate generation. The resulting structure is saved in SDF format without hydrogens added.
- Parameters:
smiles (str) – SMILES string of the molecule
output_file (str) – Path to save the 3D structure (should have .sdf extension)
- Returns:
Path to the output SDF file containing the 3D conformer
- Return type:
str
- Raises:
ValueError – If SMILES string is invalid
RuntimeError – If 3D generation fails
Example
>>> output_file = preprocessor.generate_conformers(
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'aspirin_3d.sdf'
... )
>>> print(f"3D structure saved to: {output_file}")
Note
Generates only one conformer (not multiple conformers)
Does not add hydrogens (use protonate() if needed)
Uses Open Babel’s make3D() method for coordinate generation
Output is always in SDF format regardless of output_file extension
- prepare_ligand(ligand_file, format='sdf', output_dir=None)
Prepare ligand structure for docking.
This method prepares ligand structures for docking by adding all hydrogens at physiological pH. It’s designed for general-purpose ligand preparation and works with most docking engines.
- Parameters:
ligand_file (str) – Path to ligand structure file
format (str, optional) – Output format for prepared ligand. Defaults to “sdf”.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.
- Returns:
Path to the prepared ligand file
- Return type:
str
- Raises:
FileNotFoundError – If ligand file doesn’t exist
RuntimeError – If ligand preparation fails
Example
>>> prepared_ligand = preprocessor.prepare_ligand(
...     'molecule.sdf',
...     format='sdf',
...     output_dir='./prepared'
... )
>>> print(f"Prepared ligand: {prepared_ligand}")
Note
Adds all hydrogens at pH 7.4
Useful for most docking engines that require explicit hydrogens
Output filename includes “_prepared” suffix
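To predict where a prepared file will land, the "_prepared" suffix rule can be sketched as a path computation. This is an assumption about the naming scheme (suffix placed directly before the extension, inside output_dir); prepared_path is a hypothetical helper, so check the actual return value of prepare_ligand before relying on it.

```python
from pathlib import Path


def prepared_path(input_file, fmt, output_dir):
    """Build '<output_dir>/<stem>_prepared.<fmt>' for an input structure file."""
    # Path.stem drops the directory and the final extension: 'a/mol.sdf' -> 'mol'.
    stem = Path(input_file).stem
    return Path(output_dir) / f"{stem}_prepared.{fmt}"
```

In practice the path returned by prepare_ligand is the authoritative one; a helper like this is mainly useful for checking whether a prepared file already exists.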
- prepare_ligand_vina(ligand_file, pH=7.4, output_dir=None)
Prepare ligand structure specifically for Vina docking.
This method prepares ligand structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT format with specific atom types, charges, and rotatable bonds.
- Parameters:
ligand_file (str) – Path to ligand structure file (typically SDF format)
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.
- Returns:
Path to the prepared ligand file in PDBQT format
- Return type:
str
- Raises:
FileNotFoundError – If ligand file doesn’t exist
RuntimeError – If ligand preparation fails
subprocess.CalledProcessError – If Open Babel conversion fails
Example
>>> vina_ligand = preprocessor.prepare_ligand_vina(
...     'molecule.sdf',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready ligand: {vina_ligand}")
Note
Converts to PDBQT format required by Vina
Adds polar hydrogens, removes non-polar hydrogens (-xpnh flag)
Assigns atom types, charges, and rotatable bonds
Assumes SDF input format (modify cmd if using different format)
- prepare_protein(protein_file, format='pdb', output_dir=None)
Prepare protein structure for docking.
This method prepares protein structures for docking by adding polar hydrogens at physiological pH.
- Parameters:
protein_file (str) – Path to protein structure file
format (str, optional) – Output format for prepared protein. Defaults to “pdb”.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.
- Returns:
Path to the prepared protein file
- Return type:
str
- Raises:
FileNotFoundError – If protein file doesn’t exist
RuntimeError – If protein preparation fails
Example
>>> prepared_protein = preprocessor.prepare_protein(
...     'receptor.pdb',
...     format='pdb',
...     output_dir='./prepared'
... )
>>> print(f"Prepared protein: {prepared_protein}")
Note
Adds polar hydrogens at pH 7.4
Preserves non-polar hydrogens if present
Output filename includes “_prepared” suffix
- prepare_protein_vina(protein_file, pH=7.4, output_dir=None)
Prepare protein structure specifically for Vina docking.
This method prepares protein structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT files with specific atom types; the conversion is done with the obabel binary rather than pybel, because the binary handles those atom types properly.
- Parameters:
protein_file (str) – Path to protein structure file (typically PDB format)
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.
- Returns:
Path to the prepared protein file in PDBQT format
- Return type:
str
- Raises:
FileNotFoundError – If protein file doesn’t exist
RuntimeError – If protein preparation fails
subprocess.CalledProcessError – If Open Babel conversion fails
Example
>>> vina_protein = preprocessor.prepare_protein_vina(
...     'receptor.pdb',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready protein: {vina_protein}")
Note
Converts to PDBQT format required by Vina using system call to obabel binary
Removes non-polar hydrogens (-xr flag)
Adds polar hydrogens at specified pH
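The system call mentioned in the note can be pictured as an argument list handed to subprocess. The sketch below only builds that list and does not run anything; obabel_receptor_cmd is a hypothetical helper, the output file name is an assumption, and the exact flags the package passes may differ.

```python
from pathlib import Path


def obabel_receptor_cmd(protein_file, output_dir, pH=7.4):
    """Return the argument list for converting a receptor to PDBQT with obabel."""
    # Assumed naming: same stem as the input, with a .pdbqt extension.
    out_file = Path(output_dir) / (Path(protein_file).stem + ".pdbqt")
    return [
        "obabel", str(protein_file),
        "-O", str(out_file),  # -O: output file; format inferred from the extension
        "-xr",                # PDBQT writer option named in the note above
        "-p", str(pH),        # -p: add hydrogens appropriate for this pH
    ]
```

Passing the list to subprocess.run(cmd, check=True) raises subprocess.CalledProcessError on a non-zero exit status, which is consistent with the Raises section of this method.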
- protonate(input_file, output_file, pH=7.4, polar_only=True)
Add hydrogens to a molecule at specified pH.
This method uses Open Babel to add hydrogens to molecular structures based on the specified pH value. It can add either all hydrogens or only polar hydrogens depending on the polar_only parameter.
- Parameters:
input_file (str) – Path to input structure file
output_file (str) – Path to save protonated structure
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4 (physiological pH).
polar_only (bool, optional) – If True, only add polar hydrogens. If False, add all hydrogens. Defaults to True.
- Returns:
Path to the protonated structure file
- Return type:
str
- Raises:
FileNotFoundError – If input file doesn’t exist
RuntimeError – If protonation fails
Example
>>> protonated_file = preprocessor.protonate(
...     'molecule.sdf',
...     'molecule_protonated.sdf',
...     pH=7.4,
...     polar_only=False
... )
Note
Supports various input formats (SDF, PDB, MOL2, etc.)
Output format is determined by file extension
pH affects the protonation state of titratable groups
Docking Engines
GNINA Engine
- class docktopus.gnina_engine.GninaDockingEngine(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)
Bases: object
GNINA-specific docking engine implementation.
This class provides an interface to the GNINA docking engine, which combines traditional molecular docking with deep learning-based scoring using convolutional neural networks (CNNs). GNINA is particularly effective for structure-based drug design and virtual screening.
GNINA features:
- Traditional Vina scoring function
- CNN-based pose scoring and affinity prediction
- Support for flexible docking
- Automatic binding site detection
- Multiple output poses with comprehensive scoring
- Parameters:
gnina_path (str)
work_dir (str)
seed (int)
exhaustiveness (int)
num_modes (int)
cpu (int)
- gnina_path
Path to GNINA executable
- Type:
str
- work_dir
Directory for docking outputs
- Type:
Path
- receptor_format
Expected receptor file format ("pdb")
- Type:
str
- ligand_format
Expected ligand file format ("sdf")
- Type:
str
- exhaustiveness
Search exhaustiveness parameter
- Type:
int
- num_modes
Number of binding modes to generate
- Type:
int
- cpu
Number of CPU cores to use
- Type:
int
- autobox_ligand
Whether to use ligand for automatic box detection
- Type:
bool
- seed
Random seed for reproducibility
- Type:
int
- logger
Logger instance for engine events
- Type:
logging.Logger
- __init__(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)
Initialize GNINA docking engine.
- Parameters:
gnina_path (str) – Path to GNINA executable. Must be a valid path to the GNINA binary.
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.
exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 16.
num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.
cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.
autobox_ligand (bool, optional) – If True and no box_center is provided, automatically determine box center from ligand. Defaults to True.
- Raises:
FileNotFoundError – If GNINA executable is not found at the specified path
ValueError – If invalid parameters are provided
- dock(receptor_file, ligand_file, box_size=(30.0, 30.0, 30.0), box_center=None, output_prefix=None)
Perform docking using GNINA.
This method executes GNINA docking with the specified parameters and returns comprehensive results including multiple poses with both traditional and CNN-based scores.
- Parameters:
receptor_file (str) – Path to prepared receptor file (PDB format)
ligand_file (str) – Path to prepared ligand file (SDF format)
box_center (Optional[Tuple[float, float, float]], optional) – (x,y,z) coordinates of docking box center. If None, the box is centered on the ligand by autoboxing. Defaults to None.
box_size (Tuple[float, float, float], optional) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.
- Returns:
- Dictionary containing docking results with keys:
output_file: Path to SDF file with docked poses
log_file: Path to GNINA log file with detailed output
- scores: List of dictionaries, each containing scores for one pose:
pose: Pose number (1-based)
affinity: Vina binding affinity (kcal/mol)
intramol: Intramolecular energy (kcal/mol)
cnn_pose: CNN pose score
cnn_affinity: CNN affinity prediction
- Return type:
Dict[str, Any]
- Raises:
FileNotFoundError – If input files are not found
subprocess.CalledProcessError – If GNINA execution fails
RuntimeError – If score parsing fails
Note
Receptor should be in PDB format with polar hydrogens
Ligand should be in SDF format with all hydrogens
Box parameters are applied as specified during initialization
All poses are saved in a single SDF file
Log file contains detailed GNINA output and diagnostics
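When no box_center is given, the box is derived from the ligand. The idea behind such autoboxing can be sketched as an axis-aligned bounding box around the ligand atoms plus a buffer. This is an illustration of the concept, not GNINA's code: autobox is a hypothetical function, and the 4 Å padding is an assumption chosen to resemble GNINA's --autobox_add buffer.

```python
def autobox(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing coords plus padding."""
    if not coords:
        raise ValueError("at least one atom coordinate is required")
    # Transpose [(x, y, z), ...] into three per-axis tuples.
    xs, ys, zs = zip(*coords)
    lows = (min(xs), min(ys), min(zs))
    highs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    # Padding is added on both sides of every axis.
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size
```

The returned tuples have the same (x, y, z) layout as the box_center and box_size parameters documented above, with sizes in Angstroms if the input coordinates are.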
- precheck(file_path)
Check if the provided file path exists.
This method performs a simple file existence check, which is useful for validating input files before attempting docking calculations. Runs automatically before each docking to make sure you have all the files you think you have. It does not check if those files are correct.
- Parameters:
file_path (str) – Path to the file to check
- Returns:
True if the file exists, False otherwise
- Return type:
bool
Note
Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation
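An existence check of this kind is a one-liner with pathlib. The sketch below reimplements the documented behaviour as a free function and adds a hypothetical missing_inputs helper for reporting every absent file at once; neither is the package's own code.

```python
from pathlib import Path


def precheck(file_path):
    """Return True if file_path exists on disk, False otherwise."""
    return Path(file_path).exists()


def missing_inputs(*paths):
    """Return the subset of paths that do not exist, in the order given."""
    return [path for path in paths if not precheck(path)]
```

Collecting all missing paths before raising gives one complete error message instead of a fix-and-retry loop; as the note says, neither function validates file contents.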
Vina Engine
- class docktopus.vina_engine.VinaDockingEngine(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)
Bases: object
Vina-specific docking engine implementation using the vina Python interface.
This class provides an interface to AutoDock Vina, a popular molecular docking program that uses an empirical scoring function based on the AutoDock 4 force field. Vina is known for its speed and accuracy in structure-based drug design.
Vina features:
- Empirical scoring function (Vina scoring)
- Fast conformational search using iterated local search
- Support for flexible ligand docking
- Automatic binding site detection
- Multiple output poses with binding affinities
- Parameters:
work_dir (str)
exhaustiveness (int)
num_modes (int)
cpu (int)
seed (int)
- work_dir
Directory for docking outputs
- Type:
Path
- receptor_format
Expected receptor file format ("pdbqt")
- Type:
str
- ligand_format
Expected ligand file format ("pdbqt")
- Type:
str
- exhaustiveness
Search exhaustiveness parameter
- Type:
int
- num_modes
Number of binding modes to generate
- Type:
int
- cpu
Number of CPU cores to use
- Type:
int
- seed
Random seed for reproducibility
- Type:
int
- vina
Vina object from the vina Python package
- logger
Logger instance for engine events
- Type:
logging.Logger
- __init__(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)
Initialize Vina docking engine.
- Parameters:
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 8.
num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.
cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.
- Raises:
ImportError – If vina Python package is not available
ValueError – If invalid parameters are provided
Example
>>> # box_center and box_size are passed to dock(), not to the constructor
>>> engine = VinaDockingEngine(
...     work_dir='./docking_results',
...     exhaustiveness=16,
...     num_modes=20
... )
Note
Requires the vina Python package to be installed: pip install vina
- dock(receptor_file, box_center, ligand_file, box_size=(30.0, 30.0, 30.0), output_prefix=None)
Perform docking using Vina.
This method executes AutoDock Vina docking with the specified parameters and returns results including multiple poses with binding affinities.
- Parameters:
receptor_file (str) – Path to prepared receptor file (PDBQT format)
ligand_file (str) – Path to prepared ligand file (PDBQT format)
box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center. If None, uses ligand center. The signature defines no default, so a value must always be passed.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.
- Returns:
- Dictionary containing docking results with keys:
output_file: Path to PDBQT file with docked poses
log_file: Path to Vina log file with detailed output
- scores: List of dictionaries, each containing scores for one pose:
pose: Pose number (1-based)
affinity: Binding affinity in kcal/mol
- Return type:
Dict[str, Any]
- Raises:
FileNotFoundError – If input files are not found
RuntimeError – If Vina execution fails
ImportError – If vina package is not available
Note
Receptor and ligand must be in PDBQT format
PDBQT format includes atom types, charges, and rotatable bonds
Box parameters are applied as specified during initialization
All poses are saved in a single PDBQT file
Log file contains detailed Vina output and diagnostics
- precheck(file_path)
Check if the provided file path exists.
This method performs a simple file existence check, which is useful for validating input files before attempting docking calculations. Runs automatically before each docking to make sure you have all the files you think you have. It does not check if those files are correct.
- Parameters:
file_path (str) – Path to the file to check
- Returns:
True if the file exists, False otherwise
- Return type:
bool
Note
Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation
GalaxyDock2 HEME Engine
- class docktopus.gdock_engine.GDockHEMEDockingEngine(gdock_dir, work_dir, seed=0)
Bases: object
GalaxyDock2 HEME-specific docking engine implementation.
This class provides an interface to GalaxyDock2 HEME, a specialized docking program designed for heme-containing proteins such as cytochromes P450. GalaxyDock2 HEME incorporates heme-specific scoring functions and binding site considerations.
GalaxyDock2 HEME features:
- Specialized scoring for heme-containing proteins
- Heme-specific binding site detection
- Support for heme-ligand interactions
- Multiple output poses with comprehensive scoring
- Optimized for cytochrome P450 and similar enzymes
- Parameters:
gdock_dir (str)
work_dir (str)
seed (int)
- gdock_dir
Path to GalaxyDock2 HEME installation directory
- Type:
Path
- work_dir
Directory for docking outputs
- Type:
Path
- gd2_scratch_dir
Scratch directory for GalaxyDock2 HEME operations
- Type:
Path
- receptor_format
Expected receptor file format ("pdb")
- Type:
str
- ligand_format
Expected ligand file format ("mol2")
- Type:
str
- box_center
Docking box center coordinates
- Type:
Optional[Tuple[float, float, float]]
- seed
Random seed for reproducibility
- Type:
int
- gdock_script
Path to GalaxyDock2 HEME Python script
- Type:
Path
- logger
Logger instance for engine events
- Type:
logging.Logger
- __init__(gdock_dir, work_dir, seed=0)
Initialize GalaxyDock2 HEME docking engine.
- Parameters:
gdock_dir (str) – Path to GalaxyDock2 HEME installation directory. Must contain the script/run_GalaxyDock2_heme.py file.
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.
- Raises:
FileNotFoundError – If GalaxyDock2 HEME script is not found
ValueError – If required parameters are missing
Note
Requires GalaxyDock2 HEME to be installed and properly configured
Box center coordinates are essential for heme docking
Creates scratch directory for temporary files
- dock(receptor_file, ligand_file, box_center, box_size=(30, 30, 30), output_prefix=None)
Perform docking using GalaxyDock2 HEME.
This method executes GalaxyDock2 HEME docking with the specified parameters and returns results including multiple poses with heme-specific scores.
- Parameters:
receptor_file (str) – Path to prepared receptor file (protonated PDB format)
ligand_file (str) – Path to prepared ligand file (protonated MOL2 format)
box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center.
box_size (Tuple[float, float, float]) – Size in Angstroms of the docking box. Defaults to (30, 30, 30)
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.
- Returns:
- Dictionary containing docking results with keys:
output_file: Path to MOL2 file with docked poses
log_file: Path to GalaxyDock2 HEME log file
- scores: Dictionary containing pose information:
- poses: List of dictionaries, each containing:
pose: Pose number (1-based)
Energy: Total GalaxyDock2 HEME score
- Return type:
Dict[str, Any]
- Raises:
FileNotFoundError – If input files are not found
subprocess.CalledProcessError – If GalaxyDock2 HEME execution fails
RuntimeError – If score parsing fails
Note
Receptor should be in PDB format with polar hydrogens
Ligand should be in MOL2 format with all hydrogens
Box center coordinates are required and used for docking
Output is in MOL2 format with multiple poses
Scores are extracted from GalaxyDock2 HEME energy files
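The return structure described above can be consumed as in the following minimal sketch. The `result` dictionary is a hand-written stand-in shaped like the documented return value (file names and energies are invented for illustration). `best_pose` is a hypothetical helper, not part of DOCKTOPUS, and treating the lowest Energy as the best pose is an assumption.

```python
from typing import Any, Dict


def best_pose(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return the pose entry with the lowest Energy (assumed: lower is better)."""
    poses = result["scores"]["poses"]
    if not poses:
        raise ValueError("no poses in docking result")
    return min(poses, key=lambda p: p["Energy"])


# Stand-in for the dictionary returned by dock(); values are illustrative only.
result = {
    "output_file": "ligand_docked.mol2",
    "log_file": "ligand_docked.log",
    "scores": {
        "poses": [
            {"pose": 1, "Energy": -12.4},
            {"pose": 2, "Energy": -15.1},
            {"pose": 3, "Energy": -9.8},
        ]
    },
}

top = best_pose(result)
print(f"best pose: {top['pose']} (Energy = {top['Energy']})")
```

Pose numbers are 1-based, as documented, so the value in `top["pose"]` can be used directly to locate the matching entry in the output MOL2 file.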
- precheck(file_path)[source]¶
Check if the provided file path exists.
This method performs a simple file existence check to validate input files before docking calculations. It runs automatically before each docking run and confirms only that the file is present; it does not check that the file is correct.
- Parameters:
file_path (str) – Path to the file to check
- Returns:
True if the file exists, False otherwise
- Return type:
bool
Note
Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation
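The behaviour described for precheck() can be illustrated with a small standalone function. This is a sketch of an equivalent check built on `pathlib`, not the library's own implementation, and the file names are invented for the demonstration.

```python
import tempfile
from pathlib import Path


def precheck(file_path: str) -> bool:
    """Return True if file_path exists; the content is never inspected."""
    return Path(file_path).exists()


with tempfile.TemporaryDirectory() as tmp:
    receptor = Path(tmp) / "receptor.pdb"
    receptor.write_text("")  # an empty, invalid PDB still passes the check
    exists_ok = precheck(str(receptor))
    missing_ok = precheck(str(Path(tmp) / "ligand.mol2"))

print(exists_ok, missing_ok)
```

The empty receptor file passes, which is why a separate format or content check is still needed before trusting an input.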
RFAA Engine¶
- class docktopus.rfaa_engine.RFAADockingEngine(work_dir, model_runner_class, target='humanCYP3A4')[source]¶
Bases:
object
RoseTTAFold All-Atom (RFAA) specific docking engine implementation that uses the RFAA model.
This class provides an interface to RFAA, which uses fully flexible protein structure prediction for protein-ligand complex modeling. The prediction results can be validated (enabled by default) to remove hallucinated poses.
RFAA features:
- Fully flexible docking with explicit bonding for covalently bound cofactors
- Quality assessment using pLDDT and PAE metrics
- Post-docking validation to detect hallucinations
- Support for heme-containing proteins (CYP3A4)
- Parameters:
work_dir (str)
- work_dir¶
Directory for docking outputs
- Type:
Path
- model_runner_class¶
ModelRunner class from rf2aa.run_inference
- tmp_dir¶
Directory for temporary files
- Type:
Path
- results_dir¶
Directory for results files
- Type:
Path
- target_dir¶
Directory for target-specific files
- Type:
Path
- template_pdb¶
Template PDB ID for crossdocking
- Type:
str
- fasta_file¶
Path to target protein FASTA file
- Type:
str
- hem_file¶
Path to heme SDF file
- Type:
str
- _validator¶
Validator instance for post-docking checks
- Type:
Optional[RFAAValidator]
- validation_enabled¶
Whether validation is available
- Type:
bool
- logger¶
Logger instance for engine events
- Type:
logging.Logger
- __init__(work_dir, model_runner_class, target='humanCYP3A4')[source]¶
Initialize RFAA docking engine.
- Parameters:
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
model_runner_class – ModelRunner class from rf2aa.run_inference. Required for RFAA model execution.
- Raises:
ImportError – If required RFAA dependencies are not available
FileNotFoundError – If template files are not found
RuntimeError – If initialization fails
Note
Requires rf2aa package and dependencies to be installed
Copies template files (FASTA, HEM SDF) to work directory
Attempts to initialize validation if dependencies are available
Creates necessary subdirectories for workflow
- dock(ligand_sdf_file, smiles='')[source]¶
Perform docking using RFAA.
This method executes RFAA protein-ligand structure prediction using the specified ligand and target. It generates a complete protein-ligand complex structure with quality metrics.
- Parameters:
ligand_sdf_file (str) – Path to ligand SDF file
smiles (str, optional) – SMILES string corresponding to the ligand. Required for validation. Defaults to “”.
target (str, optional) – Target protein identifier, one of the supported targets. See the set_target() method for the list of supported targets.
- Returns:
- Dictionary containing docking results with keys:
output_file: Path to PDB file with protein-ligand complex
log_file: Path to RFAA log file
valid: Boolean indicating if result passed validation (if available)
- metrics: Dictionary containing quality metrics:
ligand_mean_pae: Mean PAE for ligand atoms
mean_plddts: Mean pLDDT for ligand atoms
- Return type:
Dict[str, Any]
- Raises:
FileNotFoundError – If input files are not found
RuntimeError – If RFAA execution fails
ImportError – If RFAA dependencies are not available
Note
Generates complete protein-ligand complex structure
Uses template-based modeling approach
Performs post-docking validation if available
Quality metrics help assess prediction reliability
Output is a PDB file with both protein and ligand
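The documented result keys lend themselves to a simple post-filter, sketched below. The result dictionaries are hand-written stand-ins, `keep_result` is a hypothetical helper, and both cut-offs are arbitrary illustration values. The sketch also assumes that pLDDT is reported on a 0 to 1 scale and that a missing `valid` key should count as not validated; neither is confirmed by this reference.

```python
from typing import Any, Dict

# Hypothetical cut-offs chosen only for illustration; tune them for your data.
MAX_LIGAND_PAE = 5.0
MIN_MEAN_PLDDT = 0.7  # assumes pLDDT on a 0-1 scale


def keep_result(result: Dict[str, Any]) -> bool:
    """Keep a prediction that passed validation and meets both cut-offs."""
    metrics = result["metrics"]
    return (
        bool(result.get("valid", False))  # missing key is treated as invalid
        and metrics["ligand_mean_pae"] <= MAX_LIGAND_PAE
        and metrics["mean_plddts"] >= MIN_MEAN_PLDDT
    )


# Stand-ins for dictionaries returned by dock(); values are illustrative only.
results = [
    {"output_file": "a.pdb", "valid": True,
     "metrics": {"ligand_mean_pae": 3.2, "mean_plddts": 0.85}},
    {"output_file": "b.pdb", "valid": False,
     "metrics": {"ligand_mean_pae": 2.9, "mean_plddts": 0.90}},
    {"output_file": "c.pdb", "valid": True,
     "metrics": {"ligand_mean_pae": 9.4, "mean_plddts": 0.55}},
]

kept = [r["output_file"] for r in results if keep_result(r)]
print(kept)
```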
- generate_config_files(ligand_sdf_file, output_path=None, config_dir=None)[source]¶
Generate RFAA config files based on provided inputs.
This method creates the configuration files required by RFAA for protein-ligand structure prediction. It uses template files and substitutes the provided parameters.
- Parameters:
ligand_sdf_file (str) – Path to ligand SDF file
output_path (str, optional) – Path for output files. If None, uses tmp_dir. Defaults to None.
config_dir (str, optional) – Directory to save config files. If None, uses work_dir/config. Defaults to None.
- Returns:
Generated configuration YAML string for RFAA
- Return type:
str
- Raises:
FileNotFoundError – If template files are not found
RuntimeError – If config generation fails
Note
Uses template configuration from package resources
Substitutes file paths and parameters in template
Configuration includes protein, ligand, and heme specifications
Output path is used for RFAA temporary files
- precheck(file_path)[source]¶
Check if the provided file path exists.
This method performs a simple file existence check to validate input files before docking calculations. It runs automatically before each docking run and confirms only that the file is present; it does not check that the file is correct.
- Parameters:
file_path (str) – Path to the file to check
- Returns:
True if the file exists, False otherwise
- Return type:
bool
Note
Only checks file existence, not file validity
Does not verify file format or content
- set_target(target_name)[source]¶
Set the target protein for the RFAA docking engine.
This method updates the selected target protein and its associated configuration (such as FASTA file and residue count) based on the provided target name. It changes the internal state of the engine so that subsequent docking runs will use the new target.
- Parameters:
target_name (str) –
The key corresponding to the desired target protein. Available values:
”humanCYP3A4”
”humanCYP2C8”
”humanCYP2C9”
”humanCYP2C19”
”humanCYP2D6”
”humanCYP2A6”
”humanCYP2B6”
”humanCYP2E1”
”humanCYP1A2”
”humanCYP2D13”
”humanCYP46A1”
”CYP199A4”
”CYP121”
”CYP105A1”
”CYPcam”
”CYP125”
”CYP102A1”
- Raises:
KeyError – If the provided target_name is not found in the available targets.
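The state change and the KeyError described for set_target() can be pictured with a toy class. This is a sketch under assumptions, not the engine's code: `TargetSelector`, its registry layout, and the FASTA paths are hypothetical, and only a subset of the documented target keys is included.

```python
class TargetSelector:
    """Toy stand-in for the engine state changed by set_target()."""

    # Subset of the documented target keys; the FASTA paths are hypothetical.
    TARGETS = {
        "humanCYP3A4": "targets/humanCYP3A4.fasta",
        "humanCYP2D6": "targets/humanCYP2D6.fasta",
        "CYPcam": "targets/CYPcam.fasta",
    }

    def __init__(self, target: str = "humanCYP3A4") -> None:
        self.set_target(target)

    def set_target(self, target_name: str) -> None:
        """Switch target; later runs would use the new FASTA file."""
        if target_name not in self.TARGETS:
            raise KeyError(f"unknown target: {target_name!r}")
        self.target = target_name
        self.fasta_file = self.TARGETS[target_name]


selector = TargetSelector()
selector.set_target("CYPcam")

try:
    selector.set_target("notATarget")
    rejected = False
except KeyError:
    rejected = True  # state is left unchanged after a failed switch

print(selector.target, selector.fasta_file, rejected)
```

Checking the key before assigning any attribute keeps the previous target in place when an unknown name is passed.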
Validation¶
RFAA Validator¶
- class docktopus.validator.RFAAValidator[source]¶
Bases:
object
Validator for RFAA docking results to detect potential hallucinations.
This class provides methods to validate protein-ligand structures generated by RFAA to detect potential hallucinations (incorrectly predicted structures). It performs chemical validity checks and compares predicted structures with reference SMILES strings.
- logger¶
Logger instance for validation events
- Type:
logging.Logger
Example
>>> validator = RFAAValidator()
>>> sdf_string = validator.convert_pdb_to_sdf("mol.pdb")
>>> is_valid = validator.validate_ligand(sdf_string, reference_smiles)
>>> if is_valid:
...     print("Structure passed validation")
... else:
...     print("Structure may be hallucinated")
- __init__()[source]¶
Initialize the RFAA validator.
Sets up logging and prepares the validator for structure validation.
Example
>>> validator = RFAAValidator()
- adjacency_with_orders(mol)[source]¶
Create adjacency matrix with bond orders from an RDKit molecule.
This method generates a symmetric adjacency matrix where each element represents the bond order between two atoms (0 = no bond, 1 = single, 2 = double, 3 = triple).
- Parameters:
mol (Chem.Mol) – RDKit molecule object
- Returns:
Symmetric adjacency matrix with bond orders
- Return type:
np.ndarray
Note
Matrix is symmetric (A[i,j] = A[j,i])
Diagonal elements are 0 (no self-bonds)
Bond orders: 1=single, 2=double, 3=triple
Useful for comparing molecular connectivity
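The matrix layout described above can be shown without RDKit. The sketch below builds the same kind of symmetric bond-order matrix from an explicit bond list using plain Python lists, whereas the documented method takes an RDKit molecule and returns an `np.ndarray`. `adjacency_from_bonds` and the `Bond` alias are hypothetical names used only for illustration.

```python
from typing import List, Tuple

Bond = Tuple[int, int, int]  # (atom index i, atom index j, bond order)


def adjacency_from_bonds(n_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    """Build a symmetric matrix whose entries are bond orders (0 = no bond)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order  # mirror the entry to keep the matrix symmetric
    return matrix


# Heavy atoms of acetonitrile, CH3-C#N: C0-C1 single bond, C1#N2 triple bond.
acetonitrile = adjacency_from_bonds(3, [(0, 1, 1), (1, 2, 3)])
for row in acetonitrile:
    print(row)
```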
- convert_pdb_to_sdf(pdb_file)[source]¶
Convert a PDB file to SDF format using OpenBabel.
This method converts a PDB file containing molecular coordinates to SDF format using pybel, which handles kekulization of aromatic moieties better than RDKit.
- Parameters:
pdb_file (str) – Path to the input PDB file
- Returns:
SDF format string block of the molecule, or None if conversion fails
- Return type:
str
- Raises:
FileNotFoundError – If PDB file doesn’t exist
RuntimeError – If conversion fails
Note
Uses OpenBabel’s pybel interface for conversion
Returns SDF string block, not file path
Handles coordinate information and basic molecular properties
Returns None if conversion fails
- count_bond_diffs(A1, A2)[source]¶
Count the number of bond differences between two adjacency matrices.
This method compares two adjacency matrices and counts how many bonds differ between them. Only the upper triangle is considered to avoid double counting.
- Parameters:
A1 (np.ndarray) – First adjacency matrix
A2 (np.ndarray) – Second adjacency matrix
- Returns:
Number of bond differences between the molecules
- Return type:
int
- Raises:
ValueError – If matrices have different dimensions
Note
Returns 0 if molecules have identical connectivity
Higher values indicate more structural differences
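A minimal sketch of the upper-triangle comparison follows. It uses nested lists instead of NumPy arrays, assumes both matrices use the same atom ordering, and counts each differing position once regardless of how far the bond orders differ; whether the library counts differences the same way is an assumption. `count_upper_triangle_diffs` is a hypothetical name.

```python
from typing import List

Matrix = List[List[int]]


def count_upper_triangle_diffs(a1: Matrix, a2: Matrix) -> int:
    """Count positions above the diagonal where the two matrices differ."""
    if len(a1) != len(a2) or any(len(r1) != len(r2) for r1, r2 in zip(a1, a2)):
        raise ValueError("adjacency matrices must have the same dimensions")
    n = len(a1)
    # j starts at i + 1, so each atom pair is visited exactly once.
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if a1[i][j] != a2[i][j]
    )


reference = [[0, 1, 0], [1, 0, 3], [0, 3, 0]]  # C-C single, C#N triple
predicted = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]  # C=N predicted as double

n_same = count_upper_triangle_diffs(reference, reference)
n_diff = count_upper_triangle_diffs(reference, predicted)
print(n_same, n_diff)
```

With the default `threshold=1` documented for validate_ligand(), a single differing bond such as this one would presumably still be tolerated.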
- extract_ligand(input_pdb, output_pdb)[source]¶
Extract ligand coordinates from a protein-ligand complex PDB file.
This method uses pdb_selchain and pdb_tidy to extract only the ligand atoms (chain B) from a protein-ligand complex and clean up the PDB format.
- Parameters:
input_pdb (str) – Path to input PDB file containing protein-ligand complex
output_pdb (str) – Path to output PDB file containing only ligand
- Raises:
subprocess.CalledProcessError – If pdb_selchain or pdb_tidy fails
IOError – If output file cannot be written
Note
Assumes ligand is in chain B of the complex
Requires pdb_selchain and pdb_tidy executables to be installed
Output PDB is cleaned and formatted for further processing
- fix_bond_orders(sdf_string, smiles)[source]¶
Fix bond orders in a molecular structure using a reference SMILES.
This method attempts to correct bond orders in a molecular structure by using a reference SMILES string as a template.
- Parameters:
sdf_string (str) – SDF format string of the molecular structure
smiles (str) – Reference SMILES string to use as template
- Returns:
RDKit molecule with corrected bond orders, or False if failed
- Return type:
Chem.Mol
Note
Template SMILES should represent the same molecule
Returns False if correction fails
- standardize_smiles(smiles)[source]¶
Standardize a SMILES string to canonical form.
This method converts a SMILES string to its canonical tautomeric form using RDKit’s MolStandardize module. This ensures consistent comparison between different representations of the same molecule.
- Parameters:
smiles (str) – Input SMILES string
- Returns:
Canonical, standardized SMILES string, or None if invalid
- Return type:
str
- Raises:
RuntimeError – If SMILES standardization fails
Note
Handles tautomeric forms automatically
Removes stereochemistry information
Returns None for invalid SMILES strings
Uses RDKit’s MolStandardize for robust standardization
- validate_ligand(sdf_string, smiles, threshold=1)[source]¶
Validate a ligand structure against a reference SMILES string.
This method performs comprehensive validation of a predicted ligand structure by comparing it with a reference SMILES string. It checks chemical validity, atom counts, bond connectivity, and bond orders.
- Parameters:
sdf_string (str) – SDF format string of the predicted ligand structure
smiles (str) – Reference SMILES string for comparison. It must describe the same molecule as the one in the PDB file.
threshold (int, optional) – Maximum allowed bond differences. Defaults to 1.
- Returns:
True if structure passes validation, False otherwise
- Return type:
bool
Note
Validation steps:
1. Chemical validity check (can be converted to SMILES)
2. Atom count comparison with reference
3. Bond connectivity comparison (adjacency matrices)
4. Bond order assignment and verification
Higher threshold allows more bond differences
Returns False if any step fails
Analyser¶
- class docktopus.analyser.Analyser(workdir)[source]¶
Bases:
object
Analyser is a utility class for analyzing molecular docking results. It currently computes Fe-ligand distances for CYP complexes. Functionality is still limited to Fe distance calculation and MAE calculation with bootstrapped errors. The class also contains some currently unused functions intended for a future full post-docking analysis.
- This class provides methods to:
Compute Fe distances between docked and reference ligand/hem structures.
Facilitate downstream statistical analysis of docking results.
- Parameters:
workdir (str) – The working directory containing the docking and reference results. Expected subdirectories: ‘docked’ and ‘ref’.
Examples
>>> analyser = Analyser(workdir="./results")
>>> docked_data = ("./results/docked/sample1-docked.pdb", "./results/docked/sample1-hem.pdb")
>>> reference_data = ("./results/ref/sample1-ligand.pdb", "./results/ref/sample1-hem.pdb")
>>> ligand_dist, docked_dist = analyser.get_Fedistance(docked_data, reference_data, kind="Fed1")
>>> print("Ligand Fe distance:", ligand_dist)
>>> print("Docked Fe distance:", docked_dist)
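The class description mentions MAE with bootstrapped errors. The sketch below shows one common way to compute such an estimate from paired docked and reference Fe distances using only the standard library. The function names, the sample distances, and the choice of the standard deviation over resamples as the error estimate are all assumptions, not the Analyser implementation.

```python
import random
import statistics
from typing import Sequence, Tuple


def mae(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between paired values."""
    return statistics.fmean(abs(p - r) for p, r in zip(predicted, reference))


def bootstrap_mae(
    predicted: Sequence[float],
    reference: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (MAE, bootstrap standard deviation of the MAE)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    n = len(predicted)
    resampled = []
    for _ in range(n_resamples):
        # Resample pairs with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled.append(
            mae([predicted[i] for i in idx], [reference[i] for i in idx])
        )
    return mae(predicted, reference), statistics.stdev(resampled)


# Illustrative Fe distances in Angstroms; not real docking data.
docked = [4.5, 3.0, 5.5, 4.0]
ref = [4.0, 4.0, 4.0, 4.5]

point, spread = bootstrap_mae(docked, ref)
print(f"MAE = {point:.3f} +/- {spread:.3f}")
```

Resampling index pairs rather than the two lists independently preserves the pairing between each docked pose and its reference.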
- calculate_molecular_weights(smiles_list)[source]¶
Calculate molecular weights for a list of SMILES strings.
- Parameters:
smiles_list (list of str) – List of SMILES strings.
- Returns:
Molecular weights for each molecule.
- Return type:
list of float
- calculate_num_rotatable_bonds(smiles_list)[source]¶
Calculate the number of rotatable bonds for a list of SMILES strings.
- Parameters:
smiles_list (list of str) – List of SMILES strings.
- Returns:
Number of rotatable bonds for each molecule.
- Return type:
list of int
- get_sample_Fedist(sample_names, kind)[source]¶
Wrapper function to compute Fe distances for a list of sample names. Warning: this method does not currently work. The main issue is the lack of consistent file naming for traversing the full dataset of docking results. The method currently contains hardcoded names for docked_{ligand, hem} and reference_{ligand, hem}, which are not consistent with the rest of the library.
- Parameters:
sample_names (list of str) – List of sample names.
kind (str) – The kind of Fe distance to compute (“Fed1”, “Fed2”, “Fed3”).
- Returns:
DataFrame with columns [‘sample’, ‘ligand_dist’, ‘docked_dist’].
- Return type:
pd.DataFrame