API Reference

This page contains the complete API reference for the DOCKTOPUS package.

Main Classes

Docking

class docktopus.docking.Docking(engine, work_dir, model_runner_class=None, **engine_params)[source]¶

Bases: object

Main orchestrator class for molecular docking workflows.

This class provides a unified interface for running molecular docking simulations using various docking engines. It handles input preparation, docking execution, and results processing in a streamlined workflow.

The class supports multiple docking engines:

- GNINA: Deep learning-based docking with CNN scoring
- Vina: Traditional molecular docking with empirical scoring
- GalaxyDock2 HEME: Specialized docking for heme-containing proteins
- RFAA: RoseTTAFold All-Atom protein-ligand structure prediction

Parameters:

engine (str)
work_dir (str)

work_dir¶

Base directory for all workflow outputs

Type:: Path

logger¶

Logger instance for workflow events

Type:: logging.Logger

preprocessor¶

Instance for molecular preparation

Type:: DataPreprocessor

engine¶: Docking engine instance (type depends on engine parameter)

Example Usage:

>>> from docktopus import Docking

>>> # Initialize Vina docking
>>> dock = Docking(
...     engine="vina",
...     work_dir="./test-data",
...     box_center=[54.426, 78.117, 10.330],
...     box_size=[15, 15, 15],
...     seed=1000,
...     cpu=4,
...     exhaustiveness=8,
...     num_modes=3
... )

>>> # Prepare ligands from SMILES
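>>> # Raw string: the SMILES contains a backslash, and "\N" is an invalid escape in a normal string
>>> smiles = [r"CCNC(=O)c1ccc2c(c1)NC(=O)/C2=C(\Nc1ccc(CN(C)C)cc1)c1ccccc1"]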
>>> ligands = dock.prepare_ligands(smiles)

>>> # Prepare receptor
>>> receptor = "test-data/1W0F-cyp.pdb"
>>> target = dock.prepare_receptor(receptor_pdb=receptor)

>>> # Run docking
>>> results = dock.dock(
...     receptor=target,
...     ligands=ligands,
...     smiles=smiles
... )

__init__(engine, work_dir, model_runner_class=None, **engine_params)[source]¶

Initialize the docking workflow orchestrator.

Parameters:

engine (str) – Name of docking engine to use. Supported values:
- ‘gnina’: GNINA deep learning docking engine
- ‘vina’: AutoDock Vina docking engine
- ‘galaxydock2-heme’: GalaxyDock2 HEME specialized engine
- ‘rfaa’: RoseTTAFold All-Atom (RFAA) engine
work_dir (str) – Base directory for all workflow outputs. Will be created if it doesn’t exist.
model_runner_class – ModelRunner class for RFAA engine (required if engine=’rfaa’)
**engine_params – Additional parameters passed to the specific docking engine. Common parameters for non-RFAA engines include:
- box_center: (x,y,z) coordinates of docking box center
- box_size: (x,y,z) dimensions of docking box
- exhaustiveness: Search exhaustiveness (higher = more thorough)
- num_modes: Number of binding modes to generate
- cpu: Number of CPU cores to use
- seed: Random seed for reproducibility
Engine-specific requirements: the GNINA engine requires the path to the gnina executable, while GalaxyDock2 HEME requires the path to its driver script. RoseTTAFold All-Atom (RFAA) requires a ModelRunner object, which you need to import directly from the rf2aa.run_inference module shipped with RFAA. If you intend to use RFAA, your driver script should be in the top directory of the RFAA repository.

Raises:

ValueError – If unsupported engine is specified or required parameters are missing
FileNotFoundError – If required executables or files are not found

Example

>>> # Initialize GNINA docking
>>> docking = Docking(
...     engine='gnina',
...     work_dir='./docking_results',
...     box_center=(10.0, 20.0, 30.0),
...     box_size=(20.0, 20.0, 20.0),
...     gnina_path="/home/username/gnina"
... )

>>> # Initialize RFAA docking
>>> from rf2aa.run_inference import ModelRunner
>>> docking = Docking(
...     engine='rfaa',
...     work_dir='./rfaa_results',
...     model_runner_class=ModelRunner
... )
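The ValueError cases listed under Raises can be pictured with a small pre-flight check. This is only a sketch: check_engine_config and SUPPORTED_ENGINES are hypothetical names, not part of docktopus, and the package's real validation may differ.

```python
# Engine names as listed in the parameter description above.
SUPPORTED_ENGINES = {"gnina", "vina", "galaxydock2-heme", "rfaa"}


def check_engine_config(engine, model_runner_class=None):
    """Hypothetical check mirroring the documented ValueError conditions."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported engine: {engine!r}")
    # The reference states that the RFAA engine needs a ModelRunner class.
    if engine == "rfaa" and model_runner_class is None:
        raise ValueError("engine='rfaa' requires model_runner_class")
    return engine


print(check_engine_config("vina"))
```

Running a check like this before constructing the orchestrator surfaces configuration mistakes before any files are written.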

dock(ligands, receptor, smiles=None, **kwargs)[source]¶

Orchestrate docking for single or multiple ligands.

This method automatically determines whether to run single or batch docking based on the input type. It performs pre-docking checks and handles both file-based and SMILES-based ligand inputs. The method assumes the provided ligand and receptor files are already protonated to the correct pH. To prepare them, refer to the prepare_ligands() and prepare_receptor() methods.

Parameters:

ligands – Ligand input specification. Can be:
- str: Path to single ligand file
- list: List of ligand file paths for batch docking
- str: SMILES string (single molecule)
- list: List of SMILES strings (batch processing)
receptor (str) – Path to receptor structure file
smiles (Optional[str]) – SMILES string corresponding to the ligand(s). Required for RFAA engine, optional for others.
**kwargs – Additional arguments passed to dock_single or dock_many methods. Common parameters include:
- output_prefix: Optional prefix for output files
- prepare_inputs: Whether to preprocess input files (default: True)

Returns:

Docking results. For single ligand: dictionary containing scores and output file paths. For multiple ligands: list of result dictionaries.

Return type:

Union[Dict, List[Dict]]

Raises:

ValueError – If ligands parameter is not a string or list
FileNotFoundError – If input files are not found
RuntimeError – If docking engine fails

Example

>>> # Single ligand docking
>>> result = docking.dock(
...     ligands='ligand.sdf',
...     receptor='protein.pdb'
... )
>>> print(f"Docking metrics: {result}")

>>> # Batch docking with multiple ligands
>>> results = docking.dock(
...     ligands=['lig1.sdf', 'lig2.sdf', 'lig3.sdf'],
...     receptor='protein.pdb'
... )
>>> for i, result in enumerate(results):
...     print(f"Ligand {i+1}: {result['scores'][0]['affinity']}")

>>> # SMILES-based docking
>>> result = docking.dock(
...     ligands='CC(=O)OC1=CC=CC=C1C(=O)O',
...     receptor='protein.pdb',
...     smiles='CC(=O)OC1=CC=CC=C1C(=O)O'
... )
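The single-versus-batch decision described for dock() amounts to a type check on the ligands argument. The sketch below illustrates that idea; classify_ligand_input is a hypothetical helper and not the package's actual code.

```python
def classify_ligand_input(ligands):
    """Return 'single' for a str, 'batch' for a list, else raise ValueError."""
    if isinstance(ligands, str):
        return "single"
    if isinstance(ligands, list):
        return "batch"
    # Mirrors the documented ValueError for unsupported input types.
    raise ValueError("ligands must be a string or a list of strings")


print(classify_ligand_input("ligand.sdf"))
print(classify_ligand_input(["lig1.sdf", "lig2.sdf"]))
```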

dock_many(receptor_file, ligand_files, smiles, box_center=None, box_size=(30.0, 30.0, 30.0))[source]¶

Perform docking for multiple ligands against a single receptor.

This method executes batch docking for multiple ligands against a single receptor. It processes each ligand individually and collects results, with error handling to ensure that failures of individual ligands don’t stop the entire batch.

Parameters:

receptor_file (str) – Path to receptor structure file
ligand_files (List[str]) – List of paths to ligand structure files
smiles (Optional[str]) – SMILES string corresponding to the ligands. Required for RFAA engine, optional for others.
box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

Returns:

List of docking result dictionaries. Each dictionary has the same structure as returned by dock_single(). Failed dockings will have an “error” key with the error message.

Return type:

List[Dict]

Raises:

FileNotFoundError – If receptor file is not found
ValueError – If ligand_files is not a list

Example

>>> ligand_files = ['lig1.sdf', 'lig2.sdf', 'lig3.sdf']
>>> results = docking.dock_many(
...     receptor_file='protein.pdb',
...     ligand_files=ligand_files,
...     smiles=None  # no default in the signature above; only RFAA needs a value
... )
>>> for i, result in enumerate(results):
...     if 'error' in result:
...         print(f"Ligand {i+1} failed: {result['error']}")
...     else:
...         print(f"Ligand {i+1} score: {result['scores'][0]['affinity']}")

Note

Each ligand is processed independently
Failed dockings are logged but don’t stop the batch
Results maintain the same order as input ligand_files
Error handling ensures robust batch processing
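The batch behaviour in this note (independent processing, failures recorded under an “error” key, input order preserved) can be sketched as a plain loop. run_batch and fake_dock are hypothetical stand-ins; the real method calls the configured engine.

```python
def run_batch(items, dock_one):
    """Apply dock_one to each item, recording failures instead of raising."""
    results = []
    for item in items:
        try:
            results.append(dock_one(item))
        except Exception as exc:
            # A failed ligand yields an error entry and the loop continues.
            results.append({"error": str(exc)})
    return results


def fake_dock(path):
    """Stand-in for a docking call; fails for one specific input."""
    if path == "bad.sdf":
        raise RuntimeError("docking failed")
    return {"output_file": path, "scores": [{"pose": 1, "affinity": -7.0}]}


results = run_batch(["lig1.sdf", "bad.sdf", "lig3.sdf"], fake_dock)
print(results[1])
```

Because one result is appended per input, results[i] always corresponds to the i-th ligand, whether it succeeded or not.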

dock_single(receptor_file, ligand_file, smiles=None, output_prefix=None, box_center=None, box_size=(30.0, 30.0, 30.0))[source]¶

Perform docking for a single receptor-ligand pair.

This method executes the actual docking calculation using the configured docking engine. It handles the specific requirements of each engine and returns comprehensive results including scores and output file paths.

Parameters:

receptor_file (str) – Path to receptor structure file
ligand_file (str) – Path to ligand structure file
smiles (Optional[str]) – SMILES string corresponding to the ligand. Required for RFAA engine, optional for others.
output_prefix (Optional[str]) – Optional prefix for output files. If not provided, uses the ligand filename stem.
box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

Returns:

Dictionary containing docking results with the following keys:

output_file: Path to the main output file (docked structure)
log_file: Path to the docking log file
scores: List of dictionaries containing scores for each pose
valid: (RFAA only) Boolean indicating if the result passed validation
metrics: (RFAA only) Additional quality metrics

Return type:

Dict

Raises:

FileNotFoundError – If input files are not found
RuntimeError – If docking calculation fails
ValueError – If required parameters are missing

Example

>>> result = docking.dock_single(
...     receptor_file='protein.pdb',
...     ligand_file='ligand.sdf',
...     output_prefix='my_docking'
... )
>>> print(f"Best pose score: {result['scores'][0]}")
>>> print(f"Output structure: {result['output_file']}")

Note

The exact content of the scores list depends on the docking engine:
- GNINA: affinity, intramol, cnn_pose, cnn_affinity
- Vina: affinity
- GalaxyDock2 HEME: Energy
- RFAA: ligand_mean_pae, mean_plddts
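Given a result dictionary of the documented shape, picking the best pose is a short reduction over the scores list. The sketch below assumes a Vina-style result where lower affinity (kcal/mol) is better; best_pose and the numbers in example_result are illustrative, not real output.

```python
def best_pose(result, key="affinity"):
    """Return the pose dictionary with the lowest value for `key`."""
    return min(result["scores"], key=lambda pose: pose[key])


# Made-up values, shaped like the documented return dictionary.
example_result = {
    "output_file": "my_docking.pdbqt",
    "log_file": "my_docking.log",
    "scores": [
        {"pose": 1, "affinity": -8.1},
        {"pose": 2, "affinity": -7.4},
    ],
}

print(best_pose(example_result)["pose"])
```

For engines whose metric is better when higher, such as a CNN pose score, max would replace min.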

prepare_ligands(ligands)[source]¶

Prepare ligand structures from SMILES strings for docking.

This method generates 3D conformers from SMILES strings and prepares them for docking using the appropriate preprocessor methods. The preparation process depends on the docking engine being used.

Parameters:

ligands (Union[str, List[str]]) – SMILES string or list of SMILES strings representing the molecules to prepare.

Returns:

Path(s) to prepared ligand file(s). Returns a single path if input was a single SMILES, or a list of paths if input was a list of SMILES.

Return type:

Union[str, List[str]]

Raises:

ValueError – If ligands parameter is not a string or list of strings
RuntimeError – If conformer generation or preparation fails

Example

>>> # Prepare single ligand
>>> prepared_file = docking.prepare_ligands('CC(=O)OC1=CC=CC=C1C(=O)O')
>>> print(f"Prepared ligand saved to: {prepared_file}")

>>> # Prepare multiple ligands
>>> smiles_list = [
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'c1ccccc1',
...     'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O'
... ]
>>> prepared_files = docking.prepare_ligands(smiles_list)
>>> print(f"Prepared {len(prepared_files)} ligands")

Note

The preparation process includes:

1. 3D conformer generation from SMILES using Open Babel
2. Hydrogen addition at physiological pH (7.4)
3. Format conversion to engine-specific requirements

For the RFAA engine, you don’t need to use this function.

prepare_receptor(receptor_pdb)[source]¶

Prepare receptor structure for docking.

This method handles receptor preparation including protonation and format conversion as required by the specific docking engine. Some engines (like RFAA) may not require receptor preparation.

Parameters:

receptor_pdb (str) – Path to receptor structure file (typically PDB format)

Returns:

Path to prepared receptor file, or None if no preparation is required (e.g., for RFAA engine).

Return type:

Optional[str]

Raises:

FileNotFoundError – If receptor file is not found
RuntimeError – If receptor preparation fails

Example

>>> prepared_receptor = docking.prepare_receptor('protein.pdb')
>>> if prepared_receptor:
...     print(f"Prepared receptor saved to: {prepared_receptor}")
... else:
...     print("No receptor preparation required")

Note

Preparation steps may include:
- Hydrogen addition at physiological pH (7.4)
- Format conversion (e.g., PDB to PDBQT for Vina)
- Structure cleaning and validation

DataPreprocessor

class docktopus.preprocessor.DataPreprocessor(work_dir)[source]¶

Bases: object

Handles molecular preparation and format conversion for docking workflows.

This class provides methods for preparing molecular structures for docking simulations, including protonation, format conversion, and 3D conformer generation. It uses Open Babel for molecular manipulation and supports various input/output formats. This is a helper class called internally by the Docking class.

The preprocessor handles:
- Hydrogen addition at specified pH values
- Format conversion between molecular file formats
- Protein and ligand preparation for specific docking engines
- 3D conformer generation from SMILES strings

Parameters:: work_dir (str)

work_dir¶

Directory where processed files are stored

Type:: Path

__init__(work_dir)[source]¶

Initialize the preprocessor with a working directory.

Parameters:: work_dir (str) – Directory where processed files will be stored. Will be created if it doesn’t exist.

Example

>>> preprocessor = DataPreprocessor('./molecular_data')

convert_format(input_file, output_file, remove_hydrogens=False)[source]¶

Convert between molecular file formats.

This method converts molecular structures between different file formats using Open Babel. It can optionally remove hydrogens during conversion.

Parameters:

input_file (str) – Path to input file
output_file (str) – Path to output file
remove_hydrogens (bool, optional) – Whether to remove hydrogens during conversion. Defaults to False.

Returns:

Path to the converted file

Return type:

str

Raises:

FileNotFoundError – If input file doesn’t exist
RuntimeError – If format conversion fails

Example

>>> # Convert SDF to MOL2
>>> mol2_file = preprocessor.convert_format(
...     'ligand.sdf',
...     'ligand.mol2'
... )

>>> # Convert PDB to PDBQT (removing hydrogens)
>>> pdbqt_file = preprocessor.convert_format(
...     'protein.pdb',
...     'protein.pdbqt',
...     remove_hydrogens=True
... )

Note

Input and output formats are determined by file extensions
Common formats: SDF, PDB, MOL2, PDBQT, SMILES
Removing hydrogens can be useful for certain docking engines
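Since formats are inferred from file extensions, a conversion call depends on both paths carrying a usable suffix. A minimal sketch of that inference, using only pathlib; format_from_path is a hypothetical helper, and the package may resolve formats differently.

```python
from pathlib import Path


def format_from_path(path):
    """Derive a lowercase format name from a file extension."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if not suffix:
        raise ValueError(f"Cannot infer format: {path!r} has no extension")
    return suffix


print(format_from_path("ligand.sdf"))
print(format_from_path("protein.PDBQT"))
```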

generate_conformers(smiles, output_file)[source]¶

Generate a 3D conformer from a SMILES string and write to an SDF file.

This method generates a single 3D conformer from a SMILES string using Open Babel’s 3D coordinate generation. The resulting structure is saved in SDF format without hydrogens added.

Parameters:

smiles (str) – SMILES string of the molecule
output_file (str) – Path to save the 3D structure (should have .sdf extension)

Returns:

Path to the output SDF file containing the 3D conformer

Return type:

str

Raises:

ValueError – If SMILES string is invalid
RuntimeError – If 3D generation fails

Example

>>> output_file = preprocessor.generate_conformers(
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'aspirin_3d.sdf'
... )
>>> print(f"3D structure saved to: {output_file}")

Note

Generates only one conformer (not multiple conformers)
Does not add hydrogens (use protonate() if needed)
Uses Open Babel’s make3D() method for coordinate generation
Output is always in SDF format regardless of output_file extension

prepare_ligand(ligand_file, format='sdf', output_dir=None)[source]¶

Prepare ligand structure for docking.

This method prepares ligand structures for docking by adding all hydrogens at physiological pH. It’s designed for general-purpose ligand preparation and works with most docking engines.

Parameters:

ligand_file (str) – Path to ligand structure file
format (str, optional) – Output format for prepared ligand. Defaults to “sdf”.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared ligand file

Return type:

str

Raises:

FileNotFoundError – If ligand file doesn’t exist
RuntimeError – If ligand preparation fails

Example

>>> prepared_ligand = preprocessor.prepare_ligand(
...     'molecule.sdf',
...     format='sdf',
...     output_dir='./prepared'
... )
>>> print(f"Prepared ligand: {prepared_ligand}")

Note

Adds all hydrogens at pH 7.4
Useful for most docking engines that require explicit hydrogens
Output filename includes “_prepared” suffix

prepare_ligand_vina(ligand_file, pH=7.4, output_dir=None)[source]¶

Prepare ligand structure specifically for Vina docking.

This method prepares ligand structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT format with specific atom types, charges, and rotatable bonds.

Parameters:

ligand_file (str) – Path to ligand structure file (typically SDF format)
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared ligand file in PDBQT format

Return type:

str

Raises:

FileNotFoundError – If ligand file doesn’t exist
RuntimeError – If ligand preparation fails
subprocess.CalledProcessError – If Open Babel conversion fails

Example

>>> vina_ligand = preprocessor.prepare_ligand_vina(
...     'molecule.sdf',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready ligand: {vina_ligand}")

Note

Converts to PDBQT format required by Vina
Passes the -xpnh output options to obabel; in Open Babel’s PDBQT writer these preserve atom indices (p), atom names (n), and hydrogens (h)
Assigns atom types, charges, and rotatable bonds
Assumes SDF input format (modify cmd if using different format)

prepare_protein(protein_file, format='pdb', output_dir=None)[source]¶

Prepare protein structure for docking.

This method prepares protein structures for docking by adding polar hydrogens at physiological pH.

Parameters:

protein_file (str) – Path to protein structure file
format (str, optional) – Output format for prepared protein. Defaults to “pdb”.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared protein file

Return type:

str

Raises:

FileNotFoundError – If protein file doesn’t exist
RuntimeError – If protein preparation fails

Example

>>> prepared_protein = preprocessor.prepare_protein(
...     'receptor.pdb',
...     format='pdb',
...     output_dir='./prepared'
... )
>>> print(f"Prepared protein: {prepared_protein}")

Note

Adds polar hydrogens at pH 7.4
Preserves non-polar hydrogens if present
Output filename includes “_prepared” suffix

prepare_protein_vina(protein_file, pH=7.4, output_dir=None)[source]¶

Prepare protein structure specifically for Vina docking.

This method prepares protein structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT format with specific atom types; the conversion is done by calling the obabel binary rather than pybel, because the binary handles these atom types properly.

Parameters:

protein_file (str) – Path to protein structure file (typically PDB format)
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.
output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared protein file in PDBQT format

Return type:

str

Raises:

FileNotFoundError – If protein file doesn’t exist
RuntimeError – If protein preparation fails
subprocess.CalledProcessError – If Open Babel conversion fails

Example

>>> vina_protein = preprocessor.prepare_protein_vina(
...     'receptor.pdb',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready protein: {vina_protein}")

Note

Converts to PDBQT format required by Vina using system call to obabel binary
Writes the receptor as a rigid molecule without a torsion tree (-xr flag)
Adds polar hydrogens at specified pH
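The system call mentioned in this note could look roughly like the command built below. This is an assumption about the invocation, not the package's actual code: obabel_pdbqt_command is a hypothetical helper, and only standard obabel options (-i, -o, -O, -x, -p) are used.

```python
def obabel_pdbqt_command(protein_file, output_file, pH=7.4):
    """Build (but do not run) an obabel argument list for PDB -> PDBQT."""
    return [
        "obabel",
        "-ipdb", protein_file,   # input format and file
        "-opdbqt",               # output format
        "-O", output_file,       # output file
        "-xr",                   # PDBQT output option cited in the note above
        "-p", str(pH),           # add hydrogens appropriate for this pH
    ]


cmd = obabel_pdbqt_command("receptor.pdb", "receptor.pdbqt")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True), which raises
# subprocess.CalledProcessError on a non-zero exit status.
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.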

protonate(input_file, output_file, pH=7.4, polar_only=True)[source]¶

Add hydrogens to a molecule at specified pH.

This method uses Open Babel to add hydrogens to molecular structures based on the specified pH value. It can add either all hydrogens or only polar hydrogens depending on the polar_only parameter.

Parameters:

input_file (str) – Path to input structure file
output_file (str) – Path to save protonated structure
pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4 (physiological pH).
polar_only (bool, optional) – If True, only add polar hydrogens. If False, add all hydrogens. Defaults to True.

Returns:

Path to the protonated structure file

Return type:

str

Raises:

FileNotFoundError – If input file doesn’t exist
RuntimeError – If protonation fails

Example

>>> protonated_file = preprocessor.protonate(
...     'molecule.sdf',
...     'molecule_protonated.sdf',
...     pH=7.4,
...     polar_only=False
... )

Note

Supports various input formats (SDF, PDB, MOL2, etc.)
Output format is determined by file extension
pH affects the protonation state of titratable groups

Docking Engines

GNINA Engine

class docktopus.gnina_engine.GninaDockingEngine(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)[source]¶

Bases: object

GNINA-specific docking engine implementation.

This class provides an interface to the GNINA docking engine, which combines traditional molecular docking with deep learning-based scoring using convolutional neural networks (CNNs). GNINA is particularly effective for structure-based drug design and virtual screening.

GNINA features:
- Traditional Vina scoring function
- CNN-based pose scoring and affinity prediction
- Support for flexible docking
- Automatic binding site detection
- Multiple output poses with comprehensive scoring

Parameters:

gnina_path (str)
work_dir (str)
seed (int)
exhaustiveness (int)
num_modes (int)
cpu (int)

gnina_path¶

Path to GNINA executable

Type:: str

work_dir¶

Directory for docking outputs

Type:: Path

receptor_format¶

Expected receptor file format (“pdb”)

Type:: str

ligand_format¶

Expected ligand file format (“sdf”)

Type:: str

exhaustiveness¶

Search exhaustiveness parameter

Type:: int

num_modes¶

Number of binding modes to generate

Type:: int

cpu¶

Number of CPU cores to use

Type:: int

autobox_ligand¶

Whether to use ligand for automatic box detection

Type:: bool

seed¶

Random seed for reproducibility

Type:: int

logger¶

Logger instance for engine events

Type:: logging.Logger

__init__(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)[source]¶

Initialize GNINA docking engine.

Parameters:

gnina_path (str) – Path to GNINA executable. Must be a valid path to the GNINA binary.
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.
exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 16.
num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.
cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.
autobox_ligand (bool, optional) – If True and no box_center is provided, automatically determine box center from ligand. Defaults to True.

Raises:

FileNotFoundError – If GNINA executable is not found at the specified path
ValueError – If invalid parameters are provided

dock(receptor_file, ligand_file, box_size=(30.0, 30.0, 30.0), box_center=None, output_prefix=None)[source]¶

Perform docking using GNINA.

This method executes GNINA docking with the specified parameters and returns comprehensive results including multiple poses with both traditional and CNN-based scores.

Parameters:

receptor_file (str) – Path to prepared receptor file (PDB format)
ligand_file (str) – Path to prepared ligand file (SDF format)
box_center (Optional[Tuple[float, float, float]], optional) – (x,y,z) coordinates of docking box center. If None, the box is centered on the ligand via autoboxing. Defaults to None.
box_size (Tuple[float, float, float], optional) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:

output_file: Path to SDF file with docked poses
log_file: Path to GNINA log file with detailed output
scores: List of dictionaries, each containing scores for one pose:
- pose: Pose number (1-based)
- affinity: Vina binding affinity (kcal/mol)
- intramol: Intramolecular energy (kcal/mol)
- cnn_pose: CNN pose score
- cnn_affinity: CNN affinity prediction

Return type:

Dict[str, Any]

Raises:

FileNotFoundError – If input files are not found
subprocess.CalledProcessError – If GNINA execution fails
RuntimeError – If score parsing fails

Note

Receptor should be in PDB format with polar hydrogens
Ligand should be in SDF format with all hydrogens
Box parameters are taken from the box_center and box_size arguments of this method
All poses are saved in a single SDF file
Log file contains detailed GNINA output and diagnostics

precheck(file_path)[source]¶

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before a docking calculation. It runs automatically before each docking run. It does not check whether the files are well-formed or correct.

Parameters:: file_path (str) – Path to the file to check
Returns:: True if the file exists, False otherwise
Return type:: bool

Note

Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation
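An existence-only check of this kind needs nothing beyond pathlib. The function below is a stand-alone sketch with the same name and documented behaviour; the engine's actual implementation may differ, for example in logging.

```python
import tempfile
from pathlib import Path


def precheck(file_path):
    """Return True if the path exists; file content is not inspected."""
    return Path(file_path).exists()


with tempfile.NamedTemporaryFile(suffix=".pdb") as tmp:
    found = precheck(tmp.name)                 # the file exists while open
    missing = precheck(tmp.name + ".missing")  # no such file was created

print(found, missing)
```

An empty or corrupted file still passes, which is why the note stresses that validity is not verified.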

Vina Engine

class docktopus.vina_engine.VinaDockingEngine(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)[source]¶

Bases: object

Vina-specific docking engine implementation using the vina Python interface.

This class provides an interface to AutoDock Vina, a popular molecular docking program that uses its own empirical scoring function rather than the AutoDock 4 force field. Vina is known for its speed and accuracy in structure-based drug design.

Vina features:
- Empirical scoring function (Vina scoring)
- Fast conformational search using iterated local search
- Support for flexible ligand docking
- Automatic binding site detection
- Multiple output poses with binding affinities

Parameters:

work_dir (str)
exhaustiveness (int)
num_modes (int)
cpu (int)
seed (int)

work_dir¶

Directory for docking outputs

Type:: Path

receptor_format¶

Expected receptor file format (“pdbqt”)

Type:: str

ligand_format¶

Expected ligand file format (“pdbqt”)

Type:: str

exhaustiveness¶

Search exhaustiveness parameter

Type:: int

num_modes¶

Number of binding modes to generate

Type:: int

cpu¶

Number of CPU cores to use

Type:: int

seed¶

Random seed for reproducibility

Type:: int

vina¶: Vina object from the vina Python package

logger¶

Logger instance for engine events

Type:: logging.Logger

__init__(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)[source]¶

Initialize Vina docking engine.

Parameters:

work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 8.
num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.
cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.

Raises:

ImportError – If vina Python package is not available
ValueError – If invalid parameters are provided

Example

>>> # box_center and box_size are arguments of dock(), not of __init__
>>> engine = VinaDockingEngine(
...     work_dir='./docking_results',
...     exhaustiveness=16,
...     num_modes=20
... )

Note

Requires the vina Python package to be installed: pip install vina

dock(receptor_file, box_center, ligand_file, box_size=(30.0, 30.0, 30.0), output_prefix=None)[source]¶

Perform docking using Vina.

This method executes AutoDock Vina docking with the specified parameters and returns results including multiple poses with binding affinities.

Parameters:

receptor_file (str) – Path to prepared receptor file (PDBQT format)
ligand_file (str) – Path to prepared ligand file (PDBQT format)
box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center. Required: the signature above defines no default value.
box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:

output_file: Path to PDBQT file with docked poses
log_file: Path to Vina log file with detailed output
scores: List of dictionaries, each containing scores for one pose:
- pose: Pose number (1-based)
- affinity: Binding affinity in kcal/mol

Return type:

Dict[str, Any]

Raises:

FileNotFoundError – If input files are not found
RuntimeError – If Vina execution fails
ImportError – If vina package is not available

Note

Receptor and ligand must be in PDBQT format
PDBQT format includes atom types, charges, and rotatable bonds
Box parameters are taken from the box_center and box_size arguments of this method
All poses are saved in a single PDBQT file
Log file contains detailed Vina output and diagnostics
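To make the box parameters concrete, the sketch below converts a center and edge lengths into the two opposite corners of the search box, and computes a coordinate centroid as one possible reading of “ligand center”. Both helpers are hypothetical, and the coordinates are arbitrary example values.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid(coords: Sequence[Vec3]) -> Vec3:
    """Arithmetic mean of atom coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def box_bounds(center: Vec3, size: Vec3) -> Tuple[Vec3, Vec3]:
    """Lower and upper corners of an axis-aligned box, in Angstroms."""
    lower = tuple(c - s / 2 for c, s in zip(center, size))
    upper = tuple(c + s / 2 for c, s in zip(center, size))
    return lower, upper


lower, upper = box_bounds((10.0, 20.0, 30.0), (30.0, 30.0, 30.0))
print(lower, upper)
```

Checking that the ligand's atoms fall between the two corners is a quick sanity test before a long docking run.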

precheck(file_path)[source]¶

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before a docking calculation. It runs automatically before each docking run. It does not check whether the files are well-formed or correct.

Parameters:: file_path (str) – Path to the file to check
Returns:: True if the file exists, False otherwise
Return type:: bool

Note

Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation

GalaxyDock2 HEME Engine

class docktopus.gdock_engine.GDockHEMEDockingEngine(gdock_dir, work_dir, seed=0)[source]¶

Bases: object

GalaxyDock2 HEME-specific docking engine implementation.

This class provides an interface to GalaxyDock2 HEME, a specialized docking program designed for heme-containing proteins such as cytochromes P450. GalaxyDock2 HEME incorporates heme-specific scoring functions and binding site considerations.

GalaxyDock2 HEME features:
- Specialized scoring for heme-containing proteins
- Heme-specific binding site detection
- Support for heme-ligand interactions
- Multiple output poses with comprehensive scoring
- Optimized for cytochrome P450 and similar enzymes

Parameters:

gdock_dir (str)
work_dir (str)
seed (int)

gdock_dir¶

Path to GalaxyDock2 HEME installation directory

Type:: Path

work_dir¶

Directory for docking outputs

Type:: Path

gd2_scratch_dir¶

Scratch directory for GalaxyDock2 HEME operations

Type:: Path

receptor_format¶

Expected receptor file format (“pdb”)

Type:: str

ligand_format¶

Expected ligand file format (“mol2”)

Type:: str

box_center¶

Docking box center coordinates

Type:: Optional[Tuple[float, float, float]]

seed¶

Random seed for reproducibility

Type:: int

gdock_script¶

Path to GalaxyDock2 HEME Python script

Type:: Path

logger¶

Logger instance for engine events

Type:: logging.Logger

__init__(gdock_dir, work_dir, seed=0)[source]¶

Initialize GalaxyDock2 HEME docking engine.

Parameters:

gdock_dir (str) – Path to GalaxyDock2 HEME installation directory. Must contain the script/run_GalaxyDock2_heme.py file.
work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
seed (int, optional) – Random seed for reproducibility. Defaults to 0.

Raises:

FileNotFoundError – If GalaxyDock2 HEME script is not found
ValueError – If required parameters are missing

Note

Requires GalaxyDock2 HEME to be installed and properly configured
Box center coordinates are essential for heme docking
Creates scratch directory for temporary files
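
Because the constructor raises FileNotFoundError when the script is missing, it can help to check the installation layout before creating the engine. The sketch below is a minimal pre-flight check that assumes only the script/run_GalaxyDock2_heme.py location stated above. The helper name find_gdock_script is hypothetical and not part of docktopus.

```python
from pathlib import Path


def find_gdock_script(gdock_dir):
    """Return the expected GalaxyDock2 HEME script path, or raise if absent.

    Hypothetical helper, not part of docktopus. It only mirrors the
    documented requirement that gdock_dir contains
    script/run_GalaxyDock2_heme.py.
    """
    script = Path(gdock_dir) / "script" / "run_GalaxyDock2_heme.py"
    if not script.is_file():
        raise FileNotFoundError(f"GalaxyDock2 HEME script not found: {script}")
    return script
```

Calling find_gdock_script(gdock_dir) before GDockHEMEDockingEngine(gdock_dir, work_dir) surfaces a misconfigured installation path early, with the full expected path in the error message.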

dock(receptor_file, ligand_file, box_center, box_size=(30, 30, 30), output_prefix=None)[source]¶

Perform docking using GalaxyDock2 HEME.

This method executes GalaxyDock2 HEME docking with the specified parameters and returns results including multiple poses with heme-specific scores.

Parameters:

receptor_file (str) – Path to prepared receptor file (protonated PDB format)
ligand_file (str) – Path to prepared ligand file (protonated MOL2 format)
box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center.
box_size (Tuple[float, float, float]) – Size in Angstroms of the docking box. Defaults to (30, 30, 30)
output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:

output_file: Path to MOL2 file with docked poses
log_file: Path to GalaxyDock2 HEME log file
scores: Dictionary containing pose information:
- poses: List of dictionaries, each containing:
  pose: Pose number (1-based)
  Energy: Total GalaxyDock2 HEME score

Return type:

Dict[str, Any]

Raises:

FileNotFoundError – If input files are not found
subprocess.CalledProcessError – If GalaxyDock2 HEME execution fails
RuntimeError – If score parsing fails

Note

Receptor should be in PDB format with polar hydrogens
Ligand should be in MOL2 format with all hydrogens
Box center coordinates are required and used for docking
Output is in MOL2 format with multiple poses
Scores are extracted from GalaxyDock2 HEME energy files
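
The nested result layout described above can be consumed as in the following sketch, which picks the best-scoring pose. It works on a hand-written dictionary rather than a real docking run. The helper best_pose, the file names, and the energy values are illustrative, and treating a lower Energy as better is an assumption rather than something this page states.

```python
def best_pose(result):
    """Return the pose entry with the lowest Energy from a dock() result.

    Assumes the documented layout result["scores"]["poses"] and that a
    lower GalaxyDock2 HEME energy means a better pose (an assumption).
    """
    poses = result["scores"]["poses"]
    if not poses:
        raise ValueError("no poses in docking result")
    return min(poses, key=lambda p: p["Energy"])


# Hand-written stand-in for a real dock() result; all values are illustrative.
example_result = {
    "output_file": "ligand_docked.mol2",
    "log_file": "ligand_docked.log",
    "scores": {
        "poses": [
            {"pose": 1, "Energy": -12.5},
            {"pose": 2, "Energy": -14.1},
            {"pose": 3, "Energy": -9.8},
        ]
    },
}

print(best_pose(example_result)["pose"])  # prints 2
```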

precheck(file_path)[source]¶

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before a docking calculation is attempted. It runs automatically before each docking run to confirm that all expected files are present. It does not check whether those files are valid.

Parameters:: file_path (str) – Path to the file to check
Returns:: True if the file exists, False otherwise
Return type:: bool

Note

Only checks file existence, not file validity
Does not verify file format or content
Useful for basic input validation
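
The behavior described above amounts to a plain existence test. The following is a minimal sketch of an equivalent check using pathlib. The actual implementation in docktopus may differ in detail, for example in how it treats directories.

```python
from pathlib import Path


def precheck(file_path):
    """Return True if file_path exists, False otherwise.

    Sketch only: the contents and format of the file are never inspected.
    """
    return Path(file_path).exists()
```

An empty or corrupted file still passes this check, so format problems only surface once docking starts.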

RFAA Engine¶

class docktopus.rfaa_engine.RFAADockingEngine(work_dir, model_runner_class, target='humanCYP3A4')[source]¶

Bases: object

RoseTTAFold All-Atom (RFAA) specific docking engine implementation.

This class provides an interface to RFAA, which uses fully flexible protein structure prediction for protein-ligand complex modeling. The prediction results can be validated (enabled by default) to remove hallucinated poses.

RFAA features:

Fully flexible docking with explicit bonding for covalently bound cofactors
Quality assessment using pLDDT and PAE metrics
Post-docking validation to detect hallucinations
Support for heme-containing proteins (CYP3A4)

Parameters:: work_dir (str)

work_dir¶

Directory for docking outputs

Type:: Path

model_runner_class¶: ModelRunner class from rf2aa.run_inference

tmp_dir¶

Directory for temporary files

Type:: Path

results_dir¶

Directory for results files

Type:: Path

target_dir¶

Directory for target-specific files

Type:: Path

template_pdb¶

Template PDB ID for crossdocking

Type:: str

fasta_file¶

Path to target protein FASTA file

Type:: str

hem_file¶

Path to heme SDF file

Type:: str

_validator¶

Validator instance for post-docking checks

Type:: Optional[RFAAValidator]

validation_enabled¶

Whether validation is available

Type:: bool

logger¶

Logger instance for engine events

Type:: logging.Logger

__init__(work_dir, model_runner_class, target='humanCYP3A4')[source]¶

Initialize RFAA docking engine.

Parameters:

work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.
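model_runner_class – ModelRunner class from rf2aa.run_inference. Required for RFAA model execution.
target (str, optional) – Target protein identifier. See the set_target() method for the list of supported targets. Defaults to ‘humanCYP3A4’.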

Raises:

ImportError – If required RFAA dependencies are not available
FileNotFoundError – If template files are not found
RuntimeError – If initialization fails

Note

Requires rf2aa package and dependencies to be installed
Copies template files (FASTA, HEM SDF) to work directory
Attempts to initialize validation if dependencies are available
Creates necessary subdirectories for workflow

dock(ligand_sdf_file, smiles='')[source]¶

Perform docking using RFAA.

This method executes RFAA protein-ligand structure prediction using the specified ligand and target. It generates a complete protein-ligand complex structure with quality metrics.

Parameters:

ligand_sdf_file (str) – Path to ligand SDF file
smiles (str, optional) – SMILES string corresponding to the ligand. Required for validation. Defaults to “”.
The target protein is not passed to dock(). It is selected when the engine is constructed or changed with the set_target() method, which lists the supported targets.

Returns:

Dictionary containing docking results with keys:

output_file: Path to PDB file with protein-ligand complex
log_file: Path to RFAA log file
valid: Boolean indicating if result passed validation (if available)
metrics: Dictionary containing quality metrics:
- ligand_mean_pae: Mean PAE for ligand atoms
- mean_plddts: Mean pLDDT for ligand atoms

Return type:

Dict[str, Any]

Raises:

FileNotFoundError – If input files are not found
RuntimeError – If RFAA execution fails
ImportError – If RFAA dependencies are not available

Note

Generates complete protein-ligand complex structure
Uses template-based modeling approach
Performs post-docking validation if available
Quality metrics help assess prediction reliability
Output is a PDB file with both protein and ligand
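
The valid flag and the quality metrics in the returned dictionary can be combined into a simple acceptance filter, as in the sketch below. The function name is_reliable and both thresholds are hypothetical placeholders, not docktopus defaults. The sketch also assumes that pLDDT is reported on a 0 to 1 scale and that a missing valid key means validation was unavailable rather than failed.

```python
def is_reliable(result, max_ligand_pae=10.0, min_plddt=0.7):
    """Decide whether an RFAA dock() result looks trustworthy.

    The thresholds are hypothetical placeholders, not docktopus defaults,
    and pLDDT is assumed to be on a 0-1 scale.
    """
    # "valid" is only present when validation is available; a missing key
    # is treated here as "not rejected" (an assumption).
    if not result.get("valid", True):
        return False
    metrics = result["metrics"]
    return (
        metrics["ligand_mean_pae"] <= max_ligand_pae
        and metrics["mean_plddts"] >= min_plddt
    )


# Hand-written stand-in for a real dock() result; all values are illustrative.
example_result = {
    "output_file": "complex.pdb",
    "log_file": "rfaa.log",
    "valid": True,
    "metrics": {"ligand_mean_pae": 4.2, "mean_plddts": 0.85},
}

print(is_reliable(example_result))  # prints True
```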

generate_config_files(ligand_sdf_file, output_path=None, config_dir=None)[source]¶

Generate RFAA config files based on provided inputs.

This method creates the configuration files required by RFAA for protein-ligand structure prediction. It uses template files and substitutes the provided parameters.

Parameters:

ligand_sdf_file (str) – Path to ligand SDF file
output_path (str, optional) – Path for output files. If None, uses tmp_dir. Defaults to None.
config_dir (str, optional) – Directory to save config files. If None, uses work_dir/config. Defaults to None.

Returns:

Generated configuration YAML string for RFAA

Return type:

str

Raises:

FileNotFoundError – If template files are not found
RuntimeError – If config generation fails

Note

Uses template configuration from package resources
Substitutes file paths and parameters in template
Configuration includes protein, ligand, and heme specifications
Output path is used for RFAA temporary files

precheck(file_path)[source]¶

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before a docking calculation is attempted. It runs automatically before each docking run to confirm that all expected files are present. It does not check whether those files are valid.

Parameters:: file_path (str) – Path to the file to check
Returns:: True if the file exists, False otherwise
Return type:: bool

Note

Only checks file existence, not file validity
Does not verify file format or content

set_target(target_name)[source]¶

Set the target protein for the RFAA docking engine.

This method updates the selected target protein and its associated configuration (such as FASTA file and residue count) based on the provided target name. It changes the internal state of the engine so that subsequent docking runs will use the new target.

Parameters:

target_name (str) –

The key corresponding to the desired target protein. Available values:

”humanCYP3A4”

”humanCYP2C8”

”humanCYP2C9”

”humanCYP2C19”

”humanCYP2D6”

”humanCYP2A6”

”humanCYP2B6”

”humanCYP2E1”

”humanCYP1A2”

”humanCYP2D13”

”humanCYP46A1”

”CYP199A4”

”CYP121”

”CYP105A1”

”CYPcam”

”CYP125”

”CYP102A1”

Raises:

KeyError – If the provided target_name is not found in the available targets.
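
Since an unknown name raises KeyError, a caller may want to validate user input first. The sketch below checks a name against the values listed above and suggests close matches. The set SUPPORTED_TARGETS is copied from this page and may lag behind the package, and the helper checked_target is hypothetical.

```python
import difflib

# Copied from the list of available values above; may be out of date.
SUPPORTED_TARGETS = frozenset({
    "humanCYP3A4", "humanCYP2C8", "humanCYP2C9", "humanCYP2C19",
    "humanCYP2D6", "humanCYP2A6", "humanCYP2B6", "humanCYP2E1",
    "humanCYP1A2", "humanCYP2D13", "humanCYP46A1", "CYP199A4",
    "CYP121", "CYP105A1", "CYPcam", "CYP125", "CYP102A1",
})


def checked_target(name):
    """Return name unchanged if it is a documented target, else raise KeyError.

    Hypothetical helper, not part of docktopus. Names are case-sensitive.
    """
    if name in SUPPORTED_TARGETS:
        return name
    hints = difflib.get_close_matches(name, sorted(SUPPORTED_TARGETS), n=3)
    raise KeyError(f"unknown target {name!r}; close matches: {hints}")
```

With an existing engine instance, this would be used as engine.set_target(checked_target("humanCYP2D6")).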

Validation¶

RFAA Validator¶

class docktopus.validator.RFAAValidator[source]¶

Bases: object

Validator for RFAA docking results to detect potential hallucinations.

This class provides methods to validate protein-ligand structures generated by RFAA to detect potential hallucinations (incorrectly predicted structures). It performs chemical validity checks and compares predicted structures with reference SMILES strings.

logger¶

Logger instance for validation events

Type:: logging.Logger

Example

>>> validator = RFAAValidator()
>>> sdf_string = validator.convert_pdb_to_sdf("mol.pdb")
>>> is_valid = validator.validate_ligand(sdf_string, reference_smiles)
>>> if is_valid:
...     print("Structure passed validation")
... else:
...     print("Structure may be hallucinated")

__init__()[source]¶

Initialize the RFAA validator.

Sets up logging and prepares the validator for structure validation.

Example

>>> validator = RFAAValidator()

adjacency_with_orders(mol)[source]¶

Create adjacency matrix with bond orders from an RDKit molecule.

This method generates a symmetric adjacency matrix where each element represents the bond order between two atoms (0 = no bond, 1 = single, 2 = double, 3 = triple).

Parameters:: mol (Chem.Mol) – RDKit molecule object
Returns:: Symmetric adjacency matrix with bond orders
Return type:: np.ndarray

Note

Matrix is symmetric (A[i,j] = A[j,i])
Diagonal elements are 0 (no self-bonds)
Bond orders: 1=single, 2=double, 3=triple
Useful for comparing molecular connectivity
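
The matrix layout can be illustrated without RDKit. The sketch below builds the same kind of matrix from a plain list of (i, j, order) bonds and returns nested lists instead of a NumPy array. The function name and the input format are assumptions made for illustration. The actual method takes an RDKit molecule.

```python
def adjacency_from_bonds(n_atoms, bonds):
    """Build a symmetric bond-order matrix from (i, j, order) tuples.

    Illustrative stand-in for adjacency_with_orders(); uses nested lists
    so that no third-party package is needed.
    """
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        if i == j:
            raise ValueError("self-bonds are not allowed")
        matrix[i][j] = order  # fill both halves to keep the matrix symmetric
        matrix[j][i] = order
    return matrix


# Heavy atoms of acetonitrile (C-C#N): 0 = methyl C, 1 = nitrile C, 2 = N
acetonitrile = adjacency_from_bonds(3, [(0, 1, 1), (1, 2, 3)])
for row in acetonitrile:
    print(row)
# [0, 1, 0]
# [1, 0, 3]
# [0, 3, 0]
```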

convert_pdb_to_sdf(pdb_file)[source]¶

Convert a PDB file to SDF format using OpenBabel.

This method converts a PDB file containing molecular coordinates to SDF format using pybel, which handles kekulization of aromatic moieties better than RDKit.

Parameters:

pdb_file (str) – Path to the input PDB file

Returns:

SDF format string block of the molecule, or None if conversion fails

Return type:

str

Raises:

FileNotFoundError – If PDB file doesn’t exist
RuntimeError – If conversion fails

Note

Uses OpenBabel’s pybel interface for conversion
Returns SDF string block, not file path
Handles coordinate information and basic molecular properties
Returns None if conversion fails

count_bond_diffs(A1, A2)[source]¶

Count the number of bond differences between two adjacency matrices.

This method compares two adjacency matrices and counts how many bonds differ between them. Only the upper triangle is considered to avoid double counting.

Parameters:

A1 (np.ndarray) – First adjacency matrix
A2 (np.ndarray) – Second adjacency matrix

Returns:

Number of bond differences between the molecules

Return type:

int

Raises:

ValueError – If matrices have different dimensions

Note

Returns 0 if molecules have identical connectivity
Higher values indicate more structural differences
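
The upper-triangle comparison described above can be sketched in plain Python as follows. The function count_bond_diffs_sketch is an illustration that works on nested lists. The library method takes NumPy arrays, and its exact implementation is not shown on this page.

```python
def count_bond_diffs_sketch(a1, a2):
    """Count entries that differ in the upper triangles of two square matrices."""
    same_shape = len(a1) == len(a2) and all(
        len(r1) == len(r2) for r1, r2 in zip(a1, a2)
    )
    if not same_shape:
        raise ValueError("adjacency matrices must have the same dimensions")
    n = len(a1)
    # Only pairs with j > i are visited, so each bond is counted once.
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if a1[i][j] != a2[i][j]
    )


reference = [[0, 1, 0], [1, 0, 3], [0, 3, 0]]  # C-C#N
predicted = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]  # C-C=N, one bond order off
print(count_bond_diffs_sketch(reference, predicted))  # prints 1
```

A changed bond order and a missing bond each count as one difference, because the comparison is made entry by entry.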

extract_ligand(input_pdb, output_pdb)[source]¶

Extract ligand coordinates from a protein-ligand complex PDB file.

This method uses pdb_selchain and pdb_tidy to extract only the ligand atoms (chain B) from a protein-ligand complex and clean up the PDB format.

Parameters:

input_pdb (str) – Path to input PDB file containing protein-ligand complex
output_pdb (str) – Path to output PDB file containing only ligand

Raises:

subprocess.CalledProcessError – If pdb_selchain or pdb_tidy fails
IOError – If output file cannot be written

Note

Assumes ligand is in chain B of the complex
Requires pdb_selchain and pdb_tidy executables to be installed
Output PDB is cleaned and formatted for further processing

fix_bond_orders(sdf_string, smiles)[source]¶

Fix bond orders in a molecular structure using a reference SMILES.

This method attempts to correct bond orders in a molecular structure by using a reference SMILES string as a template.

Parameters:

sdf_string (str) – SDF format string of the molecular structure
smiles (str) – Reference SMILES string to use as template

Returns:

RDKit molecule with corrected bond orders, or False if failed

Return type:

Chem.Mol

Note

Template SMILES should represent the same molecule
Returns False if correction fails

standardize_smiles(smiles)[source]¶

Standardize a SMILES string to canonical form.

This method converts a SMILES string to its canonical tautomeric form using RDKit’s MolStandardize module. This ensures consistent comparison between different representations of the same molecule.

Parameters:: smiles (str) – Input SMILES string
Returns:: Canonical, standardized SMILES string, or None if invalid
Return type:: str
Raises:: RuntimeError – If SMILES standardization fails

Note

Handles tautomeric forms automatically
Removes stereochemistry information
Returns None for invalid SMILES strings
Uses RDKit’s MolStandardize for robust standardization

validate_ligand(sdf_string, smiles, threshold=1)[source]¶

Validate a ligand structure against a reference SMILES string.

This method performs comprehensive validation of a predicted ligand structure by comparing it with a reference SMILES string. It checks chemical validity, atom counts, bond connectivity, and bond orders.

Parameters:

sdf_string (str) – SDF format string of the predicted ligand structure
smiles (str) – Reference SMILES string for the same molecule as the predicted ligand, used for comparison
threshold (int, optional) – Maximum allowed bond differences. Defaults to 1.

Returns:

True if structure passes validation, False otherwise

Return type:

bool

Note

Validation steps:

1. Chemical validity check (can be converted to SMILES)
2. Atom count comparison with reference
3. Bond connectivity comparison (adjacency matrices)
4. Bond order assignment and verification

Higher threshold allows more bond differences
Returns False if any step fails

Analyser¶

class docktopus.analyser.Analyser(workdir)[source]¶

Bases: object

Analyser is a utility class for analyzing molecular docking results. It currently computes Fe-ligand distances for CYP complexes and the mean absolute error (MAE) with bootstrapped errors, and its functionality is otherwise limited. It also contains some unused functions reserved for a future full post-docking analysis.

This class provides methods to:

Compute Fe distances between docked and reference ligand/heme structures.
Facilitate downstream statistical analysis of docking results.

Parameters:: workdir (str) – The working directory containing the docking and reference results. Expected subdirectories: ‘docked’ and ‘ref’.

Examples

>>> analyser = Analyser(workdir="./results")
>>> docked_data = ("./results/docked/sample1-docked.pdb", "./results/docked/sample1-hem.pdb")
>>> reference_data = ("./results/ref/sample1-ligand.pdb", "./results/ref/sample1-hem.pdb")
>>> ligand_dist, docked_dist = analyser.get_Fedistance(docked_data, reference_data, kind="Fed1")
>>> print("Ligand Fe distance:", ligand_dist)
>>> print("Docked Fe distance:", docked_dist)

bootstrap_mae_error(data, ref_data, n_bootstrap=1000, confidence_level=0.95)[source]¶
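
bootstrap_mae_error is not documented further on this page. A percentile bootstrap of the mean absolute error (MAE) is one common way to obtain such an error estimate, and the sketch below shows that technique using only the standard library. The function name bootstrap_mae, the seed parameter, and the (mae, lower, upper) return value are assumptions, and the actual method may compute or return something different.

```python
import random
import statistics


def bootstrap_mae(data, ref_data, n_bootstrap=1000, confidence_level=0.95, seed=0):
    """Return (mae, lower, upper) using a percentile bootstrap.

    Sketch only: not the docktopus implementation.
    """
    if len(data) != len(ref_data) or not data:
        raise ValueError("data and ref_data must be non-empty and equally long")
    abs_errors = [abs(d - r) for d, r in zip(data, ref_data)]
    mae = statistics.fmean(abs_errors)

    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(abs_errors)
    # Resample the absolute errors with replacement and recompute the mean.
    resampled = sorted(
        statistics.fmean(rng.choices(abs_errors, k=n)) for _ in range(n_bootstrap)
    )
    alpha = (1.0 - confidence_level) / 2.0
    lower = resampled[int(alpha * (n_bootstrap - 1))]
    upper = resampled[int((1.0 - alpha) * (n_bootstrap - 1))]
    return mae, lower, upper
```

Resampling the per-sample absolute errors, rather than the raw values, keeps each docked distance paired with its reference.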

bootstrap_ratio_error(Fe_dist_ratio, n_bootstrap=1000, confidence_level=0.95)[source]¶

calculate_molecular_weights(smiles_list)[source]¶

Calculate molecular weights for a list of SMILES strings.

Parameters:: smiles_list (list of str) – List of SMILES strings.
Returns:: Molecular weights for each molecule.
Return type:: list of float

calculate_num_rotatable_bonds(smiles_list)[source]¶

Calculate the number of rotatable bonds for a list of SMILES strings.

Parameters:: smiles_list (list of str) – List of SMILES strings.
Returns:: Number of rotatable bonds for each molecule.
Return type:: list of int

get_Fedistance(docked_data, reference_data, kind)[source]¶

get_sample_Fedist(sample_names, kind)[source]¶

Wrapper function to compute Fe distances for a list of sample names. This function does not currently work. The main obstacle is the lack of consistent file naming needed to traverse a full dataset of docking results: the function currently contains hardcoded names for docked_{ligand, hem} and reference_{ligand, hem} that are not consistent with the rest of the library.

Parameters:

sample_names (list of str) – List of sample names.
kind (str) – The kind of Fe distance to compute (“Fed1”, “Fed2”, “Fed3”).

Returns:

DataFrame with columns [‘sample’, ‘ligand_dist’, ‘docked_dist’].

Return type:

pd.DataFrame

get_similarity_scores(smiles_list)[source]¶

Parameters:: smiles_list (list[str])
Return type:: list
