API Reference

This page contains the complete API reference for the DOCKTOPUS package.

Main Classes

Docking

class docktopus.docking.Docking(engine, work_dir, model_runner_class=None, **engine_params)[source]

Bases: object

Main orchestrator class for molecular docking workflows.

This class provides a unified interface for running molecular docking simulations using various docking engines. It handles input preparation, docking execution, and results processing in a streamlined workflow.

The class supports multiple docking engines:
  • GNINA: Deep learning-based docking with CNN scoring

  • Vina: Traditional molecular docking with empirical scoring

  • GalaxyDock2 HEME: Specialized docking for heme-containing proteins

  • RFAA: RoseTTAFold All-Atom-based protein-ligand structure prediction

Parameters:
  • engine (str)

  • work_dir (str)

work_dir

Base directory for all workflow outputs

Type:

Path

logger

Logger instance for workflow events

Type:

logging.Logger

preprocessor

Instance for molecular preparation

Type:

DataPreprocessor

engine

Docking engine instance (type depends on engine parameter)

Example Usage:
>>> from docktopus import Docking
>>> # Initialize Vina docking
>>> dock = Docking(
...     engine="vina",
...     work_dir="./test-data",
...     box_center=[54.426, 78.117, 10.330],
...     box_size=[15, 15, 15],
...     seed=1000,
...     cpu=4,
...     exhaustiveness=8,
...     num_modes=3
... )
>>> # Prepare ligands from SMILES
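>>> # Raw string: the SMILES contains backslashes, and "\N" is an invalid escape in a normal string
>>> smiles = [r"CCNC(=O)c1ccc2c(c1)NC(=O)/C2=C(\Nc1ccc(CN(C)C)cc1)c1ccccc1"]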
>>> ligands = dock.prepare_ligands(smiles)
>>> # Prepare receptor
>>> receptor = "test-data/1W0F-cyp.pdb"
>>> target = dock.prepare_receptor(receptor_pdb=receptor)
>>> # Run docking
>>> results = dock.dock(
...     receptor=target,
...     ligands=ligands,
...     smiles=smiles
... )
__init__(engine, work_dir, model_runner_class=None, **engine_params)[source]

Initialize the docking workflow orchestrator.

Parameters:
  • engine (str) – Name of docking engine to use. Supported values:
    • 'gnina': GNINA deep learning docking engine

    • 'vina': AutoDock Vina docking engine

    • 'galaxydock2-heme': GalaxyDock2 HEME specialized engine

    • 'rfaa': RFAA (RoseTTAFold All-Atom) structure prediction engine

  • work_dir (str) – Base directory for all workflow outputs. Will be created if it doesn’t exist.

  • model_runner_class – ModelRunner class for RFAA engine (required if engine='rfaa')

  • **engine_params – Additional parameters passed to the specific docking engine. Common parameters for non-rfaa engines include:
    • box_center: (x,y,z) coordinates of docking box center

    • box_size: (x,y,z) dimensions of docking box

    • exhaustiveness: Search exhaustiveness (higher = more thorough)

    • num_modes: Number of binding modes to generate

    • cpu: Number of CPU cores to use

    • seed: Random seed for reproducibility

    The GNINA engine requires the path to the gnina executable, while GalaxyDock2 HEME requires the path to the driver script. RoseTTAFold All-Atom (RFAA) requires the ModelRunner object, which you need to import directly from the rf2aa.run_inference module shipped with RFAA. If you intend to use RFAA, your driver script should be in the top directory of the RFAA repository.

Raises:
  • ValueError – If unsupported engine is specified or required parameters are missing

  • FileNotFoundError – If required executables or files are not found
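The engine-specific requirements can be checked before constructing the orchestrator. The following is a minimal sketch, not the DOCKTOPUS implementation. It assumes the keyword names gnina_path (used in the example on this page), gdock_dir (the first argument of GDockHEMEDockingEngine) and model_runner_class; the helper check_engine_params and the REQUIRED table are hypothetical.

```python
# Hypothetical pre-flight check; not part of the DOCKTOPUS API.
# The required keyword names below are assumptions taken from this page.
REQUIRED = {
    "gnina": ["gnina_path"],
    "vina": [],
    "galaxydock2-heme": ["gdock_dir"],
    "rfaa": ["model_runner_class"],
}


def check_engine_params(engine, **params):
    """Raise ValueError for an unknown engine or a missing required parameter."""
    if engine not in REQUIRED:
        raise ValueError(f"Unsupported engine: {engine!r}")
    missing = [name for name in REQUIRED[engine] if params.get(name) is None]
    if missing:
        raise ValueError(f"Engine {engine!r} is missing: {', '.join(missing)}")


# Vina needs no engine-specific path, so this call passes silently.
check_engine_params("vina", box_center=(0.0, 0.0, 0.0))
```

Running such a check first gives a clear message before any directories or engine objects are created.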

Example

>>> # Initialize GNINA docking
>>> docking = Docking(
...     engine='gnina',
...     work_dir='./docking_results',
...     box_center=(10.0, 20.0, 30.0),
...     box_size=(20.0, 20.0, 20.0),
...     gnina_path="/home/username/gnina"
... )
>>> # Initialize RFAA docking
>>> from rf2aa.run_inference import ModelRunner
>>> docking = Docking(
...     engine='rfaa',
...     work_dir='./rfaa_results',
...     model_runner_class=ModelRunner
... )
dock(ligands, receptor, smiles=None, **kwargs)[source]

Orchestrate docking for single or multiple ligands.

This method automatically determines whether to run single or batch docking based on the input type. It performs pre-docking checks and handles both file-based and SMILES-based ligand inputs. The method assumes that the provided ligand and receptor files are already protonated for the correct pH. To prepare them, see the prepare_ligands and prepare_receptor methods.

Parameters:
  • ligands – Ligand input specification. Can be:
    • str: Path to single ligand file

    • list: List of ligand file paths for batch docking

    • str: SMILES string (single molecule)

    • list: List of SMILES strings (batch processing)

  • receptor (str) – Path to receptor structure file

  • smiles (Optional[str]) – SMILES string corresponding to the ligand(s). Required for RFAA engine, optional for others.

  • **kwargs – Additional arguments passed to dock_single or dock_many methods. Common parameters include:
    • output_prefix: Optional prefix for output files

    • prepare_inputs: Whether to preprocess input files (default: True)

Returns:

Docking results. For single ligand: dictionary containing scores and output file paths. For multiple ligands: list of result dictionaries.

Return type:

Union[Dict, List[Dict]]

Raises:
  • ValueError – If ligands parameter is not a string or list

  • FileNotFoundError – If input files are not found

  • RuntimeError – If docking engine fails

Example

>>> # Single ligand docking
>>> result = docking.dock(
...     ligands='ligand.sdf',
...     receptor='protein.pdb'
... )
>>> print(f"Docking metrics: {result}")
>>> # Batch docking with multiple ligands
>>> results = docking.dock(
...     ligands=['lig1.sdf', 'lig2.sdf', 'lig3.sdf'],
...     receptor='protein.pdb'
... )
>>> for i, result in enumerate(results):
...     print(f"Ligand {i+1}: {result['scores'][0]['affinity']}")
>>> # SMILES-based docking
>>> result = docking.dock(
...     ligands='CC(=O)OC1=CC=CC=C1C(=O)O',
...     receptor='protein.pdb',
...     smiles='CC(=O)OC1=CC=CC=C1C(=O)O'
... )
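
The type-based choice between single and batch docking described for dock() can be pictured as a small dispatcher. This is a sketch under assumptions, not the library's code: dispatch, run_single and run_many are hypothetical names, and the real method also runs pre-docking checks that are omitted here.

```python
def dispatch(ligands, run_single, run_many):
    """Route a str to the single-ligand path and a list to the batch path."""
    if isinstance(ligands, str):
        return run_single(ligands)
    if isinstance(ligands, list):
        return run_many(ligands)
    # Mirrors the documented ValueError for any other input type.
    raise ValueError("ligands must be a string or a list of strings")


# Stand-in callables that only report which branch ran.
single_branch = dispatch("ligand.sdf", lambda lig: "single", lambda ligs: "batch")
batch_branch = dispatch(["a.sdf", "b.sdf"], lambda lig: "single", lambda ligs: "batch")
print(single_branch, batch_branch)
```
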
dock_many(receptor_file, ligand_files, smiles, box_center=None, box_size=(30.0, 30.0, 30.0))[source]

Perform docking for multiple receptor-ligand pairs.

This method executes batch docking for multiple ligands against a single receptor. It processes each ligand individually and collects results, with error handling to ensure that failures of individual ligands don’t stop the entire batch.

Parameters:
  • receptor_file (str) – Path to receptor structure file

  • ligand_files (List[str]) – List of paths to ligand structure files

  • smiles (Optional[str]) – SMILES string corresponding to the ligands. Required for RFAA engine, optional for others.

  • box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.

  • box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

Returns:

List of docking result dictionaries. Each dictionary has the same structure as returned by dock_single(). Failed dockings will have an “error” key with the error message.

Return type:

List[Dict]

Raises:
  • FileNotFoundError – If receptor file is not found

  • ValueError – If ligand_files is not a list

Example

>>> ligand_files = ['lig1.sdf', 'lig2.sdf', 'lig3.sdf']
>>> results = docking.dock_many(
...     receptor_file='protein.pdb',
...     ligand_files=ligand_files,
...     smiles=None  # no default in the signature; only required for RFAA
... )
>>> for i, result in enumerate(results):
...     if 'error' in result:
...         print(f"Ligand {i+1} failed: {result['error']}")
...     else:
...         print(f"Ligand {i+1} score: {result['scores'][0]['affinity']}")

Note

  • Each ligand is processed independently

  • Failed dockings are logged but don’t stop the batch

  • Results maintain the same order as input ligand_files

  • Error handling ensures robust batch processing
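The batch behaviour listed in this note (independent processing, recorded failures, preserved order) follows a common pattern. A minimal sketch, assuming a hypothetical dock_one callable in place of the real single-ligand call; dock_batch and fake_dock are illustrative names only:

```python
def dock_batch(ligand_files, dock_one):
    """Dock each ligand independently; record failures instead of raising."""
    results = []
    for ligand_file in ligand_files:
        try:
            results.append(dock_one(ligand_file))
        except Exception as exc:  # one failure must not abort the batch
            results.append({"error": str(exc)})
    return results  # same order as ligand_files


def fake_dock(ligand_file):
    # Stand-in for a real docking call; fails for one specific input.
    if ligand_file == "bad.sdf":
        raise RuntimeError("docking engine failed")
    return {"output_file": ligand_file, "scores": [{"pose": 1, "affinity": -7.0}]}


results = dock_batch(["lig1.sdf", "bad.sdf", "lig3.sdf"], fake_dock)
print([("error" in r) for r in results])
```

Because every input yields exactly one entry, results can be zipped with the input list to see which ligands failed.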

dock_single(receptor_file, ligand_file, smiles=None, output_prefix=None, box_center=None, box_size=(30.0, 30.0, 30.0))[source]

Perform docking for a single receptor-ligand pair.

This method executes the actual docking calculation using the configured docking engine. It handles the specific requirements of each engine and returns comprehensive results including scores and output file paths.

Parameters:
  • receptor_file (str) – Path to receptor structure file

  • ligand_file (str) – Path to ligand structure file

  • smiles (Optional[str]) – SMILES string corresponding to the ligand. Required for RFAA engine, optional for others.

  • output_prefix (Optional[str]) – Optional prefix for output files. If not provided, uses the ligand filename stem.

  • box_center (Optional[Tuple[float, float, float]]) – (x,y,z) coordinates of docking box center. If None, the engine will use ligand center. Defaults to None.

  • box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

Returns:

Dictionary containing docking results with the following keys:
  • output_file: Path to the main output file (docked structure)

  • log_file: Path to the docking log file

  • scores: List of dictionaries containing scores for each pose

  • valid: (RFAA only) Boolean indicating if the result passed validation

  • metrics: (RFAA only) Additional quality metrics

Return type:

Dict

Raises:
  • FileNotFoundError – If input files are not found

  • RuntimeError – If docking calculation fails

  • ValueError – If required parameters are missing

Example

>>> result = docking.dock_single(
...     receptor_file='protein.pdb',
...     ligand_file='ligand.sdf',
...     output_prefix='my_docking'
... )
>>> print(f"Best pose score: {result['scores'][0]}")
>>> print(f"Output structure: {result['output_file']}")

Note

The exact content of the scores list depends on the docking engine:
  • GNINA: affinity, intramol, cnn_pose, cnn_affinity

  • Vina: affinity

  • GalaxyDock2 HEME: Energy

  • RFAA: ligand_mean_pae, mean_plddts
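Since the score keys differ per engine, code that ranks poses has to name the key it uses. A minimal sketch, assuming the result layout documented above and that a lower value is better, which holds for an affinity in kcal/mol but not for metrics such as cnn_pose or mean_plddts. The helper best_pose and the numbers are made up for illustration.

```python
def best_pose(result, key="affinity"):
    """Return the score dictionary with the lowest value for `key`."""
    return min(result["scores"], key=lambda pose: pose[key])


# Made-up values shaped like a Vina result; the file name is hypothetical.
example = {
    "output_file": "my_docking.pdbqt",
    "scores": [
        {"pose": 1, "affinity": -7.4},
        {"pose": 2, "affinity": -8.1},
    ],
}
top = best_pose(example)
print(top["pose"], top["affinity"])
```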

prepare_ligands(ligands)[source]

Prepare ligand structures from SMILES strings for docking.

This method generates 3D conformers from SMILES strings and prepares them for docking using the appropriate preprocessor methods. The preparation process depends on the docking engine being used.

Parameters:

ligands (Union[str, List[str]]) – SMILES string or list of SMILES strings representing the molecules to prepare.

Returns:

Path(s) to prepared ligand file(s). Returns a single path if input was a single SMILES, or a list of paths if input was a list of SMILES.

Return type:

Union[str, List[str]]

Raises:
  • ValueError – If ligands parameter is not a string or list of strings

  • RuntimeError – If conformer generation or preparation fails

Example

>>> # Prepare single ligand
>>> prepared_file = docking.prepare_ligands('CC(=O)OC1=CC=CC=C1C(=O)O')
>>> print(f"Prepared ligand saved to: {prepared_file}")
>>> # Prepare multiple ligands
>>> smiles_list = [
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'c1ccccc1',
...     'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O'
... ]
>>> prepared_files = docking.prepare_ligands(smiles_list)
>>> print(f"Prepared {len(prepared_files)} ligands")

Note

The preparation process includes:
  1. 3D conformer generation from SMILES using openbabel

  2. Hydrogen addition at physiological pH=7.4

  3. Format conversion to engine-specific requirements

For the RFAA engine, you don’t need to use this function.
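The first two preparation steps can be expressed as one Open Babel command line. This is a sketch of the idea only: the source does not show how DOCKTOPUS invokes Open Babel, so the helper smiles_to_sdf_command is hypothetical. It uses the standard obabel options -: (inline SMILES input), --gen3d (3D coordinates), -p (hydrogens for a given pH) and -O (output file).

```python
def smiles_to_sdf_command(smiles, output_file, ph=7.4):
    """Build an obabel command: SMILES -> 3D SDF, protonated for `ph`."""
    return [
        "obabel",
        f"-:{smiles}",      # read the molecule from an inline SMILES string
        "-osdf",            # write SDF
        "-O", output_file,  # output path
        "--gen3d",          # generate 3D coordinates
        "-p", str(ph),      # add hydrogens appropriate for this pH
    ]


cmd = smiles_to_sdf_command("CC(=O)OC1=CC=CC=C1C(=O)O", "aspirin_3d.sdf")
print(" ".join(cmd))
# To execute it (requires Open Babel on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```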

prepare_receptor(receptor_pdb)[source]

Prepare receptor structure for docking.

This method handles receptor preparation including protonation and format conversion as required by the specific docking engine. Some engines (like RFAA) may not require receptor preparation.

Parameters:

receptor_pdb (str) – Path to receptor structure file (typically PDB format)

Returns:

Path to prepared receptor file, or None if no preparation is required (e.g., for RFAA engine).

Return type:

Optional[str]

Raises:
  • FileNotFoundError – If receptor file is not found

  • RuntimeError – If receptor preparation fails

Example

>>> prepared_receptor = docking.prepare_receptor('protein.pdb')
>>> if prepared_receptor:
...     print(f"Prepared receptor saved to: {prepared_receptor}")
... else:
...     print("No receptor preparation required")

Note

Preparation steps may include:
  • Hydrogen addition at physiological pH=7.4

  • Format conversion (e.g., PDB to PDBQT for Vina)

  • Structure cleaning and validation

DataPreprocessor

class docktopus.preprocessor.DataPreprocessor(work_dir)[source]

Bases: object

Handles molecular preparation and format conversion for docking workflows.

This class provides methods for preparing molecular structures for docking simulations, including protonation, format conversion, and 3D conformer generation. It uses Open Babel for molecular manipulation and supports various input/output formats. This is a helper class called internally by the Docking class.

The preprocessor handles:
  • Hydrogen addition at specified pH values

  • Format conversion between molecular file formats

  • Protein and ligand preparation for specific docking engines

  • 3D conformer generation from SMILES strings

Parameters:

work_dir (str)

work_dir

Directory where processed files are stored

Type:

Path

__init__(work_dir)[source]

Initialize the preprocessor with a working directory.

Parameters:

work_dir (str) – Directory where processed files will be stored. Will be created if it doesn’t exist.

Example

>>> preprocessor = DataPreprocessor('./molecular_data')
convert_format(input_file, output_file, remove_hydrogens=False)[source]

Convert between molecular file formats.

This method converts molecular structures between different file formats using Open Babel. It can optionally remove hydrogens during conversion.

Parameters:
  • input_file (str) – Path to input file

  • output_file (str) – Path to output file

  • remove_hydrogens (bool, optional) – Whether to remove hydrogens during conversion. Defaults to False.

Returns:

Path to the converted file

Return type:

str

Raises:
  • FileNotFoundError – If input file doesn’t exist

  • RuntimeError – If format conversion fails

Example

>>> # Convert SDF to MOL2
>>> mol2_file = preprocessor.convert_format(
...     'ligand.sdf',
...     'ligand.mol2'
... )
>>> # Convert PDB to PDBQT (removing hydrogens)
>>> pdbqt_file = preprocessor.convert_format(
...     'protein.pdb',
...     'protein.pdbqt',
...     remove_hydrogens=True
... )

Note

  • Input and output formats are determined by file extensions

  • Common formats: SDF, PDB, MOL2, PDBQT, SMILES

  • Removing hydrogens can be useful for certain docking engines
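Because the formats are taken from the file extensions, the format name can be derived from a path alone. A minimal sketch of that rule; format_from_path is a hypothetical helper, not a DataPreprocessor method:

```python
from pathlib import Path


def format_from_path(path):
    """Return the lowercase extension without the dot, e.g. 'sdf'."""
    return Path(path).suffix.lstrip(".").lower()


pairs = [(name, format_from_path(name)) for name in ("ligand.sdf", "protein.PDBQT")]
print(pairs)
```

A path without an extension yields an empty string, which a caller would need to reject before converting.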

generate_conformers(smiles, output_file)[source]

Generate a 3D conformer from a SMILES string and write to an SDF file.

This method generates a single 3D conformer from a SMILES string using Open Babel’s 3D coordinate generation. The resulting structure is saved in SDF format without hydrogens added.

Parameters:
  • smiles (str) – SMILES string of the molecule

  • output_file (str) – Path to save the 3D structure (should have .sdf extension)

Returns:

Path to the output SDF file containing the 3D conformer

Return type:

str

Raises:
  • ValueError – If SMILES string is invalid

  • RuntimeError – If 3D generation fails

Example

>>> output_file = preprocessor.generate_conformers(
...     'CC(=O)OC1=CC=CC=C1C(=O)O',
...     'aspirin_3d.sdf'
... )
>>> print(f"3D structure saved to: {output_file}")

Note

  • Generates only one conformer (not multiple conformers)

  • Does not add hydrogens (use protonate() if needed)

  • Uses Open Babel’s make3D() method for coordinate generation

  • Output is always in SDF format regardless of output_file extension

prepare_ligand(ligand_file, format='sdf', output_dir=None)[source]

Prepare ligand structure for docking.

This method prepares ligand structures for docking by adding all hydrogens at physiological pH. It’s designed for general-purpose ligand preparation and works with most docking engines.

Parameters:
  • ligand_file (str) – Path to ligand structure file

  • format (str, optional) – Output format for prepared ligand. Defaults to “sdf”.

  • output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared ligand file

Return type:

str

Raises:
  • FileNotFoundError – If ligand file doesn’t exist

  • RuntimeError – If ligand preparation fails

Example

>>> prepared_ligand = preprocessor.prepare_ligand(
...     'molecule.sdf',
...     format='sdf',
...     output_dir='./prepared'
... )
>>> print(f"Prepared ligand: {prepared_ligand}")

Note

  • Adds all hydrogens at pH 7.4

  • Useful for most docking engines that require explicit hydrogens

  • Output filename includes “_prepared” suffix
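The output location follows from output_dir, the work_dir fallback and the "_prepared" suffix. The exact naming scheme is an assumption here, since the source only states that the suffix is included; prepared_path is a hypothetical helper.

```python
from pathlib import Path


def prepared_path(ligand_file, fmt="sdf", output_dir=None, work_dir="."):
    """Guess where a prepared ligand would be written (assumed naming scheme)."""
    # Fall back to the preprocessor's work_dir when no output_dir is given.
    target_dir = Path(output_dir) if output_dir is not None else Path(work_dir)
    return target_dir / f"{Path(ligand_file).stem}_prepared.{fmt}"


print(prepared_path("data/molecule.sdf", output_dir="prepared"))
```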

prepare_ligand_vina(ligand_file, pH=7.4, output_dir=None)[source]

Prepare ligand structure specifically for Vina docking.

This method prepares ligand structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT format with specific atom types, charges, and rotatable bonds.

Parameters:
  • ligand_file (str) – Path to ligand structure file (typically SDF format)

  • pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.

  • output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared ligand file in PDBQT format

Return type:

str

Raises:
  • FileNotFoundError – If ligand file doesn’t exist

  • RuntimeError – If ligand preparation fails

  • subprocess.CalledProcessError – If Open Babel conversion fails

Example

>>> vina_ligand = preprocessor.prepare_ligand_vina(
...     'molecule.sdf',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready ligand: {vina_ligand}")

Note

  • Converts to PDBQT format required by Vina

  • Adds polar hydrogens, removes non-polar hydrogens (-xpnh flag)

  • Assigns atom types, charges, and rotatable bonds

  • Assumes SDF input format (modify cmd if using different format)

prepare_protein(protein_file, format='pdb', output_dir=None)[source]

Prepare protein structure for docking.

This method prepares protein structures for docking by adding polar hydrogens at physiological pH.

Parameters:
  • protein_file (str) – Path to protein structure file

  • format (str, optional) – Output format for prepared protein. Defaults to “pdb”.

  • output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared protein file

Return type:

str

Raises:
  • FileNotFoundError – If protein file doesn’t exist

  • RuntimeError – If protein preparation fails

Example

>>> prepared_protein = preprocessor.prepare_protein(
...     'receptor.pdb',
...     format='pdb',
...     output_dir='./prepared'
... )
>>> print(f"Prepared protein: {prepared_protein}")

Note

  • Adds polar hydrogens at pH 7.4

  • Preserves non-polar hydrogens if present

  • Output filename includes “_prepared” suffix

prepare_protein_vina(protein_file, pH=7.4, output_dir=None)[source]

Prepare protein structure specifically for Vina docking.

This method prepares protein structures for AutoDock Vina by converting them to PDBQT format with appropriate hydrogen handling. Vina requires PDBQT format with specific atom types; the conversion is done with the obabel binary rather than pybel because the binary handles these atom types properly.

Parameters:
  • protein_file (str) – Path to protein structure file (typically PDB format)

  • pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4.

  • output_dir (Optional[str], optional) – Directory to save processed file. If None, uses the preprocessor’s work_dir. Defaults to None.

Returns:

Path to the prepared protein file in PDBQT format

Return type:

str

Raises:
  • FileNotFoundError – If protein file doesn’t exist

  • RuntimeError – If protein preparation fails

  • subprocess.CalledProcessError – If Open Babel conversion fails

Example

>>> vina_protein = preprocessor.prepare_protein_vina(
...     'receptor.pdb',
...     pH=7.4,
...     output_dir='./vina_prepared'
... )
>>> print(f"Vina-ready protein: {vina_protein}")

Note

  • Converts to PDBQT format required by Vina using system call to obabel binary

  • Removes non-polar hydrogens (-xr flag)

  • Adds polar hydrogens at specified pH
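A system call of the kind this note describes could be assembled as follows. This is a sketch, not the library's code: vina_receptor_command and run_obabel are hypothetical names, and the command uses only the -xr flag named in the note plus the standard obabel options -O (output file) and -p (pH).

```python
import subprocess
from pathlib import Path


def vina_receptor_command(protein_file, pH=7.4, output_dir="."):
    """Build an obabel command that writes <stem>.pdbqt into output_dir."""
    out_file = Path(output_dir) / f"{Path(protein_file).stem}.pdbqt"
    return ["obabel", str(protein_file), "-O", str(out_file), "-xr", "-p", str(pH)]


def run_obabel(cmd):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


cmd = vina_receptor_command("receptor.pdb", output_dir="vina_prepared")
print(" ".join(cmd))  # run_obabel(cmd) would execute it if obabel is installed
```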

protonate(input_file, output_file, pH=7.4, polar_only=True)[source]

Add hydrogens to a molecule at specified pH.

This method uses Open Babel to add hydrogens to molecular structures based on the specified pH value. It can add either all hydrogens or only polar hydrogens depending on the polar_only parameter.

Parameters:
  • input_file (str) – Path to input structure file

  • output_file (str) – Path to save protonated structure

  • pH (float, optional) – pH value for protonation state calculation. Defaults to 7.4 (physiological pH).

  • polar_only (bool, optional) – If True, only add polar hydrogens. If False, add all hydrogens. Defaults to True.

Returns:

Path to the protonated structure file

Return type:

str

Raises:
  • FileNotFoundError – If input file doesn’t exist

  • RuntimeError – If protonation fails

Example

>>> protonated_file = preprocessor.protonate(
...     'molecule.sdf',
...     'molecule_protonated.sdf',
...     pH=7.4,
...     polar_only=False
... )

Note

  • Supports various input formats (SDF, PDB, MOL2, etc.)

  • Output format is determined by file extension

  • pH affects the protonation state of titratable groups

Docking Engines

GNINA Engine

class docktopus.gnina_engine.GninaDockingEngine(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)[source]

Bases: object

GNINA-specific docking engine implementation.

This class provides an interface to the GNINA docking engine, which combines traditional molecular docking with deep learning-based scoring using convolutional neural networks (CNNs). GNINA is particularly effective for structure-based drug design and virtual screening.

GNINA features:
  • Traditional Vina scoring function

  • CNN-based pose scoring and affinity prediction

  • Support for flexible docking

  • Automatic binding site detection

  • Multiple output poses with comprehensive scoring

Parameters:
  • gnina_path (str)

  • work_dir (str)

  • seed (int)

  • exhaustiveness (int)

  • num_modes (int)

  • cpu (int)

gnina_path

Path to GNINA executable

Type:

str

work_dir

Directory for docking outputs

Type:

Path

receptor_format

Expected receptor file format (“pdb”)

Type:

str

ligand_format

Expected ligand file format (“sdf”)

Type:

str

exhaustiveness

Search exhaustiveness parameter

Type:

int

num_modes

Number of binding modes to generate

Type:

int

cpu

Number of CPU cores to use

Type:

int

autobox_ligand

Whether to use ligand for automatic box detection

Type:

bool

seed

Random seed for reproducibility

Type:

int

logger

Logger instance for engine events

Type:

logging.Logger

__init__(gnina_path, work_dir, seed=0, exhaustiveness=16, num_modes=9, cpu=4)[source]

Initialize GNINA docking engine.

Parameters:
  • gnina_path (str) – Path to GNINA executable. Must be a valid path to the GNINA binary.

  • work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.

  • seed (int, optional) – Random seed for reproducibility. Defaults to 0.

  • exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 16.

  • num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.

  • cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.

  • autobox_ligand (bool, optional) – If True and no box_center is provided, automatically determine box center from ligand. Defaults to True.

Raises:
  • FileNotFoundError – If GNINA executable is not found at the specified path

  • ValueError – If invalid parameters are provided

dock(receptor_file, ligand_file, box_size=(30.0, 30.0, 30.0), box_center=None, output_prefix=None)[source]

Perform docking using GNINA.

This method executes GNINA docking with the specified parameters and returns comprehensive results including multiple poses with both traditional and CNN-based scores.

Parameters:
  • receptor_file (str) – Path to prepared receptor file (PDB format)

  • ligand_file (str) – Path to prepared ligand file (SDF format)

  • box_center (Optional[Tuple[float, float, float]], optional) – (x,y,z) coordinates of docking box center. If None, the box is placed by autoboxing around the ligand center. Defaults to None.

  • box_size (Tuple[float, float, float], optional) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

  • output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:
  • output_file: Path to SDF file with docked poses

  • log_file: Path to GNINA log file with detailed output

  • scores: List of dictionaries, each containing scores for one pose:
    • pose: Pose number (1-based)

    • affinity: Vina binding affinity (kcal/mol)

    • intramol: Intramolecular energy (kcal/mol)

    • cnn_pose: CNN pose score

    • cnn_affinity: CNN affinity prediction

Return type:

Dict[str, Any]

Raises:
  • FileNotFoundError – If input files are not found

  • subprocess.CalledProcessError – If GNINA execution fails

  • RuntimeError – If score parsing fails

Note

  • Receptor should be in PDB format with polar hydrogens

  • Ligand should be in SDF format with all hydrogens

  • Box parameters are applied as specified during initialization

  • All poses are saved in a single SDF file

  • Log file contains detailed GNINA output and diagnostics
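The documented parameters map onto GNINA's command-line options. A minimal sketch of how such a command could be assembled, assuming autoboxing is used whenever box_center is None; gnina_command is a hypothetical helper and the engine's real invocation is not shown in the source.

```python
def gnina_command(gnina_path, receptor, ligand, out_file,
                  box_size=(30.0, 30.0, 30.0), box_center=None,
                  exhaustiveness=16, num_modes=9, cpu=4, seed=0):
    """Build a gnina command line from the documented docking parameters."""
    cmd = [gnina_path, "-r", receptor, "-l", ligand, "-o", out_file,
           "--exhaustiveness", str(exhaustiveness),
           "--num_modes", str(num_modes),
           "--cpu", str(cpu), "--seed", str(seed)]
    if box_center is None:
        # No explicit box: let GNINA build one around the ligand.
        cmd += ["--autobox_ligand", ligand]
    else:
        for axis, center, size in zip("xyz", box_center, box_size):
            cmd += [f"--center_{axis}", str(center), f"--size_{axis}", str(size)]
    return cmd


print(" ".join(gnina_command("/home/username/gnina", "protein.pdb",
                             "ligand.sdf", "ligand_docked.sdf")))
```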

precheck(file_path)[source]

Check if the provided file path exists.

This method performs a simple file existence check, which is useful for validating input files before attempting docking calculations. Runs automatically before each docking to make sure you have all the files you think you have. It does not check if those files are correct.

Parameters:

file_path (str) – Path to the file to check

Returns:

True if the file exists, False otherwise

Return type:

bool

Note

  • Only checks file existence, not file validity

  • Does not verify file format or content

  • Useful for basic input validation
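An existence-only check of this kind can be written in a few lines. The sketch below collects every missing input at once; missing_inputs is a hypothetical helper, not the engine's precheck method.

```python
import tempfile
from pathlib import Path


def missing_inputs(*paths):
    """Return the paths that do not exist (existence only, no format check)."""
    return [p for p in paths if not Path(p).exists()]


with tempfile.TemporaryDirectory() as tmp:
    receptor = Path(tmp) / "protein.pdb"
    receptor.write_text("")            # an empty file still passes the check
    ligand = Path(tmp) / "ligand.sdf"  # never created
    missing = missing_inputs(receptor, ligand)

print([p.name for p in missing])
```

The empty receptor file passing the check illustrates why a successful precheck says nothing about file validity.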

Vina Engine

class docktopus.vina_engine.VinaDockingEngine(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)[source]

Bases: object

Vina-specific docking engine implementation using the vina Python interface.

This class provides an interface to AutoDock Vina, a popular molecular docking program that uses an empirical scoring function based on the AutoDock 4 force field. Vina is known for its speed and accuracy in structure-based drug design.

Vina features:
  • Empirical scoring function (Vina scoring)

  • Fast conformational search using iterated local search

  • Support for flexible ligand docking

  • Automatic binding site detection

  • Multiple output poses with binding affinities

Parameters:
  • work_dir (str)

  • exhaustiveness (int)

  • num_modes (int)

  • cpu (int)

  • seed (int)

work_dir

Directory for docking outputs

Type:

Path

receptor_format

Expected receptor file format (“pdbqt”)

Type:

str

ligand_format

Expected ligand file format (“pdbqt”)

Type:

str

exhaustiveness

Search exhaustiveness parameter

Type:

int

num_modes

Number of binding modes to generate

Type:

int

cpu

Number of CPU cores to use

Type:

int

seed

Random seed for reproducibility

Type:

int

vina

Vina object from the vina Python package

logger

Logger instance for engine events

Type:

logging.Logger

__init__(work_dir, exhaustiveness=8, num_modes=9, cpu=4, seed=0)[source]

Initialize Vina docking engine.

Parameters:
  • work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.

  • exhaustiveness (int, optional) – Search exhaustiveness (higher values give more thorough but slower searches). Defaults to 8.

  • num_modes (int, optional) – Number of binding modes to generate. Defaults to 9.

  • cpu (int, optional) – Number of CPU cores to use for docking. Defaults to 4.

  • seed (int, optional) – Random seed for reproducibility. Defaults to 0.

Raises:
  • ImportError – If vina Python package is not available

  • ValueError – If invalid parameters are provided

Example

>>> # box_center and box_size are arguments of dock(), not of the constructor
>>> engine = VinaDockingEngine(
...     work_dir='./docking_results',
...     exhaustiveness=16,
...     num_modes=20
... )

Note

Requires the vina Python package to be installed: pip install vina

dock(receptor_file, box_center, ligand_file, box_size=(30.0, 30.0, 30.0), output_prefix=None)[source]

Perform docking using Vina.

This method executes AutoDock Vina docking with the specified parameters and returns results including multiple poses with binding affinities.

Parameters:
  • receptor_file (str) – Path to prepared receptor file (PDBQT format)

  • ligand_file (str) – Path to prepared ligand file (PDBQT format)

  • box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center. This argument is required; the signature defines no default value.

  • box_size (Tuple[float, float, float]) – (x,y,z) dimensions of search box in Angstroms. Defaults to (30.0, 30.0, 30.0).

  • output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:
  • output_file: Path to PDBQT file with docked poses

  • log_file: Path to Vina log file with detailed output

  • scores: List of dictionaries, each containing scores for one pose:
    • pose: Pose number (1-based)

    • affinity: Binding affinity in kcal/mol

Return type:

Dict[str, Any]

Raises:
  • FileNotFoundError – If input files are not found

  • RuntimeError – If Vina execution fails

  • ImportError – If vina package is not available

Note

  • Receptor and ligand must be in PDBQT format

  • PDBQT format includes atom types, charges, and rotatable bonds

  • Box parameters are applied as specified during initialization

  • All poses are saved in a single PDBQT file

  • Log file contains detailed Vina output and diagnostics
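For orientation, a docking run with the vina Python package typically looks like the sketch below. It is not the engine's actual code: run_vina and energies_to_scores are hypothetical names, and the mapping of the first energy column to affinity is an assumption about how the documented scores list is built. The vina import is deferred so that the helper can be used without the package installed.

```python
def energies_to_scores(energies):
    """Map rows of Vina energies to score dictionaries (column 0 = total)."""
    return [{"pose": i, "affinity": float(row[0])}
            for i, row in enumerate(energies, start=1)]


def run_vina(receptor_pdbqt, ligand_pdbqt, box_center, out_file,
             box_size=(30.0, 30.0, 30.0), exhaustiveness=8,
             num_modes=9, cpu=4, seed=0):
    from vina import Vina  # deferred import; requires `pip install vina`

    v = Vina(sf_name="vina", cpu=cpu, seed=seed)
    v.set_receptor(receptor_pdbqt)
    v.set_ligand_from_file(ligand_pdbqt)
    v.compute_vina_maps(center=list(box_center), box_size=list(box_size))
    v.dock(exhaustiveness=exhaustiveness, n_poses=num_modes)
    v.write_poses(out_file, n_poses=num_modes, overwrite=True)
    return {"output_file": out_file,
            "scores": energies_to_scores(v.energies(n_poses=num_modes))}


# Made-up energy rows, only to show the shape of the scores list.
demo_scores = energies_to_scores([[-8.2, 0.0, 0.0, 0.0, 0.0],
                                  [-7.9, 0.0, 0.0, 0.0, 0.0]])
print(demo_scores)
```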

precheck(file_path)[source]

Check if the provided file path exists.

This method performs a simple file existence check, which is useful for validating input files before attempting docking calculations. Runs automatically before each docking to make sure you have all the files you think you have. It does not check if those files are correct.

Parameters:

file_path (str) – Path to the file to check

Returns:

True if the file exists, False otherwise

Return type:

bool

Note

  • Only checks file existence, not file validity

  • Does not verify file format or content

  • Useful for basic input validation

GalaxyDock2 HEME Engine

class docktopus.gdock_engine.GDockHEMEDockingEngine(gdock_dir, work_dir, seed=0)[source]

Bases: object

GalaxyDock2 HEME-specific docking engine implementation.

This class provides an interface to GalaxyDock2 HEME, a specialized docking program designed for heme-containing proteins such as cytochromes P450. GalaxyDock2 HEME incorporates heme-specific scoring functions and binding site considerations.

GalaxyDock2 HEME features: - Specialized scoring for heme-containing proteins - Heme-specific binding site detection - Support for heme-ligand interactions - Multiple output poses with comprehensive scoring - Optimized for cytochrome P450 and similar enzymes

Parameters:
  • gdock_dir (str)

  • work_dir (str)

  • seed (int)

gdock_dir

Path to GalaxyDock2 HEME installation directory

Type:

Path

work_dir

Directory for docking outputs

Type:

Path

gd2_scratch_dir

Scratch directory for GalaxyDock2 HEME operations

Type:

Path

receptor_format

Expected receptor file format (“pdb”)

Type:

str

ligand_format

Expected ligand file format (“mol2”)

Type:

str

box_center

Docking box center coordinates

Type:

Optional[Tuple[float, float, float]]

seed

Random seed for reproducibility

Type:

int

gdock_script

Path to GalaxyDock2 HEME Python script

Type:

Path

logger

Logger instance for engine events

Type:

logging.Logger

__init__(gdock_dir, work_dir, seed=0)[source]

Initialize GalaxyDock2 HEME docking engine.

Parameters:
  • gdock_dir (str) – Path to GalaxyDock2 HEME installation directory. Must contain the script/run_GalaxyDock2_heme.py file.

  • work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.

  • seed (int, optional) – Random seed for reproducibility. Defaults to 0.

Raises:
  • FileNotFoundError – If GalaxyDock2 HEME script is not found

  • ValueError – If required parameters are missing

Note

  • Requires GalaxyDock2 HEME to be installed and properly configured

  • Box center coordinates are essential for heme docking

  • Creates scratch directory for temporary files

dock(receptor_file, ligand_file, box_center, box_size=(30, 30, 30), output_prefix=None)[source]

Perform docking using GalaxyDock2 HEME.

This method executes GalaxyDock2 HEME docking with the specified parameters and returns results including multiple poses with heme-specific scores.

Parameters:
  • receptor_file (str) – Path to prepared receptor file (protonated PDB format)

  • ligand_file (str) – Path to prepared ligand file (protonated MOL2 format)

  • box_center (Tuple[float, float, float]) – (x,y,z) coordinates of docking box center.

  • box_size (Tuple[float, float, float]) – Size in Angstroms of the docking box. Defaults to (30, 30, 30)

  • output_prefix (Optional[str], optional) – Prefix for output files. If None, uses the ligand filename stem. Defaults to None.

Returns:

Dictionary containing docking results with keys:
  • output_file: Path to MOL2 file with docked poses

  • log_file: Path to GalaxyDock2 HEME log file

  • scores: Dictionary containing pose information:
    • poses: List of dictionaries, each containing:
      • pose: Pose number (1-based)

      • Energy: Total GalaxyDock2 HEME score

Return type:

Dict[str, Any]

Raises:
  • FileNotFoundError – If input files are not found

  • subprocess.CalledProcessError – If GalaxyDock2 HEME execution fails

  • RuntimeError – If score parsing fails

Note

  • Receptor should be in PDB format with polar hydrogens

  • Ligand should be in MOL2 format with all hydrogens

  • Box center coordinates are required and used for docking

  • Output is in MOL2 format with multiple poses

  • Scores are extracted from GalaxyDock2 HEME energy files
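The documented return structure can be traversed as in the sketch below. The dictionary is a hand-written placeholder shaped like the description above (the paths and energies are made up), and best_pose is a hypothetical helper. It assumes that a lower Energy means a better pose, which this reference does not state.

```python
from typing import Any, Dict

# Hypothetical result shaped like the documented return value; the paths and
# energies are placeholders, not real GalaxyDock2 HEME output.
results: Dict[str, Any] = {
    "output_file": "out/ligand_docked.mol2",
    "log_file": "out/ligand.log",
    "scores": {
        "poses": [
            {"pose": 1, "Energy": -12.5},
            {"pose": 2, "Energy": -14.0},
            {"pose": 3, "Energy": -9.75},
        ]
    },
}


def best_pose(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return the pose with the lowest Energy (assumption: lower is better)."""
    poses = result["scores"]["poses"]
    if not poses:
        raise ValueError("no poses in docking result")
    return min(poses, key=lambda pose: pose["Energy"])


top = best_pose(results)
print(f"pose {top['pose']} with Energy {top['Energy']}")
```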

precheck(file_path)[source]

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before docking. It runs automatically before each docking run to confirm that all expected input files are present. It does not check whether the files are valid.

Parameters:

file_path (str) – Path to the file to check

Returns:

True if the file exists, False otherwise

Return type:

bool

Note

  • Only checks file existence, not file validity

  • Does not verify file format or content

  • Useful for basic input validation

RFAA Engine

class docktopus.rfaa_engine.RFAADockingEngine(work_dir, model_runner_class, target='humanCYP3A4')[source]

Bases: object

RoseTTAFold All-Atom (RFAA) docking engine implementation built on the RFAA model.

This class provides an interface to RFAA, which uses fully flexible protein structure prediction for protein-ligand complex modeling. The prediction results can be validated (enabled by default) to remove hallucinated poses.

RFAA features:
  • Fully flexible docking with explicit bonding for covalently bound cofactors

  • Quality assessment using pLDDT and PAE metrics

  • Post-docking validation to detect hallucinations

  • Support for heme-containing proteins (CYP3A4)

Parameters:

work_dir (str)

work_dir

Directory for docking outputs

Type:

Path

model_runner_class

ModelRunner class from rf2aa.run_inference

tmp_dir

Directory for temporary files

Type:

Path

results_dir

Directory for results files

Type:

Path

target_dir

Directory for target-specific files

Type:

Path

template_pdb

Template PDB ID for crossdocking

Type:

str

fasta_file

Path to target protein FASTA file

Type:

str

hem_file

Path to heme SDF file

Type:

str

_validator

Validator instance for post-docking checks

Type:

Optional[RFAAValidator]

validation_enabled

Whether validation is available

Type:

bool

logger

Logger instance for engine events

Type:

logging.Logger

__init__(work_dir, model_runner_class, target='humanCYP3A4')[source]

Initialize RFAA docking engine.

Parameters:
  • work_dir (str) – Directory for docking outputs. Will be created if it doesn’t exist.

  • model_runner_class – ModelRunner class from rf2aa.run_inference. Required for RFAA model execution.

Raises:
  • ImportError – If required RFAA dependencies are not available

  • FileNotFoundError – If template files are not found

  • RuntimeError – If initialization fails

Note

  • Requires rf2aa package and dependencies to be installed

  • Copies template files (FASTA, HEM SDF) to work directory

  • Attempts to initialize validation if dependencies are available

  • Creates necessary subdirectories for workflow

dock(ligand_sdf_file, smiles='')[source]

Perform docking using RFAA.

This method executes RFAA protein-ligand structure prediction using the specified ligand and target. It generates a complete protein-ligand complex structure with quality metrics.

Parameters:
  • ligand_sdf_file (str) – Path to ligand SDF file

  • smiles (str, optional) – SMILES string corresponding to the ligand. Required for validation. Defaults to “”.

  • target (str, optional) – Target protein identifier, chosen from the supported proteins. See the set_target() method for the list of supported targets.

Returns:

Dictionary containing docking results with keys:
  • output_file: Path to PDB file with protein-ligand complex

  • log_file: Path to RFAA log file

  • valid: Boolean indicating if result passed validation (if available)

  • metrics: Dictionary containing quality metrics:
    • ligand_mean_pae: Mean PAE for ligand atoms

    • mean_plddts: Mean pLDDT for ligand atoms

Return type:

Dict[str, Any]

Raises:
  • FileNotFoundError – If input files are not found

  • RuntimeError – If RFAA execution fails

  • ImportError – If RFAA dependencies are not available

Note

  • Generates complete protein-ligand complex structure

  • Uses template-based modeling approach

  • Performs post-docking validation if available

  • Quality metrics help assess prediction reliability

  • Output is a PDB file with both protein and ligand
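One way to use the valid flag and the quality metrics together is sketched below. The result dictionaries are hand-written placeholders shaped like the documented return value, keep_reliable is a hypothetical helper, and the cutoffs are arbitrary. The numeric scale of the metrics is not stated in this reference, so the values here are assumptions. A missing valid key (validation unavailable) is treated as a failure in this sketch.

```python
from typing import Any, Dict, List

# Hypothetical results shaped like the documented return value; all values
# are placeholders, not real RFAA output.
runs: List[Dict[str, Any]] = [
    {"output_file": "a.pdb", "log_file": "a.log", "valid": True,
     "metrics": {"ligand_mean_pae": 3.2, "mean_plddts": 0.86}},
    {"output_file": "b.pdb", "log_file": "b.log", "valid": False,
     "metrics": {"ligand_mean_pae": 2.9, "mean_plddts": 0.90}},
    {"output_file": "c.pdb", "log_file": "c.log", "valid": True,
     "metrics": {"ligand_mean_pae": 9.5, "mean_plddts": 0.55}},
]


def keep_reliable(results: List[Dict[str, Any]],
                  max_pae: float = 5.0,
                  min_plddt: float = 0.7) -> List[Dict[str, Any]]:
    """Keep runs that passed validation and meet illustrative metric cutoffs."""
    kept = []
    for result in results:
        # Treat a missing 'valid' key as "not validated".
        if not result.get("valid", False):
            continue
        metrics = result["metrics"]
        if metrics["ligand_mean_pae"] <= max_pae and metrics["mean_plddts"] >= min_plddt:
            kept.append(result)
    return kept


reliable = keep_reliable(runs)
print([result["output_file"] for result in reliable])
```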

generate_config_files(ligand_sdf_file, output_path=None, config_dir=None)[source]

Generate RFAA config files based on provided inputs.

This method creates the configuration files required by RFAA for protein-ligand structure prediction. It uses template files and substitutes the provided parameters.

Parameters:
  • ligand_sdf_file (str) – Path to ligand SDF file

  • output_path (str, optional) – Path for output files. If None, uses tmp_dir. Defaults to None.

  • config_dir (str, optional) – Directory to save config files. If None, uses work_dir/config. Defaults to None.

Returns:

Generated configuration YAML string for RFAA

Return type:

str

Raises:
  • FileNotFoundError – If template files are not found

  • RuntimeError – If config generation fails

Note

  • Uses template configuration from package resources

  • Substitutes file paths and parameters in template

  • Configuration includes protein, ligand, and heme specifications

  • Output path is used for RFAA temporary files

precheck(file_path)[source]

Check if the provided file path exists.

This method performs a simple file existence check to validate input files before docking. It runs automatically before each docking run to confirm that all expected input files are present. It does not check whether the files are valid.

Parameters:

file_path (str) – Path to the file to check

Returns:

True if the file exists, False otherwise

Return type:

bool

Note

  • Only checks file existence, not file validity

  • Does not verify file format or content

set_target(target_name)[source]

Set the target protein for the RFAA docking engine.

This method updates the selected target protein and its associated configuration (such as FASTA file and residue count) based on the provided target name. It changes the internal state of the engine so that subsequent docking runs will use the new target.

Parameters:

target_name (str) –

The key corresponding to the desired target protein. Available values:

  • "humanCYP3A4"

  • "humanCYP2C8"

  • "humanCYP2C9"

  • "humanCYP2C19"

  • "humanCYP2D6"

  • "humanCYP2A6"

  • "humanCYP2B6"

  • "humanCYP2E1"

  • "humanCYP1A2"

  • "humanCYP2D13"

  • "humanCYP46A1"

  • "CYP199A4"

  • "CYP121"

  • "CYP105A1"

  • "CYPcam"

  • "CYP125"

  • "CYP102A1"

Raises:

KeyError – If the provided target_name is not found in the available targets.
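Because an unrecognized name raises KeyError, it can help to normalize user input before calling set_target(). The sketch below is a hypothetical helper, not part of the package. It uses only a subset of the names listed above, and whether the engine itself treats names case-sensitively is not stated in this reference.

```python
from typing import Dict, Iterable

# Subset of the documented target names, for illustration only.
DOCUMENTED_TARGETS = ("humanCYP3A4", "humanCYP2C9", "humanCYP2D6", "CYPcam", "CYP102A1")


def resolve_target_name(name: str, targets: Iterable[str] = DOCUMENTED_TARGETS) -> str:
    """Hypothetical helper: map a case-insensitive name to its documented spelling."""
    by_lower: Dict[str, str] = {target.lower(): target for target in targets}
    try:
        return by_lower[name.strip().lower()]
    except KeyError:
        known = sorted(by_lower.values())
        raise KeyError(f"unknown target {name!r}; expected one of {known}") from None


resolved = resolve_target_name("humancyp3a4")
print(resolved)
```

The resolved name could then be passed to set_target(), which keeps the KeyError handling in one place.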

Validation

RFAA Validator

class docktopus.validator.RFAAValidator[source]

Bases: object

Validator for RFAA docking results to detect potential hallucinations.

This class provides methods to validate protein-ligand structures generated by RFAA to detect potential hallucinations (incorrectly predicted structures). It performs chemical validity checks and compares predicted structures with reference SMILES strings.

logger

Logger instance for validation events

Type:

logging.Logger

Example

>>> validator = RFAAValidator()
>>> sdf_string = validator.convert_pdb_to_sdf("mol.pdb")
>>> is_valid = validator.validate_ligand(sdf_string, reference_smiles)
>>> if is_valid:
...     print("Structure passed validation")
... else:
...     print("Structure may be hallucinated")
__init__()[source]

Initialize the RFAA validator.

Sets up logging and prepares the validator for structure validation.

Example

>>> validator = RFAAValidator()
adjacency_with_orders(mol)[source]

Create adjacency matrix with bond orders from an RDKit molecule.

This method generates a symmetric adjacency matrix where each element represents the bond order between two atoms (0 = no bond, 1 = single, 2 = double, 3 = triple).

Parameters:

mol (Chem.Mol) – RDKit molecule object

Returns:

Symmetric adjacency matrix with bond orders

Return type:

np.ndarray

Note

  • Matrix is symmetric (A[i,j] = A[j,i])

  • Diagonal elements are 0 (no self-bonds)

  • Bond orders: 1=single, 2=double, 3=triple

  • Useful for comparing molecular connectivity
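The matrix layout described above can be illustrated without RDKit or NumPy. The sketch below is a dependency-free stand-in: it takes a hypothetical list of (i, j, order) bonds instead of a Chem.Mol and returns nested lists instead of an np.ndarray, so it shows the data layout rather than the method's implementation.

```python
from typing import List, Tuple

Bond = Tuple[int, int, int]  # (atom index i, atom index j, bond order)


def adjacency_from_bonds(n_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    """Build a symmetric bond-order matrix; 0 means no bond."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        if i == j:
            raise ValueError("self-bonds are not allowed")
        # Write both entries so that A[i][j] == A[j][i].
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


# Heavy atoms of acetonitrile (CH3-C#N): C0-C1 single bond, C1#N2 triple bond.
acetonitrile = adjacency_from_bonds(3, [(0, 1, 1), (1, 2, 3)])
for row in acetonitrile:
    print(row)
```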

convert_pdb_to_sdf(pdb_file)[source]

Convert a PDB file to SDF format using OpenBabel.

This method converts a PDB file containing molecular coordinates to SDF format using pybel, which handles kekulization of aromatic moieties better than RDKit.

Parameters:

pdb_file (str) – Path to the input PDB file

Returns:

SDF format string block of the molecule, or None if conversion fails

Return type:

str

Raises:
  • FileNotFoundError – If PDB file doesn’t exist

  • RuntimeError – If conversion fails

Note

  • Uses OpenBabel’s pybel interface for conversion

  • Returns SDF string block, not file path

  • Handles coordinate information and basic molecular properties

  • Returns None if conversion fails

count_bond_diffs(A1, A2)[source]

Count the number of bond differences between two adjacency matrices.

This method compares two adjacency matrices and counts how many bonds differ between them. Only the upper triangle is considered to avoid double counting.

Parameters:
  • A1 (np.ndarray) – First adjacency matrix

  • A2 (np.ndarray) – Second adjacency matrix

Returns:

Number of bond differences between the molecules

Return type:

int

Raises:

ValueError – If matrices have different dimensions

Note

  • Returns 0 if molecules have identical connectivity

  • Higher values indicate more structural differences
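A minimal sketch of an upper-triangle comparison, written with nested lists so it runs without NumPy. The function name count_upper_triangle_diffs is hypothetical, and the sketch assumes that a "difference" is any pair of atoms whose bond order differs and that both matrices use the same atom ordering.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def count_upper_triangle_diffs(a1: Matrix, a2: Matrix) -> int:
    """Count atom pairs (i < j) whose bond order differs between a1 and a2."""
    if len(a1) != len(a2) or any(len(r1) != len(r2) for r1, r2 in zip(a1, a2)):
        raise ValueError("matrices have different dimensions")
    n = len(a1)
    # Only j > i is visited, so each bond is counted once.
    return sum(1 for i in range(n) for j in range(i + 1, n) if a1[i][j] != a2[i][j])


# Heavy atoms of propene (C=C-C) versus propane (C-C-C): one bond order differs.
propene = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
propane = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
n_diffs = count_upper_triangle_diffs(propene, propane)
print(n_diffs)
```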

extract_ligand(input_pdb, output_pdb)[source]

Extract ligand coordinates from a protein-ligand complex PDB file.

This method uses pdb_selchain and pdb_tidy to extract only the ligand atoms (chain B) from a protein-ligand complex and clean up the PDB format.

Parameters:
  • input_pdb (str) – Path to input PDB file containing protein-ligand complex

  • output_pdb (str) – Path to output PDB file containing only ligand

Raises:
  • subprocess.CalledProcessError – If pdb_selchain or pdb_tidy fails

  • IOError – If output file cannot be written

Note

  • Assumes ligand is in chain B of the complex

  • Requires pdb_selchain and pdb_tidy executables to be installed

  • Output PDB is cleaned and formatted for further processing
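To show what selecting chain B means at the file level, the sketch below filters ATOM/HETATM records by the chain identifier in column 22 of the PDB fixed-width format. It is a pure-Python illustration with made-up coordinates, not a replacement for pdb_selchain and pdb_tidy, and the helper names are hypothetical.

```python
from typing import Iterable, List


def pdb_atom_line(record: str, serial: int, name: str, resname: str,
                  chain: str, resseq: int, x: float, y: float, z: float) -> str:
    """Format a simplified fixed-width PDB coordinate record."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def select_chain(lines: Iterable[str], chain: str = "B") -> List[str]:
    """Keep ATOM/HETATM records whose chain ID (column 22, index 21) matches."""
    return [
        line for line in lines
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21 and line[21] == chain
    ]


# Made-up two-chain complex: protein atoms in chain A, ligand atoms in chain B.
complex_lines = [
    pdb_atom_line("ATOM", 1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    pdb_atom_line("ATOM", 2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    pdb_atom_line("HETATM", 3, "C1", "LIG", "B", 1, 2.000, 3.000, 4.000),
    pdb_atom_line("HETATM", 4, "O1", "LIG", "B", 1, 2.500, 3.500, 4.500),
]
ligand_lines = select_chain(complex_lines, chain="B")
print("\n".join(ligand_lines))
```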

fix_bond_orders(sdf_string, smiles)[source]

Fix bond orders in a molecular structure using a reference SMILES.

This method attempts to correct bond orders in a molecular structure by using a reference SMILES string as a template.

Parameters:
  • sdf_string (str) – SDF format string of the molecular structure

  • smiles (str) – Reference SMILES string to use as template

Returns:

RDKit molecule with corrected bond orders, or False if failed

Return type:

Chem.Mol

Note

  • Template SMILES should represent the same molecule

  • Returns False if correction fails

standardize_smiles(smiles)[source]

Standardize a SMILES string to canonical form.

This method converts a SMILES string to its canonical tautomeric form using RDKit’s MolStandardize module. This ensures consistent comparison between different representations of the same molecule.

Parameters:

smiles (str) – Input SMILES string

Returns:

Canonical, standardized SMILES string, or None if invalid

Return type:

str

Raises:

RuntimeError – If SMILES standardization fails

Note

  • Handles tautomeric forms automatically

  • Removes stereochemistry information

  • Returns None for invalid SMILES strings

  • Uses RDKit’s MolStandardize for robust standardization

validate_ligand(sdf_string, smiles, threshold=1)[source]

Validate a ligand structure against a reference SMILES string.

This method performs comprehensive validation of a predicted ligand structure by comparing it with a reference SMILES string. It checks chemical validity, atom counts, bond connectivity, and bond orders.

Parameters:
  • sdf_string (str) – SDF format string of the predicted ligand structure

  • smiles (str) – Reference SMILES string of the same molecule as in the pdb file for comparison

  • threshold (int, optional) – Maximum allowed bond differences. Defaults to 1.

Returns:

True if structure passes validation, False otherwise

Return type:

bool

Note

Validation steps:
  1. Chemical validity check (can be converted to SMILES)

  2. Atom count comparison with reference

  3. Bond connectivity comparison (adjacency matrices)

  4. Bond order assignment and verification

Additional notes:
  • Higher threshold allows more bond differences

  • Returns False if any step fails

Analyser

class docktopus.analyser.Analyser(workdir)[source]

Bases: object

Analyser is a utility class for analyzing molecular docking results. It currently computes Fe-ligand distances for CYP complexes and the mean absolute error (MAE) with bootstrapped error estimates. It also contains some unused functions reserved for a future full post-docking analysis.

This class provides methods to:
  • Compute Fe distances between docked and reference ligand/hem structures.

  • Facilitate downstream statistical analysis of docking results.

Parameters:

workdir (str) – The working directory containing the docking and reference results. Expected subdirectories: ‘docked’ and ‘ref’.

Examples

>>> analyser = Analyser(workdir="./results")
>>> docked_data = ("./results/docked/sample1-docked.pdb", "./results/docked/sample1-hem.pdb")
>>> reference_data = ("./results/ref/sample1-ligand.pdb", "./results/ref/sample1-hem.pdb")
>>> ligand_dist, docked_dist = analyser.get_Fedistance(docked_data, reference_data, kind="Fed1")
>>> print("Ligand Fe distance:", ligand_dist)
>>> print("Docked Fe distance:", docked_dist)
bootstrap_mae_error(data, ref_data, n_bootstrap=1000, confidence_level=0.95)[source]
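This reference gives no description for bootstrap_mae_error. As an illustration only, the sketch below shows a generic percentile bootstrap of the mean absolute error that reuses the same parameter names. The function name bootstrap_mae, its seed argument, the return layout, and the sample distances are all assumptions, not the library's implementation.

```python
import random
from typing import Sequence, Tuple


def bootstrap_mae(data: Sequence[float], ref_data: Sequence[float],
                  n_bootstrap: int = 1000, confidence_level: float = 0.95,
                  seed: int = 0) -> Tuple[float, float, float]:
    """Return (MAE, lower bound, upper bound) from a percentile bootstrap."""
    if len(data) != len(ref_data) or not data:
        raise ValueError("data and ref_data must be non-empty and of equal length")
    abs_errors = [abs(x - y) for x, y in zip(data, ref_data)]
    n = len(abs_errors)
    mae = sum(abs_errors) / n

    # Resample the absolute errors with replacement and recompute the mean.
    rng = random.Random(seed)
    resampled = sorted(sum(rng.choices(abs_errors, k=n)) / n for _ in range(n_bootstrap))

    # Take the empirical percentiles that bracket the requested confidence level.
    alpha = (1.0 - confidence_level) / 2.0
    lower = resampled[int(alpha * (n_bootstrap - 1))]
    upper = resampled[int((1.0 - alpha) * (n_bootstrap - 1))]
    return mae, lower, upper


# Made-up Fe distances (in Angstroms) for docked and reference structures.
docked = [2.1, 2.4, 3.0, 2.2]
reference = [2.0, 2.5, 2.5, 2.3]
mae, lower, upper = bootstrap_mae(docked, reference)
print(f"MAE = {mae:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```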
bootstrap_ratio_error(Fe_dist_ratio, n_bootstrap=1000, confidence_level=0.95)[source]
calculate_molecular_weights(smiles_list)[source]

Calculate molecular weights for a list of SMILES strings.

Parameters:

smiles_list (list of str) – List of SMILES strings.

Returns:

Molecular weights for each molecule.

Return type:

list of float

calculate_num_rotatable_bonds(smiles_list)[source]

Calculate the number of rotatable bonds for a list of SMILES strings.

Parameters:

smiles_list (list of str) – List of SMILES strings.

Returns:

Number of rotatable bonds for each molecule.

Return type:

list of int

get_Fedistance(docked_data, reference_data, kind)[source]
get_sample_Fedist(sample_names, kind)[source]

Wrapper function to compute Fe distances for a list of sample names. Warning: this function does not currently work. It needs consistent file naming to traverse a full dataset of docking results, but it currently contains hardcoded names for docked_{ligand, hem} and reference_{ligand, hem} that are inconsistent with the rest of the library.

Parameters:
  • sample_names (list of str) – List of sample names.

  • kind (str) – The kind of Fe distance to compute (“Fed1”, “Fed2”, “Fed3”).

Returns:

DataFrame with columns [‘sample’, ‘ligand_dist’, ‘docked_dist’].

Return type:

pd.DataFrame

get_similarity_scores(smiles_list)[source]
Parameters:

smiles_list (list[str])

Return type:

list