luna.mol.entry module

class ChainEntry(pdb_id, chain_id, sep=':', parser=None)[source]

Bases: luna.mol.entry.Entry

Define a chain.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • chain_id (str) – A 1-character chain id. Example: ‘A’.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Raises

InvalidEntry – If the provided information does not match the PDB format.

Examples

>>> from luna.mol.entry import ChainEntry
>>> e = ChainEntry(pdb_id="3QL8", chain_id="A")
>>> print(e)
<ChainEntry: 3QL8:A>
classmethod from_string(entry_str, sep=':')[source]

Initialize from a string.

Parameters
  • entry_str (str) – A string representing the entry. Example: ‘3QL8:A’.

  • sep (str) – The separator character used in entry_str. The default value is ‘:’. For example: if entry_str is set to ‘3QL8|A’, then sep should be defined as ‘|’.

Return type

Entry

Raises

IllegalArgumentError – If the fields in entry_str do not match the format expected to define a chain.

Examples

>>> from luna.mol.entry import ChainEntry
>>> e = ChainEntry.from_string("3QL8:A", sep=":")
>>> print(e)
<ChainEntry: 3QL8:A>
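
The role of sep can be illustrated without LUNA. The sketch below uses a hypothetical helper, split_chain_entry, which is not part of luna.mol.entry. It only shows how a separator turns an entry string into the (PDB id, chain id) fields, and it raises ValueError as a stand-in for IllegalArgumentError.

```python
def split_chain_entry(entry_str, sep=":"):
    """Split a chain entry string into (pdb_id, chain_id).

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    fields = entry_str.split(sep)
    # A chain entry is expected to have exactly two fields.
    if len(fields) != 2:
        raise ValueError(
            f"Expected 2 fields separated by {sep!r}, "
            f"got {len(fields)}: {entry_str!r}"
        )
    pdb_id, chain_id = fields
    return pdb_id, chain_id


print(split_chain_entry("3QL8:A"))       # default separator
print(split_chain_entry("3QL8|A", "|"))  # custom separator
```

If the separator passed does not match the one used in the string, the field count is wrong and the call fails, which is why sep must agree with entry_str.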
property full_id

The full id of the entry is the tuple (PDB id or filename, chain id).

Type

tuple, read-only

class Entry(pdb_id, chain_id, comp_name=None, comp_num=None, comp_icode=None, is_hetatm=True, sep=':', parser=None)[source]

Bases: object

Entries define the target molecule for which interactions and other properties will be calculated. They can be ligands, chains, etc., and can be defined in a number of ways. Each entry has an associated PDB file that may contain macromolecules (protein, RNA, DNA) as well as small molecules, water, and ions. The PDB file provides the context in which the interactions with the target molecule will be calculated.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • chain_id (str) – A 1-character chain id. Example: ‘A’.

  • comp_name (str, optional) – A 1- to 3-character compound name (residue name in the PDB format). Required if is_hetatm is True. Example: ‘X01’.

  • comp_num (int, optional) – An integer of up to 4 digits (residue sequence number in the PDB format). Required if is_hetatm is True. Example: 300 or -1.

  • comp_icode (str, optional) – A 1-character compound insertion code (residue insertion code in the PDB format). Example: ‘A’.

  • is_hetatm (bool) – Whether the compound is a ligand. The default value is True.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

  • parser (PDBParser or FTMapParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.

Raises
  • IllegalArgumentError – If is_hetatm is True, but the compound name and number are not provided. If the compound number is provided but it is not an integer. If comp_icode is provided but it is not a valid character.

  • InvalidEntry – If the provided information does not match the PDB format.
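
The field constraints listed under Parameters can be sketched as a small validator. This is a hedged illustration, not LUNA's actual validation code: check_compound_fields is a hypothetical name, and reading the 4-digit residue number as the range -999 to 9999 (what fits in a 4-column field) is an assumption.

```python
def check_compound_fields(chain_id, comp_name, comp_num, comp_icode=None):
    """Return a list of problems found in the fields (empty if none).

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    problems = []
    if len(chain_id) != 1:
        problems.append("chain_id must be a single character")
    if not 1 <= len(comp_name) <= 3:
        problems.append("comp_name must have 1 to 3 characters")
    # Assumption: the number must fit in a 4-column field (-999..9999).
    if not isinstance(comp_num, int) or not -999 <= comp_num <= 9999:
        problems.append("comp_num must be an integer between -999 and 9999")
    if comp_icode is not None and len(comp_icode) != 1:
        problems.append("comp_icode must be a single character")
    return problems


print(check_compound_fields("A", "X01", 300))
print(check_compound_fields("AB", "TOOLONG", 12345, "XY"))
```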

Examples

Chain entry: can be used to calculate interactions with a given chain.

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A")
>>> print(e)
<Entry: 3QL8:A>

Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="HIS", comp_num=125, is_hetatm=False)
>>> print(e)
<Entry: 3QL8:A:HIS:125>

Ligand entry: can be used to calculate interactions with a given ligand.

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True)
>>> print(e)
<Entry: 3QL8:A:X01:300>

You can use a different character separator for the entries. For example:

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep="/")
>>> print(e)
<Entry: 3QL8/A/X01/300>
property chain_id

The chain id.

Type

str, read-only

property comp_icode

The compound insertion code.

Type

str, read-only

property comp_name

The compound name.

Type

str, read-only

property comp_num

The compound number.

Type

int, read-only

classmethod from_string(entry_str, is_hetatm=True, sep=':')[source]

Initialize from a string.

Parameters
  • entry_str (str) – A string representing the entry. Example: ‘3QL8:A:X01:300’.

  • is_hetatm (bool) – Defines if the compound is a ligand or not. The default value is True.

  • sep (str) – The separator character used in entry_str. The default value is ‘:’. For example: if entry_str is set to ‘3QL8|A|X01|300’, then sep should be defined as ‘|’.

Return type

Entry

Raises

IllegalArgumentError – If the fields in entry_str do not match the format expected to define a chain (ChainEntry) or a compound (MolEntry).

Examples

Chain entry: can be used to calculate interactions with a given chain.

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A", sep=":")
>>> print(e)
<Entry: 3QL8:A>

Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:HIS:125", sep=":")
>>> print(e)
<Entry: 3QL8:A:HIS:125>

Ligand entry: can be used to calculate interactions with a given ligand.

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:X01:300", sep=":")
>>> print(e)
<Entry: 3QL8:A:X01:300>
property full_id

The full id of the entry is the tuple (PDB id or filename, chain id) for entries representing chains and (PDB id or filename, chain id, compound name, compound number, insertion code) for entries representing compounds.

Type

tuple, read-only

get_biopython_key(full_id=False)[source]

Represent the entry as a key to select chains or compounds from Biopython Entity objects.

Parameters

full_id (bool) – If True, return the full id of a chain or ligand. For chains, it consists of a tuple containing the PDB id and the chain id. For ligands, it consists of a tuple containing the PDB id, the chain id, and the ligand id. The default value is False.

Returns

A str if the entry represents a chain and full_id is False. Otherwise, a tuple.

Return type

str or tuple

Examples

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep=":")
>>> print(e.get_biopython_key())
('H_X01', 300, ' ')
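
The tuple printed above has the shape of a Biopython residue id: (hetero field, sequence number, insertion code). In Biopython, the hetero field is 'H_' followed by the residue name for hetero residues and a blank for standard residues, and a missing insertion code is a blank. The sketch below builds such a key with a hypothetical helper, residue_key. It assumes that convention and is not LUNA's implementation. Water, which Biopython flags with 'W', is not handled.

```python
def residue_key(comp_name, comp_num, comp_icode=None, is_hetatm=True):
    """Build a Biopython-style residue id tuple.

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    # Hetero residues are flagged as 'H_<name>'; standard ones use a blank.
    hetfield = f"H_{comp_name}" if is_hetatm else " "
    # Biopython represents "no insertion code" as a single blank.
    icode = comp_icode if comp_icode else " "
    return (hetfield, comp_num, icode)


print(residue_key("X01", 300))                   # ('H_X01', 300, ' ')
print(residue_key("HIS", 125, is_hetatm=False))  # (' ', 125, ' ')
```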
is_valid()[source]

Check if the entry matches the expected format for protein-protein or protein-compound complexes.

Return type

bool

property pdb_id

The PDB id.

Type

str, read-only

to_string(sep=None)[source]

Convert the entry to a string using sep as a separator character.

Parameters

sep (str or None) – If None (the default), use the separator character defined when the entry object was created. Otherwise, use sep as the separator character.

Examples

>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep=":")
>>> print(e.to_string("/"))
3QL8/A/X01/300
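
The fallback rule for sep can be sketched with a small stand-in class. SimpleEntry is hypothetical and only mimics the behaviour described above: a separator fixed at creation time that a call to to_string can override.

```python
class SimpleEntry:
    """Minimal stand-in for an entry; not LUNA's Entry class."""

    def __init__(self, *fields, sep=":"):
        self.fields = fields
        self.sep = sep  # separator chosen at creation time

    def to_string(self, sep=None):
        # None means "use the separator defined at creation time".
        sep = self.sep if sep is None else sep
        return sep.join(str(field) for field in self.fields)


e = SimpleEntry("3QL8", "A", "X01", 300)
print(e.to_string())     # 3QL8:A:X01:300
print(e.to_string("/"))  # 3QL8/A/X01/300
```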
class MolEntry(pdb_id, chain_id, comp_name, comp_num, comp_icode=None, sep=':', parser=None)[source]

Bases: luna.mol.entry.Entry

Define a compound from a PDB file, which can be a residue, nucleotide, or ligand.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • chain_id (str) – A 1-character chain id. Example: ‘A’.

  • comp_name (str) – A 1- to 3-character compound name (residue name in the PDB format). Example: ‘X01’.

  • comp_num (int) – An integer of up to 4 digits (residue sequence number in the PDB format). Example: 300 or -1.

  • comp_icode (str, optional) – A 1-character compound insertion code (residue insertion code in the PDB format). Example: ‘A’.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Raises

InvalidEntry – If the provided information does not match the PDB format.

Examples

Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).

>>> from luna.mol.entry import MolEntry
>>> e = MolEntry(pdb_id="3QL8", chain_id="A", comp_name="HIS", comp_num=125)
>>> print(e)
<MolEntry: 3QL8:A:HIS:125>

Ligand entry: can be used to calculate interactions with a given ligand.

>>> from luna.mol.entry import MolEntry
>>> e = MolEntry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300)
>>> print(e)
<MolEntry: 3QL8:A:X01:300>
classmethod from_file(input_file, sep=':')[source]

Initialize from a list of strings representing compounds.

Parameters
  • input_file (str) – The file from which the list of strings (one per line) will be read.

  • sep (str) – The separator character used in input_file. The default value is ‘:’. For example: if entries from input_file use ‘|’ as the separator, then sep should be defined as ‘|’.

Yields

MolEntry – An entry recovered from input_file.
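
Because from_file yields entries rather than returning a list, it behaves like a generator over the lines of input_file. The sketch below shows that pattern with a hypothetical helper, iter_entry_fields, which yields the split fields of each line instead of MolEntry objects. Skipping blank lines is an assumption.

```python
import os
import tempfile


def iter_entry_fields(input_file, sep=":"):
    """Yield one list of fields per non-empty line.

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    with open(input_file) as handle:
        for line in handle:
            line = line.strip()
            if line:  # assumption: blank lines are ignored
                yield line.split(sep)


# Write a small example file with one entry per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3QL8:A:X01:300\n\n3QQK:A:X02:497\n")
    path = tmp.name

try:
    # The generator must be consumed while the file still exists.
    rows = list(iter_entry_fields(path))
finally:
    os.remove(path)

print(rows)
```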

classmethod from_string(entry_str, sep=':')[source]

Initialize from a string.

Parameters
  • entry_str (str) – A string representing the entry. Example: ‘3QL8:A:X01:300’.

  • sep (str) – The separator character used in entry_str. The default value is ‘:’. For example: if entry_str is set to ‘3QL8|A|X01|300’, then sep should be defined as ‘|’.

Return type

Entry

Raises

IllegalArgumentError – If the fields in entry_str do not match the format expected to define a chain (ChainEntry) or a compound (MolEntry).

Examples

Chain entry: can be used to calculate interactions with a given chain.

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A", sep=":")
>>> print(e)
<Entry: 3QL8:A>

Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:HIS:125", sep=":")
>>> print(e)
<Entry: 3QL8:A:HIS:125>

Ligand entry: can be used to calculate interactions with a given ligand.

>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:X01:300", sep=":")
>>> print(e)
<Entry: 3QL8:A:X01:300>
class MolFileEntry(pdb_id, mol_id, sep=':')[source]

Bases: luna.mol.entry.Entry

Define a ligand from a molecular file. This class should be used for docking and molecular dynamics campaigns where usually one has the protein structure in the PDB format and the ligand structure in a separate molecular file.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • mol_id (str) – The ligand id in the molecular file.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Variables
  • ~MolFileEntry.mol_id (str) – The ligand id.

  • ~MolFileEntry.mol_file (str) – Pathname of the molecular file.

  • ~MolFileEntry.mol_file_ext (str) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.

  • ~MolFileEntry.mol_obj_type ({'rdkit', 'openbabel'}) – Define which library (RDKit or Open Babel) to use to parse the molecular file.

  • ~MolFileEntry.overwrite_mol_name (bool) – If True, substitute the ligand name in the parsed molecular object with mol_id. Only works for single-molecule files (is_multimol_file = False) as in these cases mol_id does not need to match the ligand name in the molecular file.

  • ~MolFileEntry.is_multimol_file (bool) – If mol_file contains multiple molecules or not. If True, mol_id should match some ligand name in mol_file.
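
Recovering the file format from mol_file when mol_file_ext is not given can be sketched with the standard library. guess_mol_format is a hypothetical helper, and normalising the extension to lowercase without the leading dot is an assumption rather than documented LUNA behaviour.

```python
import os


def guess_mol_format(mol_file, mol_file_ext=None):
    """Return the molecular file format, inferring it from the file name.

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    # An explicitly provided format takes precedence.
    if mol_file_ext is not None:
        return mol_file_ext.lstrip(".").lower()
    ext = os.path.splitext(mol_file)[1]
    if not ext:
        raise ValueError(f"Cannot infer the format of {mol_file!r}")
    return ext.lstrip(".").lower()


print(guess_mol_format("tutorial/inputs/ligands.mol2"))
print(guess_mol_format("ligand.dat", mol_file_ext="mol"))
```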

classmethod from_file(input_file, pdb_id, mol_file, **kwargs)[source]

Initialize from a list of ligand names.

Parameters
  • input_file (str) – The file from which the list of ligand names (one per line) will be read.

  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • mol_file (str) – Pathname of a multi-molecular file.

  • mol_file_ext (str, optional) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.

  • mol_obj_type ({‘rdkit’, ‘openbabel’}) – If “rdkit”, parse the converted molecule with RDKit and return an instance of rdkit.Chem.rdchem.Mol. If “openbabel”, parse the converted molecule with Open Babel and return an instance of openbabel.pybel.Molecule. The default value is ‘rdkit’.

  • autoload (bool) – If True, parse the ligand from the molecular file during the entry initialization. Otherwise, only load the ligand when first used.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Yields

MolFileEntry – An entry recovered from input_file.

Raises
  • FileNotFoundError – If mol_file does not exist.

  • IllegalArgumentError – If mol_obj_type is neither ‘rdkit’ nor ‘openbabel’.

  • MoleculeObjectError – If any errors occur while parsing the molecular file. Detailed information about the errors can be found in the logging outputs.

  • MoleculeNotFoundError – If some ligand from input_file was not found in mol_file.

Examples

>>> from luna.mol.entry import MolFileEntry
>>> entries = MolFileEntry.from_file(input_file="tutorial/inputs/MolEntries.txt",
...                                  pdb_id="D4", mol_file="tutorial/inputs/ligands.mol2",
...                                  mol_obj_type="openbabel", autoload=True)
>>> for e in entries:
...     print(e)
<MolFileEntry: D4:ZINC000012442563>
<MolFileEntry: D4:ZINC000065293174>
<MolFileEntry: D4:ZINC000096459890>
<MolFileEntry: D4:ZINC000343043015>
<MolFileEntry: D4:ZINC000575033470>
classmethod from_mol_file(pdb_id, mol_id, mol_file, is_multimol_file, mol_file_ext=None, mol_obj_type='rdkit', autoload=False, overwrite_mol_name=False, sep=':')[source]

Initialize from a molecular file.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • mol_id (str) – The ligand id in the molecular file.

  • mol_file (str) – Pathname of the molecular file.

  • is_multimol_file (bool) – Whether mol_file contains multiple molecules. If True, mol_id should match some ligand name in mol_file.

  • mol_file_ext (str, optional) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.

  • mol_obj_type ({‘rdkit’, ‘openbabel’}) – If “rdkit”, parse the converted molecule with RDKit and return an instance of rdkit.Chem.rdchem.Mol. If “openbabel”, parse the converted molecule with Open Babel and return an instance of openbabel.pybel.Molecule. The default value is ‘rdkit’.

  • autoload (bool) – If True, parse the ligand from the molecular file during the entry initialization. Otherwise, only load the ligand when first used.

  • overwrite_mol_name (bool) – If True, substitute the ligand name in the parsed molecular object with mol_id. Only works for single-molecule files (is_multimol_file = False) as in these cases mol_id does not need to match the ligand name in the molecular file.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Return type

MolFileEntry

Raises
  • FileNotFoundError – If mol_file does not exist.

  • IllegalArgumentError – If mol_obj_type is neither ‘rdkit’ nor ‘openbabel’.

  • MoleculeObjectError – If any errors occur while parsing the molecular file. Detailed information about the errors can be found in the logging outputs.

  • MoleculeNotFoundError – If the ligand mol_id was not found in the input file and is_multimol_file is True.

Examples

In this first example, we will read the ligand ‘ZINC000007786517’ from a single-molecule file. As we are working with a single-molecule file, mol_id can be any value you prefer.

>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="Ligand", mol_file="tutorial/inputs/ZINC000007786517.mol",
...                                mol_obj_type='rdkit', is_multimol_file=False)
>>> print(e)
<MolFileEntry: D4:Ligand>
>>> print(e.mol_obj.to_smiles())
Cc1cccc(NC(=O)C[N@@H+](C)C2CCCCC2)c1C

Now, let’s say we need to read the ligand ‘ZINC000096459890’ from a multi-molecular file and that we want to use Open Babel to parse the molecule. To do so, a ligand named mol_id must exist in mol_file. Otherwise, the exception MoleculeNotFoundError is raised.

>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="ZINC000096459890", mol_file="tutorial/inputs/ligands.mol2",
...                                mol_obj_type='openbabel', is_multimol_file=True)
>>> print(e)
<MolFileEntry: D4:ZINC000096459890>
>>> print(e.mol_obj.to_smiles())
O=C(OCCCN1C=CC=CC1=O)c1ccc2ccc(Cl)cc2n1

Below, we show what happens if mol_id does not exist in mol_file. Note that we set autoload to True to parse the molecule right away.

>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="Ligand", mol_file="tutorial/inputs/ligands.mol2",
...                                mol_obj_type='openbabel', is_multimol_file=True, autoload=True)
luna.util.exceptions.MoleculeNotFoundError: "The ligand 'Ligand' was not found in the input file         or generated errors while parsing it with Open Babel."
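
The autoload option amounts to a choice between eager and lazy loading, which is why the error above only appears right away when autoload is True. The sketch below shows the pattern with a hypothetical LazyLigand class and a fake loader function. It is not LUNA's implementation and parses no real file.

```python
class LazyLigand:
    """Load a molecule on first access unless autoload is requested.

    Hypothetical class for illustration only; not LUNA's MolFileEntry.
    """

    def __init__(self, loader, autoload=False):
        self._loader = loader
        self._mol_obj = None
        if autoload:
            # Eager mode: parse during initialization.
            self._mol_obj = loader()

    def is_mol_obj_loaded(self):
        return self._mol_obj is not None

    @property
    def mol_obj(self):
        # Lazy mode: parse only when the molecule is first used.
        if self._mol_obj is None:
            self._mol_obj = self._loader()
        return self._mol_obj


calls = []  # records how many times the loader ran


def fake_loader():
    calls.append(1)
    return "parsed-molecule"


lazy = LazyLigand(fake_loader)
eager = LazyLigand(fake_loader, autoload=True)

print(lazy.is_mol_obj_loaded())   # not loaded yet
_ = lazy.mol_obj                  # first access triggers the loader
print(lazy.is_mol_obj_loaded())   # loaded now
```

With lazy loading, a missing or malformed ligand is only reported when mol_obj is first accessed, so setting autoload to True is a way to fail early.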
classmethod from_mol_obj(pdb_id, mol_id, mol_obj, sep=':')[source]

Initialize from an already loaded molecular object.

This function is useful in cases where a molecular object is parsed and pre-processed using a different protocol defined by the user.

Parameters
  • pdb_id (str) – A 4-character structure id from the PDB or a local PDB filename. Example: ‘3QL8’ or ‘file1’.

  • mol_id (str) – The ligand id. As the molecular object is already provided, the ligand id does not need to match the ligand name in the molecular object.

  • mol_obj (MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule) – The molecular object.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Return type

MolFileEntry

Raises
  • MoleculeObjectTypeError – If the molecular object is not an instance of MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule.

  • IllegalArgumentError – If entity is not a valid Biopython object.

Examples

In this example, we will initialize a new MolFileEntry with the ligand ‘ZINC000007786517’ and the structure from a PDB file named ‘D4’, which is the structure the molecule was docked against.

First, let’s parse the molecular file.

>>> from luna.wrappers.rdkit import read_mol_from_file
>>> mol_obj = read_mol_from_file("tutorial/inputs/ZINC000007786517.mol", mol_format="mol")

Now, we create the new MolFileEntry object as follows:

>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_obj("D4", "ZINC000007786517", mol_obj, sep=":")
>>> print(e)
<MolFileEntry: D4:ZINC000007786517>
>>> print(e.mol_obj.to_smiles())
Cc1cccc(NC(=O)C[N@@H+](C)C2CCCCC2)c1C
property full_id

The full id of the entry is the tuple (PDB id or filename, ligand id).

Type

tuple, read-only

get_biopython_structure(entity=None, parser=None)[source]

Transform the molecular object into a Biopython Entity object.

If entity is provided, the molecular object is appended to it, i.e., this function can be used to join a ligand and the structure used during docking or molecular dynamics.

By default, the ligand is added to a chain with id ‘z’.

Parameters
  • entity (Entity, optional) – Append the molecular object to entity. If not provided, a new Entity is created.

  • parser (PDBParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.

Return type

Entity

Raises

IllegalArgumentError – If entity is not a valid Biopython object.

Examples

In this example, we will demonstrate how to join a protein structure and a ligand docked against it.

First, let’s parse the PDB file.

>>> from luna.MyBio.PDB.PDBParser import PDBParser
>>> pdb_parser = PDBParser(PERMISSIVE=True, QUIET=True)
>>> structure = pdb_parser.get_structure("Protein", "tutorial/inputs/D4.pdb")

Observe that the list of chains in the parsed structure contains only one element.

>>> print(structure[0].child_list)
[<Chain id=A>]

Now, we will read the ligand and append it to the existing protein structure.

>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file("D4", "ZINC000007786517", "tutorial/inputs/ZINC000007786517.mol",
...                                mol_obj_type='rdkit', is_multimol_file=False)
>>> joined_structure = e.get_biopython_structure(structure)

Observe that now the list of chains contains chains ‘A’ and ‘z’, which is the default chain where ligands are added.

>>> print(joined_structure[0].child_list)
[<Chain id=A>, <Chain id=z>]

If we loop over the residues in chain ‘z’, we will find our ligand.

>>> for r in joined_structure[0]["z"]:
...     print(r)
<Residue LIG het=H_LIG resseq=9999 icode= >
is_mol_obj_loaded()[source]

Check if the molecular object has already been loaded.

Return type

bool

is_valid()[source]

Check if the entry represents a valid protein-ligand complex.

Return type

bool

property mol_obj

The molecule.

Type

MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule

recover_entries_from_entity(entity, get_small_molecules=True, get_chains=False, ignore_artifacts=True, by_cluster=False, sep=':')[source]

Search for chains and small molecules in entity and return them as entries.

Parameters
  • entity (Entity) – An entity from where chains or small molecules will be recovered.

  • get_small_molecules (bool) – If True, identify small molecules and return them as MolEntry objects. The default value is True.

  • get_chains (bool) – If True, identify chains and return them as ChainEntry objects. The default value is False.

  • ignore_artifacts (bool) – If True, ignore the following crystallography artifacts: ACE, ACT, BME, CSD, CSW, EDO, FMT, GOL, MSE, NAG, NO3, PO4, SGM, SO4, or TPO. The default value is True.

  • by_cluster (bool) – If True, aggregate entries by cluster. Cluster ids are exclusive to Residue instances and are automatically set by FTMapParser, a parser for FTMap results. By default, the cluster id of a Residue instance is set to None; therefore, if the cluster id is not explicitly defined, all entries will be aggregated under the same key, None.

  • sep (str) – A separator character to format the entry string. The default value is ‘:’.

Returns

If by_cluster is set to False, a list of ChainEntry or MolEntry objects is returned. Otherwise, a dict is returned, in which keys are clusters and values are lists of ChainEntry or MolEntry objects. When no cluster information is available, all entries are aggregated under the key None. Cluster ids are exclusive to Residue instances; therefore, ChainEntry objects are always placed under the key None.

Return type

list or dict

Examples

First, let’s parse a PDB file.

>>> from luna.MyBio.PDB.PDBParser import PDBParser
>>> pdb_parser = PDBParser(PERMISSIVE=True, QUIET=True)
>>> structure = pdb_parser.get_structure("Protein", "tutorial/inputs/3QQK.pdb")

Now, we can recover entries from the parsed PDB file:

>>> from luna.mol.entry import recover_entries_from_entity
>>> entries = recover_entries_from_entity(structure, get_chains=True)
>>> for e in entries:
...     print(e)
<MolEntry: Protein:A:X02:497>
<ChainEntry: Protein:A>
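
The effect of ignore_artifacts and by_cluster on the return value can be sketched with plain data. In the example below, group_compounds is a hypothetical helper that takes (compound name, cluster id) pairs instead of a Biopython Entity. Only the artifact names come from the parameter description above, and the rest is an assumption made for illustration.

```python
from collections import defaultdict

# Crystallography artifacts listed in the ignore_artifacts description.
ARTIFACTS = {"ACE", "ACT", "BME", "CSD", "CSW", "EDO", "FMT", "GOL",
             "MSE", "NAG", "NO3", "PO4", "SGM", "SO4", "TPO"}


def group_compounds(compounds, ignore_artifacts=True, by_cluster=False):
    """Filter and optionally group (comp_name, cluster_id) pairs.

    Hypothetical helper for illustration only; not LUNA's implementation.
    """
    kept = [(name, cluster_id) for name, cluster_id in compounds
            if not (ignore_artifacts and name in ARTIFACTS)]
    if not by_cluster:
        # Flat list, mirroring the list return type.
        return [name for name, _ in kept]
    # Dict keyed by cluster id; compounds without a cluster go under None.
    grouped = defaultdict(list)
    for name, cluster_id in kept:
        grouped[cluster_id].append(name)
    return dict(grouped)


sample = [("X02", None), ("GOL", None), ("EDO", 1), ("LIG", 1)]
print(group_compounds(sample))
print(group_compounds(sample, by_cluster=True))
```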