luna.mol.entry module
- class ChainEntry(pdb_id, chain_id, sep=':', parser=None)
Bases: luna.mol.entry.Entry
Define a chain.
- Parameters
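pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
chain_id (str) – A 1-character chain ID. Example: ‘A’.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
parser (PDBParser or FTMapParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.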
- Raises
InvalidEntry – If the provided information does not match the PDB format.
Examples
>>> from luna.mol.entry import ChainEntry
>>> e = ChainEntry(pdb_id="3QL8", chain_id="A")
>>> print(e)
<ChainEntry: 3QL8:A>
- classmethod from_string(entry_str, sep=':')
Initialize from a string.
- Parameters
entry_str (str) – A string representing the entry. Example: ‘3QL8:A’.
sep (str) – The separator character used in entry_str. The default value is ‘:’. For example, if entry_str is set to ‘3QL8|A’, then sep should be defined as ‘|’.
- Return type
ChainEntry
- Raises
IllegalArgumentError – If the fields in entry_str do not match the format expected to define a chain.
Examples
>>> from luna.mol.entry import ChainEntry
>>> e = ChainEntry.from_string("3QL8:A", sep=":")
>>> print(e)
<ChainEntry: 3QL8:A>
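Why sep must match the string can be illustrated with a short standalone sketch. parse_chain_string is a hypothetical helper written for this page, not part of LUNA, and ValueError stands in for IllegalArgumentError; the sketch only assumes, as described above, that a chain string holds exactly two fields joined by sep.

```python
def parse_chain_string(entry_str, sep=":"):
    """Split a chain entry string such as '3QL8:A' into (pdb_id, chain_id).

    Hypothetical helper for illustration only; not LUNA's implementation.
    ValueError is used here in place of LUNA's IllegalArgumentError.
    """
    fields = entry_str.split(sep)
    if len(fields) != 2:
        # A chain needs exactly two fields: the structure ID and the chain ID.
        raise ValueError(
            f"expected 2 fields separated by {sep!r}, got {len(fields)}: {entry_str!r}"
        )
    pdb_id, chain_id = fields
    return pdb_id, chain_id


print(parse_chain_string("3QL8:A"))           # ('3QL8', 'A')
print(parse_chain_string("3QL8|A", sep="|"))  # ('3QL8', 'A')

try:
    # Wrong separator: splitting on ':' leaves a single field.
    parse_chain_string("3QL8|A")
except ValueError as err:
    print(err)
```

Splitting ‘3QL8|A’ with the default ‘:’ yields a single field, which is why the string is rejected unless sep="|" is passed.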
- class Entry(pdb_id, chain_id, comp_name=None, comp_num=None, comp_icode=None, is_hetatm=True, sep=':', parser=None)
Bases: object
Entries define the target molecule for which interactions and other properties will be calculated. They can be ligands, chains, etc., and can be defined in a number of ways. Each entry has an associated PDB file that may contain macromolecules (protein, RNA, DNA), other small molecules, water, and ions. The PDB file provides the context in which the interactions with the target molecule will be calculated.
- Parameters
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
chain_id (str) – A 1-character chain ID. Example: ‘A’.
comp_name (str, optional) – A 1- to 3-character compound name (residue name in the PDB format). Required if is_hetatm is True. Example: ‘X01’.
comp_num (int, optional) – A valid integer of at most 4 digits (residue sequence number in the PDB format). Required if is_hetatm is True. Example: 300 or -1.
comp_icode (str, optional) – A 1-character compound insertion code (residue insertion code in the PDB format). Example: ‘A’.
is_hetatm (bool) – Whether the compound is a ligand. The default value is True.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
parser (PDBParser or FTMapParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.
- Raises
IllegalArgumentError – If is_hetatm is True but the compound name and number are not provided, if the compound number is provided but is not an integer, or if comp_icode is provided but is not a valid character.
InvalidEntry – If the provided information does not match the PDB format.
Examples
Chain entry: can be used to calculate interactions with a given chain.
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", is_hetatm=False)
>>> print(e)
<Entry: 3QL8:A>
Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="HIS", comp_num=125, is_hetatm=False)
>>> print(e)
<Entry: 3QL8:A:HIS:125>
Ligand entry: can be used to calculate interactions with a given ligand.
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True)
>>> print(e)
<Entry: 3QL8:A:X01:300>
You can use a different separator character for the entries. For example:
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep="/")
>>> print(e)
<Entry: 3QL8/A/X01/300>
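The conditions listed under Raises can be written as a few explicit rules. validate_entry_fields below is a hypothetical sketch of those documented rules, not LUNA's code: ValueError stands in for IllegalArgumentError, and treating any one-character string as a valid insertion code is an assumption.

```python
def validate_entry_fields(comp_name=None, comp_num=None, comp_icode=None, is_hetatm=True):
    """Apply the checks described under 'Raises' for Entry.

    Hypothetical sketch: ValueError replaces LUNA's IllegalArgumentError.
    """
    # A ligand (is_hetatm=True) must be identified by both name and number.
    if is_hetatm and (comp_name is None or comp_num is None):
        raise ValueError("comp_name and comp_num are required when is_hetatm is True")
    # The residue sequence number, when given, must be an int (bool is excluded).
    if comp_num is not None and (not isinstance(comp_num, int) or isinstance(comp_num, bool)):
        raise ValueError(f"comp_num must be an integer, got {comp_num!r}")
    # The insertion code, when given, must be a single character (assumed rule).
    if comp_icode is not None and not (isinstance(comp_icode, str) and len(comp_icode) == 1):
        raise ValueError(f"comp_icode must be a single character, got {comp_icode!r}")


# Valid: a chain, a standard residue, and a ligand.
validate_entry_fields(is_hetatm=False)
validate_entry_fields("HIS", 125, is_hetatm=False)
validate_entry_fields("X01", 300, is_hetatm=True)

# Invalid: a ligand without a residue sequence number.
try:
    validate_entry_fields("X01", is_hetatm=True)
except ValueError as err:
    print(err)  # comp_name and comp_num are required when is_hetatm is True
```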
- classmethod from_string(entry_str, is_hetatm=True, sep=':')
Initialize from a string.
- Parameters
entry_str (str) – A string representing the entry. Example: ‘3QL8:A:X01:300’.
is_hetatm (bool) – Whether the compound is a ligand. The default value is True.
sep (str) – The separator character used in entry_str. The default value is ‘:’. For example, if entry_str is set to ‘3QL8|A|X01|300’, then sep should be defined as ‘|’.
- Return type
Entry
- Raises
IllegalArgumentError – If the fields in entry_str do not match the format expected to define a chain (ChainEntry) or a compound (MolEntry).
Examples
Chain entry: can be used to calculate interactions with a given chain.
>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A", sep=":")
>>> print(e)
<Entry: 3QL8:A>
Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).
>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:HIS:125", sep=":")
>>> print(e)
<Entry: 3QL8:A:HIS:125>
Ligand entry: can be used to calculate interactions with a given ligand.
>>> from luna.mol.entry import Entry
>>> e = Entry.from_string("3QL8:A:X01:300", sep=":")
>>> print(e)
<Entry: 3QL8:A:X01:300>
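The examples above differ only in their number of fields: two fields describe a chain and four describe a compound. split_entry_string is a hypothetical helper that sketches this dispatch under that assumption; it does not handle insertion codes and is not LUNA's parser.

```python
def split_entry_string(entry_str, sep=":"):
    """Classify an entry string as a chain or a compound by its field count.

    Hypothetical sketch: two fields -> (pdb_id, chain_id); four fields ->
    (pdb_id, chain_id, comp_name, comp_num). Insertion codes are not handled.
    """
    fields = entry_str.split(sep)
    if len(fields) == 2:
        pdb_id, chain_id = fields
        return "chain", {"pdb_id": pdb_id, "chain_id": chain_id}
    if len(fields) == 4:
        pdb_id, chain_id, comp_name, comp_num = fields
        # The compound number arrives as text, so convert it to int
        # (int() raises ValueError if it is not an integer).
        return "compound", {"pdb_id": pdb_id, "chain_id": chain_id,
                            "comp_name": comp_name, "comp_num": int(comp_num)}
    raise ValueError(f"unexpected number of fields ({len(fields)}) in {entry_str!r}")


for s in ["3QL8:A", "3QL8:A:HIS:125", "3QL8:A:X01:300"]:
    kind, parsed = split_entry_string(s)
    print(kind, parsed)
```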
- property full_id
The full id of the entry is the tuple (PDB id or filename, chain id) for entries representing chains and (PDB id or filename, chain id, compound name, compound number, insertion code) for entries representing compounds.
- Type
tuple, read-only
- get_biopython_key(full_id=False)
Represent the entry as a key to select chains or compounds from Biopython Entity objects.
- Parameters
full_id (bool) – If True, return the full ID of a chain or ligand. For chains, it is a tuple containing the PDB ID and the chain ID. For ligands, it is a tuple containing the PDB ID, the chain ID, and the ligand ID. The default value is False.
- Returns
A str if the entry represents a chain and full_id is False. Otherwise, a tuple.
- Return type
str or tuple
Examples
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep=":")
>>> print(e.get_biopython_key())
('H_X01', 300, ' ')
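The tuple printed above follows Biopython's residue ID convention: (hetero flag, sequence number, insertion code), where the hetero flag is ‘H_’ plus the residue name for HETATM residues, ‘W’ for water, and a blank for standard residues, and the insertion code is a blank when none is set. biopython_residue_key is a hypothetical helper that rebuilds such keys from entry fields; it ignores the water case and assumes LUNA maps its fields this way.

```python
def biopython_residue_key(comp_name, comp_num, comp_icode=None, is_hetatm=True):
    """Build a Biopython-style residue ID: (hetero flag, sequence number, insertion code).

    Hypothetical helper; water residues (flag 'W') are not handled.
    """
    # HETATM residues use 'H_<resname>'; standard residues use a blank flag.
    hetero_flag = f"H_{comp_name}" if is_hetatm else " "
    # Biopython uses a blank insertion code when none is defined.
    icode = comp_icode if comp_icode else " "
    return (hetero_flag, comp_num, icode)


print(biopython_residue_key("X01", 300))                   # ('H_X01', 300, ' ')
print(biopython_residue_key("HIS", 125, is_hetatm=False))  # (' ', 125, ' ')
```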
- is_valid()
Check whether the entry matches the expected format for protein-protein or protein-compound complexes.
- Return type
bool
- to_string(sep=None)
Convert the entry to a string using sep as the separator character.
- Parameters
sep (str or None) – If None (the default), use the separator character defined when the entry object was created. Otherwise, use sep as the separator character.
Examples
>>> from luna.mol.entry import Entry
>>> e = Entry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300, is_hetatm=True, sep=":")
>>> print(e.to_string("/"))
3QL8/A/X01/300
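The fallback rule for sep can be sketched with a minimal stand-in class. EntrySketch is hypothetical and only models the separator handling described above (two fields for chains, four for compounds); it is not LUNA's Entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in for an entry; models only the separator logic."""
    pdb_id: str
    chain_id: str
    comp_name: Optional[str] = None
    comp_num: Optional[int] = None
    sep: str = ":"

    def fields(self):
        # Chain entries have two fields; compound entries have four.
        parts = [self.pdb_id, self.chain_id]
        if self.comp_name is not None and self.comp_num is not None:
            parts += [self.comp_name, str(self.comp_num)]
        return parts

    def to_string(self, sep=None):
        # Fall back to the separator chosen at creation time when sep is None.
        return (self.sep if sep is None else sep).join(self.fields())


e = EntrySketch("3QL8", "A", "X01", 300)
print(e.to_string())     # 3QL8:A:X01:300
print(e.to_string("/"))  # 3QL8/A/X01/300
```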
- class MolEntry(pdb_id, chain_id, comp_name, comp_num, comp_icode=None, sep=':', parser=None)
Bases: luna.mol.entry.Entry
Define a compound from a PDB file, which can be a residue, nucleotide, or ligand.
- Parameters
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
chain_id (str) – A 1-character chain ID. Example: ‘A’.
comp_name (str) – A 1- to 3-character compound name (residue name in the PDB format). Example: ‘X01’.
comp_num (int) – A valid integer of at most 4 digits (residue sequence number in the PDB format). Example: 300 or -1.
comp_icode (str, optional) – A 1-character compound insertion code (residue insertion code in the PDB format). Example: ‘A’.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
parser (PDBParser or FTMapParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.
- Raises
InvalidEntry – If the provided information does not match the PDB format.
Examples
Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).
>>> from luna.mol.entry import MolEntry
>>> e = MolEntry(pdb_id="3QL8", chain_id="A", comp_name="HIS", comp_num=125)
>>> print(e)
<MolEntry: 3QL8:A:HIS:125>
Ligand entry: can be used to calculate interactions with a given ligand.
>>> from luna.mol.entry import MolEntry
>>> e = MolEntry(pdb_id="3QL8", chain_id="A", comp_name="X01", comp_num=300)
>>> print(e)
<MolEntry: 3QL8:A:X01:300>
- classmethod from_file(input_file, sep=':')
Initialize from a list of strings representing compounds.
- Parameters
input_file (str) – The file from which the list of strings (one per line) will be read.
sep (str) – The separator character used in input_file. The default value is ‘:’. For example, if the entries in input_file use ‘|’ as the separator, then sep should be defined as ‘|’.
- Yields
MolEntry – An entry recovered from input_file.
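Reading one entry string per line can be sketched as a small generator. iter_entry_strings is hypothetical: it yields the split fields rather than MolEntry objects, and skipping blank lines is an assumption that LUNA's own reader may not share.

```python
import os
import tempfile


def iter_entry_strings(input_file, sep=":"):
    """Yield the fields of each non-empty line of input_file.

    Hypothetical sketch of the one-entry-per-line format; not LUNA's reader.
    """
    with open(input_file) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # assumed: blank lines are ignored
            yield line.split(sep)


# Write a small example file that uses '|' as the separator.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3QL8|A|X01|300\n\n3QL8|A|HIS|125\n")
    path = tmp.name

try:
    # Prints ['3QL8', 'A', 'X01', '300'] and then ['3QL8', 'A', 'HIS', '125'].
    for fields in iter_entry_strings(path, sep="|"):
        print(fields)
finally:
    os.remove(path)
```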
- classmethod from_string(entry_str, sep=':')
Initialize from a string.
- Parameters
entry_str (str) – A string representing the entry. Example: ‘3QL8:A:X01:300’.
sep (str) – The separator character used in entry_str. The default value is ‘:’. For example, if entry_str is set to ‘3QL8|A|X01|300’, then sep should be defined as ‘|’.
- Return type
MolEntry
- Raises
IllegalArgumentError – If the fields in entry_str do not match the format expected to define a compound (MolEntry).
Examples
Compound entry: can be used to calculate interactions with a given compound (residue or nucleotide).
>>> from luna.mol.entry import MolEntry
>>> e = MolEntry.from_string("3QL8:A:HIS:125", sep=":")
>>> print(e)
<MolEntry: 3QL8:A:HIS:125>
Ligand entry: can be used to calculate interactions with a given ligand.
>>> from luna.mol.entry import MolEntry
>>> e = MolEntry.from_string("3QL8:A:X01:300", sep=":")
>>> print(e)
<MolEntry: 3QL8:A:X01:300>
- class MolFileEntry(pdb_id, mol_id, sep=':')
Bases: luna.mol.entry.Entry
Define a ligand from a molecular file. This class should be used in docking and molecular dynamics campaigns, where the protein structure is usually available in the PDB format and the ligand structure in a separate molecular file.
- Parameters
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
mol_id (str) – The ligand id in the molecular file.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
- Variables
mol_id (str) – The ligand ID.
mol_file (str) – Pathname of the molecular file.
mol_file_ext (str) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.
mol_obj_type ({'rdkit', 'openbabel'}) – Define which library (RDKit or Open Babel) to use to parse the molecular file.
overwrite_mol_name (bool) – If True, substitute the ligand name in the parsed molecular object with mol_id. Only works for single-molecule files (is_multimol_file = False), as in these cases mol_id does not need to match the ligand name in the molecular file.
is_multimol_file (bool) – Whether mol_file contains multiple molecules. If True, mol_id should match the name of a ligand in mol_file.
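The interplay between is_multimol_file, mol_id, and overwrite_mol_name can be summarized as a small decision function. resolve_ligand_name is hypothetical: it encodes only the rules stated above, takes an assumed names_in_file list instead of a real file, uses LookupError in place of MoleculeNotFoundError, and says nothing about how LUNA actually parses the file.

```python
def resolve_ligand_name(mol_id, names_in_file, is_multimol_file, overwrite_mol_name=False):
    """Return the name the parsed ligand would carry under the documented rules.

    Hypothetical sketch: names_in_file lists the molecule names found in mol_file.
    """
    if is_multimol_file:
        # mol_id selects one molecule, so it must match a name in the file.
        if mol_id not in names_in_file:
            raise LookupError(f"ligand {mol_id!r} not found in the molecular file")
        return mol_id
    # Single-molecule file: mol_id need not match; rename only if requested.
    original_name = names_in_file[0]
    return mol_id if overwrite_mol_name else original_name


print(resolve_ligand_name("Ligand", ["ZINC000007786517"], is_multimol_file=False))
# ZINC000007786517
print(resolve_ligand_name("Ligand", ["ZINC000007786517"], is_multimol_file=False,
                          overwrite_mol_name=True))
# Ligand
```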
- classmethod from_file(input_file, pdb_id, mol_file, **kwargs)
Initialize from a list of ligand names.
- Parameters
input_file (str) – The file from which the list of ligand names (one per line) will be read.
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
mol_file (str) – Pathname of a multi-molecular file.
mol_file_ext (str, optional) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.
mol_obj_type ({‘rdkit’, ‘openbabel’}) – If “rdkit”, parse the converted molecule with RDKit and return an instance of rdkit.Chem.rdchem.Mol. If “openbabel”, parse the converted molecule with Open Babel and return an instance of openbabel.pybel.Molecule. The default value is ‘rdkit’.
autoload (bool) – If True, parse the ligand from the molecular file during the entry initialization. Otherwise, only load the ligand when it is first used.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
- Yields
MolFileEntry – An entry recovered from input_file.
- Raises
FileNotFoundError – If mol_file does not exist.
IllegalArgumentError – If mol_obj_type is neither ‘rdkit’ nor ‘openbabel’.
MoleculeObjectError – If any errors occur while parsing the molecular file. Detailed information about the errors can be found in the logging output.
MoleculeNotFoundError – If any ligand from input_file is not found in mol_file.
Examples
>>> from luna.mol.entry import MolFileEntry
>>> entries = MolFileEntry.from_file(input_file="tutorial/inputs/MolEntries.txt",
...                                  pdb_id="D4", mol_file="tutorial/inputs/ligands.mol2",
...                                  mol_obj_type="openbabel", autoload=True)
>>> for e in entries:
...     print(e)
<MolFileEntry: D4:ZINC000012442563>
<MolFileEntry: D4:ZINC000065293174>
<MolFileEntry: D4:ZINC000096459890>
<MolFileEntry: D4:ZINC000343043015>
<MolFileEntry: D4:ZINC000575033470>
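Because from_file raises MoleculeNotFoundError when a listed ligand is absent from mol_file, it can help to check the names up front. In the MOL2 format, the line right after each @<TRIPOS>MOLECULE record holds the molecule name. The helpers below are hypothetical sketches that rely only on that convention (LUNA itself parses the file with RDKit or Open Babel), and the example text is a truncated fragment, not a complete MOL2 file.

```python
def mol2_molecule_names(mol2_text):
    """Return the molecule names found in a multi-molecule MOL2 string.

    Uses the MOL2 convention that the name is on the line right after
    each '@<TRIPOS>MOLECULE' record.
    """
    lines = mol2_text.splitlines()
    names = []
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and i + 1 < len(lines):
            names.append(lines[i + 1].strip())
    return names


def missing_ligands(requested, mol2_text):
    """List requested ligand names that are not present in the MOL2 text (hypothetical check)."""
    available = set(mol2_molecule_names(mol2_text))
    return [name for name in requested if name not in available]


# Truncated MOL2 fragment: only the records needed to expose the names.
example = """@<TRIPOS>MOLECULE
ZINC000012442563
 1 0 0 0 0
SMALL
@<TRIPOS>MOLECULE
ZINC000065293174
 1 0 0 0 0
SMALL
"""
print(mol2_molecule_names(example))  # ['ZINC000012442563', 'ZINC000065293174']
print(missing_ligands(["ZINC000012442563", "Ligand"], example))  # ['Ligand']
```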
- classmethod from_mol_file(pdb_id, mol_id, mol_file, is_multimol_file, mol_file_ext=None, mol_obj_type='rdkit', autoload=False, overwrite_mol_name=False, sep=':')
Initialize from a molecular file.
- Parameters
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
mol_id (str) – The ligand ID in the molecular file.
mol_file (str) – Pathname of the molecular file.
is_multimol_file (bool) – Whether mol_file contains multiple molecules. If True, mol_id should match the name of a ligand in mol_file.
mol_file_ext (str, optional) – The molecular file format. If not provided, try to recover the molecular file extension directly from mol_file.
mol_obj_type ({‘rdkit’, ‘openbabel’}) – If “rdkit”, parse the converted molecule with RDKit and return an instance of rdkit.Chem.rdchem.Mol. If “openbabel”, parse the converted molecule with Open Babel and return an instance of openbabel.pybel.Molecule. The default value is ‘rdkit’.
autoload (bool) – If True, parse the ligand from the molecular file during the entry initialization. Otherwise, only load the ligand when it is first used.
overwrite_mol_name (bool) – If True, substitute the ligand name in the parsed molecular object with mol_id. Only works for single-molecule files (is_multimol_file = False), as in these cases mol_id does not need to match the ligand name in the molecular file.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
- Return type
MolFileEntry
- Raises
FileNotFoundError – If mol_file does not exist.
IllegalArgumentError – If mol_obj_type is neither ‘rdkit’ nor ‘openbabel’.
MoleculeObjectError – If any errors occur while parsing the molecular file. Detailed information about the errors can be found in the logging output.
MoleculeNotFoundError – If the ligand mol_id is not found in the input file and is_multimol_file is True.
Examples
In this first example, we will read the ligand ‘ZINC000007786517’ from a single-molecule file. As we are working with a single-molecule file, mol_id can be any value you prefer.
>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="Ligand", mol_file="tutorial/inputs/ZINC000007786517.mol",
...                                mol_obj_type='rdkit', is_multimol_file=False)
>>> print(e)
<MolFileEntry: D4:Ligand>
>>> print(e.mol_obj.to_smiles())
Cc1cccc(NC(=O)C[N@@H+](C)C2CCCCC2)c1C
Now, let’s say we need to read the ligand ‘ZINC000096459890’ from a multi-molecular file and that we want to use Open Babel to parse the molecule. In this case, a ligand named mol_id must exist in mol_file. Otherwise, a MoleculeNotFoundError exception will be raised.
>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="ZINC000096459890", mol_file="tutorial/inputs/ligands.mol2",
...                                mol_obj_type='openbabel', is_multimol_file=True)
>>> print(e)
<MolFileEntry: D4:ZINC000096459890>
>>> print(e.mol_obj.to_smiles())
O=C(OCCCN1C=CC=CC1=O)c1ccc2ccc(Cl)cc2n1
Below, we show what happens if mol_id does not exist in mol_file. Note that we set autoload to True to parse the molecule right away.
>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file(pdb_id="D4", mol_id="Ligand", mol_file="tutorial/inputs/ligands.mol2",
...                                mol_obj_type='openbabel', is_multimol_file=True, autoload=True)
Traceback (most recent call last):
  ...
luna.util.exceptions.MoleculeNotFoundError: "The ligand 'Ligand' was not found in the input file or generated errors while parsing it with Open Babel."
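The autoload option amounts to eager versus lazy parsing. LazyLigand below is a hypothetical class, not LUNA's implementation, that sketches the pattern with functools.cached_property: with autoload=True the parser runs (and may raise, as in the example above) during initialization; otherwise it runs on the first access to mol_obj. fake_parser is a stand-in for the RDKit or Open Babel parsing step.

```python
from functools import cached_property


class LazyLigand:
    """Hypothetical illustration of autoload: parse at creation or on first use."""

    def __init__(self, mol_id, parse_func, autoload=False):
        self.mol_id = mol_id
        self._parse_func = parse_func
        if autoload:
            # Accessing the property forces parsing now, so errors surface immediately.
            _ = self.mol_obj

    @cached_property
    def mol_obj(self):
        # Computed on first access; later accesses reuse the cached value.
        return self._parse_func(self.mol_id)


calls = []


def fake_parser(mol_id):
    """Stand-in parser that records each call."""
    calls.append(mol_id)
    return f"<molecule {mol_id}>"


lazy = LazyLigand("ZINC000096459890", fake_parser)
print(len(calls))    # 0
print(lazy.mol_obj)  # <molecule ZINC000096459890>
print(lazy.mol_obj)  # <molecule ZINC000096459890>
print(len(calls))    # 1
```

With this design, a missing ligand only raises when the molecule is actually needed, unless autoload forces the check at construction time.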
- classmethod from_mol_obj(pdb_id, mol_id, mol_obj, sep=':')
Initialize from an already loaded molecular object.
This function is useful in cases where a molecular object is parsed and pre-processed using a different protocol defined by the user.
- Parameters
pdb_id (str) – A 4-character PDB structure ID or the name of a local PDB file. Example: ‘3QL8’ or ‘file1’.
mol_id (str) – The ligand ID. As the molecular object is already provided, the ligand ID does not need to match the ligand name in the molecular object.
mol_obj (MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule) – The molecular object.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
- Return type
MolFileEntry
- Raises
MoleculeObjectTypeError – If the molecular object is not an instance of MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule.
Examples
In this example, we will initialize a new MolFileEntry with the ligand ‘ZINC000007786517’ and the structure located in a PDB file named ‘D4’, which is the structure used to dock the molecule.
First, let’s parse the molecular file.
>>> from luna.wrappers.rdkit import read_mol_from_file
>>> mol_obj = read_mol_from_file("tutorial/inputs/ZINC000007786517.mol", mol_format="mol")
Now, we create the new MolFileEntry object as follows:
>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_obj("D4", "ZINC000007786517", mol_obj, sep=":")
>>> print(e)
<MolFileEntry: D4:ZINC000007786517>
>>> print(e.mol_obj.to_smiles())
Cc1cccc(NC(=O)C[N@@H+](C)C2CCCCC2)c1C
- property full_id
The full id of the entry is the tuple (PDB id or filename, ligand id).
- Type
tuple, read-only
- get_biopython_structure(entity=None, parser=None)
Transform the molecular object into a Biopython Entity object.
If entity is provided, the molecular object is appended to it; i.e., this function can be used to join a ligand and the structure used during docking or molecular dynamics. By default, the ligand is added to a chain with ID ‘z’.
- Parameters
entity (Entity, optional) – Append the molecular object to entity. If not provided, a new Entity is created.
parser (PDBParser, optional) – Define a PDB parser object. If not provided, the default parser will be used.
- Return type
Entity
- Raises
IllegalArgumentError – If entity is not a valid Biopython object.
Examples
In this example, we will demonstrate how to join a protein structure and a ligand docked against it.
First, let’s parse the PDB file.
>>> from luna.MyBio.PDB.PDBParser import PDBParser
>>> pdb_parser = PDBParser(PERMISSIVE=True, QUIET=True)
>>> structure = pdb_parser.get_structure("Protein", "tutorial/inputs/D4.pdb")
Observe that the list of chains in the parsed structure contains only one element.
>>> print(structure[0].child_list)
[<Chain id=A>]
Now, we will read the ligand and append it to the existing protein structure.
>>> from luna.mol.entry import MolFileEntry
>>> e = MolFileEntry.from_mol_file("D4", "ZINC000007786517", "tutorial/inputs/ZINC000007786517.mol",
...                                mol_obj_type='rdkit', is_multimol_file=False)
>>> joined_structure = e.get_biopython_structure(structure)
Observe that the list of chains now contains chains ‘A’ and ‘z’, where ‘z’ is the default chain to which ligands are added.
>>> print(joined_structure[0].child_list)
[<Chain id=A>, <Chain id=z>]
If we loop over the residues in chain ‘z’, we will find our ligand.
>>> for r in joined_structure[0]["z"]:
...     print(r)
<Residue LIG het=H_LIG resseq=9999 icode= >
- property mol_obj
The molecule.
- Type
MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule
- recover_entries_from_entity(entity, get_small_molecules=True, get_chains=False, ignore_artifacts=True, by_cluster=False, sep=':')
Search for chains and small molecules in entity and return them as entries.
- Parameters
entity (Entity) – An entity from which chains or small molecules will be recovered.
get_small_molecules (bool) – If True, identify small molecules and return them as MolEntry objects. The default value is True.
get_chains (bool) – If True, identify chains and return them as ChainEntry objects. The default value is False.
ignore_artifacts (bool) – If True, ignore the following crystallography artifacts: ACE, ACT, BME, CSD, CSW, EDO, FMT, GOL, MSE, NAG, NO3, PO4, SGM, SO4, or TPO. The default value is True.
by_cluster (bool) – If True, aggregate entries by cluster. Cluster IDs are exclusive to Residue instances and are automatically set by FTMapParser, a parser for FTMap results. By default, the cluster ID of Residue instances is set to None; therefore, if the cluster ID is not explicitly defined, all entries will be aggregated under the same key, None.
sep (str) – A separator character to format the entry string. The default value is ‘:’.
- Returns
If by_cluster is set to False, a list of ChainEntry or MolEntry objects is returned. Otherwise, a dict is returned, in which keys are clusters and values are lists of ChainEntry or MolEntry objects. When no cluster information is available, all entries are aggregated under the key None. Cluster IDs are exclusive to Residue instances; therefore, ChainEntry objects are always placed under the key None.
- Return type
list or dict
Examples
First, let’s parse a PDB file.
>>> from luna.MyBio.PDB.PDBParser import PDBParser
>>> pdb_parser = PDBParser(PERMISSIVE=True, QUIET=True)
>>> structure = pdb_parser.get_structure("Protein", "tutorial/inputs/3QQK.pdb")
Now, we can recover entries from the parsed PDB file:
>>> from luna.mol.entry import recover_entries_from_entity
>>> entries = recover_entries_from_entity(structure, get_chains=True)
>>> for e in entries:
...     print(e)
<MolEntry: Protein:A:X02:497>
<ChainEntry: Protein:A>
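The filtering and grouping rules for ignore_artifacts and by_cluster can be sketched on plain records. group_by_cluster is hypothetical and works on (entry string, compound name, cluster ID) tuples instead of Biopython objects; the artifact set is copied from the list above, while the GOL and ACN records are invented purely for illustration.

```python
from collections import defaultdict

# Crystallography artifacts listed for ignore_artifacts above.
ARTIFACTS = {"ACE", "ACT", "BME", "CSD", "CSW", "EDO", "FMT", "GOL",
             "MSE", "NAG", "NO3", "PO4", "SGM", "SO4", "TPO"}


def group_by_cluster(records, ignore_artifacts=True):
    """Group (entry_string, comp_name, cluster_id) records by cluster ID.

    Hypothetical sketch: chain records (comp_name None) always go to the
    None key, as do residues without a cluster ID.
    """
    groups = defaultdict(list)
    for entry_str, comp_name, cluster_id in records:
        if ignore_artifacts and comp_name in ARTIFACTS:
            continue  # skip crystallography artifacts
        key = None if comp_name is None else cluster_id
        groups[key].append(entry_str)
    return dict(groups)


records = [
    ("Protein:A:X02:497", "X02", None),  # ligand without a cluster ID
    ("Protein:A:GOL:601", "GOL", None),  # invented artifact record, skipped by default
    ("Protein:B:ACN:1", "ACN", 0),       # invented residue tagged with cluster 0
    ("Protein:A", None, None),           # chain entry
]
print(group_by_cluster(records))
# {None: ['Protein:A:X02:497', 'Protein:A'], 0: ['Protein:B:ACN:1']}
```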