luna.mol.fingerprint module
- class FingerprintGenerator(mol_obj=None)
Bases: object
Generate molecular fingerprints for the molecule mol.
- Parameters
mol (MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule, optional) – The molecule.
Examples
First, create a new FingerprintGenerator object.
>>> from luna.mol.fingerprint import FingerprintGenerator
>>> fg = FingerprintGenerator()
Now, let’s read a molecule (glutamine) and set it to the FingerprintGenerator object.
>>> from luna.wrappers.base import MolWrapper
>>> fg.mol = MolWrapper.from_smiles("N[C@@H](CCC(N)=O)C(O)=O")
Finally, you can call any available function to generate the desired fingerprint type. In the example below, a count ECFP4 fingerprint of length 1,024 is created.
>>> fp = fg.morgan_fp(radius=2, length=1024, type=2)
>>> print(fp.GetNonzeroElements())
{1: 1, 80: 2, 140: 1, 147: 2, 389: 1, 403: 1, 540: 1, 545: 1, 650: 2, 728: 1, 739: 1, 767: 1, 786: 1, 807: 3, 820: 1, 825: 1, 874: 1, 893: 2, 900: 1}
You can then continue using the FingerprintGenerator object to create other fingerprint types. For example, let’s create a 2D pharmacophore fingerprint.
>>> fp = fg.pharm2d_fp()
>>> print(fp.GetNumOnBits())
90
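A count fingerprint (type=2) keeps how many times each hashed feature occurs, whereas a binary view keeps only which bits are set. The short plain-Python sketch below takes the GetNonzeroElements() dictionary printed for glutamine above and derives both views; it works on the printed data only and does not call LUNA or RDKit.

```python
# Nonzero elements of the count ECFP4 fingerprint (length 1,024) of
# glutamine, copied from the GetNonzeroElements() output above.
counts = {1: 1, 80: 2, 140: 1, 147: 2, 389: 1, 403: 1, 540: 1, 545: 1,
          650: 2, 728: 1, 739: 1, 767: 1, 786: 1, 807: 3, 820: 1, 825: 1,
          874: 1, 893: 2, 900: 1}

# Binary view: a bit is on whenever its feature occurs at least once.
on_bits = sorted(idx for idx, n in counts.items() if n > 0)

# Count view: total number of feature occurrences across all bits.
total = sum(counts.values())

print(f"{len(on_bits)} distinct bits set, {total} feature occurrences")
```

Running it prints `19 distinct bits set, 25 feature occurrences`; the 19 matches the number of entries in the dictionary above.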
- atom_pairs_fp()
Generate an atom pairs fingerprint for the molecule mol.
- Raises
FingerprintNotCreated – If the fingerprint could not be created.
- maccs_keys_fp()
Generate a MACCS keys fingerprint for the molecule mol.
- Raises
FingerprintNotCreated – If the fingerprint could not be created.
- property mol
The molecule.
- Type
MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule
- morgan_fp(radius=2, length=2048, features=False, type=2)
Generate a Morgan fingerprint for the molecule mol.
- Parameters
radius (int) – Define the maximum radius of the circular neighborhoods considered for each atom. The default value is 2, which is roughly equivalent to ECFP4 and FCFP4.
length (int) – The length of the fingerprint. The default value is 2,048.
features (bool) – If True, use pharmacophoric properties (FCFP) instead of atomic invariants (ECFP). The default value is False.
type ({1, 2, 3}) – Define the type of the Morgan fingerprint function to be used, where:
1 means GetMorganFingerprintAsBitVect(). It returns an explicit bit vector of size length (hashed fingerprint), where 1s and 0s represent the presence or absence of a given feature, respectively.
2 means GetHashedMorganFingerprint(). It returns a sparse int vector length elements long (hashed fingerprint) containing the occurrence count of each feature.
3 means GetMorganFingerprint(). It returns a sparse int vector 2^32 elements long containing the occurrence count of each feature.
The default value is 2.
- Raises
FingerprintNotCreated – If the fingerprint could not be created.
IllegalArgumentError – If type is a value other than 1, 2, or 3.
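The relationship between the three type values can be illustrated with a simplified model: an unhashed fingerprint maps large feature identifiers to counts, a hashed count fingerprint folds those identifiers into length bins, and a bit vector records only whether each bin is occupied. The sketch below assumes folding by feature_id % length; the feature identifiers are made up, fold_counts and to_bits are hypothetical helpers, and RDKit's own hashing of atom environments is not reproduced.

```python
from collections import Counter


def fold_counts(unhashed, length):
    """Fold {feature_id: count} into `length` bins via feature_id % length.

    Counts of identifiers that collide in the same bin are summed
    (simplified model of a hashed count fingerprint, cf. type=2).
    """
    folded = Counter()
    for feature_id, count in unhashed.items():
        folded[feature_id % length] += count
    return dict(folded)


def to_bits(folded, length):
    """Return a 0/1 list of size `length`: 1 = present, 0 = absent (cf. type=1)."""
    bits = [0] * length
    for idx, count in folded.items():
        if count > 0:
            bits[idx] = 1
    return bits


# Made-up unhashed identifiers and their occurrence counts (cf. type=3).
unhashed = {1001: 2, 1017: 1, 2050: 1, 37: 3}

folded = fold_counts(unhashed, length=16)
bits = to_bits(folded, length=16)
print(folded)                                # {9: 3, 2: 1, 5: 3}
print([i for i, b in enumerate(bits) if b])  # [2, 5, 9]
```

Identifiers 1001 and 1017 both land in bin 9, so their counts are summed to 3; collisions like this are the information a short length gives up.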
- pharm2d_fp(sig_factory=None)
Generate a 2D pharmacophore fingerprint for the molecule mol.
- Parameters
sig_factory (RDKit SigFactory, optional) – Factory object for producing signatures. The default signature factory is defined as shown below:
>>> from rdkit.Chem import ChemicalFeatures
>>> from rdkit.Chem.Pharm2D.SigFactory import SigFactory
>>> feat_factory = ChemicalFeatures.BuildFeatureFactory(MIN_FDEF_FILE)
>>> sig_factory = SigFactory(feat_factory, minPointCount=2,
...                          maxPointCount=3, trianglePruneBins=False)
>>> sig_factory.SetBins([(0, 2), (2, 5), (5, 8)])
>>> sig_factory.Init()
- Raises
FingerprintNotCreated – If the fingerprint could not be created.
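In a 2D pharmacophore fingerprint, the topological distance (number of bonds) between each pair of pharmacophoric features is discretized into the bins passed to SetBins(). The minimal sketch below shows that discretization for the default bins, assuming half-open intervals [lower, upper); distance_bin is a hypothetical helper, and RDKit's internal boundary handling may differ.

```python
# Distance bins copied from the default signature factory shown above.
DEFAULT_BINS = [(0, 2), (2, 5), (5, 8)]


def distance_bin(dist, bins=DEFAULT_BINS):
    """Return the index of the bin containing a topological distance, or None.

    Assumes half-open intervals [lower, upper); RDKit's own boundary
    handling is not reproduced here.
    """
    for idx, (lower, upper) in enumerate(bins):
        if lower <= dist < upper:
            return idx
    return None


print([distance_bin(d) for d in range(10)])
# [0, 0, 1, 1, 1, 2, 2, 2, None, None]
```

Under this assumption, feature pairs 8 or more bonds apart fall outside every default bin.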
- available_fp_functions()
Return a list of all fingerprint functions available in FingerprintGenerator.
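One plausible way to build such a list is introspection over the class, assuming fingerprint methods follow the *_fp naming used in this module. ToyGenerator below is a hypothetical stand-in with stub methods, and this sketch is not LUNA's actual implementation.

```python
import inspect


class ToyGenerator:
    """Hypothetical stand-in for FingerprintGenerator; methods are stubs."""

    def atom_pairs_fp(self): ...
    def maccs_keys_fp(self): ...
    def morgan_fp(self, radius=2, length=2048, features=False, type=2): ...
    def pharm2d_fp(self, sig_factory=None): ...
    def _cache_fp(self): ...  # private: must not be listed


def available_fp_functions(cls=ToyGenerator):
    """Return the sorted names of public methods ending in '_fp'."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if name.endswith("_fp") and not name.startswith("_")
    )
```

Here available_fp_functions() returns ['atom_pairs_fp', 'maccs_keys_fp', 'morgan_fp', 'pharm2d_fp']; the private _cache_fp stub is skipped.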
- generate_fp_for_mols(mols, fp_function=None, fp_opt=None, critical=False)
Generate molecular fingerprints for a sequence of molecules.
- Parameters
mols (iterable of MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule) – A sequence of molecules.
fp_function (str, optional) – The fingerprint function to use. The default value is 'pharm2d_fp'. To see the list of available functions, call the function available_fp_functions().
fp_opt (dict, optional) – A set of parameters to pass to fp_function.
critical (bool) – If True, raises any exceptions caught during the generation of fingerprints. Otherwise, ignores all exceptions (the default). The error messages are always printed to the logging output.
- Returns
A list of dictionaries where each item contains the molecule name and its fingerprint. Each dict is defined as follows:
mol (str): the molecule name;
fp (RDKit ExplicitBitVect, SparseBitVect, or sparse int vector, depending on fp_function): the fingerprint.
- Return type
list of dict
- Raises
IllegalArgumentError – If fp_function is not a function available in FingerprintGenerator.
Examples
First, let’s define a set of molecules.
>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("N[C@@H](CCC(N)=O)C(O)=O"),
...         MolWrapper.from_smiles("C[C@@H](C(=O)O)N"),
...         MolWrapper.from_smiles("C1=CC(=CC=C1CC(C(=O)O)N)O")]
Now, you can generate fingerprints for these molecules using the function generate_fp_for_mols(). For example, let’s create count ECFP4 fingerprints of size 1,024 for the above molecules.
>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = generate_fp_for_mols(mols, fp_function="morgan_fp", fp_opt={"length": 1024})
Then, you can loop through the results as shown below:
>>> for d in fps:
...     print(f"{d['mol'].ljust(25)} - {len(d['fp'].GetNonzeroElements())}")
N[C@@H](CCC(N)=O)C(O)=O   - 19
C[C@@H](C(=O)O)N          - 12
C1=CC(=CC=C1CC(C(=O)O)N)O - 24
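The critical flag described above can be sketched as a loop that logs every failure and re-raises only when critical=True. Everything here is hypothetical: ToyGenerator computes a toy "fingerprint" (the set of characters in a SMILES string), IllegalArgumentError is a local stand-in for LUNA's exception, and the molecule name is simply the input string.

```python
import logging

logger = logging.getLogger(__name__)


class IllegalArgumentError(ValueError):
    """Hypothetical stand-in for LUNA's IllegalArgumentError."""


class ToyGenerator:
    """Hypothetical generator whose toy 'fingerprint' is a set of characters."""

    def __init__(self):
        self.mol = None

    def char_fp(self, min_count=1):
        # Fail on empty input to mimic a fingerprint that cannot be created.
        if not self.mol:
            raise RuntimeError("empty molecule")
        return {c for c in set(self.mol) if self.mol.count(c) >= min_count}


def generate_fp_for_mols(mols, fp_function=None, fp_opt=None, critical=False):
    fp_function = fp_function or "char_fp"  # toy default
    fp_opt = fp_opt or {}
    fg = ToyGenerator()

    # Reject names that are not methods of the generator.
    if not callable(getattr(fg, fp_function, None)):
        raise IllegalArgumentError(f"Invalid fingerprint function: {fp_function}")

    fps = []
    for mol in mols:
        try:
            fg.mol = mol
            # Here the "molecule name" is simply the input string.
            fps.append({"mol": mol, "fp": getattr(fg, fp_function)(**fp_opt)})
        except Exception:
            # Always log the error; re-raise only if critical is True.
            logger.exception("Fingerprint could not be created for %r", mol)
            if critical:
                raise
    return fps
```

With the default critical=False, generate_fp_for_mols(["CCO", "", "NCC(=O)O"]) logs the failure for the empty string and returns two results; with critical=True the same call raises RuntimeError.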