luna.mol.fingerprint module

class FingerprintGenerator(mol_obj=None)[source]

Bases: object

Generate molecular fingerprints for the molecule mol.

Parameters

mol (MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule, optional) – The molecule.

Examples

First, create a new FingerprintGenerator object.

>>> from luna.mol.fingerprint import FingerprintGenerator
>>> fg = FingerprintGenerator()

Now, let’s read a molecule (glutamine) and set it to the FingerprintGenerator object.

>>> from luna.wrappers.base import MolWrapper
>>> fg.mol = MolWrapper.from_smiles("N[C@@H](CCC(N)=O)C(O)=O")

Finally, you can call any available function to generate the desired fingerprint type. In the example below, a count ECFP4 fingerprint of size 1,024 is created.

>>> fp = fg.morgan_fp(radius=2, length=1024, type=2)
>>> print(fp.GetNonzeroElements())
{1: 1, 80: 2, 140: 1, 147: 2, 389: 1, 403: 1, 540: 1, 545: 1, 650: 2, 728: 1, 739: 1, 767: 1, 786: 1, 807: 3, 820: 1, 825: 1, 874: 1, 893: 2, 900: 1}

You can then continue using the FingerprintGenerator object to create other fingerprint types. For example, let’s create a 2D pharmacophore fingerprint.

>>> fp = fg.pharm2d_fp()
>>> print(fp.GetNumOnBits())
90

atom_pairs_fp()[source]

Generate an atom pairs fingerprint for the molecule mol.

Raises

FingerprintNotCreated – If the fingerprint could not be created.

maccs_keys_fp()[source]

Generate a MACCS keys fingerprint for the molecule mol.

Raises

FingerprintNotCreated – If the fingerprint could not be created.

property mol

The molecule.

Type

MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule

morgan_fp(radius=2, length=2048, features=False, type=2)[source]

Generate a Morgan fingerprint for the molecule mol.

Parameters
  • radius (int) – Define the maximum radius of the circular neighborhoods considered for each atom. The default value is 2, which is roughly equivalent to ECFP4 and FCFP4.

  • length (int) – The length of the fingerprint. The default value is 2,048.

  • features (bool) – If True, use pharmacophoric properties (FCFP) instead of atomic invariants (ECFP). The default value is False.

  • type ({1, 2, 3}) – Define the type of the Morgan fingerprint function to be used, where:

    • 1 means GetMorganFingerprintAsBitVect(). It returns an explicit bit vector of size length (hashed fingerprint), where 1s and 0s represent the presence or absence of a given feature, respectively.

    • 2 means GetHashedMorganFingerprint(). It returns a sparse int vector length elements long (hashed fingerprint) containing the occurrence number of each feature.

    • 3 means GetMorganFingerprint(). It returns a sparse int vector 2^32 elements long containing the occurrence number of each feature.

    The default value is 2.

Raises
  • FingerprintNotCreated – If the fingerprint could not be created.

  • IllegalArgumentError – If type is a value other than 1, 2, or 3.
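
The relationship between the three type values can be illustrated without RDKit. The following is a minimal sketch, assuming that a hashed fingerprint is obtained by folding feature identifiers modulo length; the identifiers and the helper names fold_counts and to_bits are hypothetical and are not part of LUNA or RDKit.

```python
from collections import Counter


def fold_counts(sparse_counts, length):
    """Fold a {feature_id: count} map into `length` positions (type 2 style).

    Assumption: folding is a plain modulo on the feature identifier.
    """
    folded = Counter()
    for feature_id, count in sparse_counts.items():
        folded[feature_id % length] += count
    return dict(folded)


def to_bits(folded_counts):
    """Drop the counts and keep only the on-bit positions (type 1 style)."""
    return sorted(folded_counts)


# Hypothetical unhashed identifiers with their counts (type 3 style).
sparse = {5: 1, 1029: 2, 2050: 1}

hashed = fold_counts(sparse, length=1024)  # 5 and 1029 collide at position 5
on_bits = to_bits(hashed)

print(hashed)   # {5: 3, 2: 1}
print(on_bits)  # [2, 5]
```

In this sketch, the count vector keeps the total number of occurrences even when two features collide, whereas the bit vector only records that position 5 is set.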

pharm2d_fp(sig_factory=None)[source]

Generate a 2D pharmacophore fingerprint for the molecule mol.

Parameters

sig_factory (RDKit SigFactory, optional) – Factory object for producing signatures. The default signature factory is defined as shown below:

>>> from rdkit.Chem import ChemicalFeatures
>>> from rdkit.Chem.Pharm2D.SigFactory import SigFactory
>>> feat_factory = ChemicalFeatures.BuildFeatureFactory(MIN_FDEF_FILE)
>>> sig_factory = SigFactory(feat_factory, minPointCount=2,
...                          maxPointCount=3, trianglePruneBins=False)
>>> sig_factory.SetBins([(0, 2), (2, 5), (5, 8)])
>>> sig_factory.Init()

Raises

FingerprintNotCreated – If the fingerprint could not be created.
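
The bins passed to SetBins() group the topological distances between pharmacophore points. The following is a minimal sketch of that grouping, assuming each bin is a half-open interval [lower, upper) and that distances outside every bin are discarded; the helper bin_index is hypothetical, so check the RDKit documentation for the exact rule.

```python
def bin_index(distance, bins):
    """Return the index of the bin [lower, upper) containing `distance`.

    Returns None when the distance falls outside every bin.
    Assumption: bins are half-open intervals; this is not RDKit code.
    """
    for idx, (lower, upper) in enumerate(bins):
        if lower <= distance < upper:
            return idx
    return None


bins = [(0, 2), (2, 5), (5, 8)]

for distance in (1, 2, 4, 5, 8):
    print(distance, bin_index(distance, bins))
```

Under this assumption, distances 2 to 4 share the second bin, and a distance of 8 or more matches no bin.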

rdk_fp()[source]

Generate an RDKit topological fingerprint for the molecule mol.

Raises

FingerprintNotCreated – If the fingerprint could not be created.

torsion_fp()[source]

Generate a topological torsion fingerprint for the molecule mol.

Raises

FingerprintNotCreated – If the fingerprint could not be created.

available_fp_functions()[source]

Return a list of all fingerprint functions available in FingerprintGenerator.

generate_fp_for_mols(mols, fp_function=None, fp_opt=None, critical=False)[source]

Generate molecular fingerprints for a sequence of molecules.

Parameters
  • mols (iterable of MolWrapper, rdkit.Chem.rdchem.Mol, or openbabel.pybel.Molecule) – A sequence of molecules.

  • fp_function (str, optional) – The fingerprint function to use. If None, ‘pharm2d_fp’ is used.

    To check out the list of available functions, call the function available_fp_functions().

  • fp_opt (dict, optional) – A set of parameters to pass to fp_function.

  • critical (bool) – If True, re-raise any exception caught while generating fingerprints. If False (the default), ignore exceptions. In both cases, the error messages are written to the logging output.

Returns

A list of dictionaries where each item contains the molecule name and its fingerprint.

The dict is defined as follows: the key 'mol' holds the molecule name and the key 'fp' holds its fingerprint.

Return type

list of dict

Raises

IllegalArgumentError – If fp_function is not a function available in FingerprintGenerator.

Examples

First, let’s define a set of molecules.

>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("N[C@@H](CCC(N)=O)C(O)=O"),
...         MolWrapper.from_smiles("C[C@@H](C(=O)O)N"),
...         MolWrapper.from_smiles("C1=CC(=CC=C1CC(C(=O)O)N)O")]

Now, you can generate fingerprints for these molecules using the function generate_fp_for_mols(). For example, let’s create count ECFP4 fingerprints of size 1,024 for the above molecules.

>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = generate_fp_for_mols(mols, fp_function="morgan_fp", fp_opt={"length": 1024})

Then, you can loop through the results as shown below:

>>> for d in fps:
...     print(f"{d['mol'].ljust(25)} - {len(d['fp'].GetNonzeroElements())}")
N[C@@H](CCC(N)=O)C(O)=O   - 19
C[C@@H](C(=O)O)N          - 12
C1=CC(=CC=C1CC(C(=O)O)N)O - 24
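
The fp_function, fp_opt, and critical parameters can be pictured with a small stand-alone sketch. It assumes that the function is looked up by name, that fp_opt is expanded into keyword arguments, and that failed molecules are skipped when critical is False. ToyGenerator, length_fp, and generate_for_all are hypothetical names used only for illustration; this is not the LUNA implementation.

```python
import logging

logger = logging.getLogger(__name__)


class ToyGenerator:
    """Hypothetical stand-in for FingerprintGenerator; not LUNA code."""

    def __init__(self, mol=None):
        self.mol = mol

    def length_fp(self, scale=1):
        # Toy "fingerprint": the scaled length of the molecule string.
        if not self.mol:
            raise ValueError("empty molecule")
        return len(self.mol) * scale


def generate_for_all(mols, fp_function="length_fp", fp_opt=None, critical=False):
    """Apply the named fingerprint function to every molecule."""
    fp_opt = fp_opt or {}
    func = getattr(ToyGenerator, fp_function, None)
    if not fp_function.endswith("_fp") or not callable(func):
        raise ValueError(f"Unknown fingerprint function: {fp_function}")

    generator = ToyGenerator()
    results = []
    for mol in mols:
        try:
            generator.mol = mol
            fp = getattr(generator, fp_function)(**fp_opt)
        except Exception:
            # The error is always logged; it is re-raised only if critical.
            logger.exception("Fingerprint failed for %r", mol)
            if critical:
                raise
            continue
        results.append({"mol": mol, "fp": fp})
    return results


fps = generate_for_all(["CCO", "", "C"], fp_opt={"scale": 2})
print(fps)  # [{'mol': 'CCO', 'fp': 6}, {'mol': 'C', 'fp': 2}]
```

With critical=True, the same call would stop at the empty string and propagate the ValueError instead of skipping it.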