luna.mol.clustering module

available_similarity_functions()

Return a list of the names of all similarity metrics available in RDKit.

calc_distance_matrix(fps, similarity_func='BulkTanimotoSimilarity')

Calculate the pairwise distance (dissimilarity) between the fingerprints in fps using the similarity metric similarity_func.

Parameters
  • fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.

  • similarity_func (str) – Name of the similarity metric used to calculate the distances between the provided fingerprints. The default value is ‘BulkTanimotoSimilarity’.

    For the list of available similarity metrics, call available_similarity_functions().

Examples

First, let’s define a set of molecules.

>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]

Now, we generate fingerprints for those molecules.

>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]

Finally, calculate the distance between the molecules based on their fingerprints.

>>> from luna.mol.clustering import calc_distance_matrix
>>> print(calc_distance_matrix(fps))
[0.125, 0.46153846153846156, 0.3846153846153846]

Returns

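distances – Flattened lower triangle of the pairwise distance matrix, excluding the diagonal, in the order d(1,0), d(2,0), d(2,1), d(3,0), …, where d(i,j) is the distance between fps[i] and fps[j]. For n fingerprints, the list contains n(n - 1)/2 values.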

Return type

list of float
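The returned list can be read as the lower triangle of the n × n distance matrix, row by row, without the diagonal. The pure-Python sketch below reproduces that layout under two assumptions: each fingerprint is modeled as a set of on-bit indices, and distance is computed as 1 - Tanimoto similarity. The helpers tanimoto, condensed_distances, and condensed_index are hypothetical illustrations, not part of luna or RDKit.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints modeled as sets of on-bit indices."""
    union = len(a | b)
    # Treating two empty fingerprints as identical is an assumption of this sketch.
    return len(a & b) / union if union else 1.0


def condensed_distances(fps: List[Set[int]]) -> List[float]:
    """Flattened lower triangle of the distance matrix: d(1,0), d(2,0), d(2,1), ..."""
    dists: List[float] = []
    for i in range(1, len(fps)):
        # Compare fingerprint i with every earlier fingerprint 0..i-1,
        # converting each similarity to a distance as 1 - similarity.
        dists.extend(1.0 - tanimoto(fps[i], fps[j]) for j in range(i))
    return dists


def condensed_index(i: int, j: int) -> int:
    """Position of d(i, j) in the flattened list (requires i != j)."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i < j:
        i, j = j, i  # the matrix is symmetric, so use the lower-triangle entry
    # Rows 1..i-1 contribute 1 + 2 + ... + (i - 1) = i * (i - 1) / 2 entries.
    return i * (i - 1) // 2 + j


# Toy fingerprints (hypothetical on-bit sets, not real Morgan fingerprints).
fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 6}]
dists = condensed_distances(fps)
print([round(d, 3) for d in dists])  # [0.2, 0.6, 0.667]
print(round(dists[condensed_index(0, 2)], 3))  # 0.6
```

If calc_distance_matrix follows this layout, as its example output suggests, condensed_index can be used to look up the distance for a specific pair of molecules in its result.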

cluster_fps(fps, cutoff=0.2, similarity_func='BulkTanimotoSimilarity')

Cluster molecules based on their fingerprints using the Butina clustering algorithm.

Parameters
  • fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.

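  • cutoff (float) – Distance threshold: two fingerprints whose distance does not exceed cutoff are considered neighbors. Because cutoff is a distance (dissimilarity) rather than a similarity, smaller values produce tighter clusters.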

  • similarity_func (str) – Name of the similarity metric used to calculate the distances between the provided fingerprints. The default value is ‘BulkTanimotoSimilarity’.

    For the list of available similarity metrics, call available_similarity_functions().

Examples

First, let’s define a set of molecules.

>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]

Now, we generate fingerprints for those molecules.

>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]

Finally, cluster the molecules based on their fingerprints.

>>> from luna.mol.clustering import cluster_fps
>>> print(cluster_fps(fps, cutoff=0.2))
((1, 0), (2,))

Returns

clusters – A tuple of clusters, where each cluster is a tuple of fingerprint indices (positions in fps) and the first index of each cluster is its centroid.

Return type

tuple of tuples
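The Butina algorithm works in three steps: find each element's neighbors (pairs whose distance is within cutoff), sort the elements by neighbor count, then greedily turn each still-unassigned element into a centroid that claims its unassigned neighbors. The pure-Python sketch below assumes the flattened lower-triangle distance layout and the neighbor test (distance <= cutoff) and tie-breaking of RDKit's rdkit.ML.Cluster.Butina.ClusterData without its reordering option; butina_clusters is a hypothetical helper, not necessarily how cluster_fps is implemented. If distances are computed as 1 - Tanimoto similarity, cutoff=0.2 groups fingerprints whose similarity is at least 0.8.

```python
from typing import List, Tuple


def butina_clusters(dists: List[float], n: int, cutoff: float) -> Tuple[Tuple[int, ...], ...]:
    """Butina clustering over a flattened lower-triangle distance list."""
    if len(dists) != n * (n - 1) // 2:
        raise ValueError("expected n * (n - 1) / 2 distances")

    # 1. Neighbor lists: i and j are neighbors when d(i, j) <= cutoff.
    neighbors: List[List[int]] = [[] for _ in range(n)]
    k = 0
    for i in range(1, n):
        for j in range(i):
            if dists[k] <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
            k += 1

    # 2. Candidate centroids, most neighbors first (ties: higher index first).
    order = sorted(range(n), key=lambda i: (len(neighbors[i]), i), reverse=True)

    # 3. Each unassigned candidate becomes a centroid and claims its unassigned neighbors.
    assigned = [False] * n
    clusters: List[Tuple[int, ...]] = []
    for c in order:
        if assigned[c]:
            continue
        members = [c] + [m for m in neighbors[c] if not assigned[m]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return tuple(clusters)


# Distances printed by the calc_distance_matrix example above.
dists = [0.125, 0.46153846153846156, 0.3846153846153846]
print(butina_clusters(dists, n=3, cutoff=0.2))  # ((1, 0), (2,))
print(butina_clusters(dists, n=3, cutoff=0.4))  # ((1, 0, 2),)
```

Fed the distances from the calc_distance_matrix example, the sketch reproduces the ((1, 0), (2,)) result shown for cluster_fps. Raising the cutoff to 0.4 merges octanol (index 2) into the cluster centered on octane (index 1), because d(2,1) ≈ 0.385 falls within the threshold while d(2,0) ≈ 0.462 does not.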