luna.mol.clustering module

available_similarity_functions()

Return a list of all similarity metrics available in RDKit.

calc_distance_matrix(fps, similarity_func='BulkTanimotoSimilarity')

Calculate the pairwise distance (dissimilarity) between the fingerprints in fps using the similarity metric similarity_func.

Parameters

fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.
similarity_func (str) – A similarity metric to calculate the distance between the provided fingerprints. The default value is ‘BulkTanimotoSimilarity’.

For the list of available similarity metrics, call available_similarity_functions().

Examples

First, let’s define a set of molecules.

>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]

Now, we generate fingerprints for those molecules.

>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]

Finally, calculate the distance between the molecules based on their fingerprints.

>>> from luna.mol.clustering import calc_distance_matrix
>>> print(calc_distance_matrix(fps))
[0.125, 0.46153846153846156, 0.3846153846153846]
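
The three values above correspond to the three unordered pairs of fingerprints. As a rough illustration of that layout, here is a minimal pure-Python sketch that does not use LUNA or RDKit. It assumes fingerprints are represented as sets of on-bit indices, that distance is 1 minus the Tanimoto similarity, and that pairs are stored row by row as (1, 0), (2, 0), (2, 1), and so on. The names tanimoto_distance, flattened_distances, and flat_index are hypothetical helpers, not part of the library.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty sets are identical.
        return 0.0
    return 1.0 - len(a & b) / union


def flattened_distances(bit_sets):
    """Return pairwise distances in row-by-row lower-triangle order."""
    dists = []
    for i in range(1, len(bit_sets)):
        for j in range(i):
            dists.append(tanimoto_distance(bit_sets[i], bit_sets[j]))
    return dists


def flat_index(i, j):
    """Map a pair of distinct element indices to its flattened position."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j


# Toy "fingerprints": the first two share two of four distinct bits.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
dists = flattened_distances(toy_fps)
print(dists)                    # [0.5, 1.0, 1.0]
print(dists[flat_index(2, 1)])  # 1.0
```

Under this layout, n fingerprints yield n(n - 1)/2 values, which is why three molecules produce a list of length three.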

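Returns: distances – Flattened lower triangle of the pairwise distance matrix (diagonal excluded), containing n(n - 1)/2 values for n fingerprints.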
Return type: list of float

cluster_fps(fps, cutoff=0.2, similarity_func='BulkTanimotoSimilarity')

Cluster molecules based on their fingerprints using the Butina clustering algorithm.

Parameters

fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.
cutoff (float) – Distance threshold. Elements whose distance from each other is within this value are considered to be neighbors. The default value is 0.2.
similarity_func (str) – A similarity metric to calculate the distance between the provided fingerprints. The default value is ‘BulkTanimotoSimilarity’.

For the list of available similarity metrics, call available_similarity_functions().

Examples

First, let’s define a set of molecules.

>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]

Now, we generate fingerprints for those molecules.

>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]

Finally, cluster the molecules based on their fingerprints.

>>> from luna.mol.clustering import cluster_fps
>>> print(cluster_fps(fps, cutoff=0.2))
((1, 0), (2,))

Returns: clusters – A tuple of clusters. Each cluster is a tuple of fingerprint indices, where the first element is the index of the cluster centroid.
Return type: tuple of tuples