luna.mol.clustering module
- available_similarity_functions()
Return a list of all similarity metrics available in RDKit.
- calc_distance_matrix(fps, similarity_func='BulkTanimotoSimilarity')
Calculate the pairwise distance (dissimilarity) between fingerprints in fps using the similarity metric similarity_func.
- Parameters
fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.
similarity_func (str) – A similarity metric to calculate the distance between the provided fingerprints. The default value is 'BulkTanimotoSimilarity'. To check the list of available similarity metrics, call the function available_similarity_functions().
Examples
First, let’s define a set of molecules.
>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]
Now, we generate fingerprints for those molecules.
>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]
Finally, calculate the distance between the molecules based on their fingerprints.
>>> from luna.mol.clustering import calc_distance_matrix
>>> print(calc_distance_matrix(fps))
[0.125, 0.46153846153846156, 0.3846153846153846]
- Returns
distances – Flattened triangular distance matrix, with one value per pair of fingerprints (the diagonal is excluded).
- Return type
list of float
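The flattened layout of the returned distances can be illustrated with a minimal pure-Python sketch. It is not LUNA's or RDKit's implementation: fingerprints are modeled as plain sets of on-bit indices, and the names tanimoto_distance, flat_distances, and flat_index are hypothetical helpers introduced only for this illustration. The row-by-row ordering d(1,0), d(2,0), d(2,1), ... is an assumption; it is consistent with the three values printed in the example above.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch only: two empty fingerprints
        # are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def flat_distances(fps: List[Set[int]]) -> List[float]:
    """Flatten the lower triangle row by row: d(1,0), d(2,0), d(2,1), ..."""
    dists = []
    for i in range(1, len(fps)):
        for j in range(i):
            dists.append(tanimoto_distance(fps[i], fps[j]))
    return dists


def flat_index(i: int, j: int) -> int:
    """Position of the pair (i, j), with i != j, in the flattened list."""
    if i == j:
        raise ValueError("The diagonal is not stored in the flattened list.")
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j


# Toy fingerprints (hypothetical on-bit sets, not real Morgan fingerprints).
toy_fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 5, 6}]
dists = flat_distances(toy_fps)
print(dists)
print(dists[flat_index(0, 2)])  # distance between fingerprints 0 and 2
```

For n fingerprints the list holds n * (n - 1) / 2 values, so flat_index is a convenient way to look up a single pair without rebuilding the square matrix.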
- cluster_fps(fps, cutoff=0.2, similarity_func='BulkTanimotoSimilarity')
Cluster molecules based on their fingerprints using the Butina clustering algorithm.
- Parameters
fps (iterable of RDKit ExplicitBitVect or SparseBitVect) – A sequence of fingerprints.
cutoff (float) – Distance threshold: elements within this distance of each other are considered to be neighbors.
similarity_func (str) – A similarity metric to calculate the distance between the provided fingerprints. The default value is 'BulkTanimotoSimilarity'. To check the list of available similarity metrics, call the function available_similarity_functions().
Examples
First, let’s define a set of molecules.
>>> from luna.wrappers.base import MolWrapper
>>> mols = [MolWrapper.from_smiles("CCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCC").unwrap(),
...         MolWrapper.from_smiles("CCCCCCCCO").unwrap()]
Now, we generate fingerprints for those molecules.
>>> from luna.mol.fingerprint import generate_fp_for_mols
>>> fps = [d["fp"] for d in generate_fp_for_mols(mols, "morgan_fp")]
Finally, cluster the molecules based on their fingerprints.
>>> from luna.mol.clustering import cluster_fps
>>> print(cluster_fps(fps, cutoff=0.2))
((1, 0), (2,))
- Returns
clusters – The clusters, returned as a tuple of tuples. Each inner tuple holds the indices of the fingerprints in one cluster, and its first element is the cluster centroid.
- Return type
tuple of tuples
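To make the role of cutoff and the centroid-first ordering concrete, here is a minimal pure-Python sketch of Butina-style clustering. It is an illustration under stated assumptions, not the code that cluster_fps runs: butina_sketch is a hypothetical name, the input is assumed to be a flattened lower-triangle distance list ordered d(1,0), d(2,0), d(2,1), ..., pairs are treated as neighbors when their distance is less than or equal to cutoff, and ties in neighbor count are broken in favor of the higher index.

```python
from typing import List, Sequence, Tuple


def butina_sketch(
    dists: Sequence[float], n_points: int, cutoff: float
) -> Tuple[Tuple[int, ...], ...]:
    """Cluster n_points items from a flattened lower-triangle distance list."""
    if len(dists) != n_points * (n_points - 1) // 2:
        raise ValueError("dists must hold n_points * (n_points - 1) / 2 values.")

    # Step 1: build the neighbor list of every point.
    neighbors: List[List[int]] = [[] for _ in range(n_points)]
    k = 0
    for i in range(n_points):
        for j in range(i):
            if dists[k] <= cutoff:  # assumption: the threshold is inclusive
                neighbors[i].append(j)
                neighbors[j].append(i)
            k += 1

    # Step 2: visit points from most to fewest neighbors
    # (ties: higher index first, a choice made for this sketch).
    order = sorted(
        range(n_points),
        key=lambda idx: (len(neighbors[idx]), idx),
        reverse=True,
    )

    # Step 3: each unassigned point becomes a centroid and claims
    # its still-unassigned neighbors.
    assigned = [False] * n_points
    clusters = []
    for centroid in order:
        if assigned[centroid]:
            continue
        assigned[centroid] = True
        members = [centroid]
        for nbr in neighbors[centroid]:
            if not assigned[nbr]:
                assigned[nbr] = True
                members.append(nbr)
        clusters.append(tuple(members))
    return tuple(clusters)


# Distances taken from the calc_distance_matrix example in this module.
example_dists = [0.125, 0.46153846153846156, 0.3846153846153846]
clusters = butina_sketch(example_dists, n_points=3, cutoff=0.2)
print(clusters)
```

With cutoff=0.2 only the first two molecules are within range of each other, so the third ends up in a singleton cluster; raising the cutoff to 0.5 would merge all three into one cluster.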