luna.interaction.fp.fingerprint module
- class CountFingerprint(indices=None, counts=None, fp_length=4294967296, unfolded_fp=None, unfolding_map=None, props=None)
Bases: luna.interaction.fp.fingerprint.Fingerprint
A fingerprint that stores the number of occurrences of each index.
- Parameters
indices (array_like of int, optional) – Indices of “on” bits. Optional if counts is provided.
counts (dict, optional) – Mapping from each index in indices to its count. If not provided, the default count value of 1 will be used instead.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
unfolded_fp (Fingerprint or None) – The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.
unfolding_map (dict, optional) – A mapping between current indices and indices from the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).
props (dict, optional) – Custom properties of the fingerprint, consisting of a string keyword and some value. It can be used, for instance, to save the ligand name and the parameters used to generate shells (IFP features).
- property counts
Mapping from each index in indices to its count.
- Type
dict, read-only
- fold(new_length=4096)
Fold this fingerprint to size new_length.
- Parameters
new_length (int) – Length of the new fingerprint, ideally a multiple of 2. The default value is 4096.
- Returns
Folded Fingerprint.
- Return type
- Raises
BitsValueError – If the new fingerprint length is not a multiple of 2 or is greater than the existing fingerprint length.
Examples
>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices, counts = np.unique(np.random.randint(0, fp_length, on_bits), return_counts=True)
>>> counts = dict(zip(indices, counts))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_indices(indices, counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0 3 7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
>>> folded_fp = fp.fold(8)
>>> print(folded_fp.indices)
[0 3 4 5 7]
>>> print(folded_fp.to_vector(compressed=False))
[1 0 0 3 1 1 0 2]
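The folded output in the example above is consistent with wrapping each index modulo new_length and summing the counts of indices that collide. The following is a minimal pure-Python sketch of that idea. The helper name fold_counts is hypothetical, and the modulo rule is an assumption inferred from the example output, not taken from the LUNA source.

```python
from collections import Counter


def fold_counts(counts, new_length):
    """Fold a {index: count} map by wrapping indices modulo new_length.

    Assumption: modulo folding is inferred from the documented example,
    it is not confirmed against the library implementation.
    """
    folded = Counter()
    for index, count in counts.items():
        # Indices that land on the same folded position add up their counts.
        folded[index % new_length] += count
    return dict(sorted(folded.items()))


counts = {0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
folded = fold_counts(counts, 8)
dense = [folded.get(i, 0) for i in range(8)]
print(folded)  # {0: 1, 3: 3, 4: 1, 5: 1, 7: 2}
print(dense)   # [1, 0, 0, 3, 1, 1, 0, 2]
```

Under this assumption, indices 3 and 27 both fold to position 3 (count 2 + 1 = 3), and indices 7 and 15 both fold to position 7 (count 1 + 1 = 2), which matches the folded vector shown in the example.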
- classmethod from_bit_string(bit_string, counts=None, fp_length=None, **kwargs)
Initialize from a bit string (e.g. ‘0010100110’).
- Parameters
bit_string (str) – String of 0s and 1s.
counts (dict, optional) – Mapping from each index in indices to its count. If not provided, the default count value of 1 will be used instead.
fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the string length.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> fp = CountFingerprint.from_bit_string("0010100110000010",
...                                       counts={2: 5, 4: 1, 7: 3, 8: 1, 14: 2})
>>> print(fp.indices)
[ 2 4 7 8 14]
>>> print(fp.counts)
{2: 5, 4: 1, 7: 3, 8: 1, 14: 2}
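As a rough illustration of how a bit string relates to indices and counts, the sketch below reads “on” positions from left to right, starting at index 0, which agrees with the example above. The function names are hypothetical, and the handling of a missing counts argument simply follows the parameter description (a count of 1 per index); it is not the library code.

```python
def indices_from_bit_string(bit_string):
    """Return the 0-based positions of '1' characters, read left to right."""
    if set(bit_string) - {"0", "1"}:
        raise ValueError("bit_string must contain only '0' and '1'")
    return [i for i, bit in enumerate(bit_string) if bit == "1"]


def counts_from_bit_string(bit_string, counts=None):
    """Pair each 'on' index with its count (1 when no counts are given)."""
    indices = indices_from_bit_string(bit_string)
    if counts is None:
        return {i: 1 for i in indices}
    return {i: counts[i] for i in indices}


bit_string = "0010100110000010"
indices = indices_from_bit_string(bit_string)
with_counts = counts_from_bit_string(bit_string, {2: 5, 4: 1, 7: 3, 8: 1, 14: 2})
default_counts = counts_from_bit_string(bit_string)
print(indices)         # [2, 4, 7, 8, 14]
print(len(bit_string)) # 16, the length used when fp_length is not given
```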
- classmethod from_counts(counts, fp_length=4294967296, **kwargs)
Initialize from a counting map.
- Parameters
counts (dict) – Mapping from each index in indices to its count.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
**kwargs (dict, optional) – Extra arguments to CountFingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> counts = dict(zip(*np.unique(np.random.randint(0, fp_length, on_bits),
...                              return_counts=True)))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_counts(counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0 3 7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
- classmethod from_fingerprint(fp, **kwargs)
Initialize from an existing fingerprint.
- Parameters
fp (Fingerprint) – An existing fingerprint.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
- classmethod from_indices(indices=None, counts=None, fp_length=4294967296, **kwargs)
Initialize from an array of indices.
- Parameters
indices (array_like of int, optional) – Indices of “on” bits. Optional if counts is provided.
counts (dict, optional) – Mapping from each index in indices to its count. If not provided, the default count value of 1 will be used instead.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
**kwargs (dict, optional) – Extra arguments to CountFingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices, counts = np.unique(np.random.randint(0, fp_length, on_bits), return_counts=True)
>>> counts = dict(zip(indices, counts))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_indices(indices, counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0 3 7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
- classmethod from_vector(vector, fp_length=None, **kwargs)
Initialize from a vector.
- Parameters
vector (numpy.ndarray or scipy.sparse.csr_matrix) – Array of counts.
fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the vector shape.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> fp_length = 32
>>> vector = np.random.choice(5, size=(fp_length,), p=[0.76, 0.1, 0.1, 0.02, 0.02])
>>> print(vector)
[0 0 0 0 2 3 0 1 0 0 2 0 0 0 1 1 2 3 1 0 1 0 0 0 2 0 0 0 1 0 0 0]
>>> fp = CountFingerprint.from_vector(vector)
>>> print(fp.indices)
[ 4 5 7 10 14 15 16 17 18 20 24 28]
>>> print(fp.counts)
{4: 2, 5: 3, 7: 1, 10: 2, 14: 1, 15: 1, 16: 2, 17: 3, 18: 1, 20: 1, 24: 2, 28: 1}
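The relationship between a count vector and the resulting indices and counts can be sketched without NumPy: every nonzero position becomes an index, and its value becomes the count. The helper counts_from_vector is a hypothetical name used only for illustration; it works on a plain Python list and does not cover the sparse-matrix case.

```python
def counts_from_vector(vector):
    """Map each nonzero position of a dense vector to its value."""
    return {i: int(v) for i, v in enumerate(vector) if v != 0}


# Same vector as printed in the example above.
vector = [0, 0, 0, 0, 2, 3, 0, 1, 0, 0, 2, 0, 0, 0, 1, 1,
          2, 3, 1, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0]
vector_counts = counts_from_vector(vector)
vector_indices = sorted(vector_counts)
print(vector_indices)  # [4, 5, 7, 10, 14, 15, 16, 17, 18, 20, 24, 28]
print(len(vector))     # 32, the length inferred when fp_length is not given
```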
- class Fingerprint(indices, fp_length=4294967296, unfolded_fp=None, unfolding_map=None, props=None)
Bases: object
A fingerprint that stores indices of “on” bits.
- Parameters
indices (array_like of int) – Indices of “on” bits.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
unfolded_fp (Fingerprint or None) – The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.
unfolding_map (dict, optional) – A mapping between current indices and indices from the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).
props (dict, optional) – Custom properties of the fingerprint, consisting of a string keyword and some value. It can be used, for instance, to save the ligand name and the parameters used to generate shells (IFP features).
- calc_similarity(other)
Calculates the Tanimoto similarity between this fingerprint and other.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> fp1 = Fingerprint.from_bit_string("0010101110000010")
>>> fp2 = Fingerprint.from_bit_string("1010100110010010")
>>> print(fp1.calc_similarity(fp2))
0.625
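For bit fingerprints, the Tanimoto similarity is the number of shared “on” bits divided by the number of bits set in either fingerprint. The sketch below reproduces the value from the example with plain Python sets. The helper names are hypothetical, and returning 0.0 for two empty fingerprints is an assumed convention, not documented library behavior.

```python
def on_bits(bit_string):
    """Return the set of 0-based positions holding a '1'."""
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


def tanimoto(indices_a, indices_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' indices."""
    a, b = set(indices_a), set(indices_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)


bits1 = on_bits("0010101110000010")  # {2, 4, 6, 7, 8, 14}
bits2 = on_bits("1010100110010010")  # {0, 2, 4, 7, 8, 11, 14}
similarity = tanimoto(bits1, bits2)
print(similarity)  # 0.625 (5 shared bits / 8 bits in the union)
```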
- property counts
Mapping from each index in indices to its count, which is always 1 for bit fingerprints.
- Type
dict, read-only
- difference(other)
Return indices in this fingerprint but not in other.
- Return type
- Raises
InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.
BitsValueError – If the fingerprints have different lengths.
- fold(new_length=4096)
Fold this fingerprint to size new_length.
- Parameters
new_length (int) – Length of the new fingerprint, ideally a multiple of 2. The default value is 4096.
- Returns
Folded Fingerprint.
- Return type
- Raises
BitsValueError – If the new fingerprint length is not a multiple of 2 or is greater than the existing fingerprint length.
Examples
>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices = np.random.randint(0, fp_length, on_bits)
>>> print(indices)
[12 15 21 0 3 27 3 7]
>>> fp = Fingerprint.from_indices(indices, fp_length=fp_length)
>>> print(fp.indices)
[ 0 3 7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
>>> folded_fp = fp.fold(8)
>>> print(folded_fp.indices)
[0 3 4 5 7]
>>> print(folded_fp.to_vector(compressed=False))
[1 0 0 1 1 1 0 1]
- classmethod from_bit_string(bit_string, fp_length=None, **kwargs)
Initialize from a bit string (e.g. ‘0010100110’).
- Parameters
bit_string (str) – String of 0s and 1s.
fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the string length.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> fp = Fingerprint.from_bit_string("0010100110000010")
>>> print(fp.indices)
[ 2 4 7 8 14]
>>> print(fp.fp_length)
16
- classmethod from_fingerprint(fp, **kwargs)
Initialize from an existing fingerprint.
- Parameters
fp (Fingerprint) – An existing fingerprint.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
- classmethod from_indices(indices, fp_length=4294967296, **kwargs)
Initialize from an array of indices.
- Parameters
indices (array_like of int) – Indices of “on” bits.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices = np.random.randint(0, fp_length, on_bits)
>>> print(indices)
[12 15 21 0 3 27 3 7]
>>> fp = Fingerprint.from_indices(indices, fp_length=fp_length)
>>> print(fp.indices)
[ 0 3 7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
- classmethod from_rdkit(rdkit_fp, **kwargs)
Initialize from an RDKit fingerprint.
- Parameters
rdkit_fp (ExplicitBitVect or SparseBitVect) – An existing RDKit fingerprint.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
- classmethod from_vector(vector, fp_length=None, **kwargs)
Initialize from a vector.
- Parameters
vector (numpy.ndarray or scipy.sparse.csr_matrix) – Array of bits.
fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the vector shape.
**kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.
- Return type
Examples
>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> fp_length = 32
>>> vector = np.random.choice([0, 1], size=(fp_length,), p=[0.8, 0.2])
>>> print(vector)
[0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0]
>>> fp = Fingerprint.from_vector(vector)
>>> print(fp.indices)
[ 7 8 13 17 19 20 27]
>>> print(fp.fp_length)
32
- get_bit(index)
Get the bit/count value at index index.
- Raises
BitsValueError – If the provided index is in a different bit scale.
- property indices
Indices of “on” bits.
- Type
array_like of int, read-only
- intersection(other)
Return the intersection between indices of two fingerprints.
- Return type
- Raises
InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.
BitsValueError – If the fingerprints have different lengths.
- property num_levels
The property ‘num_levels’ used to generate this fingerprint (see ShellGenerator). If it was not provided, then return None.
- Type
- property num_shells
The property ‘num_shells’ (see ShellGenerator). If it was not provided, then return None.
- Type
- property radius_step
The property ‘radius_step’ used to generate this fingerprint (see ShellGenerator). If it was not provided, then return None.
- Type
- symmetric_difference(other)
Return indices in either this fingerprint or other but not both.
- Return type
- Raises
InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.
BitsValueError – If the fingerprints have different lengths.
- to_bit_string()
Convert this fingerprint to a string of bits.
Warning
This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_bit_string.
- Return type
- Raises
MemoryError – If the operation ran out of memory.
- to_bit_vector(compressed=True)
Convert this fingerprint to a vector of bits.
Warning
This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_bit_vector.
- Parameters
compressed (bool) – If True, build a compressed sparse matrix (scipy.sparse.csr_matrix).
- Returns
Vector of bits/counts. Return a compressed sparse matrix (scipy.sparse.csr_matrix) if compressed is True. Otherwise, return a NumPy array (numpy.ndarray).
- Return type
- Raises
BitsValueError – If some of the fingerprint indices are greater than the fingerprint length.
MemoryError – If the operation ran out of memory.
- to_rdkit(rdkit_fp_cls=None)
Convert this fingerprint to an RDKit fingerprint.
Note
If the fingerprint length exceeds the maximum RDKit fingerprint length (\(2^{31} - 1\)), this fingerprint will be folded to length \(2^{31} - 1\) before conversion.
- Returns
If fp_length is less than \(1e5\), ExplicitBitVect is used. Otherwise, SparseBitVect is used.
- Return type
- to_vector(compressed=True, dtype=<class 'numpy.int32'>)
Convert this fingerprint to a vector of bits/counts.
Warning
This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_vector.
- Parameters
compressed (bool) – If True, build a compressed sparse matrix (scipy.sparse.csr_matrix).
dtype (data-type) – The default value is np.int32.
- Returns
Vector of bits/counts. Return a compressed sparse matrix (scipy.sparse.csr_matrix) if compressed is True. Otherwise, return a NumPy array (numpy.ndarray).
- Return type
- Raises
BitsValueError – If some of the fingerprint indices are greater than the fingerprint length.
MemoryError – If the operation ran out of memory.
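To make the memory warning concrete, the sketch below builds a dense vector from a counts map and estimates how many bytes a dense vector needs at a given length. Both helpers (to_dense and dense_nbytes) are hypothetical names used for illustration, and the ValueError stands in for the library's BitsValueError. The 4-byte item size corresponds to the default np.int32 data type.

```python
def to_dense(counts, fp_length):
    """Expand a {index: count} map into a dense list of length fp_length."""
    vector = [0] * fp_length
    for index, count in counts.items():
        if not 0 <= index < fp_length:
            # Stand-in for the BitsValueError described above.
            raise ValueError(f"index {index} does not fit in length {fp_length}")
        vector[index] = count
    return vector


def dense_nbytes(fp_length, itemsize=4):
    """Bytes needed for a dense vector (itemsize=4 matches a 32-bit integer)."""
    return fp_length * itemsize


dense_vector = to_dense({0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}, 32)
print(dense_vector)
print(dense_nbytes(2 ** 32))  # 17179869184 bytes, i.e. 16 GiB
print(dense_nbytes(4096))     # 16384 bytes after folding to 4096
```

At the default length of \(2^{32}\), a dense 32-bit vector would need about 16 GiB, which is why folding first or keeping compressed=True is the safer choice for unfolded fingerprints.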
- property unfolded_fp
The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.
- Type
Fingerprint or None, read-only
- property unfolded_indices
Indices of “on” bits in the unfolded fingerprint.
- Type
array_like of int, read-only
- property unfolding_map
The mapping between current indices and indices from the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).
- Type
dict, read-only
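One way to picture such a mapping is a dictionary from each folded index to the set of unfolded indices that collapsed onto it. The sketch below is only an illustration under two assumptions: that folding wraps indices modulo the new length (as the fold examples suggest), and that the map values are sets. Neither the helper name build_unfolding_map nor the value type is taken from the library.

```python
def build_unfolding_map(indices, new_length):
    """Group unfolded indices by the folded position they wrap onto."""
    mapping = {}
    for index in sorted(indices):
        mapping.setdefault(index % new_length, set()).add(index)
    return mapping


unfolding = build_unfolding_map([0, 3, 7, 12, 15, 21, 27], 8)
for folded_index in sorted(unfolding):
    print(folded_index, sorted(unfolding[folded_index]))
# 0 [0]
# 3 [3, 27]
# 4 [12]
# 5 [21]
# 7 [7, 15]
```

With a map of this shape, a folded bit such as 3 can be traced back to the unfolded indices 3 and 27, and from there to the shells that produced them.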
- union(other)
Return the union of indices of two fingerprints.
- Return type
- Raises
InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.
BitsValueError – If the fingerprints have different lengths.
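The four set-style methods (union, intersection, difference and symmetric_difference) behave like the corresponding operations on sets of “on” indices. The sketch below mirrors them with Python sets for the two bit strings used in the calc_similarity example. The function compare_indices is hypothetical, and its ValueError is a stand-in for the BitsValueError raised when lengths differ.

```python
def compare_indices(indices_a, length_a, indices_b, length_b):
    """Return the four set comparisons between two index collections."""
    if length_a != length_b:
        # Stand-in for the BitsValueError described above.
        raise ValueError("fingerprints have different lengths")
    a, b = set(indices_a), set(indices_b)
    return {
        "union": sorted(a | b),
        "intersection": sorted(a & b),
        "difference": sorted(a - b),
        "symmetric_difference": sorted(a ^ b),
    }


result = compare_indices([2, 4, 6, 7, 8, 14], 16, [0, 2, 4, 7, 8, 11, 14], 16)
print(result["union"])                 # [0, 2, 4, 6, 7, 8, 11, 14]
print(result["intersection"])          # [2, 4, 7, 8, 14]
print(result["difference"])            # [6]
print(result["symmetric_difference"])  # [0, 6, 11]
```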