luna.interaction.fp.fingerprint module

class CountFingerprint(indices=None, counts=None, fp_length=4294967296, unfolded_fp=None, unfolding_map=None, props=None)

Bases: luna.interaction.fp.fingerprint.Fingerprint

A fingerprint that stores the number of occurrences of each index.

Parameters
  • indices (array_like of int, optional) – Indices of “on” bits. It is optional if counts is provided.

  • counts (dict, optional) – Mapping from each index in indices to its count. If not provided, each index gets the default count of 1.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • unfolded_fp (Fingerprint or None) – The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.

  • unfolding_map (dict, optional) – A mapping between the current indices and the indices of the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).

  • props (dict, optional) – Custom properties of the fingerprint, given as string keys mapped to arbitrary values. They can be used, for instance, to store the ligand name and the parameters used to generate shells (IFP features).

property counts

Mapping from each index in indices to its count.

Type

dict, read-only

fold(new_length=4096)

Fold this fingerprint to size new_length.

Parameters

new_length (int) – Length of the new fingerprint, ideally a multiple of 2. The default value is 4096.

Returns

Folded Fingerprint.

Return type

Fingerprint

Raises

BitsValueError – If the new fingerprint length is not a multiple of 2 or is greater than the existing fingerprint length.

Examples

>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices, counts = np.unique(np.random.randint(0, fp_length, on_bits), return_counts=True)
>>> counts = dict(zip(indices, counts))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_indices(indices, counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0  3  7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
>>> folded_fp = fp.fold(8)
>>> print(folded_fp.indices)
[0 3 4 5 7]
>>> print(folded_fp.to_vector(compressed=False))
[1 0 0 3 1 1 0 2]
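
The output above is consistent with folding each index to index % new_length and summing the counts of indices that collide (12 → 4, 15 → 7, 21 → 5, 27 → 3). A minimal pure-Python sketch of that behavior on a plain {index: count} dict follows; fold_counts is a hypothetical helper, not part of LUNA, and the modulo rule is inferred from the example rather than stated on this page.

```python
def fold_counts(counts, new_length):
    """Fold a {index: count} map to new_length, summing colliding counts.

    Hypothetical helper: assumes folding maps each index to index % new_length.
    """
    folded = {}
    for index, count in counts.items():
        new_index = index % new_length
        folded[new_index] = folded.get(new_index, 0) + count
    # Sort by index, mirroring the sorted `indices` property.
    return dict(sorted(folded.items()))


counts = {0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
folded = fold_counts(counts, 8)
print(sorted(folded))                        # [0, 3, 4, 5, 7]
print([folded.get(i, 0) for i in range(8)])  # [1, 0, 0, 3, 1, 1, 0, 2]
```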

classmethod from_bit_string(bit_string, counts=None, fp_length=None, **kwargs)

Initialize from a bit string (e.g. ‘0010100110’).

Parameters
  • bit_string (str) – String of 0s and 1s.

  • counts (dict, optional) – Mapping from each index in indices to its count. If not provided, each index gets the default count of 1.

  • fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the string length.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

CountFingerprint

Examples

>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> fp = CountFingerprint.from_bit_string("0010100110000010",
...                                       counts={2: 5, 4: 1, 7: 3, 8: 1, 14: 2})
>>> print(fp.indices)
[ 2  4  7  8 14]
>>> print(fp.counts)
{2: 5, 4: 1, 7: 3, 8: 1, 14: 2}
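
As a rough illustration of the parameters above, the sketch below reads the positions of the 1s in a bit string and pairs each with its count, falling back to the documented default of 1 for positions missing from counts. parse_count_bit_string is a hypothetical helper; how LUNA treats a counts entry for a position that is 0 in the string is not covered by this page or by the sketch.

```python
def parse_count_bit_string(bit_string, counts=None):
    """Return {index: count} for every '1' in bit_string.

    Hypothetical helper: positions absent from `counts` default to 1.
    """
    counts = counts or {}
    if set(bit_string) - {"0", "1"}:
        raise ValueError("bit_string must contain only 0s and 1s")
    on_bits = [i for i, bit in enumerate(bit_string) if bit == "1"]
    return {i: counts.get(i, 1) for i in on_bits}


# Counts for indices 4 and 8 are omitted, so they fall back to 1.
print(parse_count_bit_string("0010100110000010", {2: 5, 7: 3, 14: 2}))
# {2: 5, 4: 1, 7: 3, 8: 1, 14: 2}
```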

classmethod from_counts(counts, fp_length=4294967296, **kwargs)

Initialize from a counting map.

Parameters
  • counts (dict) – Mapping from each index to its count.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • **kwargs (dict, optional) – Extra arguments to CountFingerprint. Refer to the documentation for a list of all possible arguments.

Return type

CountFingerprint

Examples

>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> counts = dict(zip(*np.unique(np.random.randint(0, fp_length, on_bits),
...                              return_counts=True)))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_counts(counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0  3  7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
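
As the output above shows, to_vector(compressed=False) yields a dense vector of length fp_length whose position i holds the count of index i, and 0 elsewhere. The sketch below performs the same expansion with a plain list; counts_to_dense is a hypothetical helper, and raising ValueError for out-of-range indices is this sketch's own choice (the documented to_vector raises BitsValueError in that case).

```python
def counts_to_dense(counts, fp_length):
    """Expand a {index: count} map into a dense list of length fp_length."""
    vector = [0] * fp_length
    for index, count in counts.items():
        if not 0 <= index < fp_length:
            raise ValueError(f"index {index} is outside [0, {fp_length})")
        vector[index] = count
    return vector


counts = {0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
print(counts_to_dense(counts, 32))
```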

classmethod from_fingerprint(fp, **kwargs)

Initialize from an existing fingerprint.

Parameters
  • fp (Fingerprint) – An existing fingerprint.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

CountFingerprint

classmethod from_indices(indices=None, counts=None, fp_length=4294967296, **kwargs)

Initialize from an array of indices.

Parameters
  • indices (array_like of int, optional) – Indices of “on” bits. It is optional if counts is provided.

  • counts (dict, optional) – Mapping from each index in indices to its count. If not provided, each index gets the default count of 1.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • **kwargs (dict, optional) – Extra arguments to CountFingerprint. Refer to the documentation for a list of all possible arguments.

Return type

CountFingerprint

Examples

>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices, counts = np.unique(np.random.randint(0, fp_length, on_bits), return_counts=True)
>>> counts = dict(zip(indices, counts))
>>> print(counts)
{0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
>>> fp = CountFingerprint.from_indices(indices, counts=counts, fp_length=fp_length)
>>> print(fp.indices)
[ 0  3  7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 2 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
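
The np.unique(..., return_counts=True) call above turns a list of sampled indices, which may repeat, into the counts map. The standard library's collections.Counter builds the same map without NumPy; the sketch below hardcodes the eight indices that np.random.randint(0, 32, 8) returns after np.random.seed(0), as printed in the Fingerprint examples on this page.

```python
from collections import Counter

# Indices drawn by np.random.randint(0, 32, 8) with seed 0 (index 3 repeats).
sampled = [12, 15, 21, 0, 3, 27, 3, 7]

# Counter tallies repeats; sorting the keys mirrors np.unique's sorted output.
counts = dict(sorted(Counter(sampled).items()))
indices = list(counts)

print(indices)  # [0, 3, 7, 12, 15, 21, 27]
print(counts)   # {0: 1, 3: 2, 7: 1, 12: 1, 15: 1, 21: 1, 27: 1}
```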

classmethod from_vector(vector, fp_length=None, **kwargs)

Initialize from a vector.

Parameters
  • vector (numpy.ndarray or scipy.sparse.csr_matrix) – Array of counts.

  • fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the vector shape.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

CountFingerprint

Examples

>>> from luna.interaction.fp.fingerprint import CountFingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> fp_length = 32
>>> vector = np.random.choice(5, size=(fp_length,), p=[0.76, 0.1, 0.1, 0.02, 0.02])
>>> print(vector)
[0 0 0 0 2 3 0 1 0 0 2 0 0 0 1 1 2 3 1 0 1 0 0 0 2 0 0 0 1 0 0 0]
>>> fp = CountFingerprint.from_vector(vector)
>>> print(fp.indices)
[ 4  5  7 10 14 15 16 17 18 20 24 28]
>>> print(fp.counts)
{4: 2, 5: 3, 7: 1, 10: 2, 14: 1, 15: 1, 16: 2, 17: 3, 18: 1, 20: 1, 24: 2, 28: 1}

get_count(index)

Get the count value at the given index. Return 0 if index is not in counts.
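
The from_vector example and get_count can both be pictured with a plain dict: from_vector keeps only the nonzero positions, using each position as the index and its value as the count, and get_count falls back to 0 for absent indices. The sketch below reproduces the example's indices; dense_to_counts is hypothetical and handles only dense lists, not scipy.sparse.csr_matrix input.

```python
def dense_to_counts(vector):
    """Map each nonzero position of a dense vector to its value."""
    return {i: value for i, value in enumerate(vector) if value != 0}


# The vector printed in the CountFingerprint.from_vector example.
vector = [0, 0, 0, 0, 2, 3, 0, 1, 0, 0, 2, 0, 0, 0, 1, 1,
          2, 3, 1, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0]
counts = dense_to_counts(vector)
print(sorted(counts))    # [4, 5, 7, 10, 14, 15, 16, 17, 18, 20, 24, 28]
print(counts.get(5, 0))  # 3, like get_count(5)
print(counts.get(6, 0))  # 0, since index 6 is not in counts
```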

class Fingerprint(indices, fp_length=4294967296, unfolded_fp=None, unfolding_map=None, props=None)

Bases: object

A fingerprint that stores indices of “on” bits.

Parameters
  • indices (array_like of int) – Indices of “on” bits.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • unfolded_fp (Fingerprint or None) – The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.

  • unfolding_map (dict, optional) – A mapping between the current indices and the indices of the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).

  • props (dict, optional) – Custom properties of the fingerprint, given as string keys mapped to arbitrary values. They can be used, for instance, to store the ligand name and the parameters used to generate shells (IFP features).

property bit_count

Number of “on” bits.

Type

int, read-only

calc_similarity(other)

Calculate the Tanimoto similarity between this fingerprint and other.

Return type

float

Examples

>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> fp1 = Fingerprint.from_bit_string("0010101110000010")
>>> fp2 = Fingerprint.from_bit_string("1010100110010010")
>>> print(fp1.calc_similarity(fp2))
0.625
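
For bit fingerprints, the Tanimoto (Jaccard) coefficient is the number of shared “on” bits divided by the number of bits that are on in either fingerprint. The set-based sketch below reproduces the 0.625 above (5 shared bits out of 8 in the union); tanimoto and on_bits are hypothetical helpers, and this page does not state how calc_similarity treats counts in a CountFingerprint.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets of "on" bit indices: |A & B| / |A | B|."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen by this sketch for two empty fingerprints
    return len(a & b) / len(union)


def on_bits(bit_string):
    """Indices of the '1' characters in a bit string."""
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


fp1 = on_bits("0010101110000010")  # {2, 4, 6, 7, 8, 14}
fp2 = on_bits("1010100110010010")  # {0, 2, 4, 7, 8, 11, 14}
print(tanimoto(fp1, fp2))  # 0.625
```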

property counts

Mapping from each index in indices to its count, which is always 1 for bit fingerprints.

Type

dict, read-only

property density

Proportion of “on” bits in the fingerprint.

Type

float, read-only
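
Since density is the proportion of “on” bits, it equals bit_count divided by fp_length, and the number of “off” bits (see get_num_off_bits) is fp_length minus bit_count. For the 16-bit string "0010100110000010" used in the from_bit_string example, that gives 5 “on” bits, 11 “off” bits and a density of 5/16 = 0.3125. A small sketch with a hypothetical bit_stats helper:

```python
def bit_stats(indices, fp_length):
    """Return (on bits, off bits, density) for a collection of "on" indices."""
    on = len(set(indices))  # duplicates count once, as in from_indices
    off = fp_length - on
    return on, off, on / fp_length


print(bit_stats([2, 4, 7, 8, 14], 16))  # (5, 11, 0.3125)
```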

difference(other)

Return indices in this fingerprint but not in other.

Return type

numpy.ndarray

Raises
  • InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.

  • BitsValueError – If the fingerprints have different lengths.
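
difference behaves like a set difference on the index arrays, preceded by a check that both fingerprints have the same length. The sketch below uses the two fingerprints from the calc_similarity example; difference here is a hypothetical free function, and ValueError stands in for LUNA's BitsValueError.

```python
def difference(indices_a, len_a, indices_b, len_b):
    """Sorted indices that are on in A but not in B.

    ValueError stands in for BitsValueError on a length mismatch.
    """
    if len_a != len_b:
        raise ValueError("fingerprints have different lengths")
    return sorted(set(indices_a) - set(indices_b))


print(difference([2, 4, 6, 7, 8, 14], 16, [0, 2, 4, 7, 8, 11, 14], 16))  # [6]
```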

fold(new_length=4096)

Fold this fingerprint to size new_length.

Parameters

new_length (int) – Length of the new fingerprint, ideally a multiple of 2. The default value is 4096.

Returns

Folded Fingerprint.

Return type

Fingerprint

Raises

BitsValueError – If the new fingerprint length is not a multiple of 2 or is greater than the existing fingerprint length.

Examples

>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices = np.random.randint(0, fp_length, on_bits)
>>> print(indices)
[12 15 21  0  3 27  3  7]
>>> fp = Fingerprint.from_indices(indices, fp_length=fp_length)
>>> print(fp.indices)
[ 0  3  7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
>>> folded_fp = fp.fold(8)
>>> print(folded_fp.indices)
[0 3 4 5 7]
>>> print(folded_fp.to_vector(compressed=False))
[1 0 0 1 1 1 0 1]

property fp_length

The fingerprint length (total number of bits).

Type

int, read-only

classmethod from_bit_string(bit_string, fp_length=None, **kwargs)

Initialize from a bit string (e.g. ‘0010100110’).

Parameters
  • bit_string (str) – String of 0s and 1s.

  • fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the string length.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

Fingerprint

Examples

>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> fp = Fingerprint.from_bit_string("0010100110000010")
>>> print(fp.indices)
[ 2  4  7  8 14]
>>> print(fp.fp_length)
16

classmethod from_fingerprint(fp, **kwargs)

Initialize from an existing fingerprint.

Parameters
  • fp (Fingerprint) – An existing fingerprint.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

Fingerprint

classmethod from_indices(indices, fp_length=4294967296, **kwargs)

Initialize from an array of indices.

Parameters
  • indices (array_like of int) – Indices of “on” bits.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

Fingerprint

Examples

>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> on_bits = 8
>>> fp_length = 32
>>> indices = np.random.randint(0, fp_length, on_bits)
>>> print(indices)
[12 15 21  0  3 27  3  7]
>>> fp = Fingerprint.from_indices(indices, fp_length=fp_length)
>>> print(fp.indices)
[ 0  3  7 12 15 21 27]
>>> print(fp.to_vector(compressed=False))
[1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]

classmethod from_rdkit(rdkit_fp, **kwargs)

Initialize from an RDKit fingerprint.

Parameters
  • rdkit_fp (ExplicitBitVect or SparseBitVect) – An existing RDKit fingerprint.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

Fingerprint

classmethod from_vector(vector, fp_length=None, **kwargs)

Initialize from a vector.

Parameters
  • vector (numpy.ndarray or scipy.sparse.csr_matrix) – Array of bits.

  • fp_length (int, optional) – The fingerprint length (total number of bits). If not provided, the fingerprint length will be defined based on the vector shape.

  • **kwargs (dict, optional) – Extra arguments to Fingerprint. Refer to the documentation for a list of all possible arguments.

Return type

Fingerprint

Examples

>>> from luna.interaction.fp.fingerprint import Fingerprint
>>> import numpy as np
>>> np.random.seed(0)
>>> fp_length = 32
>>> vector = np.random.choice([0, 1], size=(fp_length,), p=[0.8, 0.2])
>>> print(vector)
[0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0]
>>> fp = Fingerprint.from_vector(vector)
>>> print(fp.indices)
[ 7  8 13 17 19 20 27]
>>> print(fp.fp_length)
32

get_bit(index)

Get the bit/count value at the given index.

Raises

BitsValueError – If the provided index is in a different bit scale.

get_num_bits()

Get the fingerprint length (total number of bits).

get_num_off_bits()

Get the number of “off” bits.

get_num_on_bits()

Get the number of “on” bits.

get_on_bits()

Get “on” bits.

Return type

numpy.ndarray

get_prop(key)

Get the value of the property key. Raise KeyError if it is not set.

property indices

Indices of “on” bits.

Type

array_like of int, read-only

intersection(other)[source]

Return the intersection between indices of two fingerprints.

Return type

numpy.ndarray

Raises
  • InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.

  • BitsValueError – If the fingerprints have different lengths.

property name

The property ‘name’. If it was not provided, then return an empty string.

Type

str

property num_levels

The property ‘num_levels’ used to generate this fingerprint (see ShellGenerator). If it was not provided, then return None.

Type

int

property num_shells

The property ‘num_shells’ (see ShellGenerator). If it was not provided, then return None.

Type

int

property props

The custom properties of the fingerprint.

Type

dict, read-only

property radius_step

The property ‘radius_step’ used to generate this fingerprint (see ShellGenerator). If it was not provided, then return None.

Type

float

set_prop(key, value)

Set the value of the property key.
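
The props-related members (props, name, num_levels, get_prop, set_prop) behave like a dictionary with a few convenience accessors. The class below is a hypothetical stand-in that mirrors only the documented behavior: get_prop raises KeyError for unset keys, name defaults to an empty string and num_levels to None (num_shells and radius_step would follow the same pattern). The ligand name is an invented example value.

```python
class PropsSketch:
    """Hypothetical stand-in for the documented props behavior."""

    def __init__(self, props=None):
        self._props = dict(props or {})

    def set_prop(self, key, value):
        self._props[key] = value

    def get_prop(self, key):
        return self._props[key]  # raises KeyError when key is not set

    @property
    def name(self):
        return self._props.get("name", "")

    @property
    def num_levels(self):
        return self._props.get("num_levels")


fp = PropsSketch()
print(repr(fp.name), fp.num_levels)  # '' None
fp.set_prop("name", "ligand_A")
print(fp.get_prop("name"))  # ligand_A
```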

symmetric_difference(other)

Return indices in either this fingerprint or other but not both.

Return type

numpy.ndarray

Raises
  • InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.

  • BitsValueError – If the fingerprints have different lengths.

to_bit_string()

Convert this fingerprint to a string of bits.

Warning

This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_bit_string.

Return type

str

Raises

MemoryError – If the operation ran out of memory.
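
A bit string needs one character per bit, so its size grows with fp_length rather than with the number of “on” bits; an unfolded fingerprint with the default length of \(2^{32}\) would need roughly 4 GiB for the string alone, which is why folding first is suggested above. The sketch below uses a hypothetical to_bit_string helper (not LUNA's code) to show the conversion, and it reproduces the string from the from_bit_string example.

```python
def to_bit_string(indices, fp_length):
    """Render "on" indices as a string of fp_length '0'/'1' characters."""
    chars = ["0"] * fp_length  # memory grows with fp_length, not len(indices)
    for index in indices:
        chars[index] = "1"
    return "".join(chars)


bits = to_bit_string([2, 4, 7, 8, 14], 16)
print(bits)  # 0010100110000010
```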

to_bit_vector(compressed=True)

Convert this fingerprint to a vector of bits.

Warning

This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_bit_vector.

Parameters

compressed (bool) – If True, build a compressed sparse matrix (scipy.sparse.csr_matrix).

Returns

Vector of bits/counts. Return a compressed sparse matrix (scipy.sparse.csr_matrix) if compressed is True. Otherwise, return a NumPy array (numpy.ndarray).

Return type

numpy.ndarray or scipy.sparse.csr_matrix

Raises
  • BitsValueError – If some of the fingerprint indices are greater than the fingerprint length.

  • MemoryError – If the operation ran out of memory.

to_rdkit(rdkit_fp_cls=None)

Convert this fingerprint to an RDKit fingerprint.

Note

If the fingerprint length exceeds the maximum RDKit fingerprint length (\(2^{31} - 1\)), this fingerprint will be folded to length \(2^{31} - 1\) before conversion.

Returns

If rdkit_fp_cls is None, ExplicitBitVect is used when fp_length is less than \(10^5\); otherwise, SparseBitVect is used.

Return type

ExplicitBitVect or SparseBitVect
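
The size rules in this section can be sketched without RDKit. The function below only decides which class would be used and how indices are adjusted when fp_length exceeds \(2^{31} - 1\). It assumes that “folded” means reducing each index modulo the new length (as in the fold examples) and that the size rule applies when rdkit_fp_cls is not given; plan_rdkit_conversion is hypothetical and returns the class name as a string instead of building an RDKit object.

```python
MAX_RDKIT_FP_LENGTH = 2**31 - 1  # limit stated in the note above


def plan_rdkit_conversion(indices, fp_length):
    """Return (class name, length, indices) for a hypothetical RDKit conversion."""
    # Documented size rule: small fingerprints use ExplicitBitVect.
    cls_name = "ExplicitBitVect" if fp_length < 1e5 else "SparseBitVect"
    if fp_length > MAX_RDKIT_FP_LENGTH:
        # Assumption: folding reduces every index modulo the new length.
        fp_length = MAX_RDKIT_FP_LENGTH
        indices = sorted({i % fp_length for i in indices})
    return cls_name, fp_length, list(indices)


print(plan_rdkit_conversion([0, 3, 7], 4096))
# ('ExplicitBitVect', 4096, [0, 3, 7])
print(plan_rdkit_conversion([5, 2**31], 2**32))
# ('SparseBitVect', 2147483647, [1, 5])
```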

to_vector(compressed=True, dtype=numpy.int32)

Convert this fingerprint to a vector of bits/counts.

Warning

This function may raise a MemoryError exception when using huge index vectors. If you encounter this issue, you may want to try a different data type or apply a folding operation before calling to_vector.

Parameters
  • compressed (bool) – If True, build a compressed sparse matrix (scipy.sparse.csr_matrix).

  • dtype (data-type) – Data type of the output vector. The default value is np.int32.

Returns

Vector of bits/counts. Return a compressed sparse matrix (scipy.sparse.csr_matrix) if compressed is True. Otherwise, return a NumPy array (numpy.ndarray).

Return type

numpy.ndarray or scipy.sparse.csr_matrix

Raises
  • BitsValueError – If some of the fingerprint indices are greater than the fingerprint length.

  • MemoryError – If the operation ran out of memory.
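
With compressed=True the result is a scipy.sparse.csr_matrix. For a matrix with a single row, the CSR layout reduces to three arrays: the nonzero values (data), their column positions (indices) and the row pointers (indptr = [0, nnz]), so the stored arrays grow with the number of “on” bits, not with fp_length. The pure-Python sketch below builds these arrays by hand; it assumes the fingerprint is laid out as a 1 × fp_length row, which this page does not state, and to_csr_row is hypothetical.

```python
def to_csr_row(counts, fp_length):
    """Build CSR arrays (data, indices, indptr) for a 1 x fp_length row."""
    columns = sorted(counts)
    if columns and (columns[0] < 0 or columns[-1] >= fp_length):
        raise ValueError("an index is outside the fingerprint length")
    data = [counts[c] for c in columns]
    indptr = [0, len(columns)]  # row 0 spans data[0:nnz]
    return data, columns, indptr


data, indices, indptr = to_csr_row({0: 1, 3: 2, 7: 1, 12: 1}, 32)
print(data, indices, indptr)  # [1, 2, 1, 1] [0, 3, 7, 12] [0, 4]
```

If SciPy is available, scipy.sparse.csr_matrix((data, indices, indptr), shape=(1, 32)) accepts these three arrays directly.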

unfold()

Unfold this fingerprint and return its parent fingerprint.

Return type

Fingerprint

property unfolded_fp

The unfolded version of this fingerprint. If None, this fingerprint may not have been folded yet.

Type

Fingerprint or None, read-only

property unfolded_indices

Indices of “on” bits in the unfolded fingerprint.

Type

array_like of int, read-only

property unfolding_map

The mapping between the current indices and the indices of the unfolded version of this fingerprint, which makes it possible to trace folded bits back to the original shells (features).

Type

dict, read-only
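
One way to picture the unfolding map is as a record, for each folded index, of the original indices that landed on it. The sketch below builds such a record for the Fingerprint.fold example (32 bits folded to 8), assuming folding is index % new_length; build_unfolding_map is hypothetical, and LUNA may store the mapping in a different shape.

```python
def build_unfolding_map(indices, new_length):
    """Map each folded index to the sorted original indices that fold onto it."""
    unfolding_map = {}
    for index in sorted(set(indices)):
        unfolding_map.setdefault(index % new_length, []).append(index)
    return dict(sorted(unfolding_map.items()))


unfolding_map = build_unfolding_map([0, 3, 7, 12, 15, 21, 27], 8)
print(unfolding_map)
# {0: [0], 3: [3, 27], 4: [12], 5: [21], 7: [7, 15]}

# Flattening the values recovers the original (unfolded) indices.
print(sorted(i for group in unfolding_map.values() for i in group))
# [0, 3, 7, 12, 15, 21, 27]
```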

union(other)

Return the union of indices of two fingerprints.

Return type

numpy.ndarray

Raises
  • InvalidFingerprintType – If the provided fingerprint is not an instance of Fingerprint.

  • BitsValueError – If the fingerprints have different lengths.
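
union, intersection and symmetric_difference correspond to Python's set operators on the “on” bit indices (the documented methods return numpy.ndarray objects, while this sketch returns sorted lists and skips the length check). With the two fingerprints from the calc_similarity example, the set sizes also recover that example's Tanimoto value: 5 shared bits over 8 bits in the union gives 0.625.

```python
# "On" bit indices of the two fingerprints from the calc_similarity example.
fp1 = {2, 4, 6, 7, 8, 14}      # "0010101110000010"
fp2 = {0, 2, 4, 7, 8, 11, 14}  # "1010100110010010"

union = sorted(fp1 | fp2)
intersection = sorted(fp1 & fp2)
sym_diff = sorted(fp1 ^ fp2)

print(union)         # [0, 2, 4, 6, 7, 8, 11, 14]
print(intersection)  # [2, 4, 7, 8, 14]
print(sym_diff)      # [0, 6, 11]
print(len(intersection) / len(union))  # 0.625
```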