luna.interaction.fp.shell module

class CompoundClassIds(value)

Bases: enum.Enum

An enumeration of compound classes.

HETATM = 1
NUCLEOTIDE = 3
RESIDUE = 2
UNKNOWN = 5
WATER = 4
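
The members above are listed by name, not by value. The sketch below is a runnable illustration that does not require LUNA. It re-creates a local enum with the same members and values (it is a copy, not an import from luna.interaction.fp.shell) and shows the two standard Enum lookups: by value, as in the CompoundClassIds(value) signature, and by name.

```python
from enum import Enum


class CompoundClassIds(Enum):
    """Local re-creation of the enum listed above (values copied verbatim)."""

    HETATM = 1
    RESIDUE = 2
    NUCLEOTIDE = 3
    WATER = 4
    UNKNOWN = 5


# Look up a member by its integer value, as in CompoundClassIds(value).
by_value = CompoundClassIds(3)
# Look up a member by its name.
by_name = CompoundClassIds["WATER"]

print(by_value.name, by_name.value)  # NUCLEOTIDE 4
# Iteration follows definition order, which here is ascending by value.
print([member.value for member in CompoundClassIds])  # [1, 2, 3, 4, 5]
```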
class Shell(central_atm_grp, level, radius, neighborhood=None, inter_tuples=None, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, manager=None, valid=True, feature_mapper=None)

Bases: object

A container that stores substructural information and forms the basis of LUNA fingerprints.

Shells are centered on an atom or atom group (AtomGroup objects) and explicitly represent all atoms and interactions within them.

Parameters
  • central_atm_grp (AtomGroup) – The shell center.

  • level (int) – The level (iteration) at which the shell was generated.

  • radius (float) – The shell radius.

  • neighborhood (iterable of AtomGroup) – All atoms and atom groups within a shell of radius radius centered on central_atm_grp.

  • inter_tuples (iterable of (InteractionType, AtomGroup)) – All interactions within a shell of radius radius centered on central_atm_grp. Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.

    Note

    Because an interaction involves two participants, one would expect each interaction to produce two tuples. However, by default, ShellGenerator sorts atom groups and keeps only the first tuple that appears, which guarantees that only one of the two possible tuples is added and avoids duplicating information.

  • diff_comp_classes (bool) – If True (the default), include differentiation between compound classes. That means structural information originating from AtomGroup objects belonging to residues, nucleotides, ligands, or water molecules will be considered different even if the structural information itself is the same. This is useful, for example, to differentiate protein-ligand interactions from residue-residue ones.

  • dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.

  • seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.

  • manager (ShellManager) – The ShellManager object that stores and controls this Shell object.

  • valid (bool) – Whether the shell is valid. By default, all shells are considered valid.

  • feature_mapper (dict, optional) – A dict that maps atoms and interactions to unique values. If not provided, feature_mapper will inherit from the default mappings CHEMICAL_FEATURE_IDS and INTERACTION_IDS.
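
The deduplication described in the inter_tuples note can be sketched as follows. This is a minimal illustration under stated assumptions, not LUNA's implementation: atom groups are hypothetical strings sorted by name, and each interaction is a hypothetical (interaction_type, group_a, group_b) tuple. Visiting groups in sorted order and skipping interactions already seen yields exactly one tuple per interaction.

```python
def build_inter_tuples(neighborhood, interactions):
    """Return one (interaction_type, atom_group) tuple per interaction.

    Hypothetical stand-ins: atom groups are strings and each interaction is an
    (interaction_type, group_a, group_b) tuple.
    """
    seen = set()
    inter_tuples = []
    # Visit atom groups in a deterministic (sorted) order.
    for grp in sorted(neighborhood):
        for inter in interactions:
            itype, grp_a, grp_b = inter
            if grp not in (grp_a, grp_b) or inter in seen:
                continue
            # Keep only the first tuple produced for this interaction.
            seen.add(inter)
            inter_tuples.append((itype, grp))
    return inter_tuples


interactions = [
    ("Hydrogen bond", "GLN85", "LIG"),
    ("Hydrophobic", "LEU83", "LIG"),
]
print(build_inter_tuples({"GLN85", "LEU83", "LIG"}, interactions))
# [('Hydrogen bond', 'GLN85'), ('Hydrophobic', 'LEU83')]
```

Without the seen check, LIG would contribute a second tuple for each of the two interactions.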

Variables
  • ~Shell.central_atm_grp (AtomGroup) –

  • ~Shell.level (int) –

  • ~Shell.radius (float) –

  • ~Shell.diff_comp_classes (bool) –

  • ~Shell.dtype (data-type) –

  • ~Shell.seed (int) –

  • ~Shell.valid (bool) –

  • ~Shell.feature_mapper (dict) –

property encoded_data

The data encoded in this shell.

Type

iterable of tuple, read-only

hash_shell()

Hash this shell’s substructural information into a 32-bit integer using MurmurHash3.

Returns

A 32-bit integer representing this shell’s substructural information.

Return type

int
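
To make the hashing step concrete, below is a pure-Python sketch of MurmurHash3 (x86, 32-bit variant) together with a hypothetical serializer, hash_encoded_data, that packs every integer of the encoded data as a little-endian int64 before hashing. The serialization scheme is an assumption made for illustration; this documentation does not specify how LUNA converts encoded data to bytes, so these values will not match LUNA identifiers.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 of `data`, returned as an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 0-3 bytes.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization (fmix32) to avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hash_encoded_data(encoded, seed=0):
    """Hypothetical: flatten tuples of ints, pack as int64, and hash."""
    flat = [value for tup in encoded for value in tup]
    return murmur3_32(struct.pack(f"<{len(flat)}q", *flat), seed)


print(hash_encoded_data([(1, 2), (3, 4)], seed=0))
```

The seed plays the same role as the seed parameter of Shell and ShellGenerator: the same data hashed with a different seed generally yields a different identifier.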

property identifier

This shell’s identifier, generated by hashing its encoded data with a hash function. By default, LUNA uses MurmurHash3 as the hash function.

Type

int, read-only

property inter_tuples

Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.

Type

iterable of tuple, read-only

property interactions

All interactions within this shell.

Type

iterable of InteractionType, read-only

is_similar(shell)

Check whether this shell is similar to shell.

Two shells are similar if they represent the same substructural information.

Parameters

shell (Shell)

Return type

bool

is_valid()

Check whether the shell is valid.

Return type

bool

property manager

The ShellManager object that stores and controls this Shell object.

Type

ShellManager, read-only

property neighborhood

All atoms and atom groups within this shell.

Type

iterable of AtomGroup, read-only

property previous_shell

The previous shell, i.e., a shell centered on the same central AtomGroup object at a previous level. For example, if this shell is at level 5, this property returns the level-4 shell having the same center.

Type

Shell, read-only

class ShellGenerator(num_levels, radius_step, fp_length=4294967296, ifp_type=IFPType.EIFP, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, bucket_size=10)

Bases: object

Generate shells, the building blocks of LUNA fingerprints.

Parameters
  • num_levels (int) – The maximum number of iterations for fingerprint generation.

  • radius_step (float) – The multiplier used to increase the shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.

  • fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).

  • ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP). The default value is EIFP.

  • diff_comp_classes (bool) – If True (the default), include differentiation between compound classes. That means structural information originating from AtomGroup objects belonging to residues, nucleotides, ligands, or water molecules will be considered different even if the structural information itself is the same. This is useful, for example, to differentiate protein-ligand interactions from residue-residue ones.

  • dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.

  • seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.

  • bucket_size (int) – The bucket size of the KD tree. Tuning this value may improve speed. The default value is 10.
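
The relationship between levels, radius_step, and shell neighborhoods can be sketched as follows, assuming levels are 0-indexed (so num_levels=2 yields levels 0 and 1). The generator itself uses a KD tree; the sketch replaces it with a brute-force distance check over hypothetical point coordinates. Whether the radius boundary is inclusive in LUNA is also an assumption here.

```python
import math


def shell_radii(num_levels, radius_step):
    """Radius of the shell built at each level (levels are 0-indexed)."""
    return [level * radius_step for level in range(num_levels)]


def neighborhood(center, coords, radius):
    """Names of points within `radius` of `center` (brute-force search).

    Hypothetical stand-in for the KD-tree query; `coords` maps a name to
    (x, y, z) coordinates, and the boundary is treated as inclusive.
    """
    return sorted(name for name, xyz in coords.items()
                  if math.dist(center, xyz) <= radius)


coords = {"A": (0.0, 0.0, 0.0), "B": (2.0, 0.0, 0.0), "C": (5.0, 0.0, 0.0)}
for level, radius in enumerate(shell_radii(num_levels=2, radius_step=3)):
    print(level, radius, neighborhood(coords["A"], coords, radius))
# 0 0 ['A']
# 1 3 ['A', 'B']
```

At level 0 the radius is 0, so the shell contains only its center. Each subsequent level widens the shell by radius_step.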

Variables
  • ~ShellGenerator.num_levels (int) –

  • ~ShellGenerator.radius_step (float) –

  • ~ShellGenerator.fp_length (int) –

  • ~ShellGenerator.ifp_type (IFPType) –

  • ~ShellGenerator.diff_comp_classes (bool) –

  • ~ShellGenerator.dtype (data-type) –

  • ~ShellGenerator.seed (int) –

  • ~ShellGenerator.bucket_size (int) –

Examples

In the example below, we assume that a LUNA project object named proj_obj already exists.

First, let’s define a ShellGenerator object that will create shells over 2 iterations (levels). At each iteration, the shell radius will be increased by 3, and substructural information will be encoded following the EIFP definition.

>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)

After defining the generator, we can create shells by calling create_shells(), which expects an AtomGroupsManager object. In this example, we will use the first AtomGroupsManager object from an existing LUNA project (proj_obj).

>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> sm = sg.create_shells(atm_grps_mngr)
>>> print(sm.num_shells)
528

Now, with shells stored in the ShellManager object you can, for instance:

  • Generate fingerprints:

    >>> fp = sm.to_fingerprint(fold_to_length=1024)
    >>> print(fp.indices)
    [   2   19   22   23   34   37   39   45   54   67   71   75   83   84
       93  109  138  140  157  162  181  182  186  187  191  194  206  209
      211  237  246  251  263  271  281  296  304  315  323  358  370  374
      388  392  399  400  419  439  476  481  487  509  519  527  532  578
      587  592  604  605  629  635  645  661  668  698  711  713  732  736
      740  753  764  795  813  815  820  824  825  831  836  855  873  882
      911  926  967  975  976  984  990  996 1020]
    
  • Visualize substructural information in PyMOL:

    >>> from luna.interaction.fp.view import ShellViewer
    >>> shell_tuples = [(atm_grps_mngr.entry, sm.unique_shells, proj_obj.pdb_path)]
    >>> sv = ShellViewer()
    >>> sv.new_session(shell_tuples, "example.pse")
    
create_shells(atm_grps_mngr)

Perceive substructural information from AtomGroup objects and their interactions, and represent such information as shells.

Parameters

atm_grps_mngr (AtomGroupsManager) – Container of AtomGroup objects and their interactions.

Return type

ShellManager

Raises

ShellCenterNotFound – If it fails to recover a shell having a given center.

class ShellManager(num_levels, radius_step, fp_length, ifp_type, shells=None, verbose=False)

Bases: object

Store and manage Shell objects.

Parameters
  • num_levels (int) – The maximum number of iterations for fingerprint generation.

  • radius_step (float) – The multiplier used to increase the shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.

  • fp_length (int) – The fingerprint length (total number of bits).

  • ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).

  • shells (iterable of Shell, optional) – An initial sequence of Shell objects (fingerprint features).

  • verbose (bool) – If True, warnings issued during the usage of this ShellManager will be displayed. The default value is False.

Variables
  • ~ShellManager.num_levels (int) – The maximum number of iterations for fingerprint generation.

  • ~ShellManager.radius_step (float) – The multiplier used to increase shell size at each iteration.

  • ~ShellManager.fp_length (int) – The fingerprint length (total number of bits).

  • ~ShellManager.ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).

  • ~ShellManager.shells (iterable of Shell) – The sequence of shells (fingerprint features).

  • ~ShellManager.verbose (bool) – The verbosity state.

  • ~ShellManager.version (str) – The LUNA version with which shells were generated.

  • ~ShellManager.levels (dict of {int: list of Shell}) –

    Register shells by level, where keys are levels and values are lists of Shell objects.

    Note

    Levels are 0-indexed. So, the first level is 0, second is 1, etc. That means if num_levels is 5, the last level will be 4.

  • ~ShellManager.centers (dict of {AtomGroup: dict of {int: Shell}}) – Register shells by center, where keys are AtomGroup objects and values are dicts that store all shells generated for that center at each iteration (level).
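
The three registries above (shells, levels, and centers) can be pictured with the minimal sketch below. ToyShell and ToyShellManager are hypothetical stand-ins written for illustration, not LUNA classes. They show how one add_shell call can keep all three views in sync, and how get_identifiers can filter by level and validity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyShell:
    """Hypothetical stand-in for Shell with only the fields used here."""

    center: str
    level: int
    identifier: int
    valid: bool = True


class ToyShellManager:
    """Hypothetical sketch of the shells/levels/centers registries."""

    def __init__(self):
        self.shells = []
        self.levels = defaultdict(list)   # {level: [shell, ...]}
        self.centers = defaultdict(dict)  # {center: {level: shell}}

    def add_shell(self, shell):
        # Register the shell in all three views at once.
        self.shells.append(shell)
        self.levels[shell.level].append(shell)
        self.centers[shell.center][shell.level] = shell

    def get_identifiers(self, level=None, unique_shells=False):
        # Restrict to one level if requested; optionally drop invalid shells.
        pool = self.shells if level is None else self.levels.get(level, [])
        return [s.identifier for s in pool if s.valid or not unique_shells]


sm = ToyShellManager()
sm.add_shell(ToyShell("GLN85", 0, 101))
sm.add_shell(ToyShell("LIG", 0, 202))
sm.add_shell(ToyShell("GLN85", 1, 101, valid=False))
print(sm.get_identifiers())                    # [101, 202, 101]
print(sm.get_identifiers(unique_shells=True))  # [101, 202]
print(sorted(sm.centers["GLN85"]))             # [0, 1]
```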

add_shell(shell)

Add a new shell to shells.

Parameters

shell (Shell)

find_similar_shell(shell)

Find a shell in shells similar to shell.

Two shells are similar if they represent the same substructural information.

Parameters

shell (Shell)

Returns

A similar shell, or None if none is found.

Return type

Shell or None

get_identifiers(level=None, unique_shells=False)

Get the identifiers of all shells.

Parameters
  • level (int, optional) – If provided, only return identifiers of shells at level level.

  • unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.

Return type

list of int

get_last_shell(center, unique_shells=False)

Get the last shell generated for center center.

Parameters
  • center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.

  • unique_shells (bool) – If True, ignore non-valid shells. That means shells generated at higher levels may be skipped if they are not valid. The default value is False.

Returns

The last shell generated for center center or None if no valid shell was found.

Return type

Shell or None

get_previous_shell(center, curr_level, unique_shells=False)

Get the last shell having center center that was generated before level curr_level. For instance, if the current level (iteration) is 5 and the last valid shell generated for center \(C\) was at level 4, then get_previous_shell() would return that shell at level 4.

Parameters
  • center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.

  • curr_level (int) – The current level (iteration).

  • unique_shells (bool) – If True, ignore non-valid shells and descend to lower levels until a valid shell is found. If level 0 is reached and no valid shell is found, return None. The default value is False.

Returns

The first previous valid shell or None if no valid shell was found.

Return type

Shell or None
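
The level-walking behavior of get_last_shell() and get_previous_shell() can be sketched over a plain {center: {level: shell}} dict, mirroring the centers variable described for ShellManager. The functions below are hypothetical re-implementations for illustration, and the namedtuple stands in for Shell. Whether LUNA walks the levels exactly this way is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for Shell.
ToyShell = namedtuple("ToyShell", ["level", "valid"])


def get_last_shell(centers, center, unique_shells=False):
    """Shell from the highest level for `center`, optionally skipping invalid ones."""
    by_level = centers.get(center, {})
    for level in sorted(by_level, reverse=True):
        shell = by_level[level]
        if shell.valid or not unique_shells:
            return shell
    return None


def get_previous_shell(centers, center, curr_level, unique_shells=False):
    """Last shell for `center` generated strictly before `curr_level`."""
    by_level = centers.get(center, {})
    # Walk down from curr_level - 1 to level 0.
    for level in range(curr_level - 1, -1, -1):
        shell = by_level.get(level)
        if shell is not None and (shell.valid or not unique_shells):
            return shell
    return None


centers = {"C": {0: ToyShell(0, True), 1: ToyShell(1, False), 2: ToyShell(2, False)}}
print(get_previous_shell(centers, "C", 2).level)                      # 1
print(get_previous_shell(centers, "C", 2, unique_shells=True).level)  # 0
print(get_last_shell(centers, "C", unique_shells=True).level)         # 0
```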

get_shell_by_center_and_level(center, level, unique_shells=False)

Get the shell generated for center center (AtomGroup object) at level (iteration) level.

Parameters
  • center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.

  • level (int) – The target level (iteration).

  • unique_shells (bool) – If True, return the Shell object if it is unique and None otherwise. The default value is False.

Returns

The shell generated for center center at level level. If the Shell object is not unique, return None.

Return type

Shell or None

get_shells_by_center(center, unique_shells=False)

Get shells by center (AtomGroup object).

Parameters
  • center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.

  • unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells generated for center center (the default).

Returns

A dict mapping each level (iteration) to the shell generated for center center at that level.

Return type

dict of {int: Shell}

get_shells_by_identifier(identifier, unique_shells=False)

Get shells by identifier.

Parameters
  • identifier (int) – The shell identifier.

  • unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells having the identifier identifier (the default).

Return type

list of Shell

get_shells_by_level(level, unique_shells=False)

Get shells by level (iteration number).

Parameters
  • level (int) – The target level (iteration).

  • unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells at level level (the default).

Return type

list of Shell

get_valid_shells()

Return only valid shells.

A shell is considered invalid if, by the time it is added to shells, another shell already represents the same substructural information. That means the shell is not unique and does not contribute any new information.

Conversely, if the shell adds new information to shells, it is considered valid and unique. Therefore, the first shell in a series of shells containing the same information is considered valid, and the others are considered invalid.

Return type

list of Shell
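
The first-occurrence rule above can be sketched in a few lines. As a simplifying assumption, each shell is reduced to a hypothetical hashable tuple standing in for its substructural information. LUNA compares Shell objects instead, as in is_similar(). Only the first shell carrying a given piece of information is flagged as valid.

```python
def flag_valid(encoded_shells):
    """Flag each shell as valid only if its information has not been seen before.

    `encoded_shells` holds hashable stand-ins for each shell's substructural
    information (hypothetical; not LUNA's representation).
    """
    seen = set()
    flags = []
    for data in encoded_shells:
        flags.append(data not in seen)  # first occurrence -> valid
        seen.add(data)
    return flags


shells = [("amide", "HB"), ("phenyl",), ("amide", "HB")]
print(flag_valid(shells))  # [True, True, False]
```

Because validity depends on insertion order, the same set of shells added in a different order can flag a different member of a duplicate series as the valid one.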

property num_shells

Total number of shells in shells.

Type

int, read-only

property num_unique_shells

Total number of unique shells in shells.

Type

int, read-only

to_fingerprint(fold_to_length=None, count_fp=False, unique_shells=False)

Encode shells into an interaction fingerprint.

Parameters
  • fold_to_length (int, optional) – If provided, fold the fingerprint to length fold_to_length.

  • count_fp (bool) – If True, create a count fingerprint (CountFingerprint). Otherwise, return a bit fingerprint (Fingerprint). The default value is False.

  • unique_shells (bool) – If True, only unique shells are used to create the fingerprint. The default value is False.

Return type

CountFingerprint or Fingerprint
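
The difference between bit and count fingerprints, and the effect of fold_to_length, can be illustrated with plain Python containers. Folding by modulo is an assumption made for this sketch, and the list and Counter outputs are hypothetical stand-ins for Fingerprint and CountFingerprint. Under this assumption, identifier 494318626 lands on bit 34 (494318626 = 482733 × 1024 + 34), which is consistent with the trace_back_feature() example in this module.

```python
from collections import Counter


def to_fingerprint(identifiers, fold_to_length=None, count_fp=False):
    """Toy fingerprint from shell identifiers (modulo folding assumed).

    A bit fingerprint is returned as the sorted "on" indices; a count
    fingerprint as a Counter mapping each index to its occurrences.
    """
    if fold_to_length is not None:
        identifiers = [i % fold_to_length for i in identifiers]
    if count_fp:
        return Counter(identifiers)
    return sorted(set(identifiers))


ids = [494318626, 5, 494318626]
print(to_fingerprint(ids, fold_to_length=1024))                 # [5, 34]
print(to_fingerprint(ids, fold_to_length=1024, count_fp=True))  # Counter({34: 2, 5: 1})
```

A bit fingerprint only records whether a feature is present. A count fingerprint also keeps how many shells produced it.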

trace_back_feature(feature_id, ifp, unique_shells=False)

Trace a feature from a fingerprint back to the shells that originated that feature.

Note

Due to fingerprint folding, multiple substructures may end up encoded in the same bit, the so-called collision problem. So, if the provided feature contains collisions, shells representing different substructures may be returned by trace_back_feature().
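
The collision problem described in this note can be reproduced with a toy inverse lookup. The sketch assumes modulo folding and uses plain integer identifiers, not Shell objects. The hypothetical trace_back function collects every identifier that folds onto a given bit, so two unrelated identifiers can map back to the same feature.

```python
def trace_back(feature_id, identifiers, fold_to_length):
    """Identifiers that fold onto `feature_id` (modulo folding assumed)."""
    return sorted({i for i in identifiers if i % fold_to_length == feature_id})


identifiers = [494318626, 1058, 7]
# Both 494318626 and 1058 fold onto bit 34 when the length is 1024: a collision.
print(trace_back(34, identifiers, fold_to_length=1024))  # [1058, 494318626]
```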

Parameters
  • feature_id (int) – The target feature id.

  • ifp (Fingerprint) – The fingerprint containing the feature feature_id.

  • unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.

Yields

Shell

Examples

In the example below, we assume that a LUNA project object named proj_obj already exists. Then, we generate an EIFP fingerprint for the first AtomGroupsManager object in proj_obj.

>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)
>>> sm = sg.create_shells(atm_grps_mngr)
>>> fp = sm.to_fingerprint(fold_to_length=1024, count_fp=True)
>>> print(fp.indices)
[   2   19   22   23   34   37   39   45   54   67   71   75   83   84
   93  109  138  140  157  162  181  182  186  187  191  194  206  209
  211  237  246  251  263  271  281  296  304  315  323  358  370  374
  388  392  399  400  419  439  476  481  487  509  519  527  532  578
  587  592  604  605  629  635  645  661  668  698  711  713  732  736
  740  753  764  795  813  815  820  824  825  831  836  855  873  882
  911  926  967  975  976  984  990  996 1020]

Now, we can trace features back to their original identifiers and investigate their substructural information.

>>> ori_indices = list(sm.trace_back_feature(34, fp, unique_shells=True))
>>> print(ori_indices)
[(494318626, [<Shell: level=0, radius=0.000000, center=<AtomGroup: [<ExtendedAtom: 3QQK/0/A/GLN/85/CD>, <ExtendedAtom: 3QQK/0/A/GLN/85/NE2>, <ExtendedAtom: 3QQK/0/A/GLN/85/OE1>]>, interactions=0>])]

property unique_shells

Unique (valid) shells. Equivalent to calling get_valid_shells().

Type

iterable of Shell, read-only