luna.interaction.fp.shell module
- class CompoundClassIds(value)
Bases: enum.Enum
An enumeration of compound classes.
- HETATM = 1
- NUCLEOTIDE = 3
- RESIDUE = 2
- UNKNOWN = 5
- WATER = 4
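As a rough illustration of how such an enumeration can keep otherwise identical data apart, the sketch below mirrors the member values listed above in a stand-in Enum. The helper tag_with_class is hypothetical and is not part of LUNA; how the library actually combines class ids with shell data is not shown on this page.

```python
from enum import Enum


class CompoundClassIds(Enum):
    """Stand-in that mirrors the member values listed above."""
    HETATM = 1
    RESIDUE = 2
    NUCLEOTIDE = 3
    WATER = 4
    UNKNOWN = 5


def tag_with_class(feature_ids, comp_class, diff_comp_classes=True):
    """Hypothetical helper: optionally append the compound class id to the data."""
    data = tuple(sorted(feature_ids))
    if diff_comp_classes:
        return data + (comp_class.value,)
    return data


features = [7, 3]
ligand = tag_with_class(features, CompoundClassIds.HETATM)
residue = tag_with_class(features, CompoundClassIds.RESIDUE)
print(ligand, residue)            # the two encodings differ in their last element
print(CompoundClassIds(4).name)   # look up a member by value
```

With diff_comp_classes=False the helper returns the same tuple for both classes, which is the behaviour the flag is meant to switch between.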
- class Shell(central_atm_grp, level, radius, neighborhood=None, inter_tuples=None, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, manager=None, valid=True, feature_mapper=None)
Bases: object
A container that stores substructural information, which is the basis of LUNA fingerprints.
Shells are centered on an atom or atom group (AtomGroup objects) and explicitly represent all atoms and interactions within them.
- Parameters
central_atm_grp (AtomGroup) – The shell center.
level (int) – The level (iteration) at which the shell was generated.
radius (float) – The shell radius.
neighborhood (iterable of AtomGroup) – All atoms and atom groups within a shell of radius radius centered on central_atm_grp.
inter_tuples (iterable of (InteractionType, AtomGroup)) – All interactions within a shell of radius radius centered on central_atm_grp. Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.
Note
Because an interaction involves two participants, each interaction would be expected to produce two tuples. However, by default, ShellGenerator sorts the atom groups and keeps only the first tuple that appears, so only one of the two possible tuples is added and no information is duplicated.
diff_comp_classes (bool) – If True (the default), differentiate between compound classes. That is, structural information originating from AtomGroup objects that belong to residues, nucleotides, ligands, or water molecules is considered different even if the information itself is identical. This is useful, for example, to distinguish protein-ligand interactions from residue-residue ones.
dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.
seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.
manager (ShellManager) – The ShellManager object that stores and controls this Shell object.
valid (bool) – Whether the shell is valid. By default, all shells are considered valid.
feature_mapper (dict, optional) – A dict that maps atoms and interactions to unique values. If not provided, feature_mapper inherits from the default mappings CHEMICAL_FEATURE_IDS and INTERACTION_IDS.
- Variables
- property encoded_data
The data encoded in this shell.
- Type
iterable of tuple, read-only
- hash_shell()
Hash this shell’s substructural information into a 32-bit integer using MurmurHash3.
- Returns
A 32-bit integer representing this shell’s substructural information.
- Return type
int
- property identifier
This shell’s identifier, which is generated by hashing its encoded data with a hash function. By default, LUNA uses MurmurHash3 as the hash function.
- Type
int, read-only
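The idea of turning encoded data into a seeded 32-bit identifier can be sketched with the standard library alone. This is not LUNA’s implementation: CRC32 is used here purely as a stand-in for MurmurHash3 so that the example runs without third-party packages, and hash_encoded_data is a hypothetical name. The byte layout (little-endian signed 64-bit integers) is also an assumption, loosely modeled on the np.int64 default.

```python
import struct
import zlib


def hash_encoded_data(encoded_data, seed=0):
    """Hash an iterable of integer tuples into an unsigned 32-bit integer.

    CRC32 is only a standard-library stand-in for MurmurHash3.
    """
    flat = [value for item in encoded_data for value in item]
    # Pack every value as a little-endian signed 64-bit integer.
    payload = struct.pack("<%dq" % len(flat), *flat)
    # The seed is passed as the CRC starting value; mask to 32 bits.
    return zlib.crc32(payload, seed) & 0xFFFFFFFF


data = [(1, 5), (2, 7)]
identifier = hash_encoded_data(data)
print(identifier)
print(hash_encoded_data(data, seed=1))  # a different seed gives a different value
```

The same data and seed always yield the same identifier, which is what allows identifiers to be compared across shells.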
- property inter_tuples
Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.
- Type
iterable of tuple, read-only
- property interactions
All interactions within this shell.
- Type
iterable of InteractionType, read-only
- is_similar(shell)
Check whether this shell is similar to shell.
Two shells are similar if they represent the same substructural information.
- property manager
The ShellManager object that stores and controls this Shell object.
- Type
ShellManager, read-only
- class ShellGenerator(num_levels, radius_step, fp_length=4294967296, ifp_type=IFPType.EIFP, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, bucket_size=10)
Bases: object
Generate shells, the base information of LUNA fingerprints.
- Parameters
num_levels (int) – The maximum number of iterations for fingerprint generation.
radius_step (float) – The multiplier used to increase shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP). The default value is EIFP.
diff_comp_classes (bool) – If True (the default), differentiate between compound classes. That is, structural information originating from AtomGroup objects that belong to residues, nucleotides, ligands, or water molecules is considered different even if the information itself is identical. This is useful, for example, to distinguish protein-ligand interactions from residue-residue ones.
dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.
seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.
bucket_size (int) – Bucket size of the KD tree. Tuning this value may improve speed. The default value is 10.
- Variables
Examples
In the example below, we assume a LUNA project object named proj_obj already exists.
First, let’s define a ShellGenerator object that will create shells over 2 iterations (levels). At each iteration, the shell radius will be increased by 3 and substructural information will be encoded following the EIFP definition.
>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)
After defining the generator, we can create shells by calling create_shells(), which expects an AtomGroupsManager object. In this example, we will use the first AtomGroupsManager object from an existing LUNA project (proj_obj).
>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> sm = sg.create_shells(atm_grps_mngr)
>>> print(sm.num_shells)
528
Now, with the shells stored in the ShellManager object, you can, for instance:
Generate fingerprints:
>>> fp = sm.to_fingerprint(fold_to_length=1024)
>>> print(fp.indices)
[ 2 19 22 23 34 37 39 45 54 67 71 75 83 84 93 109 138 140 157 162 181 182 186 187 191 194 206 209 211 237 246 251 263 271 281 296 304 315 323 358 370 374 388 392 399 400 419 439 476 481 487 509 519 527 532 578 587 592 604 605 629 635 645 661 668 698 711 713 732 736 740 753 764 795 813 815 820 824 825 831 836 855 873 882 911 926 967 975 976 984 990 996 1020]
Visualize substructural information in PyMOL:
>>> from luna.interaction.fp.view import ShellViewer
>>> shell_tuples = [(atm_grps_mngr.entry, sm.unique_shells, proj_obj.pdb_path)]
>>> sv = ShellViewer()
>>> sv.new_session(shell_tuples, "example.pse")
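The relationship between num_levels, radius_step, and the shell radius described in the parameters can be written out in a few lines. The function shell_radii is a hypothetical helper for illustration only and does not exist in LUNA; it simply applies the stated rule that the radius at a given level is that level multiplied by radius_step.

```python
def shell_radii(num_levels, radius_step):
    """Return the shell radius used at each level (levels are 0-indexed)."""
    return [level * radius_step for level in range(num_levels)]


# Same settings as in the example: 2 levels, radius step of 3.
radii = shell_radii(num_levels=2, radius_step=3)
print(radii)
```

With these settings, the first level has radius 0 and therefore covers only the central atom group itself, while the second level extends to a radius of 3.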
- create_shells(atm_grps_mngr)
Perceive substructural information from AtomGroup objects and their interactions, and represent that information as shells.
- Parameters
atm_grps_mngr (AtomGroupsManager) – Container of AtomGroup objects and their interactions.
- Return type
ShellManager
- Raises
ShellCenterNotFound – If it fails to recover a shell having a given center.
- class ShellManager(num_levels, radius_step, fp_length, ifp_type, shells=None, verbose=False)
Bases: object
Store and manage Shell objects.
- Parameters
num_levels (int) – The maximum number of iterations for fingerprint generation.
radius_step (float) – The multiplier used to increase shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.
fp_length (int) – The fingerprint length (total number of bits).
ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).
shells (iterable of Shell, optional) – An initial sequence of Shell objects (fingerprint features).
verbose (bool) – If True, warnings issued while using this ShellManager will be displayed. The default value is False.
- Variables
num_levels (int) – The maximum number of iterations for fingerprint generation.
radius_step (float) – The multiplier used to increase shell size at each iteration.
fp_length (int) – The fingerprint length (total number of bits).
ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).
shells (iterable of Shell) – The sequence of shells (fingerprint features).
verbose (bool) – The verbosity state.
version (str) – The LUNA version with which the shells were generated.
levels (dict of {int: list of Shell}) – Register shells by level, where keys are levels and values are lists of Shell objects.
Note
Levels are 0-indexed: the first level is 0, the second is 1, and so on. That means if num_levels is 5, the last level will be 4.
centers (dict of dict of {int: Shell}) – Register shells by center, where keys are AtomGroup objects and values are dicts that store all shells generated for that center at each iteration (level).
- find_similar_shell(shell)
Find a shell in shells similar to shell.
Two shells are similar if they represent the same substructural information.
- get_identifiers(level=None, unique_shells=False)
Get the identifiers of all shells.
- Parameters
level (int, optional) – If provided, only return identifiers of shells at level level.
unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.
- Return type
list of int
- get_last_shell(center, unique_shells=False)
Get the last shell generated for center center.
- Parameters
- Returns
The last shell generated for center center, or None if no valid shell was found.
- Return type
Shell or None
- get_previous_shell(center, curr_level, unique_shells=False)
Get the last shell having center center that was generated before level curr_level. For instance, if the current level (iteration) is 5 and the last valid shell generated for center \(C\) was at level 4, then get_previous_shell() would return that shell at level 4.
- Parameters
center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.
curr_level (int) – The current level (iteration).
unique_shells (bool) – If True, ignore non-valid shells and go down to lower levels until a valid shell is found. If level 0 is reached and no valid shell was found, return None. The default value is False.
- Returns
The first previous valid shell or None if no valid shell was found.
- Return type
Shell or None
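The descending search described for get_previous_shell() can be sketched as a loop over lower levels. This is an illustrative reimplementation under stated assumptions, not LUNA’s code: ToyShell is a hypothetical stand-in for Shell, centers is a plain dict of dicts, and a missing level is simply skipped.

```python
from collections import namedtuple

# Toy stand-in for a Shell with just the attributes the search needs.
ToyShell = namedtuple("ToyShell", ["center", "level", "valid"])


def get_previous_shell(centers, center, curr_level, unique_shells=False):
    """Return the closest shell below curr_level for a center, or None."""
    shells_by_level = centers.get(center, {})
    for level in range(curr_level - 1, -1, -1):  # walk down to level 0
        shell = shells_by_level.get(level)
        if shell is None:
            continue
        if unique_shells and not shell.valid:
            continue  # skip non-valid shells and keep descending
        return shell
    return None


centers = {"C": {0: ToyShell("C", 0, True), 1: ToyShell("C", 1, False)}}
any_shell = get_previous_shell(centers, "C", curr_level=2)
valid_shell = get_previous_shell(centers, "C", curr_level=2, unique_shells=True)
print(any_shell.level, valid_shell.level)
```

In this toy data the shell at level 1 is marked non-valid, so the search with unique_shells=True falls through to level 0.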
- get_shell_by_center_and_level(center, level, unique_shells=False)
Get the shell generated for center center (AtomGroup object) at level (iteration) level.
- get_shells_by_center(center, unique_shells=False)
Get shells by center (AtomGroup object).
- Parameters
- Returns
All shells generated for center center at each iteration (key).
- Return type
dict of {int: Shell}
- get_shells_by_identifier(identifier, unique_shells=False)
Get shells by identifier.
- Parameters
identifier (int) – The shell identifier.
unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells having the identifier identifier (the default).
- Return type
list of Shell
- get_shells_by_level(level, unique_shells=False)
Get shells by level (iteration number).
- Parameters
level (int) – The level (iteration number).
unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells at level level (the default).
- Return type
list of Shell
- get_valid_shells()
Return only valid shells.
A shell is considered invalid if, by the time it is added to shells, there is already another shell representing the same substructural information. That means the shell is not unique and does not contribute any new information.
On the other hand, if the shell adds new information to shells, it is considered valid and unique. So, the first shell in a series of shells containing the same information is considered valid and the others invalid.
- Return type
list of Shell
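The rule that only the first shell in a series of equivalent shells is valid amounts to first-occurrence deduplication. The sketch below assumes, for simplicity, that two shells carry the same substructural information exactly when their identifiers are equal; the real similarity check may compare more than that, and mark_valid is a hypothetical helper.

```python
def mark_valid(identifiers_in_insertion_order):
    """Flag each shell as valid only the first time its identifier appears."""
    seen = set()
    flags = []
    for identifier in identifiers_in_insertion_order:
        flags.append(identifier not in seen)
        seen.add(identifier)
    return flags


# Identifiers 10 and 20 each appear twice; only their first occurrence is valid.
flags = mark_valid([10, 20, 10, 30, 20])
print(flags)
```

Because validity depends on insertion order, the same set of shells added in a different order would flag different individual shells as valid, although the number of valid shells stays the same.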
- to_fingerprint(fold_to_length=None, count_fp=False, unique_shells=False)
Encode shells into an interaction fingerprint.
- Parameters
fold_to_length (int, optional) – If provided, fold the fingerprint to length fold_to_length.
count_fp (bool) – If True, create a count fingerprint (CountFingerprint). Otherwise, return a bit fingerprint (Fingerprint).
unique_shells (bool) – If True, only unique shells are used to create the fingerprint. The default value is False.
- Return type
- trace_back_feature(feature_id, ifp, unique_shells=False)
Trace a feature from a fingerprint back to the shells that originated that feature.
Note
Due to fingerprint folding, multiple substructures may end up encoded in the same bit, the so-called collision problem. So, if the provided feature contains collisions, trace_back_feature() may return shells representing different substructures.
- Parameters
feature_id (int) – The target feature id.
ifp (Fingerprint) – The fingerprint containing the feature feature_id.
unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.
- Yields
Examples
In the example below, we assume a LUNA project object named proj_obj already exists. Then, we generate an EIFP fingerprint for the first AtomGroupsManager object in proj_obj.
>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)
>>> sm = sg.create_shells(atm_grps_mngr)
>>> fp = sm.to_fingerprint(fold_to_length=1024, count_fp=True)
>>> print(fp.indices)
[ 2 19 22 23 34 37 39 45 54 67 71 75 83 84 93 109 138 140 157 162 181 182 186 187 191 194 206 209 211 237 246 251 263 271 281 296 304 315 323 358 370 374 388 392 399 400 419 439 476 481 487 509 519 527 532 578 587 592 604 605 629 635 645 661 668 698 711 713 732 736 740 753 764 795 813 815 820 824 825 831 836 855 873 882 911 926 967 975 976 984 990 996 1020]
Now, we can trace features back to their original identifiers and investigate their substructural information.
>>> ori_indices = list(sm.trace_back_feature(34, fp, unique_shells=True))
>>> print(ori_indices)
[(494318626, [<Shell: level=0, radius=0.000000, center=<AtomGroup: [<ExtendedAtom: 3QQK/0/A/GLN/85/CD>, <ExtendedAtom: 3QQK/0/A/GLN/85/NE2>, <ExtendedAtom: 3QQK/0/A/GLN/85/OE1>]>, interactions=0>])]
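To make folding, collisions, and tracing back more concrete, here is a small standalone sketch. It assumes that folding is done by taking the identifier modulo the folded length; this page does not state the folding rule, but the assumption is consistent with the example above, where identifier 494318626 corresponds to feature 34 at length 1024. All function names are hypothetical, and the identifiers other than 494318626 are made up.

```python
from collections import Counter


def fold(identifier, fold_to_length):
    """Map an identifier to a folded position (modulo folding is an assumption)."""
    return identifier % fold_to_length


def bit_indices(identifiers, fold_to_length):
    """Bit fingerprint: sorted positions that are set at least once."""
    return sorted({fold(i, fold_to_length) for i in identifiers})


def count_indices(identifiers, fold_to_length):
    """Count fingerprint: how many identifiers fall on each position."""
    return Counter(fold(i, fold_to_length) for i in identifiers)


def trace_back(feature_id, identifiers, fold_to_length):
    """Return the original identifiers that fold onto feature_id."""
    return sorted({i for i in identifiers if fold(i, fold_to_length) == feature_id})


# 494318626 comes from the example above; 1058 and 2 are made-up identifiers.
identifiers = [494318626, 1058, 2]
bits = bit_indices(identifiers, 1024)
counts = count_indices(identifiers, 1024)
origins = trace_back(34, identifiers, 1024)
print(bits, counts[34], origins)
```

Here two different identifiers land on position 34, so a bit fingerprint cannot tell them apart, a count fingerprint records that the position was hit twice, and tracing back returns both candidates.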
- property unique_shells
Unique shells. Return the same as get_valid_shells().
- Type
iterable of Shell, read-only