luna.interaction.fp.shell module¶
- class CompoundClassIds(value)[source]¶
Bases:
enum.Enum
An enumeration of compound classes.
- HETATM = 1¶
- NUCLEOTIDE = 3¶
- RESIDUE = 2¶
- UNKNOWN = 5¶
- WATER = 4¶
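The member values listed above can be mirrored in a standalone enumeration. This is a minimal sketch for illustration only: it redefines the class locally rather than importing it from LUNA, so the class body is an assumption built from the values shown in this section.

```python
from enum import Enum


class CompoundClassIds(Enum):
    """Standalone mirror of the values listed above (not imported from LUNA)."""
    HETATM = 1
    RESIDUE = 2
    NUCLEOTIDE = 3
    WATER = 4
    UNKNOWN = 5


# Members can be looked up by value or by name.
by_value = CompoundClassIds(2)
by_name = CompoundClassIds["WATER"]
print(by_value.name, by_name.value)
```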
- class Shell(central_atm_grp, level, radius, neighborhood=None, inter_tuples=None, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, manager=None, valid=True, feature_mapper=None)[source]¶
Bases:
object
A container to store substructural information, which is the base for LUNA fingerprints.
Each shell is centered on an atom or atom group (an AtomGroup object) and explicitly represents all atoms and interactions within it.
- Parameters
central_atm_grp (AtomGroup) – The shell center.
level (int) – The level (iteration) at which the shell was generated.
radius (float) – The shell radius.
neighborhood (iterable of AtomGroup) – All atoms and atom groups within a shell of radius radius centered on central_atm_grp.
inter_tuples (iterable of (InteractionType, AtomGroup)) – All interactions within a shell of radius radius centered on central_atm_grp. Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.
Note
As an interaction involves two participants, each interaction would be expected to produce two tuples. However, by default, ShellGenerator sorts atom groups and considers only the first tuple that appears, which guarantees that only one of the possible tuples is added and avoids duplicating information.
diff_comp_classes (bool) – If True (the default), differentiate between compound classes. That is, structural information originating from AtomGroup objects that belong to residues, nucleotides, ligands, or water molecules is considered different even if the information itself is the same. This is useful, for example, to differentiate protein-ligand interactions from residue-residue ones.
dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.
seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.
manager (ShellManager) – The ShellManager object that stores and controls this Shell object.
valid (bool) – Whether the shell is valid. By default, all shells are considered valid.
feature_mapper (dict, optional) – A dict that maps atoms and interactions to unique values. If not provided, feature_mapper will inherit from the default mappings CHEMICAL_FEATURE_IDS and INTERACTION_IDS.
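To illustrate the effect of diff_comp_classes described above, here is a minimal sketch. The function encode_feature and the feature id DONOR are hypothetical and are not part of LUNA; how LUNA actually encodes compound classes is not stated in this section.

```python
def encode_feature(feature_id, comp_class_id, diff_comp_classes=True):
    """Hypothetical encoder: build the tuple that would be hashed for one atom group."""
    if diff_comp_classes:
        # The compound class becomes part of the encoded data.
        return (feature_id, comp_class_id)
    # The compound class is ignored.
    return (feature_id,)


RESIDUE, HETATM = 2, 1  # values taken from CompoundClassIds above
DONOR = 7               # hypothetical feature id, for illustration only

with_classes = {encode_feature(DONOR, RESIDUE), encode_feature(DONOR, HETATM)}
without_classes = {
    encode_feature(DONOR, RESIDUE, diff_comp_classes=False),
    encode_feature(DONOR, HETATM, diff_comp_classes=False),
}
print(len(with_classes), len(without_classes))
```

With differentiation enabled, the same feature on a residue and on a ligand yields two distinct encodings; with it disabled, they collapse into one.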
- Variables
- property encoded_data¶
The data encoded in this shell.
- Type
iterable of tuple, read-only
- hash_shell()[source]¶
Hash this shell’s substructural information into a 32-bit integer using MurmurHash3.
- Returns
A 32-bit integer representing this shell’s substructural information.
- Return type
int
- property identifier¶
The identifier of this shell, generated by hashing its encoded data. By default, LUNA uses MurmurHash3 as the hash function.
- Type
int, read-only
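The identifier is described above as a 32-bit MurmurHash3 hash of the shell's encoded data. The sketch below is a pure-Python version of the 32-bit (x86) MurmurHash3 variant. The helper hash_encoded_data and its byte packing (little-endian signed 64-bit integers, chosen to echo the np.int64 default) are assumptions; this section does not say how LUNA serializes encoded data before hashing.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmurhash3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit hash of data, returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body_end = n - (n % 4)

    # Process the body in 4-byte little-endian blocks.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Mix in the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: mix in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hash_encoded_data(encoded_data, seed=0):
    """Hypothetical helper: flatten tuples of ints and hash the packed bytes."""
    flat = [value for item in encoded_data for value in item]
    payload = struct.pack(f"<{len(flat)}q", *flat)  # assumed byte layout
    return murmurhash3_32(payload, seed)


identifier = hash_encoded_data([(1, 2), (3, 4)], seed=0)
print(0 <= identifier < 2**32)
```

Changing seed changes every identifier, which is why the seed is part of both the Shell and ShellGenerator signatures.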
- property inter_tuples¶
Each tuple contains an InteractionType object and one of the AtomGroup objects participating in the interaction.
- Type
iterable of tuple, read-only
- property interactions¶
All interactions within this shell.
- Type
iterable of InteractionType, read-only
- is_similar(shell)[source]¶
Whether this shell is similar to shell.
Two shells are similar if they represent the same substructural information.
- property manager¶
The ShellManager object that stores and controls this Shell object.
- Type
ShellManager, read-only
- class ShellGenerator(num_levels, radius_step, fp_length=4294967296, ifp_type=IFPType.EIFP, diff_comp_classes=True, dtype=<class 'numpy.int64'>, seed=0, bucket_size=10)[source]¶
Bases:
object
Generate shells, the base information of LUNA fingerprints.
- Parameters
num_levels (int) – The maximum number of iterations for fingerprint generation.
radius_step (float) – The multiplier used to increase shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.
fp_length (int) – The fingerprint length (total number of bits). The default value is \(2^{32}\).
ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP). The default value is EIFP.
diff_comp_classes (bool) – If True (the default), differentiate between compound classes. That is, structural information originating from AtomGroup objects that belong to residues, nucleotides, ligands, or water molecules is considered different even if the information itself is the same. This is useful, for example, to differentiate protein-ligand interactions from residue-residue ones.
dtype (data-type) – Use arrays of type dtype to store information. The default value is np.int64.
seed (int) – A seed to generate shell identifiers through the MurmurHash3 hash function. The default value is 0.
bucket_size (int) – Bucket size of the KD tree. Adjust this value to optimize speed. The default value is 10.
- Variables
Examples
In the example below, we assume that a LUNA project object named proj_obj already exists.
First, let’s define a ShellGenerator object that will create shells over 2 iterations (levels). At each iteration, the shell radius will be increased by 3 and substructural information will be encoded following the EIFP definition.
>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)
After defining the generator, we can create shells by calling create_shells(), which expects an AtomGroupsManager object. In this example, we will use the first AtomGroupsManager object from an existing LUNA project (proj_obj).
>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> sm = sg.create_shells(atm_grps_mngr)
>>> print(sm.num_shells)
528
Now, with shells stored in the ShellManager object, you can, for instance:
Generate fingerprints:
>>> fp = sm.to_fingerprint(fold_to_length=1024)
>>> print(fp.indices)
[ 2 19 22 23 34 37 39 45 54 67 71 75 83 84 93 109 138 140 157 162 181 182 186 187 191 194 206 209 211 237 246 251 263 271 281 296 304 315 323 358 370 374 388 392 399 400 419 439 476 481 487 509 519 527 532 578 587 592 604 605 629 635 645 661 668 698 711 713 732 736 740 753 764 795 813 815 820 824 825 831 836 855 873 882 911 926 967 975 976 984 990 996 1020]
Visualize substructural information in PyMOL:
>>> from luna.interaction.fp.view import ShellViewer
>>> shell_tuples = [(atm_grps_mngr.entry, sm.unique_shells, proj_obj.pdb_path)]
>>> sv = ShellViewer()
>>> sv.new_session(shell_tuples, "example.pse")
- create_shells(atm_grps_mngr)[source]¶
Perceive substructural information from AtomGroup objects and their interactions, and represent such information as shells.
- Parameters
atm_grps_mngr (AtomGroupsManager) – Container of AtomGroup objects and their interactions.
- Return type
ShellManager
- Raises
ShellCenterNotFound – If it fails to recover a shell having a given center.
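The relationship between num_levels, radius_step, and the shell radius at each iteration can be sketched as follows. The function shell_radii is a hypothetical helper, not part of LUNA; it only restates the rule given in the parameter descriptions (radius = level * radius_step, with 0-indexed levels).

```python
def shell_radii(num_levels, radius_step):
    """Radius used at each level; levels run from 0 to num_levels - 1."""
    return [level * radius_step for level in range(num_levels)]


# Same settings as the ShellGenerator example above: 2 levels, step of 3.
print(shell_radii(2, 3))
```

With num_levels = 2 and radius_step = 3, the generator therefore works with radii 0 and 3.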
- class ShellManager(num_levels, radius_step, fp_length, ifp_type, shells=None, verbose=False)[source]¶
Bases:
object
Store and manage Shell objects.
- Parameters
num_levels (int) – The maximum number of iterations for fingerprint generation.
radius_step (float) – The multiplier used to increase shell size at each iteration. At iteration 0, the shell radius is 0 * radius_step; at iteration 1, it is 1 * radius_step; and so on.
fp_length (int) – The fingerprint length (total number of bits).
ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).
shells (iterable of Shell, optional) – An initial sequence of Shell objects (fingerprint features).
verbose (bool) – If True, warnings issued during the usage of this ShellManager will be displayed. The default value is False.
- Variables
~ShellManager.num_levels (int) – The maximum number of iterations for fingerprint generation.
~ShellManager.radius_step (float) – The multiplier used to increase shell size at each iteration.
~ShellManager.fp_length (int) – The fingerprint length (total number of bits).
~ShellManager.ifp_type (IFPType) – The fingerprint type (EIFP, FIFP, or HIFP).
~ShellManager.shells (iterable of Shell) – The sequence of shells (fingerprint features).
~ShellManager.verbose (bool) – The verbosity state.
~ShellManager.version (str) – The LUNA version with which shells were generated.
~ShellManager.levels (dict of {int: list of Shell}) – Register shells by level, where keys are levels and values are lists of Shell objects.
Note
Levels are 0-indexed. So, the first level is 0, the second is 1, etc. That means if num_levels is 5, the last level will be 4.
~ShellManager.centers (dict of dict of {int: Shell}) – Register shells by center, where keys are AtomGroup objects and values are dicts that store all shells generated for that center at each iteration (level).
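The two lookup tables described above (levels and centers) can be pictured with a small sketch. ShellRegistry is a hypothetical stand-in that uses strings in place of AtomGroup and Shell objects; it is not the ShellManager implementation.

```python
from collections import defaultdict


class ShellRegistry:
    """Hypothetical stand-in for the level and center bookkeeping."""

    def __init__(self):
        self.levels = defaultdict(list)   # level -> list of shells
        self.centers = defaultdict(dict)  # center -> {level: shell}

    def add(self, center, level, shell):
        """Register one shell under both its level and its center."""
        self.levels[level].append(shell)
        self.centers[center][level] = shell


reg = ShellRegistry()
reg.add("GLN85", 0, "shell-a")
reg.add("GLN85", 1, "shell-b")
reg.add("LIG1", 0, "shell-c")

print(reg.levels[0])         # all shells generated at level 0
print(reg.centers["GLN85"])  # shells for one center, keyed by level
```

Keeping both indexes makes per-level and per-center queries direct dictionary lookups instead of scans over all shells.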
- find_similar_shell(shell)[source]¶
Find a shell in shells similar to shell.
Two shells are similar if they represent the same substructural information.
- get_identifiers(level=None, unique_shells=False)[source]¶
Get the identifiers of all shells.
- Parameters
level (int, optional) – If provided, only return identifiers of shells at level level.
unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.
- Return type
list of int
- get_last_shell(center, unique_shells=False)[source]¶
Get the last shell generated for center center.
- Parameters
- Returns
The last shell generated for center center or None if no valid shell was found.
- Return type
Shell or None
- get_previous_shell(center, curr_level, unique_shells=False)[source]¶
Get the last shell having center center that was generated before level curr_level. For instance, if the current level (iteration) is 5 and the last valid shell generated for center \(C\) was at level 4, then get_previous_shell() would return that shell at level 4.
- Parameters
center (AtomGroup) – The center of a shell, which consists of an AtomGroup object.
curr_level (int) – The current level (iteration).
unique_shells (bool) – If True, ignore non-valid shells and go down to lower levels until a valid shell is found. If level 0 is reached and no valid shell is found, return None. The default value is False.
- Returns
The first previous valid shell or None if no valid shell was found.
- Return type
Shell or None
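The backward search described for get_previous_shell() can be sketched as a loop over decreasing levels. The function previous_shell and its input layout (a dict mapping each level to a (shell, is_valid) pair for a single center) are assumptions made for illustration, not LUNA's internal representation.

```python
def previous_shell(shells_by_level, curr_level, unique_shells=False):
    """Return the closest shell below curr_level, or None if there is none.

    shells_by_level maps level -> (shell, is_valid) for one center.
    """
    for level in range(curr_level - 1, -1, -1):
        entry = shells_by_level.get(level)
        if entry is None:
            continue  # no shell was stored for this level
        shell, is_valid = entry
        if unique_shells and not is_valid:
            continue  # skip non-valid shells and keep going down
        return shell
    return None


shells = {0: ("s0", True), 1: ("s1", False), 2: ("s2", False)}
print(previous_shell(shells, 3))
print(previous_shell(shells, 3, unique_shells=True))
```

Without unique_shells the search stops at level 2; with it, levels 2 and 1 are skipped as non-valid and the shell at level 0 is returned.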
- get_shell_by_center_and_level(center, level, unique_shells=False)[source]¶
Get the shell generated for center center (AtomGroup object) at level (iteration) level.
- get_shells_by_center(center, unique_shells=False)[source]¶
Get shells by center (AtomGroup object).
- Parameters
- Returns
All shells generated for center center at each iteration (key).
- Return type
dict of {int: Shell}
- get_shells_by_identifier(identifier, unique_shells=False)[source]¶
Get shells by identifier.
- Parameters
identifier (int) – The shell identifier.
unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells having the identifier identifier (the default).
- Return type
list of Shell
- get_shells_by_level(level, unique_shells=False)[source]¶
Get shells by level (iteration number).
- Parameters
level (int)
unique_shells (bool) – If True, return only unique shells. Otherwise, return all shells at level level (the default).
- Return type
list of Shell
- get_valid_shells()[source]¶
Return only valid shells.
A shell is considered invalid if, by the time it is added to shells, there is another shell representing the same substructural information. That means this shell is not unique and does not contribute any new information.
On the other hand, if the shell contributes new information to shells, it is considered valid and unique. So, the first shell of a series of shells containing the same information is considered valid and the others invalid.
- Return type
list of Shell
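The validity rule above (first occurrence valid, later duplicates invalid) amounts to order-preserving deduplication. The sketch below uses a hypothetical helper, mark_valid, that takes one hashable "signature" per shell as a stand-in for its substructural information; how LUNA actually compares shells is not specified here.

```python
def mark_valid(signatures):
    """Return one flag per shell: True only for the first time a signature appears."""
    seen = set()
    flags = []
    for signature in signatures:
        flags.append(signature not in seen)
        seen.add(signature)
    return flags


# Shells 0 and 2 carry the same information, as do shells 1 and 4.
print(mark_valid(["a", "b", "a", "c", "b"]))
```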
- to_fingerprint(fold_to_length=None, count_fp=False, unique_shells=False)[source]¶
Encode shells into an interaction fingerprint.
- Parameters
fold_to_length (int, optional) – If provided, fold the fingerprint to length fold_to_length.
count_fp (bool) – If True, create a count fingerprint (CountFingerprint). Otherwise, return a bit fingerprint (Fingerprint).
unique_shells (bool) – If True, only unique shells are used to create the fingerprint. The default value is False.
- Return type
Fingerprint or CountFingerprint
- trace_back_feature(feature_id, ifp, unique_shells=False)[source]¶
Trace a feature from a fingerprint back to the shells that originated that feature.
Note
Due to fingerprint folding, multiple substructures may end up encoded in the same bit, the so-called collision problem. So, if the provided feature contains collisions, trace_back_feature() may return shells representing different substructures.
- Parameters
feature_id (int) – The target feature id.
ifp (Fingerprint) – The fingerprint containing the feature feature_id.
unique_shells (bool) – If True, ignore identifiers of non-valid shells. The default value is False.
- Yields
Examples
In the example below, we assume that a LUNA project object named proj_obj already exists. Then, we will generate an EIFP fingerprint for the first AtomGroupsManager object in proj_obj.
>>> from luna.interaction.fp.shell import ShellGenerator
>>> from luna.interaction.fp.type import IFPType
>>> atm_grps_mngr = list(proj_obj.atm_grps_mngrs)[0]
>>> num_levels, radius_step = 2, 3
>>> sg = ShellGenerator(num_levels, radius_step, ifp_type=IFPType.EIFP)
>>> sm = sg.create_shells(atm_grps_mngr)
>>> fp = sm.to_fingerprint(fold_to_length=1024, count_fp=True)
>>> print(fp.indices)
[ 2 19 22 23 34 37 39 45 54 67 71 75 83 84 93 109 138 140 157 162 181 182 186 187 191 194 206 209 211 237 246 251 263 271 281 296 304 315 323 358 370 374 388 392 399 400 419 439 476 481 487 509 519 527 532 578 587 592 604 605 629 635 645 661 668 698 711 713 732 736 740 753 764 795 813 815 820 824 825 831 836 855 873 882 911 926 967 975 976 984 990 996 1020]
Now, we can trace features back to their original identifiers and investigate their substructural information.
>>> ori_indices = list(sm.trace_back_feature(34, fp, unique_shells=True))
>>> print(ori_indices)
[(494318626, [<Shell: level=0, radius=0.000000, center=<AtomGroup: [<ExtendedAtom: 3QQK/0/A/GLN/85/CD>, <ExtendedAtom: 3QQK/0/A/GLN/85/NE2>, <ExtendedAtom: 3QQK/0/A/GLN/85/OE1>]>, interactions=0>])]
- property unique_shells¶
Unique shells. Return the same as get_valid_shells().
- Type
iterable of Shell, read-only
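The folding and collision behavior described for to_fingerprint() and trace_back_feature() above can be sketched with modulo folding. This is an assumption about the folding scheme: it is consistent with the trace_back_feature() example, where identifier 494318626 maps to feature 34 at length 1024 (494318626 % 1024 == 34), but this section does not state how LUNA folds internally. The function fold and the extra identifiers are hypothetical.

```python
from collections import defaultdict


def fold(identifiers, fold_to_length):
    """Map each folded index to the set of original identifiers that land on it."""
    origins = defaultdict(set)
    for identifier in identifiers:
        origins[identifier % fold_to_length].add(identifier)
    return origins


# 494318626 comes from the example above; 1058 and 7 are made up.
origins = fold([494318626, 1058, 7], 1024)

print(sorted(origins))      # folded indices that are set
print(sorted(origins[34]))  # identifiers colliding on feature 34
```

Here two different identifiers fold onto index 34, so tracing that feature back returns both, which is the collision case the note on trace_back_feature() warns about.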