mdigest.core package
Subpackages
Submodules
mdigest.core.analysis module
#!/usr/bin/env python3 # -- coding: utf-8 --
# @author: fmaschietto, bcallen95
- class mdigest.core.analysis.MDS_analysis
Bases:
object
Basic molecular dynamics analysis
- align_traj(inmem=True, reference=None, selection='protein')
- calc_NH_order_params()
Calculate the S2 amide order parameters according to [J. Am. Chem. Soc. 1998, 120, 5301-5311].
Expected input: set self.nh_selections before calling this function.
- Returns:
self.order_parameters (pd.DataFrame)
TODO: this function will eventually need some kwargs to specify the N-atom/NH-atom selections.
- calc_chi1()
Calculate the Chi 1 angles for an MDAnalysis.Universe. Uses the following attributes:
- self.mda_u: the MDAnalysis.Universe
- self.chi1_selection: a string used to select the subset of the universe for which to calculate the Chi 1 angles
- Returns:
self.chi1_angles
- Return type:
an array representing the Chi 1 Angles of your selection at each frame of the trajectory.
- calc_rms_quant(sel_str, **kwargs)
Calculate RMS Quantities (i.e., RMSD, RMSF).
- Parameters:
sel_str (str or int,) – string/frame_index describing the reference with which to calculate the RMS quantity
kwargs (dict) –
- example:
kwargs={'Initial': 0, 'Average': 'average', 'Final': -1}
- Returns:
self.rms_data = {'RMSD': {selection_title: RMSD_Value}, 'RMSF': {selection_title: RMSF_Value}}, where RMSD_Value/RMSF_Value are np.arrays containing the computed quantities.
- Return type:
dict
- compute_radius_of_gyration(**kwargs)
Calculate the radius of gyration given an MDAnalysis.Universe and a dictionary of atom selections. This dictionary should take the following form:
- Parameters:
kwargs (dict,) –
selection_id: string to describe the selection
selection_str: string used for creating selection with MDA.Universe
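As a rough illustration of the quantity computed for each selection, the mass-weighted radius of gyration of a single frame can be sketched in plain Python. The helper name `radius_of_gyration` and the toy coordinates are assumptions made for this example; mdigest itself works on MDAnalysis selections.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples; masses: list of floats.
    Hypothetical helper for illustration only.
    """
    total_mass = sum(masses)
    # Center of mass, one Cartesian component at a time.
    com = [sum(m * r[k] for r, m in zip(coords, masses)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3))
              for r, m in zip(coords, masses)) / total_mass
    return math.sqrt(msd)

# Two equal masses 2 Å apart: each sits 1 Å from the center of mass.
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
```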
- do_dihedral_calcs(save_data=None)
Calculate Phi, Psi Angles from input:
- Parameters:
save_data (dict or None,) – use for saving the output. Requires dict containing the keys: - “Directory”: The desired directory to save the output to - “Output Descriptor”: simple string to identify the system uniquely (for saving output)
- do_ss_calculation(simple=True)
Calculation of SS propensities using an MDTraj trajectory object. The simple option is a Boolean flag that allows for the choice of simple SS definitions (Helix, Strand, Coil) or the more descriptive SS definitions.
- Parameters:
simple (bool) – If true, only ‘H’, ‘E’ and ‘C’ secondary structure assignments are used.
- Returns:
`self.ss_stats` – SS assignments at each frame as well as the SS propensities computed over the course of the trajectory (i.e., % Helix, % Strand, % Coil).
- Return type:
pd.DataFrame,
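The propensities mentioned above are per-residue percentages of frames spent in each secondary structure class. A minimal sketch of that bookkeeping, assuming the simple 'H'/'E'/'C' alphabet and a hypothetical helper `ss_propensities` (not the mdigest implementation, which returns a pd.DataFrame):

```python
from collections import Counter

def ss_propensities(assignments):
    """Percentage of frames each residue spends in 'H', 'E' and 'C'.

    assignments: list of per-frame lists of single-letter codes,
    e.g. [['H', 'C'], ['H', 'E']] for 2 frames x 2 residues.
    """
    n_frames = len(assignments)
    n_residues = len(assignments[0])
    result = []
    for i in range(n_residues):
        # Count how often residue i is assigned each code.
        counts = Counter(frame[i] for frame in assignments)
        result.append({code: 100.0 * counts[code] / n_frames
                       for code in ('H', 'E', 'C')})
    return result

frames = [['H', 'C'], ['H', 'E'], ['C', 'E'], ['H', 'E']]
props = ss_propensities(frames)
```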
- get_universe()
- load_class_from_file(file_name_root)
- load_system(topology, traj_files, inmem=True)
- save_class(file_name_root)
- set_NH_selections(amide_string_list)
- Expected input:
- amide_string_list: a list of strings used to make the atom selection for your
backbone amide N atoms and the backbone amide H atoms
- set_node_type_sele(node_type_sele)
- set_num_replicas(num_replicas)
- set_selection(atm_str_sel, sys_str_sel)
- stride_trajectory(initial=0, final=-1, step=1)
Stride trajectory
- Parameters:
initial (int,) –
final (int,) –
step (int) –
TODO: redundant with the MDS object; eventually remove and inherit from mdigest.MDS.
mdigest.core.auxiliary module
# @author: fmaschietto, bcallen95
- mdigest.core.auxiliary.compute_DCC(values_, features_dimension, normalized=True)
Compute normalized dynamical pairwise cross-correlation coefficients for each given replica
- Parameters:
values (np.ndarray, shape (nframes, natoms*features_dimensions)) – coordinates array
features_dimension (int,) – dimension of features array
normalized (bool,) – whether to normalize cross-correlation matrix; default is True
- Returns:
cross_correlation – cross-correlation matrix
- Return type:
np.ndarray, shape (natoms, natoms)
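For orientation, the normalized dynamical cross-correlation between atoms i and j is commonly defined as the time average of the dot product of their displacements from the mean, divided by the square root of the product of their mean squared displacements. The sketch below implements that textbook definition in plain Python; the function name `dcc` and the nested-list input layout are assumptions for this example and do not mirror the array layout expected by compute_DCC.

```python
import math

def dcc(traj):
    """Normalized dynamical cross-correlation matrix.

    traj: list of frames, each a list of (x, y, z) tuples per atom.
    Returns an (natoms x natoms) nested list with entries in [-1, 1].
    """
    n_frames, n_atoms = len(traj), len(traj[0])
    # Mean position of every atom over the trajectory.
    mean = [[sum(f[i][k] for f in traj) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    # Displacements from the mean, per frame and per atom.
    disp = [[[f[i][k] - mean[i][k] for k in range(3)]
             for i in range(n_atoms)] for f in traj]

    def avg_dot(i, j):
        # Time average of the dot product of two displacement vectors.
        return sum(sum(d[i][k] * d[j][k] for k in range(3))
                   for d in disp) / n_frames

    return [[avg_dot(i, j) / math.sqrt(avg_dot(i, i) * avg_dot(j, j))
             for j in range(n_atoms)] for i in range(n_atoms)]

# Atoms 0 and 1 move together along x; atom 2 moves in the opposite direction.
traj = [[(0, 0, 0), (5, 0, 0), (9, 0, 0)],
        [(1, 0, 0), (6, 0, 0), (8, 0, 0)],
        [(2, 0, 0), (7, 0, 0), (7, 0, 0)]]
c = dcc(traj)
```

In this toy trajectory the first pair is perfectly correlated and the pair (0, 2) perfectly anti-correlated.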
- mdigest.core.auxiliary.compute_DCC_matrix(values, values_avg, features_dimension)
Compute dynamical cross-correlation matrix (upper triangle)
[***]
- Parameters:
values (np.ndarray, shape (nsamples, nfeature, features_dimension)) – time-series values
values_avg (np.ndarray, shape (nfeature, features_dimension)) – mean time-series values (values averaged over time)
features_dimension (int,) – dimensionality of features array
- Returns:
cross_correlation – dynamical cross correlation matrix
- Return type:
np.ndarray, shape (nfeatures, nfeatures)
See also
[***]
function adapted from https://github.com/tekpinar/correlationplus/blob/master/correlationplus/calculate.py
- mdigest.core.auxiliary.compute_distance_matrix(values, features_dimensions)
Compute distance matrix using the average position over the entire length of the given trajectory
- Parameters:
values (np.ndarray,) – array of values of shape (nframes, nfeatures * features_dimensions)
features_dimensions (int,) – dimensionality of features (3 if values contains xyz coordinates)
- Returns:
dist_matrix – distance matrix
- Return type:
np.ndarray, shape (nfeatures, nfeatures)
- mdigest.core.auxiliary.compute_eigenvector_centrality(mat, loc_factor=None, distmat=None, weight='weight')
Compute eigenvector centrality based on a given correlation matrix and pairwise distance matrix
- Parameters:
mat (np.ndarray,) – adjacency (correlation) matrix to diagonalize
loc_factor (float,) – locality factor (filtering threshold)
distmat (np.ndarray,) – distance matrix
weight (str or None,) – if None treat adjacency as binary, if ‘weight’ use the matrix values as weights
- Returns:
cdict (dict,) – centrality dictionary with nodes as keys and centrality coefficients as values
cvec (np.ndarray,) – eigenvector centrality coefficients
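Eigenvector centrality is the leading eigenvector of the adjacency matrix. A minimal power-iteration sketch is shown below; it is not the mdigest implementation, and the function name, the identity shift and the convergence settings are choices made for this example. The returned `cvec`/`cdict` pair only mimics the shape of the documented return values.

```python
import math

def eigenvector_centrality(adj, max_iter=500, tol=1e-12):
    """Leading eigenvector of a symmetric, non-negative matrix.

    Uses power iteration on (adj + I); the identity shift avoids the
    oscillation plain power iteration shows on bipartite graphs and
    leaves the eigenvectors unchanged.
    """
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        new = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        new = [x / norm for x in new]
        converged = max(abs(a - b) for a, b in zip(new, vec)) < tol
        vec = new
        if converged:
            break
    return vec

# Path graph 0 - 1 - 2: the middle node is the most central.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
cvec = eigenvector_centrality(adj)
cdict = dict(enumerate(cvec))  # node index -> centrality coefficient
```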
- mdigest.core.auxiliary.compute_generalized_correlation_coefficients(values, features_dimension=1, solver='gaussian', correction=True, subset=None)
Rescale mutual information based correlation (gcc) between 0 and 1
- Parameters:
values (np.ndarray, shape (nsamples, nfeatures, features_dimension)) – time-series values
features_dimension (int,) – dimensionality of features array
solver (string,) – which solver to use, default ‘gaussian’
correction (bool,) – whether to apply correction (subtraction of min value of mutual information)
subset (np.array or None,) – indices of features for which to calculate gcc, if None, all features are kept
- Returns:
gcc – generalized correlation coefficient matrix
- Return type:
np.ndarray,
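A widely used rescaling of mutual information I to the [0, 1) interval is the generalized correlation coefficient of Lange and Grubmüller, r = sqrt(1 - exp(-2 I / d)), with d the dimensionality of the features. Whether mdigest applies exactly this expression, and how the `correction` flag enters, is an assumption here; the sketch only illustrates the rescaling itself.

```python
import math

def generalized_correlation(mutual_information, features_dimension=1):
    """Map mutual information (in nats) to a coefficient in [0, 1).

    Hypothetical helper illustrating r = sqrt(1 - exp(-2 I / d)).
    """
    return math.sqrt(1.0 - math.exp(-2.0 * mutual_information / features_dimension))

# For two 1-D Gaussian variables with Pearson coefficient rho,
# I = -0.5 * ln(1 - rho**2), so the rescaling recovers |rho|.
rho = 0.6
mi_gaussian = -0.5 * math.log(1.0 - rho ** 2)
r = generalized_correlation(mi_gaussian, features_dimension=1)
```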
- mdigest.core.auxiliary.coordinate_reshape(values)
Reshape values to shape (nframes, nfeatures*features_dimensions), where nframes is the number of frames of the MD time series considered, nfeatures is the number of features (i.e., the number of atoms for which values are stored), and features_dimensions is the dimensionality of the features (i.e., 3 for xyz coordinates, 4 for the [sin(phi), cos(phi), sin(psi), cos(psi)] dihedral coordinate vector).
- Parameters:
values (np.ndarray, shape (nframes, nfeatures, features_dimensions)) – values
- Returns:
values_ – reshaped values
- Return type:
np.ndarray, shape (nframes, nfeatures * features_dimensions)
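The reshape simply concatenates the per-feature vectors of each frame. A dependency-free sketch on nested lists, using the hypothetical name `flatten_features`; on a NumPy array the equivalent operation is a reshape to (nframes, -1).

```python
def flatten_features(values):
    """Flatten (nframes, nfeatures, features_dimensions) nested lists
    into (nframes, nfeatures * features_dimensions)."""
    return [[x for feature in frame for x in feature] for frame in values]

# One frame, two atoms with xyz coordinates.
flat = flatten_features([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])
```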
- mdigest.core.auxiliary.corr(mat)
Compute Pearson correlation
- Parameters:
mat (np.ndarray,) – is an input matrix of shape (nsamples, nfeatures)
- Returns:
corrmat – Pearson correlation matrix
- Return type:
np.ndarray, shape (nfeatures, nfeatures)
- mdigest.core.auxiliary.create_graph(adj_matx)
Create nx.Graph instance from a np.ndarray adjacency_matrix
- mdigest.core.auxiliary.evaluate_covariance_matrix(values, center='square_disp')
Compute the covariance matrix of features values.
- Parameters:
values (np.ndarray, shape (nframes, nfeatures, features_dimensions)) – matrix of displacements from mean values
center (str,) – -if ‘mean’ remove mean -if ‘square_disp’ compute square displacements of values array (default option)
- Returns:
covar_mat – covariance matrix
- Return type:
np.ndarray, shape (nfeatures, nfeatures)
- mdigest.core.auxiliary.filter_adjacency(mat, distmat, loc_factor)
Filter the adjacency matrix using an exponential to damp long-range contacts (and hence focus on short-range ones)
- Parameters:
mat (np.ndarray,) – matrix
distmat (np.ndarray,) – matrix based on which to prune mat
loc_factor (float,) – locality factor (in Ångströms), suggested 5 Å
- Returns:
adj_matx – filtered matrix
- Return type:
np.ndarray,
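One common form of such an exponential filter multiplies each entry by exp(-d_ij / loc_factor), so that pairs far apart contribute little. The sketch below assumes that functional form; the exact expression used by filter_adjacency is not stated here, and `damp_long_range` is a hypothetical name.

```python
import math

def damp_long_range(mat, distmat, loc_factor=5.0):
    """Scale mat[i][j] by exp(-distmat[i][j] / loc_factor).

    Assumed functional form, for illustration only.
    """
    n = len(mat)
    return [[mat[i][j] * math.exp(-distmat[i][j] / loc_factor)
             for j in range(n)] for i in range(n)]

corr = [[1.0, 0.8], [0.8, 1.0]]
dist = [[0.0, 5.0], [5.0, 0.0]]   # distances in Å
filtered = damp_long_range(corr, dist, loc_factor=5.0)
```

With loc_factor equal to the pair distance, the off-diagonal correlation is scaled by 1/e.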
- mdigest.core.auxiliary.filter_adjacency_inv(mat, distmat, loc_factor)
Filter the adjacency matrix using an exponential to damp short-range contacts (and hence focus on long-range ones)
- Parameters:
mat (np.ndarray,) – matrix
distmat (np.ndarray,) – matrix based on which to prune mat
loc_factor (float,) – locality factor (in Ångströms), suggested 5 Å
- Returns:
adj_matx – filtered matrix
- Return type:
np.ndarray,
- mdigest.core.auxiliary.get_centrality_df(input_class, cent_don, cent_acc, cent, selection)
Create dataframe containing eigenvector centrality from KS energy and from alpha carbon displacements
- Parameters:
input_class (object,) –
cent_don (np.ndarray,) –
cent_acc (np.ndarray,) –
cent (np.ndarray,) –
selection (str,) –
- Returns:
temp_df – data frame containing electrostatic-centrality and CA-centrality
- Return type:
pd.DataFrame()
- mdigest.core.auxiliary.linfit(x, y)
Return R^2 where x and y are array-like
- mdigest.core.auxiliary.prune_adjacency(mat, distmat, loc_factor=5.0, greater=False, lower=False)
Zero out adjacency matrix values according to the locality factor. By default, entries corresponding to atom pairs at a distance equal to or greater than the locality factor (loc_factor) are set to zero.
- Parameters:
mat (np.ndarray) – adjacency matrix
distmat (np.ndarray) – distance matrix
loc_factor (float x) – locality_factor, defines the distance threshold to use for pruning
greater (bool) – whether to prune residues pairs at distances GREATER than the locality factor, default is False
lower (bool) – whether to prune residue pairs at distances LOWER than the locality factor, default is False
- Returns:
mat – pruned adjacency matrix
- Return type:
np.ndarray,
- mdigest.core.auxiliary.reduce_trajectory(universe, segIDs)
It can be useful to work on a reduced trajectory without hydrogens, especially for visualization. This function generates a reduced MDAnalysis universe.
- Parameters:
universe (MDAnalysis.core.Universe object) –
segIDs (str,) – segIDs of groups of atoms to keep
- Returns:
reduced – reduced universe
- Return type:
MDAnalysis.core.Universe object
- mdigest.core.auxiliary.sorted_eig(A)
Sort eigenvalues from largest to smallest and reorder eigenvectors accordingly
- Parameters:
A (array, shape (nfeatures, nfeatures)) – array of eigenvalues
- Returns:
eigenValues (np.ndarray, shape (nfeatures))
eigenVectors (np.ndarray, shape (nfeatures,nfeatures))
- mdigest.core.auxiliary.to_pickle(dataframe, output)
Dump dataframe to file using pickle
- Parameters:
dataframe (pd.DataFrame,) – dataframe to pickle
output (str,) – output name with path
- mdigest.core.auxiliary.vec_query(arr, my_dict)
Convert dictionary to array
mdigest.core.correlation module
# @author: fmaschietto, bcallen95
- class mdigest.core.correlation.DynCorr(MDSIM)
Bases:
object
General purpose class handling computation of different correlation metrics from atomic displacements sampled over MD trajectories.
- edge_exclusion(spatial_cutoff=4.5, contact_cutoff=0.75, save_name='none')
Compute the so-called edge exclusion matrix. When analyzing correlations, it can be useful to consider only the correlations between residue pairs that are proximal (below a certain distance threshold) for a given share of the trajectory frames. This function computes this using capped_distance from MDAnalysis.
- Parameters:
spatial_cutoff (float,) – distance threshold to define atoms in contact (in Ångströms)
contact_cutoff (float,) – minimum contact persistence, expressed as the fraction of trajectory frames in which the pair is in contact (default 0.75)
save_name – output filename
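The contact-persistence idea can be sketched without MDAnalysis: count, for every pair, the fraction of frames in which the distance is within spatial_cutoff, and compare it with contact_cutoff. The helper `persistent_contacts`, its use of `<=` and `>=` at the boundaries, and the boolean output are assumptions of this example; the orientation of the actual exclusion matrix (kept versus excluded pairs) is not specified here.

```python
def persistent_contacts(distances, spatial_cutoff=4.5, contact_cutoff=0.75):
    """Flag pairs that stay within spatial_cutoff in enough frames.

    distances: list of per-frame (n x n) nested lists of distances (Å).
    Returns an n x n nested list of booleans.
    """
    n_frames = len(distances)
    n = len(distances[0])
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            # Number of frames in which the pair is in contact.
            in_contact = sum(1 for frame in distances
                             if frame[i][j] <= spatial_cutoff)
            row.append(in_contact / n_frames >= contact_cutoff)
        mask.append(row)
    return mask

# Pair (0, 1) is within 4.5 Å in all 4 frames; pair (0, 2) in only 1 of 4.
frames = [
    [[0.0, 4.0, 4.0], [4.0, 0.0, 9.0], [4.0, 9.0, 0.0]],
    [[0.0, 4.2, 8.0], [4.2, 0.0, 9.0], [8.0, 9.0, 0.0]],
    [[0.0, 3.9, 8.5], [3.9, 0.0, 9.0], [8.5, 9.0, 0.0]],
    [[0.0, 4.1, 7.0], [4.1, 0.0, 9.0], [7.0, 9.0, 0.0]],
]
mask = persistent_contacts(frames)
```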
- parse_dynamics(scale=False, normalize=True, LMI='gaussian', MI='knn_5_1', DCC=False, PCC=False, COV_DISP=False, VERBOSE=False, **kwargs)
Parse molecular dynamics trajectory and compute different correlation metrics
- Parameters:
scale (bool,) – whether to remove mean from coordinates using StandardScaler
normalize (bool,) – whether to normalize cross-correlation matrices
LMI (str or None,) – None to skip computation of LMI based correlation; ’gaussian’ to compute LMI
MI (str or None,) – None to skip computation of MI based correlation; ‘knn_arg1_arg2’ to compute MI with k = arg1 and estimator = arg2; default is ‘knn_5_1’
DCC (bool,) – whether to compute dynamical cross-correlation matrix of atomic displacements. Default is False
PCC (bool,) – whether to compute Pearson correlation matrix of atomic displacements. Default is False
COV_DISP (bool,) – whether to compute the covariance of atomic displacements. Default is False
VERBOSE (bool,) – whether to set verbose printing
- save_class(file_name_root='./output/cache/')
Can be used to dump all correlation analyses (produced upon calling the correlation, kscorrelation, and dcorrelation modules) or the community analysis.
- Parameters:
file_name_root (str) – filename rootname
mdigest.core.dcorrelation module
# @author: fmaschietto, bcallen95
- class mdigest.core.dcorrelation.DihDynCorr(MDSIM)
Bases:
object
Correlated motions of dihedrals
- parse_dih_dynamics(mean_center=True, LMI='gaussian', MI='knn_5_1', DCC=False, PCC=False, COV_DISP=True, **kwargs)
Compute different correlation metrics from $\phi$, $\psi$ backbone dihedral fluctuations sampled over MD trajectories. Dihedrals are transformed using $\phi$ –> {$\sin(\phi)$, $\cos(\phi)$} and $\psi$ –> {$\sin(\psi)$, $\cos(\psi)$} such that each residue (terminal residues excluded) is described by an array of four entries [$\sin(\phi)$, $\cos(\phi)$, $\sin(\psi)$, $\cos(\psi)$].
- Parameters:
mean_center (bool) – whether to subtract mean
LMI (str or None; default 'gaussian') –
‘gaussian’ for using gaussian estimator
None: skip computation of linearized mutual information based correlation
MI (str or None, default 'knn_5_1') – composite argument where knn specifies use of the k-nearest neighbor algorithm, 5 specifies the number of nearest neighbours, and 1 specifies the estimator to use (options are 1 or 2)
DCC (bool,) – whether to compute dynamical cross correlation
PCC (bool,) – whether to compute Pearson’s cross correlation
COV_DISP (bool,) – whether to compute covariance of dihedral displacements
kwargs –
- normalized: bool
whether to normalize DCC matrix
- subset: list,
list of indices specifying the nodes for which to compute MI
- center: str or None
How to compute the covariance matrix; possible values are ‘mean’ or ‘square_disp’
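The sine/cosine transformation described for parse_dih_dynamics removes the discontinuity of angles at ±180°. A small sketch of the four-entry feature vector, using the hypothetical helper `dihedral_features` with angles in radians:

```python
import math

def dihedral_features(phi, psi):
    """Return [sin(phi), cos(phi), sin(psi), cos(psi)] for one residue."""
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]

# +179° and -179° differ by 358° as raw numbers but are only 2° apart
# on the circle; the transformed features reflect that proximity.
f_plus = dihedral_features(math.radians(179.0), 0.0)
f_minus = dihedral_features(math.radians(-179.0), 0.0)
feature_distance = math.dist(f_plus, f_minus)
```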
- save_class(file_name_root='./output/cache/')
Save DihDynCorr class instances to file
- file_name_root: str
path where to save class
mdigest.core.imports module
# @author: fmaschietto, bcallen95
mdigest.core.kscorrelation module
# @author: fmaschietto, bcallen95
- class mdigest.core.kscorrelation.KS_Box(KS, attrlist)
Bases:
object
Use as collector to free up memory
- class mdigest.core.kscorrelation.KS_Energy(MDSIM)
Bases:
object
General purpose class handling computation of Kabsch-Sander analysis over MD trajectories.
- KS_pipeline(topology_charges=False, covariance=False, MI=None, **kwargs)
KS Pipeline
- Parameters:
topology_charges (bool,) – whether to use topology charges in KS calculation
covariance (bool,) – whether to compute covariance of KS_energies
MI (str or None,) –
if None skip computation of MI based correlation
if ‘knn_arg1_arg2’ compute MI using k=arg1, and estimator=arg2; default is ‘knn_5_1’
- compute_EEC(distance_matrix=None, loc_factor=None, don_acc=True)
Compute Electrostatic Eigenvector Centrality (EEC), donor/acceptor/don_acc centrality for each replica
- Parameters:
distance_matrix (default None,) – provide a distance matrix when loc_factor != 0 to zero out adjacency matrix values corresponding to distances exceeding loc_factor
loc_factor (float,) – filtering threshold for selection of specific correlation range
don_acc (bool,) – whether to compute DA (donor_acceptor) and D+A (donor+acceptor) centralities
- compute_KS_energy(dist_dict, topology_charges=False)
Perform KS calculation
- Parameters:
dist_dict (dict,) – single dictionary containing residue-to-residues backbone atom distances for a given replica
topology_charges (bool,) – if True,
self.q1q2
is expected to be filled with charges array
- Returns:
KS_energies – KS_energies
- Return type:
np.ndarray,
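For reference, the Kabsch-Sander electrostatic model as used in DSSP estimates the backbone hydrogen-bond energy between a C=O group and an N-H group as E = q1 q2 f (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN), with q1 = 0.42 e, q2 = 0.20 e, f = 332 Å kcal/(mol e²) and distances in Å. The sketch below evaluates this formula for a single pair; the function name and argument layout are assumptions for this example, and it does not reproduce how compute_KS_energy handles dist_dict or topology charges.

```python
def kabsch_sander_energy(r_on, r_ch, r_oh, r_cn, q1q2=0.42 * 0.20, f=332.0):
    """Kabsch-Sander energy (kcal/mol) for one C=O ... H-N pair.

    Distances are in Å; q1q2 defaults to the DSSP partial charges.
    Hypothetical helper for illustration only.
    """
    return q1q2 * f * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)

# Toy hydrogen-bond-like geometry: short O...H, longer C...N.
energy = kabsch_sander_energy(r_on=2.9, r_ch=3.3, r_oh=1.9, r_cn=4.1)
# DSSP counts a hydrogen bond when the energy is below -0.5 kcal/mol.
is_hbond = energy < -0.5
```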
- compute_distances_parallel(beg, end, stride, remap=False)
Compute distances in parallel
- Parameters:
beg (int,) – initial frame
end (int,) – end frame
stride (int,) – step
remap (bool,) – if True assumes remapping is unneeded (all distance matrices have the same dimension)
- Returns:
bb_dist_dict – dictionary with CN, CH, OH, ON as keys and np.ndarrays with the corresponding distance arrays as values
- Return type:
dict
- prepare_kabsch_sander_arrays()
Prepare Kabsch-Sander calculation
- save_class(file_name_root='./output/cache/', save_space=False)
Save MDS class instances to file
- Parameters:
file_name_root (str,) – path where to save class
save_space (bool,) – if False bb_distances_allrep and KS_energies_allrep are not dumped to file
- set_backbone_dictionary(backbone_dictionary)
Set backbone dictionary
- Parameters:
backbone_dictionary (dict,) – backbone dictionary specifying the atom name of each backbone atom. Names should match those in the topology files
Examples
KS_energy.set_backbone_dictionary({'N-Backbone':'N', 'O-Backbone':'O','C-Backbone':'C', 'CA-Backbone':'CA', 'H-Backbone':'H'})
- set_charges_array(chargeOtimeschargeN)
Set charges array
- Parameters:
chargeOtimeschargeN (np.ndarray,) – array of shape (nresidues, nresidues) whose entries (q1q2$_{ij}$) are the products of the i-th CO and j-th NH residue charges extracted from the topology
- set_offset(offset)
set offset
- Parameters:
offset (int,) – integer to subtract from the residue indices so that the resindices list starts from 0; set it when the residue indices in the topology start at a number other than 0
- set_selection(atom_group_selstr, system_selstr='all')
Set selection strings
- Parameters:
system_selstr (str,) – selection string to be used for extracting a subset of the atoms (system) on which to perform analysis
atom_group_selstr (str,) – selection string to be used for selecting a subset of atoms from the system. Here it is passed as a list of four selection strings containing, in order, the N, O, C and H backbone selection strings (referred to as atom_str_sel).
Examples
KS_Energy.set_selection(['protein and backbone and name N','protein and backbone and name O', 'protein and backbone and name C','protein and name H'], system_selstr='protein')
mdigest.core.networkcanvas module
# @author: fmaschietto, bcallen95
- class mdigest.core.networkcanvas.ProcCorr
Bases:
object
Process correlation matrix to make the desired arrays amenable for visualization.
- corrNetworkPymol(pdb_file, corr_matrix, pml_out_file, frame=0, selection=None, lthr_filter=None, uthr_filter=None, edge_scaling=1, chainblocks=True, **kwargs)
A useful practice is to inspect the shape of the correlation networks. This function provides a way to visualize the correlation patterns on the protein structure. The function saves a pml file that contains the user selected correlation values which can be executed in pymol to produce a png of the correlation
[***]
- Parameters:
pdb_file (str,) – filename (path+filename+extension) of the PDB that will then be read in pymol. If the file is not found, the utils module is called to write a pdb file at the path specified by pdb_file. It is crucial that this pdb file corresponds to the trajectory frame used for reading the coordinates (specified by the variable frame).
frame (int,) – frame number
corr_matrix (np.ndarray square matrix of floats,) – correlation matrix
pml_out_file (str,) – output pml rootname
selection (None or MDAnalysis AtomGroup object,) –
if None: selection is overwritten with self.atom_group_selstr
else takes in MDAnalysis AtomGroup.
lthr_filter (float,) – lower bound of the correlation interval to visualize; only correlation values greater than lthr_filter are written to the PML file
uthr_filter (float,) – upper bound of the correlation interval to visualize; only correlation values equal to or lower than uthr_filter are written to the PML file
edge_scaling (float,) – multiplicative factor used to scale the correlation values, which adjusts the radius of the cylinders displayed in pymol. Recommended values are between 0.01-2.00.
chainblocks (bool,) – If True and universe contains multiple chains, separate file for inter and intra-chain correlations are printed out.
See also
[***]
function adapted from https://github.com/tekpinar/correlationplus/blob/master/correlationplus/
- filter_by_distance(matrixtype, distmat=False)
Zero out correlations lower than self.lower_thr and higher than self.upper_thr in a given correlation matrix. When pruning by distance, correlations between residues closer to each other than self.lower_thr or farther from each other than self.upper_thr are set to zero. This filter can be applied to select correlations falling within a distance window of interest.
- Parameters:
matrixtype (str,) – used to select a desired correlation matrix and also used as prefix for the output files.
distmat (bool,) – default is False, which results in pruning based on correlation values. Upper and lower thresholds for pruning are set by call to set_thresholds()
- get_selection_fromMDS(MDS)
Retrieve atomstring selection
- Parameters:
MDS (mdigest.MDS object) –
- load_matrix_dictionary(matrix_dictionary)
Populate matrix_dictionary attribute with kwargs
- Parameters:
matrix_dictionary (dict,) – dictionary with format of {‘matrix_label’: np.ndarray} containing correlation matrices to visualize
- populate_attributes(matrixdict)
Create attributes of ProcCorr class corresponding to the keys of matrix_dictionary
- Parameters:
matrixdict (dict,) – example: matrixdict = {‘matrix_label’: np.ndarray}
- select_frame_coordinates(frame)
Select frame
- set_outputparams(params)
Set output parameters
- set_selection(atom_group_selstr)
Set atom group selection string
- Parameters:
atom_group_selstr (str) –
- selection string
example: ‘name CA’
- set_thresholds(unit='au', prune_upon=False, **kwargs)
- Parameters:
unit (str,) – unit, default a.u., possible values ‘nm’, ‘au’
prune_upon (False or str,) – where str is the matrix_label of the array to use for pruning the correlations
kwargs (dict,) –
lower_thr
upper_thr
loc_factor
inv_loc_factor
- source_universe(universe)
Source mda.Universe
- Parameters:
universe (mda.Universe object) –
- to_df(normalize=False, **kwargs)
Save filtered matrices to pandas DataFrame
- Parameters:
normalize (bool,) –
kwargs (dict,) –
- which: str,
name of matrix (column) on which to apply normalization
- to_range: range,
range for normalization
- writePDBforframe(frame, outpdb)
Write PDB for a selected frame
- Parameters:
frame (int,) – selected frame for which to write PDB
outpdb (str,) – output filename
- mdigest.core.networkcanvas.display_community(path, sys, view, community_lookup, color_dict, outpath)
Pymol function to display communities on secondary structure
- Parameters:
path (str,) – path to pdb
sys (str,) – name of pdb to load (without extension)
view (set,) –
- orientation matrix copied from pymol get_view()
example: view = ( -0.611808956, -0.140187785, 0.778481722, 0.387945682, 0.804496109, 0.449760556, -0.689334452, 0.577175796, -0.437813699, 0.000000000, 0.000000000, -219.895217896, 45.563232422, 57.541908264, 47.740921021, 85.014022827, 354.776367188, 20.000000000 )
community_lookup (mdigest.CMTY.nodes_communities_collect object,) – collected output from MD trajectory community analysis
color_dict (dict,) – dictionary with color names as keys and rgb codes as values
outpath (str,) – where to save png, format should be outpath = ‘/path/to/png/’, png will be saved as outpath_communities.png
- mdigest.core.networkcanvas.draw_electrostatic_network(communities_path, edges_path, save_path, fetch_pdb=None, pse_path=None, edge_multiplier=5, color_ss=True)
Draw electrostatic network
- Parameters:
communities_path (str,) – path to communities txt file
edges_path (str,) – path to edges text file
save_path (str,) – path to save pse file
fetch_pdb (str,) – PDB ID to fetch, default=None
pse_path (str,) – path to structural file if fetch_pdb=None, default=None
edge_multiplier (int,) – multiplicative factor for edge widths in visualization, default=5
color_ss (bool,) – whether to color structure by secondary structure, default=True
- Return type:
.pse, PyMOL pse file,
- mdigest.core.networkcanvas.ss_network(ss_stats, gcc, nodes_save_path, edges_save_path, num_sd=1.5)
Secondary structures network
- Parameters:
ss_stats (pd.DataFrame,) – dataframe of secondary structure information, obtained from self.ss_stats
gcc (np.ndarray of shape (nfeatures*nfeatures),) – pairwise generalized correlation coefficients
nodes_save_path (str,) – path to save dictionary of nodes
edges_save_path (str,) – path to save dictionary of edges
num_sd (int,) – minimum standard deviations above the mean value for an edge between nodes to be considered as significant
- Returns:
dictionary of nodes (dict,)
dictionary of edges (dict)
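The num_sd criterion can be read as keeping only edges whose weight exceeds mean + num_sd × standard deviation of all edge weights. The sketch below applies that reading to a plain dictionary of edge weights; the helper name, the dictionary layout, and the use of the population standard deviation are assumptions for this example and may differ from what ss_network does internally.

```python
import statistics

def significant_edges(weights, num_sd=1.5):
    """Keep edges whose weight exceeds mean + num_sd * std.

    weights: dict mapping (node_a, node_b) tuples to floats.
    """
    values = list(weights.values())
    threshold = statistics.fmean(values) + num_sd * statistics.pstdev(values)
    return {edge: w for edge, w in weights.items() if w > threshold}

weights = {(0, 1): 0.1, (0, 2): 0.1, (1, 2): 0.1,
           (0, 3): 0.1, (1, 3): 0.1, (2, 3): 0.9}
kept = significant_edges(weights, num_sd=1.5)
```

Only the single strongly weighted edge survives in this toy example.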
mdigest.core.networkcommunities module
# @author: fmaschietto, bcallen95
- class mdigest.core.networkcommunities.CMTY
Bases:
object
- assign_filters(filters=None)
- Parameters:
filters (dict,) – dictionary with format of {‘exclusion’: bool, ‘filter_by’: bool}, specifying whether to apply specified filter (Default is False for all keys.)
- best_iteration_louvain(G, nodes_comm_alliter, comm_list_alliter, partitions_alliter)
Select the best iteration from Louvain heuristic algorithm by selecting the instance with the highest modularity
- Parameters:
G (nx.Graph(),) – a networkx protein graph
nodes_comm_alliter (dict,) – dictionary containing the list of nodes for each community at every iteration of the Louvain heuristic algorithm
comm_list_alliter (list,) – contains the list of nodes for each community at every iteration of Louvain heuristic algorithm
partitions_alliter (list) – partitions at every iteration of Louvain heuristic algorithm
- Returns:
best_iteration_comm_nodes (dict,) – community nodes for the best iteration
best_comm_list (list,) – list containing the index of the communities in the best run
best_partitions (dict,) – best partitions
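Selecting the best Louvain iteration amounts to scoring each candidate partition by its modularity and keeping the maximum. The sketch below implements the standard Newman modularity for an undirected weighted graph in plain Python and picks the best of a few hand-written partitions; the function name and the label-list representation of a partition are assumptions for this example, not the data structures used by CMTY.

```python
def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected graph.

    adj: symmetric n x n nested list of edge weights.
    labels: list assigning a community label to each node.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1.0

partitions = [[0, 0, 0, 1, 1, 1],   # one community per triangle
              [0, 0, 0, 0, 0, 0],   # everything in one community
              [0, 1, 0, 1, 0, 1]]   # alternating labels
best = max(partitions, key=lambda labels: modularity(adj, labels))
```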
- calculate_betweenness()
Calculate betweenness in entire system, for each matrix instance (entry in graph list). Store each in an ordered dictionary
- community_data(G, partition)
Compact all information on the communities into a dictionary
- Parameters:
G (nx.Graph(),) – a networkx protein graph
partition (dict,) – dictionary containing the different partitions
- Returns:
nodes_communities –
- a dictionary containing data relative to each community:
community labels,
community index ordered by modularity,
community index ordered by eigenvector centrality,
community nodes (list of nodes in each partition)
- Return type:
dict, n_communities: int
- compute_optimal_paths()
Compute optimal source-target paths using the Floyd-Warshall algorithm
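For readers unfamiliar with it, the Floyd-Warshall algorithm computes all-pairs shortest path lengths in O(n³) and, with a successor table, allows paths to be reconstructed afterwards. The sketch below is a generic textbook version in plain Python; the names `floyd_warshall` and `reconstruct_path` and the nested-list input are assumptions for this example and do not describe the attributes stored by compute_optimal_paths.

```python
import math

def floyd_warshall(weights):
    """All-pairs shortest paths.

    weights: n x n nested list with 0 on the diagonal and math.inf
    where there is no edge. Returns (dist, nxt), where nxt[i][j] is
    the node following i on a shortest path to j (None if unreachable).
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    nxt = [[j if weights[i][j] != math.inf else None for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def reconstruct_path(nxt, source, target):
    """Follow the successor table from source to target."""
    if nxt[source][target] is None:
        return []
    path = [source]
    while source != target:
        source = nxt[source][target]
        path.append(source)
    return path

# The direct edge 0-2 (length 5) is longer than the detour through node 1.
w = [[0.0, 1.0, 5.0],
     [1.0, 0.0, 1.0],
     [5.0, 1.0, 0.0]]
dist, nxt = floyd_warshall(w)
path = reconstruct_path(nxt, 0, 2)
```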
- create_matrix_dict(matrix_dictionary)
Populate matrix_dictionary attribute with kwargs
- Parameters:
matrix_dictionary (dict,) – dictionary with format of
{'matrix_label': np.array or class_object.matrix_attribute}
containing matrices to feed into the community pipeline
- get_degree(instance=0)
- girvannewman(MVE=None)
Computes Girvan Newman algorithm, slightly faster than Run_Girvan_Newman(), uses builtin functions from nx
- Parameters:
MVE (function) – function to calculate most valuable edge. Default is None, which is equivalent to calling most_valuable_edge_nx().
- load_graph(distance_threshold=5.0)
General function to load graph
- Parameters:
distance_threshold (threshold applied in prune_adjacency) –
- most_valuable_edge(G, count_entries=False, normalized=False, weight='weight')
Returns most valuable edge according to edge_betweenness_centrality criterion
- Parameters:
count_entries (bool,) – use True to print betweenness values calculated without averaging over all shortest paths; this is how betweenness values are calculated in the original floyd_warshall.c code.
normalized (bool,) – whether betweenness values are normalized
weight (str,) – default ‘weight’, uses graph weights
- Returns:
(edge_key[0], edge_key[1]) (tuple,) – edge with the maximum betweenness
maxbet (float) – maximum betweenness
- most_valuable_edge_fw(G)
Returns most valuable edge, using the Floyd-Warshall algorithm
- populate_filters(filters_dictionary)
Populate exclusion_matrix and distance_matrix attributes with filter_dictionary.
- Parameters:
filters_dictionary (dict,) – dictionary with format filters_dictionary={'exclusion_matrix': mat}; mat can be either None, np.array or class_object.matrix_attribute or dict containing an exclusion matrix for each matrix in self.matrix_dict. The list of keys must match those in self.matrix_dict. To include filtering by distance, use filters_dictionary={'distance_matrix': mat}, where mat is a np.array or class_object.matrix_attribute or dict containing a distance matrix for each entry in self.matrix_dict. The list of keys must match those in self.matrix_dict.
Examples
filters_dictionary = {'exclusion_matrix': np.array}
or
filters_dictionary = {'distance_matrix': np.array}
or
filters_dictionary = {'exclusion_matrix': {'ca_lmi_rep_0': np.array, 'ca_lmi_rep_1': np.array}, 'distance_matrix': {'ca_lmi_rep_0': np.array, 'ca_lmi_rep_1': np.array}}
The keys in the exclusion matrix and distance matrix dictionaries have to match.
- retrieve_path(node_A, node_B, instance)
Reconstruct path
- run_cmty_louvain(setgraph=0)
Runs community louvain (one graph) and returns a dictionary with communities at each iteration of the louvain procedure, can be useful if one wants to check consistency across different louvain iterations for a single replica.
- Parameters:
setgraph (int,) – default 0, assign to the integer corresponding to the replica on which to apply the Louvain algorithm
- run_cmtys_louvain(aggregate=False, **kwargs)
Community generation using the louvain protocol, iterate over multiple replicas, for each save the partition with higher modularity to a dictionary
- Parameters:
aggregate (bool,) – whether to group communities by redistributing nodes of communities smaller than a given threshold over the other communities. Aggregation assigns each node to the partition that yields the largest modularity.
kwargs (dict,) – use threshold = int to set threshold for regrouping communities, default is 5 (communities <= 5 elements are redistributed)
- save_class(file_name_root='../output/cache/community', save_space=True)
General function to save instances of the CMTY class to file
- Parameters:
file_name_root (str,) – file rootname
- set_parameters(parameters)
Set parameters
- sort_cmty(cycles, setgraph=-1)
Sort communities and store sorted indices according to different metrics to a dictionary
- Parameters:
cycles (int,) – assign the number of cycles to match the number of replicas (number of graphs on which to iterate)
setgraph (int,) – which graph to use; if -1 use graph corresponding to louvain cycle
- mdigest.core.networkcommunities.display_shortes_path(nvView, path, dists, max_direct_dist, selected_atomnodes, opacity=0.75, color='green', side='both', segments=5, disable_impostor=True, use_cylinder=True)
Display shortest paths
mdigest.core.parsetrajectory module
#!/usr/bin/env python3 # -- coding: utf-8 --
# @author: fmaschietto, bcallen95
- class mdigest.core.parsetrajectory.MDS
Bases:
object
Parse molecular dynamics trajectories
- align_traj(inmem=True, reference=None, selection='protein')
Align trajectory to the specified selection using the alignment protocol from MDAnalysis
- Parameters:
inmem (bool, default True,) –
reference (bool or None, default None,) – a reference universe against which to align can be specified
selection (str,) – selection string to select atoms against which to perform alignment
- get_universe()
Retrieve universe
- load_system(topology, traj_files, inmem=True)
Load MDA universe from topology and trajectory
- Parameters:
topology (str,) – path to topology file
traj_files (str or list of str,) – string or list of strings specifying the path to the trajectory file(s)
inmem (bool,) – whether to load trajectory in memory
- set_num_replicas(num_replicas)
Set the number of replicas
- Parameters:
num_replicas (int,) – number of concatenated replicas
- set_selection(atom_group_selstr='protein and name CA', system_selstr='protein')
Set selection strings
- Parameters:
system_selstr (str,) – selection string to select the portion of the system to consider when computing the exclusion matrix, for example “protein”
atom_group_selstr (str,) – selection string used to select the subset of nodes on which to perform the analysis, for example “protein and name CA”
- source_system(mda_universe)
Source MDA universe
- Parameters:
mda_universe (mda.Universe object,) – MDA universe object
- stride_trajectory(initial=0, final=-1, step=1)
Stride trajectory
- Parameters:
initial (int,) – initial frame from which to start reading in the trajectory
final (int,) – final frame to consider when reading in the trajectory
step (int,) – step to use when slicing the traj frames
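How initial, final, and step combine can be pictured as a plain index range. This is a sketch under stated assumptions, not mdigest's code: strided_frame_indices is a hypothetical helper, final=-1 is assumed to mean "read through the last frame", and any other final is assumed to be an exclusive stop as in Python's range.

```python
def strided_frame_indices(n_frames, initial=0, final=-1, step=1):
    """Return the frame indices kept after striding (hypothetical helper)."""
    # Assumption: final=-1 is a sentinel for "up to and including the last frame".
    stop = n_frames if final == -1 else min(final, n_frames)
    return list(range(initial, stop, step))


# A 10-frame trajectory, keeping every third frame starting from frame 2.
every_third = strided_frame_indices(10, initial=2, final=-1, step=3)
# The same trajectory, keeping only frames 0 to 4.
first_half = strided_frame_indices(10, initial=0, final=5, step=1)
```

Whether the library treats final as inclusive or exclusive is not stated in this reference, so check the resulting number of frames after striding.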
mdigest.core.savedata module
#!/usr/bin/env python3 # -- coding: utf-8 --
# @author: fmaschietto, bcallen95
- class mdigest.core.savedata.MDSdata
Bases:
object
Save instances from mdigest.DynCorr, mdigest.DihDynCorr, mdigest.KS_Energy, mdigest.CMTY for easy access.
[**]
function structure adapted from https://github.com/melomcr/dynetan
- load_from_file(file_name_root, save_space=False)
Reads cached data and loads attributes
- save_to_file(file_name_root, save_space=False)
Opens the HDF5 file and stores all data
- Parameters:
file_name_root (str,) – file rootname
save_space (bool,) – if set to True avoid saving to file some very large attributes.
mdigest.core.toolkit module
#!/usr/bin/env python3 # -- coding: utf-8 --
# @author: fmaschietto, bcallen95
- mdigest.core.toolkit.dict2list(dictoflists)
Convert dictionary to list
- mdigest.core.toolkit.dump(filepath, array_input)
Dump np.ndarray to file
- Parameters:
filepath (str,) – output path
array_input (np.ndarray,) – array to save
- Return type:
pickle binary output file
- mdigest.core.toolkit.file_exists(filepath)
Check if file exists
- Parameters:
filepath (str,) – path to file
- Returns:
whether the file exists at the given path
- Return type:
bool
- mdigest.core.toolkit.folder_exists(path_to_folder)
Check if directory exists, create if not
- Parameters:
path_to_folder (str,) – path to folder
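The two checks above map onto standard-library calls. The sketch below reuses the documented function names but is a stand-in, not the mdigest source; in particular, returning the path from folder_exists is an assumption made for convenience.

```python
import os
import tempfile


def file_exists(filepath):
    """Return True if filepath points to an existing regular file."""
    return os.path.isfile(filepath)


def folder_exists(path_to_folder):
    """Create the directory (and any missing parents) if it is absent."""
    os.makedirs(path_to_folder, exist_ok=True)
    return path_to_folder  # assumption: the real function may return nothing


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = folder_exists(os.path.join(tmp, "output", "cache"))
    created = os.path.isdir(cache_dir)
    # No file has been written yet, so this check is expected to fail.
    found = file_exists(os.path.join(cache_dir, "community.h5"))
```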
- mdigest.core.toolkit.get_NGLselection_from_node(node_idx, atomsel, atom=True)
Create an atom selection (whole residue or single atom) for NGLView and an atom-selection object.
- mdigest.core.toolkit.get_or_minus1(f)
Assign to minus one if index is absent
- mdigest.core.toolkit.get_path(src, trg, selected_atomnodes, preds, rep=0)
Return an np.ndarray with the list of nodes that connect src (source) and trg (target).
[**]
function adapted from https://github.com/melomcr/dynetan
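Path reconstruction from a predecessor table can be sketched as a backward walk from the target. The layout assumed here is a nested mapping where preds[src][trg] is the node preceding trg on the shortest path from src, as produced by Floyd-Warshall style routines; the actual layout of preds in mdigest, including the rep index, is not specified in this reference, and reconstruct_path is a hypothetical name.

```python
def reconstruct_path(src, trg, preds):
    """Return the list of nodes from src to trg, or [] if no path is recorded."""
    if src == trg or src not in preds or trg not in preds[src]:
        return []
    path = [trg]
    current = trg
    # Walk backwards through the predecessors until the source is reached.
    while current != src:
        current = preds[src][current]
        path.append(current)
    return path[::-1]


# Predecessors for the chain 0 - 1 - 2 - 3, seen from source node 0.
preds = {0: {1: 0, 2: 1, 3: 2}}
path = reconstruct_path(0, 3, preds)
```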
- mdigest.core.toolkit.get_selection_from_node(i, atomsel, atom=False)
Get the selection string from a node index: resname, resid, segid and name and return an atom-selection object.
[**]
function adapted from https://github.com/melomcr/dynetan
- mdigest.core.toolkit.intersection(lst1, lst2)
Find intersection between two lists
- mdigest.core.toolkit.keywithmaxval(d)
Create a list of the dict’s keys and values; return the key with the max value
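The two small helpers above can be written in a few lines. These are stand-ins that follow the documented descriptions, not the mdigest source; keeping the order of lst1 in intersection is an assumption.

```python
def intersection(lst1, lst2):
    """Return the items of lst1 that also occur in lst2, keeping lst1's order."""
    lookup = set(lst2)
    return [item for item in lst1 if item in lookup]


def keywithmaxval(d):
    """Build lists of keys and values, then return the key of the largest value."""
    keys = list(d.keys())
    values = list(d.values())
    # index() returns the first match, so ties resolve to the earliest key.
    return keys[values.index(max(values))]


common_nodes = intersection([1, 2, 3, 4], [4, 2, 7])
modularity_by_run = {"run_0": 0.41, "run_1": 0.47, "run_2": 0.44}
best_run = keywithmaxval(modularity_by_run)
```

The second helper gives the same result as max(d, key=d.get).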
- mdigest.core.toolkit.list2dict(listoflists)
Convert list to dictionary
- mdigest.core.toolkit.log_progress(sequence, every=None, size=None, name='Items', userProgress=None)
Generates log progress bar
See also
[**]
this function was authored by Marcelo Melo as part of https://github.com/melomcr/dynetan
- mdigest.core.toolkit.normalize(arr)
Normalize array by dividing each element by the sum of the array.
- mdigest.core.toolkit.normalize_array(vals)
Normalize array between -1 and 1.
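The two normalizations differ in what they preserve. The sketch below uses plain lists instead of np.ndarray and is not the mdigest source; in particular, min-max rescaling is an assumed reading of "between -1 and 1", and the library may rescale differently.

```python
def normalize(arr):
    """Divide every element by the sum, so that the result sums to 1."""
    total = sum(arr)
    return [x / total for x in arr]


def normalize_array(vals):
    """Rescale linearly so the minimum maps to -1 and the maximum to 1.

    Assumption: min-max rescaling. Undefined when all values are equal.
    """
    lo, hi = min(vals), max(vals)
    return [2 * (x - lo) / (hi - lo) - 1 for x in vals]


weights = normalize([1, 1, 2])
scaled = normalize_array([0, 5, 10])
```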
- mdigest.core.toolkit.partition2dict(partition)
Convert partitions to dictionary having nodes as keys and assigned community (partition) as values
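A minimal sketch of this conversion, assuming the partition arrives as a list of node collections whose position in the list is the community index (the input format is not stated in this reference):

```python
def partition2dict(partition):
    """Map each node to the index of the community that contains it."""
    return {
        node: community_index
        for community_index, nodes in enumerate(partition)
        for node in nodes
    }


# Two communities: nodes 0-2 and nodes 3-4.
node_to_community = partition2dict([[0, 1, 2], [3, 4]])
```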
- mdigest.core.toolkit.retrieve(filepath)
Retrieve pickle object
- Parameters:
filepath (str,) – path of file to read
- Returns:
content
- Return type:
np.ndarray
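A round trip through dump and retrieve can be sketched with the standard pickle module. The functions below reuse the documented names but are stand-ins, not the mdigest source, and a nested list takes the place of the np.ndarray to keep the example dependency-free.

```python
import os
import pickle
import tempfile


def dump(filepath, array_input):
    """Write the object to filepath as a binary pickle."""
    with open(filepath, "wb") as handle:
        pickle.dump(array_input, handle)


def retrieve(filepath):
    """Read a pickled object back from filepath."""
    with open(filepath, "rb") as handle:
        return pickle.load(handle)


# Placeholder for a small correlation matrix (assumption for this sketch).
matrix = [[1.0, 0.2], [0.2, 1.0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.pkl")
    dump(path, matrix)
    restored = retrieve(path)
```

As with any pickle file, only load files from a trusted source.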
Module contents
MDiGest v.0.1.0
__version__ = 0.1.0
__author__ = Federica Maschietto <federica.maschietto@gmail.com>, Brandon Allen <bcallen95@gmail.com>
DESCRIPTION
# imports –> general imports
# parsetrajectory –> process trajectory
# correlation –> compute correlation based on atomic displacements
# dcorrelation –> compute correlation from dihedrals fluctuations
# kscorrelation –> KS analysis
# dimreduction –> dimensionality reduction
# community –> GN, LOUVAIN, communities in general
# savedata –> caches the output of various models
# auxiliary –> auxiliary functions used by multiple modules
# toolkit –> accessory functions
# plots