mdigest.core package

Subpackages

Submodules

mdigest.core.analysis module

#!/usr/bin/env python3 # -- coding: utf-8 --

# @author: fmaschietto, bcallen95

class mdigest.core.analysis.MDS_analysis

Bases: object

Basic molecular dynamics analysis

align_traj(inmem=True, reference=None, selection='protein')
calc_NH_order_params()

This function calculates the S2 amide order parameters according to [J. Am. Chem. Soc. 1998, 120, 5301-5311].

Expected input:

  • set self.nh_selections before calling this function

Returns:

  • self.order_parameters (pd.DataFrame)

TODO: This function will eventually need some kwargs to specify the N-atom/NH-atom selections.
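To make the computed quantity concrete, the sketch below evaluates S2 from a list of N-H bond vectors with the widely used expression S2 = 3/2 Σ_ab ⟨u_a u_b⟩² − 1/2, where u is the unit bond vector and a, b run over x, y, z. This is not the mdigest implementation: the function name `s2_order_parameter` and the plain-list input are assumptions, and the vectors are assumed to come from a trajectory that has already been aligned.

```python
import math

def s2_order_parameter(nh_vectors):
    """Hypothetical helper: S2 from one N-H bond vector per frame."""
    # Normalize every bond vector to unit length.
    units = []
    for x, y, z in nh_vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))
    nframes = len(units)

    # Sum the squared time averages <u_a u_b>^2 over all nine (a, b) pairs.
    total = 0.0
    for a in range(3):
        for b in range(3):
            mean_ab = sum(u[a] * u[b] for u in units) / nframes
            total += mean_ab ** 2
    return 1.5 * total - 0.5

# A bond that never moves is fully ordered.
print(s2_order_parameter([(0.0, 0.0, 1.0)] * 10))
```

A rigid bond gives S2 = 1, while a bond that samples all orientations uniformly tends towards S2 = 0.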

calc_chi1()

Calculates the chi1 angles for an MDAnalysis.Universe. Uses the following attributes:

  • self.mda_u: the MDAnalysis.Universe

  • self.chi1_selection: a string used to select the subset of the universe for which to calculate the chi1 angles

Returns:

self.chi1_angles – an array representing the chi1 angles of your selection at each frame of the trajectory.
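For reference, a chi1 angle is the dihedral defined by the N, CA, CB and CG (or equivalent gamma) atoms of a residue. The following sketch computes a dihedral from four points in plain Python. The function name `dihedral_deg` and the tuple-based input are assumptions for illustration; this is not how mdigest or MDAnalysis evaluates the angle internally.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Hypothetical helper: signed dihedral p0-p1-p2-p3 in degrees."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Four points arranged so that the outer bonds are perpendicular.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```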

calc_rms_quant(sel_str, **kwargs)

Calculate RMS Quantities (i.e., RMSD, RMSF).

Parameters:
  • sel_str (str or int,) – string/frame_index describing the reference with which to calculate the RMS quantity

  • kwargs (dict) –

    example:

    kwargs={'Initial': 0, 'Average': 'average', 'Final': -1}

Returns:

self.rms_data – a dict with the structure {'RMSD': {selection_title: RMSD_Value}, 'RMSF': {selection_title: RMSF_Value}}, where RMSD_Value and RMSF_Value are np.ndarray objects containing the computed quantities.

Return type:

dict
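The two quantities can be sketched as follows. The example assumes the frames are already superimposed on the reference (no fitting is performed) and uses nested lists instead of arrays; the names `rmsd` and `rmsf` are hypothetical and are not part of the mdigest API.

```python
import math

def rmsd(coords, reference):
    """Root-mean-square deviation between two frames (lists of xyz tuples)."""
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))

def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    nframes = len(trajectory)
    natoms = len(trajectory[0])
    result = []
    for i in range(natoms):
        mean = [sum(frame[i][k] for frame in trajectory) / nframes for k in range(3)]
        msd = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / nframes
        result.append(math.sqrt(msd))
    return result

# Two frames, two atoms: the second atom moves along x.
traj = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
print(rmsd(traj[1], traj[0]), rmsf(traj))
```

RMSD yields one value per frame relative to a reference (for example the first, last or average frame), whereas RMSF yields one value per atom over the whole trajectory.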

compute_radius_of_gyration(**kwargs)

Calculate the radius of gyration given an MDAnalysis.Universe and a dictionary of atom selections. This dictionary should take the following form:

Parameters:

kwargs (dict,) –

  • selection_id: string to describe the selection

  • selection_str: string used for creating selection with MDA.Universe
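As a reminder of the quantity involved, the mass-weighted radius of gyration of a selection is the root-mean-square distance of its atoms from the centre of mass. The sketch below is a minimal, dependency-free illustration; `radius_of_gyration` is a hypothetical name, and positions and masses are passed as plain lists rather than being taken from an MDAnalysis selection.

```python
import math

def radius_of_gyration(positions, masses):
    """Hypothetical helper: mass-weighted radius of gyration of one frame."""
    total_mass = sum(masses)
    # Centre of mass of the selection.
    com = [sum(m * p[k] for m, p in zip(masses, positions)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass.
    msd = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
              for m, p in zip(masses, positions)) / total_mass
    return math.sqrt(msd)

# Two equal masses placed symmetrically about the origin.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0]))
```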

do_dihedral_calcs(save_data=None)

Calculate Phi, Psi Angles from input:

Parameters:

save_data (dict or None,) – use for saving the output. Requires a dict containing the keys:

  • "Directory": the desired directory to save the output to

  • "Output Descriptor": a simple string that uniquely identifies the system (for saving output)

do_ss_calculation(simple=True)

Calculation of SS propensities using an MDTraj trajectory object. The simple option is a Boolean flag that allows for the choice of simple SS definitions (Helix, Strand, Coil) or the more descriptive SS definitions.

Parameters:

simple (bool) – If true, only ‘H’, ‘E’ and ‘C’ secondary structure assignments are used.

Returns:

`self.ss_stats` – SS assignments at each frame as well as the SS propensities computed over the course of the trajectory (i.e., % Helix, % Strand, % Coil).

Return type:

pd.DataFrame,
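The propensities mentioned above are per-residue percentages of frames spent in each secondary structure class. A minimal sketch of that bookkeeping, assuming the simple three-letter alphabet ('H', 'E', 'C') and one assignment string per frame, is shown here. The function name `ss_propensities` and the input layout are assumptions; the actual method works on an MDTraj trajectory and returns a pd.DataFrame.

```python
from collections import Counter

def ss_propensities(assignments):
    """Hypothetical helper: % of frames each residue spends in H, E and C.

    assignments: list of strings, one per frame, one character per residue.
    """
    nframes = len(assignments)
    nresidues = len(assignments[0])
    table = []
    for i in range(nresidues):
        counts = Counter(frame[i] for frame in assignments)
        table.append({code: 100.0 * counts.get(code, 0) / nframes for code in "HEC"})
    return table

# Four frames of a three-residue peptide.
frames = ["HHC", "HEC", "HEC", "CEC"]
for residue, row in enumerate(ss_propensities(frames)):
    print(residue, row)
```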

get_universe()
load_class_from_file(file_name_root)
load_system(topology, traj_files, inmem=True)
save_class(file_name_root)
set_NH_selections(amide_string_list)
Expected input:

  • amide_string_list: a list of strings used to make the atom selection for your backbone amide N atoms and backbone amide H atoms

set_node_type_sele(node_type_sele)
set_num_replicas(num_replicas)
set_selection(atm_str_sel, sys_str_sel)
stride_trajectory(initial=0, final=-1, step=1)

Stride trajectory

Parameters:
  • initial (int,) –

  • final (int,) –

  • step (int) –

TODO: redundant with the MDS object; eventually remove and inherit from mdigest.MDS.

mdigest.core.auxiliary module

mdigest.core.auxiliary.compute_DCC(values_, features_dimension, normalized=True)

Compute normalized dynamical pairwise cross-correlation coefficients for each given replica

Parameters:
  • values (np.ndarray, shape (nframes, natoms*features_dimensions)) – coordinates array

  • features_dimension (int,) – dimension of features array

  • normalized (bool,) – whether to normalize cross-correlation matrix; default is True

Returns:

cross_correlation – cross-correlation matrix

Return type:

np.ndarray, shape (natoms, natoms)
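The normalized coefficient for atoms i and j is commonly defined as C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩), where Δr is the displacement from the mean position. The sketch below implements that definition on nested lists. It is only an illustration: `dcc_matrix` is a hypothetical name, the input layout (frame, atom, xyz) is an assumption, and the library function operates on reshaped NumPy arrays instead.

```python
import math

def dcc_matrix(traj):
    """Hypothetical helper: normalized DCC from traj[frame][atom] = (x, y, z)."""
    nframes = len(traj)
    natoms = len(traj[0])
    # Time-averaged position of every atom.
    mean = [[sum(frame[i][k] for frame in traj) / nframes for k in range(3)]
            for i in range(natoms)]
    # Displacement of every atom from its mean, frame by frame.
    disp = [[[frame[i][k] - mean[i][k] for k in range(3)] for i in range(natoms)]
            for frame in traj]

    def avg_dot(i, j):
        return sum(sum(a * b for a, b in zip(d[i], d[j])) for d in disp) / nframes

    return [[avg_dot(i, j) / math.sqrt(avg_dot(i, i) * avg_dot(j, j))
             for j in range(natoms)] for i in range(natoms)]

# Atom 1 moves in phase with atom 0, atom 2 moves in anti-phase.
traj = [[(-1.0, 0, 0), (-2.0, 0, 0), (1.0, 0, 0)],
        [(1.0, 0, 0), (2.0, 0, 0), (-1.0, 0, 0)]]
print(dcc_matrix(traj)[0])
```

Values close to +1 indicate motion in the same direction, values close to -1 indicate motion in opposite directions.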

mdigest.core.auxiliary.compute_DCC_matrix(values, values_avg, features_dimension)

Compute dynamical cross-correlation matrix (upper triangle) [***]

Parameters:
  • values (np.ndarray, shape (nsamples, nfeature, features_dimension)) – time-series values

  • values_avg (np.ndarray, shape (nfeature, features_dimension)) – mean time-series values (values averaged over time)

  • features_dimension (int,) – dimensionality of features array

Returns:

cross_correlation – dynamical cross correlation matrix

Return type:

np.ndarray, shape (nfeatures, nfeatures)

mdigest.core.auxiliary.compute_distance_matrix(values, features_dimensions)

Compute distance matrix using the average position over the entire length of the given trajectory

Parameters:
  • values (np.ndarray,) – array of values of shape (nframes, nfeatures * features_dimensions)

  • features_dimensions (int,) – dimensionality of features (3 if values contains xyz coordinates)

Returns:

dist_matrix – distance matrix

Return type:

np.ndarray, shape (nfeatures, nfeatures)

mdigest.core.auxiliary.compute_eigenvector_centrality(mat, loc_factor=None, distmat=None, weight='weight')

Compute eigenvector centrality based on a given correlation matrix and pairwise distance matrix

Parameters:
  • mat (np.ndarray,) – adjacency (correlation) matrix to diagonalize

  • loc_factor (float,) – locality factor (filtering threshold)

  • distmat (np.ndarray,) – distance matrix

  • weight (str or None,) – if None, treat the adjacency as binary; if ‘weight’, use the matrix values as edge weights.

Returns:

  • cdict (dict,) – centrality dictionary with nodes as keys and centrality coefficients as values

  • cvec (np.ndarray,) – eigenvector centrality coefficients
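Eigenvector centrality assigns to each node the corresponding component of the leading eigenvector of the adjacency matrix. The sketch below obtains that eigenvector by power iteration instead of a full diagonalization, which is a swapped-in technique chosen to keep the example dependency-free. The function name and the nested-list input are assumptions, and the locality filtering controlled by loc_factor and distmat is not reproduced.

```python
import math

def eigenvector_centrality(adj, max_iter=500, tol=1e-12):
    """Hypothetical helper: leading eigenvector of a symmetric, non-negative matrix."""
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # Multiply by (A + I); the identity shift avoids oscillation on bipartite graphs.
        new = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        new = [v / norm for v in new]
        converged = max(abs(a - b) for a, b in zip(new, vec)) < tol
        vec = new
        if converged:
            break
    return vec

# Star graph: node 0 is connected to nodes 1 and 2.
star = [[0, 1, 1],
        [1, 0, 0],
        [1, 0, 0]]
cvec = eigenvector_centrality(star)
cdict = dict(enumerate(cvec))  # nodes as keys, coefficients as values
print(cdict)
```

The hub of the star receives the largest coefficient, and the two equivalent leaves receive identical values.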

mdigest.core.auxiliary.compute_generalized_correlation_coefficients(values, features_dimension=1, solver='gaussian', correction=True, subset=None)

Rescale mutual information based correlation (gcc) between 0 and 1

Parameters:
  • values (np.ndarray, shape (nsamples, nfeatures, features_dimension)) – time-series values

  • features_dimension (int,) – dimensionality of features array

  • solver (string,) – which solver to use, default ‘gaussian’

  • correction (bool,) – whether to apply correction (subtraction of min value of mutual information)

  • subset (np.array or None,) – indices of features for which to calculate gcc, if None, all features are kept

Returns:

gcc – generalized correlation coefficient matrix

Return type:

np.ndarray,
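The rescaling from mutual information I to a coefficient between 0 and 1 is usually written as r = sqrt(1 − exp(−2I/d)), with d the dimensionality of the features. The sketch below applies that formula to a single value. It assumes this is the rescaling meant here, uses the hypothetical name `gcc_from_mi`, and does not estimate the mutual information itself or apply the optional correction.

```python
import math

def gcc_from_mi(mutual_information, dim=3):
    """Hypothetical helper: map a mutual information value (in nats) to [0, 1)."""
    return math.sqrt(1.0 - math.exp(-2.0 * mutual_information / dim))

# For two 1-D Gaussian variables with Pearson coefficient rho,
# I = -0.5 * ln(1 - rho**2), so the rescaled value recovers |rho|.
rho = 0.6
mi = -0.5 * math.log(1.0 - rho ** 2)
print(gcc_from_mi(mi, dim=1))
```

Uncorrelated features (I = 0) map to 0, and the coefficient approaches 1 as the mutual information grows.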

mdigest.core.auxiliary.coordinate_reshape(values)
Reshape values to shape (nframes, nfeatures*features_dimensions) where
  • nframes is the number of frames of the MD timeseries considered,

  • nfeatures is the number of features, i.e. number of atoms for which values are stored,

  • features_dimensions is the dimensionality of the features (i.e. 3 for xyz-coordinates, 4 for [sin(phi), cos(phi), sin(psi), cos(psi)] dihedral coordinate vector).

Parameters:

values (np.ndarray, shape (nframes, nfeatures, features_dimensions)) – values

Returns:

values_ – reshaped values

Return type:

np.ndarray, shape (nframes, nfeatures * features_dimensions)

mdigest.core.auxiliary.corr(mat)

Compute Pearson correlation

Parameters:

mat (np.ndarray,) – is an input matrix of shape (nsamples, nfeatures)

Returns:

corrmat – pearson correlation matrix

Return type:

np.ndarray, shape (nfeatures, nfeatures)

mdigest.core.auxiliary.create_graph(adj_matx)

Create nx.Graph instance from a np.ndarray adjacency_matrix

mdigest.core.auxiliary.evaluate_covariance_matrix(values, center='square_disp')

Compute the covariance matrix of features values.

Parameters:
  • values (np.ndarray, shape (nframes, nfeatures, features_dimensions)) – matrix of displacements from mean values

  • center (str,) – if ‘mean’, remove the mean; if ‘square_disp’, compute square displacements of the values array (default option)

Returns:

covar_mat – covariance matrix

Return type:

np.ndarray, shape (nfeatures, nfeatures)

mdigest.core.auxiliary.filter_adjacency(mat, distmat, loc_factor)

Filter the adjacency matrix using an exponential to damp long-range contacts (and hence focus on short-range ones)

Parameters:
  • mat (np.ndarray,) – matrix

  • distmat (np.ndarray,) – matrix based on which to prune mat

  • loc_factor (float,) – locality factor (in ångströms), suggested 5 Å

Returns:

adj_matx – filtered matrix

Return type:

np.ndarray,
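One plausible form of such a filter multiplies each matrix entry by exp(−d_ij / loc_factor), so that pairs separated by much more than the locality factor contribute little. The sketch below implements that form. The exact expression used by the library is not stated in this reference, so the formula, the function name `damp_long_range` and the nested-list input are all assumptions.

```python
import math

def damp_long_range(mat, distmat, loc_factor=5.0):
    """Hypothetical helper: scale mat[i][j] by exp(-distmat[i][j] / loc_factor)."""
    return [[value * math.exp(-dist / loc_factor)
             for value, dist in zip(mat_row, dist_row)]
            for mat_row, dist_row in zip(mat, distmat)]

correlations = [[1.0, 2.0],
                [2.0, 1.0]]
distances = [[0.0, 5.0],   # distances in ångströms
             [5.0, 0.0]]
print(damp_long_range(correlations, distances, loc_factor=5.0))
```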

mdigest.core.auxiliary.filter_adjacency_inv(mat, distmat, loc_factor)

Filter the adjacency matrix using an exponential to damp short-range contacts (and hence focus on long-range ones)

Parameters:
  • mat (np.ndarray,) – matrix

  • distmat (np.ndarray,) – matrix based on which to prune mat

  • loc_factor (float,) – locality factor (in ångströms), suggested 5 Å

Returns:

adj_matx – filtered matrix

Return type:

np.ndarray,

mdigest.core.auxiliary.get_centrality_df(input_class, cent_don, cent_acc, cent, selection)

Create dataframe containing eigenvector centrality from KS energy and from alpha carbon displacements

Parameters:
  • input_class (object,) –

  • cent_don (np.ndarray,) –

  • cent_acc (np.ndarray,) –

  • cent (np.ndarray,) –

  • selection (str,) –

Returns:

temp_df – data frame containing electrostatic-centrality and CA-centrality

Return type:

pd.DataFrame()

mdigest.core.auxiliary.linfit(x, y)

Return R^2 where x and y are array-like

mdigest.core.auxiliary.prune_adjacency(mat, distmat, loc_factor=5.0, greater=False, lower=False)

Zero out adjacency matrix values according to the locality factor. By default, entries corresponding to atom pairs at a distance equal to or greater than the locality factor (loc_factor) are set to zero.

Parameters:
  • mat (np.ndarray) – adjacency matrix

  • distmat (np.ndarray) – distance matrix

  • loc_factor (float) – locality factor; defines the distance threshold to use for pruning

  • greater (bool) – whether to prune residues pairs at distances GREATER than the locality factor, default is False

  • lower (bool) – whether to prune residue pairs at distances LOWER than the locality factor, default is False

Returns:

mat – pruned adjacency matrix

Return type:

np.ndarray,

mdigest.core.auxiliary.reduce_trajectory(universe, segIDs)

It can be useful to work on a reduced trajectory without hydrogens, especially for visualization. This function can be used to generate a reduced MDAnalysis universe.

Parameters:
  • universe (MDAnalysis.core.Universe object) –

  • segIDs (str,) – segIDs of groups of atoms to keep

Returns:

reduced – reduced universe

Return type:

MDAnalysis.core.Universe object

mdigest.core.auxiliary.sorted_eig(A)

Sort eigenvalues from largest to smallest and reorder eigenvectors accordingly

Parameters:

A (array, shape (nfeatures, nfeatures)) – array of eigenvalues

Returns:

  • eigenValues (np.ndarray, shape (nfeatures))

  • eigenVectors (np.ndarray, shape (nfeatures,nfeatures))

mdigest.core.auxiliary.to_pickle(dataframe, output)

Dump dataframe to file using pickle

Parameters:
  • dataframe (pd.DataFrame,) – dataframe to pickle

  • output (str,) – output name with path

mdigest.core.auxiliary.vec_query(arr, my_dict)

Convert dictionary to array

mdigest.core.correlation module

class mdigest.core.correlation.DynCorr(MDSIM)

Bases: object

General purpose class handling computation of different correlation metrics from atomic displacements sampled over MD trajectories.

edge_exclusion(spatial_cutoff=4.5, contact_cutoff=0.75, save_name='none')

Compute the so-called edge exclusion matrix. When analyzing correlations it can be useful to analyze only correlation corresponding to residue pairs that are proximal (below a certain distance threshold) for a certain percentage of the trajectory frames. This function takes care of computing this using capped_distance from MDAnalysis.

Parameters:
  • spatial_cutoff (float,) – distance threshold below which atoms are considered in contact (in ångströms)

  • contact_cutoff (float,) – minimum contact persistence, expressed as the fraction of trajectory frames in which a pair is in contact, required to keep the pair

  • save_name – output filename
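The persistence criterion can be sketched as follows: a pair is kept when its distance falls within spatial_cutoff in at least a fraction contact_cutoff of the frames. The example works on precomputed per-frame distance matrices stored as nested lists. The function name, the input layout and the choice of inclusive comparisons are assumptions, and the MDAnalysis capped_distance search is not reproduced.

```python
def persistent_contacts(distances_per_frame, spatial_cutoff=4.5, contact_cutoff=0.75):
    """Hypothetical helper: True where a pair is in contact often enough."""
    nframes = len(distances_per_frame)
    n = len(distances_per_frame[0])
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Count the frames in which the pair is within the spatial cutoff.
            hits = sum(1 for frame in distances_per_frame
                       if frame[i][j] <= spatial_cutoff)
            keep[i][j] = hits / nframes >= contact_cutoff
    return keep

# Pair (0, 1) is in contact in 3 of 4 frames, pair (0, 2) in 2 of 4.
frames = [
    [[0.0, 3.0, 3.0], [3.0, 0.0, 9.0], [3.0, 9.0, 0.0]],
    [[0.0, 3.0, 3.0], [3.0, 0.0, 9.0], [3.0, 9.0, 0.0]],
    [[0.0, 3.0, 10.0], [3.0, 0.0, 9.0], [10.0, 9.0, 0.0]],
    [[0.0, 10.0, 10.0], [10.0, 0.0, 9.0], [10.0, 9.0, 0.0]],
]
print(persistent_contacts(frames))
```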

parse_dynamics(scale=False, normalize=True, LMI='gaussian', MI='knn_5_1', DCC=False, PCC=False, COV_DISP=False, VERBOSE=False, **kwargs)

Parse molecular dynamics trajectory and compute different correlation metrics

Parameters:
  • scale (bool,) – whether to remove mean from coordinates using StandardScaler

  • normalize (bool,) – whether to normalize cross-correlation matrices

  • LMI (str or None,) –

    • None to skip computation of LMI based correlation

    • ’gaussian’ to compute LMI

  • MI (str or None,) –

    • None to skip computation of MI based correlation

    • ‘knn_arg1_arg2’ to compute MI, with k = arg1 and estimator = arg2; default is ‘knn_5_1’

  • DCC (bool,) – whether to compute dynamical cross-correlation matrix of atomic displacements. Default is False

  • PCC (bool,) – whether to compute Pearson correlation matrix of atomic displacements. Default is False

  • COV_DISP (bool,) – whether to compute the covariance of atomic displacements. Default is False

  • VERBOSE (bool,) – whether to set verbose printing
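The composite MI argument packs three pieces of information into one string. A small sketch of how such a string could be split is given below; `parse_mi_spec` is a hypothetical helper written for illustration and does not reflect how the library parses the argument internally.

```python
def parse_mi_spec(spec):
    """Hypothetical helper: split 'knn_<k>_<estimator>' into its components."""
    if spec is None:
        return None  # MI computation is skipped
    solver, k, estimator = spec.split("_")
    if solver != "knn":
        raise ValueError(f"unsupported MI solver: {solver!r}")
    return {"solver": solver, "k": int(k), "estimator": int(estimator)}

print(parse_mi_spec("knn_5_1"))
```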

save_class(file_name_root='./output/cache/')

Can be used to dump to file all correlation analyses (produced upon calling the correlation, kscorrelation, and dcorrelation modules) or the community analysis.

Parameters:

file_name_root (str) – filename rootname

mdigest.core.dcorrelation module

class mdigest.core.dcorrelation.DihDynCorr(MDSIM)

Bases: object

Correlated motions of dihedrals

parse_dih_dynamics(mean_center=True, LMI='gaussian', MI='knn_5_1', DCC=False, PCC=False, COV_DISP=True, **kwargs)

General purpose class handling computation of different correlation metrics from $\phi$, $\psi$ backbone dihedral fluctuations sampled over MD trajectories. Dihedrals are transformed using $\phi \rightarrow \{\sin(\phi), \cos(\phi)\}$ and $\psi \rightarrow \{\sin(\psi), \cos(\psi)\}$ such that each residue (terminal residues excluded) is described by an array of four entries $[\sin(\phi), \cos(\phi), \sin(\psi), \cos(\psi)]$.

Parameters:
  • mean_center (bool) – whether to subtract the mean

  • LMI (str or None; default 'gaussian') –

    • ‘gaussian’ for using gaussian estimator

    • None: skip computation of linearized mutual information based correlation

  • MI (str or None, default 'knn_5_1') – composite argument where knn specifies use of the k-nearest neighbor algorithm, 5 specifies the number of nearest neighbors, and 1 specifies the estimate to use (options are 1 or 2)

  • DCC (bool,) – whether to compute dynamical cross correlation

  • PCC (bool,) – whether to compute Pearson’s cross correlation

  • COV_DISP (bool,) – whether to compute covariance of dihedrals displacements

  • kwargs

    • normalized: bool

      whether to normalize DCC matrix

    • subset: list,

      list of indices specifying the nodes for which to compute MI

    • center: str or None

      How to compute the covariance matrix; possible values are ‘mean’ or ‘square_disp’
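The sine/cosine transformation of the dihedrals described for this method can be sketched in a few lines. It removes the discontinuity at ±180°, so that two nearly identical conformations map to nearly identical feature vectors. The function name `dihedral_features` and the use of degrees as input are assumptions made for illustration.

```python
import math

def dihedral_features(phi_deg, psi_deg):
    """Hypothetical helper: [sin(phi), cos(phi), sin(psi), cos(psi)] for one residue."""
    phi = math.radians(phi_deg)
    psi = math.radians(psi_deg)
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]

# +180 and -180 degrees describe the same conformation and give the same features.
print(dihedral_features(180.0, 0.0))
print(dihedral_features(-180.0, 0.0))
```

With this representation a trajectory of nresidues − 2 non-terminal residues yields features of dimension 4 per residue, which is the features_dimension expected by the correlation routines.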

save_class(file_name_root='./output/cache/')

Save DihDynCorr class instances to file

Parameters:

file_name_root (str) – path where to save class

mdigest.core.imports module

mdigest.core.kscorrelation module

class mdigest.core.kscorrelation.KS_Box(KS, attrlist)

Bases: object

Use as collector to free up memory

class mdigest.core.kscorrelation.KS_Energy(MDSIM)

Bases: object

General purpose class handling computation of Kabsch-Sander analysis over MD trajectories.

KS_pipeline(topology_charges=False, covariance=False, MI=None, **kwargs)

KS Pipeline

Parameters:
  • topology_charges (bool,) – whether to use topology charges in KS calculation

  • covariance (bool,) – whether to compute covariance of KS_energies

  • MI (str or None,) –

    • if None skip computation of MI based correlation

    • if ‘knn_arg1_arg2’ compute MI using k=arg1, and estimator=arg2; default is ‘knn_5_1’

compute_EEC(distance_matrix=None, loc_factor=None, don_acc=True)

Compute Electrostatic Eigenvector Centrality (EEC), donor/acceptor/don_acc centrality for each replica

Parameters:
  • distance_matrix (default None,) – distance matrix to provide when loc_factor != 0; it is used to zero out the adjacency matrix values corresponding to distances exceeding loc_factor

  • loc_factor (float,) – filtering threshold for selection of specific correlation range

  • don_acc (bool,) – whether to compute DA (donor_acceptor) and D+A (donor+acceptor) centralities

compute_KS_energy(dist_dict, topology_charges=False)

Perform KS calculation

Parameters:
  • dist_dict (dict,) – single dictionary containing residue-to-residues backbone atom distances for a given replica

  • topology_charges (bool,) – if True, self.q1q2 is expected to be filled with charges array

Returns:

KS_energies – KS_energies

Return type:

np.ndarray,
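For orientation, the Kabsch-Sander electrostatic energy between the C=O group of one residue and the N-H group of another is E = q1·q2·f·(1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN), with distances in ångströms. The sketch below uses the DSSP constants q1 = 0.42 e, q2 = 0.20 e and f = 332 kcal·Å/(mol·e²). Whether mdigest uses exactly these defaults when topology_charges is False is an assumption, and `kabsch_sander_energy` is a hypothetical scalar helper rather than the array-based method documented here.

```python
def kabsch_sander_energy(r_on, r_ch, r_oh, r_cn, q1q2=0.42 * 0.20, f=332.0):
    """Hypothetical helper: Kabsch-Sander energy in kcal/mol for one CO/NH pair."""
    return q1q2 * f * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)

# Idealized linear C=O...H-N arrangement (distances in ångströms).
energy = kabsch_sander_energy(r_on=2.9, r_ch=3.13, r_oh=1.9, r_cn=4.13)
print(round(energy, 2))

# DSSP counts a hydrogen bond when the energy is below -0.5 kcal/mol.
print(energy < -0.5)
```

The four distance arguments correspond to the ON, CH, OH and CN keys of the backbone distance dictionary.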

compute_distances_parallel(beg, end, stride, remap=False)

Compute distances in parallel

Parameters:
  • beg (int,) – initial frame

  • end (int,) – end frame

  • stride (int,) – step

  • remap (bool,) – if True assumes remapping is unneeded (all distance matrices have the same dimension)

Returns:

bb_dist_dict – dictionary with CN, CH, OH, ON as keys and np.ndarrays with the corresponding distance arrays as values

Return type:

dict

prepare_kabsch_sander_arrays()

Prepare Kabsch-Sander calculation

save_class(file_name_root='./output/cache/', save_space=False)

Save MDS class instances to file

Parameters:
  • file_name_root (str,) – path where to save class

  • save_space (bool,) – if False bb_distances_allrep and KS_energies_allrep are not dumped to file

set_backbone_dictionary(backbone_dictionary)

Set backbone dictionary

Parameters:

backbone_dictionary (dict,) – backbone dictionary specifying the atom name of each backbone atom. Names should match those in the topology files

Examples

KS_energy.set_backbone_dictionary({'N-Backbone':'N', 'O-Backbone':'O','C-Backbone':'C', 'CA-Backbone':'CA', 'H-Backbone':'H'})

set_charges_array(chargeOtimeschargeN)

Set charges array

Parameters:

chargeOtimeschargeN (np.ndarray,) – array of dimensions (nresidues, nresidues) whose entries $(q_1 q_2)_{ij}$ are the products of the i-th CO and j-th NH residue charges extracted from the topology

set_offset(offset)

Set offset

Parameters:

offset (int,) – integer to subtract from the residue indices so that the resindices list starts from 0; set it when the residue indices in the topology start at a number other than 0

set_selection(atom_group_selstr, system_selstr='all')

Set selection strings

Parameters:
  • system_selstr (str,) – selection string to be used for extracting a subset of the atoms (system) on which to perform analysis

  • atom_group_selstr (list of str,) – selection strings to be used for selecting subsets of atoms from the system: a list of four selection strings containing, in order, the N, O, C, H backbone selection strings.

Examples

KS_Energy.set_selection(['protein and backbone and name N','protein and backbone and name O', 'protein and backbone and name C','protein and name H'], system_selstr='protein')

mdigest.core.networkcanvas module

class mdigest.core.networkcanvas.ProcCorr

Bases: object

Process correlation matrix to make the desired arrays amenable for visualization.

corrNetworkPymol(pdb_file, corr_matrix, pml_out_file, frame=0, selection=None, lthr_filter=None, uthr_filter=None, edge_scaling=1, chainblocks=True, **kwargs)

A useful practice is to inspect the shape of the correlation networks. This function provides a way to visualize the correlation patterns on the protein structure. The function saves a pml file that contains the user selected correlation values which can be executed in pymol to produce a png of the correlation [***]

Parameters:
  • pdb_file (str,) – filename (path+filename+extension) of the PDB that will then be read in pymol. If the file is not found, the utils module is called to write a pdb file at the path specified by pdb_file. It is crucial that this pdb file corresponds to the trajectory frame used for reading the coordinates (specified by the variable frame)

  • frame (int,) – frame number

  • corr_matrix (np.ndarray square matrix of floats,) – correlation matrix

  • pml_out_file (str,) – output pml rootname

  • selection (None or MDAnalysis AtomGroup object,) –

    • if None: selection is overwritten with self.atom_group_selstr

    • else takes in MDAnalysis AtomGroup.

  • lthr_filter (float,) – lower bound of the correlation interval to visualize. Only correlation values greater than lthr_filter are written to the PML file.

  • uthr_filter (float,) – upper bound of the correlation interval to visualize. Only correlation values equal to or lower than uthr_filter are written to the PML file.

  • edge_scaling (float,) – multiplicative factor used to scale the correlation values, which adjusts the radius of the cylinders displayed in PyMOL. Recommended values are between 0.01 and 2.00.

  • chainblocks (bool,) – If True and universe contains multiple chains, separate file for inter and intra-chain correlations are printed out.

filter_by_distance(matrixtype, distmat=False)

Zero out correlations lower than self.lower_thr and higher than self.upper_thr in a given correlation matrix. When filtering by distance, correlations between residues closer to each other than self.lower_thr, or farther from each other than self.upper_thr, are set to zero. This filter can be applied to select correlations falling within a distance window of interest.

Parameters:
  • matrixtype (str,) – used to select a desired correlation matrix and also used as prefix for the output files.

  • distmat (bool,) – default is False, which results in pruning based on correlation values. Upper and lower thresholds for pruning are set by call to set_thresholds()

get_selection_fromMDS(MDS)

Retrieve atomstring selection

Parameters:

MDS (mdigest.MDS object) –

load_matrix_dictionary(matrix_dictionary)

Populate matrix_dictionary attribute with kwargs

Parameters:

matrix_dictionary (dict,) – dictionary with format of {‘matrix_label’: np.ndarray} containing correlation matrices to visualize

populate_attributes(matrixdict)

Create attributes of ProcCorr class corresponding to the keys of matrix_dictionary

Parameters:

matrixdict (dict,) – example: matrixdict = {‘matrix_label’: np.ndarray}

select_frame_coordinates(frame)

Select frame

set_outputparams(params)

Set output parameters

set_selection(atom_group_selstr)

Set atom group selection string

Parameters:

atom_group_selstr (str) –

selection string

example: ‘name CA’

set_thresholds(unit='au', prune_upon=False, **kwargs)
Parameters:
  • unit (str,) – unit, default a.u., possible values ‘nm’, ‘au’

  • prune_upon (False or str,) – where str is the matrix_label of the array to use for pruning the correlations

  • kwargs (dict,) –

    • lower_thr

    • upper_thr

    • loc_factor

    • inv_loc_factor

source_universe(universe)

Source mda.Universe

Parameters:

universe (mda.Universe object) –

to_df(normalize=False, **kwargs)

Save filtered matrices to pandas DataFrame

Parameters:
  • normalize (bool,) –

  • kwargs (dict,) –

    • which: str,

      name of matrix (column) on which to apply normalization

    • to_range: range,

      range for normalization

writePDBforframe(frame, outpdb)

Write PDB for a selected frame

Parameters:
  • frame (int,) – selected frame for which to write PDB

  • outpdb (str,) – output filename

mdigest.core.networkcanvas.display_community(path, sys, view, community_lookup, color_dict, outpath)

Pymol function to display communities on secondary structure

Parameters:
  • path (str,) – path to pdb

  • sys (str,) – name of pdb to load (without extension)

  • view (set,) –

    orientation matrix copied from pymol get_view()

    example: view = ( -0.611808956, -0.140187785, 0.778481722, 0.387945682, 0.804496109, 0.449760556, -0.689334452, 0.577175796, -0.437813699, 0.000000000, 0.000000000, -219.895217896, 45.563232422, 57.541908264, 47.740921021, 85.014022827, 354.776367188, 20.000000000 )

  • community_lookup (mdigest.CMTY.nodes_communities_collect object,) – collected output from MD trajectory community analysis

  • color_dict (dict,) – dictionary with color names as keys and rgb codes as values

  • outpath (str,) – where to save png, format should be outpath = ‘/path/to/png/’, png will be saved as outpath_communities.png

mdigest.core.networkcanvas.draw_electrostatic_network(communities_path, edges_path, save_path, fetch_pdb=None, pse_path=None, edge_multiplier=5, color_ss=True)

Draw electrostatic network

Parameters:
  • communities_path (str,) – path to communities txt file

  • edges_path (str,) – path to edges text file

  • save_path (str,) – path to save pse file

  • fetch_pdb (str,) – PDB ID to fetch, default=None

  • pse_path (str,) – path to structural file if fetch_pdb=None, default=None

  • edge_multiplier (int,) – multiplicative factor for edge widths in visualization, default=5

  • color_ss (bool,) – whether to color structure by secondary structure, default=True

Return type:

PyMOL .pse file

mdigest.core.networkcanvas.ss_network(ss_stats, gcc, nodes_save_path, edges_save_path, num_sd=1.5)

Secondary structures network

Parameters:
  • ss_stats (pd.DataFrame,) – dataframe of secondary structure information, obtained from self.ss_stats

  • gcc (np.ndarray of shape (nfeatures*nfeatures),) – pairwise generalized correlation coefficients

  • nodes_save_path (str,) – path to save dictionary of nodes

  • edges_save_path (str,) – path to save dictionary of edges

  • num_sd (int,) – minimum standard deviations above the mean value for an edge between nodes to be considered as significant

Returns:

  • dictionary of nodes (dict,)

  • dictionary of edges (dict)
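The num_sd criterion can be read as keeping only edges whose weight exceeds the mean by at least num_sd standard deviations. The sketch below applies that rule to a dictionary of edge weights. The use of the population standard deviation, the strict comparison, the function name `significant_edges` and the dictionary layout are all assumptions; how the library groups residues into secondary structure nodes is not reproduced.

```python
import statistics

def significant_edges(weights, num_sd=1.5):
    """Hypothetical helper: keep edges heavier than mean + num_sd * std."""
    values = list(weights.values())
    threshold = statistics.mean(values) + num_sd * statistics.pstdev(values)
    return {edge: w for edge, w in weights.items() if w > threshold}

# Correlations between five pairs of secondary structure elements.
weights = {("H1", "H2"): 0.1, ("H1", "E1"): 0.1, ("H2", "E1"): 0.1,
           ("H2", "E2"): 0.1, ("E1", "E2"): 0.9}
print(significant_edges(weights))
```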

mdigest.core.networkcommunities module

class mdigest.core.networkcommunities.CMTY

Bases: object

assign_filters(filters=None)
Parameters:

filters (dict,) – dictionary with format of {‘exclusion’: bool, ‘filter_by’: bool}, specifying whether to apply specified filter (Default is False for all keys.)

best_iteration_louvain(G, nodes_comm_alliter, comm_list_alliter, partitions_alliter)

Select the best iteration from Louvain heuristic algorithm by selecting the instance with the highest modularity

Parameters:
  • G (nx.Graph(),) – a networkx protein graph

  • nodes_comm_alliter (dict,) – dictionary containing the list of nodes in each community at every iteration of the Louvain heuristic algorithm

  • comm_list_alliter (list,) – contains the list of nodes for each community at every iteration of Louvain heuristic algorithm

  • partitions_alliter (list) – partitions at every iteration of Louvain heuristic algorithm

Returns:

  • best_iteration_comm_nodes (dict,) – community nodes for the best iteration

  • best_comm_list (list,) – list containing the index of the communities in the best run

  • best_partitions (dict,) – best partitions
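Selecting the best iteration amounts to computing the modularity Q of each candidate partition and keeping the maximum. The sketch below does this for an undirected graph given as an adjacency matrix, using Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). The function names and the list-based inputs are assumptions; the library works on networkx graphs and on the Louvain output instead.

```python
def modularity(adj, partition):
    """Hypothetical helper: Newman modularity of a node -> community assignment."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if partition[i] == partition[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def best_partition(adj, partitions):
    """Return the candidate partition with the highest modularity."""
    return max(partitions, key=lambda p: modularity(adj, p))

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
candidates = [[0, 0, 0, 0, 0, 0],   # everything in one community
              [0, 0, 0, 1, 1, 1]]   # one community per triangle
print(best_partition(adj, candidates))
```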

calculate_betweenness()

Calculate betweenness in entire system, for each matrix instance (entry in graph list). Store each in an ordered dictionary

community_data(G, partition)

Compact all information on the communities into a dictionary

Parameters:
  • G (nx.Graph(),) – a networkx protein graph

  • partition (dict,) – dictionary containing the different partitions

Returns:

  • nodes_communities (dict,) – a dictionary containing data relative to each community:

    • community labels,

    • community index ordered by modularity,

    • community index ordered by eigenvector centrality,

    • community nodes (list of nodes in each partition)

  • n_communities (int)

compute_optimal_paths()

Compute optimal source-target paths using the Floyd-Warshall algorithm
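The Floyd-Warshall algorithm computes all-pairs shortest paths in a weighted graph and, with a successor table, allows any individual path to be reconstructed afterwards. The sketch below is a generic textbook version on a matrix of edge lengths. The function names and the input format are assumptions, and how mdigest converts correlations into edge lengths is not covered.

```python
import math

def floyd_warshall(lengths):
    """Hypothetical helper: all-pairs shortest distances and successor table.

    lengths[i][j] is the edge length, math.inf if there is no edge, 0 on the diagonal.
    """
    n = len(lengths)
    dist = [row[:] for row in lengths]
    nxt = [[j if lengths[i][j] != math.inf else None for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Route i -> j through k when that is shorter.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def reconstruct_path(nxt, source, target):
    """Follow the successor table from source to target."""
    if nxt[source][target] is None:
        return []
    path = [source]
    while source != target:
        source = nxt[source][target]
        path.append(source)
    return path

lengths = [[0, 1, 10],
           [1, 0, 2],
           [10, 2, 0]]
dist, nxt = floyd_warshall(lengths)
print(dist[0][2], reconstruct_path(nxt, 0, 2))
```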

create_matrix_dict(matrix_dictionary)

Populate matrix_dictionary attribute with kwargs

Parameters:

matrix_dictionary (dict,) – dictionary with format of {'matrix_label': np.array or class_object.matrix_attribute} containing matrices to feed into the community pipeline

get_degree(instance=0)
girvannewman(MVE=None)

Computes Girvan Newman algorithm, slightly faster than Run_Girvan_Newman(), uses builtin functions from nx

Parameters:

MVE (function) – function to calculate most valuable edge. Default is None, which is equivalent to calling most_valuable_edge_nx().

load_graph(distance_threshold=5.0)

General function to load graph

Parameters:

distance_threshold (float) – threshold applied in prune_adjacency

most_valuable_edge(G, count_entries=False, normalized=False, weight='weight')

Returns most valuable edge according to edge_betweenness_centrality criterion

Parameters:
  • count_entries (bool,) – if True, print betweenness values calculated without averaging over all shortest paths; this is how betweenness values are calculated in the original floyd_warshall.c code.

  • normalized (bool,) – whether betweenness values are normalized

  • weight (str,) – default ‘weight’, uses graph weights

Returns:

  • (edge_key[0], edge_key[1]) (tuple,) – edge with the maximum betweenness

  • maxbet (float) – maximum betweenness

most_valuable_edge_fw(G)

Returns most valuable edge, using the Floyd-Warshall algorithm

populate_filters(filters_dictionary)

Populate exclusion_matrix and distance_matrix attributes with filters_dictionary.

Parameters:

filters_dictionary (dict,) – dictionary with format filters_dictionary={'exclusion_matrix': mat}; mat can be either None, np.array or class_object.matrix_attribute or dict containing an exclusion matrix for each matrix in self.matrix_dict. The list of keys must match those in self.matrix_dict. To include filtering by distance, use filters_dictionary={'distance_matrix': mat}, where mat is as np.array or class_object.matrix_attribute or dict containing a distance matrix for each entry in self.matrix_dict. The list of keys must match those in self.matrix_dict.

Examples

  • filters_dictionary = {'exclusion_matrix': np.array} or filters_dictionary = {'distance_matrix': np.array}, or

  • filters_dictionary = {'exclusion_matrix': {'ca_lmi_rep_0': np.array, 'ca_lmi_rep_1': np.array}, 'distance_matrix': {'ca_lmi_rep_0': np.array, 'ca_lmi_rep_1': np.array}}

Keys in the exclusion matrix and distance matrix dictionaries have to match.

retrieve_path(node_A, node_B, instance)

Reconstruct path

run_cmty_louvain(setgraph=0)

Runs Louvain community detection on one graph and returns a dictionary with the communities at each iteration of the Louvain procedure. This can be useful to check consistency across different Louvain iterations for a single replica.

Parameters:

setgraph (int,) – default 0; integer corresponding to the replica on which to apply the Louvain algorithm

run_cmtys_louvain(aggregate=False, **kwargs)

Community generation using the Louvain protocol. Iterates over multiple replicas and, for each, saves the partition with the highest modularity to a dictionary.

Parameters:
  • aggregate (bool,) – whether to group communities by redistributing nodes of communities smaller than a given threshold over the other communities. Aggregation assigns each node to the partition that yields the largest modularity.

  • kwargs (dict,) – use threshold=int to set the threshold for regrouping communities; default is 5 (communities with <= 5 elements are redistributed)
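The aggregation idea described for aggregate=True can be sketched in a few lines. This is a simplified illustration on an unweighted graph, not the mdigest implementation: the function names, the edge-list input, and the greedy node-by-node order are all assumptions.

```python
from collections import Counter


def modularity(edges, partition):
    """Newman modularity of an unweighted, undirected graph.

    edges is a list of (u, v) pairs; partition maps node -> community id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = Counter()    # edges with both ends in the same community
    degree_sum = Counter()  # summed node degrees per community
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def aggregate_small(edges, partition, threshold=5):
    """Reassign nodes of communities with <= threshold members.

    Each such node is tried in every retained community and kept in the
    one that gives the largest modularity.
    """
    sizes = Counter(partition.values())
    keep = sorted(c for c, size in sizes.items() if size > threshold)
    if not keep:
        return dict(partition)  # nothing large enough to absorb nodes
    result = dict(partition)
    for node in sorted(n for n, c in partition.items() if c not in keep):
        best = max(keep, key=lambda c: modularity(edges, {**result, node: c}))
        result[node] = best
    return result


# Two triangles plus node 6, which is attached to nodes 0 and 1.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (6, 0), (6, 1)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
merged = aggregate_small(edges, partition, threshold=1)
q_merged = modularity(edges, merged)
```

Here the singleton community holding node 6 falls under the threshold, and the node is absorbed by the triangle it is connected to, since that assignment gives the higher modularity.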

save_class(file_name_root='../output/cache/community', save_space=True)

General function to save instances of the CMTY class to file

Parameters:

file_name_root (str,) – file rootname

set_parameters(parameters)

Set parameters

sort_cmty(cycles, setgraph=-1)

Sort communities and store the sorted indices, ordered according to different metrics, in a dictionary

Parameters:
  • cycles (int,) – assign the number of cycles to match the number of replicas (number of graphs on which to iterate)

  • setgraph (int,) – which graph to use; if -1 use graph corresponding to louvain cycle

mdigest.core.networkcommunities.display_shortes_path(nvView, path, dists, max_direct_dist, selected_atomnodes, opacity=0.75, color='green', side='both', segments=5, disable_impostor=True, use_cylinder=True)

Display shortest paths

mdigest.core.parsetrajectory module

# @author: fmaschietto, bcallen95

class mdigest.core.parsetrajectory.MDS

Bases: object

Parse molecular dynamics trajectories

align_traj(inmem=True, reference=None, selection='protein')

Align the trajectory to the specified selection using the align protocol from MDAnalysis

Parameters:
  • inmem (bool, default True,) – whether to load the trajectory in memory

  • reference (bool or None, default None,) – a reference universe against which to perform the alignment can be specified

  • selection (str,) – selection string to select atoms against which to perform alignment

get_universe()

Retrieve universe

load_system(topology, traj_files, inmem=True)

Load MDA universe from topology and trajectory

Parameters:
  • topology (str,) – path to topology file

  • traj_files (str or list of str,) – string or list of strings specifying the path(s) to the trajectory file(s)

  • inmem (bool,) – whether to load trajectory in memory

set_num_replicas(num_replicas)

Set the number of replicas

Parameters:

num_replicas (int,) – number of concatenated replicas
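When replicas are concatenated into one trajectory, downstream analysis needs to know which frames belong to which replica. A minimal sketch of that bookkeeping follows; the helper replica_windows is hypothetical, and it assumes every replica contributes the same number of frames.

```python
def replica_windows(n_frames, num_replicas):
    """Split n_frames into num_replicas contiguous (start, stop) windows.

    Assumes every replica contributes the same number of frames; stop is
    exclusive, as in a Python slice.
    """
    if num_replicas < 1 or n_frames < num_replicas or n_frames % num_replicas:
        raise ValueError("n_frames must be a positive multiple of num_replicas")
    length = n_frames // num_replicas
    return [(r * length, (r + 1) * length) for r in range(num_replicas)]


# Three replicas of 100 frames each, concatenated into 300 frames.
windows = replica_windows(n_frames=300, num_replicas=3)
```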

set_selection(atom_group_selstr='protein and name CA', system_selstr='protein')

Set selection strings

Parameters:
  • system_selstr (str,) – selection string used to select the portion of the system to consider when computing the exclusion matrix, for example “protein”

  • atom_group_selstr (str,) – selection string used to select the subset of nodes on which to perform the analysis, for example “protein and name CA”

source_system(mda_universe)

Source MDA universe

Parameters:

mda_universe (mda.Universe object,) – MDA universe object

stride_trajectory(initial=0, final=-1, step=1)

Stride trajectory

Parameters:
  • initial (int,) – initial frame from which to start reading in the trajectory

  • final (int,) – final frame to consider when reading in the trajectory

  • step (int,) – step to use when slicing the traj frames
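The effect of the three parameters on the set of retained frames can be illustrated with plain slicing. The helper below is hypothetical, and it assumes that final=-1 means "through the last frame" and that any other value of final acts as an exclusive stop; the actual convention used by stride_trajectory should be checked against the source.

```python
def strided_frames(n_frames, initial=0, final=-1, step=1):
    """Frame indices kept after striding.

    Assumption: final=-1 means "up to and including the last frame";
    any other value is an exclusive stop, as in a Python slice.
    """
    stop = n_frames if final == -1 else final
    return list(range(n_frames))[initial:stop:step]


# Keep every third frame of a 10-frame trajectory, starting at frame 2.
kept = strided_frames(n_frames=10, initial=2, step=3)
```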

mdigest.core.savedata module

# @author: fmaschietto, bcallen95

class mdigest.core.savedata.MDSdata

Bases: object

Save instances of mdigest.DynCorr, mdigest.DihDynCorr, mdigest.KS_Energy, and mdigest.CMTY for easy access. [**] function structure adapted from https://github.com/melomcr/dynetan

load_from_file(file_name_root, save_space=False)

Reads cached data and loads attributes

save_to_file(file_name_root, save_space=False)

Opens the HDF5 file and stores all data

Parameters:
  • file_name_root (str,) – file rootname

  • save_space (bool,) – if set to True avoid saving to file some very large attributes.

mdigest.core.toolkit module

# @author: fmaschietto, bcallen95

mdigest.core.toolkit.dict2list(dictoflists)

Convert dictionary to list

mdigest.core.toolkit.dump(filepath, array_input)

Dump np.ndarray to file

Parameters:
  • filepath (str,) – output path

  • array_input (np.ndarray,) – array to save

Return type:

pickle binary output file

mdigest.core.toolkit.file_exists(filepath)

Check if file exists

Parameters:

filepath (str,) – path to file

Returns:

whether file is in path

Return type:

bool

mdigest.core.toolkit.folder_exists(path_to_folder)

Check if directory exists, create if not

Parameters:

path_to_folder (str,) – path to folder
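A "check, then create if missing" helper of this kind can be written with the standard library alone. The sketch below uses a hypothetical name, ensure_folder, and its boolean return value is an assumption made for illustration; the return value of folder_exists is not documented here.

```python
import os
import tempfile


def ensure_folder(path_to_folder):
    """Create path_to_folder (and parents) if missing.

    Returns True if the directory had to be created, False if it was
    already there (return convention chosen for this sketch).
    """
    if os.path.isdir(path_to_folder):
        return False
    os.makedirs(path_to_folder, exist_ok=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output", "cache")
    created_first = ensure_folder(target)   # directory is created
    created_again = ensure_folder(target)   # directory already exists
    exists_after = os.path.isdir(target)
```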

mdigest.core.toolkit.get_NGLselection_from_node(node_idx, atomsel, atom=True)

Create an atom selection (whole residue or single atom) for NGLView and an atom-selection object.

mdigest.core.toolkit.get_or_minus1(f)

Assign to minus one if index is absent

mdigest.core.toolkit.get_path(src, trg, selected_atomnodes, preds, rep=0)

Return an np.ndarray with the list of nodes that connect src (source) and trg (target). [**] function adapted from https://github.com/melomcr/dynetan

mdigest.core.toolkit.get_selection_from_node(i, atomsel, atom=False)

Get the selection string from a node index: resname, resid, segid and name and return an atom-selection object. [**] function adapted from https://github.com/melomcr/dynetan

mdigest.core.toolkit.intersection(lst1, lst2)

Find intersection between two lists

mdigest.core.toolkit.keywithmaxval(d)

Create a list of the dict’s keys and values; return the key with the max value
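The same result can be obtained without building intermediate lists. The one-liner below is an equivalent sketch under a hypothetical name, not the library's code; the sample dictionary is made up.

```python
def key_with_max_value(d):
    """Return the key whose value is largest (first one wins on ties)."""
    return max(d, key=d.get)


# Community sizes keyed by a made-up community label.
sizes = {"cmty_0": 12, "cmty_1": 31, "cmty_2": 7}
largest = key_with_max_value(sizes)
```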

mdigest.core.toolkit.list2dict(listoflists)

Convert list to dictionary

mdigest.core.toolkit.log_progress(sequence, every=None, size=None, name='Items', userProgress=None)

Generates log progress bar

See also

[**] this function was authored by Marcelo Melo as part of https://github.com/melomcr/dynetan

mdigest.core.toolkit.normalize(arr)

Normalize array dividing by the sum.

mdigest.core.toolkit.normalize_array(vals)

Normalize array between -1 and 1.
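The two normalizations differ in what they preserve: dividing by the sum keeps relative proportions, while rescaling to [-1, 1] fixes the range. A dependency-free sketch of both follows. The function names are hypothetical, and the min-max convention used for the [-1, 1] case is an assumption; normalize_array may instead, for example, divide by the largest absolute value.

```python
def normalize_by_sum(values):
    """Scale values so that they add up to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize: values sum to zero")
    return [v / total for v in values]


def rescale_minus1_to_1(values):
    """Min-max rescale so the minimum maps to -1 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot rescale: all values are equal")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


by_sum = normalize_by_sum([1, 1, 2])
rescaled = rescale_minus1_to_1([0, 5, 10])
```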

mdigest.core.toolkit.partition2dict(partition)

Convert partitions to dictionary having nodes as keys and assigned community (partition) as values
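A minimal sketch of this conversion, assuming the partition is given as a sequence of node collections (one per community). The input format and the function name are assumptions, since the accepted input type is not specified above.

```python
def partition_to_dict(partition):
    """Map each node to the index of the community that contains it.

    Assumes partition is a sequence of node collections, one per community.
    """
    return {node: idx for idx, nodes in enumerate(partition) for node in nodes}


partition = [[0, 1, 2], [3, 4], [5]]
node_to_cmty = partition_to_dict(partition)
```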

mdigest.core.toolkit.retrieve(filepath)

Retrieve pickle object

Parameters:

filepath (str,) – path of file to read

Returns:

content

Return type:

np.ndarray
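The dump/retrieve pair amounts to a pickle round trip. The sketch below shows that pattern with the standard library only; dump_pickle and retrieve_pickle are hypothetical stand-ins, not the mdigest functions, and a nested list stands in for the np.ndarray.

```python
import os
import pickle
import tempfile


def dump_pickle(filepath, obj):
    """Serialize obj to filepath in binary pickle format."""
    with open(filepath, "wb") as handle:
        pickle.dump(obj, handle)


def retrieve_pickle(filepath):
    """Load and return the object stored at filepath."""
    with open(filepath, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.pkl")
    original = [[0.0, 0.8], [0.8, 0.0]]  # stand-in for an np.ndarray
    dump_pickle(path, original)
    restored = retrieve_pickle(path)
```

Pickle files can execute arbitrary code when loaded, so only files from trusted sources should be read this way.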

Module contents

MDiGest v.0.1.0

__version__ = 0.1.0

__author__ = Federica Maschietto <federica.maschietto@gmail.com>, Brandon Allen <bcallen95@gmail.com>

DESCRIPTION

  • imports –> general imports

  • parsetrajectory –> process trajectory

  • correlation –> compute correlation based on atomic displacements

  • dcorrelation –> compute correlation from dihedrals fluctuations

  • kscorrelation –> KS analysis

  • dimreduction –> dimensionality reduction

  • community –> GN, LOUVAIN, communities in general

  • savedata –> caches the output of various models

  • auxiliary –> auxiliary functions used by multiple modules

  • toolkit –> accessory functions

  • plots