Unrolr
unrolr.Unrolr module
class unrolr.core.unrolr.Unrolr(r_neighbor, metric='dihedral', n_components=2, n_iter=10000, random_seed=None, init='random', learning_rate=1.0, epsilon=0.0001, verbose=0, platform='OpenCL')
Bases: object
__init__(r_neighbor, metric='dihedral', n_components=2, n_iter=10000, random_seed=None, init='random', learning_rate=1.0, epsilon=0.0001, verbose=0, platform='OpenCL')
Initialize the Unrolr object.
Parameters: - r_neighbor (float) – neighborhood radius cutoff
- metric (str) – distance metric (choices: dihedral or intramolecular) (default: dihedral)
- n_components (int) – number of components (dimensions) of the final embedding (default: 2)
- n_iter (int) – number of optimization iterations (default: 10000)
- random_seed (int) – random seed (default: None)
- init (str) – method used to initialize the embedding (choices: random or pca) (default: random)
- learning_rate (float) – learning rate, also known as computational temperature (default: 1.0)
- epsilon (float) – convergence criterion used when computing the final stress and correlation (default: 1e-4)
- verbose (int) – turn verbose output on (non-zero) or off (0) (default: 0)
- platform (str) – platform used to run SPE (choices: OpenCL or CPU) (default: OpenCL)
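The default metric='dihedral' compares two conformations through their dihedral angles, which must be treated as periodic. The sketch below shows one common periodic distance, assuming the form d(a, b) = sqrt((1/N) * sum_k 0.5 * (1 - cos(a_k - b_k))) over N angles; the helper dihedral_distance is hypothetical, and Unrolr's own kernel may use a different expression.

```python
import numpy as np


def dihedral_distance(a, b):
    """Periodic distance between two conformations given as N dihedral angles (radians).

    Assumed form: sqrt(mean(0.5 * (1 - cos(a - b)))). It lies in [0, 1] and
    treats angles that differ by a multiple of 2*pi as identical.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(0.5 * (1.0 - np.cos(a - b)))))


# Two frames (rows) with three dihedral angles each (columns), in radians.
x = np.array([[0.1, -2.0, 3.1],
              [0.1, -2.0, -3.1]])
print(dihedral_distance(x[0], x[1]))  # small: 3.1 and -3.1 rad are nearly the same angle
```

Because the cosine is 2π-periodic, 3.1 rad and -3.1 rad count as neighbors here, whereas a plain Euclidean distance on the raw angles would place them about 6.2 rad apart.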
fit_transform(r)
Run the Unrolr method (pSPE + dihedral distance).
Parameters: r (ndarray) – n-dimensional dataset (rows: frames; columns: angles)
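fit_transform runs (parallel) stochastic proximity embedding (SPE). The sketch below is a serial NumPy illustration of the SPE update with a neighborhood cutoff, working on a precomputed distance matrix; the pivot-based scheme, the linear learning-rate decay and the name spe_embed are assumptions for illustration, not Unrolr's OpenCL implementation. Pairs within r_neighbor are moved toward their target distance in both directions, while pairs beyond the cutoff are only pushed apart when they sit too close in the embedding.

```python
import numpy as np


def spe_embed(R, r_neighbor, n_components=2, n_iter=1000, learning_rate=1.0,
              eps=1e-10, seed=None):
    """Pivot-based stochastic proximity embedding of an (n x n) distance matrix R.

    Sketch only: at each iteration a random pivot i is drawn and every other
    point j is moved along (x_j - x_i) so that d_ij moves halfway (scaled by
    the current learning rate) toward r_ij. The pivot itself stays fixed.
    """
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    X = rng.uniform(-1.0, 1.0, size=(n, n_components))  # like init='random'
    for it in range(n_iter):
        lam = learning_rate * (1.0 - it / n_iter)    # linearly decaying learning rate
        i = rng.integers(n)
        diff = X - X[i]                              # vectors from the pivot to every point
        d = np.linalg.norm(diff, axis=1)             # current embedding distances d_ij
        r = R[i]                                     # target input distances r_ij
        # SPE cutoff rule: neighbors (r_ij <= r_neighbor) are always updated;
        # distant pairs are updated only if they are too close (d_ij < r_ij).
        update = (r <= r_neighbor) | (d < r)
        update[i] = False
        scale = lam * 0.5 * (r - d) / (d + eps)
        X[update] += scale[update][:, None] * diff[update]
    return X


# Toy check: 2-D points embedded back into 2-D with a cutoff that covers every pair.
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 1.0, size=(30, 2))
R = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = spe_embed(R, r_neighbor=2.0, n_iter=5000, seed=1)
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
iu = np.triu_indices(30, k=1)
print(np.corrcoef(R[iu], D[iu])[0, 1])  # Pearson correlation of input vs. embedding distances
```

Leaving distant pairs free to stretch is what allows SPE with a cutoff to unfold curved manifolds instead of reproducing every long-range distance.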
save(fname='embedding.csv', frames=None)
Save the final embedding to a CSV file.
Parameters: - fname (str) – pathname of the CSV file that will contain the final embedding (default: embedding.csv)
- frames (array-like) – 1d array containing the frame numbers (default: None)
unrolr.core.unrolr.main()
Main function; unrolr.py can be executed as a standalone script.
Parameters: - -f/--dihedral (filename) – HDF5 file containing the dihedral angles
- -r/--rc (float) – neighborhood radius cutoff (default: 1)
- -n/-ndim (int) – number of dimensions of the final embedding (default: 2)
- -c/--cycles (int) – number of optimization iterations (default: 1000)
- --start (int) – index of the first frame to analyze (default: 1)
- --stop (int) – index of the last frame to analyze (default: -1)
- --skip (int) – number of frames to skip (default: 1)
- -o/--output (filename) – CSV output file name (default: embedding.csv)
- -s/--seed – random seed (default: None)
Returns: CSV file containing the final embedding (default: embedding.csv)
Return type: output (file)
unrolr.feature_extraction.dihedrals module
class unrolr.feature_extraction.dihedrals.Dihedral(top_file, trj_files, selection='backbone', dihedral_type='calpha', **kwargs)
Bases: MDAnalysis.analysis.base.AnalysisBase
__init__(top_file, trj_files, selection='backbone', dihedral_type='calpha', **kwargs)
Create a Dihedral analysis object.
Parameters: - top_file (str) – filename of the topology file
- trj_files (str or array-like) – one trajectory file or a list of trajectory files
- selection (str) – protein selection (default: backbone)
- dihedral_type (str) – type of dihedral angles to extract (choices: dihedral or calpha) (default: calpha)
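With dihedral_type='calpha', the extracted angles are presumably the pseudo-dihedrals defined by four consecutive Cα atoms, which gives n_residues - 3 angles per frame. The NumPy sketch below computes such angles for one frame with the standard atan2 formula; the function calpha_pseudo_dihedrals is hypothetical and does not reproduce the MDAnalysis-based implementation.

```python
import numpy as np


def calpha_pseudo_dihedrals(ca):
    """Pseudo-dihedral angles (radians, in (-pi, pi]) for consecutive Calpha quadruplets.

    ca: array of shape (n_residues, 3) with the Calpha coordinates of one frame.
    Returns an array of length n_residues - 3.
    """
    ca = np.asarray(ca, dtype=float)
    b0 = ca[:-3] - ca[1:-2]   # quadruplet (p0, p1, p2, p3): b0 = p0 - p1
    b1 = ca[2:-1] - ca[1:-2]  # central bond b1 = p2 - p1
    b2 = ca[3:] - ca[2:-1]    # b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=1, keepdims=True)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=1, keepdims=True) * b1
    x = np.sum(v * w, axis=1)
    y = np.sum(np.cross(b1, v) * w, axis=1)
    return np.arctan2(y, x)


# Toy 5-residue chain -> 2 pseudo-dihedrals; the first quadruplet is planar and trans.
chain = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [1.0, -1.0, 0.0], [2.0, -1.0, 1.0]])
print(np.degrees(calpha_pseudo_dihedrals(chain)))
```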
unrolr.feature_extraction.dihedrals.main()
Main function; dihedrals.py can be executed as a standalone script.
Parameters: - -p/--top (filename) – topology file used for the simulation (pdb, psf)
- -t/--trj (filename) – one trajectory file or a list of trajectory files
- -s/--selection (str) – protein selection
- -d/--dihedral (str) – type of dihedral angles to extract (choices: dihedral or calpha) (default: backbone)
- -o/--output (filename) – HDF5 output file name (default: dihedral_angles.h5)
Returns: HDF5 file containing the dihedral angles (default: dihedral_angles.h5)
Return type: output (file)
unrolr.feature_extraction.intramolecular_distances module
class unrolr.feature_extraction.intramolecular_distances.IntramolecularDistance(top_file, trj_files, selection='backbone', **kwargs)
Bases: MDAnalysis.analysis.base.AnalysisBase
__init__(top_file, trj_files, selection='backbone', **kwargs)
Create an IntramolecularDistance analysis object.
Parameters: - top_file (str) – filename of the topology file
- trj_files (str or array-like) – one trajectory file or a list of trajectory files
- selection (str) – protein selection (default: backbone)
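A natural feature vector for IntramolecularDistance is the set of all pairwise distances between the selected atoms of one frame, i.e. n_atoms * (n_atoms - 1) / 2 values. The sketch below assumes that layout (upper triangle, row-major order); the function name is hypothetical, and the column order actually written by Unrolr is not specified in this reference.

```python
import numpy as np


def intramolecular_distance_features(coords):
    """Pairwise distances between the atoms of one frame, flattened to the upper triangle.

    coords: array of shape (n_atoms, 3). Returns n_atoms * (n_atoms - 1) / 2 values
    ordered as (0, 1), (0, 2), ..., (1, 2), ... (row-major upper triangle).
    """
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)
    return np.linalg.norm(coords[i] - coords[j], axis=1)


# One frame with four atoms -> 6 pairwise distances.
frame = np.array([[0.0, 0.0, 0.0],
                  [3.0, 4.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
print(intramolecular_distance_features(frame))
```

Stacking one such vector per frame gives the array layout (rows: frames; columns: features) expected by the samplers.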
unrolr.feature_extraction.intramolecular_distances.main()
Main function; intramolecular_distances.py can be executed as a standalone script.
Parameters: - -p/--top (filename) – topology file used for the simulation (pdb, psf)
- -t/--trj (filename) – one trajectory file or a list of trajectory files
- -s/--selection (str) – protein selection
- -o/--output (filename) – HDF5 output file name (default: intramolecular_distances.h5)
Returns: HDF5 file containing the intramolecular distances (default: intramolecular_distances.h5)
Return type: output (file)
unrolr.sampling.sampling module
unrolr.sampling.sampling.neighborhood_radius_sampler(X, r_neighbors, metric='dihedral', n_components=2, n_iter=5000, n_runs=5, init='random', platform='OpenCL')
Sample different neighborhood radii rc and compute the stress and correlation for each.
Parameters: - X (ndarray) – n-dimensional ndarray (rows: frames; columns: features/angles)
- r_neighbors (array-like) – list of neighborhood radius cutoffs to try
- metric (str) – metric used to compute the distance between conformations (choices: dihedral or intramolecular) (default: dihedral)
- n_components (int) – number of dimensions of the embedding (default: 2)
- n_iter (int) – number of optimization cycles (default: 5000)
- n_runs (int) – number of repetitions, used to calculate the standard deviation (default: 5)
- init (str) – method used to initialize the embedding (choices: random or pca) (default: random)
- platform (str) – platform used to run SPE (choices: OpenCL or CPU) (default: OpenCL)
Returns: Pandas DataFrame containing the columns [“run”, “r_neighbor”, “n_iter”, “stress”, “correlation”]
Return type: results (DataFrame)
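Each sampler run is scored by a stress and a correlation between the input distances r_ij and the embedding distances d_ij. The exact formulas are not given in this reference; the sketch below assumes a normalized stress, sqrt(sum (d_ij - r_ij)^2 / sum r_ij^2), and the Pearson correlation over pairs i < j, optionally restricted to pairs with r_ij <= r_neighbor. The function name is hypothetical.

```python
import numpy as np


def stress_and_correlation(R, D, r_neighbor=None):
    """Compare input distances R and embedding distances D (both n x n, symmetric).

    Assumed definitions (may differ from Unrolr's):
      stress      = sqrt(sum((d - r)^2) / sum(r^2)) over pairs i < j
      correlation = Pearson correlation between r and d over the same pairs
    If r_neighbor is given, only pairs with r <= r_neighbor are used.
    """
    iu = np.triu_indices(R.shape[0], k=1)
    r, d = R[iu], D[iu]
    if r_neighbor is not None:
        mask = r <= r_neighbor
        r, d = r[mask], d[mask]
    stress = float(np.sqrt(np.sum((d - r) ** 2) / np.sum(r ** 2)))
    correlation = float(np.corrcoef(r, d)[0, 1])
    return stress, correlation


P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
R = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
print(stress_and_correlation(R, 2.0 * R))  # stress 1.0, correlation 1.0 (up to rounding)
```

The example shows why both numbers are reported: a uniformly scaled embedding is perfectly correlated with the input yet has a large stress.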
unrolr.sampling.sampling.optimization_cycle_sampler(X, n_iters, r_neighbor, metric='dihedral', n_components=2, n_runs=5, init='random', platform='OpenCL')
Sample different numbers of optimization cycles at a fixed neighborhood radius rc and compute the stress and correlation for each.
Parameters: - X (ndarray) – n-dimensional ndarray (rows: frames; columns: features/angles)
- n_iters (array-like) – list of iteration numbers to try
- r_neighbor (float) – neighborhood radius cutoff
- metric (str) – metric used to compute the distance between conformations (choices: dihedral or intramolecular) (default: dihedral)
- n_components (int) – number of dimensions of the embedding (default: 2)
- n_runs (int) – number of repetitions, used to calculate the standard deviation (default: 5)
- init (str) – method used to initialize the embedding (choices: random or pca) (default: random)
- platform (str) – platform used to run SPE (choices: OpenCL or CPU) (default: OpenCL)
Returns: Pandas DataFrame containing the columns [“run”, “r_neighbor”, “n_iter”, “stress”, “correlation”]
Return type: results (DataFrame)
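Both samplers share the same structure: for every value of the varied parameter they repeat the embedding n_runs times and record one row per run with the documented columns. The stdlib sketch below mirrors that structure; embed_and_score is a hypothetical stand-in for fitting Unrolr and scoring the result, and fake_scorer exists only so the sketch runs.

```python
import random


def sample(settings, embed_and_score, n_runs=5, vary="r_neighbor", fixed=None):
    """Collect one result row per (setting, run).

    settings:        values to try for the varied parameter ("r_neighbor" or "n_iter")
    embed_and_score: hypothetical callable (r_neighbor, n_iter, seed) -> (stress, correlation)
    fixed:           value of the other parameter, held constant
    """
    rows = []
    for value in settings:
        for run in range(n_runs):
            r_neighbor, n_iter = (value, fixed) if vary == "r_neighbor" else (fixed, value)
            stress, correlation = embed_and_score(r_neighbor, n_iter, seed=run)
            rows.append({"run": run, "r_neighbor": r_neighbor, "n_iter": n_iter,
                         "stress": stress, "correlation": correlation})
    return rows


# Dummy scorer so the sketch runs without Unrolr: larger radii give lower stress here.
def fake_scorer(r_neighbor, n_iter, seed):
    rng = random.Random(seed)
    return 1.0 / (1.0 + r_neighbor) + 0.01 * rng.random(), 0.9 + 0.01 * rng.random()


rows = sample([0.1, 0.2, 0.3], fake_scorer, n_runs=3, vary="r_neighbor", fixed=5000)
print(len(rows))  # 9 rows: 3 radii x 3 runs
```

The resulting list of dicts can be passed to pandas.DataFrame to obtain a table with the same columns as the sampler output.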
unrolr.plotting.plot_embedding module
unrolr.plotting.plot_embedding.plot_embedding(fname, embedding, label='Dihedral distance', clim=None, bin_size=None, cmap='viridis', show=True)
Plot a 2D histogram of the embedding. The color code gives the number of conformations in each bin of the histogram.
Parameters: - fname (str) – filename of the embedding plot
- embedding (ndarray) – n-dimensional embedding array (rows: frames; columns: dimensions)
- clim (array-like) – list of two elements: minimum and maximum bin count of the color scale (default: None)
- bin_size (float) – size of the bins; if None, the interquartile range (IQR) is used to define the bin size (default: None)
- cmap (str) – color map (default: viridis)
- show (bool) – show the plot (default: True)
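When bin_size is None, the bin size is derived from the interquartile range. A common IQR-based choice is the Freedman-Diaconis rule, h = 2 * IQR * n^(-1/3); the sketch below assumes that rule and a single bin size shared by both axes, neither of which is confirmed by this reference. It only builds the counts that plot_embedding colors, not the figure, and all function names are hypothetical.

```python
import numpy as np


def fd_bin_size(values):
    """Freedman-Diaconis bin width: 2 * IQR * n ** (-1/3)."""
    values = np.asarray(values, dtype=float)
    q75, q25 = np.percentile(values, [75, 25])
    return 2.0 * (q75 - q25) * len(values) ** (-1.0 / 3.0)


def regular_edges(v, h):
    """Equally spaced edges starting at min(v); floor(...) + 1 bins always reach past max(v)."""
    n_bins = int(np.floor((v.max() - v.min()) / h)) + 1
    return v.min() + h * np.arange(n_bins + 1)


def embedding_histogram(embedding, bin_size=None):
    """Counts of conformations per 2-D bin for the first two embedding dimensions."""
    embedding = np.asarray(embedding, dtype=float)
    x, y = embedding[:, 0], embedding[:, 1]
    if bin_size is None:
        # Assumption: one bin size for both axes, the smaller of the two FD widths.
        bin_size = min(fd_bin_size(x), fd_bin_size(y))
    x_edges, y_edges = regular_edges(x, bin_size), regular_edges(y, bin_size)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts, x_edges, y_edges


rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 2))
counts, x_edges, y_edges = embedding_histogram(emb)
print(counts.shape, int(counts.sum()))  # every conformation falls in exactly one bin
```

For one dimension, NumPy applies the same rule with np.histogram_bin_edges(values, bins='fd').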
unrolr.plotting.plot_sampling module
unrolr.plotting.plot_sampling.plot_sampling(fname, df, of='r_neighbor', show=True)
Helper function to plot the results of a sampling run (neighborhood radii or iterations).
Parameters: - fname (str) – filename of the figure
- df (DataFrame) – Pandas DataFrame returned by neighborhood_radius_sampler or optimization_cycle_sampler
- of (str) – plot the stress and correlation as a function of “r_neighbor” or “n_iter” (choices: r_neighbor or n_iter) (default: r_neighbor)
- show (bool) – show the plot (default: True)
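Since each setting is repeated n_runs times, plotting stress and correlation as a function of r_neighbor or n_iter typically means plotting a mean with a standard-deviation band. The sketch below shows that aggregation step on rows with the sampler's column names; the numbers are made up for illustration, the function summarize is hypothetical, and whether plot_sampling aggregates exactly this way is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows, of="r_neighbor"):
    """Mean and standard deviation of stress and correlation for each value of `of`.

    rows: iterable of dicts with keys "run", "r_neighbor", "n_iter", "stress", "correlation".
    Returns {value: (stress_mean, stress_std, corr_mean, corr_std)}, keys in sorted order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[of]].append(row)
    summary = {}
    for value in sorted(groups):
        s = [r["stress"] for r in groups[value]]
        c = [r["correlation"] for r in groups[value]]
        # stdev needs at least two runs; report 0.0 for a single run.
        summary[value] = (mean(s), stdev(s) if len(s) > 1 else 0.0,
                          mean(c), stdev(c) if len(c) > 1 else 0.0)
    return summary


rows = [  # made-up values for illustration only
    {"run": 0, "r_neighbor": 0.1, "n_iter": 5000, "stress": 0.30, "correlation": 0.80},
    {"run": 1, "r_neighbor": 0.1, "n_iter": 5000, "stress": 0.34, "correlation": 0.82},
    {"run": 0, "r_neighbor": 0.2, "n_iter": 5000, "stress": 0.20, "correlation": 0.90},
    {"run": 1, "r_neighbor": 0.2, "n_iter": 5000, "stress": 0.22, "correlation": 0.92},
]
print(summarize(rows, of="r_neighbor"))
```

With the sampler's DataFrame, df.groupby(of)[["stress", "correlation"]].agg(["mean", "std"]) gives the same summary; pandas also uses the n - 1 denominator for std by default.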
unrolr.utils module
unrolr.utils.utils.read_dataset(fname, dname, start=0, stop=-1, skip=1)
Read a dataset from an HDF5 file.
unrolr.utils.utils.save_dataset(fname, dname, data)
Save a dataset to an HDF5 file.
unrolr.utils.utils.transform_dihedral_to_metric(dihedral_timeseries)
Convert angles in radians to sine/cosine transformed coordinates. The output is used as the PCA input for dihedral PCA (dPCA).
Parameters: dihedral_timeseries (ndarray) – array containing dihedral angles, shape (n_samples, n_features)
Returns: sine/cosine transformed coordinates
Return type: ndarray
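The sine/cosine transform used by dPCA replaces each angle θ by the pair (cos θ, sin θ), so angles on either side of the ±π cut, which are physically close, also end up close in the transformed space. The sketch below doubles the number of columns; the [all cosines | all sines] column layout is an arbitrary choice that may differ from Unrolr's, and sincos_features is a hypothetical name.

```python
import numpy as np


def sincos_features(dihedral_timeseries):
    """Map each angle theta (radians) to the pair (cos theta, sin theta).

    Input shape (n_samples, n_features) -> output shape (n_samples, 2 * n_features).
    Column layout here: [cos of all angles | sin of all angles].
    """
    a = np.asarray(dihedral_timeseries, dtype=float)
    return np.hstack([np.cos(a), np.sin(a)])


x = np.array([[np.pi - 0.01], [-np.pi + 0.01]])  # two nearly identical angles across the +-pi cut
f = sincos_features(x)
print(abs(x[0, 0] - x[1, 0]))        # about 6.26 rad apart as raw numbers
print(np.linalg.norm(f[0] - f[1]))   # about 0.02 after the transform
```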
unrolr.utils.utils.transform_dihedral_to_circular_mean(dihedral_timeseries)
Convert angles in radians to circular mean transformed angles. The output is used as the PCA input for dihedral PCA+ (dPCA+).
Parameters: dihedral_timeseries (ndarray) – array containing dihedral angles, shape (n_samples, n_features)
Returns: circular mean transformed angles
Return type: ndarray
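One reading of "circular mean transformed angles" is that each angle column is shifted by its circular mean, atan2(mean sin θ, mean cos θ), and wrapped back into (-π, π], so that the periodic cut falls opposite the bulk of the data. The sketch below implements that reading; it is an assumption about Unrolr's transform, and center_on_circular_mean is a hypothetical name.

```python
import numpy as np


def center_on_circular_mean(dihedral_timeseries):
    """Shift every column of angles (radians) so that its circular mean becomes 0.

    Results are wrapped back into (-pi, pi], which places the periodic cut
    opposite the mean angle of each column.
    """
    a = np.asarray(dihedral_timeseries, dtype=float)
    mu = np.arctan2(np.sin(a).mean(axis=0), np.cos(a).mean(axis=0))  # per-column circular mean
    shifted = a - mu
    return np.arctan2(np.sin(shifted), np.cos(shifted))  # wrap into (-pi, pi]


x = np.array([[3.0], [-3.0], [3.1], [-3.1]])  # one angle cluster straddling +-pi
print(x.mean())                                # 0.0: the arithmetic mean misses the cluster
print(center_on_circular_mean(x).ravel())      # all values now close to 0
```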
unrolr.utils.utils.is_opencl_env_defined()
Check whether the OpenCL environment variable is defined.
unrolr.utils.utils.path_module(module_name)
unrolr.utils.utils.max_conformations_from_dataset(fname, dname)
Get the maximum number of conformations that fit into the memory of the selected OpenCL device, along with the corresponding step (interval).