Unrolr

unrolr.Unrolr module

class unrolr.core.unrolr.Unrolr(r_neighbor, metric='dihedral', n_components=2, n_iter=10000, random_seed=None, init='random', learning_rate=1.0, epsilon=0.0001, verbose=0, platform='OpenCL')

Bases: object

__init__(r_neighbor, metric='dihedral', n_components=2, n_iter=10000, random_seed=None, init='random', learning_rate=1.0, epsilon=0.0001, verbose=0, platform='OpenCL')

Initialize Unrolr object.

Parameters:
  • r_neighbor (float) – neighbor radius cutoff
  • metric (str) – distance metric (choices: dihedral or intramolecular) (default: dihedral)
  • n_components (int) – number of components of the final embedding (default: 2)
  • n_iter (int) – number of optimization iterations (default: 10000)
  • random_seed (int) – random seed (default: None)
  • init (str) – method used to initialize the embedding (random or pca) (default: random)
  • learning_rate (float) – learning rate, aka computational temperature (default: 1)
  • epsilon (float) – convergence criterion when computing final stress and correlation (default: 1e-4)
  • verbose (int) – turn verbose output on or off (default: 0)
  • platform (str) – platform to use for SPE (OpenCL or CPU) (default: OpenCL)
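
To make the metric='dihedral' option more concrete, here is a minimal sketch of a periodic distance between two conformations described by dihedral angles. The formula (root of the mean of 0.5 * (1 - cos(difference))) is one common form of dihedral distance and is an assumption here; the function name dihedral_distance is hypothetical and the library's own kernel may differ.

```python
import math


def dihedral_distance(angles_a, angles_b):
    """Distance between two conformations given as lists of dihedral angles (radians).

    Assumed form: sqrt(mean(0.5 * (1 - cos(a - b)))). This is illustrative only.
    """
    if len(angles_a) != len(angles_b):
        raise ValueError("both conformations need the same number of angles")
    # Each term is 0 for identical angles and 1 for angles half a turn apart,
    # so the result always lies between 0 and 1.
    total = sum(0.5 * (1.0 - math.cos(a - b)) for a, b in zip(angles_a, angles_b))
    return math.sqrt(total / len(angles_a))
```

Because the cosine is periodic, angles on either side of the ±π boundary are treated as close, which a plain Euclidean distance on raw angles would not do.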
fit_transform(r)

Run the Unrolr (pSPE + dihedral distance) method.

Parameters: r (ndarray) – n-dimensional dataset (rows: frames; columns: angles)
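
The sketch below shows the general idea behind stochastic proximity embedding (SPE), the family of methods fit_transform belongs to. It implements the plain pairwise SPE update in pure Python, which may differ from the pSPE variant and the OpenCL kernels the library actually runs. The function name spe_embed, the linear learning-rate schedule, and the dist callback are assumptions made for illustration.

```python
import math
import random


def spe_embed(dist, n_points, r_neighbor, n_components=2, n_iter=10000,
              learning_rate=1.0, random_seed=None, eps=1e-8):
    """Plain pairwise SPE sketch; dist(i, j) returns the input-space distance."""
    rng = random.Random(random_seed)
    coords = [[rng.random() for _ in range(n_components)] for _ in range(n_points)]
    for step in range(n_iter):
        # Assumed schedule: anneal the learning rate linearly towards zero.
        lam = learning_rate * (1.0 - step / n_iter)
        i, j = rng.sample(range(n_points), 2)
        r_ij = dist(i, j)
        d_ij = math.dist(coords[i], coords[j])
        # Pairs within the cutoff are always adjusted; pairs beyond it are
        # only pushed apart when the embedding places them too close.
        if r_ij <= r_neighbor or d_ij < r_ij:
            scale = lam * 0.5 * (r_ij - d_ij) / (d_ij + eps)
            for k in range(n_components):
                delta = scale * (coords[i][k] - coords[j][k])
                coords[i][k] += delta
                coords[j][k] -= delta
    return coords
```

With the library itself, the documented entry point is Unrolr(r_neighbor=...).fit_transform(r); the sketch only illustrates the role played by r_neighbor, n_iter, and learning_rate.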
save(fname='embedding.csv', frames=None)

Save all the data

Parameters:
  • fname (str) – pathname of the csv file containing the final embedding (default: embedding.csv)
  • frames (array-like) – 1d-array containing frame numbers (default: None)
unrolr.core.unrolr.main()

Main function, unrolr.py can be executed as a standalone script

Parameters:
  • -f/--dihedral (filename) – hdf5 file containing dihedral angles
  • -r/--rc (float) – neighborhood radius cutoff (default: 1)
  • -n/--ndim (int) – number of dimensions of the final embedding (default: 2)
  • -c/--cycles (int) – number of optimization iterations (default: 1000)
  • --start (int) – index of the first frame to analyze (default: 1)
  • --stop (int) – index of the last frame to analyze (default: -1)
  • --skip (int) – number of frames to skip (default: 1)
  • -o/--output (filename) – csv output file name (default: embedding.csv)
  • -s/--seed – random seed (default: None)
Returns:

csv file containing the final embedding (default: embedding.csv)

Return type:

output (file)
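
As a rough illustration of the options listed above, the following sketch declares them with argparse using the documented defaults. It is not the script's actual parser: the long form --ndim, the required -f option, and the integer type of --seed are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Unrolr command-line sketch")
    # Assumption: the input file is mandatory.
    parser.add_argument("-f", "--dihedral", required=True,
                        help="hdf5 file containing dihedral angles")
    parser.add_argument("-r", "--rc", type=float, default=1.0,
                        help="neighborhood radius cutoff")
    parser.add_argument("-n", "--ndim", type=int, default=2,
                        help="number of dimensions of the final embedding")
    parser.add_argument("-c", "--cycles", type=int, default=1000,
                        help="number of optimization iterations")
    parser.add_argument("--start", type=int, default=1,
                        help="index of the first frame to analyze")
    parser.add_argument("--stop", type=int, default=-1,
                        help="index of the last frame to analyze")
    parser.add_argument("--skip", type=int, default=1,
                        help="number of frames to skip")
    parser.add_argument("-o", "--output", default="embedding.csv",
                        help="csv output file name")
    # Assumption: the seed is parsed as an integer.
    parser.add_argument("-s", "--seed", type=int, default=None,
                        help="random seed")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-f", "dihedral_angles.h5", "-r", "0.27"])
```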

unrolr.feature_extraction.dihedrals module

class unrolr.feature_extraction.dihedrals.Dihedral(top_file, trj_files, selection='backbone', dihedral_type='calpha', **kwargs)

Bases: MDAnalysis.analysis.base.AnalysisBase

__init__(top_file, trj_files, selection='backbone', dihedral_type='calpha', **kwargs)

Create Dihedral analysis object.

Parameters:
  • top_file (str) – filename of the topology file
  • trj_files (str or array-like) – one or a list of trajectory files
  • selection (str) – protein selection (default: backbone)
  • dihedral_type (str) – type of dihedral angles to extract (choices: dihedral or calpha) (default: calpha)
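
For readers unfamiliar with the feature being extracted, this is a small pure-Python sketch of a dihedral angle computed from four points, applied to consecutive C-alpha positions. The helper names are hypothetical, and the class itself relies on MDAnalysis rather than on code like this; the sign of the angle simply follows the atan2 formula used here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_angle(p0, p1, p2, p3):
    """Dihedral angle (radians, in [-pi, pi]) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def calpha_dihedrals(ca_coords):
    """One pseudo-dihedral per window of four consecutive C-alpha positions."""
    return [dihedral_angle(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
```

A chain of N C-alpha atoms yields N - 3 such angles per frame, which become the columns of the dataset.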
unrolr.feature_extraction.dihedrals.main()

Main function, dihedral.py can be executed as a standalone script

Parameters:
  • -p/--top (filename) – topology file used for simulation (pdb, psf)
  • -t/--trj (filename) – one or a list of trajectory files
  • -s/--selection (str) – protein selection
  • -d/--dihedral (str) – type of dihedral angles to extract (choices: dihedral or calpha) (default: backbone)
  • -o/--output (filename) – hdf5 output file name (default: dihedral_angles.h5)
Returns:

hdf5 file containing the dihedral angles (default: dihedral_angles.h5)

Return type:

output (file)

unrolr.feature_extraction.intramolecular_distances module

class unrolr.feature_extraction.intramolecular_distances.IntramolecularDistance(top_file, trj_files, selection='backbone', **kwargs)

Bases: MDAnalysis.analysis.base.AnalysisBase

__init__(top_file, trj_files, selection='backbone', **kwargs)

Create IntramolecularDistance analysis object.

Parameters:
  • top_file (str) – filename of the topology file
  • trj_files (str or array-like) – one or a list of trajectory files
  • selection (str) – protein selection (default: backbone)
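
As an illustration of the feature this class produces, the sketch below computes the distances between all unique pairs of atoms in a single frame. Whether the library uses every pair of the selection, and in which order, is an assumption; the function name is hypothetical.

```python
import math
from itertools import combinations


def intramolecular_distances(coords):
    """Distances for all unique atom pairs (i < j) of one frame.

    coords is a sequence of (x, y, z) tuples; N atoms give N * (N - 1) / 2 values.
    """
    return [math.dist(a, b) for a, b in combinations(coords, 2)]
```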
unrolr.feature_extraction.intramolecular_distances.main()

Main function, intramolecular_distances.py can be executed as a standalone script

Parameters:
  • -p/--top (filename) – topology file used for simulation (pdb, psf)
  • -t/--trj (filename) – one or a list of trajectory files
  • -s/--selection (str) – protein selection
  • -o/--output (filename) – hdf5 output file name (default: intramolecular_distances.h5)
Returns:

hdf5 file containing the intramolecular distances (default: intramolecular_distances.h5)

Return type:

output (file)

unrolr.sampling.sampling module

unrolr.sampling.sampling.neighborhood_radius_sampler(X, r_neighbors, metric='dihedral', n_components=2, n_iter=5000, n_runs=5, init='random', platform='OpenCL')

Sample different neighborhood radii rc and compute the stress and correlation for each.

Parameters:
  • X (ndarray) – n-dimensional ndarray (rows: frames; columns: features/angles)
  • r_neighbors (array-like) – list of the neighborhood radius cutoffs to try
  • metric (str) – metric to use to compute distance between conformations (dihedral or intramolecular) (default: dihedral)
  • n_components (int) – number of dimensions of the embedding (default: 2)
  • n_iter (int) – number of optimization cycles (default: 5000)
  • n_runs (int) – number of repetitions, in order to calculate standard deviation (default: 5)
  • init (str) – method used to initialize the embedding (random or pca) (default: random)
  • platform (str) – platform to use for SPE (OpenCL or CPU) (default: OpenCL)
Returns:

Pandas DataFrame containing columns [“run”, “r_neighbor”, “n_iter”, “stress”, “correlation”]

Return type:

results (DataFrame)
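
The two quality measures reported by the sampler can be sketched as follows. The stress definition used here (a Kruskal-style normalized residual) is one of several conventions and is an assumption, as is the function name; the correlation is the ordinary Pearson coefficient between input and embedded distances.

```python
import math


def stress_and_correlation(input_dists, embedded_dists):
    """Return (stress, correlation) for paired input and embedded distances."""
    n = len(input_dists)
    pairs = list(zip(input_dists, embedded_dists))
    # Assumed stress: sqrt(sum((d - r)^2) / sum(r^2)); 0 means a perfect fit.
    residual = sum((d - r) ** 2 for r, d in pairs)
    stress = math.sqrt(residual / sum(r * r for r in input_dists))
    # Pearson correlation between the two sets of distances.
    mean_r = sum(input_dists) / n
    mean_d = sum(embedded_dists) / n
    cov = sum((r - mean_r) * (d - mean_d) for r, d in pairs)
    var_r = sum((r - mean_r) ** 2 for r in input_dists)
    var_d = sum((d - mean_d) ** 2 for d in embedded_dists)
    return stress, cov / math.sqrt(var_r * var_d)
```

Low stress and high correlation together indicate that the embedding preserves the input distances; an embedding that is merely rescaled keeps a correlation of 1 but has non-zero stress.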

unrolr.sampling.sampling.optimization_cycle_sampler(X, n_iters, r_neighbor, metric='dihedral', n_components=2, n_runs=5, init='random', platform='OpenCL')

Sample different numbers of optimization cycles with a given neighborhood radius rc and compute the stress and correlation for each.

Parameters:
  • X (ndarray) – n-dimensional ndarray (rows: frames; columns: features/angles)
  • n_iters (array-like) – list of the iteration numbers to try
  • r_neighbor (float) – neighborhood radius cutoff
  • metric (str) – metric to use to compute distance between conformations (dihedral or intramolecular) (default: dihedral)
  • n_components (int) – number of dimensions of the embedding (default: 2)
  • n_runs (int) – number of repetitions, in order to calculate standard deviation (default: 5)
  • init (str) – method used to initialize the embedding (random or pca) (default: random)
  • platform (str) – platform to use for SPE (OpenCL or CPU) (default: OpenCL)
Returns:

Pandas DataFrame containing columns [“run”, “r_neighbor”, “n_iter”, “stress”, “correlation”]

Return type:

results (DataFrame)
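
Since n_runs exists to estimate a standard deviation, the returned table is typically reduced to a mean and spread per sampled value. The sketch below does this on plain dictionaries with the documented column names; the rows are placeholder numbers, not real output, and the summarize helper is hypothetical. With the actual DataFrame, a pandas group-by would play the same role.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows, of="n_iter", value="stress"):
    """Group rows by the `of` column and return {key: (mean, stdev)} of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[of]].append(row[value])
    # stdev needs at least two runs per group.
    return {key: (mean(vals), stdev(vals)) for key, vals in sorted(groups.items())}


# Placeholder rows using the documented columns (values are made up).
rows = [
    {"run": 0, "r_neighbor": 0.3, "n_iter": 100, "stress": 0.4, "correlation": 0.5},
    {"run": 1, "r_neighbor": 0.3, "n_iter": 100, "stress": 0.6, "correlation": 0.5},
    {"run": 0, "r_neighbor": 0.3, "n_iter": 1000, "stress": 0.2, "correlation": 0.9},
    {"run": 1, "r_neighbor": 0.3, "n_iter": 1000, "stress": 0.2, "correlation": 0.9},
]
summary = summarize(rows, of="n_iter", value="stress")
```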

unrolr.plotting.plot_embedding module

unrolr.plotting.plot_embedding.plot_embedding(fname, embedding, label='Dihedral distance', clim=None, bin_size=None, cmap='viridis', show=True)

Plot 2D histogram of the embedding. The color code refers to the number of conformations in each bin of the histogram.

Parameters:
  • fname (str) – filename of the embedding plot
  • embedding (ndarray) – n-dimensional embedding array (rows: frames, columns: dimensions)
  • clim (array-like) – list of two elements: minimum and maximum bin number (default: None)
  • bin_size (float) – size of the bin. If None, use the interquartile range (IQR) to define the bin size (default: None)
  • cmap (str) – color map (default: viridis)
  • show (bool) – show the plot (default: True)
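
The documentation only states that the interquartile range is used when bin_size is None. One widely used IQR-based choice is the Freedman–Diaconis rule, sketched below; that the function applies exactly this rule is an assumption, and both helper names are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def iqr_bin_size(values):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3) (assumed rule)."""
    data = sorted(values)
    iqr = quantile(data, 0.75) - quantile(data, 0.25)
    return 2.0 * iqr / len(data) ** (1.0 / 3.0)
```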

unrolr.plotting.plot_sampling module

unrolr.plotting.plot_sampling.plot_sampling(fname, df, of='r_neighbor', show=True)

Helper function to plot results from sampling (neighborhood radii or iterations)

Parameters:
  • fname (str) – filename of the figure
  • df (DataFrame) – Pandas DataFrame obtained from functions neighborhood_radius_sampler or optimization_cycle_sampler
  • of (str) – show the evolution of stress and correlation as a function of “r_neighbor” or “n_iter” (choices: r_neighbor or n_iter) (default: r_neighbor)
  • show (bool) – show the plot (default: True)

unrolr.utils module

unrolr.utils.utils.read_dataset(fname, dname, start=0, stop=-1, skip=1)

Read dataset from HDF5 file.

unrolr.utils.utils.save_dataset(fname, dname, data)

Save dataset to HDF5 file.

unrolr.utils.utils.transform_dihedral_to_metric(dihedral_timeseries)

Convert angles in radians to sine/cosine transformed coordinates.

The output will be used as the PCA input for dihedral PCA (dPCA)

Parameters: dihedral_timeseries (ndarray) – array containing dihedral angles, shape (n_samples, n_features)
Returns: sine/cosine transformed coordinates
Return type: ndarray
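
A minimal pure-Python sketch of the sine/cosine transform: every angle is replaced by its cosine and sine, which removes the discontinuity at ±π before PCA. The column order (cosine then sine for each angle) and the function name are assumptions; the library function works on ndarrays and may lay out its columns differently.

```python
import math


def dihedral_to_metric(dihedral_timeseries):
    """Map each angle (radians) to (cos, sin); n_features becomes 2 * n_features."""
    return [
        [f(angle) for angle in frame for f in (math.cos, math.sin)]
        for frame in dihedral_timeseries
    ]
```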
unrolr.utils.utils.transform_dihedral_to_circular_mean(dihedral_timeseries)

Convert angles in radians to circular mean transformed angles.

The output will be used as the PCA input for dihedral PCA+ (dPCA+)

Parameters: dihedral_timeseries (ndarray) – array containing dihedral angles, shape (n_samples, n_features)
Returns: circular mean transformed angles
Return type: ndarray
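
One plausible reading of "circular mean transformed angles" is that each angle column is shifted by its circular mean and wrapped back into [-π, π), so that the periodic boundary moves away from where the data concentrates. The sketch below implements that reading in pure Python; it is an assumption about the transform, not a description of the library's code, and the function names are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def center_on_circular_mean(dihedral_timeseries):
    """Subtract each column's circular mean, then wrap the result."""
    n_frames = len(dihedral_timeseries)
    n_angles = len(dihedral_timeseries[0])
    means = []
    for k in range(n_angles):
        # Circular mean: direction of the average unit vector.
        s = sum(math.sin(frame[k]) for frame in dihedral_timeseries) / n_frames
        c = sum(math.cos(frame[k]) for frame in dihedral_timeseries) / n_frames
        means.append(math.atan2(s, c))
    return [
        [wrap(frame[k] - means[k]) for k in range(n_angles)]
        for frame in dihedral_timeseries
    ]
```

Two angles just on either side of ±π, which look far apart as raw numbers, end up as small values of opposite sign after the shift.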
unrolr.utils.utils.is_opencl_env_defined()

Check if the OpenCL environment variable is defined.

unrolr.utils.utils.path_module(module_name)
unrolr.utils.utils.max_conformations_from_dataset(fname, dname)

Get the maximum number of conformations that can fit into the memory of the selected OpenCL device, along with the corresponding step/interval.