Robert's Notebook

Computational Chemistry, Python, Politics, Policy

Under One Roof

One of the goals of the MSMBuilder3 development is to make the package as easy to use as possible. Analyzing molecular dynamics is hard enough, so there's no reason that the software should get in your way.

Currently, all of the the MSMBuilder commands are separate scripts that are installed into your path, which means that you need to remember all of the commands. If you forget -- wait, what is the name of the script for computing implied timescales? - you'll have to go back to the tutorial and check. That's a pain.

Most command line utilities let you access all their different utilities from subcommands: think git pull or svn checkout. For MSMBuilder3, we're going to put everything under msmb.

One immediate UX improvement is the the ability to have a help text directly on the root msmb -h command. I'm currently developing the feature in a different repository, which is here. It's not complete yet, but it'll look something like this:

rmcgibbo@Roberts-MacBook-Pro-2 ~/projects/msmbuilder_config
$ msmb -h
MSMBuilder: Software for building Markov State Models for Biomolecular Dynamics

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi sed nibh ut orci
suscipit scelerisque. Sed ligula augue, blandit ac eleifend eleifend, dapibus ac
sapien. Duis eu tortor ac erat porta vulputate. Phasellus ac nisl quis magna

    Construct list of atoms for RMSD calculations
    Create a sample configuration file
    Assign trajectories to microstates.
    Cluster trajectories into microstates.

--log-level=<Enum> (Application.log_level)
    Default: 30
    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
    Set the log level by value or name.

To see all available configurables, use `--help-all`

Here's what the output from running one of the subcommands would look like:

$ msmb cluster -h
Cluster trajectories into microstates.

Output: Assignments.h5, and other files depending on your choice of distance
metric and/or clustering algorithm.

Note that there are many distance metrics and clustering algorithms available
Many of which have multiple options and parameters.

A. B. Author, B. C. Author and C. D. Author, Title of our awesome paper. Chem.
Theory Comput. 7, 3412 (2013)

--metric_type=<Enum> (Cluster.metric_type)
    Default: 'RMSD'
    Choices: ['RMSD', 'Pnorm']
    What distance metric to use?
--representation=<Enum> (Cluster.representation)
    Default: 'Cartesian'
    Choices: ['Cartesian', 'Dihedral', 'ContinuousContact']
    What representation of system to use? This amounts to picking a coordinate
    system. The RMSD metric should operate on cartesian coordinates, but other
    metrics require a coordinate system that removes the rotational symmetry,
    such as the space of backbone dihedral angles (Dihedral)
--project_fn=<Unicode> (Cluster.project_fn)
    Default: u'project.yaml'
    Path to project info file
--output_dir=<Unicode> (Cluster.output_dir)
    Default: u'data/'
    Output directory to save clustering data. This will include: (1)
    Assignments.h5 (If clustering is hierarchical or stride=1): Contains the
    state assignments (2) Assignments.h5.distances (If clustering is
    hierarchical or stride=1): Contains the distance to the generator according
    to the distance metric that was employed (3) Gens.lh5 Trajectory object
    representing the generators for each state
--stride=<Int> (Cluster.stride)
    Default: 1
    Subsample by striding

To see all available configurables, use `--help-all`

There are a few other goodies, including the ability to specify options both on the command line or in a config file. The config file is pretty easy to work with too, since you can create a default one with msmb mkprofile that has all of the possible options, just commented out. It's based on the IPython configuration system, which is definitely the best. I'll post on that later.