One of the goals of the MSMBuilder3 development is to make the package as easy to use as possible. Analyzing molecular dynamics is hard enough, so there's no reason that the software should get in your way.
Currently, all of the MSMBuilder commands are separate scripts that are installed into your path, which means that you need to remember all of the commands. If you forget (wait, what is the name of the script for computing implied timescales?), you'll have to go back to the tutorial and check. That's a pain.
Many command line tools expose their functionality through subcommands: think git pull or svn checkout. For MSMBuilder3, we're going to put everything under msmb.
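As a rough illustration of the subcommand pattern, here is a minimal sketch using Python's standard argparse module. This is not how msmb itself is implemented (the real tool is built on the IPython configuration system); the function names build_parser and main are hypothetical, and the cluster and assign subcommands and the --stride option are borrowed from the planned help text shown later in this post.

```python
import argparse


def build_parser():
    """Build a toy msmb-style parser with two subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="msmb",
        description="Toy sketch of a single entry point with subcommands.",
    )
    subparsers = parser.add_subparsers(dest="command", metavar="<subcommand>")

    # Each subcommand gets its own parser, so `msmb cluster -h` prints
    # only the options that apply to clustering.
    cluster = subparsers.add_parser(
        "cluster", help="Cluster trajectories into microstates."
    )
    cluster.add_argument(
        "--stride", type=int, default=1, help="Subsample by striding"
    )

    subparsers.add_parser("assign", help="Assign trajectories to microstates.")
    return parser


def main(argv=None):
    """Parse argv; with no subcommand, print the root help and return None."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # The root help lists every subcommand, so nothing has to be memorized.
        parser.print_help()
        return None
    return args
```

Calling main(["cluster", "--stride", "5"]) returns a namespace with command set to "cluster" and stride set to 5, while main([]) prints the root help listing both subcommands.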
One immediate UX improvement is the ability to get help text directly from the root msmb -h command. I'm currently developing the feature in a different repository, which is here. It's not complete yet, but it'll look something like this:
    rmcgibbo@Roberts-MacBook-Pro-2 ~/projects/msmbuilder_config
    $ msmb -h
    MSMBuilder: Software for building Markov State Models for Biomolecular Dynamics
    ===============================================================================

    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi sed nibh ut orci
    suscipit scelerisque. Sed ligula augue, blandit ac eleifend eleifend, dapibus ac
    sapien. Duis eu tortor ac erat porta vulputate. Phasellus ac nisl quis magna

    Subcommands
    -----------

    atomindices
        Construct list of atoms for RMSD calculations
    mkprofile
        Create a sample configuration file
    assign
        Assign trajectories to microstates.
    cluster
        Cluster trajectories into microstates.

    Options
    -------

    --log-level=<Enum> (Application.log_level)
        Default: 30
        Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
        Set the log level by value or name.

    To see all available configurables, use `--help-all`
Here's what the output from running one of the subcommands would look like:
    $ msmb cluster -h
    Cluster trajectories into microstates.
    ======================================

    Output: Assignments.h5, and other files depending on your choice of distance
    metric and/or clustering algorithm.

    Note that there are many distance metrics and clustering algorithms available
    Many of which have multiple options and parameters.

    Reference
    ---------

    A. B. Author, B. C. Author and C. D. Author, Title of our awesome paper. Chem.
    Theory Comput. 7, 3412 (2013)

    Options
    -------

    --metric_type=<Enum> (Cluster.metric_type)
        Default: 'RMSD'
        Choices: ['RMSD', 'Pnorm']
        What distance metric to use?
    --representation=<Enum> (Cluster.representation)
        Default: 'Cartesian'
        Choices: ['Cartesian', 'Dihedral', 'ContinuousContact']
        What representation of system to use? This amounts to picking a coordinate
        system. The RMSD metric should operate on cartesian coordinates, but other
        metrics require a coordinate system that removes the rotational symmetry,
        such as the space of backbone dihedral angles (Dihedral)
    --project_fn=<Unicode> (Cluster.project_fn)
        Default: u'project.yaml'
        Path to project info file
    --output_dir=<Unicode> (Cluster.output_dir)
        Default: u'data/'
        Output directory to save clustering data. This will include: (1)
        Assignments.h5 (If clustering is hierarchical or stride=1): Contains the
        state assignments (2) Assignments.h5.distances (If clustering is
        hierarchical or stride=1): Contains the distance to the generator according
        to the distance metric that was employed (3) Gens.lh5 Trajectory object
        representing the generators for each state
    --stride=<Int> (Cluster.stride)
        Default: 1
        Subsample by striding

    To see all available configurables, use `--help-all`
There are a few other goodies, including the ability to specify options either on the command line or in a config file. The config file is pretty easy to work with too, since you can create a default one with msmb mkprofile that has all of the possible options, just commented out. It's based on the IPython configuration system, which is definitely the best.
I'll post on that later.
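In the meantime, here is a small sketch of how options from the two sources might be layered. It uses only the standard library's collections.ChainMap and is not the msmb or IPython implementation. The default values are copied from the msmb cluster -h output above; the function name resolve_options, the sample values, and the precedence order (command line over config file over defaults, which is how IPython's configuration system behaves) are assumptions about how msmb will end up working.

```python
from collections import ChainMap

# Defaults copied from the `msmb cluster -h` output shown earlier.
DEFAULTS = {
    "Cluster.metric_type": "RMSD",
    "Cluster.representation": "Cartesian",
    "Cluster.stride": 1,
}


def resolve_options(file_options, cli_options):
    """Layer the option sources; earlier maps in the ChainMap win lookups.

    Assumed precedence: command line > config file > built-in defaults.
    """
    return ChainMap(cli_options, file_options, DEFAULTS)


# Hypothetical inputs: a config file that sets the stride, and a command
# line that overrides the representation.
options = resolve_options(
    file_options={"Cluster.stride": 10},
    cli_options={"Cluster.representation": "Dihedral"},
)
```

With these inputs, options["Cluster.stride"] comes from the config file, options["Cluster.representation"] from the command line, and options["Cluster.metric_type"] falls through to the default.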