<p>Robert's Notebook (http://rmcgibbo.github.io/): Next Steps With Swift, by Robert McGibbon, 2013-06-04</p>
<p>Let's try doing something a little more complicated with swift. Here's my new
swift script. It runs a python script on each of a set of
input files.</p>
<div class="highlight"><pre><span class="c"># count.swift</span>
<span class="nb">type </span>File;
<span class="nb">type </span>Pythonscript;
app <span class="o">(</span>File o<span class="o">)</span> python<span class="o">(</span>Pythonscript script, File input<span class="o">)</span> <span class="o">{</span>
    <span class="c"># this script will get executed just as python process.py <input></span>
    python @script @input <span class="nv">stdout</span><span class="o">=</span>@filename<span class="o">(</span>o<span class="o">)</span>;
<span class="o">}</span>
File inputfiles<span class="o">[]</span> <filesys_mapper; <span class="nv">pattern</span><span class="o">=</span><span class="s2">"*.txt"</span>>;
Pythonscript pyscript <<span class="s2">"process.py"</span>>;
foreach f in inputfiles <span class="o">{</span>
    File c <regexp_mapper; <span class="nb">source</span><span class="o">=</span>@f, <span class="nv">match</span><span class="o">=</span><span class="s2">"(.*)txt"</span>, <span class="nv">transform</span><span class="o">=</span><span class="s2">"\\1processed"</span>>;
    <span class="nv">c</span> <span class="o">=</span> python<span class="o">(</span>pyscript, f<span class="o">)</span>;
<span class="o">}</span>
</pre></div>
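<p>To see what the <code>regexp_mapper</code> line does to the filenames, here is a rough Python sketch of the same match/transform substitution. It only mimics the renaming with Python's <code>re</code> module and assumes the mapper behaves like an ordinary regex substitution. The name <code>map_output_name</code> is made up for illustration and is not part of swift.</p>

```python
import re

# Hypothetical helper, not part of swift: mimics the match/transform pair
# from count.swift with an ordinary regular-expression substitution.
def map_output_name(input_name, match=r"(.*)txt", transform=r"\1processed"):
    # (.*) captures everything before the trailing "txt"; \1 puts it back.
    return re.sub(match, transform, input_name, count=1)

for name in ["alice.in.wonderland.txt", "les.miserables.txt"]:
    print(name, "->", map_output_name(name))
```

<p>Under that assumption, each <code>.txt</code> input gets a sibling output file whose name ends in <code>.processed</code>.</p>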
<p>My python script counts the words in a file and prints the 100 most common
words to stdout.</p>
<div class="highlight"><pre><span class="c"># process.py</span>
<span class="sd">"""count the most common words in the file</span>
<span class="sd">"""</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">string</span>
<span class="kn">import</span> <span class="nn">pprint</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">Counter</span>
<span class="n">counter</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">()</span>
<span class="n">exclude</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">punctuation</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">():</span>
            <span class="n">word</span> <span class="o">=</span> <span class="s">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ch</span> <span class="k">for</span> <span class="n">ch</span> <span class="ow">in</span> <span class="n">elem</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">if</span> <span class="n">ch</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">exclude</span><span class="p">)</span>
            <span class="n">counter</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">print</span><span class="p">(</span><span class="n">counter</span><span class="o">.</span><span class="n">most_common</span><span class="p">(</span><span class="mi">100</span><span class="p">))</span>
<span class="c"># this will print the uname to stdout so that we can see where we executed</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'uname -a'</span><span class="p">)</span>
</pre></div>
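<p>Here is a minimal sketch of the same normalization applied to an in-memory string, so it can be tried without an input file. The function name <code>count_words</code> and the sample sentence are made up for illustration; the lowercasing and punctuation stripping follow the script above.</p>

```python
import string
from collections import Counter

# Same normalization as the script above, applied to a string instead of a
# file. count_words is a name chosen here for illustration.
def count_words(text):
    exclude = set(string.punctuation)
    counter = Counter()
    for elem in text.split():
        # lowercase the token and drop every punctuation character
        word = ''.join(ch for ch in elem.lower() if ch not in exclude)
        counter[word] += 1
    return counter

counts = count_words("Don't panic -- don't!")
print(counts.most_common())
```

<p>Stripping punctuation turns "Don't" into "dont", and a token made up entirely of punctuation (like "--") collapses to the empty string, so both kinds of entries can show up in the counts.</p>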
<p>I downloaded three books from Project Gutenberg: Les Miserables, Pride and Prejudice,
and Alice in Wonderland. Their filenames all end in the <code>.txt</code> extension, so they get picked
up by the <code>inputfiles</code> mapper.</p>
<p>After running <code>swift count.swift</code>, I now have three new files on my workstation:</p>
<div class="highlight"><pre><span class="nv">$ </span>tail *.processed
<span class="o">==</span>> alice.in.wonderland.processed <<span class="o">==</span>
Linux vsp-compute-22.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 1804<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 912<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 801<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 684<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 625<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 541<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 538<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 462<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 429<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 428<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 400<span class="o">)</span>, <span class="o">(</span><span class="s1">'alice'</span>, 385<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 358<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 291<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 272<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 248<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 228<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 224<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 204<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 197<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 181<span class="o">)</span>, <span class="o">(</span><span class="s1">'for'</span>, 179<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 178<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 169<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 165<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 165<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 154<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 151<span class="o">)</span>, <span class="o">(</span><span class="s1">'very'</span>, 145<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 137<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 130<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 128<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 128<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 122<span class="o">)</span>, <span class="o">(</span><span class="s1">'its'</span>, 117<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 114<span class="o">)</span>, <span class="o">(</span><span class="s1">'out'</span>, 114<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 102<span class="o">)</span>, <span class="o">(</span><span class="s1">'about'</span>, 102<span class="o">)</span>, <span class="o">(</span><span class="s1">'down'</span>, 101<span class="o">)</span>, <span class="o">(</span><span class="s1">'up'</span>, 101<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 98<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 97<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 96<span class="o">)</span>, <span class="o">(</span><span class="s1">'then'</span>, 90<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 87<span class="o">)</span>, <span class="o">(</span><span class="s1">'know'</span>, 87<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'project'</span>, 86<span class="o">)</span>, <span class="o">(</span><span class="s1">'like'</span>, 85<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 85<span class="o">)</span>, <span class="o">(</span><span class="s1">'them'</span>, 84<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'went'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'herself'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'again'</span>, 82<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 80<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 78<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 77<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 76<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 76<span class="o">)</span>, <span class="o">(</span><span class="s1">''</span>, 75<span class="o">)</span>, <span class="o">(</span><span class="s1">'thought'</span>, 74<span class="o">)</span>, <span class="o">(</span><span class="s1">'off'</span>, 73<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 72<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 71<span class="o">)</span>, <span class="o">(</span><span class="s1">'see'</span>, 69<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'how'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'queen'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'into'</span>, 67<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'who'</span>, 64<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 62<span class="o">)</span>, <span class="o">(</span><span class="s1">'king'</span>, 61<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 61<span class="o">)</span>, <span class="o">(</span><span class="s1">'dont'</span>, 60<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 60<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 58<span class="o">)</span>, <span class="o">(</span><span class="s1">'began'</span>, 58<span class="o">)</span>, <span class="o">(</span><span class="s1">'im'</span>, 57<span class="o">)</span>, <span class="o">(</span><span class="s1">'now'</span>, 57<span class="o">)</span>, <span class="o">(</span><span class="s1">'turtle'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'gutenbergtm'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'mock'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'hatter'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'gryphon'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'quite'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'must'</span>, 54<span class="o">)</span>, <span class="o">(</span><span class="s1">'way'</span>, 54<span class="o">)</span>, <span class="o">(</span><span class="s1">'work'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'think'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'much'</span>, 52<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 52<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 52<span class="o">)</span>, <span class="o">(</span><span class="s1">'just'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'say'</span>, 50<span class="o">)]</span>
<span class="o">==</span>> les.miserables.processed <<span class="o">==</span>
Linux vsp-compute-21.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 40845<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 19924<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 14877<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 14485<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 13705<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 11183<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 9580<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 8613<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 7768<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 6475<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 6459<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 6184<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 6171<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 5138<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 4525<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 4462<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 4055<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 3971<span class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 3799<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 3661<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 3634<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 3253<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 3127<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'for'</span>, 2964<span class="o">)</span>, <span class="o">(</span><span class="s1">'him'</span>, 2923<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 2793<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 2633<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 2615<span class="o">)</span>, <span class="o">(</span><span class="s1">'who'</span>, 2540<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 2451<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 2447<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 2428<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 2389<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 2382<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 2159<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 2116<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 2113<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 2043<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 1967<span class="o">)</span>, <span class="o">(</span><span class="s1">'man'</span>, 1899<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 1824<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 1796<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 1791<span class="o">)</span>, <span class="o">(</span><span class="s1">'been'</span>, 1517<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 1362<span class="o">)</span>, <span class="o">(</span><span class="s1">'marius'</span>, 1352<span class="o">)</span>, <span class="o">(</span><span class="s1">'we'</span>, 1278<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 1252<span class="o">)</span>, <span class="o">(</span><span class="s1">'will'</span>, 1226<span class="o">)</span>, <span class="o">(</span><span class="s1">'two'</span>, 1183<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 1180<span class="o">)</span>, <span class="o">(</span><span class="s1">'jean'</span>, 1176<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 1166<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 1150<span class="o">)</span>, <span class="o">(</span><span class="s1">'more'</span>, 1128<span class="o">)</span>, <span class="o">(</span><span class="s1">'himself'</span>, 1079<span class="o">)</span>, <span class="o">(</span><span class="s1">'has'</span>, 1077<span class="o">)</span>, <span class="o">(</span><span class="s1">'them'</span>, 1064<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 1052<span class="o">)</span>, <span class="o">(</span><span class="s1">'valjean'</span>, 1046<span class="o">)</span>, <span class="o">(</span><span class="s1">'then'</span>, 1034<span class="o">)</span>, <span class="o">(</span><span class="s1">'its'</span>, 1013<span class="o">)</span>, <span class="o">(</span><span class="s1">'these'</span>, 998<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 993<span class="o">)</span>, <span class="o">(</span><span class="s1">'into'</span>, 992<span class="o">)</span>, <span class="o">(</span><span class="s1">'out'</span>, 984<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 975<span class="o">)</span>, <span class="o">(</span><span class="s1">'like'</span>, 962<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 954<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 928<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'very'</span>, 922<span class="o">)</span>, <span class="o">(</span><span class="s1">'up'</span>, 921<span class="o">)</span>, <span class="o">(</span><span class="s1">'cosette'</span>, 913<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 879<span class="o">)</span>, <span class="o">(</span><span class="s1">'m'</span>, 878<span class="o">)</span>, <span class="o">(</span><span class="s1">'old'</span>, 873<span class="o">)</span>, <span class="o">(</span><span class="s1">'than'</span>, 866<span class="o">)</span>, <span class="o">(</span><span class="s1">'made'</span>, 782<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 781<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 780<span class="o">)</span>, <span class="o">(</span><span class="s1">'good'</span>, 773<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 758<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 757<span class="o">)</span>, <span class="o">(</span><span class="s1">'those'</span>, 730<span class="o">)</span>, <span class="o">(</span><span class="s1">'nothing'</span>, 729<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 728<span class="o">)</span>, <span class="o">(</span><span class="s1">'without'</span>, 699<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 678<span class="o">)</span>, <span class="o">(</span><span class="s1">'day'</span>, 673<span class="o">)</span>, <span class="o">(</span><span class="s1">'rue'</span>, 664<span class="o">)</span>, <span class="o">(</span><span class="s1">'about'</span>, 642<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 614<span class="o">)</span>, <span class="o">(</span><span class="s1">'where'</span>, 614<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'say'</span>, 598<span class="o">)</span>, <span class="o">(</span><span class="s1">'men'</span>, 596<span class="o">)</span>, <span class="o">(</span><span class="s1">'de'</span>, 592<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 578<span class="o">)</span>, <span class="o">(</span><span class="s1">''</span>, 577<span class="o">)</span>, <span class="o">(</span><span class="s1">'here'</span>, 576<span class="o">)</span>, <span class="o">(</span><span class="s1">'first'</span>, 565<span class="o">)]</span>
<span class="o">==</span>> pride.and.prejudice.processed <<span class="o">==</span>
Linux vsp-compute-20.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 4495<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 4207<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 3715<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 3602<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 2215<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 2051<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 1996<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 1919<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 1844<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 1704<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 1582<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 1535<span class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 1445<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 1417<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 1333<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 1267<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 1257<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 1189<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 1174<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 1098<span class="o">)</span>, <span class="o">(</span><span class="s1">'for'</span>, 1071<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 977<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 883<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 
846<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 801<span class="o">)</span>, <span class="o">(</span><span class="s1">'mr'</span>, 783<span class="o">)</span>, <span class="o">(</span><span class="s1">'him'</span>, 761<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 726<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 717<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 657<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 637<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 604<span class="o">)</span>, <span class="o">(</span><span class="s1">'elizabeth'</span>, 594<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 585<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 565<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 542<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 525<span class="o">)</span>, <span class="o">(</span><span class="s1">'been'</span>, 515<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 505<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 493<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 493<span class="o">)</span>, <span class="o">(</span><span class="s1">'very'</span>, 486<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 474<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 469<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 465<span class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 441<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 439<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'them'</span>, 434<span class="o">)</span>, <span class="o">(</span><span class="s1">'will'</span>, 418<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 401<span class="o">)</span>, <span class="o">(</span><span class="s1">'such'</span>, 393<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 373<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 372<span class="o">)</span>, <span class="o">(</span><span class="s1">'darcy'</span>, 371<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 364<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 364<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 359<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 357<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 347<span class="o">)</span>, <span class="o">(</span><span class="s1">'mrs'</span>, 343<span class="o">)</span>, <span class="o">(</span><span class="s1">'much'</span>, 328<span class="o">)</span>, <span class="o">(</span><span class="s1">'more'</span>, 326<span class="o">)</span>, <span class="o">(</span><span class="s1">'must'</span>, 318<span class="o">)</span>, <span class="o">(</span><span class="s1">'am'</span>, 316<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 306<span class="o">)</span>, <span class="o">(</span><span class="s1">'bennet'</span>, 293<span class="o">)</span>, <span class="o">(</span><span class="s1">'who'</span>, 286<span class="o">)</span>, <span class="o">(</span><span class="s1">'than'</span>, 284<span class="o">)</span>, <span class="o">(</span><span class="s1">'miss'</span>, 283<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 270<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 
266<span class="o">)</span>, <span class="o">(</span><span class="s1">'jane'</span>, 263<span class="o">)</span>, <span class="o">(</span><span class="s1">'we'</span>, 260<span class="o">)</span>, <span class="o">(</span><span class="s1">'bingley'</span>, 257<span class="o">)</span>, <span class="o">(</span><span class="s1">'should'</span>, 250<span class="o">)</span>, <span class="o">(</span><span class="s1">'know'</span>, 239<span class="o">)</span>, <span class="o">(</span><span class="s1">'how'</span>, 231<span class="o">)</span>, <span class="o">(</span><span class="s1">'before'</span>, 229<span class="o">)</span>, <span class="o">(</span><span class="s1">'herself'</span>, 224<span class="o">)</span>, <span class="o">(</span><span class="s1">'has'</span>, 223<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 222<span class="o">)</span>, <span class="o">(</span><span class="s1">'can'</span>, 221<span class="o">)</span>, <span class="o">(</span><span class="s1">'though'</span>, 221<span class="o">)</span>, <span class="o">(</span><span class="s1">'never'</span>, 220<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 217<span class="o">)</span>, <span class="o">(</span><span class="s1">'soon'</span>, 216<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 212<span class="o">)</span>, <span class="o">(</span><span class="s1">'think'</span>, 211<span class="o">)</span>, <span class="o">(</span><span class="s1">'now'</span>, 209<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 209<span class="o">)</span>, <span class="o">(</span><span class="s1">'may'</span>, 207<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 200<span class="o">)</span>, <span class="o">(</span><span class="s1">'might'</span>, 200<span class="o">)</span>, <span class="o">(</span><span class="s1">'after'</span>, 199<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'every'</span>, 198<span class="o">)</span>, <span class="o">(</span><span class="s1">'most'</span>, 190<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 189<span class="o">)</span>, <span class="o">(</span><span class="s1">'lady'</span>, 183<span class="o">)</span>, <span class="o">(</span><span class="s1">'own'</span>, 183<span class="o">)</span>, <span class="o">(</span><span class="s1">'good'</span>, 182<span class="o">)]</span>
</pre></div>
<p>Looks like "the" is the most common word in the English language. No surprise
there. But each of these calculations ran on a different node, as you can see
from the uname output at the top of each file. Since the vsp-compute nodes have
24 hyperthreaded cores and python is (usually) single-threaded, giving each job
a whole node is pretty silly. Let's see if we can do better.</p>
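<p>Reading the node off each file by eye works for three files. As a small sketch, the hostname can also be pulled out of a <code>uname -a</code> line in Python, since it is the second whitespace-separated field. The helper name <code>node_of</code> is made up for illustration; the sample line is copied from the output above.</p>

```python
# Pull the hostname out of a `uname -a` line; it is the second
# whitespace-separated field. node_of is a hypothetical helper name.
def node_of(uname_line):
    return uname_line.split()[1]

line = ("Linux vsp-compute-22.Stanford.EDU 2.6.18-274.el5 #1 SMP "
        "Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux")
print(node_of(line))
```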
<p>In my <code>sites.xml</code> file, I changed the vsp-compute pool to have the following line:</p>
<div class="highlight"><pre><span class="c"><!-- Swift should run 24 app() tasks at a time within each PBS job slot --></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"jobsPerNode"</span><span class="nt">></span>24<span class="nt"></profile></span>
</pre></div>
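<p>As a back-of-the-envelope check on what this setting should buy us, here is a tiny Python sketch. It assumes a simplified model in which every PBS slot is filled with up to <code>jobsPerNode</code> tasks before another slot is needed. That is an assumption for illustration, not a description of swift's actual scheduler, and <code>slots_needed</code> is a made-up name.</p>

```python
import math

# Rough model (an assumption, not swift's actual scheduler): every PBS slot
# is filled with up to jobs_per_node tasks before another slot is needed.
def slots_needed(n_tasks, jobs_per_node):
    return math.ceil(n_tasks / jobs_per_node)

print(slots_needed(3, 1))   # three books, one task per slot
print(slots_needed(3, 24))  # three books, up to 24 tasks per slot
```

<p>With three input files, one task per slot means three slots, while 24 tasks per slot fits everything into one.</p>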
<p>Now, after rerunning swift, you can see that the jobs all got pipelined to run
in a single PBS slot:</p>
<div class="highlight"><pre><span class="nv">$ </span>tail *.processed
<span class="o">==</span>> alice.in.wonderland.processed <<span class="o">==</span>
Linux vsp-compute-22.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 1804<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 912<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 801<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 684<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 625<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 541<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 538<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 462<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 429<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 428<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 400<span class="o">)</span>, <span class="o">(</span><span class="s1">'alice'</span>, 385<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 358<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 291<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 272<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 248<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 228<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 224<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 204<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 197<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 181<span class="o">)</span>, <span class="o">(</span><span class="s1">'for'</span>, 179<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 178<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 169<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 165<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 165<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 154<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 151<span class="o">)</span>, <span class="o">(</span><span class="s1">'very'</span>, 145<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 137<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 130<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 128<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 128<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 122<span class="o">)</span>, <span class="o">(</span><span class="s1">'its'</span>, 117<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 114<span class="o">)</span>, <span class="o">(</span><span class="s1">'out'</span>, 114<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 102<span class="o">)</span>, <span class="o">(</span><span class="s1">'about'</span>, 102<span class="o">)</span>, <span class="o">(</span><span class="s1">'down'</span>, 101<span class="o">)</span>, <span class="o">(</span><span class="s1">'up'</span>, 101<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 98<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 97<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 96<span class="o">)</span>, <span class="o">(</span><span class="s1">'then'</span>, 90<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 87<span class="o">)</span>, <span class="o">(</span><span class="s1">'know'</span>, 87<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'project'</span>, 86<span class="o">)</span>, <span class="o">(</span><span class="s1">'like'</span>, 85<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 85<span class="o">)</span>, <span class="o">(</span><span class="s1">'them'</span>, 84<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'went'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'herself'</span>, 83<span class="o">)</span>, <span class="o">(</span><span class="s1">'again'</span>, 82<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 80<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 78<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 77<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 76<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 76<span class="o">)</span>, <span class="o">(</span><span class="s1">''</span>, 75<span class="o">)</span>, <span class="o">(</span><span class="s1">'thought'</span>, 74<span class="o">)</span>, <span class="o">(</span><span class="s1">'off'</span>, 73<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 72<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 71<span class="o">)</span>, <span class="o">(</span><span class="s1">'see'</span>, 69<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'how'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'queen'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 68<span class="o">)</span>, <span class="o">(</span><span class="s1">'into'</span>, 67<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'who'</span>, 64<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 62<span class="o">)</span>, <span class="o">(</span><span class="s1">'king'</span>, 61<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 61<span class="o">)</span>, <span class="o">(</span><span class="s1">'dont'</span>, 60<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 60<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 58<span class="o">)</span>, <span class="o">(</span><span class="s1">'began'</span>, 58<span class="o">)</span>, <span class="o">(</span><span class="s1">'im'</span>, 57<span class="o">)</span>, <span class="o">(</span><span class="s1">'now'</span>, 57<span class="o">)</span>, <span class="o">(</span><span class="s1">'turtle'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'gutenbergtm'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'mock'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 56<span class="o">)</span>, <span class="o">(</span><span class="s1">'hatter'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'gryphon'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'quite'</span>, 55<span class="o">)</span>, <span class="o">(</span><span class="s1">'must'</span>, 54<span class="o">)</span>, <span class="o">(</span><span class="s1">'way'</span>, 54<span class="o">)</span>, <span class="o">(</span><span class="s1">'work'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'think'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 53<span class="o">)</span>, <span class="o">(</span><span class="s1">'much'</span>, 52<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 52<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 52<span class="o">)</span>, <span class="o">(</span><span class="s1">'just'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 51<span class="o">)</span>, <span class="o">(</span><span class="s1">'say'</span>, 50<span class="o">)]</span>
<span class="o">==</span>> les.miserables.processed <<span class="o">==</span>
Linux vsp-compute-22.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 40845<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 19924<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 14877<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 14485<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 13705<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 11183<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 9580<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 8613<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 7768<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 6475<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 6459<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 6184<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 6171<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 5138<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 4525<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 4462<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 4055<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 3971<span class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 3799<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 3661<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 3634<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 3253<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 3127<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'for'</span>, 2964<span class="o">)</span>, <span class="o">(</span><span class="s1">'him'</span>, 2923<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 2793<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 2633<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 2615<span class="o">)</span>, <span class="o">(</span><span class="s1">'who'</span>, 2540<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 2451<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 2447<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 2428<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 2389<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 2382<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 2159<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 2116<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 2113<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 2043<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 1967<span class="o">)</span>, <span class="o">(</span><span class="s1">'man'</span>, 1899<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 1824<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 1796<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 1791<span class="o">)</span>, <span class="o">(</span><span class="s1">'been'</span>, 1517<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 1362<span class="o">)</span>, <span class="o">(</span><span class="s1">'marius'</span>, 1352<span class="o">)</span>, <span class="o">(</span><span class="s1">'we'</span>, 1278<span 
class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 1252<span class="o">)</span>, <span class="o">(</span><span class="s1">'will'</span>, 1226<span class="o">)</span>, <span class="o">(</span><span class="s1">'two'</span>, 1183<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 1180<span class="o">)</span>, <span class="o">(</span><span class="s1">'jean'</span>, 1176<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 1166<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 1150<span class="o">)</span>, <span class="o">(</span><span class="s1">'more'</span>, 1128<span class="o">)</span>, <span class="o">(</span><span class="s1">'himself'</span>, 1079<span class="o">)</span>, <span class="o">(</span><span class="s1">'has'</span>, 1077<span class="o">)</span>, <span class="o">(</span><span class="s1">'them'</span>, 1064<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 1052<span class="o">)</span>, <span class="o">(</span><span class="s1">'valjean'</span>, 1046<span class="o">)</span>, <span class="o">(</span><span class="s1">'then'</span>, 1034<span class="o">)</span>, <span class="o">(</span><span class="s1">'its'</span>, 1013<span class="o">)</span>, <span class="o">(</span><span class="s1">'these'</span>, 998<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 993<span class="o">)</span>, <span class="o">(</span><span class="s1">'into'</span>, 992<span class="o">)</span>, <span class="o">(</span><span class="s1">'out'</span>, 984<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 975<span class="o">)</span>, <span class="o">(</span><span class="s1">'like'</span>, 962<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 954<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 928<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'very'</span>, 922<span class="o">)</span>, <span class="o">(</span><span class="s1">'up'</span>, 921<span class="o">)</span>, <span class="o">(</span><span class="s1">'cosette'</span>, 913<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 879<span class="o">)</span>, <span class="o">(</span><span class="s1">'m'</span>, 878<span class="o">)</span>, <span class="o">(</span><span class="s1">'old'</span>, 873<span class="o">)</span>, <span class="o">(</span><span class="s1">'than'</span>, 866<span class="o">)</span>, <span class="o">(</span><span class="s1">'made'</span>, 782<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 781<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 780<span class="o">)</span>, <span class="o">(</span><span class="s1">'good'</span>, 773<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 758<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 757<span class="o">)</span>, <span class="o">(</span><span class="s1">'those'</span>, 730<span class="o">)</span>, <span class="o">(</span><span class="s1">'nothing'</span>, 729<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 728<span class="o">)</span>, <span class="o">(</span><span class="s1">'without'</span>, 699<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 678<span class="o">)</span>, <span class="o">(</span><span class="s1">'day'</span>, 673<span class="o">)</span>, <span class="o">(</span><span class="s1">'rue'</span>, 664<span class="o">)</span>, <span class="o">(</span><span class="s1">'about'</span>, 642<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 614<span class="o">)</span>, <span class="o">(</span><span class="s1">'where'</span>, 614<span class="o">)</span>, <span class="o">(</span><span 
class="s1">'say'</span>, 598<span class="o">)</span>, <span class="o">(</span><span class="s1">'men'</span>, 596<span class="o">)</span>, <span class="o">(</span><span class="s1">'de'</span>, 592<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 578<span class="o">)</span>, <span class="o">(</span><span class="s1">''</span>, 577<span class="o">)</span>, <span class="o">(</span><span class="s1">'here'</span>, 576<span class="o">)</span>, <span class="o">(</span><span class="s1">'first'</span>, 565<span class="o">)]</span>
<span class="o">==</span>> pride.and.prejudice.processed <<span class="o">==</span>
Linux vsp-compute-22.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
<span class="o">[(</span><span class="s1">'the'</span>, 4495<span class="o">)</span>, <span class="o">(</span><span class="s1">'to'</span>, 4207<span class="o">)</span>, <span class="o">(</span><span class="s1">'of'</span>, 3715<span class="o">)</span>, <span class="o">(</span><span class="s1">'and'</span>, 3602<span class="o">)</span>, <span class="o">(</span><span class="s1">'her'</span>, 2215<span class="o">)</span>, <span class="o">(</span><span class="s1">'i'</span>, 2051<span class="o">)</span>, <span class="o">(</span><span class="s1">'a'</span>, 1996<span class="o">)</span>, <span class="o">(</span><span class="s1">'in'</span>, 1919<span class="o">)</span>, <span class="o">(</span><span class="s1">'was'</span>, 1844<span class="o">)</span>, <span class="o">(</span><span class="s1">'she'</span>, 1704<span class="o">)</span>, <span class="o">(</span><span class="s1">'that'</span>, 1582<span class="o">)</span>, <span class="o">(</span><span class="s1">'it'</span>, 1535<span class="o">)</span>, <span class="o">(</span><span class="s1">'not'</span>, 1445<span class="o">)</span>, <span class="o">(</span><span class="s1">'you'</span>, 1417<span class="o">)</span>, <span class="o">(</span><span class="s1">'he'</span>, 1333<span class="o">)</span>, <span class="o">(</span><span class="s1">'his'</span>, 1267<span class="o">)</span>, <span class="o">(</span><span class="s1">'be'</span>, 1257<span class="o">)</span>, <span class="o">(</span><span class="s1">'as'</span>, 1189<span class="o">)</span>, <span class="o">(</span><span class="s1">'had'</span>, 1174<span class="o">)</span>, <span class="o">(</span><span class="s1">'with'</span>, 1098<span class="o">)</span>, <span class="o">(</span><span class="s1">'for'</span>, 1071<span class="o">)</span>, <span class="o">(</span><span class="s1">'but'</span>, 977<span class="o">)</span>, <span class="o">(</span><span class="s1">'is'</span>, 883<span class="o">)</span>, <span class="o">(</span><span class="s1">'have'</span>, 
846<span class="o">)</span>, <span class="o">(</span><span class="s1">'at'</span>, 801<span class="o">)</span>, <span class="o">(</span><span class="s1">'mr'</span>, 783<span class="o">)</span>, <span class="o">(</span><span class="s1">'him'</span>, 761<span class="o">)</span>, <span class="o">(</span><span class="s1">'on'</span>, 726<span class="o">)</span>, <span class="o">(</span><span class="s1">'my'</span>, 717<span class="o">)</span>, <span class="o">(</span><span class="s1">'by'</span>, 657<span class="o">)</span>, <span class="o">(</span><span class="s1">'all'</span>, 637<span class="o">)</span>, <span class="o">(</span><span class="s1">'they'</span>, 604<span class="o">)</span>, <span class="o">(</span><span class="s1">'elizabeth'</span>, 594<span class="o">)</span>, <span class="o">(</span><span class="s1">'so'</span>, 585<span class="o">)</span>, <span class="o">(</span><span class="s1">'were'</span>, 565<span class="o">)</span>, <span class="o">(</span><span class="s1">'which'</span>, 542<span class="o">)</span>, <span class="o">(</span><span class="s1">'could'</span>, 525<span class="o">)</span>, <span class="o">(</span><span class="s1">'been'</span>, 515<span class="o">)</span>, <span class="o">(</span><span class="s1">'from'</span>, 505<span class="o">)</span>, <span class="o">(</span><span class="s1">'this'</span>, 493<span class="o">)</span>, <span class="o">(</span><span class="s1">'no'</span>, 493<span class="o">)</span>, <span class="o">(</span><span class="s1">'very'</span>, 486<span class="o">)</span>, <span class="o">(</span><span class="s1">'what'</span>, 474<span class="o">)</span>, <span class="o">(</span><span class="s1">'would'</span>, 469<span class="o">)</span>, <span class="o">(</span><span class="s1">'your'</span>, 465<span class="o">)</span>, <span class="o">(</span><span class="s1">'their'</span>, 441<span class="o">)</span>, <span class="o">(</span><span class="s1">'me'</span>, 439<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'them'</span>, 434<span class="o">)</span>, <span class="o">(</span><span class="s1">'will'</span>, 418<span class="o">)</span>, <span class="o">(</span><span class="s1">'said'</span>, 401<span class="o">)</span>, <span class="o">(</span><span class="s1">'such'</span>, 393<span class="o">)</span>, <span class="o">(</span><span class="s1">'or'</span>, 373<span class="o">)</span>, <span class="o">(</span><span class="s1">'when'</span>, 372<span class="o">)</span>, <span class="o">(</span><span class="s1">'darcy'</span>, 371<span class="o">)</span>, <span class="o">(</span><span class="s1">'do'</span>, 364<span class="o">)</span>, <span class="o">(</span><span class="s1">'if'</span>, 364<span class="o">)</span>, <span class="o">(</span><span class="s1">'are'</span>, 359<span class="o">)</span>, <span class="o">(</span><span class="s1">'an'</span>, 357<span class="o">)</span>, <span class="o">(</span><span class="s1">'there'</span>, 347<span class="o">)</span>, <span class="o">(</span><span class="s1">'mrs'</span>, 343<span class="o">)</span>, <span class="o">(</span><span class="s1">'much'</span>, 328<span class="o">)</span>, <span class="o">(</span><span class="s1">'more'</span>, 326<span class="o">)</span>, <span class="o">(</span><span class="s1">'must'</span>, 318<span class="o">)</span>, <span class="o">(</span><span class="s1">'am'</span>, 316<span class="o">)</span>, <span class="o">(</span><span class="s1">'any'</span>, 306<span class="o">)</span>, <span class="o">(</span><span class="s1">'bennet'</span>, 293<span class="o">)</span>, <span class="o">(</span><span class="s1">'who'</span>, 286<span class="o">)</span>, <span class="o">(</span><span class="s1">'than'</span>, 284<span class="o">)</span>, <span class="o">(</span><span class="s1">'miss'</span>, 283<span class="o">)</span>, <span class="o">(</span><span class="s1">'did'</span>, 270<span class="o">)</span>, <span class="o">(</span><span class="s1">'one'</span>, 
266<span class="o">)</span>, <span class="o">(</span><span class="s1">'jane'</span>, 263<span class="o">)</span>, <span class="o">(</span><span class="s1">'we'</span>, 260<span class="o">)</span>, <span class="o">(</span><span class="s1">'bingley'</span>, 257<span class="o">)</span>, <span class="o">(</span><span class="s1">'should'</span>, 250<span class="o">)</span>, <span class="o">(</span><span class="s1">'know'</span>, 239<span class="o">)</span>, <span class="o">(</span><span class="s1">'how'</span>, 231<span class="o">)</span>, <span class="o">(</span><span class="s1">'before'</span>, 229<span class="o">)</span>, <span class="o">(</span><span class="s1">'herself'</span>, 224<span class="o">)</span>, <span class="o">(</span><span class="s1">'has'</span>, 223<span class="o">)</span>, <span class="o">(</span><span class="s1">'other'</span>, 222<span class="o">)</span>, <span class="o">(</span><span class="s1">'can'</span>, 221<span class="o">)</span>, <span class="o">(</span><span class="s1">'though'</span>, 221<span class="o">)</span>, <span class="o">(</span><span class="s1">'never'</span>, 220<span class="o">)</span>, <span class="o">(</span><span class="s1">'only'</span>, 217<span class="o">)</span>, <span class="o">(</span><span class="s1">'soon'</span>, 216<span class="o">)</span>, <span class="o">(</span><span class="s1">'well'</span>, 212<span class="o">)</span>, <span class="o">(</span><span class="s1">'think'</span>, 211<span class="o">)</span>, <span class="o">(</span><span class="s1">'now'</span>, 209<span class="o">)</span>, <span class="o">(</span><span class="s1">'some'</span>, 209<span class="o">)</span>, <span class="o">(</span><span class="s1">'may'</span>, 207<span class="o">)</span>, <span class="o">(</span><span class="s1">'time'</span>, 200<span class="o">)</span>, <span class="o">(</span><span class="s1">'might'</span>, 200<span class="o">)</span>, <span class="o">(</span><span class="s1">'after'</span>, 199<span class="o">)</span>, <span 
class="o">(</span><span class="s1">'every'</span>, 198<span class="o">)</span>, <span class="o">(</span><span class="s1">'most'</span>, 190<span class="o">)</span>, <span class="o">(</span><span class="s1">'little'</span>, 189<span class="o">)</span>, <span class="o">(</span><span class="s1">'lady'</span>, 183<span class="o">)</span>, <span class="o">(</span><span class="s1">'own'</span>, 183<span class="o">)</span>, <span class="o">(</span><span class="s1">'good'</span>, 182<span class="o">)]</span>
</pre></div>
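<p>To check this without eyeballing the output, a small script can pull the
hostname out of each uname line and compare them. This is only a sketch: the
function names are mine, and it assumes you have already read the
<code>*.processed</code> files into a dict of file name to contents.</p>

```python
def hosts_by_file(outputs):
    """Map each output file name to the hostname on its uname line.

    outputs: dict of file name to file contents (an assumption about
    how the *.processed files were loaded).
    """
    hosts = {}
    for name, text in outputs.items():
        for line in text.splitlines():
            fields = line.split()
            # uname -a prints the kernel name first, then the hostname.
            if fields[:1] == ["Linux"] and fields[1:]:
                hosts[name] = fields[1]
                break
    return hosts


def ran_on_one_node(outputs):
    """True if every file that has a uname line reports the same host."""
    return len(set(hosts_by_file(outputs).values())) == 1
```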
<p>Perfect!</p>Setting up Swift2013-06-03T19:40:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-06-03:blog/2013/06/03/setting-up-swift/<p><a href="http://www.ci.uchicago.edu/swift/main/">Swift</a> is a parallel scripting language
developed at the U. Chicago Computation Institute. It provides a way to manage
heterogeneous clusters. I have <code>n</code> jobs that I want to get done, and I have access
to about 5 high performance computing clusters. I really <em>don't</em> want to handle
file transfer, checking <code>qstat</code> to see where there's resource availability, etc.
I want my jobs to run wherever they can, as fast as possible.</p>
<p>First, I downloaded the 0.94 release from <a href="http://www.ci.uchicago.edu/swift/downloads/">here</a>,
untarred it into <code>$HOME/local/swift-0.94</code>, and added the executables
to my <code>$PATH</code>.</p>
<div class="highlight"><pre><span class="nb">cd</span> <span class="nv">$HOME</span>/local
wget http://www.ci.uchicago.edu/swift/packages/swift-0.94.tar.gz
tar -xzvf swift-0.94.tar.gz
<span class="nb">export</span> <span class="nv">PATH</span><span class="o">=</span><span class="nv">$HOME</span>/local/swift-0.94/bin:<span class="nv">$PATH</span>
</pre></div>
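<p>A quick sanity check that the executable is actually being found can save
some confusion later. Here is a minimal sketch using <code>shutil.which</code> from the
Python standard library (3.3 or newer); the wrapper name <code>find_swift</code> is just
for illustration.</p>

```python
import shutil


def find_swift(path=None):
    """Return the full path of the swift executable, or None.

    path: a PATH-style string to search; defaults to the current $PATH.
    """
    return shutil.which("swift", path=path)


if __name__ == "__main__":
    location = find_swift()
    print(location or "swift is not on PATH yet")
```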
<h2>Setting up our cluster</h2>
<p>First, we add a new cluster to the <code>sites.xml</code> file. This is a file that tells swift
what clusters we have available to us, and what their queues look like. The file
is located in <code>$HOME/local/swift-0.94/etc/sites.xml</code>. I added the following
new <code>pool</code> to the file to describe our group's little analysis cluster, vsp-compute.
vsp-compute is a 40-node Linux cluster, with a shared filesystem between the nodes.
Each node has individual (node-local) space mounted on <code>/scratch</code>.</p>
<div class="highlight"><pre><span class="nt"><pool</span> <span class="na">handle=</span><span class="s">"vsp-compute"</span><span class="nt">></span>
<span class="c"><!-- use the "coaster" provider, which enables Swift to ssh to another system and qsub from there --></span>
<span class="nt"><execution</span> <span class="na">provider=</span><span class="s">"coaster"</span> <span class="na">jobmanager=</span><span class="s">"ssh-cl:pbs"</span> <span class="na">url=</span><span class="s">"vsp-compute-01.stanford.edu"</span><span class="nt">/></span>
<span class="c"><!-- app() tasks should be limited to 5 minutes walltime --></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"maxWalltime"</span><span class="nt">></span>00:05:00<span class="nt"></profile></span>
<span class="c"><!-- app() tasks will be run within PBS coaster "pilot" jobs. Each PBS job should have a walltime of 1 hour --></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"lowOverAllocation"</span><span class="nt">></span>100<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"highOverAllocation"</span><span class="nt">></span>100<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"maxtime"</span><span class="nt">></span>3600<span class="nt"></profile></span>
<span class="c"><!-- Up to 5 concurrent PBS coaster jobs each asking for 1 node will be submitted to the default queue --></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"queue"</span><span class="nt">></span>default<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"slots"</span><span class="nt">></span>5<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"maxnodes"</span><span class="nt">></span>1<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"nodeGranularity"</span><span class="nt">></span>1<span class="nt"></profile></span>
<span class="c"><!-- Swift should run only one app() task at a time within each PBS job slot --></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"globus"</span> <span class="na">key=</span><span class="s">"jobsPerNode"</span><span class="nt">></span>1<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"karajan"</span> <span class="na">key=</span><span class="s">"jobThrottle"</span><span class="nt">></span>1.00<span class="nt"></profile></span>
<span class="nt"><profile</span> <span class="na">namespace=</span><span class="s">"karajan"</span> <span class="na">key=</span><span class="s">"initialScore"</span><span class="nt">></span>10000<span class="nt"></profile></span>
<span class="c"><!-- the scratch filesystem is unique to each node, and not shared across the cluster --></span>
<span class="nt"><workdirectory></span>/scratch/{env.USER}/.swiftwork<span class="nt"></workdirectory></span>
<span class="nt"></pool></span>
</pre></div>
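<p>As a back-of-the-envelope check on what these numbers add up to, here is a
small sketch. It is my own arithmetic, not something Swift reports: it assumes
every pilot job gets its node and every task uses its full walltime, and it
ignores the karajan throttle settings entirely.</p>

```python
def walltime_to_seconds(hhmmss):
    """Convert an HH:MM:SS string such as 00:05:00 to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pool_capacity(slots, maxnodes, jobs_per_node, maxtime, max_walltime):
    """Rough upper bounds implied by the pool settings.

    Returns (tasks running at once, tasks per round of pilot jobs).
    Swift's own scheduler and throttles may give different numbers.
    """
    concurrent = slots * maxnodes * jobs_per_node
    tasks_per_worker = maxtime // walltime_to_seconds(max_walltime)
    return concurrent, concurrent * tasks_per_worker


# Values copied from the pool above.
print(pool_capacity(slots=5, maxnodes=1, jobs_per_node=1,
                    maxtime=3600, max_walltime="00:05:00"))  # (5, 60)
```

<p>So with this configuration, at most 5 tasks run at once, one per PBS job.</p>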
<p>Next, we set up the transformations catalog, <code>tc.data</code>. This file specifies what
commands are installed on each machine. The user-specific <code>tc.data</code> is in
<code>$HOME/local/swift-0.94/etc/tc.data</code>. I added two lines to the bottom
to describe the software available on vsp-compute. The lines are</p>
<div class="highlight"><pre><span class="cp"># vsp-compute</span>
<span class="n">vsp</span><span class="o">-</span><span class="n">compute</span> <span class="n">uname</span> <span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">uname</span> <span class="n">null</span> <span class="n">null</span> <span class="n">null</span>
<span class="n">vsp</span><span class="o">-</span><span class="n">compute</span> <span class="n">wc</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">wc</span> <span class="n">null</span> <span class="n">null</span> <span class="n">null</span>
</pre></div>
<p>This tells the swift execution engine that the <code>uname</code> and <code>wc</code> commands are
available on vsp-compute.</p>
<h2>Setting the swift.properties file</h2>
<p>How should we transfer the input files to the compute nodes (and bring the
output files back)? One option is called "coaster provider staging". To set
this up, I opened up the <code>$HOME/local/swift-0.94/etc/swift.properties</code> file,
and changed these four settings.</p>
<div class="highlight"><pre><span class="cp"># this lets the provider deal with the staging of files. we want this because</span>
<span class="cp"># vsp-compute does not share a filesystem with my workstation.</span>
<span class="n">use</span><span class="p">.</span><span class="n">provider</span><span class="p">.</span><span class="n">staging</span><span class="o">=</span><span class="nb">true</span>
<span class="n">provider</span><span class="p">.</span><span class="n">staging</span><span class="p">.</span><span class="n">pin</span><span class="p">.</span><span class="n">swiftfiles</span><span class="o">=</span><span class="nb">true</span>
<span class="n">status</span><span class="p">.</span><span class="n">mode</span><span class="o">=</span><span class="n">provider</span>
<span class="cp"># this is just for debugging</span>
<span class="n">wrapperlog</span><span class="p">.</span><span class="n">always</span><span class="p">.</span><span class="n">transfer</span><span class="o">=</span><span class="nb">true</span>
</pre></div>
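<p>A quick way to confirm that all four settings actually landed in the file is
a small Python check along these lines. This is only a sketch: it assumes the
file uses plain <code>key=value</code> lines with <code>#</code> comments (it does not handle the
full Java properties syntax), and the helper names <code>parse_properties</code> and
<code>wrong_settings</code> are made up for illustration.</p>

```python
# Settings this post changes in swift.properties.
EXPECTED = {
    "use.provider.staging": "true",
    "provider.staging.pin.swiftfiles": "true",
    "status.mode": "provider",
    "wrapperlog.always.transfer": "true",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def wrong_settings(text, expected=EXPECTED):
    """Return {key: actual value or None} for each setting that is off."""
    actual = parse_properties(text)
    return {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}


# Hypothetical file contents with one wrong value and one missing key.
sample = """
# this is just for debugging
use.provider.staging=true
provider.staging.pin.swiftfiles=true
status.mode=files
"""
problems = wrong_settings(sample)
```

<p>For the real file, pass in the contents of
<code>$HOME/local/swift-0.94/etc/swift.properties</code> instead of <code>sample</code>; an empty
dict means all four settings match.</p>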
<h2>Dealing with a weird ssh issue</h2>
<p>There was an issue with my ssh keys. To save you the pain of debugging this,
if you have a file on your machine at <code>$HOME/.ssh/id_rsa.pub</code>, but not one at
<code>$HOME/.ssh/identity.pub</code>, make these softlinks.</p>
<div class="highlight"><pre>ln -s ~/.ssh/id_rsa ~/.ssh/identity
ln -s ~/.ssh/id_rsa.pub ~/.ssh/identity.pub
</pre></div>
<h2>Running a parallel script</h2>
<p>Enough configuration! Here's the script that I want to execute. It just runs
the *nix <code>uname</code> command. Remember, this command needs to be available in <code>tc.data</code>.</p>
<p>Here's my swift script.</p>
<div class="highlight"><pre><span class="c"># uname.swift</span>
<span class="nb">type</span> <span class="nb">file</span><span class="p">;</span>
<span class="n">app</span> <span class="p">(</span><span class="nb">file</span> <span class="n">o</span><span class="p">)</span> <span class="n">uname</span><span class="p">()</span> <span class="p">{</span>
<span class="c"># execute the uname command, with the argument -a, sending stdout to a file</span>
<span class="n">uname</span> <span class="s">"-a"</span> <span class="n">stdout</span><span class="o">=</span><span class="nd">@o</span><span class="p">;</span>
<span class="p">}</span>
<span class="nb">file</span> <span class="n">outfile</span> <span class="o"><</span><span class="s">"uname.txt"</span><span class="o">></span><span class="p">;</span>
<span class="n">outfile</span> <span class="o">=</span> <span class="n">uname</span><span class="p">();</span>
</pre></div>
<p>To run it, I just execute the script from the command line.</p>
<div class="highlight"><pre><span class="nv">$ </span>swift uname.swift
</pre></div>
<p>The following gets printed to my terminal.</p>
<div class="highlight"><pre><span class="n">Swift</span> <span class="mf">0.94</span> <span class="n">swift</span><span class="o">-</span><span class="n">r6492</span> <span class="n">cog</span><span class="o">-</span><span class="n">r3658</span>
<span class="nl">RunID:</span> <span class="mi">20130604</span><span class="o">-</span><span class="mi">1330</span><span class="o">-</span><span class="n">fpx5r78b</span>
<span class="nl">Progress:</span> <span class="n">time</span><span class="o">:</span> <span class="n">Tue</span><span class="p">,</span> <span class="mo">04</span> <span class="n">Jun</span> <span class="mi">2013</span> <span class="mi">13</span><span class="o">:</span><span class="mi">30</span><span class="o">:</span><span class="mo">06</span> <span class="o">-</span><span class="mo">0700</span>
<span class="nl">Progress:</span> <span class="n">time</span><span class="o">:</span> <span class="n">Tue</span><span class="p">,</span> <span class="mo">04</span> <span class="n">Jun</span> <span class="mi">2013</span> <span class="mi">13</span><span class="o">:</span><span class="mi">30</span><span class="o">:</span><span class="mi">36</span> <span class="o">-</span><span class="mo">0700</span> <span class="n">Submitting</span><span class="o">:</span><span class="mi">1</span>
<span class="nl">Progress:</span> <span class="n">time</span><span class="o">:</span> <span class="n">Tue</span><span class="p">,</span> <span class="mo">04</span> <span class="n">Jun</span> <span class="mi">2013</span> <span class="mi">13</span><span class="o">:</span><span class="mi">30</span><span class="o">:</span><span class="mi">49</span> <span class="o">-</span><span class="mo">0700</span> <span class="n">Submitted</span><span class="o">:</span><span class="mi">1</span>
<span class="nl">Progress:</span> <span class="n">time</span><span class="o">:</span> <span class="n">Tue</span><span class="p">,</span> <span class="mo">04</span> <span class="n">Jun</span> <span class="mi">2013</span> <span class="mi">13</span><span class="o">:</span><span class="mi">30</span><span class="o">:</span><span class="mi">51</span> <span class="o">-</span><span class="mo">0700</span> <span class="n">Stage</span> <span class="n">in</span><span class="o">:</span><span class="mi">1</span>
<span class="n">Final</span> <span class="n">status</span><span class="o">:</span> <span class="n">Tue</span><span class="p">,</span> <span class="mo">04</span> <span class="n">Jun</span> <span class="mi">2013</span> <span class="mi">13</span><span class="o">:</span><span class="mi">30</span><span class="o">:</span><span class="mi">51</span> <span class="o">-</span><span class="mo">0700</span> <span class="n">Finished</span> <span class="n">successfully</span><span class="o">:</span><span class="mi">1</span>
</pre></div>
<p>Looking in my working directory, I now have a new file called <code>uname.txt</code>. The
file indicates that the job ran on one of the vsp-compute worker nodes. Swift
transparently submitted a PBS job, and copied the results back to my workstation.</p>
<div class="highlight"><pre><span class="nv">$ </span>cat uname.txt
Linux vsp-compute-31.Stanford.EDU 2.6.18-274.el5 <span class="c">#1 SMP Fri Jul 22 04:43:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux</span>
</pre></div>What is the uncertainty in an MSM?2013-05-26T14:23:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-26:blog/2013/05/26/what-is-the-uncertainty-in-an-msm/<p>Two major sources of error characterize a Markov state model. The first is
the convergence of the dataset -- we can only model the processes that we've
simulated, in some form or another. When the sampling is insufficient, it's
not like the MSM can make something out of nothing. This sampling error is
hard to model, because we <em>don't know</em> what's out there. The best that we can
do is assess how "densely" sampled the data we've seen is. Do we have
transitions that we've only seen once? If we split the dataset in half, do the
halves look consistent with one another?</p>
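<p>Both of those checks are easy to sketch in Python. The version below assumes
the trajectories have already been discretized into lists of integer state
labels, which is a simplification, and the function names and the toy data are
made up for illustration. The overlap score is just the fraction of observed
transitions that show up in both halves, one crude measure among many.</p>

```python
from collections import Counter


def count_transitions(trajs, lag=1):
    """Count (from_state, to_state) pairs observed at the given lag."""
    counts = Counter()
    for traj in trajs:
        for i in range(len(traj) - lag):
            counts[(traj[i], traj[i + lag])] += 1
    return counts


def singletons(counts):
    """Transitions that were seen exactly once."""
    return sorted(pair for pair, n in counts.items() if n == 1)


def split_half_overlap(trajs, lag=1):
    """Fraction of observed transitions that appear in both halves of the data."""
    half = len(trajs) // 2
    first = set(count_transitions(trajs[:half], lag))
    second = set(count_transitions(trajs[half:], lag))
    union = first | second
    return len(first & second) / float(len(union)) if union else 1.0


# Toy data: four short trajectories over states 0, 1, 2.
trajs = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 2, 2],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
]
counts = count_transitions(trajs)
rare = singletons(counts)
overlap = split_half_overlap(trajs)
```

<p>In the toy data, state 2 is only ever visited by one trajectory, so the
transitions into it are exactly the kind of thing the singleton check flags.</p>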
<p>The second major source of uncertainty is uncertainty in the parameters. These
are the number of clusters, their locations and size, and transition probabilities
between them (also the lag time, but let's leave that aside for now).</p>
<p>To really assess these errors, we need an approach that models them all
explicitly. This is challenging though -- our traditional clustering approaches
are parametric in the number of states, so they don't allow us to naturally
express our uncertainty there. We're going to need to go nonparametric.</p>
<p>Here's the idea: Small peptide (ala2 or ala5). <a href="http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/">Dirichlet process gaussian mixture model</a>
"clustering" (nonparametric in number of states), with a <a href="http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/">Gibbs sampler</a>
so that we can sample from the posterior over clusterings. For each
clustering, sample transition matrices with the MCMC engine that Kyle's been
working on. We would have to do the whole thing in probably the projected
backbone dihedral space. It would be nice to use the <a href="http://en.wikipedia.org/wiki/Von_Mises_distribution">von Mises</a>
distribution instead of the gaussian, to avoid going into sin/cos space and
doubling the number of variables.</p>
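<p>To give a flavor of the last step, here is a minimal sketch of sampling
transition matrices for one fixed clustering. To be clear, this is not Kyle's
MCMC engine: it swaps in the simplest possible model, an independent Dirichlet
posterior on each row of the count matrix with a uniform prior, and it ignores
detailed balance entirely. The count matrix and the function name are made up.</p>

```python
import random


def sample_transition_matrix(counts, prior=1.0, rng=random):
    """Draw one row-stochastic matrix, each row ~ Dirichlet(counts_row + prior).

    A Dirichlet draw is a vector of independent gamma draws, normalized to sum to 1.
    """
    matrix = []
    for row in counts:
        gammas = [rng.gammavariate(c + prior, 1.0) for c in row]
        total = sum(gammas)
        matrix.append([g / total for g in gammas])
    return matrix


# Hypothetical transition counts between three states.
counts = [
    [90, 10, 0],
    [8, 80, 12],
    [0, 15, 85],
]

rng = random.Random(0)
samples = [sample_transition_matrix(counts, rng=rng) for _ in range(1000)]

# Posterior spread of a single element, the 0 -> 1 transition probability.
p01 = [s[0][1] for s in samples]
mean = sum(p01) / len(p01)
std = (sum((p - mean) ** 2 for p in p01) / len(p01)) ** 0.5
```

<p>In the full scheme this would sit inside an outer loop over clusterings drawn
from the Gibbs sampler, so that the spread in any derived quantity reflects both
sources of parameter uncertainty.</p>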
<p>I think going nonparametric in the number of states is pretty key. This
is probably going to be really expensive, and tuning the settings on the
samplers could definitely be a nightmare (now there's a whole separate
convergence issue to worry about), but it would be nice to do a careful accounting
for the uncertainties. Other approaches that you might do don't really model
the uncertainty in the clustering parameters.</p>
<p>A simpler approach might be to run a bootstrap on the actual trajectory data.
That's a good option too, but not as elegant. You don't get uncertainty in the
number of states, and there are so many issues with the bootstrap on timeseries.</p>Faster PyYAML Parsing with LibYAML2013-05-23T23:20:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-23:blog/2013/05/23/faster-yaml-parsing-with-libyaml/<p>This morning, <a href="https://github.com/schwancr">Christian</a> submitted a great
<a href="https://github.com/SimTk/msmbuilder/pull/199">pull request</a> to speed up YAML
parsing in MSMBuilder using LibYAML. In MSMBuilder,
we use a YAML file to save the "project", which keeps the paths to all of
the files associated with an MSMBuilder project, such as the different
trajectories, and a PDB for the protein's topology.</p>
<p>YAML is convenient here: being easily editable plain text, it makes it pretty
easy to check what's in your project, and perform simple tasks like splitting
a project in two, without needing any libraries or programming. But there's
one big disadvantage: it can be really really slow to load a large project
file. Although it's not usually the rate limiting step, it's still really
annoying.</p>
<p>Using the LibYAML parser (written in C) can speed up the reading significantly.
To see if your python installation is linked against LibYAML, try the following
command.</p>
<div class="highlight"><pre>python -c <span class="s2">"from yaml import CLoader"</span>
</pre></div>
<p>If this blows up with an ImportError, then LibYAML isn't installed. If the
command runs just fine, then you've already got LibYAML.</p>
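<p>The same check works from inside a program, which makes it easy to use the
fast parser when it's there and fall back to the pure-python one when it isn't.
Here is a minimal sketch; the function names are made up, and it uses
<code>CSafeLoader</code>/<code>SafeLoader</code> rather than <code>CLoader</code> on the assumption that the
file only contains plain data.</p>

```python
def have_libyaml():
    """True if PyYAML is importable and was built against LibYAML."""
    try:
        from yaml import CLoader  # noqa: F401
    except ImportError:
        return False
    return True


def load_yaml(text):
    """Parse YAML text, using the C parser when it is available."""
    import yaml

    # CSafeLoader is the LibYAML-backed counterpart of SafeLoader.
    loader = yaml.CSafeLoader if have_libyaml() else yaml.SafeLoader
    return yaml.load(text, Loader=loader)


libyaml_ok = have_libyaml()
```

<p>Calling <code>load_yaml</code> on the contents of a project file returns the same
data either way; only the parsing speed differs.</p>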
<h2>Installing LibYAML</h2>
<p>To install LibYAML, you can either build it from source or use a package manager.
If you've got <code>sudo</code> privileges, you can easily install LibYAML from source,
following the directions from <a href="http://pyyaml.org/wiki/LibYAML">here</a>, with:</p>
<div class="highlight"><pre>wget http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
tar -xzvf yaml-0.1.4.tar.gz
<span class="nb">cd </span>yaml-0.1.4
./configure
make
sudo make install
</pre></div>
<p>If you're on a machine without <code>sudo</code> privileges, use a <code>--prefix</code> flag with
<code>configure</code> to install the library in user-space. You'll probably also have to
add the new library directory to <code>LD_LIBRARY_PATH</code> (or <code>DYLD_LIBRARY_PATH</code> on mac).</p>
<p>If you have access to a package manager like <code>apt-get</code> on Ubuntu, then you can install
LibYAML with the command <code>sudo apt-get install libyaml-dev</code>. On a CentOS
system, you should be able to get LibYAML with <code>sudo yum install libyaml-devel</code>.</p>
<h2>Rebuilding PyYAML with the C Bindings</h2>
<p>If you've installed LibYAML in the default location (either by compiling from
source without <code>--prefix</code>, or using your package manager), then you can rebuild PyYAML
with the LibYAML bindings by just reinstalling it through pip with <code>pip install pyyaml --upgrade --force-reinstall</code>.</p>
<p>Otherwise, download the source package from <a href="https://pypi.python.org/pypi/PyYAML">pypi</a>
and edit the <code>setup.cfg</code> file to point to the lib and include directories of your
LibYAML installation. I configured LibYAML with <code>--prefix=$HOME/opt/yaml</code>, so
I uncommented lines 7 and 10 of <code>setup.cfg</code>, and edited them to read</p>
<div class="highlight"><pre><span class="c"># List of directories to search for 'yaml.h' (separated by ':').</span>
<span class="nv">include_dirs</span><span class="o">=</span>/usr/local/include:../../include:/home/rmcgibbo/opt/yaml/include
<span class="c"># List of directories to search for 'libyaml.a' (separated by ':').</span>
<span class="nv">library_dirs</span><span class="o">=</span>/usr/local/lib:../../lib:/home/rmcgibbo/opt/yaml/lib
</pre></div>
<p>And then I rebuilt the package with</p>
<div class="highlight"><pre>python setup.py --with-libyaml install
</pre></div>Projects in the Pipeline2013-05-23T00:32:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-23:blog/2013/05/23/projects-in-the-pipeline/<p>A few of the projects on my mind right now. Some are farther along than others.</p>
<ul>
<li>Accelerated conformation clustering with RMSD using a kmeans-like algorithm.
The key to kmeans is that you can take the average of a set of data points
under an $L_p$ norm by just... taking their average. But that doesn't work
for RMSD, because alignment isn't transitive. I'm working on some ways to do
that averaging, and I think I can accelerate RMSD clustering compared to
k-medoids. The procedure works well for a small number of atoms, but needs
some tweaks -- I think a better weighting -- when there are more atoms (or
the dynamic range of distances is greater).</li>
<li>MSMAccelerator2: Last month, I refactored the MSMAccelerator code base. Well,
actually I ripped all of the old code out and started from the ground up. The
new code has a message passing architecture with ZeroMQ, and has a little
server that looks like a mini version of the FAH workserver. Now, this code
needs to get some exercise. I've started folding some small proteins, and
we need to analyze the convergence. I think we're going to see an impressive
speedup.</li>
<li>GBVI: We need to push on this. Currently, not a lot has been done.</li>
<li>Optimal-K: At this point, the framework for choosing the optimal number of
states, at least under euclidean metrics, is low hanging fruit. Getting this
finished and out the door is academic priority number 1.</li>
</ul>OpenMM Script Builder2013-05-23T00:08:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-23:blog/2013/05/23/openmm-builder/<p><a href="http://builder.openmm.org">Build</a> custom OpenMM scripts right in the browser!
<a href="http://openmm.org/">OpenMM</a> is one of the most flexible molecular dynamics
packages, but it can be a little intimidating for the new user. Instead of
interacting with it via a set of command line scripts, as one would with
amber or gromacs, to interact with OpenMM you write a little python script.
If you've never written a script before, this might seem a little unfamiliar,
but it's an incredibly powerful paradigm.</p>
<p>But to help you out, I've written a little <a href="http://builder.openmm.org">web application</a>
that'll build an OpenMM python script for you. As you select the options via
the menus, the script will be "written" for you, live. The code is
live on heroku, at <a href="http://builder.openmm.org">builder.openmm.org</a>, and free (GPL)
on <a href="https://github.com/rmcgibbo/openmm-webbuilder">github</a>. Fork away.</p>
<p>It would be really awesome if the webapp had a "run" button that would run
your simulation for a short period of time on a donated GPU, but that's
going to be a little nontrivial, especially with the security ramifications. Pull
requests welcome!</p>First First-Author Paper2013-05-22T23:49:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-22:blog/2013/05/22/first-first-author/<p>My first first-author <a href="http://pubs.acs.org/doi/abs/10.1021/ct400132h">paper</a>
has just been accepted by the Journal of Chemical Theory and Computation.
The title is <em>Learning Kinetic Distance Metrics for Markov State Models of Protein Conformational Dynamics</em>.</p>Group Meeting2013-05-22T23:33:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-22:blog/2013/05/22/group-meeting/<p>A few weeks ago, I gave the Pande Group meeting. The slides are on <a href="https://github.com/rmcgibbo/group_meeting_april22">github</a>,
and you can view them <a href="http://htmlpreview.github.io/?http://github.com/rmcgibbo/group_meeting_april22/blob/master/index.html">here</a>.
The title of the talk is <em>Protein Folding is Easy: Towards Markov State Models
for Conformational Change</em>, and mostly addresses my work on learning distance metrics
for kinetic clustering of protein conformations from molecular dynamics
simulations. The central challenge here is detecting structurally subtle but
slow conformational changes in a dataset that might contain massive structural
changes, like folding. Simply clustering at a tiny radius with a standard distance
metric (RMSD) is fine in theory, but fails in practice to deal with the bias-variance
tradeoff effectively.</p>
<p>The slides are written in pure markdown and rendered to HTML5 using an adapted
version of the google-io-2012 HTML5 slide deck. The slide deck is now a little
python package, hosted on <a href="https://github.com/rmcgibbo/slidedeck">github</a>. After
installing it (<code>python setup.py install</code>), just run <code>slidedeck create</code> to get
started with a new template deck, and <code>slidedeck render</code> to make some HTML5.</p>Under One Roof2013-05-22T22:03:00-07:00Robert McGibbontag:rmcgibbo.github.io,2013-05-22:blog/2013/05/22/under-one-roof/<p>One of the goals of the MSMBuilder3 development is to make the package as easy
to use as possible. Analyzing molecular dynamics is hard enough, so there's
no reason that the software should get in your way.</p>
<p>Currently, all of the MSMBuilder commands are separate scripts that are
installed into your path, which means that you need to remember all of the
commands. If you forget -- <em>wait, what is the name of the script for computing
implied timescales?</em> -- you'll have to go back to the tutorial and check.
That's a pain.</p>
<p>Many command line tools let you access all their different utilities through
subcommands: think <code>git pull</code> or <code>svn checkout</code>. For MSMBuilder3, we're going
to put everything under <code>msmb</code>.</p>
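<p>For anyone who hasn't built one of these, here is roughly what the subcommand
pattern looks like using <code>argparse</code> from the python standard library. This is
only an illustration of the pattern, not the MSMBuilder3 implementation; the
subcommand and option names are borrowed from the planned <code>msmb</code> interface
(<code>cluster</code>, <code>assign</code>, <code>--stride</code> and so on), and everything else is made up.</p>

```python
import argparse


def build_parser():
    """One root command, with each tool registered as a subcommand."""
    parser = argparse.ArgumentParser(
        prog="msmb",
        description="Software for building Markov State Models",
    )
    subparsers = parser.add_subparsers(dest="command")

    cluster = subparsers.add_parser(
        "cluster", help="Cluster trajectories into microstates.")
    cluster.add_argument(
        "--metric_type", choices=["RMSD", "Pnorm"], default="RMSD")
    cluster.add_argument("--stride", type=int, default=1)

    assign = subparsers.add_parser(
        "assign", help="Assign trajectories to microstates.")
    assign.add_argument("--project_fn", default="project.yaml")

    return parser


# Equivalent to typing `msmb cluster --stride 5` at the shell.
args = build_parser().parse_args(["cluster", "--stride", "5"])
```

<p>The nice part of this layout is that the root parser knows about every
subcommand, so the top-level help can list them all in one place.</p>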
<p>One immediate UX improvement is the ability to have help text directly
on the root <code>msmb -h</code> command. I'm currently developing the feature in a
different repository, which is <a href="https://github.com/rmcgibbo/msmbuilder_config">here</a>.
It's not complete yet, but it'll look something like this:</p>
<div class="highlight"><pre>rmcgibbo@Roberts-MacBook-Pro-2 ~/projects/msmbuilder_config
$ msmb -h
MSMBuilder: Software for building Markov State Models for Biomolecular Dynamics
===============================================================================
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi sed nibh ut orci
suscipit scelerisque. Sed ligula augue, blandit ac eleifend eleifend, dapibus ac
sapien. Duis eu tortor ac erat porta vulputate. Phasellus ac nisl quis magna
Subcommands
-----------
atomindices
Construct list of atoms for RMSD calculations
mkprofile
Create a sample configuration file
assign
Assign trajectories to microstates.
cluster
Cluster trajectories into microstates.
Options
-------
--log-level=<Enum> (Application.log_level)
Default: 30
Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
Set the log level by value or name.
To see all available configurables, use `--help-all`
</pre></div>
<p>Here's what the output from running one of the subcommands would look like:</p>
<div class="highlight"><pre>$ msmb cluster -h
Cluster trajectories into microstates.
======================================
Output: Assignments.h5, and other files depending on your choice of distance
metric and/or clustering algorithm.
Note that there are many distance metrics and clustering algorithms available
Many of which have multiple options and parameters.
Reference
---------
A. B. Author, B. C. Author and C. D. Author, Title of our awesome paper. Chem.
Theory Comput. 7, 3412 (2013)
Options
-------
--metric_type=<Enum> (Cluster.metric_type)
Default: 'RMSD'
Choices: ['RMSD', 'Pnorm']
What distance metric to use?
--representation=<Enum> (Cluster.representation)
Default: 'Cartesian'
Choices: ['Cartesian', 'Dihedral', 'ContinuousContact']
What representation of system to use? This amounts to picking a coordinate
system. The RMSD metric should operate on cartesian coordinates, but other
metrics require a coordinate system that removes the rotational symmetry,
such as the space of backbone dihedral angles (Dihedral)
--project_fn=<Unicode> (Cluster.project_fn)
Default: u'project.yaml'
Path to project info file
--output_dir=<Unicode> (Cluster.output_dir)
Default: u'data/'
Output directory to save clustering data. This will include: (1)
Assignments.h5 (If clustering is hierarchical or stride=1): Contains the
state assignments (2) Assignments.h5.distances (If clustering is
hierarchical or stride=1): Contains the distance to the generator according
to the distance metric that was employed (3) Gens.lh5 Trajectory object
representing the generators for each state
--stride=<Int> (Cluster.stride)
Default: 1
Subsample by striding
To see all available configurables, use `--help-all`
</pre></div>
<p>There are a few other goodies, including the ability to specify options either on the
command line or in a config file. The config file is pretty easy to work with too,
since you can create a default one with <code>msmb mkprofile</code> that has all of the
possible options, just commented out. It's based on the <a href="http://ipython.org/ipython-doc/stable/config/overview.html">IPython configuration system</a>,
which is definitely <a href="http://python.6.x6.nabble.com/IPython-config-HasTraits-Traitlets-as-an-independent-library-td5014385.html">the best</a>.
I'll post on that later.</p>
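<p>In the meantime, the layering idea can be sketched in a few lines of plain
python. This is not how the IPython configuration system is implemented; it's
just the precedence rule, assuming that the config file overrides the built-in
defaults and the command line overrides the config file. The default values are
the ones from the <code>msmb cluster -h</code> output above, and <code>resolve_options</code> is a
made-up name.</p>

```python
# Built-in defaults, matching the `msmb cluster -h` text.
DEFAULTS = {
    "metric_type": "RMSD",
    "project_fn": "project.yaml",
    "output_dir": "data/",
    "stride": 1,
}


def resolve_options(config_file=None, command_line=None):
    """Layer the option sources: defaults, then config file, then command line."""
    options = dict(DEFAULTS)
    for layer in (config_file or {}, command_line or {}):
        unknown = sorted(set(layer) - set(DEFAULTS))
        if unknown:
            raise KeyError("unknown option(s): " + ", ".join(unknown))
        options.update(layer)
    return options


# The config file sets two options; the command line overrides one of them.
options = resolve_options(
    config_file={"stride": 10, "output_dir": "clusters/"},
    command_line={"stride": 2},
)
```

<p>Here <code>stride</code> ends up with the command line value, <code>output_dir</code> with the
config file value, and the rest stay at their defaults.</p>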