Mimi Tzeng should make software executable by others

From Geoscience Paper of the Future
Revision as of 02:08, 21 March 2015 by Mimi (Talk | contribs)



Details on how to do this task: Make software executable by others

I can tell just by reading the instructions that this is going to be a major pain, because: Matlab.

First, as noted in "make sure software is usable", the version of Matlab is hugely important and I should've noted which one I was using when I did the original processing. I think it was 2010b or something. I'm curious whether something like Docker or Vagrant is possible for past versions of Matlab, given that Matlab is proprietary, expensive, and has extremely restrictive licensing. If it were, I would be able to check that the software works in the previous version and then just note what version to use. Failing that: it will probably take an enormous amount of time and effort to get the scripts to run correctly in the 2015 version of Matlab.

Other concerns: the Matlab scripts are not as automated as they might first appear, because every single batch of data has some sort of issue with it that requires adjusting things in the code. I've automated as much as possible for the most common problems, such as:

* a sensor or sensor package lost battery power halfway through the deployment
* a sensor or sensor package is completely missing, due to malfunction or because it was never deployed
* variables are not always in the same order
* a variable is missing from a particular sensor package because the sensor that measures it malfunctioned or was removed
* new variables appear when new sensors are added

There is also the case where one of the ten thermistors has a pressure sensor as well, and it's present at every other deployment. The project as a whole started out in 2004 with 20 thermistors, 10 at a time spaced equally through the water column; by 2011 it was down to 10, with 5 of them at a time placed at strategic depths of interest to physical oceanographers. As of the end of 2014, I think they're down to 7-8; a number of them failed in 2014.
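To make the idea concrete, here is a minimal Python sketch (not the actual Matlab code; every name, variable, and file layout below is made up for illustration) of one way to tolerate shifting variable order and missing sensors: build a column map from the header instead of hard-coding positions.

```python
# Hypothetical sketch: map variable names to column indices per deployment,
# so the processing code never assumes a fixed column order.
# All names here are illustrative, not from the actual Matlab scripts.

def parse_header(header_line):
    """Build a {variable: column_index} map from a whitespace-separated header."""
    return {name: i for i, name in enumerate(header_line.split())}

def extract(row, colmap, variable, default=None):
    """Pull one variable from a data row, tolerating missing sensors."""
    idx = colmap.get(variable)
    if idx is None or idx >= len(row):
        return default  # sensor absent or removed in this deployment
    return row[idx]

colmap = parse_header("date time temp pres cond")
row = ["2011-06-01", "12:00", "28.4", "10.2", "55.1"]
print(extract(row, colmap, "temp"))      # -> 28.4
print(extract(row, colmap, "salinity"))  # -> None (no salinity sensor deployed)
```

This only handles reordering and absence; the genuinely one-off problems (a thermistor that doubles as a pressure sensor in alternating deployments, say) would still need per-batch judgment.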

And that's the core problem with making this software executable by others. The scripts are highly specific to that particular mooring in that particular place, with the particular sensors in their particular deployment plan. Nobody else will have the exact same set of sensors doing this exact thing. Also, each PI is interested in different types and formats of preliminary figures and data files than other PIs, so the outputs won't necessarily make everyone equally happy either.

So what should I adjust to make it more broadly useful to others? I could add code that asks a whole lot of questions like "does X sensor have Y variable this time? If so, which column is it in the input file?" That would get very annoying to answer each and every time, which is why I just made a note in my processing steps instructions to check and adjust the variable order in the code directly. Can I just add to my processing steps instructions instead, and say "check and adjust these line numbers in the input file against these line numbers in the Matlab script"?
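A middle ground between interactive prompts and hand-editing the script is a small per-deployment config file that records the column assignments once, next to the data. A hedged Python sketch (the file format and names are invented for illustration, not part of the existing workflow):

```python
# Hypothetical sketch: read a per-deployment text config like
#   temp=3
#   pres=4
# so column assignments live beside the data instead of inside the script.

def load_column_config(lines):
    """Parse 'variable=column' lines into a dict; skip blanks and # comments."""
    colmap = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, col = line.partition("=")
        colmap[name.strip()] = int(col)
    return colmap

cfg = ["# deployment 2011-06", "temp=3", "pres=4"]
print(load_column_config(cfg))  # -> {'temp': 3, 'pres': 4}
```

The same idea works in Matlab (e.g. reading a small text file at the top of the script); the point is that another user edits a data file they understand rather than line numbers inside someone else's code.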


Making the perl script executable by others was fairly simple, by comparison. It only needs to be placed in the same directory as the input files. I have added the following to the top of the file:

#Step D in the Workflow Diagram
#This perl script should be placed in the same directory as the data files to be
#processed. It takes the CTD, YSI, and thermistor files after they have been
#initially processed with the proprietary software that came with the sensors,
#and creates versions that will auto-open in Matlab, by stripping off the 
#headers (which often have a variable and unpredictable number of lines). 
#It also creates a file called moor-timestamps.txt to tell Matlab the input 
#variables of importance, that either came from the stripped off headers
#(station names, starting date, starting time) or are found manually (starting
#and ending scan numbers for the "good" data).
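For readers who don't use perl, the header-stripping idea in the comments above can be sketched in Python. This is an assumption-laden illustration, not the actual script: the real sensor file formats differ, and the "first data line starts with a digit" heuristic is made up for the example.

```python
# Hypothetical sketch of the header-stripping step: headers have an
# unpredictable number of lines, so detect the first line that looks
# like data (here, assumed to start with a digit) instead of skipping
# a fixed line count. Real CTD/YSI/thermistor formats will differ.

def strip_header(lines):
    """Return (header_lines, data_lines), split at the first data-like line."""
    for i, line in enumerate(lines):
        if line[:1].isdigit():  # assumed marker: data rows begin with a number
            return lines[:i], lines[i:]
    return lines, []

raw = ["Station: MOOR-1", "Start: 2011-06-01", "20110601 28.4 10.2"]
header, data = strip_header(raw)
print(data)  # -> ['20110601 28.4 10.2']
```

The stripped header lines are exactly where the station name, starting date, and starting time would be recovered for something like the moor-timestamps.txt file described above.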