Friday, January 18, 2013

Shell script for scheduling jobs

Suppose you want to schedule many runs of myscript.py with different parameters, but never more than 8 at a time (e.g. because your machine has only 8 processor cores). Then you can do:
 for param in `cat params.txt`; do  
   # wait while more than 7 jobs are running together  
   njobs=`ps aux | grep myscript | wc -l`  
   while [ $njobs -gt 8 ]; do         # 7 jobs + 1 grep  
     sleep 10  
     njobs=`ps aux | grep myscript | wc -l`   # recount after each sleep  
   done  
   python myscript.py $param &     # schedule background job here  
 done  
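
The same cap can also be sketched in Python, without counting processes via ps. Note that this is a different technique from the shell loop: a thread pool of size 8 in which each thread blocks on one child process. The function name run_all and its parameters max_jobs and script are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(params, max_jobs=8, script="myscript.py"):
    """Run `python script param` for each param, at most max_jobs at a time.

    Returns the exit codes in the same order as params.
    (run_all, max_jobs and script are illustrative names.)
    """
    def run_one(param):
        # Each worker thread waits for one child process to finish,
        # so the pool size caps the number of jobs running together.
        return subprocess.call([sys.executable, script, str(param)])

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, params))
```

To mirror the shell loop, one would read params.txt, split its contents on whitespace, and pass the resulting list to run_all.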

Thursday, January 17, 2013

File compression in python

Normally, you can do
 f = open(fname, 'r')  
 data = f.read()  
 f.close()  
to read a file, and
 f = open(fname, 'w')  
 f.write(text)  
 f.close()  
to write a file.

To use gzip compression, do
 import gzip  
 f = gzip.open(fname, 'rb')  
 data = f.read()  
 f.close()  
to read a gzip-compressed file (in binary mode, Python 3 returns the data as bytes), and
 f = gzip.open(fname, 'wb')  
 f.write(data)  
 f.close()  
to write a gzip-compressed file.

To compress pickled objects, just use the file handle returned by gzip.open(); everything else remains the same.
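
A minimal sketch of that, assuming the standard pickle module; the helper names save_pickle_gz and load_pickle_gz are made up for this example:

```python
import gzip
import pickle


def save_pickle_gz(obj, fname):
    # gzip.open() returns a file-like object, so pickle writes to it directly
    with gzip.open(fname, 'wb') as f:
        pickle.dump(obj, f)


def load_pickle_gz(fname):
    # Decompression happens transparently while pickle reads from the handle
    with gzip.open(fname, 'rb') as f:
        return pickle.load(f)
```

The only change from uncompressed pickling is that gzip.open() replaces open(); the pickle.dump() and pickle.load() calls stay as they were.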

(Courtesy Doug Hellmann.)