‘subprocess’ is a new addition (Python 2.4), and it provides a convenient and powerful way to run system commands. (You should use it instead of os.system, commands.getstatusoutput, or any of the older popen modules.)
Unfortunately, subprocess is a bit hard to use at the moment; I’m hoping to help fix that for Python 2.6, but in the meantime here are some basic commands.
Let’s just try running a system command and retrieving the output:
>>> import subprocess
>>> p = subprocess.Popen(['/bin/echo', 'hello, world'], stdout=subprocess.PIPE)
>>> (stdout, stderr) = p.communicate()
>>> print stdout,
hello, world
What’s going on is that we’re starting a subprocess (running ‘/bin/echo hello, world’) and then asking for all of the output aggregated together.
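If you find yourself doing this a lot, the same steps can be wrapped up in a small helper. This is only a sketch: the name run_command is made up for illustration, and I'm using the Python interpreter itself (sys.executable) as the child so the example doesn't depend on where echo lives. It also captures stderr and the exit status, which the session above ignores.

```python
import subprocess
import sys

def run_command(args):
    """Run args, wait for it to finish, and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; not part of the subprocess module.
    """
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to end-of-file and then waits for exit.
    stdout, stderr = p.communicate()
    return p.returncode, stdout, stderr

# The child here is just Python printing a greeting.
code, out, err = run_command([sys.executable, '-c', 'print("hello, world")'])
```

Here code is the child's exit status and out holds everything it wrote to stdout (a byte string if you run this under Python 3).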
We could, for short strings, read directly from p.stdout (which is a file handle):
>>> p = subprocess.Popen(['/bin/echo', 'hello, world'], stdout=subprocess.PIPE)
>>> print p.stdout.read(),
hello, world
but you could run into trouble here if the command returns a lot of data; you should use communicate to get the output instead.
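One way the trouble can show up (this is my own illustration, with a made-up child program): suppose the child writes a lot to both stdout and stderr, and both are pipes. If you sat in p.stdout.read() waiting for end-of-file, the child could get stuck writing to a full stderr pipe that nobody is reading, and neither side would ever finish. communicate reads both pipes at once, so it doesn't have that problem. Pipe buffer sizes vary by system; 200000 bytes is just a number that is comfortably larger than the typical buffer.

```python
import subprocess
import sys

# Hypothetical child: writes 200000 bytes to stdout, then 200000 to stderr.
child_code = (
    "import sys\n"
    "sys.stdout.write('x' * 200000)\n"
    "sys.stderr.write('y' * 200000)\n"
)

p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# communicate() drains both pipes at the same time, so neither one fills up
# and blocks the child.
out, err = p.communicate()
```

After this, out and err each hold the full output from the corresponding stream.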
Let’s do something a bit more complicated, just to show you that it’s possible: we’re going to write to ‘cat’ (which is basically an echo chamber):
>>> from subprocess import PIPE
>>> p = subprocess.Popen(["/bin/cat"], stdin=PIPE, stdout=PIPE)
>>> (stdout, stderr) = p.communicate('hello, world')
>>> print stdout,
hello, world
There are a number of more complicated things you can do with subprocess – like interact with the stdin and stdout of other processes – but they are fraught with peril.
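One of the tamer examples of this is hooking the stdout of one process up to the stdin of another, like a shell pipeline. Here's a minimal sketch, assuming two throwaway child programs that I've invented for the example (a producer that prints three words, and a filter that keeps lines containing 'an'); both are run with the Python interpreter so nothing depends on external commands.

```python
import subprocess
import sys
from subprocess import PIPE

# Hypothetical children, for illustration only.
producer_code = "print('apple'); print('banana'); print('cherry')"
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'an' in line:\n"
    "        sys.stdout.write(line)\n"
)

# p1's stdout becomes p2's stdin.
p1 = subprocess.Popen([sys.executable, '-c', producer_code], stdout=PIPE)
p2 = subprocess.Popen([sys.executable, '-c', filter_code],
                      stdin=p1.stdout, stdout=PIPE)

# Close our copy of the pipe so p1 notices if p2 exits early.
p1.stdout.close()

out, _ = p2.communicate()  # collect the filtered output
p1.wait()                  # reap the producer
```

Note that we still use communicate to collect the final output; it's the back-and-forth, read-a-bit-write-a-bit kind of interaction that gets hairy.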
rpy is a Python extension that lets Python and R talk naturally. For those of you who have never used R, it’s a very nice package that’s mainly used for statistics, and it has tons of libraries.
To use rpy, just
from rpy import *
The most important symbol that will be imported is ‘r’, which lets you run arbitrary R commands.
For example, if you wanted to run a principal component analysis, you could do it like so:
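from rpy import *

def plot_pca(filename):
    # load the first 5000 rows of a space-separated file with no header row
    r("""data <- read.delim('%s', header=FALSE, sep=" ", nrows=5000)""" \
      % (filename,))
    # run the PCA without scaling or centering the data
    r("""pca <- prcomp(data, scale=FALSE, center=FALSE)""")
    # scatterplot matrix of the first three principal components
    r("""pairs(pca$x[,1:3], pch=20)""")

plot_pca('vectors.txt')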
Now, the problem with this code is that I’m really just using Python to drive R, which seems inefficient. You can access the data directly from Python if you want; I’m only using R’s loading features here because they’re faster. For example,
x = r.pca['x']
is equivalent to ‘x <- pca$x’.