Saturday, December 15, 2007

File Retrieval with BirdBath and Condor-G

As I wrote earlier, output files (other than standard output) of Globus jobs submitted via Condor-G can be retrieved using the TransferOutput attribute. To reproduce this with the BirdBath Java API, you need an incantation something like the one below.

ClassAdStructAttr[] extraAttributes =
{
    // Target grid resource and the job's stdout, log, and stderr files
    new ClassAdStructAttr("GridResource", ClassAdAttrType.value3,
                          gridResourceVal),
    new ClassAdStructAttr("Out", ClassAdAttrType.value3,
                          outputFile),
    new ClassAdStructAttr("UserLog", ClassAdAttrType.value3,
                          logFile),
    new ClassAdStructAttr("Err", ClassAdAttrType.value3,
                          errFile),
    new ClassAdStructAttr("TransferExecutable",
                          ClassAdAttrType.value4,
                          "FALSE"),
    new ClassAdStructAttr("when_to_transfer_output",
                          ClassAdAttrType.value2,
                          "\"ON_EXIT\""),
    new ClassAdStructAttr("should_transfer_files",
                          ClassAdAttrType.value2,
                          "\"YES\""),
    // BirdBath did not set these two, so they are set by hand
    new ClassAdStructAttr("StreamOut",
                          ClassAdAttrType.value4,
                          "FALSE"),
    new ClassAdStructAttr("StreamErr",
                          ClassAdAttrType.value4,
                          "FALSE"),
    // Output files to bring back from the remote Globus server
    new ClassAdStructAttr("TransferOutput",
                          ClassAdAttrType.value2,
                          "\"testgeoupdate.index,testgeoupdate.node,testgeoupdate.tetra\""),
    new ClassAdStructAttr("Environment", ClassAdAttrType.value2,
                          "\"PATH=/home/teragrid/tg459247/geofest.binaryexec:/bin:/usr/bin\""),
    new ClassAdStructAttr("x509userproxy",
                          ClassAdAttrType.value3,
                          proxyLocation)
};


The key problem with BirdBath is that it seems to require the parameters' "true names" rather than the names documented in the condor_submit man page. For example, to specify the file to use for logging, you have to use the parameter "UserLog", not "log" or "Log". To see the true names that Condor uses internally, submit a job with condor_submit and then run condor_q -l. They also appear in the job_queue.log file in the spool directory.
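If you want to collect the true names programmatically, the "Name = value" lines printed by condor_q -l are easy to pick apart. The following is only a rough sketch: the class name ClassAdDump, the parse method, and the sample text are all made up for illustration, and it assumes you have already captured the condor_q -l output as a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClassAdDump {

    /**
     * Splits "Name = value" lines (the layout condor_q -l prints)
     * into an ordered map of attribute name to raw value text.
     * Lines without an '=' are skipped.
     */
    public static Map<String, String> parse(String dump) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        for (String line : dump.split("\\r?\\n")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // blank line or something that is not an attribute
            }
            // Split on the first '=' only, so expressions using "==" stay intact
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            attrs.put(name, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt, not output from a real job
        String sample = "UserLog = \"/tmp/job.log\"\n"
                      + "Out = \"job.out\"\n"
                      + "StreamOut = FALSE\n";
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The keys of the resulting map are the names to try as the first argument of ClassAdStructAttr.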

With BirdBath and Condor-G there is a similar problem returning files from the remote Globus server: you have to use "TransferOutput" as one of the attributes. I also noticed that BirdBath did not set StreamOut and StreamErr, so I set these manually as well.
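The TransferOutput value in the listing above is a single quoted string with the file names separated by commas. If your file list is built at run time, a small helper along these lines can assemble that string. This is just a sketch: TransferList and quotedList are hypothetical names, and it assumes the quoted, comma-separated form used in my listing is what your setup expects.

```java
import java.util.Arrays;
import java.util.List;

public class TransferList {

    /**
     * Joins file names with commas and wraps the result in double
     * quotes, matching the value passed for TransferOutput above.
     */
    public static String quotedList(List<String> files) {
        return "\"" + String.join(",", files) + "\"";
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "testgeoupdate.index", "testgeoupdate.node", "testgeoupdate.tetra");
        // The returned string is what would go in as the attribute value
        System.out.println(quotedList(files));
    }
}
```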

The output files (testgeoupdate.index, etc.) will be uploaded to your job's spool directory. Note that BirdBath does not clean this up, unlike condor_submit invocations (which delete your job's clusterxy.proc0.subproc0 directory on completion). To make BirdBath mimic this behavior, you can set LeaveJobInQueue=TRUE in the ClassAdStructAttr array above.
