Saturday, December 15, 2007

File Retrieval with BirdBath and Condor-G

As I wrote earlier, output files (other than standard out) of Globus jobs submitted via Condor-G can be retrieved using the TransferOutput attribute. To reproduce this with the BirdBath Java API, you need an incantation something like the one below.

ClassAdStructAttr[] extraAttributes =
new ClassAdStructAttr("GridResource", ClassAdAttrType.value3,
new ClassAdStructAttr("Out", ClassAdAttrType.value3,
new ClassAdStructAttr("UserLog", ClassAdAttrType.value3,
new ClassAdStructAttr("Err", ClassAdAttrType.value3,
new ClassAdStructAttr("TransferExecutable",
new ClassAdStructAttr("when_to_transfer_output",
new ClassAdStructAttr("should_transfer_files",
new ClassAdStructAttr("StreamOut",
new ClassAdStructAttr("StreamErr",
new ClassAdStructAttr("TransferOutput",
new ClassAdStructAttr("Environment",ClassAdAttrType.value2,
new ClassAdStructAttr("x509userproxy",

The key problem with BirdBath is that it seems to require the internal attribute names (the "true names") rather than the submit-file names documented in the condor_submit man page. For example, to specify the file to use for logging, you have to use the attribute "UserLog", not "log" or "Log". To see the true names that Condor uses internally, submit a job with condor_submit and then run condor_q -l. The same names also appear in the job_queue.log file in the spool directory.
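One way to collect those names is to capture the condor_q -l output and pull out the left-hand side of each "Name = value" line. The following is only a minimal sketch: the class name ClassAdDump, the parse helper, and the sample text are made up for illustration and are not part of BirdBath or Condor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BirdBath: reads text laid out like condor_q -l output.
class ClassAdDump {
    // Splits each "Name = value" line into an attribute name and its raw value.
    static Map<String, String> parse(String dump) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.indexOf(" = ");
            if (eq <= 0) {
                continue; // skip blank lines and anything that is not an attribute
            }
            attrs.put(line.substring(0, eq).trim(), line.substring(eq + 3).trim());
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Illustrative sample only; a real dump contains many more attributes.
        String sample = String.join("\n",
                "UserLog = \"/tmp/example/job.log\"",
                "Out = \"job.out\"",
                "Err = \"job.err\"",
                "StreamOut = FALSE");
        parse(sample).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

The keys of the returned map are the names to pass as the first argument of each ClassAdStructAttr.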

With BirdBath and Condor-G there is a similar problem when returning files from the remote Globus server: you have to include "TransferOutput" as one of the attributes. I also noticed that BirdBath did not set StreamOut and StreamErr, so I set these manually as well.
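Since these attributes are easy to forget, a small pre-submission check can help. This is a sketch under assumptions: TransferAttrCheck and missing are hypothetical names, and the check works on plain attribute-name strings rather than on BirdBath's ClassAdStructAttr objects. It compares names case-insensitively because ClassAd attribute names are not case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of BirdBath.
class TransferAttrCheck {
    // The attributes this post says had to be supplied by hand.
    static final List<String> REQUIRED = Arrays.asList("TransferOutput", "StreamOut", "StreamErr");

    // Returns the required attribute names that are absent from the given names.
    static List<String> missing(Collection<String> attributeNames) {
        List<String> result = new ArrayList<>();
        for (String required : REQUIRED) {
            boolean found = false;
            for (String name : attributeNames) {
                if (required.equalsIgnoreCase(name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(required);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("GridResource", "Out", "Err", "TransferOutput");
        System.out.println("Missing: " + missing(names));
    }
}
```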

The output files (testgeoupdate.index, etc.) will be uploaded to your job's spool directory. Note that BirdBath does not clean this up, unlike condor_submit invocations (which delete your job's clusterxy.proc0.subproc0 directory on completion). To make BirdBath mimic this behavior, you can set LeaveJobInQueue=TRUE in the ClassAdStructAttr array above.
