Saturday, November 17, 2007

Condor, Dagman for Running VLAB Jobs

Here's how to run PWSCF codes with Condor. Assume you have the executables and that the input data comes to you in a .zip file. We need to do these steps:
  1. Clean up any old data
  2. Unpack the .zip
  3. Run PWSCF's executables.
For each step, we need to write a simple Condor .cmd file. Condor can do parameter sweeps but you must use DAGMan to run jobs with dependencies. We will start with a data file called

Here is clean_pwscf.cmd:

executable = /bin/rm
arguments = -rf __CC5f_7
output = clean.out
error = clean.err
log = clean.log
# submit one instance of this job
queue

Now unpack_pwscf.cmd:

universe = vanilla
executable = /usr/bin/unzip
arguments =
output = unpack.out
error = unpack.err
log = unpack.log
# submit one instance of this job
queue
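
To make the first two jobs concrete, here is a minimal Python sketch of the same clean-then-unpack sequence run locally, outside Condor. The archive name is not given in this post, so `archive` is a hypothetical parameter, and the function name `clean_and_unpack` is my own, not part of Condor or PWSCF.

```python
import shutil
import zipfile
from pathlib import Path


def clean_and_unpack(archive, data_dir, dest="."):
    """Remove a stale data directory under dest, then extract archive into dest."""
    # Same effect as `rm -rf data_dir`: a missing directory is not an error.
    shutil.rmtree(Path(dest) / data_dir, ignore_errors=True)
    # Same role as the unzip job; returns the names of the extracted members.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Cleaning first means the run step never sees leftovers from an earlier unpack, which is why the clean job comes before the unpack job.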

Finally, run_pwscf.cmd:

executable = pw.x
input = Pwscf_Input
output = pw_condor.out
error = pw_condor.err
log = pw_condor.log
initialdir = __CC5f_7
# submit one instance of this job
queue

All of these are submitted from the same directory that contains the pw.x executable. The only interesting part of any of these files is the initialdir directive in the last one. It makes the job run in the newly unpacked __CC5f_7 directory, which contains the Pwscf_Input file and is where the .out, .err, and .log files will be written.
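
As a rough local analogue of what run_pwscf.cmd asks for, the Python sketch below starts an executable inside a working directory with its standard input, output, and error wired to files in that directory. It is only an illustration of the initialdir idea, not how Condor is implemented. The function name `run_like_condor` is hypothetical, and I am assuming, as I understand Condor's behaviour, that a relative executable path is resolved from the directory you submit from rather than from initialdir.

```python
import subprocess
from pathlib import Path


def run_like_condor(executable, initialdir, input_name, output_name, error_name):
    """Run executable in initialdir with stdin/stdout/stderr bound to files there."""
    # Resolve the executable against the launch directory before changing
    # directory (assumption: this mirrors how a relative path is treated).
    exe = Path(executable).absolute()
    workdir = Path(initialdir)
    with open(workdir / input_name, "rb") as stdin, \
         open(workdir / output_name, "wb") as stdout, \
         open(workdir / error_name, "wb") as stderr:
        result = subprocess.run([str(exe)], stdin=stdin, stdout=stdout,
                                stderr=stderr, cwd=workdir)
    return result.returncode


# Example call matching the submit file above (needs pw.x in the current directory):
# run_like_condor("pw.x", "__CC5f_7", "Pwscf_Input", "pw_condor.out", "pw_condor.err")
```

Running this by hand once is a quick way to check that the input file and executable work before handing the job to Condor.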

As with most scientific codes, PWSCF creates more than one output file. In this case they are located in the __CC5f_7/tmp directory. Presumably these would be preserved and copied back if I were running this on a cluster rather than on a single machine, although I may need to include these directives:

should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT

Finally, we need to create our DAG. Here is the content of pwscf.dag:

Job Clean clean_pwscf.cmd
Job Unpack unpack_pwscf.cmd
Job Run run_pwscf.cmd
PARENT Clean CHILD Unpack
PARENT Unpack CHILD Run

The "Job" lines associate each submit file with a nickname. The PARENT lines then define the DAG dependencies. Submit this with
condor_submit_dag -f pwscf.dag

The -f option forces Condor to overwrite any preexisting log and other files associated with pwscf.dag.
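
If you end up with many such chains, the DAG file is simple enough to generate. The Python sketch below builds the text for a strictly linear chain, where each job is the parent of the next. The helper name `linear_dag` is my own invention and is not part of Condor. It only covers the one-after-another case used in this post.

```python
def linear_dag(jobs):
    """Return DAG file text for jobs that must run one after another.

    jobs is a list of (nickname, submit_file) pairs in execution order.
    """
    # One Job line per step, binding a nickname to its submit file.
    lines = [f"Job {name} {submit_file}" for name, submit_file in jobs]
    # Each job becomes the parent of the job that follows it.
    for (parent, _), (child, _) in zip(jobs, jobs[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


steps = [
    ("Clean", "clean_pwscf.cmd"),
    ("Unpack", "unpack_pwscf.cmd"),
    ("Run", "run_pwscf.cmd"),
]
dag_text = linear_dag(steps)
```

Writing `dag_text` to pwscf.dag gives a file that can be submitted with the condor_submit_dag command shown above.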

