Thursday, December 13, 2007

Condor-G Plus TeraGrid Example

Below is an example Condor submit file for running a job from my local grid client (a PC running FC7, Globus, and Condor). I'm running a set of codes that I installed on the machine I wanted to use (lonestar). Because the executable already lives on lonestar, transfer_executable is set to false. The codes are run by a Perl script, autoref.pl, so I had to set the PATH with the "environment" line. I want to upload the input files from my desktop at submission time, so the paths in "transfer_input_files" are for my client machine. Note that Condor-G will put these files in $SCRATCH_DIRECTORY on lonestar, which doubles as your working directory (that is, autoref.pl will be executed there). To get the results back from lonestar to the PC, I used "transfer_output_files" and listed each file. Full paths for these aren't necessary: Condor will pull them back from $SCRATCH_DIRECTORY on the remote machine to your local directory.

# Here it is. Please gaze at it only through a welder's mask.
executable = /home/teragrid/myaccnt/geofest.binaryexec/autoref.pl
arguments = testgeoupdate rare
transfer_executable = false
should_transfer_files=yes
when_to_transfer_output=ON_EXIT
transfer_input_files = /home/mpierce/condor_test/Northridge2.flt,/home/mpierce/condor_test/Northridge2.params,/home/mpierce/condor_test/Northridge2.sld,/home/mpierce/condor_test/NorthridgeAreaMantle.materials,/home/mpierce/condor_test/NorthridgeAreaMantle.sld,/home/mpierce/condor_test/NorthridgeAreaMidCrust.materials,/home/mpierce/condor_test/NorthridgeAreaMidCrust.sld,/home/mpierce/condor_test/NorthridgeAreaUpper.materials,/home/mpierce/condor_test/NorthridgeAreaUpper.sld,/home/mpierce/condor_test/testgeoupdate.grp
transfer_output_files=testgeoupdate.index,testgeoupdate.node,testgeoupdate.tetra
universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-fork
output = test.out.$(Cluster)
log = test.log.$(Cluster)
environment = PATH=/home/teragrid/myaccnt/geofest.binaryexec/:/bin/:/usr/bin
queue
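
That long transfer_input_files line is easy to get wrong by hand, so here is a minimal sketch of generating the same submit file from plain lists of file names. This is not something I used for the run above; build_submit and the constant names are hypothetical helpers made up for illustration, and every path and setting is simply copied from the script above.

```python
import posixpath

# Values copied from the submit file above.
INPUT_DIR = "/home/mpierce/condor_test"
INPUT_NAMES = [
    "Northridge2.flt", "Northridge2.params", "Northridge2.sld",
    "NorthridgeAreaMantle.materials", "NorthridgeAreaMantle.sld",
    "NorthridgeAreaMidCrust.materials", "NorthridgeAreaMidCrust.sld",
    "NorthridgeAreaUpper.materials", "NorthridgeAreaUpper.sld",
    "testgeoupdate.grp",
]
OUTPUT_NAMES = ["testgeoupdate.index", "testgeoupdate.node", "testgeoupdate.tetra"]
REMOTE_DIR = "/home/teragrid/myaccnt/geofest.binaryexec"


def build_submit(input_dir, input_names, output_names, remote_dir):
    """Return the submit file text (hypothetical helper, not part of Condor)."""
    # Input files need full client-side paths; output files are bare names.
    inputs = ",".join(posixpath.join(input_dir, name) for name in input_names)
    outputs = ",".join(output_names)
    lines = [
        f"executable = {remote_dir}/autoref.pl",
        "arguments = testgeoupdate rare",
        "transfer_executable = false",
        "should_transfer_files = yes",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {inputs}",
        f"transfer_output_files = {outputs}",
        "universe = grid",
        "grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-fork",
        "output = test.out.$(Cluster)",
        "log = test.log.$(Cluster)",
        f"environment = PATH={remote_dir}/:/bin/:/usr/bin",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_submit(INPUT_DIR, INPUT_NAMES, OUTPUT_NAMES, REMOTE_DIR), end="")
```

Redirect the output to a file and hand that file to condor_submit as usual. Keeping the file names in lists means adding or dropping an input is a one-line change.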

2 comments:

cooolit86 said...

Hi Marlon,

I am a beginner, and I am trying to understand how to run my application on another cluster, say Lonestar. I have managed to run it successfully on my local clusters and also on a (local) cluster that has a Condor setup.

Now I am trying to run this application a little faster, as my cluster doesn't provide 300 CPUs at one time. Can you tell me how it works? For example, my application requires me to first run make (on 1 CPU), followed by a DAG with a number of jobs (300-600) running in parallel on different CPUs (or different cores). I only want to run the DAG part on Lonestar.

Looking forward to hearing from you. Your blog has many things that I can learn from. Thanks.

- Rohan

Marlon Pierce said...

See http://www.teragrid.org/userinfo/jobs/condorg.php for some examples. You should use jobmanager-lsf instead of jobmanager-pbs for Lonestar.