Thursday, August 10, 2006

About Condor

Step 1: Condor Installation
Follow the steps on the following page:
http://docs.optena.com/display/CONDOR/A+Simple+Linux+Installation
I installed Condor 6.7.19 using these instructions on a single Linux machine acting as the central manager.

Step 2: After installing Condor, set the following environment variables:
export CONDOR_CONFIG=/home/condor_user/condor_installation_directory/etc/condor_config
export PATH=$PATH:/home/condor_user/condor_installation_directory/bin
export PATH=$PATH:/home/condor_user/condor_installation_directory/sbin
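
All three settings derive from the installation directory, so they can be written against a single variable. This is only a sketch: CONDOR_HOME is just a name chosen here for illustration, and the path is the placeholder used above.

```shell
# CONDOR_HOME is a hypothetical convenience variable for this sketch.
CONDOR_HOME=/home/condor_user/condor_installation_directory

# Point Condor at its configuration file and add its tools to PATH.
export CONDOR_CONFIG="$CONDOR_HOME/etc/condor_config"
export PATH="$PATH:$CONDOR_HOME/bin:$CONDOR_HOME/sbin"
```

Putting these lines in the shell startup file of the Condor user saves retyping them in every session.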

Step 3: Start the Condor daemons by running the command '/home/condor_user/condor_installation_directory/sbin/condor_master'.

Step 4: Check the available machines by running the command 'condor_status'.
It displays all the machines that have joined the central manager's pool. In my case it shows only the single Linux machine that acts as the central manager.

Step 5: To configure Condor-G, follow the instructions at the following link:
http://www.cs.wisc.edu/condor/manual/v6.4/5_3Condor_G.html#SECTION00632000000000000000

We have installed NMI-9, which comes with the Globus Toolkit. You also need to change the configuration file located at /home/condor_user/condor_installation_directory/etc/condor_config.
Add or modify the following entries in the condor_config file:
GRIDMANAGER = $(SBIN)/condor_gridmanager
GT2_GAHP = $(SBIN)/gahp_server
GRID_MONITOR = $(SBIN)/grid_monitor.sh
MAX_GRIDMANAGER_LOG = 1000000
GRIDMANAGER_DEBUG = D_COMMAND
GRIDMANAGER_LOG = /tmp/GridmanagerLog.$(USERNAME)
Finally, run the condor_reconfig command so that the running daemons pick up the changes.

Step 6: To run Globus jobs you need grid credentials, and you must run the grid-proxy-init command to create a proxy.

Step 7: To submit a Globus job to Condor-G you need a job description file.
Example 1:
executable = test.sh
output = test.out
error = test.error
log = test.log
universe = grid
grid_type = gt2
globusscheduler = gf1.ucs.indiana.edu/jobmanager
leave_in_queue = jobstatus == 4
queue

This forks the job and runs it directly on gf1.ucs.indiana.edu, the host specified by globusscheduler.

To run this job, run the command 'condor_submit job_description_filename'. You can check the status of the job with 'condor_q' or 'condor_q -analyze job_cluster_number', and you can remove the job with 'condor_rm job_cluster_number'.
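
If you submit the same kind of job to several gatekeepers, the job description file can be generated instead of edited by hand. The following is only a sketch: the function name write_submit_file is made up for illustration, and the description it prints simply repeats Example 1 with the executable and gatekeeper passed in as parameters.

```shell
#!/bin/sh
# Hypothetical helper: print a grid-universe job description like Example 1.
# $1 = executable, $2 = gatekeeper contact string (both supplied by the caller).
write_submit_file() {
    exe=$1
    gatekeeper=$2
    base=${exe%.*}    # strip the extension, e.g. test.sh -> test
    cat <<EOF
executable = $exe
output = $base.out
error = $base.error
log = $base.log
universe = grid
grid_type = gt2
globusscheduler = $gatekeeper
leave_in_queue = jobstatus == 4
queue
EOF
}
```

For example, 'write_submit_file test.sh gf1.ucs.indiana.edu/jobmanager > test.submit' writes the description to a file (test.submit is an arbitrary name), which can then be passed to condor_submit as described above.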

Step 8: Using ClassAds
You can advertise a ClassAd for any machine to the central manager.
Example: a simple ClassAd for the gf1.ucs.indiana.edu machine

MyType = "Machine"
TargetType = "Job"
Name = "condorTest02"
Machine = "condorTest02"
gatekeeper_url = "gf1.ucs.indiana.edu/jobmanager"
UpdatesSequenced = 9
CurMatches = 0
Requirements = TARGET.JobUniverse == 9
Rank = 0.000000
CurrentRank = 0.000000
OpSys = "LINUX"
Arch = "INTEL"
State = "Unclaimed"
Activity = "Idle"
LoadAvg = 0.000000
Memory = 2048
WantAdRevaluate = True
StartdIpAddr = "156.56.104.135"
You can advertise this ClassAd with the following command, where classad_name is the file containing the ClassAd:
condor_advertise -debug -pool pool_name UPDATE_STARTD_AD classad_name
You can then check the status of that machine with the condor_status command.

Example of a job submission file that uses the ClassAd:

universe = grid
grid_type = gt2
notification = never
globusscheduler = $$(gatekeeper_url)
executable = test.sh
transfer_executable = true
output = hg-host.$(Cluster).out
error = hg-host.$(Cluster).error
log = hg-host.$(Cluster).log
requirements = TARGET.gatekeeper_url =!= UNDEFINED
queue
Submit this job using the condor_submit command. Similarly, you can create ClassAds for TeraGrid machines such as NCSA and SDSC.
Example ClassAd for NCSA:

MyType = "Machine"
TargetType = "Job"
Name = "condorTest05"
Machine = "condorTest05"
gatekeeper_url = "login-co.ncsa.teragrid.org/jobmanager"
UpdatesSequenced = 9
CurMatches = 0
Requirements = TARGET.JobUniverse == 9
Rank = 0.000000
CurrentRank = 0.000000
OpSys = "LINUX"
Arch = "INTEL"
State = "Unclaimed"
Activity = "Idle"
LoadAvg = 0.000000
Memory = 2048
WantAdRevaluate = True
StartdIpAddr = "156.56.104.135"

Example ClassAd for SDSC:

MyType = "Machine"
TargetType = "Job"
Name = "condorTest03"
Machine = "condorTest03"
gatekeeper_url = "tg-login.sdsc.teragrid.org/jobmanager"
UpdatesSequenced = 9
CurMatches = 0
Requirements = TARGET.JobUniverse == 9
Rank = 0.000000
CurrentRank = 0.000000
OpSys = "LINUX"
Arch = "INTEL"
State = "Unclaimed"
Activity = "Idle"
LoadAvg = 0.000000
Memory = 2048
WantAdRevaluate = True
StartdIpAddr = "156.56.104.135"
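
The three ClassAds above differ only in Name, Machine, and gatekeeper_url, so they can be produced from one template. A minimal sketch, assuming a hypothetical helper named make_machine_ad; every fixed attribute value is copied from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print a machine ClassAd for a Globus gatekeeper.
# $1 = ad name (used for both Name and Machine), $2 = gatekeeper contact string.
make_machine_ad() {
    name=$1
    gatekeeper=$2
    cat <<EOF
MyType = "Machine"
TargetType = "Job"
Name = "$name"
Machine = "$name"
gatekeeper_url = "$gatekeeper"
UpdatesSequenced = 9
CurMatches = 0
Requirements = TARGET.JobUniverse == 9
Rank = 0.000000
CurrentRank = 0.000000
OpSys = "LINUX"
Arch = "INTEL"
State = "Unclaimed"
Activity = "Idle"
LoadAvg = 0.000000
Memory = 2048
WantAdRevaluate = True
StartdIpAddr = "156.56.104.135"
EOF
}
```

For example, 'make_machine_ad condorTest05 login-co.ncsa.teragrid.org/jobmanager > ncsa.ad' writes the NCSA ad to a file (ncsa.ad is an arbitrary name) that can be passed to condor_advertise in place of classad_name.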

Good documentation on Condor-G and ClassAds is available at the following link:
http://www-128.ibm.com/developerworks/grid/library/gr-condorg2/
