Monday, December 31, 2007

Installing and Configuring Globus Services on Mac OS X

* Download Mac binaries from Globus site. Probably the VDT will work fine also.

* Do the usual configure/make/make install business.

* Unfortunately, every Mac I tried was missing the Perl XML parsing modules, so the build fails at the "make install" step. For some reason, Globus's ./configure isn't set up to catch this. Not sure why. I found the following nice instructions for installing the Perl XML modules and related dependencies at

http://ripary.com/bundle-xml.html

This will take a while.
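If you'd rather not follow that page step by step, the whole dependency chain can usually be pulled in through CPAN in one shot. This is a sketch, assuming the Bundle::XML bundle the page above is built around; it needs network access and may prompt you to configure CPAN first.

```shell
# Install XML::Parser and the rest of the Perl XML stack via CPAN.
# May prompt for CPAN configuration on first run.
perl -MCPAN -e 'install Bundle::XML'
```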

* Set up Simple CA. See

http://vdt.cs.wisc.edu/releases/1.8.1/installation_post_server.html#simpleca

and

http://www-unix.globus.org/toolkit/docs/4.0/admin/docbook/ch07.html#s-simpleca-admin-installing.

The VDT page above actually points to some older (v 3.2) instructions, but these are still OK also.
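For reference, the GT4 SimpleCA setup in those documents boils down to a single setup script, run from your Globus installation. A minimal sketch, assuming the standard GT4 layout:

```shell
# Create the CA: prompts for a CA subject name and passphrase, then
# generates a setup package that the instructions have you run as root.
$GLOBUS_LOCATION/setup/globus/setup-simple-ca
```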

* Make your host certificate and sign it. I did all of this as root.

$GLOBUS_LOCATION/bin/grid-cert-request -host `hostname`

(note the backticks around hostname).

cd /etc/grid-security/
~/Globus-Services/bin/grid-ca-sign -in hostcert_request.pem -out hostsigned.pem
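After signing, the certificate still has to be put where the gatekeeper looks for it. A sketch of the usual final steps, assuming the file names produced by the commands above and the standard Globus permission requirements:

```shell
cd /etc/grid-security
cp hostsigned.pem hostcert.pem   # put the signed cert where services expect it
chmod 644 hostcert.pem           # cert is world-readable
chmod 400 hostkey.pem            # key must be readable by root only
```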

* Create your xinetd services.

cd /etc/xinetd.d/
touch gsigatekeeper
touch gsiftp

* Here is my gsigatekeeper:
service gsigatekeeper
{
socket_type = stream
protocol = tcp
wait = no
user = root
env = LD_LIBRARY_PATH=/Users/mpierce/Globus-Services/lib
env = DYLD_LIBRARY_PATH=/Users/mpierce/Globus-Services/lib
server = /Users/mpierce/Globus-Services/sbin/globus-gatekeeper
server_args = -conf /Users/mpierce/Globus-Services/etc/globus-gatekeeper.conf
disable = no
}

And my gsiftp:
service gsiftp
{
instances = 100
socket_type = stream
wait = no
user = root
env += GLOBUS_LOCATION=/Users/mpierce/Globus-Services
env += LD_LIBRARY_PATH=/Users/mpierce/Globus-Services/lib
env += DYLD_LIBRARY_PATH=/Users/mpierce/Globus-Services/lib
server = /Users/mpierce/Globus-Services/sbin/globus-gridftp-server
server_args = -i
log_on_success += DURATION
nice = 10
disable = no
}

Note that LD_LIBRARY_PATH does nothing on Macs--you need DYLD_LIBRARY_PATH instead (see below). I left it in anyway, since you will need it for Linux installations.

* Start your services:

service gsiftp start
service gsigatekeeper start

* You may also want to add these to /etc/services. Here is the tail of mine:
tail /etc/services
# Carstein Seeberg
# 48004-48555 Unassigned
com-bardac-dw 48556/udp # com-bardac-dw
com-bardac-dw 48556/tcp # com-bardac-dw
# Nicholas J Howes
# 48557-49150 Unassigned
# 49151 IANA Reserved
#gsiftp 2811/tcp
#gsigatekeeper 2119/tcp
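The commented lines at the end are the entries in question. Uncommented, adding them looks like this--shown here against a scratch file so it can be tried safely; point SERVICES at /etc/services (as root) to edit the real one. Ports 2811 and 2119 are the standard GridFTP and gatekeeper ports.

```shell
# Append the standard Globus service ports. SERVICES defaults to a demo
# file; override it with /etc/services to do this for real (as root).
SERVICES=${SERVICES:-./services.demo}
printf 'gsigatekeeper\t2119/tcp\t# Globus gatekeeper\n' >> "$SERVICES"
printf 'gsiftp\t2811/tcp\t# GridFTP\n' >> "$SERVICES"
cat "$SERVICES"
```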


* Check these with telnet:
telnet localhost 2811
telnet localhost 2119

* Note that you must use DYLD_LIBRARY_PATH on the Mac, or else the service will not actually start, even though the "service" commands above will not complain. If you telnet to the ports, you will get errors like this:
/Users/condor/execute/dir_8492/userdir/install/lib/libglobus_gss_assist_gcc32.0.dylib

* Requisite Globus Complaint: I had to do all of the above configuration by hand. Why not provide a post-installation configuration "flavor" called "my first globus installation" that does all of this for you?

* Create a grid-mapfile and some usercerts, or just use your favorite grid-mapfile from some place else.
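For the do-it-yourself route, a grid-mapfile is just one line per user: the certificate DN in quotes, then the local account it maps to. The DN below is hypothetical--use the one printed by grid-cert-info -subject--and the real file lives at /etc/grid-security/grid-mapfile; a demo file is used here so the sketch is harmless to run.

```shell
# grid-mapfile format: "<certificate DN>" <local account>, one per line.
# The DN below is made up -- get a real one from: grid-cert-info -subject
echo '"/O=Grid/OU=GlobusTest/OU=simpleCA/CN=Some User" mpierce' >> ./grid-mapfile.demo
cat ./grid-mapfile.demo
```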

Monday, December 17, 2007

Condor-G, Birdbath and TransferOutputRemaps

Here is the classad attribute structure you need for transferring your output back from the remote globus machine to specific files on your condor-g job broker/universal client. By default, the output goes to your cluster's spool directory. The example below shows how to move these to /tmp/ on the condor host.

Note that TransferOutput files are separated by commas, and TransferOutputRemaps files are separated by semicolons.

new ClassAdStructAttr("TransferOutput",
ClassAdAttrType.value2,
"\"testgeoupdate.index,testgeoupdate.node,testgeoupdate.tetra\""),

new ClassAdStructAttr("TransferOutputRemaps",
ClassAdAttrType.value2,
"\"testgeoupdate.index=/tmp/testgeoupdate.index;"
+ "testgeoupdate.node=/tmp/testgeoupdate.node;"
+ "testgeoupdate.tetra=/tmp/testgeoupdate.tetra\""),

...

The trick as usual is to get the condor true-names for the parameters you want to use. These are typically NOT the familiar names from condor_submit. Always create a command script for what you want to do before writing your birdbath api code, and then use condor_q -l to see the internal names that condor actually uses. These are the ones you will need to use in your classAdStructAttr array.
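In practice that workflow looks like the following (the cluster/proc id 42.0 is just an example--use whatever condor_submit reports):

```shell
# 1. Describe the job in an ordinary submit script and run it once:
condor_submit test.cmd
# 2. Dump the job's internal ClassAd to see the "true-names" condor uses:
condor_q -l 42.0 | grep -i transfer
```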

Saturday, December 15, 2007

File Retrieval with BirdBath and Condor-G

As I wrote earlier, output files (other than standard out) of Globus jobs submitted via Condor-G can be retrieved using the TransferOutput attribute. To reproduce this with the BirdBath Java API, you need an incantation something like below.

ClassAdStructAttr[] extraAttributes =
{
new ClassAdStructAttr("GridResource", ClassAdAttrType.value3,
gridResourceVal),
new ClassAdStructAttr("Out", ClassAdAttrType.value3,
outputFile),
new ClassAdStructAttr("UserLog", ClassAdAttrType.value3,
logFile),
new ClassAdStructAttr("Err", ClassAdAttrType.value3,
errFile),
new ClassAdStructAttr("TransferExecutable",
ClassAdAttrType.value4,
"FALSE"),
new ClassAdStructAttr("when_to_transfer_output",
ClassAdAttrType.value2,
"\"ON_EXIT\""),
new ClassAdStructAttr("should_transfer_files",
ClassAdAttrType.value2,
"\"YES\""),
new ClassAdStructAttr("StreamOut",
ClassAdAttrType.value4,
"FALSE"),
new ClassAdStructAttr("StreamErr",
ClassAdAttrType.value4,
"FALSE"),
new ClassAdStructAttr("TransferOutput",
ClassAdAttrType.value2,
"\"testgeoupdate.index,testgeoupdate.node,testgeoupdate.tetra\""),
new ClassAdStructAttr("Environment",ClassAdAttrType.value2,
"\"PATH=/home/teragrid/tg459247/geofest.binaryexec:/bin:/usr/bin\""),
new ClassAdStructAttr("x509userproxy",
ClassAdAttrType.value3,
proxyLocation)
};

The key problem with BirdBath is that it seems to require the parameter "true-names" rather than the names documented in the condor_submit man page. For example, to specify the file to use for logging, you have to use the parameter "UserLog", not "log" or "Log". To see the actual "true-names" used by Condor internally, submit a job with condor_submit and then use condor_q -l. They also appear in the job_queue.log file in the spool directory.

With birdbath+condor-g there is a similar problem returning files from the remote globus server. You have to use "TransferOutput" as one of the attributes. Also I noticed birdbath did not set StreamOut and StreamErr, so I manually set these as well.

The output files (testgeoupdate.index, etc.) will be uploaded to your job's spool directory. Note that BirdBath does not clean this up, unlike condor_submit invocations (which delete your job's clusterxy.proc0.subproc0 directory on completion). To make BirdBath mimic this behavior, you can set LeaveJobInQueue=TRUE in your ClassAdStructAttr array above.

Thursday, December 13, 2007

Condor-G Plus TeraGrid Example

Below is an example condor command script for running a job from my local grid client (a PC running FC7, globus, condor). I'm running a set of codes that I installed on the machine (lonestar) that I wanted to use. These codes are run by a perl script, so I had to set the PATH. I want to upload the input files from my desktop at submission time--the paths here are for my client machine. Note Condor-G will put these in $SCRATCH_DIRECTORY on lonestar, which doubles as your working directory (that is, autoref.pl will be executed here). To get the files back from lonestar to the PC, I used "transfer_output_files" and listed each file. Full paths for these aren't necessary. Condor will pull them back from $SCRATCH_DIRECTORY on the remote machine to your local directory.

# Here it is. Please gaze at it only through a welder's mask.
executable = /home/teragrid/myaccnt/geofest.binaryexec/autoref.pl
arguments = testgeoupdate rare
transfer_executable = false
should_transfer_files=yes
when_to_transfer_output=ON_EXIT
transfer_input_files = /home/mpierce/condor_test/Northridge2.flt,/home/mpierce/condor_test/Northridge2.params,/home/mpierce/condor_test/Northridge2.sld,/home/mpierce/condor_test/NorthridgeAreaMantle.materials,/home/mpierce/condor_test/NorthridgeAreaMantle.sld,/home/mpierce/condor_test/NorthridgeAreaMidCrust.materials,/home/mpierce/condor_test/NorthridgeAreaMidCrust.sld,/home/mpierce/condor_test/NorthridgeAreaUpper.materials,/home/mpierce/condor_test/NorthridgeAreaUpper.sld,/home/mpierce/condor_test/testgeoupdate.grp
transfer_output_files=testgeoupdate.index,testgeoupdate.node,testgeoupdate.tetra
universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-fork
output = test.out.$(Cluster)
log = test.log.$(Cluster)
environment = PATH=/home/teragrid/myaccnt/geofest.binaryexec/:/bin/:/usr/bin
queue

Returning Standard Output in Condor-G

[The problems have been hard to reproduce, so none of this may be necessary. But it doesn't hurt.]

Here's a nugget of wisdom: if you want to run as a user other than "condor", start Condor-G as root and set CONDOR_IDS=(uid).(gid) (see /etc/passwd and /etc/group for the values; it should be something like 501.501).

I had the following problem: condor_submit worked correctly, but standard output went to /dev/null. Checking the logs and spool, I saw that TransferOut was set to FALSE, and I couldn't override it.

I solved this by restarting condor as root (it resets the process id to the CONDOR_IDS user). It also seems to work if you run condor as the correct user (that is, the one specified by CONDOR_IDS).
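Concretely, assuming the target account has uid 501 and gid 501 (look these up in /etc/passwd and /etc/group), the root shell that starts the daemons would look something like:

```shell
# Run the condor daemons as uid 501, gid 501 instead of the "condor" user.
export CONDOR_IDS=501.501
echo "$CONDOR_IDS"
# then, still as root, restart the master: condor_master
```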

Wednesday, December 12, 2007

Globus Client+MyProxy+Pacman on FC7 Notes

Here are my "hold your mouth right" instructions. Deviate at your own risk.

* Get pacman, untar, cd, and run setup.sh.

* Make an installation directory (like mkdir $HOME/vdt) and cd into it. Don't run Pacman in $HOME--it copies everything in the current working directory into post-install, which could be awful if you run from $HOME.

* Fedora Core 7 is not supported by the VDT, so do the pacman pretend:

cd $HOME/vdt (if not there already)
pacman -pretend-platform Fedora-4

* Install Globus clients (you are still in $HOME/vdt):
pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:Globus-Client

Answer no to installing VDT certs. We will get TG certs in a minute, and the VDT certs seem to have some problems.

* Install MyProxy (still in $HOME/vdt):
pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:MyProxy

* Install GSIOpenSSH (still in $HOME/vdt):
pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:GSIOpenSSH

* Set your $GLOBUS_LOCATION and source $GLOBUS_LOCATION/etc/globus-user-env.sh.

* Get the TG certificates:
* wget http://security.teragrid.org/docs/teragrid-certs.tar.gz
Unpack these in $HOME/.globus/certificates/
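Spelled out, the unpack step is just the following, assuming the tarball sits in the current directory where wget left it:

```shell
# Unpack the TeraGrid CA certificates into the per-user trust store.
mkdir -p $HOME/.globus/certificates
tar xzf teragrid-certs.tar.gz -C $HOME/.globus/certificates
```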

* I didn't like the Condor installation I got from Pacman. For now, Condor should be installed with the old condor_configure command on a direct download.

* Also make sure Pacman didn't set $X509_CERT_DIR. It pointed at some self-signed certs that caused myproxy-logon to myproxy.teragrid.org to fail and overrode my .globus and /etc/grid-security certificate locations.

Monday, December 10, 2007

MyProxy, Globus Clients on Mac OSX: Use the VDT

[NOTE: Read the comments. Mac binaries are available from www.globus.org; missing links were added.]

I spent a couple of hours trying to compile Globus source on my Mac OS X, only to have it fail with some mysterious error. Some things never change. Why don't they provide a pre-built Mac OS X binary? Here is the error that I get:

make: *** [globus_rls_server-thr] Error 2

Luckily, the VDT seems to do things the right way. I used Pacman to install globus clients (the command line tools only) and MyProxy. See the VDT documentation for instructions. Why doesn't Globus do this? Unbelievable. Anyway, here are the Pacman commands that I used:

pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:Globus-Client

and

pacman -get http://vdt.cs.wisc.edu/vdt_181_cache:MyProxy

I answered "no" for installing certificates--I have the ones I needed, and answering "yes" caused a failure, probably because of write permission problems.

Set $GLOBUS_LOCATION and source $GLOBUS_LOCATION/etc/globus-user-env.sh as usual.

Thursday, December 06, 2007

Condor Birdbath Compromises

I have not been able to find the right incantation for using the "initialdir" attribute with the Birdbath web service. I'm trying to do the equivalent of the following submission script:

universe = vanilla
executable = /path/to/my/bin/a.out
output = theOutput
error = theError
log = theLog
arguments = 10 21
environment = "PATH=/path/to/my/bin/:/bin/:/usr/bin"
initialdir = /path/to/my/data
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue

Looking at spool/job_queue.log, you will see that this attribute's proper name is "Iwd". However, a classad structure (using the Birdbath java helper classes) like the one below didn't work:

//Create a classad for the job.
ClassAdStructAttr[] extraAttributes =
{
new ClassAdStructAttr("Out", ClassAdAttrType.value3,
outputFile),
new ClassAdStructAttr("Err", ClassAdAttrType.value3,
errFile),
new ClassAdStructAttr("Log", ClassAdAttrType.value3,
logFile),
new ClassAdStructAttr("Environment", ClassAdAttrType.value2,
"\"PATH=/path/to/my/bin/:/bin:/usr/bin\""),
new ClassAdStructAttr("ShouldTransferFiles", ClassAdAttrType.value2,
"\"IF_NEEDED\""),
new ClassAdStructAttr("WhenToTransferFiles", ClassAdAttrType.value2,
"\"ON_EXIT\""),
new ClassAdStructAttr("Iwd", ClassAdAttrType.value3,
"/path/to/my/data/")
};

The Iwd parameter is set correctly in the logs, but the executables can't find the input files in the initial directory. My workaround was to upload all the data files as part of the submission:

File[] files = { new File("/local/data/path/file1.dat"),
new File("/local/data/path/file2.dat") };

Then submit in the usual way, passing the files array:

xact.submit(clusterId, jobId, userName, universeType,
executable,arguments,"(TRUE)", extraAttributes, files);
xact.commit();
schedd.requestReschedule();

This will put the data (input and output) in condor's spool directory.
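To find out where that spool directory actually is on your submit host, you can ask condor itself:

```shell
# Print the spool directory this condor installation is configured to use.
condor_config_val SPOOL
```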

Tuesday, December 04, 2007

Running GeoFEST with Condor

This is a prelude to using Condor-G. I used the following incantation:

-bash-3.00$ more meshgen.cmd
universe = vanilla
executable = /globalhome/gateway/geofest.binaryexec/autoref.pl
output = autoref.out
error = autoref.err
log = autoref.log
arguments = /globalhome/gateway/condor_test/testgeoupdate rare
environment = "PATH=/globalhome/gateway/geofest.binaryexec/"
#getenv = true
initialdir = /globalhome/gateway/condor_test
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue

The key thing here was the "environment" attribute, since the autoref.pl script is of course a perl script that calls other executables. These executables need to be in the PATH.

You can also do this by setting the PATH environment variable in your shell and then using the getenv attribute. This will work OK for the command line but will not be much use in a web service version. OK, you could get it to work, but this is the kind of thing that will break 6 months later when you move to a new machine.
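For the record, the getenv variant looks like this on the shell side, plus a one-line change to the submit script:

```shell
# Alternative: inherit the submitter's environment instead of hard-coding PATH.
export PATH=/globalhome/gateway/geofest.binaryexec:$PATH
# and in meshgen.cmd, replace the "environment = ..." line with:
#   getenv = true
condor_submit meshgen.cmd
```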

None of the condor examples I could find discuss this, but it is well documented in the condor_submit manual page.

Monday, December 03, 2007

Compiling RDAHMM Correctly

The correct value for the LDLIBS variable is

LDLIBS = -lda -lnr -lut -lcp /usr/lib/liblapack.a -lblas -lm
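If you'd rather not edit the Makefile, the same value can be passed on the make command line. The target name below is a guess--substitute whatever the RDAHMM Makefile actually defines.

```shell
# Override LDLIBS for this build only; the "rdahmm" target is hypothetical.
make rdahmm LDLIBS='-lda -lnr -lut -lcp /usr/lib/liblapack.a -lblas -lm'
```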