Problem: I had a working pre-Web Services GRAM "fork" job manager but then needed to use the LSF job manager to submit jobs to the scheduler on a cluster. The LSF job manager was not built when we initially deployed Globus, which is unusual.
The LSF job manager was built with the commands
% gpt-build globus_gram_job_manager_setup_lsf-1.17.tar.gz
% ./setup-globus-gram-job-manager-lsf
However, the command-line tests didn't work. For example, the command
globusrun -o -r my.secret.machine/jobmanager-lsf '&(executable=/bin/date)'
threw the error
GRAM Job submission failed because data transfer to the server failed (error code 10)
This is unfortunately an all-purpose Globus error. You will sometimes see it associated with problems in the grid-mapfile, but since my fork job manager worked fine, I had a different bug.
Nothing useful turned up in the gsi-gatekeeper.log either, even after I turned up the logging level.
Solution: the problem turned out to be that the LSF job manager files were not given the correct permissions during the deployment. These should be 755 (read, write, and execute for the owner; read and execute for group and world). Find them with a command like
find $GLOBUS_LOCATION -name "*lsf*"
I then changed the permissions manually, but you could also use a "find|xargs" one-liner.
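As a rough sketch of that one-liner, the helper below lists everything under a directory whose name contains "lsf" and then sets it to 755. The function name fix_lsf_perms is made up for this example, and it assumes every match really should be 755, so check the listing before trusting the result on your own install.

```shell
# Hypothetical helper (not part of Globus): set mode 755 on every
# file or directory under $1 whose name contains "lsf".
fix_lsf_perms() {
    root="$1"
    # Print the matches first so you can see what is about to change.
    find "$root" -name '*lsf*' -print
    # "-exec ... {} +" batches the matches into as few chmod calls as
    # possible, which is the same idea as piping find into xargs.
    find "$root" -name '*lsf*' -exec chmod 755 {} +
}
```

Call it as: fix_lsf_perms "$GLOBUS_LOCATION"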
1 comment:
thanks for the info... it works for me :)