Thursday, February 05, 2009

GRAM Job submission failed because data transfer to the server failed (error code 10)

Problem: I had a working pre-Web-services GRAM "fork" job manager but then needed to use the LSF job manager to submit jobs to the scheduler on a cluster. The LSF job manager was not built when we initially deployed Globus, which is unusual.

The LSF job manager was built with the commands

% gpt-build globus_gram_job_manager_setup_lsf-1.17.tar.gz
% ./setup-globus-gram-job-manager-lsf

However, the command-line tests didn't work. For example, the command

globusrun -o -r my.secret.machine/jobmanager-lsf '&(executable=/bin/date)'

threw the error

GRAM Job submission failed because data transfer to the server failed (error code 10)

This is unfortunately an all-purpose Globus error. You will sometimes see it associated with problems in the grid-mapfile, but my fork job manager worked fine, so the cause had to be something else.

Unfortunately nothing useful turned up in the gsi-gatekeeper.log, even after I turned up the logging level.

Solution: the problem turned out to be that the LSF job manager files were not given the correct permissions during the deployment. These should be 755 (read, write, and execute for the owner; read and execute for group and world). Find them with a command like

find $GLOBUS_LOCATION -name "*lsf*"

I then changed the permissions by hand, but a "find | xargs" one-liner would also do the job.
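
For anyone who would rather script it, here is a rough sketch of the find | xargs approach. It is not what I actually ran. The function names list_lsf_perms and fix_lsf_perms are made up for this example, and it assumes that everything under the directory with "lsf" in its name really belongs to the LSF job manager and should be 755. It also assumes your find and xargs support -print0 and -0 (GNU and BSD versions do).

```shell
#!/bin/sh
# Sketch only: assumes every path under the given directory whose name
# contains "lsf" is part of the LSF job manager and should be mode 755.

# Show the current permissions of the matching files (changes nothing).
list_lsf_perms() {
    find "$1" -name '*lsf*' -exec ls -ld {} +
}

# Set mode 755 on every match. -print0 and -0 keep names with spaces intact.
fix_lsf_perms() {
    find "$1" -name '*lsf*' -print0 | xargs -0 chmod 755
}
```

Run list_lsf_perms "$GLOBUS_LOCATION" first and check that the list contains only files you expect, then run fix_lsf_perms "$GLOBUS_LOCATION" and retry the globusrun test.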

1 comment:

$ure$h @votla said...

thanks for the info... it works for me :)