LSF in ALEPH

The ALEPH offline batch environment has been migrated from the old NQS system to the LSF (Load Sharing Facility) system. This page provides information about LSF for ALEPH users.

A good introduction to LSF at CERN, including an overview of the most useful user commands, is available at http://wwwinfo.cern.ch/pdp/lsf/LSF-at-CERN.html. There is also a collection of user and administrator guides at http://wwwinfo.cern.ch/pdp/lsf/index.html.



The table below describes the LSF queues that have been set up on aloha and shift50. The maximum job length in each queue is defined in terms of "Normalised CERN Units" (NCU). The normalisation factor is 1.3 for shift50; for aloha nodes it varies between 0.2 and 0.4. To get the maximum CPU time allowed in each queue, divide the NCU limit by the normalisation factor; the result for shift50 is given in the table. There is also a maximum real elapsed time for each queue on shift50, beyond which the job will be stopped even if it has used little CPU time. This can happen, for example, if your job is staging in a lot of data and doing little analysis.

Queue name   Length (NCU)     CPU time limit (shift50)   Absolute time limit (shift50)   Equivalent NQS queues
xu_8nm       8 NCU minutes    369 seconds                185 minutes                     A_xux
xu_1nh       1 NCU hour       2769 seconds               923 minutes                     B_xus + short C_xum jobs
xu_8nh       8 NCU hours      22153 seconds              61.5 hours                      long C_Xum + short D_xul jobs
xu_1nd       1 NCU day        66461 seconds              92.3 hours                      long D_xul jobs
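
The shift50 CPU limits in the table can be reproduced by dividing each queue length, expressed in seconds, by the normalisation factor. The following is a minimal sketch of that arithmetic. The helper name cpu_limit is hypothetical, and passing the factor in tenths (13 for 1.3) is an assumption made here so that plain integer shell arithmetic is enough; results are truncated to whole seconds.

```shell
#!/bin/sh
# cpu_limit NCU_SECONDS FACTOR_TENTHS
# Hypothetical helper (not an LSF command): prints the CPU time limit in
# seconds for a queue whose length is NCU_SECONDS.
# The normalisation factor is given in tenths (13 means 1.3), so the
# division NCU_SECONDS / factor becomes NCU_SECONDS * 10 / FACTOR_TENTHS.
cpu_limit() {
    echo $(( $1 * 10 / $2 ))
}

cpu_limit 480 13      # xu_8nm on shift50: 369
cpu_limit 3600 13     # xu_1nh on shift50: 2769
cpu_limit 28800 13    # xu_8nh on shift50: 22153
cpu_limit 86400 13    # xu_1nd on shift50: 66461
cpu_limit 480 2       # xu_8nm on an aloha node with factor 0.2: 2400
```

On an aloha node the same queue therefore allows more CPU seconds than on shift50, because the normalisation factor is smaller.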

LSF is installed on the majority of aloha nodes and on shift50. The configuration can always be tuned, taking into account feedback from users. If you have any comments or problems with the current setup, or if you notice any strange behaviour, please do not hesitate to tell us. Note that LSF is also set up to run on nodes usually used for interactive work. LSF will automatically suspend batch jobs running on an aloha node when it detects any interactive load on that node. Obviously this is not instantaneous, so the initial interactive response may be degraded, but this degradation should only last a few seconds.

Statistics on LSF queues are collected every week. See the LSF performance graphs for aloha and shift50.


Using LSF

In order to access the LSF commands you should do the following:

You can then use any of the LSF commands to submit or monitor jobs and queues. You can create and submit an ALPHA job using alpharun, which has been modified to submit the job to LSF. Once you have created a job, make sure that the .job file is executable:

    chmod +x myalpha.job

You can then submit it in the following way:

    bsub < myalpha.job
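
When submitting by hand it can help to choose the queue from the CPU time the job is expected to need. The sketch below picks the shortest queue whose shift50 CPU limit, as listed in the queue table, covers a given number of CPU seconds. The helper name pick_queue is hypothetical and is not part of LSF; the limits apply to shift50 only and would differ on aloha nodes.

```shell
#!/bin/sh
# pick_queue CPU_SECONDS
# Hypothetical helper (not an LSF command): prints the shortest queue
# whose shift50 CPU time limit is at least CPU_SECONDS.
# The thresholds are the shift50 CPU limits from the queue table.
pick_queue() {
    if   [ "$1" -le 369 ];   then echo xu_8nm
    elif [ "$1" -le 2769 ];  then echo xu_1nh
    elif [ "$1" -le 22153 ]; then echo xu_8nh
    elif [ "$1" -le 66461 ]; then echo xu_1nd
    else
        # No queue is long enough: report on stderr and fail.
        echo "no queue allows $1 CPU seconds on shift50" >&2
        return 1
    fi
}

pick_queue 3000   # prints xu_8nh
```

The queue name can then be passed to bsub with its -q option, for example: bsub -q xu_8nh < myalpha.job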

Hints

Accounting for ALOHA and shift50

Plot for ALOHA and shift50


Please give your feedback on LSF to Marco Cattaneo or Joel Closier.