Issue117

Title: support Sun Grid Engine
Priority: feature
Status: chatting
Nosy List: baagaard, leif, tan2
Assigned To: leif
Topics: CitcomS, PyLith, Pyre

Created on 2007-06-18.21:50:02 by leif, last changed 2008-04-17.00:35:01 by tan2.

Files
qsubPyLith, uploaded by leif on 2007-06-18.21:50:02 (application/octet-stream)
Messages
msg446 Author: tan2 Date: 2008-04-17.00:35:01
The SchedulerSGE can only handle tpn=16.
If tpn is not 16, the formula in pe-number needs to be changed, which is
error-prone.
In addition, if job.nodes is not a multiple of 16, one needs to set a
special environment variable, MY_NSLOTS.

All of these seem to be Ranger-specific, so I created a scheduler for
Ranger.
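The generalization described above can be sketched in Python. The sketch assumes Ranger's convention from the guide excerpt below: the PE name encodes the tasks per node (`<TpN>way`), and the slot count is always the number of nodes times 16. `ranger_pe_request` is a hypothetical name. It is not the code committed for the tacc-ranger scheduler.

```python
from typing import Tuple

CORES_PER_NODE = 16  # each Ranger node has 16 cores


def ranger_pe_request(n: int, tpn: int) -> Tuple[str, int]:
    """Return (pe_name, pe_number) for n MPI tasks at tpn tasks per node.

    Hypothetical helper: the node count is rounded up, and SGE is asked
    for whole nodes, i.e. 16 slots per node whatever tpn is.
    """
    if n < 1 or tpn < 1:
        raise ValueError("n and tpn must be positive")
    nodes = (n + tpn - 1) // tpn  # integer ceiling of n / tpn
    return f"{tpn}way", nodes * CORES_PER_NODE


if __name__ == "__main__":
    print(ranger_pe_request(31, 8))   # ('8way', 64)
    print(ranger_pe_request(33, 16))  # ('16way', 48)
```

With tpn = 16, the second value reduces to ((n + 15) / 16) * 16 under integer division. That is why a single hard-coded pe-number formula only covers the 16-tasks-per-node case.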

Excerpt from Ranger's User Guide:

          Using fewer than 16 cores per node

When you want to use fewer than 16 MPI tasks per node, the choice of
tasks per node is limited to the set of numbers {1, 2, 4, 8, 12, 15}.
When the number of tasks you need is equal to "Number of Tasks per
Node x Number of Nodes", use the following prescription:

    #$ -pe <TpN>way <NoN x 16>

    where TpN is a number in the set {1, 2, 4, 8, 12, 15}

If the total number of tasks that you need is less than "Number of Tasks
per Node x Number of Nodes", then set the MY_NSLOTS environment variable
to the total number of tasks. In a job script, use the following -pe
option and environment variable statement:

    #$ -pe <TpN>way <NoN x 16>
    export MY_NSLOTS=<Total Number of Tasks>

For example, using 31 cores with 8 tasks per node:
    #$ -pe 8way 64        {use 8 tasks per node, 4 nodes requested}
    export MY_NSLOTS=31
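The prescription in this excerpt can be expressed as a small Python helper that emits the job-script lines. The sketch assumes the rules exactly as quoted:

- The -pe slot count is nodes x 16.
- MY_NSLOTS is exported only when the total task count is below tasks-per-node x nodes.

Two things are assumptions rather than part of the guide: the helper name, and the inclusion of 16 among the valid tasks-per-node values (for full nodes).

```python
from typing import List

CORES_PER_NODE = 16
# Values from the guide excerpt for partial nodes, plus 16 for full nodes (assumption).
VALID_TPN = {1, 2, 4, 8, 12, 15, 16}


def ranger_pe_lines(total_tasks: int, tpn: int, nodes: int) -> List[str]:
    """Hypothetical helper: job-script lines for the quoted Ranger rules."""
    if tpn not in VALID_TPN:
        raise ValueError(f"tpn={tpn} is not one of {sorted(VALID_TPN)}")
    capacity = tpn * nodes
    if not 1 <= total_tasks <= capacity:
        raise ValueError(
            f"{total_tasks} tasks do not fit on {nodes} nodes at {tpn} per node"
        )
    # SGE embedded directive: '#$ -pe <TpN>way <NoN x 16>'
    lines = [f"#$ -pe {tpn}way {nodes * CORES_PER_NODE}"]
    # Only needed when some of the requested task slots stay unused.
    if total_tasks < capacity:
        lines.append(f"export MY_NSLOTS={total_tasks}")
    return lines


if __name__ == "__main__":
    # The guide's example: 31 tasks, 8 tasks per node, 4 nodes.
    print("\n".join(ranger_pe_lines(31, 8, 4)))
```

For the guide's example this prints `#$ -pe 8way 64` followed by `export MY_NSLOTS=31`. With 32 tasks, only the -pe line is emitted.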

msg445 Author: leif Date: 2008-04-17.00:25:01
Eh Tan "Roundup Issue Tracker" wrote:
> [CitcomS.tacc-ranger]
> tpn = 16
> # tpn is the number of tasks per node, must be one of {1, 2, 4, 8, 15, 16}.

The following didn't work?

[CitcomS.sge]
pe-name = 16way
pe-number = ((n + 15) / 16) * 16

I tried to make "sge" general enough to handle Ranger :-)

--Leif
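The pe-number expression relies on integer division to round the task count n up to a multiple of 16, i.e. whole 16-core nodes. Below is a quick Python check of that arithmetic. Python 3 spells integer division `//`. This thread does not say how Pyre itself evaluates the expression string, so treating `/` as floor division here is an assumption.

```python
def pe_number(n: int) -> int:
    # Same arithmetic as ((n + 15) / 16) * 16 under integer division:
    # adding 15 before dividing rounds n up to a whole number of 16-core nodes.
    return ((n + 15) // 16) * 16


if __name__ == "__main__":
    for n in (1, 16, 17, 31, 32, 33):
        print(n, "->", pe_number(n))
```

For the six sample counts this prints 16, 16, 32, 32, 32 and 48. Any n from 17 to 32 requests two full nodes.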
msg444 Author: tan2 Date: 2008-04-17.00:11:18
Implemented the scheduler for TACC Ranger in r11821.

CitcomS needs this configuration:

[CitcomS]
scheduler = tacc-ranger

[CitcomS.tacc-ranger]
tpn = 16
# tpn is the number of tasks per node, must be one of {1, 2, 4, 8, 15, 16}.
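As a sketch only, the settings above can be read and checked with Python's standard configparser. This assumes the .cfg text parses as ordinary INI. Pyre uses its own parser, so this is a stand-in, not how CitcomS reads the file. `read_tpn` and `ALLOWED_TPN` are hypothetical names. The allowed set is the one listed in the comment above; the Ranger guide excerpt in msg446 also lists 12.

```python
import configparser

# Set listed in the tacc-ranger comment; the guide excerpt also allows 12.
ALLOWED_TPN = {1, 2, 4, 8, 15, 16}

CFG_TEXT = """\
[CitcomS]
scheduler = tacc-ranger

[CitcomS.tacc-ranger]
tpn = 16
# tpn is the number of tasks per node, must be one of {1, 2, 4, 8, 15, 16}.
"""


def read_tpn(cfg_text: str) -> int:
    """Hypothetical check: return tpn if the tacc-ranger settings are consistent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.get("CitcomS", "scheduler") != "tacc-ranger":
        raise ValueError("[CitcomS] scheduler is not tacc-ranger")
    tpn = parser.getint("CitcomS.tacc-ranger", "tpn")
    if tpn not in ALLOWED_TPN:
        raise ValueError(f"tpn={tpn} is not one of {sorted(ALLOWED_TPN)}")
    return tpn


if __name__ == "__main__":
    print(read_tpn(CFG_TEXT))  # 16
```

Validating tpn up front catches an unsupported value before a job is queued. Otherwise the mistake would surface only when the batch system rejects the PE request.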
msg443 Author: leif Date: 2008-04-10.22:31:17
Blindly implemented SGE support, without having access to SGE:

r11795

Note that TACC's Ranger will need something like this:

[CitcomS.sge]
pe-name = 16way
pe-number = ((n + 15) / 16) * 16
msg436 Author: tan2 Date: 2008-03-19.15:59:24
The TACC Ranger cluster also uses SGE for the batch scheduler.

Brief SGE documentation can be found here:
http://www.tacc.utexas.edu/services/userguides/ranger/#batch
msg370 Author: leif Date: 2007-06-18.21:50:02
http://en.wikipedia.org/wiki/Sun_Grid_Engine

Free, open-source version:

http://gridengine.sunsource.net/

-------- Original Message --------
Subject: 	RE: [CIG-SHORT] Problem attempting to run on multiple processors
Date: 	Mon, 18 Jun 2007 14:36:24 -0600
From: 	Oliver Boyd <olboyd@usgs.gov>
To: 	'Leif Strand' <leif@geodynamics.org>

The --launcher.dry option was useful in giving me the command that can be
submitted to SGE. The script for a particular qsub run with 2 processors,
submitted as 'qsub -pe mpich 2 script', now looks like this:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
MPI_DIR=/opt/mpich/gnu
$MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines \
    /data/Software/Store/pylith3d-0.8.2/pylith3d/mpipypylith3d --pyre-start \
    /data/Software/PyLith-0.8.2:/data/Software/Store/pylith3d-0.8.2/python/pythia-0.8.1.3-py2.4.egg:/data/Software/Store/pylith3d-0.8.2/python/Cheetah-2.0rc8-py2.4-linux-x86_64.egg:/data/Software/Store/pylith3d-0.8.2/python/merlin-1.2.egg \
    PyLith mpi:mpistart pylith3d.PyLithApp:PyLithApp pylith3d-npX.cfg \
    --nodes=2 --macros.nodes=2 --macros.job.name= --macros.job.id=24652

I have attached qsubPyLith, which creates this script and runs pylith
with SGE (not sure if you want it). To use it, type 'qsubPyLith np
cfgfile1.cfg cfgfile2.cfg ...', where np is the number of processors.
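The attached qsubPyLith is not reproduced here. As a rough Python illustration of the same idea, the sketch below does two things:

- It wraps a command, for example the one printed by --launcher.dry, in the SGE directives used in the script above.
- It builds the matching 'qsub -pe mpich <np> <script>' argument list.

The function names are hypothetical, and the sketch does not call qsub itself.

```python
from typing import List

# Directives copied from the job script above.
SGE_HEADER = [
    "#!/bin/bash",
    "#$ -cwd",          # run the job from the submission directory
    "#$ -j y",          # merge stderr into stdout
    "#$ -S /bin/bash",  # shell SGE uses to interpret the script
]


def make_job_script(body_lines: List[str]) -> str:
    """Hypothetical helper: SGE header followed by the given command lines."""
    return "\n".join(SGE_HEADER + body_lines) + "\n"


def qsub_argv(np: int, script_path: str, pe_name: str = "mpich") -> List[str]:
    """Hypothetical helper: argv for 'qsub -pe <pe_name> <np> <script_path>'."""
    if np < 1:
        raise ValueError("np must be at least 1")
    return ["qsub", "-pe", pe_name, str(np), script_path]


if __name__ == "__main__":
    script = make_job_script([
        "MPI_DIR=/opt/mpich/gnu",
        # Placeholder: paste the full command printed by --launcher.dry here.
        "$MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines ...",
    ])
    print(script)
    print(" ".join(qsub_argv(2, "script")))  # qsub -pe mpich 2 script
```

To actually submit, the script text would be written to a file and the argument list passed to subprocess.run. Passing a list rather than one shell string avoids quoting problems with configuration file names.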

A couple more questions:
1) What do I do about --macros.job.id? Is it necessary?
2) Running either the above script or 'pylith3dapp.py pylith3d-np2.cfg'
with --nodes=2 produces bmrsnog.0.* and bmrsnog.1.*. What am I supposed
to do with these two sets of files?

Please let me know if you see any problems with what I've done. Thanks for
your help.

Oliver
History
Date                 User  Action  Args
2008-04-17 00:35:01  tan2  set     nosy: leif, baagaard, tan2; messages: + msg446
2008-04-17 00:25:01  leif  set     nosy: leif, baagaard, tan2; messages: + msg445
2008-04-17 00:11:18  tan2  set     nosy: leif, baagaard, tan2; messages: + msg444
2008-04-10 22:31:17  leif  set     nosy: + baagaard; messages: + msg443
2008-03-19 15:59:25  tan2  set     status: unread -> chatting; priority: feature; nosy: + tan2; messages: + msg436; topic: + CitcomS, PyLith, Pyre; assignedto: leif
2007-06-18 21:50:02  leif  create