Issue133

Title: launcher and mpiexec
Priority: bug
Status: chatting
Superseder:
Nosy List: baagaard, leif
Assigned To: leif
Topics: Pyre

Created on 2007-12-07.22:40:01 by leif, last changed 2008-03-20.19:40:01 by leif.

Files
File name         Uploaded                       Type
Launcher.py       baagaard, 2008-03-20.01:05:01  application/x-python
LauncherMPICH.py  baagaard, 2008-03-20.01:05:01  application/x-python
Messages
msg441 (view) Author: leif Date: 2008-03-20.19:40:01
Brad Aagaard wrote:
> I am not sure how to go about setting an interface with the Sun Grid Engine 
> scheduler. I have been using a generic shell script to handle ssh key 
> generation (copied from the manual) and a shell script to (1) distribute 
> files to the local disks on the compute nodes, start the mpd on the correct 
> nodes, and (2) run the program. I am not sure how to handle copying the 
> appropriate files, which is job dependent, in the context of the scheduler.

I once tried to extend CIG-Pyre to do staging of input/output files.  As 
you may know, you can put 'inputFile' and 'outputFile' properties in 
Pyre's Inventory.  (These properties don't seem to be used by any 
Pyre-based projects.)  I used {input,output}File properties as metadata:
they determined which files needed to be staged in/out of the job.
But the implementation turned into a big mess, and it only ever worked 
for LSF.

In general -- and with Pyre projects specifically -- it seems that 
people assume the presence of a global filesystem.  But the GFS on our 
cluster is really slow -- it's best to avoid it.  So I may return to 
this staging issue at some point.

In the meantime, all I have planned for SGE is automatic batch script 
generation and submission, equivalent to what CIG-Pyre already does for 
LSF and PBS:

citcoms --job.queue=normal --job.walltime=1*hour ...

The above will automatically generate the LSF/PBS batch script, and 
submit it to the queue.  There is some doubt in my mind as to whether a 
separate "psub" (Pyre submit) command would have been a better way to go 
-- it would have been more familiar/intuitive from a UI standpoint.

Anyway, I would stick with the shell script. :-/

--Leif
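
[Editor's sketch] The automatic batch-script generation described in msg441 might look roughly like the Python below for SGE. This is not CIG-Pyre code: the function names, the job name, the parallel-environment name 'mpich', and the command line are all hypothetical. Only the '#$' directive prefix and the -N, -q, -l h_rt, -pe and -cwd options are standard SGE qsub options.

```python
def format_walltime(seconds):
    """Format a duration in seconds as H:MM:SS for the h_rt resource."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


def sge_script(job_name, queue, walltime_seconds, pe_name, slots, command):
    """Return the text of an SGE batch script (hypothetical helper)."""
    lines = [
        "#!/bin/sh",
        "#$ -N %s" % job_name,                             # job name
        "#$ -q %s" % queue,                                # target queue
        "#$ -l h_rt=%s" % format_walltime(walltime_seconds),  # wall-clock limit
        "#$ -pe %s %d" % (pe_name, slots),                 # site-specific PE name
        "#$ -cwd",                                         # run in submit directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Assumed values, mirroring --job.queue=normal --job.walltime=1*hour
script = sge_script("citcoms", "normal", 3600, "mpich", 8,
                    "mpiexec -n $NSLOTS citcoms")
```

The resulting text would then be written to a file and handed to qsub. File staging to local disks, which is the harder problem raised in this thread, is deliberately left out.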
msg440 (view) Author: baagaard Date: 2008-03-20.18:35:01
Leif-

You are right. Using the environment variable in the command parameter is the 
correct way to handle a queue generated machine file.

I am not sure how to go about setting an interface with the Sun Grid Engine 
scheduler. I have been using a generic shell script to handle ssh key 
generation (copied from the manual) and a shell script to (1) distribute 
files to the local disks on the compute nodes, start the mpd on the correct 
nodes, and (2) run the program. I am not sure how to handle copying the 
appropriate files, which is job dependent, in the context of the scheduler.

Brad

msg439 (view) Author: baagaard Date: 2008-03-20.03:35:01
Leif-

Sorry for the poor explanation. I was never using both the queue 
generated machinefile and the nodelist and the nodegen parameters. I 
think I misunderstood how to set the machine file with an environment 
variable. I will try setting the command parameter again using just the 
environment variable.

Brad

msg438 (view) Author: leif Date: 2008-03-20.01:57:16
I'm confused.  If the queue system generates the machine file, you shouldn't be
using 'nodelist' and 'nodegen' at all.  Take a look at the PBS example on the
following page:

http://www.geodynamics.org/cig/software/packages/cs/pythia/docs/batch

It does this:

command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}

Here, Pyre doesn't care about the machine file.  The batch system, PBS,
generates the machine file and stores its pathname in the PBS_NODEFILE
environment variable.  CIG-Pyre's environment-variable expansion feature is all
that is needed to generate the correct mpirun/mpiexec line.
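
[Editor's sketch] The expansion step described above can be illustrated in a few lines of Python. This is not CIG-Pyre's implementation. The helper name, the rule that settings take precedence over environment variables, and the example node-file path are assumptions made for illustration.

```python
import os
from string import Template


def expand_command(template, settings, environ=None):
    """Expand ${name} macros from settings, then from environment variables.

    Unknown names are left untouched rather than raising an error.
    """
    environ = os.environ if environ is None else environ
    values = dict(environ)
    values.update(settings)  # assumption: settings win over the environment
    return Template(template).safe_substitute(values)


# Inside a PBS job, PBS_NODEFILE would be set by the batch system;
# the path here is made up so the example is self-contained.
command = expand_command(
    "mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}",
    {"nodes": 8},
    environ={"PBS_NODEFILE": "/var/spool/pbs/aux/1234.master"},
)
```

The point is that nothing here generates a machine file. The launcher only substitutes a pathname that the batch system has already provided.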

I guess I don't understand what makes SGE special in this area.  Although
clearly Pyre needs SGE support so that the 'scheduler' facility works.  See
issue117.
msg437 (view) Author: baagaard Date: 2008-03-20.01:05:01
Leif-

Your changes to Launcher didn't quite meet my needs. The queue system (Sun 
Grid Engine) generates the machine file. In the current implementation 
mpi.Launcher only calls mpi.LauncherMPICH._expandNodeListArgs() if the node 
list is specified. Because the queue determines the nodes, I don't have a 
node list, just a machine file. I modified mpi.Launcher to always call 
_expandNodeListArgs(), but mpi.LauncherMPICH._expandNodeListArgs only builds 
the machine file if there is a node list. These modifications seem to produce 
the desired behavior:
(1) Setting the machinefile environment variable is ignored, if the command 
doesn't have a machinefile argument.
(2) If the command uses the machinefile environment variable, it is used but 
not generated if the node list is not specified.
(3) The machinefile is generated if the node list is specified.
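
[Editor's sketch] The three behaviors listed above can be written as one small decision function. This is a reading of the description, not the attached Launcher.py or LauncherMPICH.py. The function name, the return convention, and the 'n%03d' node format are hypothetical.

```python
MACRO = "${launcher.machinefile}"


def plan_launch(command, machinefile, nodelist, nodegen="n%03d"):
    """Return (expanded_command, machinefile_lines_or_None).

    (1) If the command has no machinefile macro, the setting has no effect
        on the command line.
    (2) If the macro is present but there is no node list, the filename is
        substituted and nothing is generated.
    (3) If a node list is given, the machinefile contents are generated.
    """
    contents = [nodegen % node for node in nodelist] if nodelist else None
    return command.replace(MACRO, machinefile), contents


with_macro = "mpiexec -machinefile ${launcher.machinefile} -n 4"
case1 = plan_launch("mpiexec -n 4", "mpirun.nodes", [])
case2 = plan_launch(with_macro, "/tmp/queue.hosts", [])
case3 = plan_launch(with_macro, "mpirun.nodes", [1, 2])
```

In case 2 the path would normally come from the queue system; '/tmp/queue.hosts' is only a stand-in.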

I have attached modified versions of mpi.Launcher.py and mpi.LauncherMPICH.py.

Brad

msg417 (view) Author: leif Date: 2007-12-12.20:48:39
Fixed:

r8639

Summary:  Using 'launcher.nodelist' instructs Pyre to generate a machinefile
using the printf-format string 'launcher.nodegen'.  The filename of the
machinefile is controlled by 'launcher.machinefile' (default 'mpirun.nodes').

The placement of the corresponding '-machinefile' option on the mpirun/mpiexec
command line must now be explicitly specified by the user using the
'launcher.command' option:

[app.launcher]
command = mpiexec -machinefile ${launcher.machinefile} -np ${nodes}

The macro ${launcher.machinefile} (named after the corresponding option) expands
to the filename of the generated machinefile.
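
[Editor's sketch] A minimal illustration of the behavior summarized above: a node list is turned into a machinefile through a printf-style format, and ${launcher.machinefile} is then expanded in the command. This is not the code from r8639. The helper names, the node numbers, and the 'n%03d' format are assumptions. A regular expression is used because macro names containing dots fall outside the default pattern of Python's string.Template.

```python
import os
import re
import tempfile


def write_machinefile(path, nodelist, nodegen):
    """Write one hostname per line, formatting each node with nodegen."""
    with open(path, "w") as f:
        for node in nodelist:
            f.write(nodegen % node + "\n")


def expand_macros(command, macros):
    """Replace ${name} with macros[name]; unknown names are left as-is."""
    def repl(match):
        name = match.group(1)
        return str(macros[name]) if name in macros else match.group(0)
    return re.sub(r"\$\{([^}]+)\}", repl, command)


workdir = tempfile.mkdtemp()
machinefile = os.path.join(workdir, "mpirun.nodes")  # default name per msg417
write_machinefile(machinefile, [1, 2, 5], "n%03d")
command = expand_macros(
    "mpiexec -machinefile ${launcher.machinefile} -np ${nodes}",
    {"launcher.machinefile": machinefile, "nodes": 3},
)
```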
msg416 (view) Author: leif Date: 2007-12-11.18:43:36
Note: I just discovered that one must configure MPICH2 with --with-pm=mpd.  If
one configures with --with-pm=gforker (as I always do), it installs a different
'mpiexec'.
msg415 (view) Author: leif Date: 2007-12-07.22:42:35
Brad Aagaard wrote:
> I am using mpiexec so I need something of the form:
> mpiexec -machinefile mpirun.nodes -n ${nodes}
> 
> Using
> command = mpiexec -n ${nodes}
> results in the machinefile being put after the -n which mpiexec chokes on 
> because the global options (machinefile) must come before the local options 
> (-n). Is there a way to add -n ${nodes} to LauncherMPICH.py so that it comes 
> after the machinefile setting?
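
[Editor's sketch] The ordering constraint described above (global options such as -machinefile before local options such as -n) can be made explicit by building the argument list in two stages. The function below is hypothetical and is not part of LauncherMPICH.py. 'a.out' is borrowed from the mpiexec usage examples.

```python
def build_mpiexec_argv(nodes, executable, machinefile=None, program_args=()):
    """Build an mpiexec argument list with global args before local args."""
    argv = ["mpiexec"]
    if machinefile is not None:
        argv += ["-machinefile", machinefile]  # global arg: must come first
    argv += ["-n", str(nodes)]                 # local arg: follows global args
    argv.append(executable)
    argv.extend(program_args)
    return argv


argv = build_mpiexec_argv(4, "a.out", machinefile="mpirun.nodes")
```

Keeping the two groups separate means the machinefile option can never end up after -n, whatever the user supplies for the process count.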
msg414 (view) Author: leif Date: 2007-12-07.22:40:01
Brad,

OK... I have 1.0.5p4.  CC'ing roundup.

Thanks,
Leif

Brad Aagaard wrote:
> Leif-
> 
> I am using 1.0.6p1 (the current release).
> 
> /tools/common/mpich2-1.0.6p1/gcc-4.1.2_64/bin/mpiexec -machinefile foo
> mpiexec: missing arguments after global args
> 
> usage:
> mpiexec [-h or -help or --help]    # get this message
> mpiexec -file filename             # (or -f) filename contains XML job description
> mpiexec [global args] [local args] executable [args]
>    where global args may be
>       -l                           # line labels by MPI rank
>       -bnr                         # MPICH1 compatibility mode
>       -machinefile                 # file mapping procs to machines
>       -s <spec>                    # direct stdin to "all" or 1,2 or 2-4,6
>       -1                           # override default of trying 1st proc locally
>       -ifhn                        # network interface to use locally
>       -tv                          # run procs under totalview (must be installed)
>       -tvsu                        # totalview startup only
>       -gdb                         # run procs under gdb
>       -m                           # merge output lines (default with gdb)
>       -a                           # means assign this alias to the job
>       -ecfn                        # output_xml_exit_codes_filename
>       -g<local arg name>           # global version of local arg (below)
>     and local args may be
>       -n <n> or -np <n>            # number of processes to start
>       -wdir <dirname>              # working directory to start in
>       -umask <umask>               # umask for remote process
>       -path <dirname>              # place to look for executables
>       -host <hostname>             # host to start on
>       -soft <spec>                 # modifier of -n value
>       -arch <arch>                 # arch type to start on (not implemented)
>       -envall                      # pass all env vars in current environment
>       -envnone                     # pass no env vars
>       -envlist <list of env var names> # pass current values of these vars
>       -env <name> <value>          # pass this value of this env var
> mpiexec [global args] [local args] executable args : [local args] executable...
> mpiexec -gdba jobid                # gdb-attach to existing jobid
> mpiexec -configfile filename       # filename contains cmd line segs as lines
>   (See User Guide for more details)
> 
> Examples:
>    mpiexec -l -n 10 cpi 100
>    mpiexec -genv QPL_LICENSE 4705 -n 3 a.out
> 
>    mpiexec -n 1 -host foo master : -n 4 -host mysmp slave
> 
> 
> 
> On Friday 07 December 2007, Leif Strand wrote:
>> Brad Aagaard wrote:
>>> MPICH2
>> ???  In my copy, '-machinefile' isn't even an option:
>>
>> $ ~buildbot/opt/mpich2/bin/mpiexec  -machinefile foo
>> invalid mpiexec argument -machinefile
>> Usage: mpiexec -usize <universesize> -maxtime <seconds> -exitinfo -l\
>>                 -n <numprocs> -soft <softness> -host <hostname> \
>>                 -wdir <working directory> -path <search path> \
>>                 -file <filename> -configfile <filename> \
>>                 -genvnone -genvlist <name1,name2,...> -genv name value\
>>                 -envnone -envlist <name1,name2,...> -env name value\
>>                 execname <args>\
>>                 [ : -n <numprocs> ... execname <args>]
>>
>>
>> --Leif
> 
>
History
Date                 User      Action  Args
2008-03-20 19:40:01  leif      set     nosy: leif, baagaard; messages: + msg441
2008-03-20 18:35:01  baagaard  set     nosy: leif, baagaard; messages: + msg440
2008-03-20 03:35:01  baagaard  set     nosy: leif, baagaard; messages: + msg439
2008-03-20 01:57:16  leif      set     nosy: leif, baagaard; messages: + msg438
2008-03-20 01:05:01  baagaard  set     files: + LauncherMPICH.py, Launcher.py; status: resolved -> chatting; messages: + msg437; nosy: leif, baagaard
2007-12-12 20:48:39  leif      set     status: chatting -> resolved; nosy: leif, baagaard; messages: + msg417
2007-12-11 18:43:36  leif      set     nosy: leif, baagaard; messages: + msg416
2007-12-07 22:42:35  leif      set     status: unread -> chatting; priority: bug; nosy: leif, baagaard; messages: + msg415; topic: + Pyre; assignedto: leif
2007-12-07 22:40:01  leif      create