Created on 2007-12-07.22:40:01 by leif, last changed 2008-03-20.19:40:01 by leif.
| File name | Uploaded | Type |
| Launcher.py | baagaard, 2008-03-20.01:05:01 | application/x-python |
| LauncherMPICH.py | baagaard, 2008-03-20.01:05:01 | application/x-python |
| msg441 (view) |
Author: leif |
Date: 2008-03-20.19:40:01 |
Brad Aagaard wrote:
> I am not sure how to go about setting up an interface to the Sun Grid Engine
> scheduler. I have been using a generic shell script to handle ssh key
> generation (copied from the manual) and a shell script to (1) distribute
> files to the local disks on the compute nodes, start the mpd on the correct
> nodes, and (2) run the program. I am not sure how to handle copying the
> appropriate files, which is job dependent, in the context of the scheduler.
I once tried to extend CIG-Pyre to do staging of input/output files. As
you may know, you can put 'inputFile' and 'outputFile' properties in
Pyre's Inventory. (These properties don't seem to be used by any
Pyre-based projects.) I used {input,output}File properties as metadata:
they determined which files needed to be staged in/out of the job.
But the implementation turned into a big mess, and it only ever worked
for LSF.
In general -- and with Pyre projects specifically -- it seems that
people assume the presence of a global filesystem. But the GFS on our
cluster is really slow -- it's best to avoid it. So I may return to
this staging issue at some point.
In the meantime, all I have planned for SGE is automatic batch script
generation and submission, equivalent to what CIG-Pyre already does for
LSF and PBS:
citcoms --job.queue=normal --job.walltime=1*hour ...
The above will automatically generate the LSF/PBS batch script, and
submit it to the queue. There is some doubt in my mind as to whether a
separate "psub" (Pyre submit) command would have been a better way to go
-- it would have been more familiar/intuitive from a UI standpoint.
Anyway, I would stick with the shell script. :-/
--Leif
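As a rough illustration of what automatic batch script generation for SGE could look like, here is a minimal sketch. It is not CIG-Pyre code: the function names, the parallel environment name and the command line are assumptions made for the example. The `#$` directives (`-N`, `-q`, `-l h_rt`, `-pe`, `-cwd`, `-S`) are standard qsub options, but queue and parallel environment names are site-specific.

```python
def format_walltime(seconds):
    """Format a walltime in seconds as H:MM:SS for the h_rt resource."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


def sge_script(name, queue, walltime_seconds, slots, pe, command):
    """Return the text of an SGE batch script (hypothetical helper).

    'pe' is the name of a parallel environment configured on the cluster;
    it is an assumption here and differs from site to site.
    """
    lines = [
        "#!/bin/sh",
        "#$ -S /bin/sh",                                      # shell for the job
        "#$ -N %s" % name,                                    # job name
        "#$ -q %s" % queue,                                   # target queue
        "#$ -l h_rt=%s" % format_walltime(walltime_seconds),  # walltime limit
        "#$ -pe %s %d" % (pe, slots),                         # parallel environment, slots
        "#$ -cwd",                                            # run in the submit directory
        "",
        command,
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and handed to `qsub`. Staging job-dependent files to local disks is not covered by this sketch.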
|
| msg440 (view) |
Author: baagaard |
Date: 2008-03-20.18:35:01 |
Leif-
You are right. Using the environment variable in the command parameter is the
correct way to handle a queue-generated machine file.
I am not sure how to go about setting up an interface to the Sun Grid Engine
scheduler. I have been using a generic shell script to handle ssh key
generation (copied from the manual) and a shell script to (1) distribute
files to the local disks on the compute nodes, start the mpd on the correct
nodes, and (2) run the program. I am not sure how to handle copying the
appropriate files, which is job dependent, in the context of the scheduler.
Brad
|
| msg439 (view) |
Author: baagaard |
Date: 2008-03-20.03:35:01 |
Leif-
Sorry for the poor explanation. I was never using the queue-generated
machinefile together with the nodelist and nodegen parameters. I
think I misunderstood how to set the machine file with an environment
variable. I will try setting the command parameter again using just the
environment variable.
Brad
|
| msg438 (view) |
Author: leif |
Date: 2008-03-20.01:57:16 |
I'm confused. If the queue system generates the machine file, you shouldn't be
using 'nodelist' and 'nodegen' at all. Take a look at the PBS example on the
following page:
http://www.geodynamics.org/cig/software/packages/cs/pythia/docs/batch
It does this:
command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}
Here, Pyre doesn't care about the machine file. The batch system, PBS,
generates the machine file and stores its pathname in the PBS_NODEFILE
environment variable. CIG-Pyre's environment-variable expansion feature is all
that is needed to generate the correct mpirun/mpiexec line.
I guess I don't understand what makes SGE special in this area, although
clearly Pyre needs SGE support so that the 'scheduler' facility works. See
issue117.
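To make the expansion idea concrete, here is a minimal sketch of how a command template such as the one above could be filled in from launcher values and the process environment. It uses Python's standard `string.Template` and is not CIG-Pyre's actual implementation; `expand_command` and its arguments are names made up for the example.

```python
import os
from string import Template


def expand_command(command, macros, environ=None):
    """Expand ${name} references in a launcher command (hypothetical helper).

    Values in 'macros' (e.g. 'nodes') take precedence over environment
    variables of the same name.
    """
    values = dict(os.environ if environ is None else environ)
    values.update(macros)
    # safe_substitute leaves unknown ${names} in place instead of raising
    return Template(command).safe_substitute(values)
```

For example, with `nodes` set to 8 and `PBS_NODEFILE` pointing at a file created by the batch system, `mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}` expands to a complete mpirun line without Pyre ever reading or writing the machine file.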
|
| msg437 (view) |
Author: baagaard |
Date: 2008-03-20.01:05:01 |
Leif-
Your changes to Launcher didn't quite meet my needs. The queue system (Sun
Grid Engine) generates the machine file. In the current implementation,
mpi.Launcher only calls mpi.LauncherMPICH._expandNodeListArgs() if the node
list is specified. Because the queue determines the nodes, I don't have a
node list, just a machine file. I modified mpi.Launcher to always call
_expandNodeListArgs(), but mpi.LauncherMPICH._expandNodeListArgs() only builds
the machine file if there is a node list. These modifications seem to produce
the desired behavior:
(1) The machinefile environment variable is ignored if the command doesn't
have a machinefile argument.
(2) If the command uses the machinefile environment variable and no node list
is specified, the machinefile is used but not generated.
(3) The machinefile is generated if the node list is specified.
I have attached modified versions of mpi.Launcher.py and mpi.LauncherMPICH.py.
Brad
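The three cases above can be sketched as a small decision function. This is only one reading of the described behavior, not the code in the attached files; `plan_machinefile`, `Plan` and the default macro string are assumptions made for the example.

```python
from collections import namedtuple

# expand_macro: the command references the machinefile, so its name is substituted
# generate_file: a node list was given, so the machinefile has to be written
Plan = namedtuple("Plan", ["expand_macro", "generate_file"])


def plan_machinefile(command, nodelist, macro="${launcher.machinefile}"):
    """Decide how the launcher should treat the machinefile (hypothetical helper)."""
    return Plan(expand_macro=macro in command,
                generate_file=bool(nodelist))
```

With a queue-generated machine file there is no node list, so the file is referenced by the command but never written by the launcher, which is case (2).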
|
| msg417 (view) |
Author: leif |
Date: 2007-12-12.20:48:39 |
Fixed:
r8639
Summary: Using 'launcher.nodelist' instructs Pyre to generate a machinefile
using the printf-format string 'launcher.nodegen'. The filename of the
machinefile is controlled by 'launcher.machinefile' (default 'mpirun.nodes').
The placement of the corresponding '-machinefile' option on the mpirun/mpiexec
command line must now be explicitly specified by the user using the
'launcher.command' option:
[app.launcher]
command = mpiexec -machinefile ${launcher.machinefile} -np ${nodes}
The macro ${launcher.machinefile} (named after the corresponding option) expands
to the filename of the generated machinefile.
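As an illustration of the summary above, the following sketch builds machinefile contents from a node list and a printf-style format, then substitutes the two macros into the command. It is not the code from r8639; the function names and the example format string are assumptions.

```python
def machinefile_text(nodelist, nodegen):
    """Return machinefile contents: one host per line, each produced by
    applying the printf-style 'nodegen' format to a node id."""
    return "".join((nodegen % node) + "\n" for node in nodelist)


def write_machinefile(path, nodelist, nodegen):
    """Write the generated host list to 'path' (e.g. 'mpirun.nodes')."""
    with open(path, "w") as f:
        f.write(machinefile_text(nodelist, nodegen))


def expand_launcher_command(command, machinefile, nodes):
    """Replace the two macros used in this thread with their values."""
    return (command
            .replace("${launcher.machinefile}", machinefile)
            .replace("${nodes}", str(nodes)))
```

For instance, a node list of 1, 2, 5 with a hypothetical format of `n%03d` gives the hosts n001, n002 and n005, and the command shown above becomes `mpiexec -machinefile mpirun.nodes -np 3`.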
|
| msg416 (view) |
Author: leif |
Date: 2007-12-11.18:43:36 |
Note: I just discovered that one must configure MPICH2 with --with-pm=mpd. If
one configures with --with-pm=gforker (as I always do), it installs a different
'mpiexec'.
|
| msg415 (view) |
Author: leif |
Date: 2007-12-07.22:42:35 |
Brad Aagaard wrote:
> I am using mpiexec so I need something of the form:
> mpiexec -machinefile mpirun.nodes -n ${nodes}
>
> Using
> command = mpiexec -n ${nodes}
> results in the machinefile being put after the -n which mpiexec chokes on
> because the global options (machinefile) must come before the local options
> (-n). Is there a way to add -n ${nodes} to LauncherMPICH.py so that it comes
> after the machinefile setting?
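The ordering constraint described here can be sketched as follows: global options such as -machinefile are emitted first, followed by local options such as -n. This is an illustrative helper, not part of LauncherMPICH.py, and `build_mpiexec` is a made-up name.

```python
def build_mpiexec(nodes, machinefile=None, executable=()):
    """Build an mpiexec argument list with global options before local ones."""
    argv = ["mpiexec"]
    if machinefile:
        argv += ["-machinefile", machinefile]  # global option: must come first
    argv += ["-n", str(nodes)]                 # local option: after all global options
    argv += list(executable)                   # program and its arguments
    return argv
```

Building the argument list in this fixed order avoids the failure where the machinefile option lands after -n.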
|
| msg414 (view) |
Author: leif |
Date: 2007-12-07.22:40:01 |
Brad,
OK... I have 1.0.5p4. CC'ing roundup.
Thanks,
Leif
Brad Aagaard wrote:
> Leif-
>
> I am using 1.0.6p1 (the current release).
>
> /tools/common/mpich2-1.0.6p1/gcc-4.1.2_64/bin/mpiexec -machinefile foo
> mpiexec: missing arguments after global args
>
> usage:
> mpiexec [-h or -help or --help] # get this message
> mpiexec -file filename # (or -f) filename contains XML job description
> mpiexec [global args] [local args] executable [args]
> where global args may be
> -l # line labels by MPI rank
> -bnr # MPICH1 compatibility mode
> -machinefile # file mapping procs to machines
> -s <spec> # direct stdin to "all" or 1,2 or 2-4,6
> -1 # override default of trying 1st proc locally
> -ifhn # network interface to use locally
> -tv # run procs under totalview (must be installed)
> -tvsu # totalview startup only
> -gdb # run procs under gdb
> -m # merge output lines (default with gdb)
> -a # means assign this alias to the job
> -ecfn # output_xml_exit_codes_filename
> -g<local arg name> # global version of local arg (below)
> and local args may be
> -n <n> or -np <n> # number of processes to start
> -wdir <dirname> # working directory to start in
> -umask <umask> # umask for remote process
> -path <dirname> # place to look for executables
> -host <hostname> # host to start on
> -soft <spec> # modifier of -n value
> -arch <arch> # arch type to start on (not implemented)
> -envall # pass all env vars in current environment
> -envnone # pass no env vars
> -envlist <list of env var names> # pass current values of these vars
> -env <name> <value> # pass this value of this env var
> mpiexec [global args] [local args] executable args : [local args] executable...
> mpiexec -gdba jobid # gdb-attach to existing jobid
> mpiexec -configfile filename # filename contains cmd line segs as lines
> (See User Guide for more details)
>
> Examples:
> mpiexec -l -n 10 cpi 100
> mpiexec -genv QPL_LICENSE 4705 -n 3 a.out
>
> mpiexec -n 1 -host foo master : -n 4 -host mysmp slave
>
>
>
> On Friday 07 December 2007, Leif Strand wrote:
>> Brad Aagaard wrote:
>>> MPICH2
>> ??? In my copy, '-machinefile' isn't even an option:
>>
>> $ ~buildbot/opt/mpich2/bin/mpiexec -machinefile foo
>> invalid mpiexec argument -machinefile
>> Usage: mpiexec -usize <universesize> -maxtime <seconds> -exitinfo -l\
>> -n <numprocs> -soft <softness> -host <hostname> \
>> -wdir <working directory> -path <search path> \
>> -file <filename> -configfile <filename> \
>> -genvnone -genvlist <name1,name2,...> -genv name value\
>> -envnone -envlist <name1,name2,...> -env name value\
>> execname <args>\
>> [ : -n <numprocs> ... execname <args>]
>>
>>
>> --Leif
>
>
|
| Date | User | Action | Args |
| 2008-03-20 19:40:01 | leif | set | nosy: leif, baagaard; messages: + msg441 |
| 2008-03-20 18:35:01 | baagaard | set | nosy: leif, baagaard; messages: + msg440 |
| 2008-03-20 03:35:01 | baagaard | set | nosy: leif, baagaard; messages: + msg439 |
| 2008-03-20 01:57:16 | leif | set | nosy: leif, baagaard; messages: + msg438 |
| 2008-03-20 01:05:01 | baagaard | set | files: + LauncherMPICH.py, Launcher.py; status: resolved -> chatting; messages: + msg437; nosy: leif, baagaard |
| 2007-12-12 20:48:39 | leif | set | status: chatting -> resolved; nosy: leif, baagaard; messages: + msg417 |
| 2007-12-11 18:43:36 | leif | set | nosy: leif, baagaard; messages: + msg416 |
| 2007-12-07 22:42:35 | leif | set | status: unread -> chatting; priority: bug; nosy: leif, baagaard; messages: + msg415; topic: + Pyre; assignedto: leif |
| 2007-12-07 22:40:01 | leif | create | |