Issue116

Title too cryptic: "--pyre-start: mpirun: exit 127"
Priority: feature
Status: chatting
Superseder:
Nosy List: leif
Assigned To: leif
Topics: Pyre

Created on 2007-06-18.19:30:02 by leif, last changed 2008-08-25.23:51:45 by tan2.

Files
File name Uploaded Type Edit
unnamed leif, 2007-06-18.19:30:01 text/html
Messages
msg369 Author: leif Date: 2007-06-18.19:30:01
Provide better diagnostic when 'mpirun' cannot be found.

Also: Does it even make sense to have a default for 'command'?

--Leif
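
One way the requested diagnostic could look, as a minimal sketch only: check whether the launcher program resolves to an executable before starting it, and report that in plain words instead of a bare exit status (127 is the conventional shell status for "command not found"). The function names `check_launcher` and `launch` are hypothetical and are not part of Pyre; the wording of the message is likewise an assumption.

```python
import shutil
import subprocess
import sys


def check_launcher(argv):
    """Return None if argv[0] resolves to an executable, else a readable message.

    Hypothetical helper for illustration; not an existing Pyre function.
    """
    program = argv[0]
    # shutil.which searches PATH, or tests the file directly if a path is given.
    if shutil.which(program) is not None:
        return None
    return (
        f"launcher command '{program}' was not found; "
        "set 'command' in the [pylith3d.launcher] section of your .cfg file "
        "to the full path of the MPI launcher used on your cluster"
    )


def launch(argv):
    """Run the launcher, failing early with a clear message if it is missing."""
    problem = check_launcher(argv)
    if problem is not None:
        sys.exit(f"pyre-start: {problem}")
    return subprocess.call(argv)


# Example: a launcher name that is unlikely to exist on any PATH.
print(check_launcher(["mpirun-that-does-not-exist", "-np", "4"]))
```

A check like this would also apply when 'command' has no default, since an empty or missing value could be reported the same way.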

-------- Original Message --------
Subject: 	Re: [CIG-SHORT] Problem attempting to run on mutiple processors
Date: 	Mon, 18 Jun 2007 12:09:44 -0700
From: 	Leif Strand <leif@geodynamics.org>
To: 	Oliver Boyd <olboyd@usgs.gov>
CC: 	cig-short@geodynamics.org
References: 	<006d01c7af9d$96c8b250$c45a16f0$@gov> 
<46731FA2.40207@geodynamics.org> <00e701c7b1be$ca3d8340$5eb889c0$@gov>

Oliver,

I apologize for that rather cryptic error message. It usually means it 
can't find 'mpirun'.

Naturally, you could alter your PATH so that it can find 'mpirun' in the 
context of the job. However, there is a good chance that 'mpirun' isn't 
even the right command to use on your cluster.

So, I recommend creating the file ~/.pyre/pylith3d/pylith3d.cfg and 
inserting something similar to the following:

[pylith3d.launcher]
command = /full/path/to/mpirun -np ${nodes}

Using a full pathname for 'mpirun' avoids the PATH environment problem. 
But more importantly, you will have to edit 'command' so that it 
produces the right command for your cluster. For example, if you are 
using MPICH2, you would replace 'mpirun' with 'mpiexec'. On many 
clusters, there is a special script to use (on ours it's "mpirun.lsf") 
and "-np ${nodes}" is omitted:

[pylith3d.launcher]
command = /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/mpirun.lsf
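
To illustrate what the 'command' setting is expected to turn into, here is a rough Python sketch that reads a [pylith3d.launcher] section and fills in the ${nodes} placeholder. It uses the standard library's configparser and string.Template as stand-ins; how Pyre itself parses the file and expands ${nodes} is not shown in this thread, so treat the mechanism and the name `build_launch_argv` as assumptions.

```python
import configparser
import shlex
from string import Template

# Same shape as the example .cfg fragment; the path is a placeholder.
CFG_TEXT = """
[pylith3d.launcher]
command = /full/path/to/mpirun -np ${nodes}
"""


def build_launch_argv(cfg_text, nodes):
    """Expand ${nodes} in the launcher command and split it into an argv list.

    Hypothetical helper; not Pyre's actual implementation.
    """
    # Disable configparser's own interpolation so ${nodes} is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    template = parser["pylith3d.launcher"]["command"]
    expanded = Template(template).substitute(nodes=nodes)
    return shlex.split(expanded)


print(build_launch_argv(CFG_TEXT, 4))
```

A command without the placeholder, such as the "mpirun.lsf" one, passes through unchanged.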

To debug the "launcher" command, run PyLith as follows:

pylith3dapp.py pylith3d.cfg --launcher.dry

This will simply print the launcher command to the console, instead of 
actually running it.
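
The dry-run behaviour amounts to something like the following sketch: print the fully expanded command and stop, rather than executing it. The function `run_launcher` and its `dry` parameter are hypothetical names chosen for illustration, not Pyre's API.

```python
import shlex
import subprocess


def run_launcher(argv, dry=False):
    """Print the launcher command when dry is True; otherwise execute it.

    Hypothetical helper for illustration; not an existing Pyre function.
    """
    if dry:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(argv))
        return 0
    return subprocess.call(argv)


# Dry run: shows the command without starting any processes.
run_launcher(["/full/path/to/mpirun", "-np", "4", "pypylith3d"], dry=True)
```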

--Leif

Oliver Boyd wrote:
>
> Hi Leif,
>
> It looks like when I reinstalled a few of the compute nodes, I 
> neglected to update gcc. I’ve now done this, and it appears to work 
> fine. Thanks for the hint.
>
> As an aside, could you tell me how I can use the SGE batch system? I 
> am not exactly sure why it doesn’t work. I use a program called qsub 
> to submit the job. I give it an argument which is a script containing 
> the command and a few options, e.g.
>
> > qsub pylith-1.sh
>
> where pylith-1.sh looks like
>
> #!/bin/bash
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
>
> pylith3dapp.py pylith3d.cfg
>
> The result of this attempt produces
>
> --pyre-start: mpirun: exit 127
> /usr/local/bin/pylith3dapp.py: /data/Software/Store/pylith3d-0.8.2/pylith3d/pypylith3d: exit 1
>
> Oliver
>

History
Date                 User  Action  Args
2008-08-25 23:51:45  tan2  set     priority: feature; status: unread -> chatting; topic: + Pyre; assignedto: leif
2007-06-18 19:30:02  leif  create