[CIG-MC] problem launching citcoms without Batch system
Leif Strand
leif at geodynamics.org
Wed May 14 15:14:20 PDT 2008
Rob,
Yes -- my mistake; you will need something like:
[CitcomS.launcher]
command = mpirun -nolocal -np ${nodes} -machinefile ${launcher.machinefile}
The variable ${launcher.machinefile} expands to the name of the
generated machine file (default "mpirun.nodes"). You can change the
name of this file using the --launcher.machinefile option:
citcoms --launcher.machinefile=foo
or in your CitcomS.cfg file:
[CitcomS.launcher]
machinefile = foo
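Putting the two together, a complete ~/.pyre/CitcomS/CitcomS.cfg for your
cluster might look something like the following (an untested sketch; the
nodegen/nodelist values are just copied from your mymachines4.cfg, and
"foo" is only a placeholder name for the generated machine file):
[CitcomS.launcher]
command = mpirun -nolocal -np ${nodes} -machinefile ${launcher.machinefile}
machinefile = foo
nodegen = c0-%g
nodelist = [1-4]
With that in place, "--launcher.dry" should show something along the lines of
mpirun -nolocal -np 4 -machinefile foo /path/to/mpipycitcoms --pyre-start
[...lots of arguments...]
and you can then drop "--launcher.dry" to do a real run.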
--Leif
Robert Moucha wrote:
> Hi Leif,
>
> Thanks for the quick response. I tried your suggestion by adding a
> ~/.pyre/CitcomS/CitcomS.cfg file with:
>
> [CitcomS.launcher]
> command = mpirun -nolocal -np ${nodes}
>
> It appears that I also need to pass the machine file to mpirun,
> because the error I get now is:
>
> Could not find enough machines for architecture LINUX
> --pyre-start: mpirun: exit 1
>
> LINUX is our cluster's default MPICH machine file, which contains only
> the head node. What variable name should I use for the machine file in
> the CitcomS.cfg file?
>
> Thanks again,
> Rob
>
> On Wed, May 14, 2008 at 4:32 PM, Leif Strand <leif at geodynamics.org> wrote:
>> Hi Rob,
>>
>> I would start by adding "--launcher.dry" to your command-line arguments:
>>
>> citcoms example1.cfg mymachines4.cfg --solver.datadir=/state/partition1/test
>> --launcher.dry
>>
>> This will print the 'mpirun' command used by CitcomS (without actually
>> executing it). It will look something like this:
>>
>> mpirun -np 4 /path/to/mpipycitcoms --pyre-start [...lots of arguments...]
>>
>> So, for example, if "-nolocal" is missing, you would then add the following
>> to your ~/.pyre/CitcomS/CitcomS.cfg file:
>>
>> [CitcomS.launcher]
>> command = mpirun -nolocal -np ${nodes}
>>
>> Once the 'mpirun' command looks right, go ahead and remove "--launcher.dry"
>> to perform an actual run.
>>
>> --Leif
>>
>> Robert Moucha wrote:
>>> Hello all,
>>>
>>> Just wondering if anyone else has had a problem launching citcoms without
>>> the use of a batch system. It appears that the job is launched only
>>> on the head node. I installed the latest 3.0.2 version and issued the
>>> following command in a working directory (CitcomS-3.0.2/bin is in my
>>> PATH):
>>>
>>> $ citcoms example1.cfg mymachines4.cfg
>>> --solver.datadir=/state/partition1/test
>>>
>>> Cannot make new directory '/state/partition1/test'
>>> Cannot make new directory '/state/partition1/test'
>>> Cannot make new directory '/state/partition1/test'
>>> Cannot make new directory '/state/partition1/test'
>>> --pyre-start: mpirun: exit 8
>>> /home/moucha/CitcomS-3.0.2/bin/citcoms:
>>> /home/moucha/CitcomS-3.0.2/bin/pycitcoms: exit 1
>>>
>>> The file mymachines4.cfg contains:
>>>
>>> [CitcomS.launcher]
>>> nodegen = c0-%g
>>> nodelist = [1-4]
>>>
>>> The mpirun.nodes file has:
>>>
>>> c0-1
>>> c0-2
>>> c0-3
>>> c0-4
>>>
>>> which is correct for our cluster.
>>>
>>> The above error makes sense, because I don't have write privileges to
>>> the /state directory on the head node. If I change the datadir parameter
>>> to a directory that I do have write access to on the head node, the
>>> program runs without a problem (but only on the head node).
>>> Incidentally, the following command runs without a problem on the
>>> compute nodes:
>>>
>>> mpirun -nolocal -np 4 -machinefile mpirun.nodes
>>> ~/CitcomS-3.0.2/bin/CitcomSRegional example1b.cfg
>>>
>>> I'm using Python 2.4.5, gcc-3.4.6, mpich 1.2.7p1 (can provide further
>>> info if need be).
>>>
>>> Thanks
>>> Rob