[CIG-SHORT] PETSc error when running Pylith on a cluster

Brad Aagaard baagaard at usgs.gov
Thu Feb 16 13:40:17 PST 2012


Hongfeng,

Remember that rate-and-state friction has some bugs in v1.6.2, so the
solver errors you are getting may be related to bugs in the friction
implementation. I suggest running a problem without friction (e.g.,
step01 or step03) to see if the solver errors disappear.

I am currently working on resolving the last few issues related to 
friction and quasi-static time stepping.

Brad


On 02/16/2012 12:40 PM, Hongfeng Yang wrote:
> Each node on the cluster has 8 cores.
> We ran 24 tests for each of 5 configurations (120 total runs):
> 1 node, 2 nodes, 4 nodes, 6 nodes, and 8 nodes.
>
> The 1 node, 8 core jobs were 100% successful (24 passes)
> The 2 node, 16 core jobs were 33% successful (8 passes)
> The higher node/core count jobs all failed
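A quick tally of the counts reported above makes the trend explicit. This is a small Python sketch using only the figures in the message (24 runs per configuration, 8 cores per node, and zero passes for the 4-, 6-, and 8-node jobs); `summarize` is a hypothetical helper written for illustration.

```python
# (nodes, runs, passes) as reported: 24 runs per configuration,
# and the 4-, 6-, and 8-node jobs all failed.
RESULTS = [(1, 24, 24), (2, 24, 8), (4, 24, 0), (6, 24, 0), (8, 24, 0)]
CORES_PER_NODE = 8


def summarize(results, cores_per_node=CORES_PER_NODE):
    """Return one formatted line per configuration plus an overall total line."""
    lines = []
    for nodes, runs, passes in results:
        cores = nodes * cores_per_node
        lines.append(f"{nodes} node(s), {cores} cores: {passes}/{runs} passed ({passes / runs:.0%})")
    total_runs = sum(runs for _, runs, _ in results)
    total_passes = sum(passes for _, _, passes in results)
    lines.append(f"total: {total_passes}/{total_runs} passed")
    return lines


for line in summarize(RESULTS):
    print(line)
```

The pass rate falls from 100% on one node (8 cores) to 33% on two nodes (16 cores) and to 0% beyond that, for 32 passes out of 120 runs.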
>
> Attached is the stdout file.
>
> The full run command is the following:
>
> /usr/mpi/gcc/openmpi-1.4.3/bin/mpirun --hostfile $PBS_NODEFILE -np
> $NPROCS /home/username/pylith57/bin/mpinemesis --pyre-start
> /home/username/pylith57/bin:/home/username/pylith57/lib/python2.6/site-packages/pythia-0.8.1.12-py2.6.egg:/home/username/pylith57/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg:/home/username/pylith57/lib/python2.6/site-packages/merlin-1.7.egg:/home/username/pylith57/lib/python2.6/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/src/pylith/examples/3d/hex8:/home/username/pylith57/lib/python26.zip:/home/username/pylith57/lib/python2.6/lib-dynload:/home/username/pylith57/lib/python2.6:/home/username/pylith57/lib/python2.6/plat-linux2:/home/username/pylith57/lib/python2.6/lib-tk:/home/username/pylith57/lib/python2.6/lib-old:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages::/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/src/pylith/examples/3d/hex8:/home/username/pylith57/lib/python26.zip:/home/username/pylith57/lib/python2.6/plat-linux2:/home/username/pylith57/lib/python2.6/lib-tk:/home/username/pylith57/lib/python2.6/lib-old
> pythia mpi:mpistart pylith.apps.PyLithApp:PyLithApp step14.cfg
> --nodes=$NPROCS --petsc.start_in_debugger --launcher.dry --nodes=$NPROCS
> --macros.nodes=$NPROCS --macros.job.name= --macros.job.id=8403 >& ./$PBS_JOBID.log
>
>
> Thanks,
>
> Hongfeng
>
> On 02/16/2012 10:46 AM, Brad Aagaard wrote:
>> Hongfeng,
>>
>> Please send everything that was written to stdout. Also, please indicate
>> the value of NPROCS (how many processes you are using). It also helps
>> if you tell us the exact command you entered on the command line so that
>> we can try to reproduce what you did.
>>
>> The error message you list only indicates that one of the processes
>> aborted because another process already aborted due to an error. The
>> message associated with the real error should have been written earlier.
>>
>> Brad
>>
>>
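Because the message a killed process prints hides the real error, it can help to search a long multi-process log for the rank that actually failed. The Python sketch below groups PETSc error lines by MPI rank and keeps only ranks that never report signal 15 (SIGTERM). It assumes the `[rank]PETSC ERROR: ` line prefix seen in the log in this thread; the helper `find_primary_errors` and the script name `find_petsc_error.py` are hypothetical, not part of PyLith or PETSc.

```python
import re
import sys

# PETSc prefixes every error line with the MPI rank in brackets,
# e.g. "[30]PETSC ERROR: Caught signal number 15 ...".
PETSC_LINE = re.compile(r"^\[(\d+)\]PETSC ERROR: ?(.*)$")
# Signal 15 (SIGTERM) usually means the rank was told to stop,
# often because a different rank had already failed.
SIGTERM_MSG = "Caught signal number 15"


def find_primary_errors(lines):
    """Group PETSc error lines by rank and keep only ranks that never report
    SIGTERM. Returns {rank: [(line_number, message), ...]} in order of each
    rank's first appearance; these ranks are candidates for the real error."""
    by_rank = {}
    for lineno, line in enumerate(lines, start=1):
        match = PETSC_LINE.match(line.rstrip("\r\n"))
        if match:
            rank = int(match.group(1))
            by_rank.setdefault(rank, []).append((lineno, match.group(2)))
    return {
        rank: msgs
        for rank, msgs in by_rank.items()
        if not any(SIGTERM_MSG in msg for _, msg in msgs)
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_petsc_error.py ./$PBS_JOBID.log
    with open(sys.argv[1]) as log:
        for rank, msgs in find_primary_errors(log).items():
            for lineno, msg in msgs:
                print(f"line {lineno}, rank {rank}: {msg}")
```

If the result is empty, the failing process may have died without printing a PETSc message, in which case the earliest lines of the log (including messages from mpirun itself) are the next place to look.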
>> On 02/16/2012 07:26 AM, Hongfeng Yang wrote:
>>> Hi All,
>>>
>>> The cluster is running CentOS 5.7. The options used to build PyLith are:
>>>
>>>
>>> $HOME/src57/pylith/pylith-installer-1.6.2-0/configure \
>>> --enable-python --with-make-threads=2 \
>>> --with-petsc-options="--download-chaco=1 --download-ml=1
>>> --download-f-blas-lapack=1 --with-debugging=yes" \
>>> --prefix=$HOME/pylith
>>>
>>>
>>> However, the following error message appears when running an example on
>>> the cluster.
>>>
>>> [30]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>
>>>
>>> So we have successfully built PETSc with debugging, but it is not
>>> enabled.
>>>
>>> Here is the full run command:
>>>
>>> /usr/mpi/gcc/openmpi-1.4.3/bin/mpirun --hostfile $PBS_NODEFILE -np
>>> $NPROCS /home/username/pylith57/bin/mpinemesis --pyre-start
>>> /home/username/pylith57/bin:/home/username/pylith57/lib/python2.6/site-packages/pythia-0.8.1.12-py2.6.egg:/home/username/pylith57/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg:/home/username/pylith57/lib/python2.6/site-packages/merlin-1.7.egg:/home/username/pylith57/lib/python2.6/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/src/pylith/examples/3d/hex8:/home/username/pylith57/lib/python26.zip:/home/username/pylith57/lib/python2.6/lib-dynload:/home/username/pylith57/lib/python2.6:/home/username/pylith57/lib/python2.6/plat-linux2:/home/username/pylith57/lib/python2.6/lib-tk:/home/username/pylith57/lib/python2.6/lib-old:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/lib/python/site-packages::/home/username/pylith57/lib/python/site-packages:/home/username/pylith57/lib64/python/site-packages:/home/username/pylith57/src/pylith/examples/3d/hex8:/home/username/pylith57/lib/python26.zip:/home/username/pylith57/lib/python2.6/plat-linux2:/home/username/pylith57/lib/python2.6/lib-tk:/home/username/pylith57/lib/python2.6/lib-old
>>> pythia mpi:mpistart pylith.apps.PyLithApp:PyLithApp step14.cfg
>>> --nodes=$NPROCS --petsc.start_in_debugger --launcher.dry
>>> --nodes=$NPROCS --macros.nodes=$NPROCS --macros.job.name=
>>> --macros.job.id=8403 >& ./$PBS_JOBID.log
>>>
>>> Here is the full error message, which states that we are not in
>>> debugging mode:
>>>
>>> [35]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [30]PETSC ERROR: ------------------------------------------------------------------------
>>> [30]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the batch system) has told this process to end
>>> [30]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [30]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind
>>> [30]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [30]PETSC ERROR: likely location of problem given in stack below
>>> [30]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>
>>>
>>>
>>>
>>> Can anyone help? Thanks!
>>>
>>> Hongfeng Yang
>>>
>> _______________________________________________
>> CIG-SHORT mailing list
>> CIG-SHORT at geodynamics.org
>> http://geodynamics.org/cgi-bin/mailman/listinfo/cig-short
>>
>


