Issue143

Title: Crashes gale-1.2.2 with sphere_in_cylinder.xml
Priority: bug
Status: chatting
Superseder:
Nosy List: bill, walter
Assigned To: walter
Topics: Gale

Created on 2008-04-18.19:19:28 by bill, last changed 2008-05-12.17:06:19 by walter.

Messages
msg453 Author: walter Date: 2008-05-12.17:06:19
I think you might be running out of memory.  Try running with the command line
arguments

--elementResI=16 --elementResJ=32 --elementResK=32

Also, add the lines

   <param name="journal.info">True</param>
   <param name="journal.debug">True</param>
   <param name="journal-level.info">2</param>
   <param name="journal-level.debug">2</param>

near the end of the input XML file, and you will get more debugging information.
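As a rough sketch, the lower-resolution run suggested above could be launched by appending the three flags to the mpirun command from the original report. The paths and the -q and -np values are copied from that report; placing the flags after the input file is an assumption, and the GALE, INPUT, and CMD variable names are only illustrative.

```shell
# Sketch only: the reporter's mpirun invocation plus the reduced mesh resolution.
# Assumption: the --elementRes* flags can simply follow the input file.
GALE=/share/apps/gale-1.2.2/bin/Gale
INPUT=input/benchmarks/falling_sphere/sphere_in_cylinder.xml
CMD="/usr/bin/mpirun -q 0 -np 4 $GALE $INPUT --elementResI=16 --elementResJ=32 --elementResK=32"

# Print the assembled command so it can be checked before launching.
echo "$CMD"

# Uncomment to actually start the run on the cluster:
# $CMD
```

If the assertion no longer fires at this resolution, that would be consistent with the memory explanation, though it would not prove it.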
msg447 Author: bill Date: 2008-04-18.19:19:28
I'm using Rocks (a CentOS 4 based cluster distribution) and trying to get the
parallel version of Gale working with our InfiniPath MPI stack.
petsc-2.3.2-pl10 seems to work fine.  But when I run:
/usr/bin/mpirun -q 0 -np 4 /share/apps/gale-1.2.2/bin/Gale \
input/benchmarks/falling_sphere/sphere_in_cylinder.xml

then after 30-120 minutes of running on 4, 8, 16, or 20 CPUs (Opterons), I get
no normal output whatsoever, just this error:
Gale: build/StgFEM/SLE/SystemSetup/src/StiffnessMatrix.c:545:
_StiffnessMatrix_Build: Assertion `self->rowLocalSize' failed.

Gale:30908 terminated with signal 6 at PC=3c8972e21d SP=7fbffff268.  Backtrace:
/lib/../lib64/tls/libc.so.6(gsignal+0x3d)[0x3c8972e21d]
/lib/../lib64/tls/libc.so.6(abort+0xfe)[0x3c8972fa1e]
/lib/../lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3c89727ae1]
/share/apps/gale-1.2.2/bin/Gale(_StiffnessMatrix_Build+0x179)[0x537cde]
Gale: build/StgFEM/SLE/SystemSetup/src/StiffnessMatrix.c:545:
_StiffnessMatrix_Build: Assertion `self->rowLocalSize' failed.

Gale:7810 terminated with signal 6 at PC=3b2f22e21d SP=7fbfffeed8.  Backtrace:
/lib/../lib64/tls/libc.so.6(gsignal+0x3d)[0x3b2f22e21d]
/lib/../lib64/tls/libc.so.6(abort+0xfe)[0x3b2f22fa1e]
/lib/../lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3b2f227ae1]
/share/apps/gale-1.2.2/bin/Gale(_StiffnessMatrix_Build+0x179)[0x537cde]
MPIRUN.icompute-4-19: 2 ranks have not yet exited 60 seconds after rank 3 (node
icompute-3-11) exited without reaching MPI_Finalize().
MPIRUN.icompute-4-19: Waiting at most another 60 seconds for the remaining ranks
to do a clean shutdown before terminating 2 node processes

Any idea if this is a problem with Gale, PETSc, or something specific to my
install?  I tried another example, which seems to produce normal output:
/usr/bin/mpirun -q 0 -np 4 /share/apps/gale-1.2.2/bin/Gale \
input/benchmarks/extension.xml
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
(It's still running.)
History
Date                 User    Action  Args
2008-05-12 17:06:19  walter  set     status: unread -> chatting; messages: + msg453
2008-05-02 22:55:05  tan2    set     topic: + Gale; nosy: + walter; assignedto: walter
2008-04-18 19:19:28  bill    create