[CIG-MC] CitcomS crashing when reading tracer files
Dan Bower
danb at gps.caltech.edu
Thu Mar 13 11:09:41 PDT 2014
Hi CIG,
In brief, I am using a modified version of CitcomS from svn (r16400).
To demonstrate my problem: when I read in 103345 tracers from a file
(using tracer_ic_method=1), the relevant part of stderr looks like the
following (I added extra debug output), and the model proceeds fine:
--------------------
Beginning Mapping
Beginning Regtoel submapping
Mapping completed (26.341404 seconds)
tracer setup done
initial_mesh_solver_setup done
initialization time = 36.243746
Sum of Tracers: 103345
before find_tracers(E)
after j,k loop
before parallel_process_sync
after parallel_process_sync
after lost_souls
after free later arrays
after reduce_tracer_arrays
find_tracers(E) complete
--------------------
However, for a tracer file that contains roughly 10 times more tracers
(1033449), the model crashes at the parallel_process_sync call:
---------------------------
Beginning Mapping
Beginning Regtoel submapping
Mapping completed (26.581470 seconds)
tracer setup done
initial_mesh_solver_setup done
initialization time = 35.774893
Sum of Tracers: 1033449
before find_tracers(E)
after j,k loop
before parallel_process_sync
[proxy:0:0@compute-8-209.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:0@compute-8-209.local] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@compute-8-209.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
[mpiexec@compute-8-209.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@compute-8-209.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:521): bootstrap server returned error waiting for completion
[mpiexec@compute-8-209.local] main (./ui/mpich/mpiexec.c:548): process manager error waiting for completion
/home/danb/cig/CitcomS-assim/bin/pycitcoms: /opt/intel/impi/4.0.3.008/bin/mpirun: exit 255
Connection to compute-3-109 closed by remote host.
Connection to compute-3-110 closed by remote host.
Connection to compute-3-111 closed by remote host.
Connection to compute-4-121 closed by remote host.
Connection to compute-4-122 closed by remote host.
Connection to compute-4-140 closed by remote host.
Connection to compute-4-145 closed by remote host.
Connection to compute-7-162 closed by remote host.
Connection to compute-7-166 closed by remote host.
Connection to compute-7-167 closed by remote host.
Connection to compute-7-168 closed by remote host.
Connection to compute-7-169 closed by remote host.
Connection to compute-7-170 closed by remote host.
Connection to compute-7-171 closed by remote host.
Connection to compute-7-179 closed by remote host.
Connection to compute-7-180 closed by remote host.
Connection to compute-7-183 closed by remote host.
Connection to compute-7-184 closed by remote host.
Connection to compute-7-189 closed by remote host.
Connection to compute-7-191 closed by remote host.
Connection to compute-7-192 closed by remote host.
Connection to compute-7-193 closed by remote host.
Connection to compute-7-194 closed by remote host.
Connection to compute-8-195 closed by remote host.
Connection to compute-8-196 closed by remote host.
Connection to compute-8-198 closed by remote host.
Connection to compute-8-201 closed by remote host.
Connection to compute-8-205 closed by remote host.
Connection to compute-8-206 closed by remote host.
Connection to compute-8-207 closed by remote host.
Connection to compute-8-208 closed by remote host.
Killed by signal 15.
--------------------------------
parallel_process_sync() is in Parallel_util.c and is simply a call to
MPI_Barrier:
-----------------------------
/* Block until every process in the communicator reaches this point. */
void parallel_process_sync(struct All_variables *E)
{
    MPI_Barrier(E->parallel.world);
    return;
}
-----------------------------
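A failure reported at MPI_Barrier can mean that some other rank died
before reaching the barrier, so per-rank markers that are flushed
immediately may help show which rank stops first. The following is a
minimal sketch of that idea, not the debug code used for the output
above; format_marker and debug_mark are hypothetical helpers that are
not part of CitcomS, and the rank is assumed to be obtained separately
(for example with MPI_Comm_rank on E->parallel.world).

```c
#include <stdio.h>

/* Hypothetical helper (not part of CitcomS): format a per-rank debug
 * marker into buf. Returns the number of characters written, or -1 if
 * the marker did not fit or formatting failed. */
int format_marker(char *buf, size_t size, int rank, const char *msg)
{
    int n = snprintf(buf, size, "[rank %d] %s\n", rank, msg);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Hypothetical helper: write the marker to stream and flush at once, so
 * the line is not left in a stdio buffer if the process is killed
 * shortly afterwards. Returns 0 on success, -1 on failure. */
int debug_mark(FILE *stream, int rank, const char *msg)
{
    char line[256];

    if (format_marker(line, sizeof line, rank, msg) < 0)
        return -1;
    if (fputs(line, stream) == EOF)
        return -1;
    return fflush(stream) == 0 ? 0 : -1;
}
```

Calling debug_mark(stderr, rank, "before parallel_process_sync") on
every rank, and again after the barrier, would show which ranks reached
the barrier and which did not; a missing "before" line for some rank
would suggest that the fault lies earlier in find_tracers on that rank
rather than in the barrier itself.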
Any ideas why this crashes when I have a larger input tracer file? I am
using the pyre version of CitcomS, built with the Intel 12 compilers
(intel-12) and Intel MPI 4.0 (intel/impi/4.0).
Any advice would be greatly appreciated.
Cheers,
Dan