[aspect-devel] A confused "not terminated" problem with multiple nodes.

Scott King sdk at vt.edu
Mon Jun 22 12:00:37 PDT 2015


This strange error message apparently shows up for everyone, all the time.
I agree that it *could* be part of the problem, but it does not seem to affect
other codes.  The sysadmins know about it but, strangely, seem unconcerned...  :(

Shangxin, make sure that the MPI module that is loaded when you compile
is the same MPI module that is loaded when you run.  The same goes for
compilers.  It is possible that this is an mpich vs. openmpi issue, but I would
expect more serious problems if you had the wrong flavor of MPI...
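
One rough way to check the compile-time vs. run-time match is to compare the
MPI library the executable is linked against with the launcher found on the
PATH inside the job. The Python sketch below is only an illustration: the
function names are made up here, the ./aspect path is a placeholder, and it
assumes the conventional <prefix>/bin and <prefix>/lib (or lib64) layout,
which a given cluster may not follow.

```python
import os
import re
import shutil
import subprocess


def mpi_libs_from_ldd(ldd_output):
    """Return resolved paths of libraries whose name starts with 'libmpi'."""
    paths = []
    for line in ldd_output.splitlines():
        # Typical ldd line: "<tab>libmpich.so.12 => /some/prefix/lib/libmpich.so.12 (0x...)"
        match = re.match(r"\s*(libmpi\S*)\s+=>\s+(\S+)", line)
        # Skip unresolved entries such as "libmpi.so => not found".
        if match and match.group(2).startswith("/"):
            paths.append(match.group(2))
    return paths


def install_prefix(path):
    """Guess the install prefix, assuming a <prefix>/{bin,lib,lib64}/<file> layout."""
    directory = os.path.dirname(path)
    if os.path.basename(directory) in ("bin", "lib", "lib64"):
        return os.path.dirname(directory)
    return directory


def same_mpi_install(lib_path, launcher_path):
    """True if the library and the launcher appear to share one install prefix."""
    return install_prefix(lib_path) == install_prefix(launcher_path)


def check_binary(binary="./aspect"):
    """Print, for each linked MPI library, whether it matches mpiexec on the PATH."""
    launcher = shutil.which("mpiexec")  # None if no launcher is on the PATH
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    for lib in mpi_libs_from_ldd(result.stdout):
        ok = launcher is not None and same_mpi_install(lib, launcher)
        print(lib, "matches" if ok else "does NOT match", launcher)
```

Calling check_binary("./aspect") from inside the job script, after the modules
are loaded, shows what the run-time environment actually resolves. A mismatch
is only a hint to look closer, not proof of a problem.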

Scott

On Jun 22, 2015, at 12:56 AM, Timo Heister <timo.heister at gmail.com> wrote:

>> Thanks for the detailed suggestions. I'll contact our system administrators.
>> Btw, there is another error on our cluster, and I'm not sure whether it is
>> related to this "not terminated" problem. Every time I run an ASPECT job,
>> the following error appears in the record file:
>> 
>> [mpiexec@br310] HYDT_bscd_pbs_wait_for_completion
>> (./tools/bootstrap/external/pbs_wait.c:68): tm_poll(obit_event) failed with
>> TM error 17002
> 
> Might be related and something you should ask your admins.
> 
>> This error appears in both the single-node and multi-node cases, but doesn't
>> prevent the results from being written. Our cluster uses the mvapich MPI
>> module and the mpicc/mpicxx compilers.
> 
> Are you using mpirun_rsh in your job script?
> _______________________________________________
> Aspect-devel mailing list
> Aspect-devel at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel


