[aspect-devel] A confused "not terminated" problem with multiple nodes.

Shangxin Liu sxliu at vt.edu
Mon Jun 22 11:55:09 PDT 2015


In my job script I use mpirun -np $PBS_NP aspect file.prm, where $PBS_NP is
the total number of processors. What does the "_rsh" suffix mean?


On Mon, Jun 22, 2015 at 12:56 AM, Timo Heister <timo.heister at gmail.com>
wrote:

> > Thanks for the detailed suggestions. I'll contact our system
> > administrators.
> > Btw, there is another error on our cluster, and I'm not sure whether it is
> > related to this "not terminated" problem. Every time I run an ASPECT job,
> > the following error appears in the record file:
> >
> > [mpiexec@br310] HYDT_bscd_pbs_wait_for_completion
> > (./tools/bootstrap/external/pbs_wait.c:68): tm_poll(obit_event) failed with
> > TM error 17002
>
> Might be related and something you should ask your admins.
>
> > This error appears in both the single-node and multi-node cases, but it
> > doesn't prevent the results from being written. Our cluster uses the
> > mvapich MPI module and the mpicc/mpicxx compilers.
>
> Are you using mpirun_rsh in your job script?