[aspect-devel] A confusing "not terminated" problem with multiple nodes.

Timo Heister heister at clemson.edu
Sat Jun 20 00:34:11 PDT 2015


Hey Shangxin,

What do you mean by "the case cannot be terminated"? Does ASPECT not
stop when the computation is done? Is it not killed by the job
scheduler on the cluster (in that case it is a bug in their scheduler)?
What happens if you ask the scheduler to kill the job (using qdel on
PBS systems)?
You should also be able to ssh into one of the nodes and kill ASPECT
manually using "kill".
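
The two manual options above (qdel, then ssh plus kill) could look roughly
like the sketch below. The job ID, the node name, and the executable name
"aspect" are placeholders and not taken from your setup, and the helper
functions are hypothetical. The sketch also assumes your user name is the
same on the compute node. It prints the commands by default; set DRY_RUN=0
to actually run them.

```shell
#!/bin/sh
# Sketch only: job ID, node name, and the executable name "aspect" are
# assumptions for illustration; substitute the values from your cluster.

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask the PBS scheduler to remove the job with the given job ID.
kill_job() {
    run qdel "$1"
}

# Log into a compute node and send SIGTERM to your own processes whose
# name is exactly "aspect" (assumes the same user name on that node).
kill_on_node() {
    run ssh "$1" pkill -u "$(id -un)" -x aspect
}

# Example calls (hypothetical job ID and node name):
#   kill_job 12345
#   kill_on_node node01
```

If qdel removes the job cleanly but the job never exits on its own, that
points at the program or the MPI shutdown rather than at the scheduler.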


On Fri, Jun 19, 2015 at 11:54 AM, Shangxin Liu <sxliu at vt.edu> wrote:
> Hi everyone,
>
> There is a problem that always confuses me. On our machine, each node has 16
> processors. When I run ASPECT on 1 node with 16 processors and request a
> walltime longer than the run time, the job terminates when the computation
> finishes. However, when I run ASPECT on multiple nodes (more than 1), the
> job does not terminate when the computation finishes. It is only terminated
> (killed) when the time exceeds the requested walltime. For example, if I run
> ASPECT on 3 nodes (48 processors), requesting 3 hours, the record file shows
> that the case finishes after 10 minutes, but the job is not terminated at
> that point. It is terminated (killed) at 10 hours walltime by the
> machine. I also ran a test with deal.II and found that deal.II does not have
> this multiple-node "not terminated" problem. So I suppose there may be
> something in the MPI part of ASPECT that is incompatible with our machine.
> But why do single-node cases terminate while multiple-node cases do not?
>
> Although this problem doesn't affect the results, it makes debugging very
> slow. Every time I run a multiple-node case I have to wait until the
> requested walltime. Any suggestions for solving this confusing problem?
>
> Best regards,
>
> Shangxin
>
>
>
> _______________________________________________
> Aspect-devel mailing list
> Aspect-devel at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel



-- 
Timo Heister
http://www.math.clemson.edu/~heister/

