[aspect-devel] exception faults when using more than one node

Robert Moucha rmoucha at gmail.com
Wed Jul 1 11:36:39 PDT 2015


Hi All,

This is the first time I have compiled ASPECT on a cluster. I can run
ASPECT in parallel on a single node, but when I use more than one node
(mvapich-2.0.1 with ch3:mrail), the run aborts.

For example, when I submit a job on 2 nodes with 12 cores each, I get
the following output (sorry for the length). Any ideas? Thanks.

mv: cannot move `/tmp/aspect.tmp.FwNelP' to
`/state/partition1/RMOUCHA/output/solution-00000.0012.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.8qNelP' to
`/state/partition1/RMOUCHA/output/solution-00000.0013.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.flf7kP' to
`/state/partition1/RMOUCHA/output/solution-00000.0014.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.zJ9bkP' to
`/state/partition1/RMOUCHA/output/solution-00000.0015.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.PoYv2S' to
`/state/partition1/RMOUCHA/output/solution-00000.0016.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.Slsy2S' to
`/state/partition1/RMOUCHA/output/solution-00000.0017.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.HvAtkP' to
`/state/partition1/RMOUCHA/output/solution-00000.0018.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.jhyC2S' to
`/state/partition1/RMOUCHA/output/solution-00000.0019.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.8qkGkP' to
`/state/partition1/RMOUCHA/output/solution-00000.0020.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.yaFFjP' to
`/state/partition1/RMOUCHA/output/solution-00000.0021.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.TlsAjP' to
`/state/partition1/RMOUCHA/output/solution-00000.0022.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.fe1QjP' to
`/state/partition1/RMOUCHA/output/solution-00000.0023.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.8qNelP' to
`/state/partition1/RMOUCHA/output/solution-00000.0013.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.flf7kP' to
`/state/partition1/RMOUCHA/output/solution-00000.0014.vtu': No such
file or directory
mv: cannot move `/tmp/aspect.tmp.FwNelP' to
`/state/partition1/RMOUCHA/output/solution-00000.0012.vtu': No such
file or directory
***** WARNING: could not move /tmp/aspect.tmp.FwNelP to
/state/partition1/RMOUCHA/output/solution-00000.0012.vtu. Trying again
to write directly to
/state/partition1/RMOUCHA/output/solution-00000.0012.vtu. (On processor 12.)

---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------
***** WARNING: could not move /tmp/aspect.tmp.8qNelP to
/state/partition1/RMOUCHA/output/solution-00000.0013.vtu. Trying again
to write directly to
/state/partition1/RMOUCHA/output/solution-00000.0013.vtu. (On processor 13.)


---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------
Exception message:

--------------------------------------------------------
An error occurred in line <533> of file
</home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
in function
    static void
aspect::Postprocess::Visualization<dim>::background_writer(const
std::string*, const std::string*) [with int dim = 3]
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMessage(std::string("Trying to write to file <") + *filename +
" but the file can't be opened!")
Additional Information:
Trying to write to file
</state/partition1/RMOUCHA/output/solution-00000.0013.vtu but the file
can't be opened!
--------------------------------------------------------

Exception type:
  N6dealii18StandardExceptions10ExcMessageE
Aborting!
---------------------------------------------------------
[gpuunit-0-1.local:mpi_rank_13][error_sighandler] Caught error:
Aborted (signal 6)
***** WARNING: could not move /tmp/aspect.tmp.flf7kP to
/state/partition1/RMOUCHA/output/solution-00000.0014.vtu. Trying again
to write directly to
/state/partition1/RMOUCHA/output/solution-00000.0014.vtu. (On processor 14.)


---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------
Exception message:

--------------------------------------------------------
An error occurred in line <533> of file
</home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
in function
    static void
aspect::Postprocess::Visualization<dim>::background_writer(const
std::string*, const std::string*) [with int dim = 3]
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMessage(std::string("Trying to write to file <") + *filename +
" but the file can't be opened!")
Additional Information:
Trying to write to file
</state/partition1/RMOUCHA/output/solution-00000.0014.vtu but the file
can't be opened!
--------------------------------------------------------

Exception type:
  N6dealii18StandardExceptions10ExcMessageE
Aborting!
---------------------------------------------------------
[gpuunit-0-1.local:mpi_rank_14][error_sighandler] Caught error:
Aborted (signal 6)
mv: cannot move `/tmp/aspect.tmp.Slsy2S' to
`/state/partition1/RMOUCHA/output/solution-00000.0017.vtu': No such
file or directory
Exception message:

--------------------------------------------------------
An error occurred in line <533> of file
</home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
in function
    static void
aspect::Postprocess::Visualization<dim>::background_writer(const
std::string*, const std::string*) [with int dim = 3]
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMessage(std::string("Trying to write to file <") + *filename +
" but the file can't be opened!")
Additional Information:
Trying to write to file
</state/partition1/RMOUCHA/output/solution-00000.0012.vtu but the file
can't be opened!
--------------------------------------------------------

Exception type:
  N6dealii18StandardExceptions10ExcMessageE
Aborting!
---------------------------------------------------------
[gpuunit-0-1.local:mpi_rank_12][error_sighandler] Caught error:
Aborted (signal 6)
mv: cannot move `/tmp/aspect.tmp.PoYv2S' to
`/state/partition1/RMOUCHA/output/solution-00000.0016.vtu': No such
file or directory
***** WARNING: could not move /tmp/aspect.tmp.PoYv2S to
/state/partition1/RMOUCHA/output/solution-00000.0016.vtu. Trying again
to write directly to
/state/partition1/RMOUCHA/output/solution-00000.0016.vtu. (On processor 16.)


---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------
Exception message:

--------------------------------------------------------
An error occurred in line <533> of file
</home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
in function
    static void
aspect::Postprocess::Visualization<dim>::background_writer(const
std::string*, const std::string*) [with int dim = 3]
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMessage(std::string("Trying to write to file <") + *filename +
" but the file can't be opened!")
Additional Information:
Trying to write to file
</state/partition1/RMOUCHA/output/solution-00000.0016.vtu but the file
can't be opened!
--------------------------------------------------------

Exception type:
  N6dealii18StandardExceptions10ExcMessageE
Aborting!
---------------------------------------------------------
[gpuunit-0-1.local:mpi_rank_16][error_sighandler] Caught error:
Aborted (signal 6)
***** WARNING: could not move /tmp/aspect.tmp.Slsy2S to
/state/partition1/RMOUCHA/output/solution-00000.0017.vtu. Trying again
to write directly to
/state/partition1/RMOUCHA/output/solution-00000.0017.vtu. (On processor 17.)


---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------
Exception message:

--------------------------------------------------------
An error occurred in line <533> of file
</home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
in function
    static void
aspect::Postprocess::Visualization<dim>::background_writer(const
std::string*, const std::string*) [with int dim = 3]
The violated condition was:
    false
The name and call sequence of the exception was:
    ExcMessage(std::string("Trying to write to file <") + *filename +
" but the file can't be opened!")
Additional Information:
Trying to write to file
</state/partition1/RMOUCHA/output/solution-00000.0017.vtu but the file
can't be opened!
--------------------------------------------------------

Exception type:
  N6dealii18StandardExceptions10ExcMessageE
Aborting!
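
Judging only from the messages above, each process appears to write its piece of the output to a temporary file in /tmp, then move it into the output directory, and fall back to writing the file directly when the move fails. The following is a hypothetical C++17 sketch of that pattern, not ASPECT's actual background_writer. The function name write_via_temp_file and the copy-then-remove step (roughly what `mv` does across file systems) are assumptions. It illustrates why the move and the direct write would fail in the same way if the output directory is not visible to the process doing the writing.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not ASPECT's code): write `contents` to `tmp_path`,
// move it to `final_path`, and if the move fails, try to write `final_path`
// directly. Returns false if every attempt fails; at that point the log
// above shows ASPECT throwing ExcMessage instead.
bool write_via_temp_file(const fs::path &tmp_path,
                         const fs::path &final_path,
                         const std::string &contents)
{
  // Step 1: write the data to the temporary location (e.g. /tmp).
  bool tmp_ok = false;
  {
    std::ofstream tmp(tmp_path, std::ios::binary);
    tmp << contents;
    tmp.close();
    tmp_ok = !tmp.fail();
  }

  // Step 2: move the temporary file into place. rename() cannot cross
  // file systems, so fall back to copy + remove, roughly what `mv` does.
  std::error_code ec;
  if (tmp_ok)
    {
      fs::rename(tmp_path, final_path, ec);
      if (!ec)
        return true;  // moved into place; nothing left to clean up
      if (fs::copy_file(tmp_path, final_path,
                        fs::copy_options::overwrite_existing, ec))
        {
          fs::remove(tmp_path, ec);
          return true;
        }
    }
  fs::remove(tmp_path, ec);  // discard the temporary file, if any

  // Step 3: the move failed; try writing the final file directly. If the
  // output directory does not exist for this process, this fails too.
  std::ofstream out(final_path, std::ios::binary);
  if (!out)
    return false;
  out << contents;
  out.close();
  return !out.fail();
}
```

With an existing output directory, the sketch succeeds and leaves no temporary file behind. With a missing directory, all three steps fail, which matches the sequence "mv: No such file or directory", then "Trying again to write directly", then "the file can't be opened" in the log.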



-- 
------------------------------------------------------------
Robert Moucha
Assistant Professor of Geophysics
Department of Earth Sciences
204 Heroy Geology Lab
Syracuse University
Syracuse, NY, 13244-1070

