[aspect-devel] exception faults when using more than one node

Timo Heister heister at clemson.edu
Thu Jul 2 02:10:12 PDT 2015


For some reason your nodes cannot write to
/state/partition1/RMOUCHA/output/. Are you sure this path exists and is
writable on every node?
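
One way to check this is the minimal sketch below, which reports whether the
output directory exists and is writable on the machine where it runs. The
function name check_output_dir is hypothetical and not part of ASPECT; the
path is copied from the error messages quoted below. The thread does not say
whether /state/partition1 is a shared or a node-local disk, so the script
assumes nothing about that and only reports what the local host sees.

```python
import os
import socket
import tempfile


def check_output_dir(path):
    """Return (hostname, exists, writable) for `path` on the local machine."""
    host = socket.gethostname()
    exists = os.path.isdir(path)
    writable = False
    if exists:
        try:
            # Create and delete a real file: permission bits alone can
            # mislead on network or read-only mounts.
            with tempfile.NamedTemporaryFile(dir=path):
                writable = True
        except OSError:
            writable = False
    return host, exists, writable


if __name__ == "__main__":
    # Path taken from the error messages in this thread.
    host, exists, writable = check_output_dir("/state/partition1/RMOUCHA/output/")
    print(f"{host}: exists={exists} writable={writable}")
```

Running it once on each node of the job (for example through the batch
system or the MPI launcher) shows which hosts can see the directory. If some
hosts report exists=False, the directory would need to be created there, or
the output directory moved to a file system that all nodes share.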

On Wed, Jul 1, 2015 at 2:36 PM, Robert Moucha <rmoucha at gmail.com> wrote:
> Hi All,
>
> This is the first time I have compiled ASPECT on a cluster. I can run
> ASPECT in parallel on a single node, but when I use more than one node
> with mvapich-2.0.1 and ch3:mrail, I get an abort.
>
> For example, when I submit a job on 2 nodes with 12 cores each, I get
> the following (sorry for the length). Any ideas? Thanks.
>
> mv: cannot move `/tmp/aspect.tmp.FwNelP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0012.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.8qNelP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0013.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.flf7kP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0014.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.zJ9bkP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0015.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.PoYv2S' to
> `/state/partition1/RMOUCHA/output/solution-00000.0016.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.Slsy2S' to
> `/state/partition1/RMOUCHA/output/solution-00000.0017.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.HvAtkP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0018.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.jhyC2S' to
> `/state/partition1/RMOUCHA/output/solution-00000.0019.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.8qkGkP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0020.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.yaFFjP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0021.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.TlsAjP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0022.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.fe1QjP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0023.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.8qNelP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0013.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.flf7kP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0014.vtu': No such
> file or directory
> mv: cannot move `/tmp/aspect.tmp.FwNelP' to
> `/state/partition1/RMOUCHA/output/solution-00000.0012.vtu': No such
> file or directory
> ***** WARNING: could not move /tmp/aspect.tmp.FwNelP to
> /state/partition1/RMOUCHA/output/solution-00000.0012.vtu. Trying again
> to write directly to
> /state/partition1/RMOUCHA/output/solution-00000.0012.vtu. (On processor 12.)
>
> ---------------------------------------------------------
> In one of the sub-threads of this program, an exception
> was thrown and not caught. Since exceptions do not
> propagate to the main thread, the library has caught it.
> The information carried by this exception is given below.
>
> ---------------------------------------------------------
> ***** WARNING: could not move /tmp/aspect.tmp.8qNelP to
> /state/partition1/RMOUCHA/output/solution-00000.0013.vtu. Trying again
> to write directly to
> /state/partition1/RMOUCHA/output/solution-00000.0013.vtu. (On processor 13.)
>
>
> ---------------------------------------------------------
> In one of the sub-threads of this program, an exception
> was thrown and not caught. Since exceptions do not
> propagate to the main thread, the library has caught it.
> The information carried by this exception is given below.
>
> ---------------------------------------------------------
> Exception message:
>
> --------------------------------------------------------
> An error occurred in line <533> of file
> </home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
> in function
>     static void
> aspect::Postprocess::Visualization<dim>::background_writer(const
> std::string*, const std::string*) [with int dim = 3]
> The violated condition was:
>     false
> The name and call sequence of the exception was:
>     ExcMessage(std::string("Trying to write to file <") + *filename +
> " but the file can't be opened!")
> Additional Information:
> Trying to write to file
> </state/partition1/RMOUCHA/output/solution-00000.0013.vtu but the file
> can't be opened!
> --------------------------------------------------------
>
> Exception type:
>   N6dealii18StandardExceptions10ExcMessageE
> Aborting!
> ---------------------------------------------------------
> [gpuunit-0-1.local:mpi_rank_13][error_sighandler] Caught error:
> Aborted (signal 6)
> ***** WARNING: could not move /tmp/aspect.tmp.flf7kP to
> /state/partition1/RMOUCHA/output/solution-00000.0014.vtu. Trying again
> to write directly to
> /state/partition1/RMOUCHA/output/solution-00000.0014.vtu. (On processor 14.)
>
>
> ---------------------------------------------------------
> In one of the sub-threads of this program, an exception
> was thrown and not caught. Since exceptions do not
> propagate to the main thread, the library has caught it.
> The information carried by this exception is given below.
>
> ---------------------------------------------------------
> Exception message:
>
> --------------------------------------------------------
> An error occurred in line <533> of file
> </home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
> in function
>     static void
> aspect::Postprocess::Visualization<dim>::background_writer(const
> std::string*, const std::string*) [with int dim = 3]
> The violated condition was:
>     false
> The name and call sequence of the exception was:
>     ExcMessage(std::string("Trying to write to file <") + *filename +
> " but the file can't be opened!")
> Additional Information:
> Trying to write to file
> </state/partition1/RMOUCHA/output/solution-00000.0014.vtu but the file
> can't be opened!
> --------------------------------------------------------
>
> Exception type:
>   N6dealii18StandardExceptions10ExcMessageE
> Aborting!
> ---------------------------------------------------------
> [gpuunit-0-1.local:mpi_rank_14][error_sighandler] Caught error:
> Aborted (signal 6)
> mv: cannot move `/tmp/aspect.tmp.Slsy2S' to
> `/state/partition1/RMOUCHA/output/solution-00000.0017.vtu': No such
> file or directory
> Exception message:
>
> --------------------------------------------------------
> An error occurred in line <533> of file
> </home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
> in function
>     static void
> aspect::Postprocess::Visualization<dim>::background_writer(const
> std::string*, const std::string*) [with int dim = 3]
> The violated condition was:
>     false
> The name and call sequence of the exception was:
>     ExcMessage(std::string("Trying to write to file <") + *filename +
> " but the file can't be opened!")
> Additional Information:
> Trying to write to file
> </state/partition1/RMOUCHA/output/solution-00000.0012.vtu but the file
> can't be opened!
> --------------------------------------------------------
>
> Exception type:
>   N6dealii18StandardExceptions10ExcMessageE
> Aborting!
> ---------------------------------------------------------
> [gpuunit-0-1.local:mpi_rank_12][error_sighandler] Caught error:
> Aborted (signal 6)
> mv: cannot move `/tmp/aspect.tmp.PoYv2S' to
> `/state/partition1/RMOUCHA/output/solution-00000.0016.vtu': No such
> file or directory
> ***** WARNING: could not move /tmp/aspect.tmp.PoYv2S to
> /state/partition1/RMOUCHA/output/solution-00000.0016.vtu. Trying again
> to write directly to
> /state/partition1/RMOUCHA/output/solution-00000.0016.vtu. (On processor 16.)
>
>
> ---------------------------------------------------------
> In one of the sub-threads of this program, an exception
> was thrown and not caught. Since exceptions do not
> propagate to the main thread, the library has caught it.
> The information carried by this exception is given below.
>
> ---------------------------------------------------------
> Exception message:
>
> --------------------------------------------------------
> An error occurred in line <533> of file
> </home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
> in function
>     static void
> aspect::Postprocess::Visualization<dim>::background_writer(const
> std::string*, const std::string*) [with int dim = 3]
> The violated condition was:
>     false
> The name and call sequence of the exception was:
>     ExcMessage(std::string("Trying to write to file <") + *filename +
> " but the file can't be opened!")
> Additional Information:
> Trying to write to file
> </state/partition1/RMOUCHA/output/solution-00000.0016.vtu but the file
> can't be opened!
> --------------------------------------------------------
>
> Exception type:
>   N6dealii18StandardExceptions10ExcMessageE
> Aborting!
> ---------------------------------------------------------
> [gpuunit-0-1.local:mpi_rank_16][error_sighandler] Caught error:
> Aborted (signal 6)
> ***** WARNING: could not move /tmp/aspect.tmp.Slsy2S to
> /state/partition1/RMOUCHA/output/solution-00000.0017.vtu. Trying again
> to write directly to
> /state/partition1/RMOUCHA/output/solution-00000.0017.vtu. (On processor 17.)
>
>
> ---------------------------------------------------------
> In one of the sub-threads of this program, an exception
> was thrown and not caught. Since exceptions do not
> propagate to the main thread, the library has caught it.
> The information carried by this exception is given below.
>
> ---------------------------------------------------------
> Exception message:
>
> --------------------------------------------------------
> An error occurred in line <533> of file
> </home/rmoucha/install_stuff/aspect-1.3/source/postprocess/visualization.cc>
> in function
>     static void
> aspect::Postprocess::Visualization<dim>::background_writer(const
> std::string*, const std::string*) [with int dim = 3]
> The violated condition was:
>     false
> The name and call sequence of the exception was:
>     ExcMessage(std::string("Trying to write to file <") + *filename +
> " but the file can't be opened!")
> Additional Information:
> Trying to write to file
> </state/partition1/RMOUCHA/output/solution-00000.0017.vtu but the file
> can't be opened!
> --------------------------------------------------------
>
> Exception type:
>   N6dealii18StandardExceptions10ExcMessageE
> Aborting!
>
>
>
> --
> ------------------------------------------------------------
> Robert Moucha
> Assistant Professor of Geophysics
> Department of Earth Sciences
> 204 Heroy Geology Lab
> Syracuse University
> Syracuse, NY, 13244-1070
> _______________________________________________
> Aspect-devel mailing list
> Aspect-devel at geodynamics.org
> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel



-- 
Timo Heister
http://www.math.clemson.edu/~heister/

