
== Errors running !PyLith ==
Errors when running !PyLith.

=== Spatialdata ===

* Error:
{{{
RuntimeError: Error occurred while reading spatial database file 'FILENAME'.
I/O error while reading SimpleDB data.
}}}
Make sure the ''num-locs'' value in the header matches the number of lines of data and that the last line of data includes an end-of-line character.
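For reference, here is a minimal sketch of a !SimpleDB file in which ''num-locs'' matches the number of data lines (the values and coordinate system are illustrative; consult the spatialdata documentation for the exact header fields your version expects):
{{{
#SPATIAL.ascii 1
SimpleDB {
  num-values = 1
  value-names = density
  value-units = kg/m**3
  num-locs = 2
  data-dim = 1
  space-dim = 3
  cs-data = cartesian {
    to-meters = 1.0
    space-dim = 3
  }
}
0.0  0.0     0.0  2500.0
0.0  0.0  1000.0  2700.0
}}}
Here ''num-locs = 2'' matches the two lines of data, and the file ends with a newline after the last data line.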
----

=== Running on a Cluster ===
Issues related to running !PyLith on a cluster or other parallel computer.

==== !OpenMPI and Infiniband ====

* Segmentation faults when using !OpenMPI with Infiniband
{{{
PETSC ERROR: ------------------------------------------------------------------------
PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
PETSC ERROR: to get more information on the crash.
PETSC ERROR: --------------------- Error Message ------------------------------------
PETSC ERROR: Signal received!
PETSC ERROR: ------------------------------------------------------------------------
PETSC ERROR: Petsc Development HG revision: 78eda070d9530a3e6c403cf54d9873c76e711d49 HG Date: Wed Oct 24 00:04:09 2012 -0400
PETSC ERROR: See docs/changes/index.html for recent updates.
PETSC ERROR: See docs/faq.html for hints about trouble shooting.
PETSC ERROR: See docs/index.html for manual pages.
PETSC ERROR: ------------------------------------------------------------------------
PETSC ERROR: /home/brad/pylith-1.8.0/bin/mpinemesis on a arch-linu named des-compute11.des by brad Tue Nov 13 10:44:06 2012
PETSC ERROR: Libraries linked from /home/brad/pylith-1.8.0/lib
PETSC ERROR: Configure run at Wed Nov 7 16:42:26 2012
PETSC ERROR: Configure options --prefix=/home/brad/pylith-1.8.0 --with-c2html=0 --with-x=0 --with-clanguage=C++ --with-mpicompilers=1 --with-debugging=0 --with-shared-libraries=1 --with-sieve=1 --download-boost=1 --download-chaco=1 --download-ml=1 --download-f-blas-lapack=1 --with-hdf5=1 --with-hdf5-include=/home/brad/pylith-1.8.0/include --with-hdf5-lib=/home/brad/pylith-1.8.0/lib/libhdf5.dylib --LIBS=-lz CPPFLAGS="-I/home/brad/pylith-1.8.0/include " LDFLAGS="-L/home/brad/pylith-1.8.0/lib " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="-g -O2" PETSC_DIR=/home/brad/build/pylith_installer/petsc-dev
PETSC ERROR: ------------------------------------------------------------------------
PETSC ERROR: User provided function() line 0 in unknown directory unknown file
}}}
This appears to be associated with how !OpenMPI handles calls to fork() when !PyLith starts up. Set your environment to turn off Infiniband support for fork so that a normal fork call is made (these variables can also be set on the command line like other !OpenMPI parameters):
{{{
export OMPI_MCA_mpi_warn_on_fork=0
export OMPI_MCA_btl_openib_want_fork_support=0
}}}
* Turn on processor and memory affinity by using the ''--bind-to-core'' command line argument for mpirun.

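Equivalently, these MCA parameters can be passed directly on the mpirun command line; a sketch (the process count and executable here are placeholders):
{{{
mpirun --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 -np 16 myprogram
}}}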
==== Submitting to batch systems ====

===== PBS/Torque =====
* pylithapp.cfg:
{{{
[pylithapp]
scheduler = pbs

[pylithapp.pbs]
shell = /bin/bash
qsub-options = -V -m bea -M johndoe@university.edu

[pylithapp.launcher]
command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}
}}}
Command line arguments:
{{{
--nodes=NPROCS --scheduler.ppn=N --job.name=NAME --job.stdout=LOG_FILE

# NPROCS = total number of processes
# N = number of processes per compute node
# NAME = name of job in queue
# LOG_FILE = name of file where stdout will be written
}}}
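Putting these together, a hypothetical submission might look like the following (the simulation parameter file, job name, and log file name are placeholders):
{{{
pylith mysim.cfg --nodes=64 --scheduler.ppn=16 --job.name=mysim --job.stdout=mysim.log
}}}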
===== Sun Grid Engine =====
* pylithapp.cfg:
{{{
[pylithapp]
scheduler = sge

[pylithapp.sge]
shell = /bin/bash
pe-name = orte
qsub-options = -V -m bea -M johndoe@university.edu -j y

[pylithapp.launcher]
command = mpirun -np ${nodes}
# Use the options below if not using the OpenMPI ORTE Parallel Environment
#command = mpirun -np ${nodes} -machinefile ${PE_HOSTFILE} -n ${NSLOTS}
}}}
Command line arguments:
{{{
--nodes=NPROCS --job.name=NAME --job.stdout=LOG_FILE

# NPROCS = total number of processes
# NAME = name of job in queue
# LOG_FILE = name of file where stdout will be written
}}}
==== HDF5 and parallel I/O ====

The !PyLith HDF5 data writers (!DataWriterHDF5Mesh, etc.) use HDF5 parallel I/O to write files in parallel. As noted in the !PyLith manual, this is not nearly as robust as the HDF5Ext data writers (!DataWriterHDF5ExtMesh, etc.), which write raw binary files using MPI I/O accompanied by an HDF5 metadata file. If jobs running on multiple compute nodes mysteriously hang, with or without HDF5 error messages, switching from the !DataWriterHDF5 data writers to the !DataWriterHDF5Ext data writers may fix the problem (if HDF5 parallel I/O is the source of the problem). This produces one raw binary file per HDF5 dataset, so there will be many more files that must be kept together.
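A sketch of what the switch might look like in pylithapp.cfg (the output component path and filename below are illustrative; check the !PyLith manual for the output components defined for your problem):
{{{
[pylithapp.problem.formulation.output.output]
writer = pylith.meshio.DataWriterHDF5ExtMesh
writer.filename = output/mysim.h5
}}}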