

Errors running PyLith

Common errors encountered when running PyLith, and how to resolve them.

Spatialdata

  • Error:
  RuntimeError: Error occurred while reading spatial database file 'FILENAME'.
  I/O error while reading SimpleDB data.

Make sure the num-locs value in the header matches the number of lines of data, and that the last line of data ends with an end-of-line character.
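Both conditions can be checked before running PyLith. The following is a minimal Python sketch, not part of PyLith or spatialdata. It assumes the usual ASCII SimpleDB layout: a header enclosed in braces (possibly with a nested block such as cs-data), `//` comments, and one location per data line. The function name `check_simpledb` and the sample file contents are hypothetical.

```python
import re


def check_simpledb(text):
    """Return (declared_num_locs, data_line_count, ends_with_newline)."""
    match = re.search(r"num-locs\s*=\s*(\d+)", text)
    if match is None:
        raise ValueError("no num-locs entry found in the header")
    declared = int(match.group(1))

    depth = 0            # current brace nesting level
    opened = False       # True once the first "{" has been seen
    header_done = False  # True once the header braces are balanced again
    data_lines = 0
    for line in text.splitlines():
        # Assumption: "//" starts a comment in the database file.
        stripped = line.split("//")[0].strip()
        if not stripped:
            continue
        if not header_done:
            depth += stripped.count("{") - stripped.count("}")
            opened = opened or "{" in stripped
            header_done = opened and depth == 0
            continue
        data_lines += 1

    return declared, data_lines, text.endswith("\n")


# Hypothetical file contents: the header declares 3 locations,
# but only 2 lines of data follow.
SAMPLE = """#SPATIAL.ascii 1
SimpleDB {
  num-values = 2
  value-names = density vs
  value-units = kg/m**3 m/s
  num-locs = 3
  data-dim = 1
  space-dim = 3
  cs-data = cartesian {
    to-meters = 1.0
    space-dim = 3
  }
}
0.0 0.0  0.0  2500.0 3000.0
0.0 0.0 -1.0  2500.0 3000.0
"""

declared, found, newline_ok = check_simpledb(SAMPLE)
if declared != found or not newline_ok:
    print(f"num-locs = {declared}, data lines = {found}, final newline = {newline_ok}")
```

To check a real file, pass its contents instead of `SAMPLE`, for example `check_simpledb(open("FILENAME").read())`.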


Running on a Cluster

Issues related to running PyLith on a cluster or other parallel computer.

OpenMPI and Infiniband

  • Segmentation faults when using OpenMPI with Infiniband
  PETSC ERROR: ------------------------------------------------------------------------
  PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
  PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
  PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
  PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
  PETSC ERROR: to get more information on the crash.
  PETSC ERROR: --------------------- Error Message ------------------------------------
  PETSC ERROR: Signal received!
  PETSC ERROR: ------------------------------------------------------------------------
  PETSC ERROR: Petsc Development HG revision: 78eda070d9530a3e6c403cf54d9873c76e711d49  HG Date: Wed Oct 24 00:04:09 2012 -0400
  PETSC ERROR: See docs/changes/index.html for recent updates.
  PETSC ERROR: See docs/faq.html for hints about trouble shooting.
  PETSC ERROR: See docs/index.html for manual pages.
  PETSC ERROR: ------------------------------------------------------------------------
  PETSC ERROR: /home/brad/pylith-1.8.0/bin/mpinemesis on a arch-linu named des-compute11.des by brad Tue Nov 13 10:44:06 2012
  PETSC ERROR: Libraries linked from /home/brad/pylith-1.8.0/lib
  PETSC ERROR: Configure run at Wed Nov  7 16:42:26 2012
  PETSC ERROR: Configure options --prefix=/home/brad/pylith-1.8.0 --with-c2html=0 --with-x=0 --with-clanguage=C++ --with-mpicompilers=1 --with-debugging=0 --with-shared-libraries=1 --with-sieve=1 --download-boost=1 --download-chaco=1 --download-ml=1 --download-f-blas-lapack=1 --with-hdf5=1 --with-hdf5-include=/home/brad/pylith-1.8.0/include --with-hdf5-lib=/home/brad/pylith-1.8.0/lib/libhdf5.dylib --LIBS=-lz CPPFLAGS="-I/home/brad/pylith-1.8.0/include " LDFLAGS="-L/home/brad/pylith-1.8.0/lib " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="-g -O2" PETSC_DIR=/home/brad/build/pylith_installer/petsc-dev
  PETSC ERROR: ------------------------------------------------------------------------
  PETSC ERROR: User provided function() line 0 in unknown directory unknown file

This error appears to be associated with how OpenMPI handles calls to fork() when PyLith starts up. Turn off Infiniband support for fork, so that a normal fork call is made, by setting the following environment variables (these can also be set on the command line, like other OpenMPI parameters):

  export OMPI_MCA_mpi_warn_on_fork=0
  export OMPI_MCA_btl_openib_want_fork_support=0
  • Turn on processor and memory affinity by using the --bind-to-core command line argument for mpirun.
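OpenMPI reads an MCA parameter either from an environment variable named OMPI_MCA_<name> or from an `--mca <name> <value>` pair on the mpirun command line. As a rough illustration, the Python sketch below converts the two environment settings above into the equivalent command-line flags. The helper name `mca_flags` is hypothetical; it is not part of PyLith or OpenMPI.

```python
def mca_flags(env_settings):
    """Convert OMPI_MCA_* environment settings to mpirun --mca flags."""
    prefix = "OMPI_MCA_"
    flags = []
    for name, value in env_settings.items():
        if not name.startswith(prefix):
            raise ValueError(f"{name} is not an OMPI_MCA_ variable")
        # Strip the prefix to recover the MCA parameter name.
        flags += ["--mca", name[len(prefix):], str(value)]
    return flags


settings = {
    "OMPI_MCA_mpi_warn_on_fork": 0,
    "OMPI_MCA_btl_openib_want_fork_support": 0,
}
print(" ".join(mca_flags(settings)))
```

The resulting flags would be added to the mpirun command, for example in the launcher command setting of pylithapp.cfg.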

Submitting to batch systems

PBS/Torque
  • pylithapp.cfg:
  [pylithapp]
  scheduler = pbs
  
  [pylithapp.pbs]
  shell = /bin/bash
  qsub-options = -V -m bea -M johndoe@university.edu
  
  [pylithapp.launcher]
  command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}

Command line arguments:

  --nodes=NPROCS --scheduler.ppn=N --job.name=NAME --job.stdout=LOG_FILE
  
  # NPROCS = total number of processes
  # N = number of processes per compute node
  # NAME = name of job in queue
  # LOG_FILE = name of file where stdout will be written
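
When submitting many jobs, the arguments above can be assembled in a script. The following Python sketch builds the argument list from the four values. The function name `pylith_batch_args` and the example values are hypothetical; only the option names come from this page. The ppn argument is optional so the same helper can be used for schedulers that do not take it.

```python
def pylith_batch_args(nprocs, job_name, log_file, ppn=None):
    """Build the batch-related command line arguments listed above."""
    args = [f"--nodes={nprocs}"]
    if ppn is not None:
        # Processes per compute node (used with PBS/Torque).
        args.append(f"--scheduler.ppn={ppn}")
    args.append(f"--job.name={job_name}")
    args.append(f"--job.stdout={log_file}")
    return args


# Hypothetical job: 16 processes, 8 per compute node.
print(" ".join(pylith_batch_args(16, "myjob", "myjob.log", ppn=8)))
```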

Sun Grid Engine
  • pylithapp.cfg:
  [pylithapp]
  scheduler = sge
  
  [pylithapp.sge]
  shell = /bin/bash
  pe-name = orte
  qsub-options = -V -m bea -M johndoe@university.edu -j y
  
  [pylithapp.launcher]
  command = mpirun -np ${nodes}
  # Use the options below if not using the OpenMPI ORTE Parallel Environment
  #command = mpirun -np ${nodes} -machinefile ${PE_HOSTFILE} -n ${NSLOTS}

Command line arguments:

  --nodes=NPROCS --job.name=NAME --job.stdout=LOG_FILE

  # NPROCS = total number of processes
  # NAME = name of job in queue
  # LOG_FILE = name of file where stdout will be written

HDF5 and parallel I/O

The PyLith HDF5 data writers (DataWriterHDF5Mesh, etc.) use HDF5 parallel I/O to write files in parallel. As noted in the PyLith manual, this is not nearly as robust as the HDF5Ext data writers (DataWriterHDF5ExtMesh, etc.), which write raw binary files using MPI I/O along with an accompanying HDF5 metadata file. If jobs running on multiple compute nodes mysteriously hang, with or without HDF5 error messages, switching from the DataWriterHDF5 data writers to the DataWriterHDF5Ext data writers may fix the problem (if HDF5 parallel I/O is the source of the problem). The HDF5Ext writers produce one raw binary file per HDF5 dataset, so there are many more files, all of which must be kept together.
