[CIG-SHORT] Pylith Segmentation Fault when running a dynamic rupture simulation

Brad Aagaard baagaard at usgs.gov
Thu Jun 28 10:43:43 PDT 2018


Ge Li,

My guess is that Charles is correct and that you are running out of 
memory. You can use the global uniform refinement option to have PyLith 
refine the mesh by a factor of 2 or 4 after it is distributed. This means 
you can generate your mesh at a coarser resolution, which reduces memory 
use when the mesh is loaded, while still running a high-resolution 
simulation.

[pylithapp.mesh_generator]
# Refine mesh by a factor of 2
refiner = pylith.topology.RefineUniform
refiner.levels = 1

Use refiner.levels = 2 to refine by a factor of 4.
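
As a rough sizing guide, the short Python sketch below estimates how many 
cells the coarse mesh needs so that uniform refinement lands near the 
resolution of the original mesh (~78M cells, as quoted below). It assumes 
that each refinement level halves the vertex spacing, so the cell count 
grows by about 2**3 = 8 per level in 3D. The function name 
coarse_cell_count is made up for illustration and is not part of PyLith.

```python
def coarse_cell_count(target_cells: int, levels: int, dim: int = 3) -> int:
    """Approximate cell count needed in the coarse mesh so that `levels`
    rounds of uniform refinement give about `target_cells` cells.

    Assumption: every level halves the vertex spacing, so the number of
    cells is multiplied by 2**dim per level (8 in 3D, 4 in 2D).
    """
    if levels < 0:
        raise ValueError("levels must be non-negative")
    growth = (2 ** dim) ** levels
    # Ceiling division so the refined mesh is not coarser than the target.
    return -(-target_cells // growth)


if __name__ == "__main__":
    target = 78_000_000  # approximate cell count of the original mesh
    for levels in (0, 1, 2):
        cells = coarse_cell_count(target, levels)
        print(f"refiner.levels = {levels}: generate about {cells:,} cells")
```

Under that assumption, refiner.levels = 1 would call for a coarse mesh of 
roughly 10M cells and refiner.levels = 2 for roughly 1.2M cells. Treat 
these as order-of-magnitude estimates only.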

Regards,
Brad


On 06/27/2018 04:57 PM, Charles Williams wrote:
> Dear Ge,
> 
> How much memory per node does the cluster have?  Your mesh is fairly 
> large (~13M vertices, ~78M cells).  Also, the fact that there was a SEGV 
> while distributing the mesh makes it seem likely that you have run out 
> of memory, since this is where PyLith uses the most memory.
> 
> There was some memory logging facility within PyLith, but I’m not sure 
> if it still works.  The alternative would be to log in to the first node 
> on which PyLith is running, and run ‘top’ as your code is running.  Keep 
> an eye on the memory usage and determine when it maxes out.  If it 
> approaches the amount of RAM for that node, I would suspect that is the 
> problem.
> 
> Cheers,
> Charles
> 
> p.s.  The other question is whether you are sharing nodes with any other 
> jobs, which could also limit your available memory.
> 
> 
>> On 28/06/2018, at 10:58 AM, Ge Li <ge.li2 at mail.mcgill.ca> wrote:
>>
>> Hi there,
>> I’m using PyLith to simulate a dynamic rupture on a cluster.
>> However, the program aborted with a segmentation fault while
>> distributing the mesh:
>> [0]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range.
>> As suggested in the manual, I reran PyLith in debugger mode and, 
>> separately, with valgrind turned on, which generated some log files.
>> These files are attached here.
>> I also tried:
>>
>> export OMPI_MCA_mpi_warn_on_fork=0
>>
>> export OMPI_MCA_btl_openib_want_fork_support=0
>>
>> But it didn’t work.
>> The error went away when I did either of the following:
>>
>>  1. Ran with a coarser mesh;
>>  2. Ran with a single processor.
>>
>> Can you help me identify any potential issues that may be causing this 
>> problem?
>> Many thanks!
>> -- 
>> Ge Li
>> Ph.D. Candidate
>> Department of Earth & Planetary Science,
>> McGill University
>> 3450 University Street
>> Montreal, QC, Canada
>> H3A 0E8
>> ge.li2 at mail.mcgill.ca
>> <valgrind-log><debugger_error.zip>
>> _______________________________________________
>> CIG-SHORT mailing list
>> CIG-SHORT at geodynamics.org <mailto:CIG-SHORT at geodynamics.org>
>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/cig-short
> 
> Charles Williams I Geodynamic Modeler
> GNS Science I Te Pῡ Ao
> 1 Fairway Drive, Avalon 5010, PO Box 30368, Lower Hutt 5040, New Zealand
> Ph 0064-4-570-4566 I Mob 0064-22-350-7326 I Fax 0064-4-570-4600
> http://www.gns.cri.nz/ I Email: C.Williams at gns.cri.nz
> 


