[CIG-MC] Installing ASPECT on Cray XC30

Marine Lasbleis marine.lasbleis at gmail.com
Mon Jul 10 17:00:40 PDT 2017


Hi all, 

This is my first message here; I hope it’s OK. 
I’ve started working with ASPECT and have already installed it on a desktop computer (Debian with 8 cores), but I would like to install it on the available clusters. (I have access to 3 different clusters. I’m not sure which one is best for this, and there is definitely no real admin for them. They are “self-organised”, which is not always for the best.)

I’m trying to install ASPECT on the ELSI cluster, which is a Cray XC30. While running into problems, I found that you may have done the same a couple of weeks ago (I saw this conversation: http://dealii.narkive.com/jCU1oGdB/deal-ii-get-errors-when-installing-dealii-on-opensuse-leap-42-1-by-using-candi )

What we’ve done so far (before seeing the candi installation): 
- switched to PrgEnv-gnu 
- tried to install p4est. But it seems that we need to use “ftn” and not fortran or others, so configure can’t do anything and stops very early. I tried to modify the configure file by hand (adding ftn wherever I could find the system looking for fortran or mpif77). But I guess that’s definitely not a good idea, and I am obviously still missing a couple of calls, because I still get the same error. 
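
For reference, a less invasive alternative to editing configure by hand might be to pass the Cray compiler wrappers through the standard autoconf variables. This is only a sketch: cc, CC and ftn are the usual Cray wrapper names, but whether p4est's configure then gets past the Fortran check on this machine is an assumption, and the module command in the comment assumes PrgEnv-cray is the default environment.

```shell
# On the cluster, first select the GNU environment, e.g.:
#   module swap PrgEnv-cray PrgEnv-gnu   # assumes PrgEnv-cray is loaded by default

# Cray compiler wrappers: cc (C), CC (C++), ftn (Fortran).
# autoconf-generated configure scripts read these variables.
export CC=cc
export CXX=CC
export FC=ftn
export F77=ftn

echo "C: $CC  C++: $CXX  Fortran: $FC / $F77"

# Then, from the p4est source directory (not run here):
#   ./configure --enable-mpi
```

The exports only affect the current shell session, so they need to be set in the same shell that later runs configure.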

From that conversation, I gathered that https://github.com/dealii/candi can actually install everything for me. 
Since I’m using a slightly different cluster (Cray XC30), I will try to give you updates on my progress. 
I’m not familiar with candi, but I decided to give it a try, so please excuse me if I am making obvious mistakes. 

I changed the configuration as requested, loaded the required modules, and defined new variables with the compiler information. 
On this particular cluster, we need to be careful about the install path (the default one is on a drive that is very slow to access, and compilation takes forever), so I had to use the -p path option. Also, I think I first used too many cores to compile and got a memory error (an internal compiler error, which seems to be related to available memory).
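
As a sketch of what such a run could look like: the prefix below is a hypothetical path standing in for a directory on a fast file system, and the job count of 2 is just an example of a conservative value. The -p option is the one mentioned above; -j is, as far as I know, candi's option for the number of parallel build jobs, so please check ./candi.sh --help before relying on it.

```shell
# Hypothetical install prefix on a fast file system; replace with a real path.
PREFIX="$HOME/fast_disk/dealii-candi"

# A small number of parallel jobs keeps the compiler's memory use down.
JOBS=2

if [ -x ./candi.sh ]; then
    # Run from inside the candi checkout.
    ./candi.sh -p "$PREFIX" -j "$JOBS"
else
    # Outside a candi checkout, just show the command that would be run.
    echo "would run: ./candi.sh -p $PREFIX -j $JOBS"
fi
```

If the internal compiler error comes back, lowering JOBS further (down to 1) is the simplest thing to try.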

So, after a day of trying to install: 
- The candi.sh script finished, and apparently everything installed correctly. 
- I built ASPECT. (On this particular cluster, be careful with cmake: the default cmake is out of date, and even after installing with candi.sh, the cmake found by default is not the one that candi installed.)
I got a couple of warnings, mostly about PETSc, which I took to be only warnings and not real problems.
Most of them were along the lines of this one, for either PETSc or Trilinos: 
warning: 'dealii::PETScWrappers::MPI::Vector::supports_distributed_data' is deprecated [-Wdeprecated-declarations]

- I’ve run a couple of examples from the cookbook. None of them work. 

I got this from running ASPECT using aprun -n4 ../aspect burnman.prm
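
Regarding the cmake caveat in the list above, a quick way to see which cmake is actually picked up, and to put another one first, might look like the sketch below. The directory in CANDI_CMAKE_BIN is a hypothetical placeholder; the real location depends on the prefix given to candi and on how it names the cmake directory.

```shell
# Hypothetical location of the cmake that candi built; adjust to your install.
CANDI_CMAKE_BIN="$HOME/fast_disk/dealii-candi/cmake/bin"

# Put that directory first on PATH only if it actually exists.
if [ -d "$CANDI_CMAKE_BIN" ]; then
    PATH="$CANDI_CMAKE_BIN:$PATH"
    export PATH
fi

# Report which cmake is now found first, and its version.
if command -v cmake >/dev/null 2>&1; then
    echo "using: $(command -v cmake)"
    cmake --version | head -n 1
else
    echo "no cmake found on PATH"
fi
```

Running the check both before and after loading modules shows whether a module is silently replacing the newer cmake.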
-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 1.5.0
--     . running in DEBUG mode
--     . running with 4 MPI processes
--     . using Trilinos
-----------------------------------------------------------------------------

[0]PETSC ERROR: [1]PETSC ERROR: [3]PETSC ERROR: [2]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ------------------------------------------------------------------------
------------------------------------------------------------------------
[2]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: [3]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
[1]PETSC ERROR: [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: [3]PETSC ERROR: to get more information on the crash.
configure using --with-debugging=yes, recompile, link, and run
[3]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
Caught signal number 8 FPE: Floating Point Exception,probably divide by zero





Any idea where this could come from? 
(Are there any additional files I should show you?) 


Thanks! (And many thanks to the person who wrote the candi.sh script for Cray XC40 :-) )
Marine



