[CIG-MC] Installing ASPECT on Cray XC30

Matthew Knepley knepley at mcs.anl.gov
Mon Jul 10 18:45:39 PDT 2017


On Mon, Jul 10, 2017 at 5:00 PM, Marine Lasbleis <marine.lasbleis at gmail.com>
wrote:

> Hi all,
>
> This is my first message here, I hope it’s OK.
> I’ve started to work on ASPECT and have already installed it on a desktop
> computer (Debian, 8 cores), but I would like to install it on the
> available clusters. (I have access to 3 different clusters and am not
> sure which one is best suited for this… There is also no real admin for
> the clusters; they are “self-organised”, which is not always for the
> best.)
>
> I’m trying to install ASPECT on the ELSI cluster, which is a Cray XC30,
> and while running into problems I found that you may have done the same
> a couple of weeks ago (I saw this conversation:
> http://dealii.narkive.com/jCU1oGdB/deal-ii-get-errors-when-installing-dealii-on-opensuse-leap-42-1-by-using-candi )
>
> For now, what we’ve done (before seeing the candi installation):
> - switch to PrgEnv-gnu
> - try to install p4est. It seems that we need to use “ftn” and not
> gfortran or the other usual compiler names, so configure can’t do
> anything and stops very early. I tried to modify the configure file by
> hand (adding ftn wherever I could find the script looking for fortran or
> mpif77), but I guess that is definitely not a good idea, and I am
> obviously still missing a couple of calls because I still get the same
> error.
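>
> (For reference, the usual way to do this without editing configure by
> hand is to pass the Cray compiler wrappers to p4est’s autoconf script; a
> rough sketch, where the version number and install prefix are only
> placeholders:
>
>   cd p4est-<version>
>   ./configure --enable-mpi \
>       CC=cc CXX=CC FC=ftn F77=ftn \
>       --prefix=$HOME/sw/p4est
>   make -j4 && make install
>
> This is the sort of thing candi is supposed to handle by itself, which is
> why I switched to it below.)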
>
> So, from that conversation, I gathered that https://github.com/dealii/candi
> can actually install everything for me.
> Since I’m using a slightly different cluster (Cray XC30), I will try to
> give you updates on my progress.
> I’m not familiar with candi, but I decided to give it a try, so please
> excuse me if I am making obvious mistakes.
>
> I changed the configuration as requested, loaded the required modules,
> and defined new variables with the compiler information (see the sketch
> below). On this particular cluster we need to be careful about the
> install path (the default one is on a drive that is very slow to access,
> and compilation takes forever), so I had to use the -p path option. I
> also think I first used too many cores to compile and got a memory error
> (an internal compiler error was raised, which seems to be related to the
> available memory).
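>
> (Concretely, the sketch of what I ran looks something like the
> following; the module names, scratch path, and job count are specific to
> this cluster and only meant as an illustration:
>
>   module swap PrgEnv-cray PrgEnv-gnu      # GNU programming environment
>   export CC=cc CXX=CC FC=ftn F77=ftn      # Cray compiler wrappers
>   # install to a fast scratch filesystem and limit parallel build jobs
>   ./candi.sh -p /scratch/$USER/dealii-candi -j 4
> )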
>
> So, from my day spent trying to install:
> - I finished the candi.sh script; apparently everything installed
> correctly.
> - I built ASPECT. On this particular cluster, be careful with cmake: by
> default the available cmake is not up to date, and in particular, even
> after installation with candi.sh, the cmake on the PATH is not the one
> that candi installed (see the sketch after this list).
> I got a couple of warnings, mostly about PETSc, which I thought were
> only warnings and not real problems. Most of them were along the lines
> of this one, for either PETSc or Trilinos:
> warning: 'dealii::PETScWrappers::MPI::Vector::supports_distributed_data'
> is deprecated [-Wdeprecated-declarations]
>
> - I’ve run a couple of examples from the cookbook. None of them work.
>
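> (For completeness, the build step referred to above looks roughly like
> this; the paths and version directories are placeholders and have to be
> adapted to whatever candi actually installed:
>
>   # make sure the cmake built by candi is found before the system one
>   export PATH=/scratch/$USER/dealii-candi/cmake-<version>/bin:$PATH
>
>   # configure and build ASPECT against the candi-installed deal.II
>   mkdir build && cd build
>   cmake -DDEAL_II_DIR=/scratch/$USER/dealii-candi/deal.II-<version> ..
>   make -j4
> )
>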
> I got this from running ASPECT using aprun -n4 ../aspect burnman.prm
> -----------------------------------------------------------------------------
> -- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
> --     . version 1.5.0
> --     . running in DEBUG mode
> --     . running with 4 MPI processes
> --     . using Trilinos
> -----------------------------------------------------------------------------
>
> [0]PETSC ERROR: [1]PETSC ERROR: [3]PETSC ERROR: [2]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> [2]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: [3]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
> [1]PETSC ERROR: [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: [3]PETSC ERROR: to get more information on the crash.
> configure using --with-debugging=yes, recompile, link, and run
> [3]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>
> Any idea where this could come from?
>

This does not actually appear to be a PETSc error. It appears that ASPECT
calls PetscInitialize even when 'using Trilinos'. This installs a signal
handler (unless you unload it), which caught the FPE signal generated
somewhere in the ASPECT code.

I suggest you run this under valgrind. I also suggest not debugging in
parallel before the serial case works.
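
For instance (a sketch only; the input file is just whichever cookbook you
are testing, and on a Cray even a one-process run may have to go through
aprun):

  # run a single rank under valgrind to locate the FPE / memory problem
  aprun -n 1 valgrind --track-origins=yes ./aspect burnman.prm

  # alternatively, tell PETSc not to install its signal handler, so the
  # floating point exception is not intercepted and you get a plain crash
  # you can inspect with a debugger or core dump
  export PETSC_OPTIONS=-no_signal_handler
  aprun -n 1 ./aspect burnman.prm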

  Thanks,

     Matt


> (Are there any additional files I should show you?)
>
>
> Thanks! (and many thanks to the person who wrote the candi.sh script for
> the Cray XC40 :-) )
> Marine
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener