[aspect-devel] Aspect hangs after several time steps
Lev Karatun
lev.karatun at gmail.com
Mon Feb 1 16:42:00 PST 2016
Hi again,
I know this is not really an Aspect problem, but I just don't know who else
to ask. I tried searching online and posting the question on stackoverflow,
but didn't get any replies. I'd appreciate it if you could help me with this.
I was trying to use gdb to debug my code, but it's complaining about
missing packages:
> Program received signal SIGFPE, Arithmetic exception.
> __mpn_lshift () at ../sysdeps/x86_64/lshift.S:26
> 26 movq -8(%rsi,%rdx,8), %mm7
> Missing separate debuginfos, use:
> debuginfo-install blas-3.2.1-4.el6.x86_64 libtool-ltdl-2.2.6-15.5.el6.x86_64
Tried to install the missing packages, but got another error:
> # debuginfo-install blas-3.2.1-4.el6.x86_64
> libtool-ltdl-2.2.6-15.5.el6.x86_64
> Loaded plugins: auto-update-debuginfo, fastestmirror, priorities,
> refresh-packagekit
> Loading mirror speeds from cached hostfile
> * base: mirror.gpmidi.net
> * extras: mirror.csclub.uwaterloo.ca
> * rpmforge: mirror.nexcess.net
> * updates: mirror.gpmidi.net
> Could not find debuginfo for main pkg: blas-3.2.1-4.el6.x86_64
> Could not find debuginfo pkg for dependency package blas-3.2.1-4.el6.x86_64
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package gcc-debuginfo-4.4.7-16.el6.x86_64 already installed and latest
> version
> Package gcc-debuginfo-4.4.7-16.el6.x86_64 already installed and latest
> version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Could not find debuginfo for main pkg: libtool-ltdl-2.2.6-15.5.el6.x86_64
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Package glibc-debuginfo-2.12-1.166.el6_7.3.x86_64 already installed and
> latest version
> Could not find debuginfo pkg for dependency package
> libtool-ltdl-2.2.6-15.5.el6.x86_64
> No debuginfo packages available to install
I downloaded the package, but when trying to install it manually, it says
it's already installed:
> # sudo rpm -ivh blas-3.2.1-4.el6.x86_64.rpm
> warning: blas-3.2.1-4.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key
> ID 41a40948: NOKEY
> Preparing... ###########################################
> [100%]
> package blas-3.2.1-4.el6.x86_64 is already installed
> file /usr/lib64/libblas.so.3.2.1 from install of
> blas-3.2.1-4.el6.x86_64 conflicts with file from package
> blas-3.2.1-4.el6.x86_64
I also tried replacing the debuginfo repo with a local one, into which I put
these two files, but that didn't work either:
> # debuginfo-install blas-3.2.1-4.el6.x86_64
> libtool-ltdl-2.2.6-15.5.el6.x86_64
> Loaded plugins: auto-update-debuginfo, fastestmirror, priorities,
> refresh-packagekit
> Repository local is listed more than once in the configuration
> Loading mirror speeds from cached hostfile
> * base: centos.mirror.ca.planethoster.net
> * extras: mirror.csclub.uwaterloo.ca
> * rpmforge: mirror.nexcess.net
> * updates: centos.mirror.iweb.ca
> local
> | 2.9 kB
> 00:00 ...
> Could not find debuginfo for main pkg: blas-3.2.1-4.el6.x86_64
> Could not find debuginfo pkg for dependency package blas-3.2.1-4.el6.x86_64
> Could not find debuginfo for main pkg: libtool-ltdl-2.2.6-15.5.el6.x86_64
> Could not find debuginfo pkg for dependency package
> libtool-ltdl-2.2.6-15.5.el6.x86_64
> No debuginfo packages available to install
Could you please tell me what I'm doing wrong?
Thanks in advance.
Best regards,
Lev Karatun.
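
A note on the earlier question in this thread (how a run can continue after a division by zero): unless floating point exceptions are set to trap, such an operation typically only sets a status flag and execution carries on. The following is a minimal sketch of that behaviour using only the standard <cfenv> calls. The function name is hypothetical, and this is not how ASPECT itself enables or checks floating point exceptions.

```cpp
#include <cfenv>

// Hypothetical helper: returns true if evaluating numerator / denominator
// raised the FE_DIVBYZERO status flag. Nothing traps here; the division
// completes and the program continues.
bool division_raises_divbyzero(const double numerator, const double denominator)
{
  std::feclearexcept(FE_ALL_EXCEPT);

  // volatile keeps the compiler from folding the division away at compile time
  volatile double n = numerator;
  volatile double d = denominator;
  volatile double quotient = n / d;
  (void)quotient; // the result is deliberately unused

  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Calling this helper with (1.0, 0.0) is expected to report the flag on IEEE 754 hardware, while (1.0, 2.0) is not. In both cases the caller keeps running, which is why such errors can go unnoticed when trapping is unavailable.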
2016-02-01 3:16 GMT-05:00 Lev Karatun <lev.karatun at gmail.com>:
> Never mind, I see it's gdb now (sorry for the multiple emails).
>
> Best regards,
> Lev Karatun.
>
> 2016-02-01 3:11 GMT-05:00 Lev Karatun <lev.karatun at gmail.com>:
>
>> Hi Timo,
>>
>> turns out I didn't have the latest version of Aspect, that's why I
>> couldn't see the exceptions. Anyway, I sort of see them now, but only in
>> this form:
>>
>> [titan:05192] *** Process received signal ***
>>> [titan:05192] Signal: Floating point exception (8)
>>> [titan:05192] Signal code: Invalid floating point operation (7)
>>> [titan:05192] Failing at address: 0x3db3041cb0
>>> [titan:05192] [ 0] /lib64/libpthread.so.0[0x3db380f790]
>>> [titan:05192] [ 1] /lib64/libc.so.6[0x3db3041cb0]
>>> [titan:05192] [ 2] /lib64/libc.so.6(__printf_fp+0x97e)[0x3db304a53e]
>>> [titan:05192] [ 3] /lib64/libc.so.6(_IO_vfprintf+0x18d0)[0x3db30458a0]
>>> [titan:05192] [ 4] /lib64/libc.so.6(vsnprintf+0xa2)[0x3db306f752]
>>> [titan:05192] [ 5] /usr/lib64/libstdc++.so.6[0x3dbd87eb4f]
>>> [titan:05192] [ 6]
>>> /usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0xd3)[0x3dbd880f23]
>>> [titan:05192] [ 7]
>>> /usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_RSt8ios_basecd+0x19)[0x3dbd881249]
>>> [titan:05192] [ 8]
>>> /usr/lib64/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x9f)[0x3dbd89487f]
>>> [titan:05192] [ 9]
>>> /home/lev/aspect/dealii_debug_new/lib/libdeal_II.g.so.8.3.0(_ZNK6dealii8Patterns6Double11descriptionEv+0xb6)[0x7fb34e3a9468]
>>> [titan:05192] [10]
>>> /home/lev/aspect/dealii_debug_new/lib/libdeal_II.g.so.8.3.0(_ZN6dealii16ParameterHandler13declare_entryERKSsS2_RKNS_8Patterns11PatternBaseES2_+0x6a1)[0x7fb34e3b0569]
>>> [titan:05192] [11]
>>> ../aspect(_ZN6aspect10ParametersILi3EE18declare_parametersERN6dealii16ParameterHandlerE+0xc4a)[0x103fef2]
>>> [titan:05192] [12]
>>> ../aspect(_ZN6aspect9SimulatorILi3EE18declare_parametersERN6dealii16ParameterHandlerE+0x18)[0x104e2e8]
>>> [titan:05192] [13] ../aspect(main+0x721)[0x12e50a9]
>>> [titan:05192] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3db301ed5d]
>>> [titan:05192] [15] ../aspect[0xeab859]
>>> [titan:05192] *** End of error message ***
>>> Floating point exception (core dumped)
>>
>>
>> which is not too helpful because it gives me neither the source
>> file name nor the line number. Is that determined by the compiler? Is there a
>> way for me to see which line is causing the problem?
>>
>> Thanks in advance.
>>
>> Best regards,
>> Lev Karatun.
>>
>> 2016-01-28 17:40 GMT-05:00 Timo Heister <heister at clemson.edu>:
>>
>>> > Thanks for pointing this out. I reran the setup in Debug mode on 1 core
>>> > and the only error messages I got were related to the names of the
>>> > compositional fields. After 12 hours of running there is not a single
>>> > message in either stdout or stderr. Am I doing something wrong? Do I
>>> > need to manually edit the Makefile to add compilation options like -Wall
>>> > or -O0? How can I see these errors?
>>>
>>> You might have a system where ASPECT_USE_FP_EXCEPTIONS is not
>>> supported (take a look at your detailed.log in your aspect build
>>> directory). This will generate several of the errors I listed. Just
>>> read the errors I posted.
>>>
>>> > Also, my understanding was that a division by zero error prevents the
>>> > code from continuing to run, which happened a lot (even when running in
>>> > release mode) before I added a couple of checks in the plugin, but never
>>> > happened after that. How does Aspect even run if I have such errors?
>>>
>>> You might be doing a division by zero without using the result later on.
>>>
>>> > Also, this expression contains no divisions, so I don't understand how
>>> > this can be a division by zero error...
>>>
>>> Read my email again: the error is that you are reading from
>>> in.strain_rate even if it is a vector of size zero (no strain rate
>>> available). That is undefined behavior.
>>>
>>>
>>> --
>>> Timo Heister
>>> http://www.math.clemson.edu/~heister/
>>> _______________________________________________
>>> Aspect-devel mailing list
>>> Aspect-devel at geodynamics.org
>>> http://lists.geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel
>>
>>
>>
>
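
The point made in the quoted reply about in.strain_rate can be sketched as follows: check whether the vector has any entries before indexing into it. This is only an illustration under stated assumptions. The Inputs struct, the function name, and the formula are hypothetical stand-ins, and the element type is simplified to double. Only the member name strain_rate is taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the inputs handed to a material model plugin.
// Only the member name strain_rate comes from the thread; the element type
// is simplified to double for illustration.
struct Inputs
{
  std::vector<double> strain_rate; // may legitimately have size zero
};

// Illustrative rule only: returns reference_value when no strain rate is
// available, otherwise scales it by the strain rate at point i.
double strain_dependent_value(const Inputs &in,
                              const std::size_t i,
                              const double reference_value)
{
  if (in.strain_rate.empty())
    return reference_value; // never index into an empty vector

  assert(i < in.strain_rate.size());
  return reference_value * (1.0 + in.strain_rate[i]);
}
```

With an empty strain_rate the function falls back to reference_value instead of reading past the end of the vector, which would be undefined behavior.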