[CIG-MC] Fwd: MPI_Isend error

Magali Billen mibillen at ucdavis.edu
Tue Nov 17 18:23:28 PST 2009


Hello Eh,

This is a run on 8 processors. If I print the stack I get:

(gdb) bt
#0  0x00002b943e3c208a in opal_progress () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libopen-pal.so.0
#1  0x00002b943def5c85 in ompi_request_default_wait_all () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#2  0x00002b943df229d3 in PMPI_Waitall () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#3  0x0000000000427ef5 in exchange_id_d20 ()
#4  0x00000000004166f3 in gauss_seidel ()
#5  0x000000000041884b in multi_grid ()
#6  0x0000000000418c44 in solve_del2_u ()
#7  0x000000000041b151 in solve_Ahat_p_fhat ()
#8  0x000000000041b9a1 in solve_constrained_flow_iterative ()
#9  0x0000000000411ca6 in general_stokes_solver ()
#10 0x0000000000409c21 in main ()
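
One common cause of a hang inside PMPI_Waitall is a posted send or receive that never finds its partner, for example when processor a expects to exchange with processor b but b does not list a. The block below is a minimal sketch of a sanity check for that, in plain C without MPI. It assumes a hypothetical neighbor table in which `nbr[p][k]` holds the rank that processor p exchanges with in slot k, with -1 meaning "no neighbor". The table layout, `MAX_NBR`, and both function names are illustrative and are not the actual CitcomCU data structures.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NBR 6   /* hypothetical: up to 6 face neighbors per processor */

/* Return 1 if rank `b` appears in the neighbor list of rank `a`. */
static int lists_neighbor(const int nbr[][MAX_NBR], int a, int b)
{
    for (int k = 0; k < MAX_NBR; k++)
        if (nbr[a][k] == b)
            return 1;
    return 0;
}

/*
 * Count the problems in a neighbor table for `nproc` processors:
 * entries outside [0, nproc) other than the -1 "no neighbor" marker,
 * and one-sided pairs (a lists b, but b does not list a).
 */
int count_neighbor_errors(const int nbr[][MAX_NBR], int nproc)
{
    int errors = 0;

    assert(nproc > 0);
    for (int a = 0; a < nproc; a++) {
        for (int k = 0; k < MAX_NBR; k++) {
            int b = nbr[a][k];
            if (b == -1)
                continue;               /* empty slot */
            if (b < 0 || b >= nproc) {
                fprintf(stderr, "rank %d slot %d: invalid rank %d\n", a, k, b);
                errors++;
            } else if (!lists_neighbor(nbr, b, a)) {
                fprintf(stderr, "rank %d lists %d, but not vice versa\n", a, b);
                errors++;
            }
        }
    }
    return errors;
}
```

In a real run the table would have to be gathered onto one processor first; a nonzero return value points at an exchange pattern that cannot complete.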

I've attached the version of Parallel_related.c that is used... I have
not modified this in any way from the CIG release of CitcomCU.


Luckily, there are commented fprintf statements in just that part of  
the code... we'll continue to dig...
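
Since the abort quoted below reports MPI_ERR_RANK, one thing the debug prints could check is the destination rank itself. The following is a minimal sketch of such a check, assuming it is called just before each send with the communicator size obtained from MPI_Comm_size. The function name `check_dest_rank` and its arguments are hypothetical and are not part of CitcomCU or MPI.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if `dest` is a usable destination rank on a communicator of
 * `nproc` ranks, 0 otherwise. On failure, print which rank (`me`) and
 * which call site (`where`) produced the bad value.
 * Note: MPI also accepts MPI_PROC_NULL as a destination; this sketch
 * does not special-case it and would report it as invalid.
 */
int check_dest_rank(const char *where, int me, int dest, int nproc)
{
    assert(nproc > 0);
    if (dest >= 0 && dest < nproc)
        return 1;
    fprintf(stderr, "%s: rank %d has invalid destination %d (nproc = %d)\n",
            where, me, dest, nproc);
    return 0;
}
```

Called as, say, `check_dest_rank("exchange_node_f20", me, target, nproc)` immediately before the send, this would name the processor and the bad target instead of letting the MPI library abort the whole job.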

Oh, and just to eliminate the new cluster from suspicion, we
downloaded, compiled and ran CitcomS example1.cfg on the same cluster
with the same compilers, and there was no problem.

Maybe this is the sign that I'm supposed to finally switch from
CitcomCU to CitcomS... :-(
Magali

On Nov 17, 2009, at 5:02 PM, Eh Tan wrote:

> Hi Magali,
>
> How many processors are you using? If more than 100 processors are used,
> you are seeing this bug:
> http://www.geodynamics.org/pipermail/cig-mc/2008-March/000080.html
>
>
> Eh
>
>
>
> Magali Billen wrote:
>> One correction to the e-mail below: we've been compiling CitcomCU
>> using openmpi on our old cluster, so the compiler on the new cluster
>> is the same. The big difference is that the new cluster is about twice
>> as fast as the 5-year-old cluster. This suggests that the change to a
>> much faster cluster may have exposed an existing race condition in
>> CitcomCU??
>> Magali
>>
>>
>> Begin forwarded message:
>>
>>> From: Magali Billen <mibillen at ucdavis.edu>
>>> Date: November 17, 2009 4:23:45 PM PST
>>> To: cig-mc at geodynamics.org
>>> Subject: [CIG-MC] MPI_Isend error
>>>
>>> Hello,
>>>
>>> I'm using CitcomCU and am having a strange problem: the run either
>>> hangs (no error, it just doesn't go anywhere) or dies with an
>>> MPI_Isend error (see below). I seem to recall having problems with
>>> MPI_Isend and the LAM/MPI implementation, but I've not had any
>>> problems with MPICH-2.
>>> On the new cluster we are compiling with openmpi instead of MPICH-2.
>>>
>>> The MPI_Isend error seems to occur during initialization, in the call
>>> to the function mass_matrix, which then calls exchange_node_f20,
>>> which is where MPI_Isend is called.
>>>
>>> --snip--
>>> ok14: parallel shuffle element and id arrays
>>> ok15: construct shape functions
>>> [farm.caes.ucdavis.edu:27041] *** An error occurred in MPI_Isend
>>> [farm.caes.ucdavis.edu:27041] *** on communicator MPI_COMM_WORLD
>>> [farm.caes.ucdavis.edu:27041] *** MPI_ERR_RANK: invalid rank
>>> [farm.caes.ucdavis.edu:27041] *** MPI_ERRORS_ARE_FATAL (your MPI job
>>> will now abort)
>>>
>>> Have errors of this type occurred for other versions of Citcom that
>>> use MPI_Isend? (It seems that CitcomS uses this call also.) I'm not
>>> sure how to debug this error, especially since sometimes the run just
>>> hangs with no error.
>>>
>>> Any advice you have would be helpful,
>>> Magali
>>>
>>>
>>> -----------------------------
>>> Associate Professor, U.C. Davis
>>> Department of Geology/KeckCAVEs
>>> Physical & Earth Sciences Bldg, rm 2129
>>> Davis, CA 95616
>>> -----------------
>>> mibillen at ucdavis.edu
>>> (530) 754-5696
>>> -----------------------------
>>> ** Note new e-mail, building, office
>>>      information as of Sept. 2009 **
>>> -----------------------------
>>>
>>> _______________________________________________
>>> CIG-MC mailing list
>>> CIG-MC at geodynamics.org
>>> http://geodynamics.org/cgi-bin/mailman/listinfo/cig-mc
>>
>
> -- 
> Eh Tan
> Staff Scientist
> Computational Infrastructure for Geodynamics
> California Institute of Technology, 158-79
> Pasadena, CA 91125
> (626) 395-1693
> http://www.geodynamics.org


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel_related.c
Type: application/octet-stream
Size: 39074 bytes
Desc: not available
Url : http://geodynamics.org/pipermail/cig-mc/attachments/20091117/5989bec1/attachment-0001.obj 