[CIG-MC] Fwd: MPI_Isend error

Eh Tan tan2 at geodynamics.org
Wed Nov 18 00:13:58 PST 2009


Hi Magali,

Like Shijie said, the function exchange_id_d20() in CitcomCU is very 
similar to regional_exchange_id_d() in CitcomS. I don't have an 
immediate answer for why one works but the other doesn't.

BTW, in your earlier email, you mentioned that the code died inside 
function mass_matrix(). In this email, the code died inside function 
gauss_seidel(). Did the code die at different places randomly?
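
On the MPI_ERR_RANK report from the first email: that error class means the
destination rank passed to MPI_Isend was outside the range of the
communicator. A minimal sketch of a check that could be run on a neighbor
table before any sends are posted is below. It is not CitcomCU code: the
function names and the idea of a flat `neighbors` array are hypothetical, and
it assumes the exchange never uses MPI_PROC_NULL as a destination.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of CitcomCU: return the index of the first
 * entry in neighbors[0..count-1] that is not a valid rank for a communicator
 * of nproc processes, or -1 if every entry is valid.
 * Assumption: MPI_PROC_NULL is never used as a destination. */
int first_invalid_rank(const int *neighbors, int count, int nproc)
{
    int i;

    assert(nproc > 0);
    for (i = 0; i < count; i++) {
        if (neighbors[i] < 0 || neighbors[i] >= nproc)
            return i;
    }
    return -1;
}

/* Report the first bad entry on stderr.
 * Returns 1 if the table is usable, 0 otherwise. */
int check_neighbor_table(int my_rank, const int *neighbors, int count,
                         int nproc)
{
    int bad = first_invalid_rank(neighbors, count, nproc);

    if (bad >= 0) {
        fprintf(stderr, "rank %d: neighbor[%d] = %d is outside 0..%d\n",
                my_rank, bad, neighbors[bad], nproc - 1);
        return 0;
    }
    return 1;
}
```

Calling something like check_neighbor_table() on each process, with nproc
taken from MPI_Comm_size, right before the exchange routine would show whether
the rank is already wrong when the table is built or only becomes wrong later,
which would point toward memory being overwritten.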

Eh



Magali Billen wrote:
> Hello Eh,
>
> This is a run on 8 processors. If I print the stack I get:
>
> (gdb) bt
> #0  0x00002b943e3c208a in opal_progress () from
> /share/apps/openmpisb-1.3/gcc-4.4/lib/libopen-pal.so.0
> #1  0x00002b943def5c85 in ompi_request_default_wait_all () from
> /share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
> #2  0x00002b943df229d3 in PMPI_Waitall () from
> /share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
> #3  0x0000000000427ef5 in exchange_id_d20 ()
> #4  0x00000000004166f3 in gauss_seidel ()
> #5  0x000000000041884b in multi_grid ()
> #6  0x0000000000418c44 in solve_del2_u ()
> #7  0x000000000041b151 in solve_Ahat_p_fhat ()
> #8  0x000000000041b9a1 in solve_constrained_flow_iterative ()
> #9  0x0000000000411ca6 in general_stokes_solver ()
> #10 0x0000000000409c21 in main ()
>
> I've attached the version of Parallel_related.c that is used... I have 
> not modified this in any way
> from the CIG release of CitcomCU.
>
>
> ------------------------------------------------------------------------
>
> Luckily, there are commented fprintf statements in just that part of 
> the code... we'll continue to dig...
>
> Oh, and just to eliminate the new cluster from suspicion, we 
> downloaded, compiled and ran CitcomS
> example1.cfg on the same cluster with the same compilers, and there 
> was no problem.
>
> Maybe this is the sign that I'm supposed to finally switch from 
> CitcomCU to CitcomS... :-(
> Magali
>
> On Nov 17, 2009, at 5:02 PM, Eh Tan wrote:
>
>> Hi Magali,
>>
>> How many processors are you using? If more than 100 processors are used,
>> you are seeing this bug:
>> http://www.geodynamics.org/pipermail/cig-mc/2008-March/000080.html
>>
>>
>> Eh
>>
>>
>>
>> Magali Billen wrote:
>>> One correction to the e-mail below, we've been compiling CitcomCU
>>> using openmpi on our old
>>> cluster, so the compiler on the new cluster is the same. The big
>>> difference is that the new cluster
>>> is about twice as fast as the 5-year-old cluster. This suggests that
>>> this change to a much faster
>>> cluster may have exposed an existing race condition in CitcomCU??
>>> Magali
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: Magali Billen <mibillen at ucdavis.edu>
>>>> Date: November 17, 2009 4:23:45 PM PST
>>>> To: cig-mc at geodynamics.org
>>>> Subject: [CIG-MC] MPI_Isend error
>>>>
>>>> Hello,
>>>>
>>>> I'm using CitcomCU and am having a strange problem with the code
>>>> either hanging (no error, just doesn't
>>>> go anywhere) or dying with an MPI_Isend error (see below).  I seem
>>>> to recall having problems with the MPI_Isend
>>>> command and the lam-mpi version of mpi, but I've not had any problems
>>>> with mpich-2.
>>>> On the new cluster we are compiling with openmpi instead of MPICH-2.
>>>>
>>>> The MPI_Isend error seems to occur during Initialization in the call
>>>> to the function mass_matrix, which then
>>>> calls exchange_node_f20, which is where the call to MPI_Isend is.
>>>>
>>>> --snip--
>>>> ok14: parallel shuffle element and id arrays
>>>> ok15: construct shape functions
>>>> [farm.caes.ucdavis.edu:27041] *** An error occurred in MPI_Isend
>>>> [farm.caes.ucdavis.edu:27041] *** on communicator MPI_COMM_WORLD
>>>> [farm.caes.ucdavis.edu:27041] *** MPI_ERR_RANK: invalid rank
>>>> [farm.caes.ucdavis.edu:27041] *** MPI_ERRORS_ARE_FATAL (your MPI job
>>>> will now abort)
>>>>
>>>> Have these types of error occurred for other versions of
>>>> Citcom using MPI_Isend (it seems that CitcomS uses
>>>> this command also)?   I'm not sure how to debug this error,
>>>> especially since sometimes it just hangs with no error.
>>>>
>>>> Any advice you have would be helpful,
>>>> Magali
>>>>
>>>>
>>>> -----------------------------
>>>> Associate Professor, U.C. Davis
>>>> Department of Geology/KeckCAVEs
>>>> Physical & Earth Sciences Bldg, rm 2129
>>>> Davis, CA 95616
>>>> -----------------
>>>> mibillen at ucdavis.edu
>>>> (530) 754-5696
>>>> *-----------------------------*
>>>> *** Note new e-mail, building, office*
>>>> *    information as of Sept. 2009 ***
>>>> -----------------------------
>>>>
>>>> _______________________________________________
>>>> CIG-MC mailing list
>>>> CIG-MC at geodynamics.org
>>>> http://geodynamics.org/cgi-bin/mailman/listinfo/cig-mc
>>>
>>
>> -- 
>> Eh Tan
>> Staff Scientist
>> Computational Infrastructure for Geodynamics
>> California Institute of Technology, 158-79
>> Pasadena, CA 91125
>> (626) 395-1693
>> http://www.geodynamics.org
>>
>


