[CIG-MC] Fwd: MPI_Isend error
jshhuang
jshhuang at ustc.edu.cn
Wed Nov 18 02:01:07 PST 2009
Hi, Magali,
You can try to add the following to the subroutine: void parallel_domain_decomp1(struct All_variables *E) in Parallel_related.c:
----------------------------------------------------------------------------
for(j = 0; j < E->parallel.nproc; j++)
    for(i = 0; i <= E->parallel.nproc; i++)
    {
        E->parallel.mst1[j][i] = 1;
        E->parallel.mst2[j][i] = 2;
        E->parallel.mst3[j][i] = 3;  /* the original message assigned mst2 twice; mst3 appears to be intended */
    }
----------------------------------------------------------------------------
I'm not sure if it works, but I thought it was worth a try. This is a machine-dependent issue.
Good luck!
Jinshui Huang
---------------------------------------
School of Earth and Space Sciences
University of Science and Technology of China
Hefei, Anhui 230026, China
0551-3606781
---------------------------------------
----- Original Message -----
From: Magali Billen
To: Eh Tan
Cc: cig-mc at geodynamics.org
Sent: Wednesday, November 18, 2009 10:23 AM
Subject: Re: [CIG-MC] Fwd: MPI_Isend error
Hello Eh,
This is a run on 8 processors. If I print the stack I get:
(gdb) bt
#0 0x00002b943e3c208a in opal_progress () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libopen-pal.so.0
#1 0x00002b943def5c85 in ompi_request_default_wait_all () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#2 0x00002b943df229d3 in PMPI_Waitall () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#3 0x0000000000427ef5 in exchange_id_d20 ()
#4 0x00000000004166f3 in gauss_seidel ()
#5 0x000000000041884b in multi_grid ()
#6 0x0000000000418c44 in solve_del2_u ()
#7 0x000000000041b151 in solve_Ahat_p_fhat ()
#8 0x000000000041b9a1 in solve_constrained_flow_iterative ()
#9 0x0000000000411ca6 in general_stokes_solver ()
#10 0x0000000000409c21 in main ()
I've attached the version of Parallel_related.c that is used... I have not modified this in any way
from the CIG release of CitcomCU.
------------------------------------------------------------------------------
Luckily, there are commented-out fprintf statements in just that part of the code... we'll continue to dig...
Oh, and just to eliminate the new cluster from suspicion, we downloaded, compiled, and ran CitcomS
example1.cfg on the same cluster with the same compilers, and there was no problem.
Maybe this is the sign that I'm supposed to finally switch from CitcomCU to CitcomS... :-(
Magali
On Nov 17, 2009, at 5:02 PM, Eh Tan wrote:
Hi Magali,
How many processors are you using? If more than 100 processors are used,
you are seeing this bug:
http://www.geodynamics.org/pipermail/cig-mc/2008-March/000080.html
Eh
Magali Billen wrote:
One correction to the e-mail below: we've been compiling CitcomCU
using openmpi on our old
cluster, so the compiler on the new cluster is the same. The big
difference is that the new cluster
is about twice as fast as the 5-year-old cluster. This suggests that
the change to a much faster
cluster may have exposed an existing race condition in CitcomCU?
Magali
Begin forwarded message:
*From: *Magali Billen <mibillen at ucdavis.edu
<mailto:mibillen at ucdavis.edu>>
*Date: *November 17, 2009 4:23:45 PM PST
*To: *cig-mc at geodynamics.org <mailto:cig-mc at geodynamics.org>
*Subject: **[CIG-MC] MPI_Isend error*
Hello,
I'm using CitcomCU and am having a strange problem: the run either hangs (no error, it just doesn't
go anywhere) or dies with an MPI_Isend error (see below). I seem
to recall having problems with the MPI_Isend
command and the lam-mpi version of MPI, but I've not had any problems
with MPICH-2.
On the new cluster we are compiling with openmpi instead of MPICH-2.
The MPI_Isend error seems to occur during initialization, in the call
to the function mass_matrix, which then
calls exchange_node_f20, which is where the call to MPI_Isend is.
--snip--
ok14: parallel shuffle element and id arrays
ok15: construct shape functions
[farm.caes.ucdavis.edu:27041] *** An error occurred in MPI_Isend
[farm.caes.ucdavis.edu:27041] *** on communicator MPI_COMM_WORLD
[farm.caes.ucdavis.edu:27041] *** MPI_ERR_RANK: invalid rank
[farm.caes.ucdavis.edu:27041] *** MPI_ERRORS_ARE_FATAL (your MPI job
will now abort)
Has this type of error occurred for other versions of
Citcom using MPI_Isend (it seems that CitcomS uses
this command also)? I'm not sure how to debug this error,
especially since sometimes it just hangs with no error.
Any advice you have would be helpful,
Magali
-----------------------------
Associate Professor, U.C. Davis
Department of Geology/KeckCAVEs
Physical & Earth Sciences Bldg, rm 2129
Davis, CA 95616
-----------------
mibillen at ucdavis.edu <mailto:mibillen at ucdavis.edu>
(530) 754-5696
-----------------------------
** Note new e-mail, building, office
   information as of Sept. 2009 **
-----------------------------
_______________________________________________
CIG-MC mailing list
CIG-MC at geodynamics.org <mailto:CIG-MC at geodynamics.org>
http://geodynamics.org/cgi-bin/mailman/listinfo/cig-mc
------------------------------------------------------------------------
--
Eh Tan
Staff Scientist
Computational Infrastructure for Geodynamics
California Institute of Technology, 158-79
Pasadena, CA 91125
(626) 395-1693
http://www.geodynamics.org