[CIG-SHORT] Pylith-running mpi on cluster

Niloufar Abolfathian niloufar.abolfathian at gmail.com
Thu Feb 8 15:32:26 PST 2018


Hi,

Thanks again for helping me with this code. This is work I am doing in
collaboration with Chris Johnson; I have cc'd him on this email.
As I explained, we are trying to run the simulation for 10,000 years, but the
code crashes after ~1800 years. That is why the cfg file only runs it for
1780 years!
In addition, the code takes ~2 days to run, so we want to run it with MPI to
make it faster.

By problem size, what I mean is the size of your mesh (number of vertices
and cells).
My mesh is three-dimensional, with 159681 nodes (vertices) and 150000
elements (cells).


Also, the actual run log (not just the PETSc summary) would be helpful, as
it shows us what is happening with convergence.
I am not sure what the run log is or how to find it.
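(At the moment the screen output just scrolls by in the terminal. If
capturing the run log is simply a matter of redirecting stdout and stderr,
for example something like "pylith your.cfg >& run.log", please confirm and I
will send that file along; the exact redirection here is just my guess.)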


Also, did you run the problem on 24 nodes or 24 cores on the cluster?  If
24 nodes, how many cores per node?
We tried to run it on:
i) 1 node with 24 cores on a Linux server (shared memory), and
ii) 2 nodes with 24 cores each on Linux servers (MPI),
but every run took the same amount of time as running on my own Mac.


If you send all of your .cfg files (including the one with your job
submission information), that might help.
I have attached a zip file containing all of my cfg files and also my mesh
model. If you try to run it for 2000 years, it will crash.

For running without MPI but across separate single-core nodes, we tried
"pylith your.cfg --nodes=24". We tried this with both the downloaded binaries
and a build from source; there was no performance difference.
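
In case it is useful, here is roughly the shape of the batch script we have
been submitting on the cluster. I am reconstructing it from memory, so the
scheduler directives, job name, and module name below are approximate rather
than copied from the actual script:

  #!/bin/bash
  #SBATCH --job-name=pylith_run
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=24
  #SBATCH --time=48:00:00

  # Load the cluster's MPI environment (module name is approximate).
  module load openmpi

  # As I understand it, PyLith's --nodes option sets the number of MPI
  # processes, not the number of cluster nodes. I am adding --petsc.log_view
  # as Matt suggested so the PETSc log gets captured.
  pylith your.cfg --nodes=24 --petsc.log_view

Please let me know if this is not how the MPI run should be launched.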

Best,
Niloufar



On Thu, Feb 8, 2018 at 3:12 PM, Matthew Knepley <knepley at rice.edu> wrote:

> On Fri, Feb 9, 2018 at 4:58 AM, Charles Williams <willic3 at gmail.com>
> wrote:
>
>> Hi Niloufar,
>>
>> By problem size, what I mean is the size of your mesh (number of vertices
>> and cells).  Also, the actual run log (not just the PETSc summary) would be
>> helpful, as it shows us what is happening with convergence.  Also, did you
>> run the problem on 24 nodes or 24 cores on the cluster?  If 24 nodes, how
>> many cores per node?  If you send all of your .cfg files (including the one
>> with your job submission information), that might help.
>>
>
> It is possible for 24 cores to be on a single node in a modern machine.
> The best thing to do would be to run the STREAMS benchmark on your
> compute machine, so we could see how much speedup we expect. However, the
> output from
>
>   --petsc.log_view
>
> would be an acceptable substitute.
>
>   Thanks,
>
>      Matt
>
>
>> Cheers,
>> Charles
>>
>>
>> On 8/02/2018, at 4:29 PM, Niloufar Abolfathian <
>> niloufar.abolfathian at gmail.com> wrote:
>>
>> Hi, thanks for your replies. Here are my answers to your questions.
>>
>> 1.  What size of problem are you running?
>> I am running a quasi-static model to simulate a vertically dipping
>> strike-slip fault with static friction that is loaded by tectonic forces.
>> The boundary conditions include a far-field velocity of 1 cm/yr and an
>> initial displacement of 0.1 m applied normal to the fault surface to
>> maintain a compressive stress on the fault. I want to run this simple model
>> for thousands of years. The first issue is that the model gives a run-time
>> error after ~1800 years. The second problem is that each run takes more
>> than two days, which is why I am trying to use multiple cores so it runs
>> faster. From Matt's link, I understand that I should not expect the program
>> to run faster when using multiple cores on my own Mac, but I have also
>> tried it on 24 nodes on the cluster and it took the same amount of time as
>> on my Mac.
>>
>> 2.  What solver settings are you using?
>> pylith.problems.SolverNonlinear
>>
>> 3.  Is this a linear or nonlinear problem?
>> A nonlinear problem.
>>
>> 4.  Is this a 2D or 3D problem?
>> A 3D problem.
>>
>> 5.  What does the run log show?  This will include convergence
>> information and a PETSc summary of calls, etc.
>> I do not have the PETSc summary for those runs, so I made a new run for
>> only 200 years; the summary is attached as a text file. Here is my
>> PETSc configuration:
>>
>> # Set the solver options.
>> [pylithapp.petsc]
>> malloc_dump =
>>
>> # Preconditioner settings.
>> pc_type = asm
>> sub_pc_factor_shift_type = nonzero
>>
>> # Convergence parameters.
>> ksp_rtol = 1.0e-8
>> ksp_atol = 1.0e-12
>> ksp_max_it = 500
>> ksp_gmres_restart = 50
>>
>> # Linear solver monitoring options.
>> ksp_monitor = true
>> #ksp_view = true
>> ksp_converged_reason = true
>> ksp_error_if_not_converged = true
>>
>> # Nonlinear solver monitoring options.
>> snes_rtol = 1.0e-8
>> snes_atol = 1.0e-12
>> snes_max_it = 100
>> snes_monitor = true
>> snes_linesearch_monitor = true
>> #snes_view = true
>> snes_converged_reason = true
>> snes_error_if_not_converged = true
>>
>>
>> Hope this information can help. Please let me know if I need to provide
>> you with any other information.
>>
>> Thanks,
>> Niloufar
>>
>>
>>
>> On Wed, Feb 7, 2018 at 4:01 AM, Matthew Knepley <knepley at rice.edu> wrote:
>>
>>> On Wed, Feb 7, 2018 at 2:24 PM, Charles Williams <willic3 at gmail.com>
>>> wrote:
>>>
>>>> Dear Niloufar,
>>>>
>>>> It is hard to diagnose your problem without more information.
>>>> Information that would be helpful includes:
>>>>
>>>> 1.  What size of problem are you running?
>>>> 2.  What solver settings are you using?
>>>> 3.  Is this a linear or nonlinear problem?
>>>> 4.  Is this a 2D or 3D problem?
>>>> 5.  What does the run log show?  This will include convergence
>>>> information and a PETSc summary of calls, etc.
>>>>
>>>> There are probably other things it would be good to know, but this
>>>> should get us started.
>>>>
>>>
>>> In addition to the points Charles makes, it is very useful to understand
>>> how performance is affected by architecture.
>>> The advantages of multiple cores are very often oversold by vendors.
>>> Here is a useful reference:
>>>
>>>   http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
>>>
>>> I recommend running the streams program, which can be found in the PETSc
>>> installation.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>>
>>>> Cheers,
>>>> Charles
>>>>
>>>>
>>>> On 7/02/2018, at 1:06 PM, Niloufar Abolfathian <
>>>> niloufar.abolfathian at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to run my code on the cluster but I have not gotten any
>>>> improvements when using multiple cores.
>>>>
>>>> What I have tried:
>>>>
>>>> Downloaded binaries for both Mac and Linux: running on a single core vs.
>>>> multiple cores (2 on the Mac and 24 on Linux) takes the same amount of
>>>> time.
>>>>
>>>> Compiled from source: no speed-up using either shared memory or MPI,
>>>> even though the correct number of mpinemesis processes shows up on
>>>> multiple nodes.
>>>>
>>>> I would appreciate any help with getting the MPI runs working on the
>>>> cluster.
>>>>
>>>> Thanks,
>>>> Niloufar
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> <PETSc_log_summary.txt>
>>
>>
>>
>> Charles Williams | Geodynamic Modeler
>> GNS Science | Te Pū Ao
>> 1 Fairway Drive, Avalon 5010, PO Box 30368, Lower Hutt 5040, New Zealand
>> Ph 0064-4-570-4566 | Mob 0064-22-350-7326 | Fax 0064-4-570-4600
>> http://www.gns.cri.nz/ | Email: C.Williams at gns.cri.nz
>>
>>
>>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model1.zip
Type: application/zip
Size: 1931374 bytes
Desc: not available
URL: <http://lists.geodynamics.org/pipermail/cig-short/attachments/20180208/dda4b2d2/attachment-0001.zip>

