[aspect-devel] PETSc support

Ian Rose ian.rose at berkeley.edu
Fri Jan 17 12:37:38 PST 2014


Here is more detailed output from five timesteps for both PETSc and
Trilinos.  Both runs are in optimized mode, using composition_passive.prm.

PETSc:

Running with 4 MPI tasks.
Number of active cells: 1,024 (on 6 levels)
Number of degrees of freedom: 22,214 (8,450+1,089+4,225+4,225+4,225)

*** Timestep 0:  t=0 seconds
   Solving temperature system... 0 iterations.
   Solving composition system 1... 0 iterations.
   Solving composition system 2... 0 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 20 iterations.

   Postprocessing:
     Writing graphical output:  output/solution-00000
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: 0/1/0.3854 // 0/1/0.3854

*** Timestep 1:  t=0.015625 seconds
   Solving temperature system... 31 iterations.
   Solving composition system 1... 32 iterations.
   Solving composition system 2... 39 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 5 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: -0.04786/1.017/0.3854 // -0.04708/1.035/0.3854

*** Timestep 2:  t=0.03125 seconds
   Solving temperature system... 41 iterations.
   Solving composition system 1... 35 iterations.
   Solving composition system 2... 44 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 20 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: -0.05936/1.007/0.3854 // -0.05107/1.01/0.3854

*** Timestep 3:  t=0.046875 seconds
   Solving temperature system... 43 iterations.
   Solving composition system 1... 37 iterations.
   Solving composition system 2... 45 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 19 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.03993/1.007/0.3854 // -0.03474/1.004/0.3854

*** Timestep 4:  t=0.0625 seconds
   Solving temperature system... 41 iterations.
   Solving composition system 1... 37 iterations.
   Solving composition system 2... 44 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 19 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.027/1.006/0.3854 // -0.02073/1.003/0.3854

*** Timestep 5:  t=0.078125 seconds
   Solving temperature system... 43 iterations.
   Solving composition system 1... 36 iterations.
   Solving composition system 2... 43 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 19 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.0177/1.006/0.3854 // -0.01353/1.002/0.3854



+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      3.16s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         6 |     0.565s |        18% |
| Assemble composition system     |        12 |     0.712s |        23% |
| Assemble temperature system     |         6 |     0.495s |        16% |
| Build Stokes preconditioner     |         6 |     0.436s |        14% |
| Build composition preconditioner|        12 |  0.000605s |     0.019% |
| Build temperature preconditioner|         6 |  0.000622s |      0.02% |
| Solve Stokes system             |         6 |     0.393s |        12% |
| Solve composition system        |        12 |    0.0991s |       3.1% |
| Solve temperature system        |         6 |    0.0648s |       2.1% |
| Initialization                  |         2 |    0.0381s |       1.2% |
| Postprocessing                  |         6 |    0.0721s |       2.3% |
| Setup dof systems               |         1 |     0.149s |       4.7% |
+---------------------------------+-----------+------------+------------+

Trilinos:

Running with 4 MPI tasks.
Number of active cells: 1,024 (on 6 levels)
Number of degrees of freedom: 22,214 (8,450+1,089+4,225+4,225+4,225)

*** Timestep 0:  t=0 seconds
   Solving temperature system... 0 iterations.
   Solving composition system 1... 0 iterations.
   Solving composition system 2... 0 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 30+2 iterations.

   Postprocessing:
     Writing graphical output:  output/solution-00000
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: 0/1/0.3854 // 0/1/0.3854

*** Timestep 1:  t=0.015625 seconds
   Solving temperature system... 15 iterations.
   Solving composition system 1... 13 iterations.
   Solving composition system 2... 15 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 9 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: -0.04786/1.017/0.3854 // -0.04708/1.035/0.3854

*** Timestep 2:  t=0.03125 seconds
   Solving temperature system... 16 iterations.
   Solving composition system 1... 13 iterations.
   Solving composition system 2... 17 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 30+1 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.5 K, 1 K
     Compositions min/max/mass: -0.05936/1.007/0.3854 // -0.05107/1.01/0.3854

*** Timestep 3:  t=0.046875 seconds
   Solving temperature system... 16 iterations.
   Solving composition system 1... 12 iterations.
   Solving composition system 2... 16 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 30 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.03993/1.007/0.3854 // -0.03474/1.004/0.3854

*** Timestep 4:  t=0.0625 seconds
   Solving temperature system... 15 iterations.
   Solving composition system 1... 12 iterations.
   Solving composition system 2... 16 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 30 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.027/1.006/0.3854 // -0.02073/1.003/0.3854

*** Timestep 5:  t=0.078125 seconds
   Solving temperature system... 15 iterations.
   Solving composition system 1... 12 iterations.
   Solving composition system 2... 15 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 30+2 iterations.

   Postprocessing:
     Temperature min/avg/max:   0 K, 0.4999 K, 1 K
     Compositions min/max/mass: -0.0177/1.006/0.3854 // -0.01353/1.002/0.3854



+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      3.29s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         6 |     0.545s |        17% |
| Assemble composition system     |        12 |     0.541s |        16% |
| Assemble temperature system     |         6 |     0.353s |        11% |
| Build Stokes preconditioner     |         6 |      0.51s |        15% |
| Build composition preconditioner|        12 |    0.0777s |       2.4% |
| Build temperature preconditioner|         6 |    0.0394s |       1.2% |
| Solve Stokes system             |         6 |     0.657s |        20% |
| Solve composition system        |        12 |    0.0948s |       2.9% |
| Solve temperature system        |         6 |    0.0616s |       1.9% |
| Initialization                  |         2 |    0.0493s |       1.5% |
| Postprocessing                  |         6 |    0.0615s |       1.9% |
| Setup dof systems               |         1 |      0.16s |       4.9% |
+---------------------------------+-----------+------------+------------+
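
For a quick side-by-side of the two five-step tables above, something like
the following could compute the PETSc/Trilinos wall-time ratio per section.
This is only a sketch: the helper names parse_timing_table and
wall_time_ratios are made up for illustration, and the rows are pasted in by
hand (only the preconditioner and solve sections are included).

```python
import re

# Solver-related rows copied by hand from the two five-step tables above.
PETSC_TABLE = """
| Build Stokes preconditioner     |         6 |     0.436s |        14% |
| Build composition preconditioner|        12 |  0.000605s |     0.019% |
| Build temperature preconditioner|         6 |  0.000622s |      0.02% |
| Solve Stokes system             |         6 |     0.393s |        12% |
| Solve composition system        |        12 |    0.0991s |       3.1% |
| Solve temperature system        |         6 |    0.0648s |       2.1% |
"""

TRILINOS_TABLE = """
| Build Stokes preconditioner     |         6 |      0.51s |        15% |
| Build composition preconditioner|        12 |    0.0777s |       2.4% |
| Build temperature preconditioner|         6 |    0.0394s |       1.2% |
| Solve Stokes system             |         6 |     0.657s |        20% |
| Solve composition system        |        12 |    0.0948s |       2.9% |
| Solve temperature system        |         6 |    0.0616s |       1.9% |
"""

# Matches "| <section name> | <no. calls> | <wall time>s |"; the header and
# border lines of the table do not match and are skipped.
ROW = re.compile(r"^\|\s*(.+?)\s*\|\s*(\d+)\s*\|\s*([0-9.]+)s\s*\|")


def parse_timing_table(text):
    """Return {section name: wall time in seconds} for every data row."""
    times = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            times[match.group(1)] = float(match.group(3))
    return times


def wall_time_ratios(numerator, denominator):
    """Return numerator/denominator wall time for sections present in both."""
    return {
        name: numerator[name] / denominator[name]
        for name in numerator
        if name in denominator
    }


if __name__ == "__main__":
    petsc = parse_timing_table(PETSC_TABLE)
    trilinos = parse_timing_table(TRILINOS_TABLE)
    for name, ratio in wall_time_ratios(petsc, trilinos).items():
        print(f"{name:34s} PETSc/Trilinos = {ratio:.3g}")
```

A ratio below 1 means the PETSc run spent less wall time in that section.
With only six timesteps on four processes these numbers are likely noisy, so
small differences should not be over-interpreted.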







On Fri, Jan 17, 2014 at 11:58 AM, Ian Rose <ian.rose at berkeley.edu> wrote:

> Alright, the problem seems to be in one of the inner CG solves.  I can
> make it disappear by replacing
>
> SolverCG<LinearAlgebra::Vector> solver(solver_control);
> with
> PETScWrappers::SolverCG solver(solver_control);
>
>
> Some timing information follows, for composition_passive.prm run in
> optimized mode on four processors:
>
>
> Aspect optimized, PETSc optimized:
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |      69.9s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |       101 |      9.88s |        14% |
> | Assemble composition system     |       202 |      12.6s |        18% |
> | Assemble temperature system     |       101 |      8.66s |        12% |
> | Build Stokes preconditioner     |       101 |      7.66s |        11% |
> | Build composition preconditioner|       202 |    0.0144s |     0.021% |
> | Build temperature preconditioner|       101 |   0.00868s |     0.012% |
> | Solve Stokes system             |       101 |      15.2s |        22% |
> | Solve composition system        |       202 |      8.11s |        12% |
> | Solve temperature system        |       101 |      4.12s |       5.9% |
> | Initialization                  |         2 |    0.0451s |     0.065% |
> | Postprocessing                  |       101 |      1.27s |       1.8% |
> | Setup dof systems               |         1 |     0.157s |      0.22% |
> +---------------------------------+-----------+------------+------------+
>
> Aspect optimized, Trilinos:
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |      49.1s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |       101 |      9.16s |        19% |
> | Assemble composition system     |       202 |      9.31s |        19% |
> | Assemble temperature system     |       101 |      6.03s |        12% |
> | Build Stokes preconditioner     |       101 |      8.51s |        17% |
> | Build composition preconditioner|       202 |      1.32s |       2.7% |
> | Build temperature preconditioner|       101 |     0.652s |       1.3% |
> | Solve Stokes system             |       101 |      8.24s |        17% |
> | Solve composition system        |       202 |      1.57s |       3.2% |
> | Solve temperature system        |       101 |     0.836s |       1.7% |
> | Initialization                  |         2 |    0.0537s |      0.11% |
> | Postprocessing                  |       101 |      1.04s |       2.1% |
> | Setup dof systems               |         1 |     0.157s |      0.32% |
> +---------------------------------+-----------+------------+------------+
>
>
>
> On Fri, Jan 17, 2014 at 11:32 AM, Ian Rose <ian.rose at berkeley.edu> wrote:
>
>> Okay, I can reproduce the 50x slowdown by linking to the debug PETSc. So
>> there's that.  However, I still get unsynchronized calls to PETSc when
>> running on several processors with the debug version... It's certainly
>> possible that something is up with my install, but the step-40 tutorial
>> does seem to run fine.
>>
>
>