[aspect-devel] PETSc support
Ian Rose
ian.rose at berkeley.edu
Fri Jan 17 12:37:38 PST 2014
Here are five steps of more detailed output for Trilinos and PETSc. Both
runs are in optimized mode, using composition_passive.prm.
PETSc:
Running with 4 MPI tasks.
Number of active cells: 1,024 (on 6 levels)
Number of degrees of freedom: 22,214 (8,450+1,089+4,225+4,225+4,225)
*** Timestep 0: t=0 seconds
Solving temperature system... 0 iterations.
Solving composition system 1... 0 iterations.
Solving composition system 2... 0 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 20 iterations.
Postprocessing:
Writing graphical output: output/solution-00000
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: 0/1/0.3854 // 0/1/0.3854
*** Timestep 1: t=0.015625 seconds
Solving temperature system... 31 iterations.
Solving composition system 1... 32 iterations.
Solving composition system 2... 39 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 5 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: -0.04786/1.017/0.3854 // -0.04708/1.035/0.3854
*** Timestep 2: t=0.03125 seconds
Solving temperature system... 41 iterations.
Solving composition system 1... 35 iterations.
Solving composition system 2... 44 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 20 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: -0.05936/1.007/0.3854 // -0.05107/1.01/0.3854
*** Timestep 3: t=0.046875 seconds
Solving temperature system... 43 iterations.
Solving composition system 1... 37 iterations.
Solving composition system 2... 45 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 19 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.03993/1.007/0.3854 // -0.03474/1.004/0.3854
*** Timestep 4: t=0.0625 seconds
Solving temperature system... 41 iterations.
Solving composition system 1... 37 iterations.
Solving composition system 2... 44 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 19 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.027/1.006/0.3854 // -0.02073/1.003/0.3854
*** Timestep 5: t=0.078125 seconds
Solving temperature system... 43 iterations.
Solving composition system 1... 36 iterations.
Solving composition system 2... 43 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 19 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.0177/1.006/0.3854 // -0.01353/1.002/0.3854
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      3.16s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         6 |     0.565s |        18% |
| Assemble composition system     |        12 |     0.712s |        23% |
| Assemble temperature system     |         6 |     0.495s |        16% |
| Build Stokes preconditioner     |         6 |     0.436s |        14% |
| Build composition preconditioner|        12 |  0.000605s |     0.019% |
| Build temperature preconditioner|         6 |  0.000622s |      0.02% |
| Solve Stokes system             |         6 |     0.393s |        12% |
| Solve composition system        |        12 |    0.0991s |       3.1% |
| Solve temperature system        |         6 |    0.0648s |       2.1% |
| Initialization                  |         2 |    0.0381s |       1.2% |
| Postprocessing                  |         6 |    0.0721s |       2.3% |
| Setup dof systems               |         1 |     0.149s |       4.7% |
+---------------------------------+-----------+------------+------------+
Trilinos:
Running with 4 MPI tasks.
Number of active cells: 1,024 (on 6 levels)
Number of degrees of freedom: 22,214 (8,450+1,089+4,225+4,225+4,225)
*** Timestep 0: t=0 seconds
Solving temperature system... 0 iterations.
Solving composition system 1... 0 iterations.
Solving composition system 2... 0 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30+2 iterations.
Postprocessing:
Writing graphical output: output/solution-00000
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: 0/1/0.3854 // 0/1/0.3854
*** Timestep 1: t=0.015625 seconds
Solving temperature system... 15 iterations.
Solving composition system 1... 13 iterations.
Solving composition system 2... 15 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 9 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: -0.04786/1.017/0.3854 // -0.04708/1.035/0.3854
*** Timestep 2: t=0.03125 seconds
Solving temperature system... 16 iterations.
Solving composition system 1... 13 iterations.
Solving composition system 2... 17 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30+1 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.5 K, 1 K
Compositions min/max/mass: -0.05936/1.007/0.3854 // -0.05107/1.01/0.3854
*** Timestep 3: t=0.046875 seconds
Solving temperature system... 16 iterations.
Solving composition system 1... 12 iterations.
Solving composition system 2... 16 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.03993/1.007/0.3854 // -0.03474/1.004/0.3854
*** Timestep 4: t=0.0625 seconds
Solving temperature system... 15 iterations.
Solving composition system 1... 12 iterations.
Solving composition system 2... 16 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.027/1.006/0.3854 // -0.02073/1.003/0.3854
*** Timestep 5: t=0.078125 seconds
Solving temperature system... 15 iterations.
Solving composition system 1... 12 iterations.
Solving composition system 2... 15 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30+2 iterations.
Postprocessing:
Temperature min/avg/max: 0 K, 0.4999 K, 1 K
Compositions min/max/mass: -0.0177/1.006/0.3854 // -0.01353/1.002/0.3854
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      3.29s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         6 |     0.545s |        17% |
| Assemble composition system     |        12 |     0.541s |        16% |
| Assemble temperature system     |         6 |     0.353s |        11% |
| Build Stokes preconditioner     |         6 |      0.51s |        15% |
| Build composition preconditioner|        12 |    0.0777s |       2.4% |
| Build temperature preconditioner|         6 |    0.0394s |       1.2% |
| Solve Stokes system             |         6 |     0.657s |        20% |
| Solve composition system        |        12 |    0.0948s |       2.9% |
| Solve temperature system        |         6 |    0.0616s |       1.9% |
| Initialization                  |         2 |    0.0493s |       1.5% |
| Postprocessing                  |         6 |    0.0615s |       1.9% |
| Setup dof systems               |         1 |      0.16s |       4.9% |
+---------------------------------+-----------+------------+------------+
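
For a section-by-section comparison of the two short runs, here is a minimal sketch. It assumes the wall times are copied by hand from the PETSc and Trilinos tables above. The struct and function names (SectionTiming, short_run_timings, petsc_over_trilinos, print_ratios) are made up for illustration and are not part of ASPECT or deal.II.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the timing summary; wall times are in seconds.
struct SectionTiming
{
  std::string name;
  double      petsc_s;
  double      trilinos_s;
};

// Wall times copied by hand from the two six-call tables above
// (optimized mode, 4 MPI tasks). Nothing here is read from ASPECT.
std::vector<SectionTiming> short_run_timings()
{
  return {
    {"Assemble Stokes system",           0.565,    0.545},
    {"Assemble composition system",      0.712,    0.541},
    {"Assemble temperature system",      0.495,    0.353},
    {"Build Stokes preconditioner",      0.436,    0.51},
    {"Build composition preconditioner", 0.000605, 0.0777},
    {"Build temperature preconditioner", 0.000622, 0.0394},
    {"Solve Stokes system",              0.393,    0.657},
    {"Solve composition system",         0.0991,   0.0948},
    {"Solve temperature system",         0.0648,   0.0616},
    {"Initialization",                   0.0381,   0.0493},
    {"Postprocessing",                   0.0721,   0.0615},
    {"Setup dof systems",                0.149,    0.16},
  };
}

// A ratio above 1 means PETSc spent more wall time in this section
// than Trilinos did; below 1 means less.
double petsc_over_trilinos(const SectionTiming &row)
{
  assert(row.trilinos_s > 0.0);
  return row.petsc_s / row.trilinos_s;
}

// Print one line per section: name, both wall times, and the ratio.
void print_ratios(const std::vector<SectionTiming> &rows)
{
  for (const SectionTiming &row : rows)
    std::printf("%-33s %10.6fs %10.6fs  x%.3f\n",
                row.name.c_str(), row.petsc_s, row.trilinos_s,
                petsc_over_trilinos(row));
}
```

Calling print_ratios(short_run_timings()) from a main() prints one line per section. With these numbers the Stokes solve ratio comes out below one (0.393s against 0.657s), while the three assembly ratios come out above one.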
On Fri, Jan 17, 2014 at 11:58 AM, Ian Rose <ian.rose at berkeley.edu> wrote:
> Alright, the problem seems to be in one of the inner CG solves. I can
> make it disappear by replacing
>
> SolverCG<LinearAlgebra::Vector> solver(solver_control);
> with
> PETScWrappers::SolverCG solver(solver_control);
>
>
> Some timing information follows, from composition_passive.prm run in
> optimized mode on four processors:
>
>
> Aspect optimized, PETSc optimized:
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |      69.9s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |       101 |      9.88s |        14% |
> | Assemble composition system     |       202 |      12.6s |        18% |
> | Assemble temperature system     |       101 |      8.66s |        12% |
> | Build Stokes preconditioner     |       101 |      7.66s |        11% |
> | Build composition preconditioner|       202 |    0.0144s |     0.021% |
> | Build temperature preconditioner|       101 |   0.00868s |     0.012% |
> | Solve Stokes system             |       101 |      15.2s |        22% |
> | Solve composition system        |       202 |      8.11s |        12% |
> | Solve temperature system        |       101 |      4.12s |       5.9% |
> | Initialization                  |         2 |    0.0451s |     0.065% |
> | Postprocessing                  |       101 |      1.27s |       1.8% |
> | Setup dof systems               |         1 |     0.157s |      0.22% |
> +---------------------------------+-----------+------------+------------+
>
> Aspect optimized, Trilinos:
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |      49.1s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |       101 |      9.16s |        19% |
> | Assemble composition system     |       202 |      9.31s |        19% |
> | Assemble temperature system     |       101 |      6.03s |        12% |
> | Build Stokes preconditioner     |       101 |      8.51s |        17% |
> | Build composition preconditioner|       202 |      1.32s |       2.7% |
> | Build temperature preconditioner|       101 |     0.652s |       1.3% |
> | Solve Stokes system             |       101 |      8.24s |        17% |
> | Solve composition system        |       202 |      1.57s |       3.2% |
> | Solve temperature system        |       101 |     0.836s |       1.7% |
> | Initialization                  |         2 |    0.0537s |      0.11% |
> | Postprocessing                  |       101 |      1.04s |       2.1% |
> | Setup dof systems               |         1 |     0.157s |      0.32% |
> +---------------------------------+-----------+------------+------------+
>
>
>
> On Fri, Jan 17, 2014 at 11:32 AM, Ian Rose <ian.rose at berkeley.edu> wrote:
>
>> Okay, I can reproduce the 50x slowdown by linking to the debug PETSc. So
>> there's that. However, I still get unsynchronized calls to PETSc when
>> running on several processors with the debug version... It's certainly
>> possible that something is up with my install, but the step-40 tutorial
>> does seem to run fine.
>>
>
>