[aspect-devel] Assemble temperature system time

Eric Heien emheien at ucdavis.edu
Fri Nov 8 10:11:26 PST 2013


I think that was likely the problem.  I recompiled in release mode and the numbers are more reasonable.  On 16 cores with your example input file I get:

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |       289s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |      3694 |      19.4s |       6.7% |
| Assemble temperature system     |      3694 |      55.4s |        19% |
| Build Stokes preconditioner     |       322 |      8.23s |       2.9% |
| Build temperature preconditioner|      3694 |      3.16s |       1.1% |
| Solve Stokes system             |      3694 |      77.8s |        27% |
| Solve temperature system        |      3694 |      40.7s |        14% |
| Create snapshot                 |        73 |      7.87s |       2.7% |
| Initialization                  |         5 |     0.138s |     0.048% |
| Postprocessing                  |      3691 |      42.4s |        15% |
| Refine mesh structure, part 1   |       249 |      3.04s |       1.1% |
| Refine mesh structure, part 2   |       249 |     0.506s |      0.18% |
| Setup dof systems               |       250 |        13s |       4.5% |
+---------------------------------+-----------+------------+------------+
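
When comparing timing tables like the one above across runs with different numbers of time steps, the wall time per call can be easier to read than the percentage column. Below is a minimal Python sketch for that. It is not part of ASPECT or deal.II, the function names are made up for illustration, and it assumes the table layout shown in this thread (pipe-separated cells, wall times ending in "s", optional "> " quote prefixes).

```python
def parse_seconds(text):
    """Convert a wall-time cell such as '19.4s' or '2.16e+03s' to float seconds."""
    return float(text.strip().rstrip("s"))


def parse_timing_table(table):
    """Return {section: (calls, wall_seconds)} for each data row of the table."""
    rows = {}
    for line in table.splitlines():
        # Drop e-mail quote markers ("> ", ">> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Data rows have four cells: section, no. calls, wall time, % of total.
        # Borders, the header row, and the "Total wallclock" row are skipped.
        if len(cells) != 4 or not cells[1].isdigit():
            continue
        rows[cells[0]] = (int(cells[1]), parse_seconds(cells[2]))
    return rows


def seconds_per_call(rows):
    """Return {section: average wall seconds per call}."""
    return {name: wall / calls for name, (calls, wall) in rows.items()}


# A few rows copied from the release-mode table in this message.
SAMPLE = """
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |      3694 |      19.4s |       6.7% |
| Assemble temperature system     |      3694 |      55.4s |        19% |
| Solve Stokes system             |      3694 |      77.8s |        27% |
"""

if __name__ == "__main__":
    per_call = seconds_per_call(parse_timing_table(SAMPLE))
    for name, value in sorted(per_call.items()):
        print(f"{name:<34s} {value:.4g} s/call")
```

Per-call numbers are only comparable between runs that use the same input file, mesh, and core count, so this is a convenience for reading the tables rather than a benchmark.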


Thanks for your help,

-Eric

On Nov 6, 2013, at 5:26 PM, Wolfgang Bangerth <bangerth at math.tamu.edu> wrote:

> On 11/06/2013 06:48 PM, Eric Heien wrote:
>> Hello all,
>> 
>> I’m doing some medium-sized 3D box runs on TACC Stampede now (1e6 DOF, 32 cores) and I’ve noticed that the time spent in "Assemble temperature system" is very large.
>> 
>> +---------------------------------------------+------------+------------+
>> | Total wallclock time elapsed since start    |  3.15e+03s |            |
>> |                                             |            |            |
>> | Section                         | no. calls |  wall time | % of total |
>> +---------------------------------+-----------+------------+------------+
>> | Assemble Stokes system          |       201 |       315s |        10% |
>> | Assemble temperature system     |       201 |  2.16e+03s |        69% |
>> | Build Stokes preconditioner     |         1 |      26.6s |      0.85% |
>> | Build temperature preconditioner|       201 |      21.7s |      0.69% |
>> | Solve Stokes system             |       201 |        90s |       2.9% |
>> | Solve temperature system        |       201 |      13.9s |      0.44% |
>> | Initialization                  |         2 |      1.51s |     0.048% |
>> | Postprocessing                  |       201 |       310s |       9.8% |
>> | Setup dof systems               |         1 |      15.7s |       0.5% |
>> +---------------------------------+-----------+------------+------------+
>> 
>> Does anyone have a suggestion for why this might be?  I know some of the developers recently worked on improving the performance of this, which is why it seems odd it would take so much time.  Before I dig into the reasons with a profiler, I was hoping someone might know what’s wrong.  This was compiled with Intel compiler version 13.0.079, and uses deal.II r31565 and Aspect r2009.
> 
> To me, this looks a lot like you're in debug mode. For reference, the simulation from which I created this movie
>  http://www.youtube.com/watch?v=_bKqU_P4j48
> used the attached version of the 3d box convection input file. There, timing looked like this:
> 
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |  5.88e+05s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Assemble Stokes system          |     12634 |  1.44e+05s |        24% |
> | Assemble temperature system     |     12634 |  1.09e+05s |        19% |
> | Build Stokes preconditioner     |      1098 |  5.96e+04s |        10% |
> | Build temperature preconditioner|     12634 |  1.85e+04s |       3.2% |
> | Solve Stokes system             |     12634 |  1.72e+05s |        29% |
> | Solve temperature system        |     12634 |  5.18e+04s |       8.8% |
> | Create snapshot                 |       252 |       203s |     0.035% |
> | Initialization                  |         5 |     0.258s |   4.4e-05% |
> | Postprocessing                  |     12631 |  6.15e+03s |         1% |
> | Refine mesh structure, part 1   |       846 |  5.28e+03s |       0.9% |
> | Refine mesh structure, part 2   |       846 |       475s |     0.081% |
> | Setup dof systems               |       847 |  1.03e+04s |       1.7% |
> +---------------------------------+-----------+------------+------------+
> 
> I think these percentages are more realistic. This was on 64 processors.
> 
> Best
> Wolfgang
> 
> -- 
> ------------------------------------------------------------------------
> Wolfgang Bangerth               email:            bangerth at math.tamu.edu
>                                www: http://www.math.tamu.edu/~bangerth/
> 
> <3dbox.prm>
> _______________________________________________
> Aspect-devel mailing list
> Aspect-devel at geodynamics.org
> http://geodynamics.org/cgi-bin/mailman/listinfo/aspect-devel


