[cig-commits] r16375 - in seismo/3D/SPECFEM3D_GLOBE: tags/v5.0.0 tags/v5.0.0/UTILS trunk trunk/UTILS/oldstuff
dkomati1 at geodynamics.org
Wed Mar 3 12:31:40 PST 2010
Author: dkomati1
Date: 2010-03-03 12:31:40 -0800 (Wed, 03 Mar 2010)
New Revision: 16375
Removed:
seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/UTILS/oldstuff/
seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE
Modified:
seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess
seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90
seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess
seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90
Log:
improved comments in specfem3D.f90, restored useful flags in flags.guess,
and suppressed the "oldstuff" directory in tags/v5.0.0
Modified: seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess 2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess 2010-03-03 20:31:40 UTC (rev 16375)
@@ -26,10 +26,10 @@
# Intel ifort Fortran90 for Linux
#
if test x"$FLAGS_CHECK" = x; then
- FLAGS_CHECK="-O3"
+ FLAGS_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
fi
if test x"$FLAGS_NO_CHECK" = x; then
- FLAGS_NO_CHECK="-O3"
+ FLAGS_NO_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
fi
# useful for debugging...
#if test x"$FLAGS_CHECK" = x; then
@@ -38,7 +38,7 @@
#fi
#if test x"$FLAGS_NO_CHECK" = x; then
# # standard options (leave option -ftz, which is *critical* for performance)
- # FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
+ # FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
#fi
#
# Intel Nehalem processor architecture, Intel compiler version 10.1
Modified: seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90 2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90 2010-03-03 20:31:40 UTC (rev 16375)
@@ -764,12 +764,14 @@
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
-! trivia about the programming style adopted here
!
-! note 1: (it seems) for performance reasons, we will try to use as much from the stack memory as possible.
-! stack memory is a place in computer memory where all the variables that are declared
+! trivia about the programming style adopted here:
+!
+! note 1: for performance reasons, we try to use as much from the stack memory as possible.
+! This is done to avoid memory fragmentation and also to optimize performance.
+! Stack memory is a place in computer memory where all the variables that are declared
! and initialized **before** runtime are stored. Our static array allocation will use that one.
-! all variables declared within our main routine also will be stored on the stack.
+! All variables declared within our main routine also will be stored on the stack.
!
! the heap is the section of computer memory where all the variables created or initialized
! **at** runtime are stored. it is used for dynamic memory allocation.
@@ -788,34 +790,27 @@
! passing them along as arguments to the routine makes the code slower.
! it seems that this stack/heap criterion is more complicated.
!
-! another reason why modules are avoided, is to make the code thread safe.
+! another reason why modules are avoided is to make the code thread safe.
! having different threads access the same data structure and modifying it at the same time
! would lead to problems. passing arguments is a way to avoid such complications.
!
-! nevertheless, it would be nice to test - where possible - , if using modules
-! together with static arrays would perform as well as this.
-! at least, it would make the code more elegant and less error prone...
+! note 2: Most of the computation time is spent
+! inside the time loop (mainly in the compute_forces_crust_mantle_Dev() routine).
+! Any code performance tuning will be most effective in there.
!
-! note 2: in general, most of the computation time for our earthquake simulations is spent
-! inside the time loop (mainly the compute_forces_crust_mantle_Dev() routine).
-! any code performance tuning will be most effective in there.
-!
-! note 3: fortran is a code language that uses column-first ordering for arrays,
+! note 3: Fortran is a code language that uses column-first ordering for arrays,
! e.g., it stores a(i,j) in this order: a(1,1),a(2,1),a(3,1),...,a(1,2),a(2,2),a(3,2),..
-! it is therefor more efficient to have the inner-loop over i, and the outer loop over j
-! for this reason, e.g. the indexing for the pre-computed sourcearrays changed
+! it is therefore more efficient to have the inner-loop over i, and the outer loop over j
!
-! note 4: Deville routines help compilers to better vectorize the do-loops and
-! for most compilers, will result in a significant speedup ( > 30%).
+! note 4: Deville et al. (2002) routines significantly reduce the total number of memory accesses
+! required to perform matrix-matrix products at the spectral element level.
+! For most compilers and hardware, will result in a significant speedup (> 30% or more, sometimes twice faster).
!
-! note 5: one common technique in computational science to help compilers
-! enhance pipelining is loop unrolling. we do attempt this here in a very simple
-! and straigthforward way. so don't be confused about the somewhat
-! bewildering do-loop writing...
+! note 5: a common technique to help compilers enhance pipelining is loop unrolling. We do this here in a simple
+! and straigthforward way, so don't be confused about the do-loop writing.
!
! note 6: whenever adding some new code, please make sure to use
-! spaces rather than tabs. tabulators have different sizes in different editors
-! and most of the time, it messes up the code's formating :(
+! spaces rather than tabs. Tabulators are in principle not allowed in Fortran95.
!
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
Deleted: seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE 2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE 2010-03-03 20:31:40 UTC (rev 16375)
@@ -1,333 +0,0 @@
-!=====================================================================
-!
-! S p e c f e m 3 D G l o b e V e r s i o n 3 . 5
-! --------------------------------------------------
-!
-! Dimitri Komatitsch and Jeroen Tromp
-! Seismological Laboratory - California Institute of Technology
-! (c) California Institute of Technology July 2004
-!
-! This program is free software; you can redistribute it and/or modify
-! it under the terms of the GNU General Public License as published by
-! the Free Software Foundation; either version 2 of the License, or
-! (at your option) any later version.
-!
-! This program is distributed in the hope that it will be useful,
-! but WITHOUT ANY WARRANTY; without even the implied warranty of
-! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-! GNU General Public License for more details.
-!
-! You should have received a copy of the GNU General Public License along
-! with this program; if not, write to the Free Software Foundation, Inc.,
-! 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
-!
-!=====================================================================
-!
-! United States Government Sponsorship Acknowledged.
-!
-
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-+++++++++++++ NOTES ON USING THE SPECFEM3D PACKAGE +++++++++++++
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- If you use this code for your own research, please send an email
- to Jeroen Tromp <jtromp at caltech.edu> for information, and cite:
-
- @ARTICLE{KoRiTr02,
- author={D. Komatitsch and J. Ritsema and J. Tromp},
- year=2002,
- title={The Spectral-Element Method, {B}eowulf Computing, and Global Seismology},
- journal={Science},
- volume=298,
- pages={1737-1742}}
-
- @ARTICLE{KoTr02a,
- author={D. Komatitsch and J. Tromp},
- year=2002,
- title={Spectral-Element Simulations of Global Seismic Wave Propagation{-I. V}alidation},
- journal={Geophys. J. Int.},
- volume=149,
- pages={390-412}}
-
- @ARTICLE{KoTr02b,
- author={D. Komatitsch and J. Tromp},
- year=2002,
- title={Spectral-Element Simulations of Global Seismic Wave Propagation{-II. 3-D} Models, Oceans, Rotation, and Self-Gravitation},
- journal={Geophys. J. Int.},
- volume=150,
- pages={303-318}}
-
- @ARTICLE{KoTr99,
- author={D. Komatitsch and J. Tromp},
- year=1999,
- title={Introduction to the spectral-element method for 3-{D} seismic wave propagation},
- journal={Geophys. J. Int.},
- volume=139,
- pages={806-822}}
-
- If you use 3-D model S20RTS, please cite
-
- @ARTICLE{RiVa00,
- author={J. Ritsema and H. J. {Van Heijst}},
- year=2000,
- title={Seismic imaging of structural heterogeneity in {E}arth's mantle: Evidence for large-scale mantle flow},
- journal={Science Progress},
- volume=83,
- pages={243-259}}
-
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-REFERENCE FRAME - CONVENTION:
-
-The code uses the following convention for the reference frame:
-
- - X axis is East
- - Y axis is North
- - Z axis is up
-
-Note that this convention is different from both the Aki-Richards convention
-and the Harvard CMT convention.
-
-Let us recall that the Aki-Richards convention is:
-
- - X axis is North
- - Y axis is East
- - Z axis is down
-
-and that the Harvard CMT convention is:
-
- - X axis is South
- - Y axis is East
- - Z axis is up
-
-
-PARAMETERS TO CHANGE ON DIFFERENT MACHINES:
-
-- All the codes in the package are written in Fortran90, and also
- conform strictly to the Fortran95 standard. They do not use any
- obsolescent or obsolete feature of f77.
-- Use the appropriate compiler flags in the Makefile.
- Set F90, MPIF90, FLAGS_NO_CHECK, FLAGS_CHECK and MPI_FLAGS.
-- In constants.h set FIX_UNDERFLOW_PROBLEM flag
- (need to fix underflow trapping on some machines, e.g., Pentium processors)
-- In constants.h set LOCAL_PATH_IS_ALSO_GLOBAL flag.
- On clusters (e.g., Beowulfs), set to .false. in most cases.
- In such a case, also customize global path to local
- files in create_serial_name_database.f90 ("20 format ...").
- This flag is used only when one checks the mesh with the serial codes
- ("xcheck_buffers_1D" etc.), ignore it if you do not plan to use them
-- In DATA/Par_file set LOCAL_PATH (this is where databases are written and read,
- and also where seismograms will be stored). The seismogram files will be
- named *.semd (for Spectral Element Method - Displacement)
-- In precision.h choose single versus double precision (double precision
- doubles the memory requirements for the solver but may be slightly faster).
- Use single precision except if you have a very large machine
- with a lot of memory.
-- Also, in constants.h choose the CUSTOM_REAL size depending on
- single or double precision
-- When running on an SGI, add "setenv TRAP_FPE OFF" to your .cshrc file
- *before* compiling, in order to turn underflow trapping off
-- When running on an IBM (e.g., an SP or a Power4), one needs to change
- all the filename extensions from *.f90 to *.f ; a script is provided
- in DATA/util/change_names_IBM to do that. One also needs to change
- all the .f90 in the Makefile to .f
-
-DIRECTORIES:
-
-- make subdirectories obj and OUTPUT_FILES in the directory
- with the source code (also done automatically by the go_mesher script below)
-
-SCRIPTS:
-
-- go_mesher runs the mesher
- (need to set the correct mpirun command at the end of the script)
- (if running on a Beowulf, the script assumes that the nodes are named n001,
- n002 etc. If it is not the case, change the tr -d 'n' line in the script)
-- go_solver runs the solver
- (need to set the correct mpirun command at the end of the script)
-- runall compiles and runs both mesher and solver
-
-MESHER (meshfem3D):
-
-- The mesher meshfem3D uses NPROC_XI * NPROC_ETA * NCHUNKS nodes
- (The full globe consists of NCHUNKS = 6 chunks, and each chunk is divided
- in NPROC_XI * NPROC_ETA slices)
-- For topological reasons related to the mesh, NPROC_XI and NPROC_ETA
- must be equal when NCHUNKS = 6 or NCHUNKS = 3. The option of having
- different values for NPROC_XI and NPROC_ETA is available only when
- NCHUNKS = 1 (1/6th of the sphere) or NCHUNKS = 2.
-- Note that NPROC_XI = 1 and/or NPROC_ETA = 1 is valid. Therefore the
- smallest number of nodes needed to run the code is NCHUNKS
- (NCHUNKS chunks * 1 * 1). If you have less than NCHUNKS nodes
- on your machine, you can start multiple MPI processes on each node
- to emulate a larger machine.
-- NEX_XI and NEX_ETA need to be 16 * multiple of NPROC_XI and NPROC_ETA,
- respectively. So in theory for NPROC_XI = NPROC_ETA = 1 NEX_XI = NEX_ETA = 16, 32, 48, ...,
- 384 and 400 work, for NPROC_XI = NPROC_ETA = 2 NEX_XI = NEX_ETA = 32, 64, 96, ...,
- 352 and 384 work, for NPROC_XI = NPROC_ETA = 3 NEX_XI = NEX_ETA = 48, 96, 144,
- 192, 240, 288, 336 and 384 work, for NPROC_XI = NPROC_ETA = 4 NEX_XI = NEX_ETA = 64,
- 128, 192, 256, 320 and 384, for NPROC_XI = NPROC_ETA = 5 NEX_XI = NEX_ETA = 80, 160,
- 240, 320 and 400, and for NPROC_XI = NPROC_ETA = 6 NEX_XI = NEX_ETA = 96, 192, 288
- and 384. In practice though, the curvature of the Earth cannot be honored if one
- uses too few elements. By trial and error, we found that NEX_XI and NEX_ETA
- should not be smaller than 64 typically. Smaller values are likely to give spectral
- elements with a negative Jacobian, in which case the mesher will exit with an error message.
-
-- set all the parameters in DATA/Par_file, in particular the following:
-
-! shape of the first chunk (not used if full sphere with six chunks)
- ANGULAR_WIDTH_XI_IN_DEGREES ! angular size of the first chunk
- ANGULAR_WIDTH_ETA_IN_DEGREES
- CENTER_LATITUDE_IN_DEGREES ! location of its center
- CENTER_LONGITUDE_IN_DEGREES
- GAMMA_ROTATION_AZIMUTH ! angle of rotation of the first chunk
-
- MODEL ! Earth model to use
-
- OCEANS ! to incorporate the effects of the oceans (cheap)
- ELLIPTICITY ! to incorporate ellipticity (no cost)
- TOPOGRAPHY ! to add topography and bathymetry (no cost)
- GRAVITY ! to incorporate gravity (Cowling approximation, cheap)
- ROTATION ! to incorporate Coriolis effects (cheap)
- ATTENUATION ! to include attenuation (fairly expensive)
-
- ABSORBING_CONDITIONS ! absorbing boundary conditions (cheap)
-
- SAVE_AVS_DX_MESH_FILES ! save mesh files for AVS users, www.avs.com,
- or OpenDX users, www.opendx.org). Do not use if you do not have AVS
- or OpenDX, because this option creates large files.
-
-- Compile the mesher ("make meshfem3D") and run it with the go_mesher script
-- Mesher output is provided in the OUTPUT_FILES directory in output_mesher.txt
- (output can be directed to the screen instead by uncommenting a line
- in constants.h:
- ! uncomment this to write messages to the screen
- ! integer, parameter :: IMAIN = ISTANDARD_OUTPUT )
-- For a given model, set of nodes and set of parameters in DATA/Par_file,
- one only needs to run the mesher once and for all, even if one wants
- to run several simulations with different sources and/or receivers
- (the source and receiver information is used in the solver only)
-- Some useful statistics about the mesh created are saved in the parameter file
- for the solver, type "more OUTPUT_FILES/values_from_mesher.h" to see them.
-
-CHECKING THE MPI BUFFERS (optional, after running the mesher):
-
-- Use the four serial codes check_buffers_1D, check_buffers_2D,
- check_buffers_faces_chunks and check_buffers_corners_chunks
- to check all the MPI buffers generated by the mesher
- (e.g., "make check_buffers_1D" and then "xcheck_buffers_1D")
-
-CHECKING THE MESH (optional, after running the mesher):
-
-- Use the serial code check_mesh_quality_AVS_DX
- ("make check_mesh_quality_AVS_DX" and then "xcheck_mesh_quality_AVS_DX")
- to generate an AVS output file ("AVS_meshquality.inp" in AVS UCD format)
- or an OpenDX output file ("DX_meshquality.dx")
- that can be used to investigate mesh quality, e.g., skewness of elements,
- and a Gnuplot histogram ("mesh_quality_histogram.txt") that can
- be plotted with gnuplot ("gnuplot plot_mesh_quality_histogram.gnu")
-
-- Use the serial code combine_AVS_DX
- ("make combine_AVS_DX" and then "xcombine_AVS_DX")
- to generate AVS output files (in AVS UCD format) or OpenDX output files
- showing the mesh, the MPI partition (slices), the NCHUNKS chunks, the
- source and receiver location etc. Use the AVS UCD files
- AVS_continent_boundaries.inp and AVS_plate_boundaries.inp,
- or the OpenDX files DX_continent_boundaries.dx and DX_plate_boundaries.dx
- for reference.
-
-SOLVER (specfem3D):
-
-- For reasons of speed, the solver uses static memory allocation. Therefore it
- needs to be recompiled ("make clean" and "make specfem3D") every time
- one reruns the mesher. The mesher uses dynamic allocation only,
- and does not need to be recompiled.
-- To compile the solver, one needs a file called
- "OUTPUT_FILES/values_from_mesher.h" which contains the right parameters
- describing the static size of the arrays.
- This file is created by the mesher (meshfem3D.f90).
- This means that one needs to run the mesher before being able to compile
- the solver. For people who want to compile the mesher and the solver at
- the same time, a small program called create_header_file.f90
- is provided, which can be used to create "OUTPUT_FILES/values_from_mesher.h"
- before running the mesher (type "make create_header_file" to compile it
- and "xcreate_header_file" to run it). This is useful for people who want
- to compile all the codes first and then submit the mesher and the solver
- to a batch management system.
-- The solver also needs NPROC_XI * NPROC_ETA * NCHUNKS nodes to run
-- The solver needs the DATA/CMTSOLUTION file for the source and the
- DATA/STATIONS file for the list of stations (CMTSOLUTION files may be
- obtained directly from the Harvard CMT web page, www.seismology.harvard.edu)
-- Set the "time shift" in the CMTSOLUTION file to 0.0
- (the solver will not run otherwise)
-- To simulate a delta source-time function, set "half duration" in
- the CMTSOLUTION file to 0.0. If "half duration" is not zero,
- the code will use a Gaussian (not too different from triangular)
- source-time function with half-width "half duration". We prefer to run
- the solver with "half duration" set to zero and convolve after the fact,
- because this way it's easy to use a variety of source-time functions.
- Use the serial code convolve_source_timefunction.f90 and the script
- convolve_source_timefunction.csh for this purpose. (Set the parameter "hdur"
- in convolve_source_timefunction.csh to the desired half-duration.)
- (type "make convolve_source_timefunction" to compile the code).
-- To simulate multiple events, set the parameter NSOURCES in the DATA/Par_file
- to the desired number. Provide a CMTSOLUTION file that has NSOURCES entries,
- one for each CMT solution (i.e., concatenate NSOURCES CMTSOLUTION files
- to a single CMTSOLUTION file). At least one entry should have a zero "time shift",
- and all the other entries should have non-negative "time shift". Each event
- can have its own half duration, latitude, longitude, depth and moment tensor.
- This feature can also be used to mimic the directivity associated with a
- finite source.
-- Solver output is provided in the OUTPUT_FILES directory in output_solver.txt
- (output can be directed to the screen instead by uncommenting a line
- in constants.h:
- ! uncomment this to write messages to the screen
- ! integer, parameter :: IMAIN = ISTANDARD_OUTPUT )
-- There are two different versions of the main solver routines.
- Type "copy_files_regular.csh" to use the regular version,
- and "copy_files_inlined_5.csh" to use the inlined version, which may be
- faster on some machines (you can try both once to determine which
- code gives the fastest result on your machine). Note that the inlined
- version is written specifically for polynomial degree NGLL = 5
- in constants.h, and cannot run with any other value, while the regular
- version can. Note also that the two versions implement
- the exact same calculations, and therefore give the same results
- down to the roundoff error, which can be different. It is only
- the implementation that differs between the two versions.
-- If you have a fast machine, set NTSTEP_BETWEEN_OUTPUT_INFO
- to a relatively high value (e.g. at least 100, or even 1000 or more)
- to avoid writing to the output text files too often. Same thing
- with NTSTEP_BETWEEN_OUTPUT_SEISMOS.
-- On clusters (e.g., Beowulfs) the seismogram files are distributed on the
- local disks (path LOCAL_PATH in the DATA/Par_file) and need to be gathered
- at the end of the simulation.
-- For the same model, rerun the solver for different events by changing the
- CMTSOLUTION file, or for different stations by changing the STATIONS file.
- There is no need to rerun the mesher. It is best to include as many stations
- as possible, since this does not add to the cost of the simulation.
-
-MOVIE OF THE RESULTS:
-
-- Use create_movie_AVS_DX.f90 ("make create_movie_AVS_DX") to create
- a movie of surface displacement (radial component) or of the entire 3D wave
- field. The movie can be saved in OpenDX or AVS format. Set parameters
- MOVIE_SURFACE, MOVIE_VOLUME, and NTSTEP_BETWEEN_FRAMES in the Par_file.
- Remember to use a DATA/CMTSOLUTION source file with a half-duration
- hdur > 0, otherwise you will get a movie corresponding to a Heaviside
- source, with a lot of high-frequency noise. Note that this option
- creates large files!
-
-Note: The Gauss-Lobatto subroutines in gll_library.f90 are based in part on
- software libraries from M.I.T., Department of Mechanical Engineering.
-
-Note: The non-structured global numbering software was provided
- by Paul F. Fischer.
-
-Note: Subroutines from "Numerical Recipes: The Art of Scientific Computing"
- by W. H. Press et al., Cambridge University Press, are used in
- numerical_recipes.f90. The user must acquire an official
- Numerical Recipes license to run them.
-
-Note: OpenDX is open-source based on IBM Data Explorer, www.opendx.org
- AVS is a trademark of Advanced Visualization Systems, www.avs.com
-
Modified: seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess 2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess 2010-03-03 20:31:40 UTC (rev 16375)
@@ -26,10 +26,10 @@
# Intel ifort Fortran90 for Linux
#
if test x"$FLAGS_CHECK" = x; then
- FLAGS_CHECK="-O3"
+ FLAGS_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
fi
if test x"$FLAGS_NO_CHECK" = x; then
- FLAGS_NO_CHECK="-O3"
+ FLAGS_NO_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
fi
# useful for debugging...
#if test x"$FLAGS_CHECK" = x; then
@@ -38,7 +38,7 @@
#fi
#if test x"$FLAGS_NO_CHECK" = x; then
# # standard options (leave option -ftz, which is *critical* for performance)
- # FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
+ # FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
#fi
#
# Intel Nehalem processor architecture, Intel compiler version 10.1
Modified: seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90 2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90 2010-03-03 20:31:40 UTC (rev 16375)
@@ -764,12 +764,14 @@
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
-! trivia about the programming style adopted here
!
-! note 1: (it seems) for performance reasons, we will try to use as much from the stack memory as possible.
-! stack memory is a place in computer memory where all the variables that are declared
+! trivia about the programming style adopted here:
+!
+! note 1: for performance reasons, we try to use as much from the stack memory as possible.
+! This is done to avoid memory fragmentation and also to optimize performance.
+! Stack memory is a place in computer memory where all the variables that are declared
! and initialized **before** runtime are stored. Our static array allocation will use that one.
-! all variables declared within our main routine also will be stored on the stack.
+! All variables declared within our main routine also will be stored on the stack.
!
! the heap is the section of computer memory where all the variables created or initialized
! **at** runtime are stored. it is used for dynamic memory allocation.
@@ -788,34 +790,27 @@
! passing them along as arguments to the routine makes the code slower.
! it seems that this stack/heap criterion is more complicated.
!
-! another reason why modules are avoided, is to make the code thread safe.
+! another reason why modules are avoided is to make the code thread safe.
! having different threads access the same data structure and modifying it at the same time
! would lead to problems. passing arguments is a way to avoid such complications.
!
-! nevertheless, it would be nice to test - where possible - , if using modules
-! together with static arrays would perform as well as this.
-! at least, it would make the code more elegant and less error prone...
+! note 2: Most of the computation time is spent
+! inside the time loop (mainly in the compute_forces_crust_mantle_Dev() routine).
+! Any code performance tuning will be most effective in there.
!
-! note 2: in general, most of the computation time for our earthquake simulations is spent
-! inside the time loop (mainly the compute_forces_crust_mantle_Dev() routine).
-! any code performance tuning will be most effective in there.
-!
-! note 3: fortran is a code language that uses column-first ordering for arrays,
+! note 3: Fortran is a code language that uses column-first ordering for arrays,
! e.g., it stores a(i,j) in this order: a(1,1),a(2,1),a(3,1),...,a(1,2),a(2,2),a(3,2),..
-! it is therefor more efficient to have the inner-loop over i, and the outer loop over j
-! for this reason, e.g. the indexing for the pre-computed sourcearrays changed
+! it is therefore more efficient to have the inner-loop over i, and the outer loop over j
!
-! note 4: Deville routines help compilers to better vectorize the do-loops and
-! for most compilers, will result in a significant speedup ( > 30%).
+! note 4: Deville et al. (2002) routines significantly reduce the total number of memory accesses
+! required to perform matrix-matrix products at the spectral element level.
+! For most compilers and hardware, will result in a significant speedup (> 30% or more, sometimes twice faster).
!
-! note 5: one common technique in computational science to help compilers
-! enhance pipelining is loop unrolling. we do attempt this here in a very simple
-! and straigthforward way. so don't be confused about the somewhat
-! bewildering do-loop writing...
+! note 5: a common technique to help compilers enhance pipelining is loop unrolling. We do this here in a simple
+! and straigthforward way, so don't be confused about the do-loop writing.
!
! note 6: whenever adding some new code, please make sure to use
-! spaces rather than tabs. tabulators have different sizes in different editors
-! and most of the time, it messes up the code's formating :(
+! spaces rather than tabs. Tabulators are in principle not allowed in Fortran95.
!
!-------------------------------------------------------------------------------------------------
!-------------------------------------------------------------------------------------------------
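
Note on the flags.guess hunks above: each assignment is wrapped in a test of the form x"$VAR" = x, so the restored defaults apply only when FLAGS_CHECK or FLAGS_NO_CHECK is empty or unset, and a value supplied by the user is kept. The following is a minimal sketch of that idiom, not the actual script: guess_flags is a hypothetical helper name introduced here for illustration, and the flag list is shortened.

```shell
#!/bin/sh
# Sketch of the "assign only if empty" idiom seen in flags.guess.
# guess_flags is a hypothetical helper, not part of SPECFEM3D_GLOBE.
guess_flags() {
    # x"$VAR" = x holds only when VAR is unset or empty; the leading x
    # guards against old test(1) versions misreading values that start with "-".
    if test x"$FLAGS_CHECK" = x; then
        FLAGS_CHECK="-O3 -fpe0 -ftz"      # shortened default, for illustration only
    fi
    if test x"$FLAGS_NO_CHECK" = x; then
        FLAGS_NO_CHECK="-O3 -fpe0 -ftz"   # shortened default, for illustration only
    fi
}

# Case 1: nothing set by the user, so both defaults are filled in.
FLAGS_CHECK=""
FLAGS_NO_CHECK=""
guess_flags
echo "default:  $FLAGS_CHECK"

# Case 2: a user-supplied FLAGS_CHECK is left untouched,
# while the empty FLAGS_NO_CHECK still receives the default.
FLAGS_CHECK="-O0 -g"
FLAGS_NO_CHECK=""
guess_flags
echo "override: $FLAGS_CHECK"
```

Under these assumptions, the defaults restored in this revision should not overwrite flags that were already set before the script ran.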