[cig-commits] r16375 - in seismo/3D/SPECFEM3D_GLOBE: tags/v5.0.0 tags/v5.0.0/UTILS trunk trunk/UTILS/oldstuff

Wed Mar 3 12:31:40 PST 2010

Author: dkomati1
Date: 2010-03-03 12:31:40 -0800 (Wed, 03 Mar 2010)
New Revision: 16375

Removed:
   seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/UTILS/oldstuff/
   seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE
Modified:
   seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess
   seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90
   seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess
   seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90
Log:
improved comments in specfem3D.f90, restored useful flags in flags.guess,
and suppressed the "oldstuff" directory in tags/v5.0.0


Modified: seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess	2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/flags.guess	2010-03-03 20:31:40 UTC (rev 16375)
@@ -26,10 +26,10 @@
         # Intel ifort Fortran90 for Linux
         #
         if test x"$FLAGS_CHECK" = x; then
-            FLAGS_CHECK="-O3" 
+            FLAGS_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         fi
         if test x"$FLAGS_NO_CHECK" = x; then
-            FLAGS_NO_CHECK="-O3" 
+            FLAGS_NO_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         fi
         # useful for debugging...
         #if test x"$FLAGS_CHECK" = x; then
@@ -38,7 +38,7 @@
         #fi
         #if test x"$FLAGS_NO_CHECK" = x; then
         #    # standard options (leave option -ftz, which is *critical* for performance)
-        #    FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz  -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
+        #    FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         #fi
         #
         # Intel Nehalem processor architecture, Intel compiler version 10.1
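Note on the hunks above: flags.guess only assigns FLAGS_CHECK and FLAGS_NO_CHECK when the caller has left them empty, so values exported in the environment take precedence over the restored defaults. A minimal sketch of that idiom follows; the function name pick_flags and the shortened flag string are illustrative assumptions, not part of flags.guess.

```shell
#!/bin/sh
# Sketch of the "set a default only when the caller left it empty" idiom.
# pick_flags is a hypothetical helper, not something defined in flags.guess.
pick_flags() {
    # $1: value supplied by the caller (may be empty)
    flags="$1"
    # x"$flags" = x holds only when $flags is empty; the leading "x"
    # keeps test from misparsing values that begin with a dash.
    if test x"$flags" = x; then
        flags="-O3 -fpe0 -ftz"   # abbreviated default, for illustration only
    fi
    printf '%s\n' "$flags"
}

pick_flags ""          # empty input: falls back to the default
pick_flags "-O0 -g"    # caller's value is kept unchanged
```

In practice this means something like FLAGS_CHECK="-O0 -g" set before running configure should bypass the defaults restored by this commit, assuming configure passes the variable through to flags.guess.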

Modified: seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90	2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/tags/v5.0.0/specfem3D.f90	2010-03-03 20:31:40 UTC (rev 16375)
@@ -764,12 +764,14 @@
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------
-! trivia about the programming style adopted here
 !
-! note 1: (it seems) for performance reasons, we will try to use as much from the stack memory as possible.
-!             stack memory is a place in computer memory where all the variables that are declared
+! trivia about the programming style adopted here:
+!
+! note 1: for performance reasons, we try to use as much from the stack memory as possible.
+!             This is done to avoid memory fragmentation and also to optimize performance.
+!             Stack memory is a place in computer memory where all the variables that are declared
 !             and initialized **before** runtime are stored. Our static array allocation will use that one.
-!             all variables declared within our main routine also will be stored on the stack.
+!             All variables declared within our main routine also will be stored on the stack.
 !
 !             the heap is the section of computer memory where all the variables created or initialized
 !             **at** runtime are stored. it is used for dynamic memory allocation.
@@ -788,34 +790,27 @@
 !             passing them along as arguments to the routine makes the code slower.
 !             it seems that this stack/heap criterion is more complicated.
 !
-!             another reason why modules are avoided, is to make the code thread safe.
+!             another reason why modules are avoided is to make the code thread safe.
 !             having different threads access the same data structure and modifying it at the same time
 !             would lead to problems. passing arguments is a way to avoid such complications.
 !
-!             nevertheless, it would be nice to test - where possible - , if using modules
-!             together with static arrays would perform as well as this.
-!             at least, it would make the code more elegant and less error prone...
+! note 2: Most of the computation time is spent
+!             inside the time loop (mainly in the compute_forces_crust_mantle_Dev() routine).
+!             Any code performance tuning will be most effective in there.
 !
-! note 2: in general, most of the computation time for our earthquake simulations is spent
-!             inside the time loop (mainly the compute_forces_crust_mantle_Dev() routine).
-!             any code performance tuning will be most effective in there.
-!
-! note 3: fortran is a code language that uses column-first ordering for arrays,
+! note 3: Fortran is a programming language that uses column-major ordering for arrays,
 !             e.g., it stores a(i,j) in this order: a(1,1),a(2,1),a(3,1),...,a(1,2),a(2,2),a(3,2),..
-!             it is therefor more efficient to have the inner-loop over i, and the outer loop over j
-!             for this reason, e.g. the indexing for the pre-computed sourcearrays changed
+!             it is therefore more efficient to have the inner loop over i and the outer loop over j.
 !
-! note 4: Deville routines help compilers to better vectorize the do-loops and
-!             for most compilers, will result in a significant speedup ( > 30%).
+! note 4: Deville et al. (2002) routines significantly reduce the total number of memory accesses
+!             required to perform matrix-matrix products at the spectral element level.
+!             For most compilers and hardware, this results in a significant speedup (30% or more, sometimes a factor of two).
 !
-! note 5: one common technique in computational science to help compilers
-!             enhance pipelining is loop unrolling. we do attempt this here in a very simple
-!             and straigthforward way. so don't be confused about the somewhat
-!             bewildering do-loop writing...
+! note 5: a common technique to help compilers enhance pipelining is loop unrolling. We do this here in a simple
+!             and straightforward way, so don't be confused by the way the do-loops are written.
 !
 ! note 6: whenever adding some new code, please make sure to use
-!             spaces rather than tabs. tabulators have different sizes in different editors
-!             and most of the time, it messes up the code's formating :(
+!             spaces rather than tabs. Tabulators are in principle not allowed in Fortran95.
 !
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------

Deleted: seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE	2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/UTILS/oldstuff/README_SPECFEM3D_GLOBE	2010-03-03 20:31:40 UTC (rev 16375)
@@ -1,333 +0,0 @@
-!=====================================================================
-!
-!          S p e c f e m 3 D  G l o b e  V e r s i o n  3 . 5
-!          --------------------------------------------------
-!
-!                 Dimitri Komatitsch and Jeroen Tromp
-!    Seismological Laboratory - California Institute of Technology
-!        (c) California Institute of Technology July 2004
-!
-! This program is free software; you can redistribute it and/or modify
-! it under the terms of the GNU General Public License as published by
-! the Free Software Foundation; either version 2 of the License, or
-! (at your option) any later version.
-!
-! This program is distributed in the hope that it will be useful,
-! but WITHOUT ANY WARRANTY; without even the implied warranty of
-! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-! GNU General Public License for more details.
-!
-! You should have received a copy of the GNU General Public License along
-! with this program; if not, write to the Free Software Foundation, Inc.,
-! 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
-!
-!=====================================================================
-!
-! United States Government Sponsorship Acknowledged.
-!
-
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-+++++++++++++ NOTES ON USING THE SPECFEM3D PACKAGE +++++++++++++
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-  If you use this code for your own research, please send an email
-  to Jeroen Tromp <jtromp at caltech.edu> for information, and cite:
-
-  @ARTICLE{KoRiTr02,
-  author={D. Komatitsch and J. Ritsema and J. Tromp},
-  year=2002,
-  title={The Spectral-Element Method, {B}eowulf Computing, and Global Seismology},
-  journal={Science},
-  volume=298,
-  pages={1737-1742}}
-
-  @ARTICLE{KoTr02a,
-  author={D. Komatitsch and J. Tromp},
-  year=2002,
-  title={Spectral-Element Simulations of Global Seismic Wave Propagation{-I. V}alidation},
-  journal={Geophys. J. Int.},
-  volume=149,
-  pages={390-412}}
-
-  @ARTICLE{KoTr02b,
-  author={D. Komatitsch and J. Tromp},
-  year=2002,
-  title={Spectral-Element Simulations of Global Seismic Wave Propagation{-II. 3-D} Models, Oceans, Rotation, and Self-Gravitation},
-  journal={Geophys. J. Int.},
-  volume=150,
-  pages={303-318}}
-
-  @ARTICLE{KoTr99,
-  author={D. Komatitsch and J. Tromp},
-  year=1999,
-  title={Introduction to the spectral-element method for 3-{D} seismic wave propagation},
-  journal={Geophys. J. Int.},
-  volume=139,
-  pages={806-822}}
-
-  If you use 3-D model S20RTS, please cite
-
-  @ARTICLE{RiVa00,
-  author={J. Ritsema and H. J. {Van Heijst}},
-  year=2000,
-  title={Seismic imaging of structural heterogeneity in {E}arth's mantle: Evidence for large-scale mantle flow},
-  journal={Science Progress},
-  volume=83,
-  pages={243-259}}
-
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-REFERENCE FRAME - CONVENTION:
-
-The code uses the following convention for the reference frame:
-
- - X axis is East
- - Y axis is North
- - Z axis is up
-
-Note that this convention is different from both the Aki-Richards convention
-and the Harvard CMT convention.
-
-Let us recall that the Aki-Richards convention is:
-
- - X axis is North
- - Y axis is East
- - Z axis is down
-
-and that the Harvard CMT convention is:
-
- - X axis is South
- - Y axis is East
- - Z axis is up
-
-
-PARAMETERS TO CHANGE ON DIFFERENT MACHINES:
-
-- All the codes in the package are written in Fortran90, and also
-    conform strictly to the Fortran95 standard. They do not use any
-    obsolescent or obsolete feature of f77.
-- Use the appropriate compiler flags in the Makefile.
-    Set F90, MPIF90, FLAGS_NO_CHECK, FLAGS_CHECK and MPI_FLAGS.
-- In constants.h set FIX_UNDERFLOW_PROBLEM flag
-    (need to fix underflow trapping on some machines, e.g., Pentium processors)
-- In constants.h set LOCAL_PATH_IS_ALSO_GLOBAL flag.
-    On clusters (e.g., Beowulfs), set to .false. in most cases.
-    In such a case, also customize global path to local
-    files in create_serial_name_database.f90 ("20 format ...").
-    This flag is used only when one checks the mesh with the serial codes
-    ("xcheck_buffers_1D" etc.), ignore it if you do not plan to use them
-- In DATA/Par_file set LOCAL_PATH (this is where databases are written and read,
-    and also where seismograms will be stored). The seismogram files will be
-    named *.semd (for Spectral Element Method - Displacement)
-- In precision.h choose single versus double precision (double precision
-    doubles the memory requirements for the solver but may be slightly faster).
-    Use single precision except if you have a very large machine
-    with a lot of memory.
-- Also, in constants.h choose the CUSTOM_REAL size depending on
-    single or double precision
-- When running on an SGI, add "setenv TRAP_FPE OFF" to your .cshrc file
-    *before* compiling, in order to turn underflow trapping off
-- When running on an IBM (e.g., an SP or a Power4), one needs to change
-    all the filename extensions from *.f90 to *.f ; a script is provided
-    in DATA/util/change_names_IBM to do that. One also needs to change
-    all the .f90 in the Makefile to .f
-
-DIRECTORIES:
-
-- make subdirectories obj and OUTPUT_FILES in the directory
-    with the source code (also done automatically by the go_mesher script below)
-
-SCRIPTS:
-
-- go_mesher runs the mesher
-    (need to set the correct mpirun command at the end of the script)
-    (if running on a Beowulf, the script assumes that the nodes are named n001,
-     n002 etc. If it is not the case, change the tr -d 'n' line in the script)
-- go_solver runs the solver
-    (need to set the correct mpirun command at the end of the script)
-- runall compiles and runs both mesher and solver
-
-MESHER (meshfem3D):
-
-- The mesher meshfem3D uses NPROC_XI * NPROC_ETA * NCHUNKS nodes
-    (The full globe consists of NCHUNKS = 6 chunks, and each chunk is divided
-     in NPROC_XI * NPROC_ETA slices)
-- For topological reasons related to the mesh, NPROC_XI and NPROC_ETA
-    must be equal when NCHUNKS = 6 or NCHUNKS = 3. The option of having
-    different values for NPROC_XI and NPROC_ETA is available only when
-    NCHUNKS = 1 (1/6th of the sphere) or NCHUNKS = 2.
-- Note that NPROC_XI = 1 and/or NPROC_ETA = 1 is valid. Therefore the
-    smallest number of nodes needed to run the code is NCHUNKS
-    (NCHUNKS chunks * 1 * 1). If you have less than NCHUNKS nodes
-    on your machine, you can start multiple MPI processes on each node
-    to emulate a larger machine.
-- NEX_XI and NEX_ETA need to be 16 * multiple of NPROC_XI and NPROC_ETA,
-    respectively. So in theory for NPROC_XI = NPROC_ETA = 1 NEX_XI = NEX_ETA = 16, 32, 48, ...,
-    384 and 400 work, for  NPROC_XI = NPROC_ETA = 2 NEX_XI = NEX_ETA = 32, 64, 96, ...,
-    352 and 384 work, for NPROC_XI = NPROC_ETA = 3 NEX_XI = NEX_ETA = 48, 96, 144,
-    192, 240, 288, 336 and 384 work, for NPROC_XI = NPROC_ETA = 4 NEX_XI = NEX_ETA = 64,
-    128, 192, 256, 320 and 384, for NPROC_XI = NPROC_ETA = 5 NEX_XI = NEX_ETA = 80, 160,
-    240, 320 and 400, and for NPROC_XI = NPROC_ETA = 6 NEX_XI = NEX_ETA = 96, 192, 288
-    and 384. In practice though, the curvature of the Earth cannot be honored if one
-    uses too few elements. By trial and error, we found that NEX_XI and NEX_ETA
-    should not be smaller than 64 typically. Smaller values are likely to give spectral
-    elements with a negative Jacobian, in which case the mesher will exit with an error message.
-
-- set all the parameters in DATA/Par_file, in particular the following:
-
-! shape of the first chunk (not used if full sphere with six chunks)
-  ANGULAR_WIDTH_XI_IN_DEGREES   ! angular size of the first chunk
-  ANGULAR_WIDTH_ETA_IN_DEGREES
-  CENTER_LATITUDE_IN_DEGREES    ! location of its center
-  CENTER_LONGITUDE_IN_DEGREES
-  GAMMA_ROTATION_AZIMUTH        ! angle of rotation of the first chunk
-
-  MODEL                  ! Earth model to use
-
-  OCEANS                 ! to incorporate the effects of the oceans (cheap)
-  ELLIPTICITY            ! to incorporate ellipticity (no cost)
-  TOPOGRAPHY             ! to add topography and bathymetry (no cost)
-  GRAVITY                ! to incorporate gravity (Cowling approximation, cheap)
-  ROTATION               ! to incorporate Coriolis effects (cheap)
-  ATTENUATION            ! to include attenuation (fairly expensive)
-
-  ABSORBING_CONDITIONS   ! absorbing boundary conditions (cheap)
-
-  SAVE_AVS_DX_MESH_FILES ! save mesh files for AVS users, www.avs.com,
-    or OpenDX users, www.opendx.org). Do not use if you do not have AVS
-    or OpenDX, because this option creates large files.
-
-- Compile the mesher ("make meshfem3D") and run it with the go_mesher script
-- Mesher output is provided in the OUTPUT_FILES directory in output_mesher.txt
-    (output can be directed to the screen instead by uncommenting a line
-     in constants.h:
-       ! uncomment this to write messages to the screen
-       ! integer, parameter :: IMAIN = ISTANDARD_OUTPUT )
-- For a given model, set of nodes and set of parameters in DATA/Par_file,
-    one only needs to run the mesher once and for all, even if one wants
-    to run several simulations with different sources and/or receivers
-    (the source and receiver information is used in the solver only)
-- Some useful statistics about the mesh created are saved in the parameter file
-    for the solver, type "more OUTPUT_FILES/values_from_mesher.h" to see them.
-
-CHECKING THE MPI BUFFERS (optional, after running the mesher):
-
-- Use the four serial codes check_buffers_1D, check_buffers_2D,
-    check_buffers_faces_chunks and check_buffers_corners_chunks
-    to check all the MPI buffers generated by the mesher
-    (e.g., "make check_buffers_1D" and then "xcheck_buffers_1D")
-
-CHECKING THE MESH (optional, after running the mesher):
-
-- Use the serial code check_mesh_quality_AVS_DX
-    ("make check_mesh_quality_AVS_DX" and then "xcheck_mesh_quality_AVS_DX")
-    to generate an AVS output file ("AVS_meshquality.inp" in AVS UCD format)
-    or an OpenDX output file ("DX_meshquality.dx")
-    that can be used to investigate mesh quality, e.g., skewness of elements,
-    and a Gnuplot histogram ("mesh_quality_histogram.txt") that can
-    be plotted with gnuplot ("gnuplot plot_mesh_quality_histogram.gnu")
-
-- Use the serial code combine_AVS_DX
-    ("make combine_AVS_DX" and then "xcombine_AVS_DX")
-    to generate AVS output files (in AVS UCD format) or OpenDX output files
-    showing the mesh, the MPI partition (slices), the NCHUNKS chunks, the
-    source and receiver location etc. Use the AVS UCD files
-    AVS_continent_boundaries.inp and AVS_plate_boundaries.inp,
-    or the OpenDX files DX_continent_boundaries.dx and DX_plate_boundaries.dx
-    for reference.
-
-SOLVER (specfem3D):
-
-- For reasons of speed, the solver uses static memory allocation. Therefore it
-    needs to be recompiled ("make clean" and "make specfem3D") every time
-    one reruns the mesher. The mesher uses dynamic allocation only,
-    and does not need to be recompiled.
-- To compile the solver, one needs a file called
-    "OUTPUT_FILES/values_from_mesher.h" which contains the right parameters
-    describing the static size of the arrays.
-    This file is created by the mesher (meshfem3D.f90).
-    This means that one needs to run the mesher before being able to compile
-    the solver. For people who want to compile the mesher and the solver at
-    the same time, a small program called create_header_file.f90
-    is provided, which can be used to create "OUTPUT_FILES/values_from_mesher.h"
-    before running the mesher (type "make create_header_file" to compile it
-    and "xcreate_header_file" to run it). This is useful for people who want
-    to compile all the codes first and then submit the mesher and the solver
-    to a batch management system.
-- The solver also needs NPROC_XI * NPROC_ETA * NCHUNKS nodes to run
-- The solver needs the DATA/CMTSOLUTION file for the source and the
-    DATA/STATIONS file for the list of stations (CMTSOLUTION files may be
-    obtained directly from the Harvard CMT web page, www.seismology.harvard.edu)
-- Set the "time shift" in the CMTSOLUTION file to 0.0
-    (the solver will not run otherwise)
-- To simulate a delta source-time function, set "half duration" in
-    the CMTSOLUTION file to 0.0. If "half duration" is not zero,
-    the code will use a Gaussian (not too different from triangular)
-    source-time function with half-width "half duration". We prefer to run
-    the solver with "half duration" set to zero and convolve after the fact,
-    because this way it's easy to use a variety of source-time functions.
-    Use the serial code convolve_source_timefunction.f90 and the script
-    convolve_source_timefunction.csh for this purpose. (Set the parameter "hdur"
-    in convolve_source_timefunction.csh to the desired half-duration.)
-    (type "make convolve_source_timefunction" to compile the code).
-- To simulate multiple events, set the parameter NSOURCES in the DATA/Par_file
-    to the desired number. Provide a CMTSOLUTION file that has NSOURCES entries,
-    one for each CMT solution (i.e., concatenate NSOURCES CMTSOLUTION files
-    to a single CMTSOLUTION file). At least one entry should have a zero "time shift",
-    and all the other entries should have non-negative "time shift". Each event
-    can have its own half duration, latitude, longitude, depth and moment tensor.
-    This feature can also be used to mimic the directivity associated with a
-    finite source.
-- Solver output is provided in the OUTPUT_FILES directory in output_solver.txt
-    (output can be directed to the screen instead by uncommenting a line
-     in constants.h:
-       ! uncomment this to write messages to the screen
-       ! integer, parameter :: IMAIN = ISTANDARD_OUTPUT )
-- There are two different versions of the main solver routines.
-    Type "copy_files_regular.csh" to use the regular version,
-    and "copy_files_inlined_5.csh" to use the inlined version, which may be
-    faster on some machines (you can try both once to determine which
-    code gives the fastest result on your machine). Note that the inlined
-    version is written specifically for polynomial degree NGLL = 5
-    in constants.h, and cannot run with any other value, while the regular
-    version can. Note also that the two versions implement
-    the exact same calculations, and therefore give the same results
-    down to the roundoff error, which can be different. It is only
-    the implementation that differs between the two versions.
-- If you have a fast machine, set NTSTEP_BETWEEN_OUTPUT_INFO
-    to a relatively high value (e.g. at least 100, or even 1000 or more)
-    to avoid writing to the output text files too often. Same thing
-    with NTSTEP_BETWEEN_OUTPUT_SEISMOS.
-- On clusters (e.g., Beowulfs) the seismogram files are distributed on the
-    local disks (path LOCAL_PATH in the DATA/Par_file) and need to be gathered
-    at the end of the simulation.
-- For the same model, rerun the solver for different events by changing the
-    CMTSOLUTION file, or for different stations by changing the STATIONS file.
-    There is no need to rerun the mesher. It is best to include as many stations
-    as possible, since this does not add to the cost of the simulation.
-
-MOVIE OF THE RESULTS:
-
-- Use create_movie_AVS_DX.f90 ("make create_movie_AVS_DX") to create
-    a movie of surface displacement (radial component) or of the entire 3D wave
-    field. The movie can be saved in OpenDX or AVS format. Set parameters
-    MOVIE_SURFACE, MOVIE_VOLUME, and NTSTEP_BETWEEN_FRAMES in the Par_file.
-    Remember to use a DATA/CMTSOLUTION source file with a half-duration
-    hdur > 0, otherwise you will get a movie corresponding to a Heaviside
-    source, with a lot of high-frequency noise. Note that this option
-    creates large files!
-
-Note: The Gauss-Lobatto subroutines in gll_library.f90 are based in part on
-      software libraries from M.I.T., Department of Mechanical Engineering.
-
-Note: The non-structured global numbering software was provided
-      by Paul F. Fischer.
-
-Note: Subroutines from "Numerical Recipes: The Art of Scientific Computing"
-      by W. H. Press et al., Cambridge University Press, are used in
-      numerical_recipes.f90. The user must acquire an official
-      Numerical Recipes license to run them.
-
-Note: OpenDX is open-source based on IBM Data Explorer, www.opendx.org
-      AVS is a trademark of Advanced Visualization Systems, www.avs.com
-
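Note on the removed README: its MESHER section states two arithmetic constraints, namely that the code runs on NPROC_XI * NPROC_ETA * NCHUNKS processes and that NEX_XI (respectively NEX_ETA) must be a multiple of 16 * NPROC_XI (respectively 16 * NPROC_ETA). A small sketch of checking them before launching a run is given below; the helper names total_procs and nex_is_valid are hypothetical and do not exist in the package.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the constraints quoted from the README.

# total_procs NPROC_XI NPROC_ETA NCHUNKS: print the number of MPI processes.
total_procs() {
    echo $(( $1 * $2 * $3 ))
}

# nex_is_valid NEX NPROC: succeed when NEX is a multiple of 16 * NPROC.
nex_is_valid() {
    test $(( $1 % (16 * $2) )) -eq 0
}

total_procs 2 2 6          # prints 24
nex_is_valid 64 2 && echo "NEX=64 with NPROC=2: ok"
nex_is_valid 48 2 || echo "NEX=48 with NPROC=2: not a multiple of 32"
```

This only covers divisibility. The README's further advice that NEX_XI and NEX_ETA should typically not be smaller than 64 is an empirical limit and is not checked here.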

Modified: seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess	2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/flags.guess	2010-03-03 20:31:40 UTC (rev 16375)
@@ -26,10 +26,10 @@
         # Intel ifort Fortran90 for Linux
         #
         if test x"$FLAGS_CHECK" = x; then
-            FLAGS_CHECK="-O3" 
+            FLAGS_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         fi
         if test x"$FLAGS_NO_CHECK" = x; then
-            FLAGS_NO_CHECK="-O3" 
+            FLAGS_NO_CHECK="-O3 -fpe0 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         fi
         # useful for debugging...
         #if test x"$FLAGS_CHECK" = x; then
@@ -38,7 +38,7 @@
         #fi
         #if test x"$FLAGS_NO_CHECK" = x; then
         #    # standard options (leave option -ftz, which is *critical* for performance)
-        #    FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz  -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
+        #    FLAGS_NO_CHECK="-O3 -xP -fpe3 -ftz -align sequence -assume byterecl -vec-report0 -std95 -implicitnone -warn truncated_source -warn argument_checking -warn unused -warn declarations -warn alignments -warn ignore_loc -warn usage" # -mcmodel=medium
         #fi
         #
         # Intel Nehalem processor architecture, Intel compiler version 10.1

Modified: seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90
===================================================================
--- seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90	2010-03-03 20:09:03 UTC (rev 16374)
+++ seismo/3D/SPECFEM3D_GLOBE/trunk/specfem3D.f90	2010-03-03 20:31:40 UTC (rev 16375)
@@ -764,12 +764,14 @@
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------
-! trivia about the programming style adopted here
 !
-! note 1: (it seems) for performance reasons, we will try to use as much from the stack memory as possible.
-!             stack memory is a place in computer memory where all the variables that are declared
+! trivia about the programming style adopted here:
+!
+! note 1: for performance reasons, we try to use as much from the stack memory as possible.
+!             This is done to avoid memory fragmentation and also to optimize performance.
+!             Stack memory is a place in computer memory where all the variables that are declared
 !             and initialized **before** runtime are stored. Our static array allocation will use that one.
-!             all variables declared within our main routine also will be stored on the stack.
+!             All variables declared within our main routine also will be stored on the stack.
 !
 !             the heap is the section of computer memory where all the variables created or initialized
 !             **at** runtime are stored. it is used for dynamic memory allocation.
@@ -788,34 +790,27 @@
 !             passing them along as arguments to the routine makes the code slower.
 !             it seems that this stack/heap criterion is more complicated.
 !
-!             another reason why modules are avoided, is to make the code thread safe.
+!             another reason why modules are avoided is to make the code thread safe.
 !             having different threads access the same data structure and modifying it at the same time
 !             would lead to problems. passing arguments is a way to avoid such complications.
 !
-!             nevertheless, it would be nice to test - where possible - , if using modules
-!             together with static arrays would perform as well as this.
-!             at least, it would make the code more elegant and less error prone...
+! note 2: Most of the computation time is spent
+!             inside the time loop (mainly in the compute_forces_crust_mantle_Dev() routine).
+!             Any code performance tuning will be most effective in there.
 !
-! note 2: in general, most of the computation time for our earthquake simulations is spent
-!             inside the time loop (mainly the compute_forces_crust_mantle_Dev() routine).
-!             any code performance tuning will be most effective in there.
-!
-! note 3: fortran is a code language that uses column-first ordering for arrays,
+! note 3: Fortran is a programming language that uses column-major ordering for arrays,
 !             e.g., it stores a(i,j) in this order: a(1,1),a(2,1),a(3,1),...,a(1,2),a(2,2),a(3,2),..
-!             it is therefor more efficient to have the inner-loop over i, and the outer loop over j
-!             for this reason, e.g. the indexing for the pre-computed sourcearrays changed
+!             it is therefore more efficient to have the inner loop over i and the outer loop over j.
 !
-! note 4: Deville routines help compilers to better vectorize the do-loops and
-!             for most compilers, will result in a significant speedup ( > 30%).
+! note 4: Deville et al. (2002) routines significantly reduce the total number of memory accesses
+!             required to perform matrix-matrix products at the spectral element level.
+!             For most compilers and hardware, this results in a significant speedup (30% or more, sometimes a factor of two).
 !
-! note 5: one common technique in computational science to help compilers
-!             enhance pipelining is loop unrolling. we do attempt this here in a very simple
-!             and straigthforward way. so don't be confused about the somewhat
-!             bewildering do-loop writing...
+! note 5: a common technique to help compilers enhance pipelining is loop unrolling. We do this here in a simple
+!             and straightforward way, so don't be confused by the way the do-loops are written.
 !
 ! note 6: whenever adding some new code, please make sure to use
-!             spaces rather than tabs. tabulators have different sizes in different editors
-!             and most of the time, it messes up the code's formating :(
+!             spaces rather than tabs. Tabulators are in principle not allowed in Fortran95.
 !
 !-------------------------------------------------------------------------------------------------
 !-------------------------------------------------------------------------------------------------