[cig-commits] [commit] devel: added the framework for the future option NUMBER_OF_SIMULTANEOUS_RUNS that I will add to setup/constants.h.in (off by default, i.e. equal to 1 by default) to have the ability to run several independent earthquakes in parallel from the same big job (a5856d0)

cig_noreply at geodynamics.org
Thu Jul 17 10:11:23 PDT 2014


Repository : https://github.com/geodynamics/specfem3d

On branch  : devel
Link       : https://github.com/geodynamics/specfem3d/compare/859403fb62b9ea358a24cbd9c87012b37b4ccbfa...98a6f2f211a3dd0f7c25fb3f1a6b7983fbd839ae

>---------------------------------------------------------------

commit a5856d090063c2d2280818fc6c936fdc6e846a54
Author: Dimitri Komatitsch <komatitsch at lma.cnrs-mrs.fr>
Date:   Thu Jul 17 18:49:16 2014 +0200

    added the framework for the future option NUMBER_OF_SIMULTANEOUS_RUNS that I will add to setup/constants.h.in (off by default, i.e. equal to 1 by default) to have the ability to run several independent earthquakes in parallel from the same big job


>---------------------------------------------------------------

a5856d090063c2d2280818fc6c936fdc6e846a54
 doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.pdf | Bin 12651502 -> 12647247 bytes
 doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.tex | 197 ++--------------
 setup/constants.h.in                           |  24 ++
 src/shared/parallel.f90                        | 306 ++++++++++++++++---------
 4 files changed, 243 insertions(+), 284 deletions(-)

diff --git a/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.pdf b/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.pdf
index 297dbfc..3a28034 100644
Binary files a/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.pdf and b/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.pdf differ
diff --git a/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.tex b/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.tex
index 0b16863..daf4531 100644
--- a/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.tex
+++ b/doc/USER_MANUAL/manual_SPECFEM3D_Cartesian.tex
@@ -2158,6 +2158,23 @@ file. There is no need to rerun the \texttt{xgenerate\_databases}
 executable. Of course it is best to include as many stations as possible,
 since this does not add to the cost of the simulation.
 
+We have also added the ability to run several calculations (several earthquakes)
+in an embarrassingly parallel fashion from within the same run;
+this can be useful when using a very large supercomputer to compute
+many earthquakes in a catalog, in which case it can be better from
+a batch job submission point of view to start fewer but much larger jobs,
+each of them computing several earthquakes in parallel.
+To turn that option on, set the parameter NUMBER\_OF\_SIMULTANEOUS\_RUNS
+to a value greater than 1 in the file setup/constants.h.in before
+configuring and compiling the code.
+When that option is on, the number of processor cores used to start
+the code in the batch system must of course be a multiple of NUMBER\_OF\_SIMULTANEOUS\_RUNS,
+and all the individual runs must use the same number of processor cores
+(which, as usual, is NPROC in the input file DATA/Par\_file);
+the total number of processor cores to request from the batch system
+is thus NUMBER\_OF\_SIMULTANEOUS\_RUNS * NPROC.
+All the runs to perform must be placed in directories called run0001, run0002, run0003, and so on (with exactly four digits).
+
 
 \chapter{\label{cha:Fault-Kinematics-Dynamics}Kinematic and dynamic fault
 sources}
@@ -3293,9 +3310,7 @@ see Southern California ShakeMovie\textregistered{}\urlwithparentheses{www.shake
 When running xspecfem3D with the \texttt{\small MOVIE\_VOLUME}{\small{}
 flag turned on, the code outputs several files in }\texttt{\small LOCAL\_PATH}{\small{}
 specified in the main }\texttt{\small Par\_file}{\small , e.g. in
-directory }\texttt{\small OUTPUT\_FILES/DATABASES\_MPI}{\small . %As the files can be very large, there are several flags in the \texttt{\small Par\_file}
-%that control the region in space and time that is saved. These are:
-%\texttt{\small NTSTEP\_BETWEEN\_FRAMES}
+directory }\texttt{\small OUTPUT\_FILES/DATABASES\_MPI}{\small .
 The output is saved by each processor at the time interval specified
 by }\texttt{\small NTSTEP\_BETWEEN\_FRAMES}{\small . For all domains,
 the velocity field is output to files: }\texttt{\small proc??????\_velocity\_X\_it??????.bin}{\small ,
@@ -3548,176 +3563,6 @@ Examples of job scripts can be found in the \texttt{\small utils/Cluster/}{\smal
 directory and can straightforwardly be modified and adapted to meet
 more specific running needs.}{\small \par}
 
-We describe here in some detail a job submission procedure for the
-Caltech 1024-node cluster, CITerra, under the LSF scheduling system.
-We consider the submission of a regular forward simulation using the
-internal mesher to create mesh partitions. The two main scripts are
-\texttt{\small run\_lsf.bash}{\small , which compiles the Fortran
-code and submits the job to the scheduler, and }\texttt{\small go\_mesher\_solver\_lsf\_basin.forward}{\small ,
-which contains the instructions that make up the job itself. These
-scripts can be found in }\texttt{\small utils/Cluster/lsf/}{\small{}
-directory}{\small \par}
-
-
-\section{Job submission \texttt{run\_lsf.bash}}
-
-This script first sets the job queue to be `normal'. It then compiles
-the mesher, database generator and solver together, figures out the
-number of processors required for this simulation from the \texttt{DATA/Par\_file},
-and submits the LSF job.
-\begin{lyxcode}
-\#!/bin/bash
-
-\#~use~the~normal~queue~unless~otherwise~directed~queue=\textquotedbl{}-q~normal\textquotedbl{}~
-
-if~{[}~\$\#~-eq~1~{]};~then
-
-~~~~~~~~echo~\textquotedbl{}Setting~the~queue~to~\$1\textquotedbl{}
-
-~~~~~~~~queue=\textquotedbl{}-q~\$1\textquotedbl{}~
-
-fi~~\\
-~~~\\
-~\#~compile~the~mesher~and~the~solver~
-
-d=`date'~echo~\textquotedbl{}Starting~compilation~\$d\textquotedbl{}~
-
-make~clean~
-
-make~xmeshfem3D~
-
-make~xgenerate\_databases~
-
-make~xspecfem3D~
-
-d=`date'~
-
-echo~\textquotedbl{}Finished~compilation~\$d\textquotedbl{}~~\\
-~~~\\
-~\#~get~total~number~of~nodes~needed~for~solver~
-
-NPROC=`grep~NPROC~DATA/Par\_file~|~cut~-c~34-~'~~\\
-~~~\\
-
-
-\#~compute~total~number~of~nodes~needed~for~mesher~
-
-NPROC\_XI=`grep~NPROC\_XI~DATA/meshfem3D\_files/Mesh\_Par\_file~|~cut~-c~34-~'~~~~\\
-
-
-NPROC\_ETA=`grep~NPROC\_ETA~DATA/meshfem3D\_files/Mesh\_Par\_file~|~cut~-c~34-~'~~~~\\
-~\#~total~number~of~nodes~is~the~product~of~the~values~read~
-
-numnodes=\$((~\$NPROC\_XI~{*}~\$NPROC\_ETA~))~~\\
-~~~\\
-
-
-\#~checks~total~number~of~nodes~
-
-if~{[}~\$numnodes~-neq~\$NPROC~{]};~then
-
-~~~~~~~~echo~\textquotedbl{}error~number~of~procs~mismatch\textquotedbl{}
-
-~~~~~~~~exit~
-
-fi~~\\
-
-
-~~\\
-~echo~\textquotedbl{}Submitting~job\textquotedbl{}~
-
-bsub~\$queue~-n~\$numnodes~-W~60~-K~<go\_mesher\_solver\_lsf.forward~
-\end{lyxcode}
-
-\section{Job script \texttt{go\_mesher\_solver\_lsf.forward}}
-
-This script describes the job itself, including setup steps that can
-only be done once the scheduler has assigned a job-ID and a set of
-compute nodes to the job, the \texttt{run\_lsf.bash} commands used
-to run the mesher, database generator and the solver, and calls to
-scripts that collect the output seismograms from the compute nodes
-and perform clean-up operations.
-\begin{enumerate}
-\item First the script directs the scheduler to save its own output and
-output from \texttt{stdout} into ~\\
- \texttt{\small OUTPUT\_FILES/\%J.o}{\small , where }\texttt{\small \%J}{\small{}
-is short-hand for the job-ID; it also tells the scheduler what version
-of }\texttt{\small mpich}{\small{} to use (}\texttt{\small mpich\_gm}{\small )
-and how to name this job (}\texttt{\small go\_mesher\_solver\_lsf}{\small ). }{\small \par}
-\item The script then creates a list of the nodes allocated to this job
-by echoing the value of a dynamically set environment variable \texttt{LSB\_MCPU\_HOSTS}
-and parsing the output into a one-column list using the Perl script
-\texttt{utils/Cluster/lsf/remap\_lsf\_machines.pl}. It then creates
-a set of scratch directories on these nodes (\texttt{\small /scratch/}{\small ~}\\
-{\small{} }\texttt{\small \$USER/DATABASES\_MPI}{\small ) to be used
-as the }\texttt{\small LOCAL\_PATH}{\small{} for temporary storage
-of the database files. The scratch directories are created using }\texttt{\small shmux}{\small ,
-a shell multiplexor that can execute the same commands on many hosts
-in parallel. }\texttt{\small shmux}{\small{} is available from Shmux
-\urlwithparentheses{web.taranis.org/shmux/}. Make sure that the
-}\texttt{\small LOCAL\_PATH}{\small{} parameter in }\texttt{\small DATA/Par\_file}{\small{}
-is also set properly. }{\small \par}
-\item The next portion of the script launches the mesher, database generator
-and then the solver using \texttt{run\_lsf.bash}.
-\item The final portion of the script collects the seismograms and performs
-clean up on the nodes, using the Perl scripts \texttt{collect\_seismo\_lsf\_multi.pl}
-and \texttt{cleanmulti.pl}. \end{enumerate}
-\begin{lyxcode}
-\#!/bin/bash~-v~
-
-\#BSUB~-o~OUTPUT\_FILES/\%J.o~
-
-\#BSUB~-a~mpich\_gm~
-
-\#BSUB~-J~go\_mesher\_solver\_lsf~~\\
-~~~\\
-~\#~set~up~local~scratch~directories
-
-BASEMPIDIR=/scratch/\$USER/DATABASES\_MPI
-
-mkdir~-p~OUTPUT\_FILES~
-
-echo~\textquotedbl{}\$LSB\_MCPU\_HOSTS\textquotedbl{}~>~OUTPUT\_FILES/lsf\_machines~
-
-echo~\textquotedbl{}\$LSB\_JOBID\textquotedbl{}~>~OUTPUT\_FILES/jobid
-
-remap\_lsf\_machines.pl~OUTPUT\_FILES/lsf\_machines~>~OUTPUT\_FILES/machines
-
-shmux~-M50~-Sall~-c~\textquotedbl{}rm~-r~-f~/scratch/\$USER;~\textbackslash{}
-
-~~~~~~mkdir~-p~/scratch/\$USER;~mkdir~-p~\$BASEMPIDIR\textquotedbl{}~\textbackslash{}
-
-~~~~~~-~<~OUTPUT\_FILES/machines~>/dev/null~~\\
-~~~\\
-~\#~run~the~specfem~program
-
-current\_pwd=\$PWD~~\\
-
-
-cd~bin/~~\\
-
-
-run\_lsf.bash~-{}-gm-no-shmem~-{}-gm-copy-env~\$current\_pwd/xmeshfem3D~~\\
-
-
-run\_lsf.bash~-{}-gm-no-shmem~-{}-gm-copy-env~\$current\_pwd/xgenerate\_databases~~\\
-
-
-run\_lsf.bash~-{}-gm-no-shmem~-{}-gm-copy-env~\$current\_pwd/xspecfem3D~~\\
-~~~\\
-~\#~collect~seismograms~and~clean~up
-
-cd~current\_pwd/~~~\\
-
-
-mkdir~-p~SEM
-
-cd~SEM/~
-
-collect\_seismo.pl~../OUTPUT\_FILES/lsf\_machines
-
-cleanbase.pl~../OUTPUT\_FILES/machines
-\end{lyxcode}
 
 \chapter{{\normalsize \label{cha:-Changing-the}} Changing the Model}
 
@@ -3727,12 +3572,8 @@ that replace existing subroutines in the \texttt{SPECFEM3D Cartesian}
 package. Note that \texttt{SPECFEM3D Cartesian} can handle Earth models
 with material properties that vary within each spectral element.
 
-%% magnoni 6/12
-
-
 
-\section{{\normalsize \label{sec:Using-tomographic}}Using external tomographic
-Earth models}
+\section{{\normalsize \label{sec:Using-tomographic}}Using external tomographic Earth models}
 
 To implement your own external tomographic model(s), you must provide
 your own external tomography file(s), and choose between two possible
diff --git a/setup/constants.h.in b/setup/constants.h.in
index 7529b87..ec4369f 100644
--- a/setup/constants.h.in
+++ b/setup/constants.h.in
@@ -46,6 +46,30 @@
 
 !----------- parameters that can be changed by the user -----------
 
+!! DK DK July 2014, CNRS Marseille, France:
+!! DK DK added the ability to run several calculations (several earthquakes)
+!! DK DK in an embarrassingly parallel fashion from within the same run;
+!! DK DK this can be useful when using a very large supercomputer to compute
+!! DK DK many earthquakes in a catalog, in which case it can be better from
+!! DK DK a batch job submission point of view to start fewer but much larger jobs,
+!! DK DK each of them computing several earthquakes in parallel.
+!! DK DK To turn that option on, set the parameter NUMBER_OF_SIMULTANEOUS_RUNS
+!! DK DK to a value greater than 1 in the file setup/constants.h.in before
+!! DK DK configuring and compiling the code.
+!! DK DK To implement that, we create NUMBER_OF_SIMULTANEOUS_RUNS MPI sub-communicators,
+!! DK DK each of them labeled "my_local_mpi_comm_world", and we use them
+!! DK DK in all the routines in "src/shared/parallel.f90", except in MPI_ABORT(), because in that case
+!! DK DK we need to keep the communicator of the entire run (i.e. MPI_COMM_WORLD).
+!! DK DK When that option is on, the number of processor cores used to start
+!! DK DK the code in the batch system must of course be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS,
+!! DK DK and all the individual runs must use the same number of processor cores
+!! DK DK (which, as usual, is NPROC in the input file DATA/Par_file);
+!! DK DK the total number of processor cores to request from the batch system
+!! DK DK is thus NUMBER_OF_SIMULTANEOUS_RUNS * NPROC.
+!! DK DK All the runs to perform must be placed in directories called run0001, run0002, run0003, and so on
+!! DK DK (with exactly four digits).
+  integer, parameter :: NUMBER_OF_SIMULTANEOUS_RUNS = 1
+
 ! set to .false.  if running on a Beowulf-type machine with local disks
 ! set to .true. if running on a shared-memory machine with common file system
 ! if running on a Beowulf, also modify name of nodes in filter_machine_file.f90
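
As a quick illustration of the rule spelled out in the new comment block of setup/constants.h.in above (and in the paragraph added to the manual): the total number of processor cores to request from the batch system is NUMBER_OF_SIMULTANEOUS_RUNS * NPROC, and the run directories carry exactly four digits. The following stand-alone Fortran sketch is not part of this commit; the values 4 and 96 are made up for the example, and the i4.4 edit descriptor produces the run0001, run0002, ... naming.

  program simultaneous_runs_example

! illustration only: in the real code NUMBER_OF_SIMULTANEOUS_RUNS comes from
! setup/constants.h.in and NPROC from the input file DATA/Par_file
  implicit none
  integer, parameter :: NUMBER_OF_SIMULTANEOUS_RUNS = 4
  integer, parameter :: NPROC = 96
  integer :: irun
  character(len=16) :: dirname

! total number of processor cores to request from the batch system
  print *,'processor cores to request: ',NUMBER_OF_SIMULTANEOUS_RUNS * NPROC

! the independent runs must be placed in run0001, run0002, ... (exactly four digits)
  do irun = 1,NUMBER_OF_SIMULTANEOUS_RUNS
    write(dirname,'(a,i4.4)') 'run',irun
    print *,trim(dirname)
  enddo

  end program simultaneous_runs_example
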
diff --git a/src/shared/parallel.f90 b/src/shared/parallel.f90
index 577f795..832839d 100644
--- a/src/shared/parallel.f90
+++ b/src/shared/parallel.f90
@@ -25,11 +25,45 @@
 !
 !=====================================================================
 
+!! DK DK July 2014, CNRS Marseille, France:
+!! DK DK added the ability to run several calculations (several earthquakes)
+!! DK DK in an embarrassingly parallel fashion from within the same run;
+!! DK DK this can be useful when using a very large supercomputer to compute
+!! DK DK many earthquakes in a catalog, in which case it can be better from
+!! DK DK a batch job submission point of view to start fewer but much larger jobs,
+!! DK DK each of them computing several earthquakes in parallel.
+!! DK DK To turn that option on, set the parameter NUMBER_OF_SIMULTANEOUS_RUNS
+!! DK DK to a value greater than 1 in the file setup/constants.h.in before
+!! DK DK configuring and compiling the code.
+!! DK DK To implement that, we create NUMBER_OF_SIMULTANEOUS_RUNS MPI sub-communicators,
+!! DK DK each of them labeled "my_local_mpi_comm_world", and we use them
+!! DK DK in all the routines below, except in MPI_ABORT(), because in that case
+!! DK DK we need to keep the communicator of the entire run (i.e. MPI_COMM_WORLD).
+!! DK DK When that option is on, the number of processor cores used to start
+!! DK DK the code in the batch system must of course be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS,
+!! DK DK and all the individual runs must use the same number of processor cores
+!! DK DK (which, as usual, is NPROC in the input file DATA/Par_file);
+!! DK DK the total number of processor cores to request from the batch system
+!! DK DK is thus NUMBER_OF_SIMULTANEOUS_RUNS * NPROC.
+!! DK DK All the runs to perform must be placed in directories called run0001, run0002, run0003, and so on
+!! DK DK (with exactly four digits).
+
+module my_mpi
+
+! module holding the local MPI communicator used by all the parallel routines below
+
+  use mpi
+
+  implicit none
+
+  integer :: my_local_mpi_comm_world
+
+end module my_mpi
+
 !----
 !---- Parallel routines.  All MPI calls belong in this file!
 !----
 
-
   subroutine stop_all()
 
   use mpi
@@ -50,7 +84,7 @@
 
   double precision function wtime()
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -64,13 +98,13 @@
 
   subroutine synchronize_all()
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer ier
 
-  call MPI_BARRIER(MPI_COMM_WORLD,ier)
+  call MPI_BARRIER(my_local_mpi_comm_world,ier)
 
   end subroutine synchronize_all
 
@@ -80,7 +114,7 @@
 
   subroutine bcast_all_i(buffer, countval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -89,7 +123,7 @@
 
   integer ier
 
-  call MPI_BCAST(buffer,countval,MPI_INTEGER,0,MPI_COMM_WORLD,ier)
+  call MPI_BCAST(buffer,countval,MPI_INTEGER,0,my_local_mpi_comm_world,ier)
 
   end subroutine bcast_all_i
 
@@ -99,7 +133,7 @@
 
   subroutine bcast_all_cr(buffer, countval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -111,7 +145,7 @@
 
   integer ier
 
-  call MPI_BCAST(buffer,countval,CUSTOM_MPI_TYPE,0,MPI_COMM_WORLD,ier)
+  call MPI_BCAST(buffer,countval,CUSTOM_MPI_TYPE,0,my_local_mpi_comm_world,ier)
 
   end subroutine bcast_all_cr
 
@@ -121,7 +155,7 @@
 
   subroutine bcast_all_dp(buffer, countval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -130,7 +164,7 @@
 
   integer ier
 
-  call MPI_BCAST(buffer,countval,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ier)
+  call MPI_BCAST(buffer,countval,MPI_DOUBLE_PRECISION,0,my_local_mpi_comm_world,ier)
 
   end subroutine bcast_all_dp
 
@@ -140,7 +174,7 @@
 
   subroutine bcast_all_r(buffer, countval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -152,7 +186,7 @@
 
   integer ier
 
-  call MPI_BCAST(buffer,countval,MPI_REAL,0,MPI_COMM_WORLD,ier)
+  call MPI_BCAST(buffer,countval,MPI_REAL,0,my_local_mpi_comm_world,ier)
 
   end subroutine bcast_all_r
 
@@ -163,7 +197,7 @@
 
   subroutine gather_all_i(sendbuf, sendcnt, recvbuf, recvcount, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -175,7 +209,7 @@
 
   call MPI_GATHER(sendbuf,sendcnt,MPI_INTEGER, &
                   recvbuf,recvcount,MPI_INTEGER, &
-                  0,MPI_COMM_WORLD,ier)
+                  0,my_local_mpi_comm_world,ier)
 
   end subroutine gather_all_i
 
@@ -186,7 +220,7 @@
 
   subroutine gather_all_singlei(sendbuf, recvbuf, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -198,7 +232,7 @@
 
   call MPI_GATHER(sendbuf,1,MPI_INTEGER, &
                   recvbuf,1,MPI_INTEGER, &
-                  0,MPI_COMM_WORLD,ier)
+                  0,my_local_mpi_comm_world,ier)
 
   end subroutine gather_all_singlei
 
@@ -209,7 +243,7 @@
 
   subroutine gather_all_dp(sendbuf, sendcnt, recvbuf, recvcount, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -221,7 +255,7 @@
 
   call MPI_GATHER(sendbuf,sendcnt,MPI_DOUBLE_PRECISION, &
                   recvbuf,recvcount,MPI_DOUBLE_PRECISION, &
-                  0,MPI_COMM_WORLD,ier)
+                  0,my_local_mpi_comm_world,ier)
 
   end subroutine gather_all_dp
 
@@ -231,7 +265,7 @@
 
   subroutine gather_all_cr(sendbuf, sendcnt, recvbuf, recvcount, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -246,7 +280,7 @@
 
   call MPI_GATHER(sendbuf,sendcnt,CUSTOM_MPI_TYPE, &
                   recvbuf,recvcount,CUSTOM_MPI_TYPE, &
-                  0,MPI_COMM_WORLD,ier)
+                  0,my_local_mpi_comm_world,ier)
 
   end subroutine gather_all_cr
 
@@ -256,7 +290,7 @@
 
   subroutine gather_all_all_cr(sendbuf, recvbuf, counts, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -270,7 +304,7 @@
   integer ier
 
   call MPI_ALLGATHER(sendbuf,counts,CUSTOM_MPI_TYPE,recvbuf,counts,CUSTOM_MPI_TYPE, &
-                 MPI_COMM_WORLD,ier)
+                 my_local_mpi_comm_world,ier)
 
   end subroutine gather_all_all_cr
 
@@ -280,7 +314,7 @@
 
   subroutine gatherv_all_cr(sendbuf, sendcnt, recvbuf, recvcount, recvoffset,recvcounttot, NPROC)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -296,7 +330,7 @@
 
   call MPI_GATHERV(sendbuf,sendcnt,CUSTOM_MPI_TYPE, &
                   recvbuf,recvcount,recvoffset,CUSTOM_MPI_TYPE, &
-                  0,MPI_COMM_WORLD,ier)
+                  0,my_local_mpi_comm_world,ier)
 
   end subroutine gatherv_all_cr
 
@@ -306,7 +340,7 @@
 
   subroutine init()
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -315,6 +349,9 @@
 ! initialize the MPI communicator and start the NPROCTOT MPI processes.
   call MPI_INIT(ier)
 
+! create sub-communicators if needed, if running more than one earthquake from the same job
+  call world_split()
+
   end subroutine init
 
 !
@@ -323,12 +360,15 @@
 
   subroutine finalize()
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer ier
 
+! close sub-communicators if needed, if running more than one earthquake from the same job
+  call world_unsplit()
+
 ! stop all the MPI processes, and exit
   call MPI_FINALIZE(ier)
 
@@ -340,14 +380,14 @@
 
   subroutine world_size(sizeval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer sizeval
   integer ier
 
-  call MPI_COMM_SIZE(MPI_COMM_WORLD,sizeval,ier)
+  call MPI_COMM_SIZE(my_local_mpi_comm_world,sizeval,ier)
 
   end subroutine world_size
 
@@ -357,14 +397,14 @@
 
   subroutine world_rank(rank)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer rank
   integer ier
 
-  call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ier)
+  call MPI_COMM_RANK(my_local_mpi_comm_world,rank,ier)
 
   end subroutine world_rank
 
@@ -374,7 +414,7 @@
 
   subroutine min_all_dp(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -382,7 +422,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_DOUBLE_PRECISION, &
-                  MPI_MIN,0,MPI_COMM_WORLD,ier)
+                  MPI_MIN,0,my_local_mpi_comm_world,ier)
 
   end subroutine min_all_dp
 
@@ -392,7 +432,7 @@
 
   subroutine max_all_dp(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -400,7 +440,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_DOUBLE_PRECISION, &
-                  MPI_MAX,0,MPI_COMM_WORLD,ier)
+                  MPI_MAX,0,my_local_mpi_comm_world,ier)
 
   end subroutine max_all_dp
 
@@ -410,7 +450,7 @@
 
   subroutine max_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -421,7 +461,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_MAX,0,MPI_COMM_WORLD,ier)
+                  MPI_MAX,0,my_local_mpi_comm_world,ier)
 
   end subroutine max_all_cr
 
@@ -431,7 +471,7 @@
 
   subroutine min_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -442,7 +482,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_MIN,0,MPI_COMM_WORLD,ier)
+                  MPI_MIN,0,my_local_mpi_comm_world,ier)
 
   end subroutine min_all_cr
 
@@ -453,7 +493,7 @@
 
   subroutine min_all_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -464,7 +504,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_MIN,MPI_COMM_WORLD,ier)
+                  MPI_MIN,my_local_mpi_comm_world,ier)
 
   end subroutine min_all_all_cr
 
@@ -475,7 +515,7 @@
 !
 !  subroutine min_all_all_dp(sendbuf, recvbuf)
 !
-!  use mpi
+!  use my_mpi
 !
 !  implicit none
 !
@@ -486,7 +526,7 @@
 !  integer ier
 !
 !  call MPI_ALLREDUCE(sendbuf,recvbuf,1,MPI_DOUBLE_PRECISION, &
-!                  MPI_MIN,MPI_COMM_WORLD,ier)
+!                  MPI_MIN,my_local_mpi_comm_world,ier)
 !
 !  end subroutine min_all_all_dp
 !
@@ -496,7 +536,7 @@
 
   subroutine max_all_i(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -507,7 +547,7 @@
   integer :: ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_INTEGER, &
-                  MPI_MAX,0,MPI_COMM_WORLD,ier)
+                  MPI_MAX,0,my_local_mpi_comm_world,ier)
 
   end subroutine max_all_i
 
@@ -518,7 +558,7 @@
 
   subroutine max_allreduce_i(buffer,countval)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -530,11 +570,11 @@
   integer,dimension(countval) :: send
 
   ! seems not to be supported on all kind of MPI implementations...
-  !call MPI_ALLREDUCE(MPI_IN_PLACE, buffer, countval, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, ier)
+  !call MPI_ALLREDUCE(MPI_IN_PLACE, buffer, countval, MPI_INTEGER, MPI_MAX, my_local_mpi_comm_world, ier)
 
   send(:) = buffer(:)
 
-  call MPI_ALLREDUCE(send, buffer, countval, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, ier)
+  call MPI_ALLREDUCE(send, buffer, countval, MPI_INTEGER, MPI_MAX, my_local_mpi_comm_world, ier)
   if( ier /= 0 ) stop 'Allreduce to get max values failed.'
 
   end subroutine max_allreduce_i
@@ -545,7 +585,7 @@
 
   subroutine max_all_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -556,7 +596,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_MAX,MPI_COMM_WORLD,ier)
+                  MPI_MAX,my_local_mpi_comm_world,ier)
 
   end subroutine max_all_all_cr
 
@@ -568,7 +608,7 @@
 
   subroutine max_all_all_dp(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -579,7 +619,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,MPI_DOUBLE_PRECISION, &
-                  MPI_MAX,MPI_COMM_WORLD,ier)
+                  MPI_MAX,my_local_mpi_comm_world,ier)
 
   end subroutine max_all_all_dp
 
@@ -590,7 +630,7 @@
 
   subroutine min_all_i(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -601,7 +641,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_INTEGER, &
-                  MPI_MIN,0,MPI_COMM_WORLD,ier)
+                  MPI_MIN,0,my_local_mpi_comm_world,ier)
 
   end subroutine min_all_i
 
@@ -611,7 +651,7 @@
 
   subroutine maxloc_all_dp(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -619,7 +659,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,MPI_2DOUBLE_PRECISION, &
-                  MPI_MAXLOC,MPI_COMM_WORLD,ier)
+                  MPI_MAXLOC,my_local_mpi_comm_world,ier)
 
   end subroutine maxloc_all_dp
 
@@ -631,7 +671,7 @@
 
   subroutine sum_all_dp(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -639,7 +679,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_DOUBLE_PRECISION, &
-                  MPI_SUM,0,MPI_COMM_WORLD,ier)
+                  MPI_SUM,0,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_dp
 
@@ -649,7 +689,7 @@
 
   subroutine sum_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -660,7 +700,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_SUM,0,MPI_COMM_WORLD,ier)
+                  MPI_SUM,0,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_cr
 
@@ -670,7 +710,7 @@
 
   subroutine sum_all_1Darray_dp(sendbuf, recvbuf, nx)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -678,7 +718,7 @@
   double precision, dimension(nx) :: sendbuf, recvbuf
   integer :: ier
 
-  call MPI_REDUCE(sendbuf,recvbuf,nx,MPI_DOUBLE_PRECISION,MPI_SUM,0,MPI_COMM_WORLD,ier)
+  call MPI_REDUCE(sendbuf,recvbuf,nx,MPI_DOUBLE_PRECISION,MPI_SUM,0,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_1Darray_dp
 
@@ -688,7 +728,7 @@
 
   subroutine sum_all_all_cr(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -699,7 +739,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,CUSTOM_MPI_TYPE, &
-                  MPI_SUM,MPI_COMM_WORLD,ier)
+                  MPI_SUM,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_all_cr
 
@@ -709,7 +749,7 @@
 
   subroutine sum_all_i(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -717,7 +757,7 @@
   integer ier
 
   call MPI_REDUCE(sendbuf,recvbuf,1,MPI_INTEGER, &
-                  MPI_SUM,0,MPI_COMM_WORLD,ier)
+                  MPI_SUM,0,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_i
 
@@ -727,7 +767,7 @@
 
   subroutine sum_all_all_i(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -735,7 +775,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,MPI_INTEGER, &
-                  MPI_SUM,MPI_COMM_WORLD,ier)
+                  MPI_SUM,my_local_mpi_comm_world,ier)
 
   end subroutine sum_all_all_i
 
@@ -745,7 +785,7 @@
 
   subroutine any_all_l(sendbuf, recvbuf)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -753,7 +793,7 @@
   integer ier
 
   call MPI_ALLREDUCE(sendbuf,recvbuf,1,MPI_LOGICAL, &
-                  MPI_LOR,MPI_COMM_WORLD,ier)
+                  MPI_LOR,my_local_mpi_comm_world,ier)
 
   end subroutine any_all_l
 
@@ -764,7 +804,7 @@
   subroutine sendrecv_all_cr(sendbuf, sendcount, dest, sendtag, &
                              recvbuf, recvcount, source, recvtag)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -779,7 +819,7 @@
 
   call MPI_SENDRECV(sendbuf,sendcount,CUSTOM_MPI_TYPE,dest,sendtag, &
                     recvbuf,recvcount,CUSTOM_MPI_TYPE,source,recvtag, &
-                    MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+                    my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 
   end subroutine sendrecv_all_cr
 
@@ -789,7 +829,7 @@
 
   integer function proc_null()
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -803,7 +843,7 @@
 
   subroutine isend_cr(sendbuf, sendcount, dest, sendtag, req)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -816,7 +856,7 @@
   integer ier
 
   call MPI_ISEND(sendbuf,sendcount,CUSTOM_MPI_TYPE,dest,sendtag, &
-                  MPI_COMM_WORLD,req,ier)
+                  my_local_mpi_comm_world,req,ier)
 
   end subroutine isend_cr
 
@@ -826,7 +866,7 @@
 
   subroutine irecv_cr(recvbuf, recvcount, dest, recvtag, req)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -839,7 +879,7 @@
   integer ier
 
   call MPI_IRECV(recvbuf,recvcount,CUSTOM_MPI_TYPE,dest,recvtag, &
-                  MPI_COMM_WORLD,req,ier)
+                  my_local_mpi_comm_world,req,ier)
 
   end subroutine irecv_cr
 
@@ -849,7 +889,7 @@
 
   subroutine isend_i(sendbuf, sendcount, dest, sendtag, req)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -862,7 +902,7 @@
   integer ier
 
   call MPI_ISEND(sendbuf,sendcount,MPI_INTEGER,dest,sendtag, &
-                  MPI_COMM_WORLD,req,ier)
+                  my_local_mpi_comm_world,req,ier)
 
   end subroutine isend_i
 
@@ -872,7 +912,7 @@
 
   subroutine irecv_i(recvbuf, recvcount, dest, recvtag, req)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -884,7 +924,7 @@
   integer ier
 
   call MPI_IRECV(recvbuf,recvcount,MPI_INTEGER,dest,recvtag, &
-                  MPI_COMM_WORLD,req,ier)
+                  my_local_mpi_comm_world,req,ier)
 
   end subroutine irecv_i
 
@@ -895,7 +935,7 @@
 
   subroutine recv_i(recvbuf, recvcount, dest, recvtag )
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -906,7 +946,7 @@
   integer ier
 
   call MPI_RECV(recvbuf,recvcount,MPI_INTEGER,dest,recvtag, &
-                MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+                my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 
   end subroutine recv_i
 
@@ -916,7 +956,7 @@
 
   subroutine recvv_cr(recvbuf, recvcount, dest, recvtag )
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -928,7 +968,7 @@
   integer ier
 
   call MPI_RECV(recvbuf,recvcount,CUSTOM_MPI_TYPE,dest,recvtag, &
-                MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+                my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 
   end subroutine recvv_cr
 
@@ -939,7 +979,7 @@
 
   subroutine send_i(sendbuf, sendcount, dest, sendtag)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -949,7 +989,7 @@
   integer,dimension(sendcount):: sendbuf
   integer ier
 
-  call MPI_SEND(sendbuf,sendcount,MPI_INTEGER,dest,sendtag,MPI_COMM_WORLD,ier)
+  call MPI_SEND(sendbuf,sendcount,MPI_INTEGER,dest,sendtag,my_local_mpi_comm_world,ier)
 
   end subroutine send_i
 
@@ -960,7 +1000,7 @@
 
   subroutine send_i_t(sendbuf,sendcount,dest)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -969,7 +1009,7 @@
   integer, dimension(sendcount) :: sendbuf
 
   call MPI_SEND(sendbuf,sendcount,MPI_INTEGER,dest,tag, &
-       MPI_COMM_WORLD,ier)
+       my_local_mpi_comm_world,ier)
 
   end subroutine send_i_t
 
@@ -980,7 +1020,7 @@
 
   subroutine recv_i_t(recvbuf,recvcount,source)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -989,7 +1029,7 @@
   integer, dimension(recvcount) :: recvbuf
 
   call MPI_RECV(recvbuf,recvcount,MPI_INTEGER,source,tag, &
-                MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+                my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 
   end subroutine recv_i_t
 
@@ -1000,7 +1040,7 @@
 !
 !  subroutine send_dp_t(sendbuf,sendcount,dest)
 !
-!  use mpi
+!  use my_mpi
 !
 !  implicit none
 !
@@ -1009,7 +1049,7 @@
 !  double precision, dimension(sendcount) :: sendbuf
 !
 !  call MPI_SEND(sendbuf,sendcount,MPI_DOUBLE_PRECISION,dest,tag, &
-!       MPI_COMM_WORLD,ier)
+!       my_local_mpi_comm_world,ier)
 !
 !  end subroutine send_dp_t
 !
@@ -1019,7 +1059,7 @@
 !
 !  subroutine recv_dp_t(recvbuf,recvcount,source)
 !
-!  use mpi
+!  use my_mpi
 !
 !  implicit none
 !
@@ -1028,7 +1068,7 @@
 !  double precision, dimension(recvcount) :: recvbuf
 !
 !  call MPI_RECV(recvbuf,recvcount,MPI_DOUBLE_PRECISION,source,tag, &
-!                MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+!                my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 !
 !  end subroutine recv_dp_t
 !
@@ -1039,7 +1079,7 @@
 
   subroutine send_dp(sendbuf, sendcount, dest, sendtag)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -1048,7 +1088,7 @@
   double precision,dimension(sendcount):: sendbuf
   integer ier
 
-  call MPI_SEND(sendbuf,sendcount,MPI_DOUBLE_PRECISION,dest,sendtag,MPI_COMM_WORLD,ier)
+  call MPI_SEND(sendbuf,sendcount,MPI_DOUBLE_PRECISION,dest,sendtag,my_local_mpi_comm_world,ier)
 
   end subroutine send_dp
 
@@ -1058,7 +1098,7 @@
 
   subroutine recv_dp(recvbuf, recvcount, dest, recvtag)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -1068,7 +1108,7 @@
   integer ier
 
   call MPI_RECV(recvbuf,recvcount,MPI_DOUBLE_PRECISION,dest,recvtag, &
-                MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)
+                my_local_mpi_comm_world,MPI_STATUS_IGNORE,ier)
 
   end subroutine recv_dp
 
@@ -1078,7 +1118,7 @@
 
   subroutine sendv_cr(sendbuf, sendcount, dest, sendtag)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -1089,7 +1129,7 @@
   real(kind=CUSTOM_REAL),dimension(sendcount) :: sendbuf
   integer ier
 
-  call MPI_SEND(sendbuf,sendcount,CUSTOM_MPI_TYPE,dest,sendtag,MPI_COMM_WORLD,ier)
+  call MPI_SEND(sendbuf,sendcount,CUSTOM_MPI_TYPE,dest,sendtag,my_local_mpi_comm_world,ier)
 
   end subroutine sendv_cr
 
@@ -1099,7 +1139,7 @@
 
   subroutine wait_req(req)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
@@ -1117,13 +1157,13 @@
 
   subroutine world_get_comm(comm)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer,intent(out) :: comm
 
-  comm = MPI_COMM_WORLD
+  comm = my_local_mpi_comm_world
 
   end subroutine world_get_comm
 
@@ -1133,14 +1173,68 @@
 
   subroutine world_duplicate(comm)
 
-  use mpi
+  use my_mpi
 
   implicit none
 
   integer,intent(out) :: comm
   integer :: ier
 
-  call MPI_COMM_DUP(MPI_COMM_WORLD,comm,ier)
-  if( ier /= 0 ) stop 'error duplicating MPI_COMM_WORLD communicator'
+  call MPI_COMM_DUP(my_local_mpi_comm_world,comm,ier)
+  if( ier /= 0 ) stop 'error duplicating my_local_mpi_comm_world communicator'
 
   end subroutine world_duplicate
+
+!
+!----
+!
+
+! create sub-communicators if needed, if running more than one earthquake from the same job.
+!! DK DK create a sub-communicator for each independent run;
+!! DK DK if there is a single run to do, then just copy the default communicator to the new one
+  subroutine world_split()
+
+  use constants
+  use my_mpi
+
+  implicit none
+
+! integer,intent(out) :: comm
+! integer :: ier
+
+! call MPI_COMM_DUP(my_local_mpi_comm_world,comm,ier)
+! if( ier /= 0 ) stop 'error duplicating my_local_mpi_comm_world communicator'
+
+  if(NUMBER_OF_SIMULTANEOUS_RUNS == 1) then
+    my_local_mpi_comm_world = MPI_COMM_WORLD
+  else if(NUMBER_OF_SIMULTANEOUS_RUNS <= 0) then
+    stop 'NUMBER_OF_SIMULTANEOUS_RUNS <= 0 makes no sense'
+  else
+    !! DK DK temporary hack
+    my_local_mpi_comm_world = MPI_COMM_WORLD
+    stop 'NUMBER_OF_SIMULTANEOUS_RUNS > 1 not implemented yet'
+    ! we should call MPI_COMM_SPLIT() here
+  endif
+
+  end subroutine world_split
+
+!
+!----
+!
+
+! close sub-communicators if needed, if running more than one earthquake from the same job.
+  subroutine world_unsplit()
+
+  use constants
+  use my_mpi
+
+  implicit none
+
+  if(NUMBER_OF_SIMULTANEOUS_RUNS > 1) then
+    !! DK DK temporary hack
+    stop 'NUMBER_OF_SIMULTANEOUS_RUNS > 1 not implemented yet'
+    ! we should call MPI_COMM_FREE() here
+  endif
+
+  end subroutine world_unsplit
+
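
The new world_split() above is deliberately a stub for now: with NUMBER_OF_SIMULTANEOUS_RUNS > 1 it stops with a "not implemented yet" message, and its comment points to MPI_COMM_SPLIT(). The following sketch shows one way the split could eventually be written, assuming contiguous blocks of ranks per run and the same number of cores for every run; it only illustrates the approach described in the comments and is not the committed implementation.

  subroutine world_split_sketch()

  use constants
  use my_mpi

  implicit none

  integer :: sizeval,myrank,mygroup,key,ier

  call MPI_COMM_SIZE(MPI_COMM_WORLD,sizeval,ier)
  call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ier)

! the total number of MPI processes must be a multiple of the number of runs
  if (mod(sizeval,NUMBER_OF_SIMULTANEOUS_RUNS) /= 0) &
    stop 'total number of MPI processes must be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS'

! contiguous blocks of ranks: the first run gets ranks 0..NPROC-1, the second run the next NPROC ranks, etc.
  mygroup = myrank / (sizeval / NUMBER_OF_SIMULTANEOUS_RUNS)
  key = myrank

! create one sub-communicator per independent run
  call MPI_COMM_SPLIT(MPI_COMM_WORLD,mygroup,key,my_local_mpi_comm_world,ier)
  if (ier /= 0) stop 'error splitting MPI_COMM_WORLD'

  end subroutine world_split_sketch

The matching cleanup in world_unsplit() would then call MPI_COMM_FREE(my_local_mpi_comm_world,ier) when NUMBER_OF_SIMULTANEOUS_RUNS > 1, as the comment in that stub already indicates.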


