
Question: Clarifying the correct way to run MT jobs with MPI on a cluster with multiple nodes

Forum: Multithreading
Date: 03 Mar, 2016
From: Sam Blake

Hi all,

I just wanted to clarify a few, presumably quite straightforward, things about running MT jobs with MPI on a cluster. I have gone through the forums, the MT migration guides, and several PowerPoint presentations found via Google (thank you, Andrea!), but I am still having trouble answering some very basic questions, so apologies in advance if these have already been answered elsewhere!

First, a few notes about my configuration: I am running Geant4 v10.2 on a cluster with 21 nodes and 12 cores per node, with hyperthreading (I assume 2 hardware threads per core). My application is based on the exMPI03 example, with multithreading enabled and MPI in use. I set the number of threads in my application’s input macro with the UI command /run/numberOfThreads <n>. To run jobs on the cluster, I submit a PBS script that includes the following (I’ve omitted most of it for clarity):

      #!/bin/csh
      …
      #PBS -l nodes=<i>:ppn=<j>
      …
      mpiexec -n <k> <G4_application_name> <input_macro.mac>

My question is really: what is the correct combination of these four parameters (<n>, <i>, <j>, and <k>) for running MT jobs with MPI?

I have read in Andrea’s PowerPoint presentations that users should “scale across cores with MT and scale across nodes with MPI”, but I’m a bit confused about how to achieve this. My understanding of MPI was that in the above script I ought to set <k> = <i> * <j>, which launches a separate clone of my application on each of the <k> cores, and each of these clones then creates <n> threads. Intuitively this doesn’t seem like “using MT to scale across cores”, because I believe that things like the detector construction are still being created on ALL cores.
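To make the concern concrete, here is the total thread count implied by setting <k> = <i> * <j>, assuming the whole cluster is requested (<i> = 21, <j> = 12) and keeping the assumption above of 2 hardware threads per core:

```latex
\begin{aligned}
k &= i \cdot j = 21 \cdot 12 = 252 \quad \text{MPI processes (clones)},\\
T &= k \cdot n = 252\,n \quad \text{worker threads in total},\\
H &= 21 \cdot 12 \cdot 2 = 504 \quad \text{hardware threads available}.
\end{aligned}
```

With, for example, <n> = 12 this gives T = 3024 worker threads on H = 504 hardware threads, a six-fold oversubscription; only <n> ≤ 2 keeps T ≤ H. And because Geant4 worker threads share the geometry only within a single process, each of the 252 clones would still build its own copy of the detector.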

If the proper way to run MT jobs is to have one clone per node, and use MT to run each clone across all available cores on the corresponding node, how does one achieve this?
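A minimal sketch of the one-clone-per-node layout, under several assumptions: that the cluster’s mpiexec is Open MPI’s (the -npernode and --bind-to options below are Open MPI options; MPICH/Hydra and Intel MPI use -ppn 1 instead), that ppn counts physical cores on this site, and that the input macro sets /run/numberOfThreads 12 so each clone runs one worker thread per core. The values are illustrative, not a verified recipe:

```csh
#!/bin/csh
# Sketch only (assumptions: Open MPI, ppn counts physical cores).
# Reserve all 21 nodes with all 12 cores on each: <i> = 21, <j> = 12.
#PBS -l nodes=21:ppn=12
# (other PBS directives and environment setup omitted, as in the original script)

# Launch exactly one clone per node, so <k> = <i> = 21 rather than <i> * <j>.
# --bind-to none stops Open MPI from pinning each clone to a single core or
# socket, so the clone's threads can spread over all cores of its node.
# With MPICH/Hydra or Intel MPI, use "-ppn 1" in place of "-npernode 1".
mpiexec -n 21 -npernode 1 --bind-to none <G4_application_name> <input_macro.mac>
```

With <n> = 12 this gives 21 × 12 = 252 worker threads, one per physical core; raising <n> to 24 would also use the hyperthreads, which may or may not pay off and is worth measuring.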

Thank you in advance for clarifying this for me!


Replies:
 Question: Re: Clarifying the correct way to run MT jobs with MPI on a cluster with multiple nodes (Svetlana Shasharina - 29 Mar, 2016)
 Feedback: Re: Clarifying the correct way to run MT jobs with MPI on a cluster with multiple nodes (Andrea Dotti - 07 Apr, 2016)
   Feedback: Re: Clarifying the correct way to run MT jobs with MPI on a cluster with multiple nodes (Andrea Dotti - 07 Apr, 2016)
