Message: Re: What's the best choice for number of processes and threads?
The user is running in a VM, so I assume the two cores are the ones seen by the VM. Hyper-threading (HT) is therefore not very relevant here (I am not even sure the user has an Intel-based machine, or whether HT is enabled). It is true that in a VM over-scheduling by about 10% could be helpful.
In any case, over-scheduling by a factor of 46 is unlikely to be effective. In all the MT applications I have seen, I have never observed a benefit from over-scheduling beyond 1 or 2 extra threads. If such extreme over-scheduling is indeed necessary, I would suggest a detailed analysis of the application's workload, since this is not normal for a G4 application. With a 2-core VM, I doubt that a configuration with NP*NT>3 gives any benefit.
Having said this, within a single box MT is usually better than MPI (because of memory consumption and the complexity of the system). So in this case I would scale over the nodes via MPI and over the cores with MT, with:
NP=#nodes
NT=#cores (+10%, to be verified experimentally)
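As a rough illustration of the rule of thumb above, here is a minimal Python sketch that turns a node count and a per-node core count into NP and NT. The function name `suggest_layout` and the `extra_percent` parameter are hypothetical, introduced only for this example, and rounding the 10% up to a whole thread is my assumption; the right amount of over-scheduling still has to be measured on the real system.

```python
def suggest_layout(nodes, cores_per_node, extra_percent=10):
    """Return (NP, NT): one MPI process per node, one thread per core
    plus a small over-scheduling margin.

    Hypothetical helper for illustration only. The margin is rounded up
    to a whole thread (an assumption, not a Geant4 rule).
    """
    if nodes < 1 or cores_per_node < 1:
        raise ValueError("nodes and cores_per_node must be >= 1")
    if extra_percent < 0:
        raise ValueError("extra_percent must be >= 0")
    # Integer ceiling of cores_per_node * extra_percent / 100,
    # written without floats to avoid rounding surprises.
    extra_threads = -(-cores_per_node * extra_percent // 100)
    return nodes, cores_per_node + extra_threads


if __name__ == "__main__":
    # The 2-core VM discussed above: a single node with 2 visible cores.
    np_, nt = suggest_layout(nodes=1, cores_per_node=2)
    print(f"NP={np_} NT={nt}")  # prints "NP=1 NT=3"
```

For the 2-core VM this gives NP*NT=3, which matches the limit mentioned above; anything far beyond that is over-scheduling that should be justified by measurements.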
Besides performance considerations, if the test fails at maximum scale, this indicates either an error at the application level or that the memory resources of the system have been exhausted, causing a crash.
Hope this clarifies, Andrea