Message: Parallelization of very short events, help!
Hello, I need to parallelize my simulation because, although each event takes only 0.1-0.15 ms, I need to run billions of events, which takes days. Once I get it working in parallel, I'll have access to a 64-node cluster, but meanwhile I am trying this out on a 2-processor computer under Linux. I got my parallel program to work, but it is actually slower than the serial version. To check this, I tried various things with the included N02 and ParN02 examples, and the same things happen:
1.- If the event is relatively long, approx. 2 ms per event in N02, then in the parallel version, ParN02, it takes approx. 1.2 ms per event (with aggregated-tasks=100), which makes sense on a 2-cpu computer.
2.- But if the event is very short, e.g. 0.02-0.04 ms in N02, it takes 0.6 ms per event in ParN02, even with aggregated-tasks=100. I understand this is probably because the communication between master and slave still dominates over the simulation time of the 100 events. So I tried increasing aggregated-tasks to 1000, to 10,000 and higher. ParN02 is still slower, reaching approx. 0.2 ms per event, but it seems to be slower for a different reason. When I use the trace=1 option, I see that when the master is trying to send a job to the slaves, a lot of:
master -> -1:
are produced before the job is actually accepted by one of the slaves. I believe this "rejection" (I am not sure if this -1 means the job was rejected) is what is slowing down the simulation. From what I saw in the TOP-C code, a -1 is returned if there are no slaves available.
So it seems that if the total job size sent to a slave is too small (i.e. aggregated-tasks=100 or so), the parallel version suffers because of the communication overhead, and if the job includes many aggregated tasks, then the job keeps getting rejected for some reason.
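As a rough way to reason about the first half of this trade-off, here is a minimal sketch of an idealized cost model in Python. It is not taken from TOP-C or ParGeant4: the function names are hypothetical, and the model assumes that the work splits evenly across the CPUs and that each aggregated job costs a fixed message overhead that is not overlapped with computation. The 20 ms overhead used in the example is also an assumption, back-solved from observation 1 above (2 ms events, 2 CPUs, aggregated-tasks=100, 1.2 ms per event).

```python
def parallel_ms_per_event(event_ms, workers, batch, overhead_ms):
    """Idealized wall-clock time per event in the parallel run.

    Assumes compute time divides evenly among `workers` and every
    batch of `batch` events pays one fixed, non-overlapped overhead.
    """
    if workers < 1 or batch < 1:
        raise ValueError("workers and batch must be at least 1")
    return event_ms / workers + overhead_ms / batch


def break_even_batch(event_ms, workers, overhead_ms):
    """Batch size above which the model predicts parallel beats serial.

    Solves event_ms / workers + overhead_ms / batch < event_ms for batch.
    """
    if workers < 2:
        raise ValueError("need at least 2 workers for any speedup")
    return overhead_ms / (event_ms * (1.0 - 1.0 / workers))


if __name__ == "__main__":
    OVERHEAD_MS = 20.0  # assumption, back-solved from observation 1

    # Long events (2 ms) on 2 CPUs with aggregated-tasks=100.
    print(parallel_ms_per_event(2.0, 2, 100, OVERHEAD_MS))

    # Short events (0.03 ms, middle of the 0.02-0.04 ms range).
    print(break_even_batch(0.03, 2, OVERHEAD_MS))
```

Under these assumptions the break-even batch size for the short events comes out well above 100, which matches the direction of what I see with aggregated-tasks=100. The model does not reproduce the measured 0.6 ms, and it says nothing about the "-1" messages seen with larger batches, so it is only a guide to the communication-overhead part of the problem.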
Is there somebody with more experience in ParGeant4 who could tell me whether this is the cause of the problem and, more importantly, how to solve it? Will this problem go away if I use a 4-cpu workstation or a 64-node cluster?
Thank you very much!