Message: Re: Multithreading with Xeon Phi
How to efficiently use a coprocessor such as the Xeon Phi card depends strongly on your application, so I can only give general comments on your questions here. Also, I have done only a few tests on the Xeon Phi, and my comments are based on that experience. From your questions I understand that you are new to this topic. I encourage you to search online for articles and material on how Phi cards have been used for scientific applications. This can give you an idea of how to use the cards and, especially, what to expect from them. As an example, here is a pointer to a talk on the subject given by a company at one of our tutorials: http://research.colfaxinternational.com/post/2014/03/06/Geant4-Tutorial.aspx
We have successfully run Geant4 MT on Intel Xeon Phi cards, obtaining promising results, and more work is foreseen in the future. It is important to note that, while G4 can run on the MIC, you may need to adapt your user code (in some cases substantially). See: http://geant4.slac.stanford.edu/SLACTutorial14/MultiThreading2.pdf
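As an illustration of the kind of change involved: mutable global or static state in user code (for example a counter updated for every event) becomes a data race once several worker threads process events at the same time. The sketch below is plain C++11, not Geant4 code. `gHitsPerThread`, `ProcessEvent`, `RunWorker` and `RunEvents` are hypothetical names, and `thread_local` plays the role that the `G4ThreadLocal` qualifier plays in Geant4. Each worker accumulates into its own copy, and the results are merged once, under a mutex, at the end.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-event counter. Declared thread_local so every worker
// thread gets its own copy; as a plain global it would be a data race.
thread_local long gHitsPerThread = 0;

std::mutex gMergeMutex;  // protects gTotalHits
long gTotalHits = 0;     // shared result, written only under gMergeMutex

// Hypothetical stand-in for per-event user code (e.g. a user action).
void ProcessEvent(int eventId) {
  gHitsPerThread += eventId % 3;  // thread-private: no lock needed
}

// Each worker processes a contiguous slice of events, then merges once.
void RunWorker(int firstEvent, int nEvents) {
  gHitsPerThread = 0;
  for (int i = firstEvent; i < firstEvent + nEvents; ++i) ProcessEvent(i);
  std::lock_guard<std::mutex> lock(gMergeMutex);
  gTotalHits += gHitsPerThread;
}

// Runs nThreads workers over consecutive event ranges and returns the total.
long RunEvents(int nThreads, int eventsPerThread) {
  assert(nThreads > 0 && eventsPerThread >= 0);
  gTotalHits = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t)
    workers.emplace_back(RunWorker, t * eventsPerThread, eventsPerThread);
  for (auto& w : workers) w.join();
  return gTotalHits;
}
```

Merging once per thread, rather than locking on every event, keeps contention low when many hardware threads are running.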
Now to your questions:
Question 1: We have not done tests in offload mode; instead, we have run a native version of the application on the card. After some experimentation, this seems, at the moment, the best way to proceed, because it does not require the (usually complex) handling of offload.
In this scenario the host CPU is not used at all, so there is no strict requirement on it. Remember, however, that there are minimum requirements on the host to run a Xeon Phi card; check the Intel website for recent updates. Typically you need a Xeon processor and, clearly, at least one PCIe slot, but the most important requirements come from power consumption. Several companies provide systems that are compatible with Xeon Phi cards. What I foresee is an MPI-based approach in which multiple instances of the application run on a PC equipped with Phi cards (one on the host and the others on the cards). In that case it is probably important to have a host whose performance is similar to that of the card (this is the case for all recent Xeon processors).
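A rough way to see why the host speed matters: if the events are split statically among the MPI ranks, the job finishes only when the slowest rank does. The sketch below (plain C++, no MPI calls) computes that wall time for an even split and for a split proportional to each rank's throughput. `WallTime`, `EvenSplit` and `WeightedSplit` are hypothetical helpers, and the throughput numbers in the usage note are made up for illustration, not measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Wall-clock time of a static split: every rank must finish, so the
// slowest one (its events / its throughput) sets the total time.
double WallTime(const std::vector<long>& events,
                const std::vector<double>& eventsPerSecond) {
  assert(events.size() == eventsPerSecond.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < events.size(); ++i)
    worst = std::max(worst, events[i] / eventsPerSecond[i]);
  return worst;
}

// Same number of events for every rank; the remainder goes to rank 0.
std::vector<long> EvenSplit(long total, std::size_t nRanks) {
  assert(nRanks > 0);
  const long n = static_cast<long>(nRanks);
  std::vector<long> out(nRanks, total / n);
  out[0] += total % n;
  return out;
}

// Events proportional to each rank's throughput; rounding leftovers go
// to rank 0 for simplicity.
std::vector<long> WeightedSplit(long total, const std::vector<double>& rate) {
  assert(!rate.empty());
  double sum = 0.0;
  for (double r : rate) sum += r;
  std::vector<long> out;
  long assigned = 0;
  for (double r : rate) {
    long n = static_cast<long>(std::floor(total * r / sum));
    out.push_back(n);
    assigned += n;
  }
  out[0] += total - assigned;
  return out;
}
```

For example, with a hypothetical host rank at 100 events/s and a card rank at 50 events/s, splitting 1500 events evenly takes 15 s (the card's 750 events dominate), while the weighted 1000/500 split finishes in 10 s. When host and card have similar throughput, the two splits coincide, which is one reason a comparable host keeps things simple.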
Question 2: Xeon Phi computing uses a heterogeneous memory model. The card has its own memory, and there is no (direct) access to host memory. You can use offload from the host (see the previous comments), but this will copy the data to and from the card, not share it. For sharing input/output files, a virtual file system is available on the card for storing output and providing input (NFS is also supported for large files). Optimizing memory is not a trivial task: what you have to do is share the most memory-consuming objects of your application among threads. For Geant4 we have done this for you, but you need to do something similar for your user data if it is very memory-consuming. Our work plan for 2014 foresees further reducing thread-private memory consumption in G4.
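The pattern described above (share the big objects, keep only small state per thread) can be sketched in plain C++11 as follows. `SharedTable`, `WorkerState` and `RunShared` are hypothetical stand-ins for, say, a large data table in your user code and a per-thread accumulator; the sketch is not meant to reproduce Geant4's internal mechanism. The table is built once and handed to every worker through a `std::shared_ptr<const SharedTable>`, so it occupies memory only once, while each thread writes only to its own `WorkerState`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical large, read-only table (think of a big lookup table).
// Built once and shared by all threads, so it costs memory only once.
struct SharedTable {
  std::vector<double> values;
  explicit SharedTable(std::size_t n) : values(n) {
    for (std::size_t i = 0; i < n; ++i) values[i] = 0.5 * static_cast<double>(i);
  }
};

// Small, mutable per-thread state: each worker owns its own copy.
struct WorkerState {
  double sum = 0.0;
};

// Every thread reads the same const table and writes only its own state.
std::vector<double> RunShared(std::size_t tableSize, std::size_t nThreads) {
  assert(nThreads > 0);
  std::shared_ptr<const SharedTable> table =
      std::make_shared<SharedTable>(tableSize);
  std::vector<WorkerState> states(nThreads);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < nThreads; ++t) {
    // Capturing the shared_ptr copies the pointer, not the table.
    workers.emplace_back([table, &states, t] {
      for (double v : table->values) states[t].sum += v;
    });
  }
  for (auto& w : workers) w.join();
  std::vector<double> sums;
  for (const auto& s : states) sums.push_back(s.sum);
  return sums;
}
```

Because the table is `const` and never written after construction, the workers can read it concurrently without locks; memory then grows with the size of `WorkerState` per thread rather than with the size of the table.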
Question 3: From a user's point of view, the main difference between card models is the amount of memory (plus small differences in the number of cores and per-core performance). With the 3120, which has only 6 GB of RAM, it is particularly important to optimize memory usage so that you can run all hardware threads within the memory budget. Other constraints are thermal and power requirements, but I do not know the details here. Again, the Intel website provides comparisons.
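A back-of-the-envelope check for the memory budget: subtract what is reserved on the card and what all threads share, then divide by the private memory each thread needs. The sketch assumes the 3120 exposes 57 cores x 4 hardware threads = 228 threads (please verify against Intel's specifications). `MaxThreadsInBudget` is a hypothetical helper, and the reserved amount and per-thread figures in the usage note are illustrative, not measured values.

```cpp
#include <algorithm>
#include <cassert>

// Assumed for the 3120 series: 57 cores x 4 hardware threads per core.
constexpr long kHardwareThreads3120 = 57 * 4;  // 228

// Number of threads that fit in card memory when all threads share
// sharedMB and each thread additionally needs perThreadMB of private data.
// reservedMB is a hypothetical allowance for the card's OS and runtime.
long MaxThreadsInBudget(long totalMB, long reservedMB, long sharedMB,
                        long perThreadMB, long hardwareThreads) {
  assert(hardwareThreads > 0);
  long available = totalMB - reservedMB - sharedMB;
  if (available <= 0) return 0;                  // shared data alone does not fit
  if (perThreadMB <= 0) return hardwareThreads;  // no private memory needed
  return std::min(available / perThreadMB, hardwareThreads);
}
```

With 6144 MB on the card, a hypothetical 1024 MB reserved and 500 MB of shared data, 40 MB per thread allows only 115 threads, while 20 MB per thread lets all 228 hardware threads run. This is why moving memory-heavy objects into shared storage matters so much on the smaller cards.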