I just found out I have access to a few systems with dual Xeon Phi 5110P coprocessors and have just started reading up on them.
These are pretty interesting devices. They have 60 cores running 4 threads per core, using the x86 architecture.
It seems like compute-intensive tasks that can run many threads (like hundreds) can really take advantage of them. I do have some embarrassingly parallel problems that use Java code. I haven't parallelized them yet, but if all I have to do is crank up the number of threads in the pool, it might be worth it.
I'm interested in any experience you can share.
I'm pretty comfortable parallelizing the project. I may come back when I run into trouble, but I've gone through the process successfully a few times and I'll start with the easiest problem.
My question is more about how effective Java 7 is at taking advantage of this thing. It sounds like all I have to do is generate a lot of compute-bound threads: work out a thread pool, set it to 8 threads to test on my desktop workstation, and set it to 240 to run on the Phi (tweaking those numbers is expected). That sounds just too good to be true. I'm hoping for some confirmation.
I just learned this resource is available on one of the clusters I use and started on the Intel tutorials for C++ programming, so any comments about experience with Java running on a PHI coprocessor are more than welcome.
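To make the thread pool idea concrete, here is a minimal sketch of what I have in mind, written in Java 7 style. The class name `PoolSketch`, the `work` method (a stand-in for a real compute-bound task), and the chunking scheme are all made up for illustration. It only shows the pool size as a single knob; it says nothing about whether a JVM actually runs on the coprocessor or how well it would scale there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSketch {

    // Hypothetical compute-bound work unit: sums i*i over [start, end).
    static long work(long start, long end) {
        long sum = 0;
        for (long i = start; i < end; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Splits [0, n) into `chunks` independent tasks and runs them on a
    // fixed-size pool. `threads` is the only value that changes between
    // machines (e.g. 8 on a desktop, 240 as a starting guess on the Phi).
    static long run(int threads, int chunks, long n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            long step = (n + chunks - 1) / chunks; // ceiling division
            for (int c = 0; c < chunks; c++) {
                final long start = Math.min(n, c * step);
                final long end = Math.min(n, start + step);
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        return work(start, end);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Thread count from the command line, else the number of
        // processors the JVM reports.
        int threads = args.length > 0
                ? Integer.parseInt(args[0])
                : Runtime.getRuntime().availableProcessors();
        System.out.println(run(threads, threads * 4, 1000000L));
    }
}
```

Using more chunks than threads (4 per thread here, an arbitrary choice) keeps the pool busy when chunks finish at different times. The result should be the same for any thread count, which makes it easy to check correctness on the desktop before trying larger pools.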
Right now the sys admin is having trouble with NFS so ssh is disabled. I expect to be able to answer the JVM question soon.
This is a university cluster with something like 5000 cores and, I believe, 20 NVIDIA K10s and 10 Xeon Phis. I'm not sure how many clusters are actually in the collaboration, but I have login privileges on 5 that are similar in specs. It's an amazing amount of resources.
In case you're interested, this is the LIGO project (http://www.ligo.org). We're currently in year 3 of a 4-year upgrade to the instruments. When we go into Science mode in late 2014 or early 2015, we expect to be recording data at a rate of something like 2 PB/yr.