One of the reason for starting this project, was an excuse to have a hard look at the SIMD promise of project panama on the JVM. Here are the key takeaways.
- Where it works, it can yiels 4-10x performance benefits for larger arrays. This makes it worthwhile.
- Benchmarking is soverign. Just using SIMD can yield no benefit, some benefit, or make you algorithm 50x slower. Benchmark benchmark benchmark.
- Project panama is "not finished". Although it has establised it's API, the only way to know which parts of the API actrually get accelerated, are to test on your hardware.
- Anything involving integer division is catastrophically slow. There is no hardware instruction for integer division, and so it falls back to a slow implementation that the standard JVM beats the pants off. This is a surpisingly big limitation in linear algebra type algorithms.