Using cutting-edge vector and parallel code, Codex is taking software optimization to new levels, increasing software performance by orders of magnitude (quite like using rocket fuel in your SUV!)

Eigen port to ARM NEON!

Today, I finished the ARM NEON port of the popular open source Eigen math library.

Here is the commit log:

http://bitbucket.org/eigen/eigen/changeset/e26af7abc0af/
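What does a port like this involve? Roughly speaking, Eigen's vectorized code paths are written against a small set of "packet" primitives (load, store, broadcast, add, multiply, multiply-add on a short fixed-width vector), so supporting a new SIMD instruction set mostly means implementing those primitives for it. The sketch below is a simplified, hypothetical illustration of that idea, not code from the changeset: the names Packet4f, pload, pstore, pset1, padd, pmul and pmadd echo Eigen's internal packet functions, but the signatures here are a simplification. It uses real NEON intrinsics (vld1q_f32, vst1q_f32, vdupq_n_f32, vaddq_f32, vmulq_f32, vmlaq_f32) when compiled for an ARM target with NEON enabled (on 32-bit ARM with GCC that typically needs something like -mfpu=neon), falls back to plain scalar code elsewhere, and uses the primitives to vectorize a simple y = a*x + y loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
// On ARM with NEON, one packet is a 128-bit register holding 4 floats.
typedef float32x4_t Packet4f;
inline Packet4f pload(const float* p) { return vld1q_f32(p); }
inline void pstore(float* p, Packet4f a) { vst1q_f32(p, a); }
inline Packet4f pset1(float s) { return vdupq_n_f32(s); }
inline Packet4f padd(Packet4f a, Packet4f b) { return vaddq_f32(a, b); }
inline Packet4f pmul(Packet4f a, Packet4f b) { return vmulq_f32(a, b); }
// vmlaq_f32(c, a, b) computes c + a*b lane by lane.
inline Packet4f pmadd(Packet4f a, Packet4f b, Packet4f c) { return vmlaq_f32(c, a, b); }
#else
// Portable fallback with the same interface: a plain array of 4 floats.
struct Packet4f { float v[4]; };
inline Packet4f pload(const float* p) {
    Packet4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
    return r;
}
inline void pstore(float* p, Packet4f a) {
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
}
inline Packet4f pset1(float s) {
    Packet4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = s;
    return r;
}
inline Packet4f padd(Packet4f a, Packet4f b) {
    for (int i = 0; i < 4; ++i) a.v[i] += b.v[i];
    return a;
}
inline Packet4f pmul(Packet4f a, Packet4f b) {
    for (int i = 0; i < 4; ++i) a.v[i] *= b.v[i];
    return a;
}
inline Packet4f pmadd(Packet4f a, Packet4f b, Packet4f c) {
    return padd(pmul(a, b), c);  // a*b + c, lane by lane
}
#endif

// y[i] = a * x[i] + y[i]: four elements per packet, then a scalar loop
// for the leftover elements when n is not a multiple of 4.
void saxpy(std::size_t n, float a, const float* x, float* y) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    const Packet4f pa = pset1(a);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        pstore(y + i, pmadd(pa, pload(x + i), pload(y + i)));
    }
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The scalar tail loop handles sizes that are not a multiple of 4; a real library also has to deal with alignment, cache blocking and register tiling, which this sketch ignores.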

And here are the results of a simple benchmark (doing matrix addition/multiplication of 512x512 matrices):

Scalar:

$ ./bench_gemm.gcc4.4.1cs
eigen cpu 3.84s 0.0699051 GFLOPS (19.27s)
eigen real 3.8469s 0.0697796 GFLOPS (19.2648s)

NEON:

$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 0.81s 0.331402 GFLOPS (4.07s)
eigen real 0.813919s 0.329806 GFLOPS (4.07218s)
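The GFLOPS figures above are consistent with counting 2·n³ floating-point operations for one 512x512 matrix product and dividing by the per-run time in the first column; that reading is inferred from the printed numbers, not taken from the benchmark's source. The small helpers below (names are illustrative) reproduce the figures under that assumption.

```cpp
#include <cassert>

// Floating-point operations in one n x n by n x n matrix product C = A * B:
// n*n output entries, each an n-term dot product, conventionally counted as
// n multiplies plus n adds, giving 2 * n^3 in total.
double gemm_flops(long long n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Throughput in GFLOPS for a single product of size n that took `seconds`.
double gemm_gflops(long long n, double seconds) {
    assert(n > 0 && seconds > 0.0);
    return gemm_flops(n) / seconds / 1e9;
}
```

For instance, gemm_gflops(512, 3.84) is about 0.0699051 and gemm_gflops(512, 0.81) is about 0.331402, matching the two "eigen cpu" lines.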


~4.7x faster...
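Since both builds perform the same 512x512 product, the speedup is simply the scalar time divided by the NEON time (equivalently, the NEON GFLOPS divided by the scalar GFLOPS). A trivial helper makes the arithmetic explicit; the function name is illustrative only.

```cpp
#include <cassert>

// Speedup of an optimized run over a baseline that does the same work:
// baseline time divided by optimized time (values above 1 mean faster).
double speedup(double baseline_seconds, double optimized_seconds) {
    assert(baseline_seconds > 0.0 && optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

With the cpu times, speedup(3.84, 0.81) is about 4.74; with the real times, speedup(3.8469, 0.813919) is about 4.73, and the ratio of the GFLOPS figures, 0.331402 / 0.0699051, also comes out at about 4.74.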

Other tests and benchmarks show speedups of ~200-600%. A few things still remain to be fixed, but nothing serious, and nothing we expect to affect NEON performance much.

No other comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC.

posted on: 3 Mar, 2010

Anonymous, 12 Jun 2010, 21:54

http://x264dev.multimedia.cx/?p=142 Come to that, will you also be helping the other x264 ARM assembly devs write new NEON code for it too?

Anonymous, 12 Jun 2010, 21:53

...in fact, I think it was you who said, and actually proved with your SIMD code on that PC you have since sold off, that Linux PPC and all the Glib and related Linux libraries needed some serious SIMD work before they could even make use of it at the core library level for a generic speed increase.

Anonymous, 12 Jun 2010, 21:51

Cool! Will you and/or Genesi be bringing ARM A8/A9 NEON optimisations upstream to the generic Linux ARM core libraries (Glib etc.) any time soon? That would help third-party apps, such as the x264 AVC/H.264 video encoder ARM port, take advantage of NEON SIMD speed without each app having to be optimised for it separately.
