Using cutting-edge vector and parallel code, Codex takes software optimization to new levels, increasing software performance by orders of magnitude (much like putting rocket fuel in your SUV!).

What is SIMD?

SIMD stands for Single Instruction, Multiple Data. It is the model on which most vector units operate: the CPU applies the same instruction to multiple data elements at once, thus achieving parallel operation. Scalar CPUs operate as SISD (Single Instruction, Single Data), while true parallel computers operate as MIMD (Multiple Instruction, Multiple Data). (I guess MISD does not really make much sense in most cases :-).)
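To make the difference concrete, here is a minimal sketch of the same addition written the scalar way and the SIMD way. It assumes a GCC or Clang compiler, since the `vector_size` attribute is a compiler extension and not standard C; the names `v4f`, `add_scalar` and `add_simd` are our own, made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats packed into one 128-bit vector.
   vector_size is a GCC/Clang extension (assumption: such a compiler is used). */
typedef float v4f __attribute__((vector_size(16)));

/* SISD: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD: one vector addition handles four elements at a time.
   In this sketch n must be a multiple of 4. */
void add_simd(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* load four floats, alignment-safe */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* one operation on four data elements;
                                            the compiler can map it to a SIMD add
                                            where the target supports one */
        memcpy(out + i, &vr, sizeof vr); /* store four results */
    }
}
```

Both functions produce the same results; the SIMD version simply runs a quarter of the loop iterations. Real code would also handle a leftover tail when n is not a multiple of 4.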

These diagrams give a simple explanation of how SIMD works:
[Figure: Scalar (SISD) processing]
[Figure: Vector (SIMD) processing]

(Disclaimer: These images were not created by us. I found them some time ago on a website and thought they were cool, but I don't remember which one. They're used here in the hope that the original creator won't mind.)

So how much faster is SIMD?

Given that SIMD operates on 128-bit vectors (in most implementations, anyway), that is, four 32-bit words, it's not unusual to expect an algorithm to run up to four times as fast (the often-quoted 400%). If the operation is carried out on 16-bit or 8-bit values, a vector holds 8 or 16 elements and the gain can be even greater. However, that's the theoretical estimate. One also has to consider the overhead of loading/storing the values from/to variables, and the accompanying multiplexing/demultiplexing. But with proper instruction scheduling and careful programming, the fourfold speedup can indeed be achieved.
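The arithmetic behind that estimate can be sketched in a few lines. The cost model below is a deliberately simple assumption of ours, not a measurement: it charges a fixed number of cycles for the useful work and a fixed number of extra cycles per vector for load/store and packing. The function names and the cycle counts are hypothetical.

```c
#include <assert.h>

/* Number of elements that fit in one vector register,
   e.g. 128 / 32 = 4, 128 / 16 = 8, 128 / 8 = 16. */
unsigned lanes_per_vector(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Hypothetical model: the scalar code spends `compute` cycles per element,
   so `lanes * compute` cycles per group of elements. The vector code spends
   `compute` cycles per group plus `overhead` cycles for loads, stores and
   packing. The ratio of the two is the estimated speedup. */
double estimated_speedup(unsigned lanes, double compute, double overhead)
{
    assert(compute > 0.0 && overhead >= 0.0);
    return (lanes * compute) / (compute + overhead);
}
```

With no overhead the model gives the full lane count (4 for 32-bit values in a 128-bit vector). If the overhead per vector equals the compute cost, the same code only runs twice as fast, which is why scheduling and data layout matter so much.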

What CPUs use SIMD?

Most modern CPUs offer a SIMD engine in one form or another. E.g. Intel initially offered MMX, then SSE/SSE2/SSE3. AMD offers 3DNow!, and Motorola/Freescale offers AltiVec (a.k.a. VMX or Velocity Engine) in its PowerPC 74xx and 8641(D) CPUs; AltiVec is also used in IBM CPUs (PowerPC 970, POWER6).
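As a rough illustration, code can check at compile time which of these SIMD families the compiler is targeting. The sketch below assumes GCC or Clang, which predefine macros such as `__ALTIVEC__`, `__SSE2__` and `__MMX__` when the matching instruction set is enabled (for example via `-maltivec` or `-msse2`). The functions `simd_family` and `has_simd` are our own hypothetical helpers. Note that this reports what the build was compiled for, not what the CPU running it supports.

```c
#include <assert.h>
#include <string.h>

/* Returns the name of the SIMD family enabled at compile time.
   Relies on compiler-predefined macros (assumption: GCC or Clang). */
const char *simd_family(void)
{
#if defined(__ALTIVEC__)
    return "AltiVec/VMX";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#elif defined(__3dNOW__)
    return "3DNow!";
#elif defined(__MMX__)
    return "MMX";
#else
    return "none detected";
#endif
}

/* Non-zero when one of the families above was enabled for this build. */
int has_simd(void)
{
    const char *family = simd_family();
    assert(family != NULL);
    return strcmp(family, "none detected") != 0;
}
```

A program could use `has_simd()` to pick between a vectorized routine and a plain scalar fallback at build time.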

What about Cell?

Cell Broadband Engine (Cell BE), the CPU used by Sony in the PlayStation 3 and by IBM in many high-end workstations/clusters, includes an AltiVec unit in its PowerPC core (the PPE). However, Cell's real strength is not AltiVec but the 8 SPEs, which are highly specialized, parallel, asynchronous vector units with 256 KB of local RAM each.
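The small local RAM shapes how SPE code is written: large data sets have to be processed in pieces that fit in it. The sketch below only simulates that pattern in plain C and does not use the Cell SDK. `memcpy` stands in for the transfers between main memory and local RAM, the choice to use half of the 256 KB for data is our own illustrative assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_STORE_BYTES (256 * 1024)            /* local RAM size from the text */
#define CHUNK_BYTES  (LOCAL_STORE_BYTES / 2)      /* assumption: half reserved for data */
#define CHUNK_FLOATS (CHUNK_BYTES / sizeof(float))

/* Stands in for the local RAM of one SPE in this simulation. */
static float local_buf[CHUNK_FLOATS];

/* How many chunks are needed to cover n floats. */
size_t chunks_needed(size_t n)
{
    return (n + CHUNK_FLOATS - 1) / CHUNK_FLOATS;
}

/* Multiplies every element by `factor`, one local-RAM-sized chunk at a time. */
void scale_in_chunks(float *data, size_t n, float factor)
{
    assert(data != NULL || n == 0);
    for (size_t done = 0; done < n; ) {
        size_t count = n - done;
        if (count > CHUNK_FLOATS)
            count = CHUNK_FLOATS;
        memcpy(local_buf, data + done, count * sizeof(float)); /* copy in  */
        for (size_t i = 0; i < count; i++)
            local_buf[i] *= factor;                            /* compute  */
        memcpy(data + done, local_buf, count * sizeof(float)); /* copy out */
        done += count;
    }
}
```

On real hardware the copies would be asynchronous transfers that can overlap with computation, which is a large part of where the performance comes from.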

More on Cell...

What's so special about Cell?

You don't have to take our word for it. A simple Google search for Cell benchmarks will convince you that Cell is a formidable platform for computation-heavy applications, be it 3D rendering, medical applications, scientific simulations, etc.

The real questions are rather "Can my software benefit from Cell's computational power?" and "How easy is that?". Read on for the answers to both questions.

Can my software benefit from Cell's computational power?

Well, does your software make heavy use of computations? If it does, it will probably benefit from Cell.

How easy is it to optimize for the Cell?

There are many ways to use Cell; some are easier than others. (blah blah, must write more on APIs, post some links, etc).

Codex's role

So what does Codex have to do with SIMD and Cell?

Codex offers software optimization services, both for AltiVec-powered CPUs and for Cell. If your software can benefit from such optimizations, we can do it. Note that not all kinds of software need such optimizations or can benefit from them. But in those cases that do and can, the performance gain will be huge.

In case you didn't click on the previous Google search, here's a link to convince you:

http://www.richtigsaidfred.com/?p=99

A simple PlayStation 3 beats a Cray X1E (a supercomputer costing millions of USD)!