If you have a 32-bit x86 CPU running Linux, it would be great if you could test this new application for me:
Please test it the same way as described in the Linux64 thread.
This application has been tested and runs successfully on 2.4.31 and 2.6 kernels.
The application is similar to the 64-bit SIMD application in that it uses SSE or SSE2 SIMD instructions in place of costly division in part of the SITO array lookups.
CPU type and features are checked automatically at runtime for SSE or SSE2 support, and the matching optimized code path is run. If the processor supports neither, the base x86 C code is selected.
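For anyone curious how this kind of runtime check can work, here is a minimal sketch. It is not the application's actual code: the names select_path, detect_path and the PATH_* values are made up for illustration, and it assumes a GCC or Clang toolchain that ships <cpuid.h>. The only hard facts used are that CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical names for the three code paths described in the post. */
enum code_path { PATH_BASE_C = 0, PATH_SSE = 1, PATH_SSE2 = 2 };

/* CPUID leaf 1 feature bits in EDX. */
#define EDX_SSE  (1u << 25)
#define EDX_SSE2 (1u << 26)

/* Pick the best path for a given EDX feature word (SSE2 first, then SSE). */
static enum code_path select_path(uint32_t edx)
{
    if (edx & EDX_SSE2) return PATH_SSE2;
    if (edx & EDX_SSE)  return PATH_SSE;
    return PATH_BASE_C;
}

/* Query the running CPU; fall back to plain C if CPUID is unavailable. */
static enum code_path detect_path(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return select_path(edx);
#endif
    return PATH_BASE_C;
}
```

A program would call detect_path() once at startup and then branch (or set a function pointer) on the result. Keeping select_path separate from the CPUID query also makes it easy to force a path by hand for testing.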
Intel Compiler 11.1 was used to compile it, with static linking and the -mia32 flag. Any Pentium 1 or later processor should therefore be able to run the program; according to the Intel docs, this flag "Generates code that will run on any Pentium or later processor".
You may be wondering why the optimized code requires SSE instead of MMX. Good question, especially since integers are used in the SIMD code. Intel's Pentium III was the first CPU with SSE, and SSE added an integer element extract instruction that works on MMX registers. I tested pure MMX code with a manual vector element extract and it was no faster than the regular C code. The built-in SSE integer element extract made the MMX code much, much faster.
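To illustrate the difference, here is a rough sketch of the two ways to pull one 16-bit element out of a 4-element integer vector. This is only an assumption about what the extract step looks like, not the application's code; the function names are invented. The intrinsic _mm_extract_pi16 from <xmmintrin.h> maps to the PEXTRW instruction that arrived with SSE, and the code falls back to the manual version if the compiler is not targeting SSE.

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE header; also provides the MMX __m64 type */
#endif

/* Manual extract: pack four 16-bit lanes, then shift and mask one out. */
static unsigned extract_manual(const uint16_t lanes[4], unsigned index)
{
    uint64_t packed = (uint64_t)lanes[0]
                    | ((uint64_t)lanes[1] << 16)
                    | ((uint64_t)lanes[2] << 32)
                    | ((uint64_t)lanes[3] << 48);
    return (unsigned)((packed >> (16u * (index & 3u))) & 0xFFFFu);
}

/* Extract lane 2 with the SSE instruction when it is available. */
static unsigned extract_lane2(const uint16_t lanes[4])
{
#if defined(__SSE__)
    /* _mm_set_pi16 takes the highest lane first, the lowest lane last. */
    __m64 v = _mm_set_pi16((short)lanes[3], (short)lanes[2],
                           (short)lanes[1], (short)lanes[0]);
    /* The lane index must be a compile-time constant. */
    unsigned r = (unsigned)_mm_extract_pi16(v, 2) & 0xFFFFu;
    _mm_empty();  /* leave MMX state so later x87 code is safe */
    return r;
#else
    return extract_manual(lanes, 2);
#endif
}
```

Both functions return the same value; the point is that the second one does the extract in a single instruction instead of a shift plus a move.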
32-bit CPUs with SSE2 support, with SSE support only, and with neither SSE nor SSE2 all need to be tested. All the tests I have run by manually picking the code path have passed so far.