Message boards :
Generalized Fermat Prime Search :
AVX on AMD CPUs
(Message 88518)
Posted 2684 days ago by pvh
Running the binary you supplied confirms what you said:
Command line: ./genefer_linux64 -b -x avx-amd
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "AVX (AMD)"
6008024^256+1 Time: 4.82 us/mul. Err: 0.1406 1736 digits
4913974^512+1 Time: 7.21 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 16 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 33.1 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 72.2 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 157 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 341 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 743 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 1.59 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.44 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.31 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 15.7 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 33.3 ms/mul. Err: 0.1328 6009544 digits
440400^2097152+1 Time: 71 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 147 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 312 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 3.
Priority change succeeded.
Command line: ./genefer_linux64 -b -x sse4
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "SSE4"
6008024^256+1 Time: 2.53 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 4.39 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 8.26 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 18.1 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 40.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 89.3 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 192 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 418 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 893 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.95 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 4.22 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 9.21 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 20.1 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 43.2 ms/mul. Err: 0.1484 11836006 digits
360204^4194304+1 Time: 95.2 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 199 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 5.
Priority change succeeded.
Command line: ./genefer_linux64 -b -x fma4
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA4"
6008024^256+1 Time: 4.87 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 7.36 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 16 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 31.5 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 71.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 151 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 337 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 711 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.56 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.28 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.16 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 15.1 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 32.5 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 68.4 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 145 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 300 ms/mul. Err: 0.1309 45879398 digits
Genefer Mark = 3.
Priority change succeeded.
Command line: ./genefer_linux64 -b -x fma3
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA3"
6008024^256+1 Time: 4.92 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 7.32 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 16.2 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 32 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 73.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 153 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 347 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 723 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.61 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.36 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.41 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 15.5 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 33.7 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 70.3 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 150 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 309 ms/mul. Err: 0.1309 45879398 digits
Genefer Mark = 3.
Priority change succeeded.
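To make the runs above easier to compare, here is a small Python sketch that parses this kind of benchmark output and prints how much slower one transform is than another. It is not part of genefer: the names `parse_bench` and `slowdown` are hypothetical, and the line format is assumed from the output pasted above. The sample only includes the first and last row of the AVX (AMD) and SSE4 runs.

```python
import re

# Matches lines such as "6008024^256+1 Time: 4.82 us/mul. Err: 0.1406 1736 digits".
LINE_RE = re.compile(r"(\d+)\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(us|ms)/mul\.")
IMPL_RE = re.compile(r'transform implementation "([^"]+)"')
UNIT_TO_US = {"us": 1.0, "ms": 1000.0}


def parse_bench(text):
    """Return {implementation: {exponent N: microseconds per multiplication}}."""
    results = {}
    current = None
    for line in text.splitlines():
        impl = IMPL_RE.search(line)
        if impl:
            # A new "Running benchmarks for ..." header starts a new table.
            current = results.setdefault(impl.group(1), {})
            continue
        m = LINE_RE.search(line)
        if m and current is not None:
            current[int(m.group(2))] = float(m.group(3)) * UNIT_TO_US[m.group(4)]
    return results


def slowdown(results, impl, baseline):
    """Time ratio impl/baseline per exponent; values above 1 mean impl is slower."""
    return {n: results[impl][n] / results[baseline][n]
            for n in results[impl] if n in results[baseline]}


sample = '''Running benchmarks for transform implementation "AVX (AMD)"
6008024^256+1 Time: 4.82 us/mul. Err: 0.1406 1736 digits
294612^8388608+1 Time: 312 ms/mul. Err: 0.1328 45879398 digits
Running benchmarks for transform implementation "SSE4"
6008024^256+1 Time: 2.53 us/mul. Err: 0.1250 1736 digits
294612^8388608+1 Time: 199 ms/mul. Err: 0.1328 45879398 digits
'''

for n, ratio in slowdown(parse_bench(sample), "AVX (AMD)", "SSE4").items():
    print(f"N={n}: AVX (AMD) takes {ratio:.2f}x the SSE4 time")
```

On those two rows the AVX (AMD) transform takes roughly 1.9x and 1.6x the SSE4 time. Feeding in the complete output would give the ratio for every size.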
These results don't agree with the experience I had with my own code, so I will download the source tree and play with it a bit. One thing I noticed when looking at the binary is that it contains references to both gcc 4.9.2 and gcc 4.4.3. The former should be fine, but the latter could be a problem. In my work I found that gcc versions prior to 4.6.0 gave sub-optimal AVX performance. I never figured out why, but using AVX with those older versions actually slowed the code down. Your binary could have a similar problem. I will try compiling my own binaries and track this down further when I can find some time.
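For anyone who wants to check a binary for compiler references like these, here is one possible sketch in Python. It assumes the binary still carries the identification strings GCC normally leaves in the ELF `.comment` section (of the form `GCC: (...) x.y.z`). The helper name `gcc_versions` and the fake byte string are made up for illustration. This is not necessarily how the references mentioned above were found.

```python
import re

# GCC normally leaves strings like "GCC: (GNU) 4.9.2" in the ELF .comment section.
GCC_ID_RE = re.compile(rb"GCC: \(([^)]*)\) (\d+\.\d+(?:\.\d+)?)")


def gcc_versions(blob):
    """Return the sorted, de-duplicated GCC version strings found in raw bytes."""
    return sorted({m.group(2).decode("ascii") for m in GCC_ID_RE.finditer(blob)})


# Hypothetical stand-in for the contents of a real executable.
fake_binary = (
    b"\x7fELF\x00"
    b"GCC: (GNU) 4.9.2\x00"
    b"GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3\x00"
)
print(gcc_versions(fake_binary))
```

For a real file, pass in `open(path, "rb").read()`. Seeing more than one version usually means some of the linked objects or libraries were built with a different compiler. It does not show which parts of the code were compiled with which version.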