PrimeGrid
drummers-lowrise
1) Message boards : Generalized Fermat Prime Search : AVX on AMD CPUs (Message 89008)
Posted 2670 days ago by pvh
I had a look at the assembly and found that the main loops that burn the CPU time contain a lot of instructions that read directly from memory. So one theory is that the pace is set by memory/cache access and not by the speed of the AVX instructions themselves. Did you check for cache misses with cachegrind?

This also suggests that a single-core benchmark could lead to the wrong focus. It could be worthwhile to run a benchmark that loads all cores / threads on a CPU, since that is the typical way BOINC users would run production code. Loading all cores creates competition in the memory controllers that would otherwise be absent. This could change the relative performance of the various versions of genefer.
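To make the all-cores idea concrete, here is a minimal sketch of how several copies of a benchmark command could be started at the same time and timed together. The helper names `run_one` and `run_concurrently` are made up for illustration, and a trivial Python command stands in for the real binary so the sketch runs anywhere; for an actual test one would presumably substitute something like `["./genefer_linux64", "-b", "-x", "avx-amd"]` and set the number of copies to the number of cores.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one copy of the command and capture its text output."""
    return subprocess.run(cmd, capture_output=True, text=True)


def run_concurrently(cmd, copies):
    """Start `copies` instances of cmd together.

    Returns (wall_seconds, list_of_CompletedProcess). Each worker thread
    only waits on its own child process, so the children run in parallel.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=copies) as pool:
        results = list(pool.map(run_one, [cmd] * copies))
    return time.perf_counter() - start, results


# Stand-in command (an assumption for the sketch, not the real benchmark).
stand_in = [sys.executable, "-c", "print('done')"]
elapsed, results = run_concurrently(stand_in, copies=2)
for r in results:
    print(r.returncode, r.stdout.strip())
```

Comparing the per-copy timings from such a run against a single-copy run would show how much the contention for memory costs each transform implementation.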
2) Message boards : Generalized Fermat Prime Search : AVX on AMD CPUs (Message 88518)
Posted 2684 days ago by pvh
Running the binary you supplied confirms what you said:

Command line: ./genefer_linux64 -b -x avx-amd
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "AVX (AMD)"
6008024^256+1 Time: 4.82 us/mul. Err: 0.1406 1736 digits
4913974^512+1 Time: 7.21 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 16 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 33.1 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 72.2 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 157 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 341 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 743 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 1.59 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.44 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.31 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 15.7 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 33.3 ms/mul. Err: 0.1328 6009544 digits
440400^2097152+1 Time: 71 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 147 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 312 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 3.
Priority change succeeded.


Command line: ./genefer_linux64 -b -x sse4
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "SSE4"
6008024^256+1 Time: 2.53 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 4.39 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 8.26 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 18.1 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 40.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 89.3 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 192 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 418 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 893 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.95 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 4.22 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 9.21 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 20.1 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 43.2 ms/mul. Err: 0.1484 11836006 digits
360204^4194304+1 Time: 95.2 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 199 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 5.
Priority change succeeded.


Command line: ./genefer_linux64 -b -x fma4
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA4"
6008024^256+1 Time: 4.87 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 7.36 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 16 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 31.5 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 71.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 151 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 337 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 711 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.56 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.28 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.16 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 15.1 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 32.5 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 68.4 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 145 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 300 ms/mul. Err: 0.1309 45879398 digits
Genefer Mark = 3.
Priority change succeeded.


Command line: ./genefer_linux64 -b -x fma3
Priority change succeeded.
Priority change failed (needs superuser privileges).
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA3"
6008024^256+1 Time: 4.92 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 7.32 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 16.2 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 32 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 73.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 153 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 347 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 723 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.61 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 3.36 ms/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 7.41 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 15.5 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 33.7 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 70.3 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 150 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 309 ms/mul. Err: 0.1309 45879398 digits
Genefer Mark = 3.
Priority change succeeded.


This doesn't agree with the experience I had with my own code, so I will download the source tree and play a bit with that. One thing I noted when looking at the binary is that it contains references to both gcc 4.9.2 and gcc 4.4.3. The former should be fine, but the latter could be a problem. In my work I found that gcc versions prior to 4.6.0 gave sub-optimal AVX performance. I could never figure out why that was, but using AVX with older versions actually slowed down the code. So your binary could potentially have a similar problem. I will try compiling my own binaries and track this down further when I can find some time.
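For anyone who wants to put numbers on the gap between the runs, here is a small sketch that parses genefer benchmark lines and computes the AVX/SSE4 time ratio per transform size. The function names `parse_bench` and `slowdown` are invented for this example, the regular expression only assumes the line layout seen in the output quoted in this post, and just three rows of each run are embedded to keep it short.

```python
import re

# Convert the two units that appear in the quoted output to microseconds.
UNIT_TO_US = {"us": 1.0, "ms": 1000.0}

# Matches e.g. "6008024^256+1 Time: 4.82 us/mul."
PATTERN = re.compile(r"(\d+)\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(us|ms)/mul\.")


def parse_bench(text):
    """Map exponent n (from b^n+1) to microseconds per multiplication."""
    return {
        int(n): float(t) * UNIT_TO_US[unit]
        for _base, n, t, unit in PATTERN.findall(text)
    }


def slowdown(a, b):
    """Ratio a/b for every exponent present in both runs."""
    return {n: a[n] / b[n] for n in sorted(a.keys() & b.keys())}


# Three rows copied from the "AVX (AMD)" and "SSE4" runs above.
avx = parse_bench(
    "6008024^256+1 Time: 4.82 us/mul. Err: 0.1406 1736 digits "
    "1203210^65536+1 Time: 1.59 ms/mul. Err: 0.1562 398482 digits "
    "294612^8388608+1 Time: 312 ms/mul. Err: 0.1328 45879398 digits"
)
sse4 = parse_bench(
    "6008024^256+1 Time: 2.53 us/mul. Err: 0.1250 1736 digits "
    "1203210^65536+1 Time: 893 us/mul. Err: 0.1562 398482 digits "
    "294612^8388608+1 Time: 199 ms/mul. Err: 0.1328 45879398 digits"
)

ratios = slowdown(avx, sse4)
for n, r in ratios.items():
    print(f"n={n}: AVX takes {r:.2f}x the SSE4 time")
```

On these three rows the AVX transform takes between roughly 1.5 and 2 times as long as SSE4, which is the discrepancy to track down.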
3) Message boards : Generalized Fermat Prime Search : AVX on AMD CPUs (Message 88481)
Posted 2686 days ago by pvh
I wasn't running genefer. I was running my own code, for example this:
http://viewvc.nublado.org/index.cgi/trunk/source/vectorize_exp_core.h?view=markup&revision=10404&root=cloudy
This runs just as well on AMD as on Intel, giving similar speedups (or sometimes even better on AMD), so I was wondering why you reached such a different conclusion...
4) Message boards : Generalized Fermat Prime Search : AVX on AMD CPUs (Message 88471)
Posted 2686 days ago by pvh
In his announcement, Michael Goetz stated the following about the GFN-21 project:
Expect run times to be ... about 130 hours for fast modern CPUs without AVX (which unfortunately includes all AMD CPUs, since their AVX instruction set is almost useless.)

Could you elaborate on this point, since it certainly doesn't agree with my experience! I found the AMD AVX instruction set to be perfectly compatible with Intel's and very useful indeed, giving significant speedups.
5) Message boards : News : Server Upgraded, GPU/CPU bug fixed (Message 64298)
Posted 3579 days ago by pvh
I am not getting any work for my AMD GPU either. Maybe <gpu_type> needs to be ati instead of amd?
Copyright © 2005 - 2023 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 0.01, 0.01, 0.00
Generated 3 Feb 2023 | 22:30:50 UTC