PrimeGrid
drummers-lowrise
1) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68694)
Posted 2348 days ago by Husu*
The CPU "Low" / limited tasks:

http://www.primegrid.com/result.php?resultid=476443237

Run Time: 18,781.24
CPU Time: 3,118.29

http://www.primegrid.com/result.php?resultid=475975850

Run Time: 19,037.48
CPU Time: 1,634.53

-----

Currently the CPU usage on the host is:
geneferocl-windows - 13
geneferocl-windows - 13
primegrid_cllr - 12
primegrid_cllr - 25
primegrid_cllr - 25
primegrid_cllr - 12

This is with the geneferocl processes on "Normal". I switched them to "Low", but it had no effect: the CPU usage stays the same, with the geneferocl processes eating half a core (this is mainly "Red" kernel time).

Chip had said earlier that the x64 Windows client would not act like this; I have to test with that again.

----------

Edit: Yep, x64 client acts differently.

With geneferocl 3.1.2-2 x64 client CPU usage is 2% during initialization phase.

BUT, after watching it for 30 minutes, it's again the same as with the 32-bit client, and one CPU core is gone to geneferocl.

No idea whether it's my host, or whether it has something to do with the Titans.

The core seems to be tossed around between the geneferocl processes: at first it's the 1st process using 25%, then both use 13%, then the latter one uses 25%, then the 1st again.

Do other people see this or is this just my host?
2) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68690)
Posted 2348 days ago by Husu*
It will only do that when the cores are idle, not when they're busy, regardless of CPU priority.


Well, according to what happened, it will use up CPU even when they are busy. The first screen clip was from the initialization; the ones after that are from after it.

It also used up the CPU when I put all processes on the same core 0: the Rosettas lost the fight and only geneferocl got CPU time, so it does not sleep very well.

If the process were sleeping completely, geneferocl should have been at 1-2% and all the Rosettas would have split the core between themselves.

I only restarted the geneferocl tasks / BOINC after all the tests; they had all been running for 20+ minutes or so.

-----

It also messes up CPU scheduling. If, for example, you need to keep the processors only 50% used for heat reasons, with geneferocl you end up at 100% CPU usage on a host with 4 cores and 2 video cards (2 cores for CPU tasks, and geneferocl takes up the other 2 cores, which would otherwise have been idle).

It would be better if it worked like the CUDA version, which doesn't reserve a CPU core. But that's probably up to Nvidia and their OpenCL implementation, if there's no command to turn this off like there is in CUDA.
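The arithmetic in that example can be sketched in a few lines of Python. This is only a rough model of the behaviour described in this post, not a documented BOINC scheduling rule; the function name `busy_cores` and its parameters are made up for illustration.

```python
def busy_cores(total_cores, cpu_limit_pct, gpu_tasks, cores_per_gpu_task=1.0):
    """Estimate how many cores end up busy on a host.

    Assumption (from the observations in this post, not from BOINC docs):
    CPU tasks fill the cores allowed by the "use at most N% of the
    processors" setting, and each GPU task keeps extra cores busy on top.
    """
    cpu_task_cores = int(total_cores * cpu_limit_pct / 100)
    gpu_feeder_cores = gpu_tasks * cores_per_gpu_task
    return min(total_cores, cpu_task_cores + gpu_feeder_cores)


# 4 cores, 50% limit, 2 GPU tasks that each occupy one core
usage = busy_cores(4, 50, 2)
print(f"{usage:.0f} of 4 cores busy ({usage / 4:.0%})")

# Same host if the GPU tasks needed (almost) no CPU, as with the CUDA app
print(f"{busy_cores(4, 50, 2, cores_per_gpu_task=0.0):.0f} of 4 cores busy")
```

Setting `cores_per_gpu_task` to 0 gives the case where the GPU application leaves the idle cores idle.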

It doesn't seem to work this way on ATI, darn.
3) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68686)
Posted 2348 days ago by Husu*
Well, here are some pictures of how it behaves on my computer. It eats up the CPU time of the other calculation processes, even when they are on "Low".

I only have 4 cores on this host, so one core is 25% CPU time in the task manager representation.
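As a small sketch of that conversion, the Task Manager percentage can be turned into "cores in use". The helper name `cores_in_use` is hypothetical, and the default of 4 cores matches this host only.

```python
def cores_in_use(task_manager_pct, logical_cores=4):
    """Convert a Task Manager CPU percentage into a number of cores.

    Task Manager reports a process's usage as a share of ALL cores,
    so on a 4-core host 25% corresponds to one fully used core.
    """
    return task_manager_pct / 100 * logical_cores


for pct in (12.5, 25, 50):
    print(f"{pct:>5}% -> {cores_in_use(pct):.2f} cores")
```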



After it has been doing that for several minutes, it gets into this:



And stays like that (initialization phase?).

But, if I switch BOINC to "On multiprocessor systems, use at most 50% of the processors", it gets like this:



If I change it to use 100% of the processors, it looks like this:



And stays like that with one geneferocl eating up one CPU.

If I force affinity for all calculation processes to use core 0, I get this:



Only the two geneferocl using the CPU time, other CPU tasks get no CPU time when they are all on the same core 0.

Edit: I closed BOINC and all science apps, restarted, and switched the geneferocl priorities to Low; they seem to stay there and are not using CPU time.

So maybe it might work if the priority is set directly at the beginning, and it's not meddled with afterwards.

No idea how it affects the CPU usage when/if the other CPU processes end and new ones start, as in the beginning BOINC was doing PrimeGrid LLRs and then swapped to doing Rosetta on all cores.

I'll leave the 2 units on calculation with priorities on Low and like this when it's visible that all rosettas are actually using the 4 cores like they should.
4) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68679)
Posted 2348 days ago by Husu*
What you might want to try is running GeneferOCL on your Titan and manually change the task priority in Task Manager to "Lowest" and see what that does to its performance.


I still have 2 CUDA tasks ongoing that will finish in a bit over an hour, so after that I'll change to GeneferOCL again and switch the tasks manually to "Lowest". Then we'll see if there's much difference in the long run.

I see the "Normal" priority as beneficial, because the GPU gets fed with data always, even if the CPU is otherwise occupied with tasks.

This "CPU Reservation" just complicates things a bit :)
5) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68671)
Posted 2348 days ago by Husu*
Been running some tests to see the overall difference between full runs.

BOINC starts GPU jobs on "Normal" priority, so currently OpenCL version will always use one core per task, as CPU jobs are on "Low" priority.

-----

Genefer v2.05 (cudaGFN)
Run Time 29,661.90
CPU Time 3,473.71

Run Time 29,760.98
CPU Time 6,037.36

The CUDA version on the Titan takes around 30,000 seconds to process one WU, so roughly 8h 20 minutes.

Benefit is no CPU core used, so the extra CPU core can be used for other purposes.

----------

geneferocl 3.1.2-6 (Windows 32-bit OpenCL)
Run Time 16,141.49
CPU Time 16,134.01

Run time 15,941.54
CPU time 15,937.47

Run time 15,134.54
CPU time 15,860.28

Run time 15,550.60
CPU time 15,538.15

The OpenCL version on the Titan takes around 16,000 seconds to process one WU, so roughly 4h 27 minutes.

Benefit is that the computation time is roughly halved(!), negative is that you lose one CPU core per active task, as BOINC makes the GPU have higher priority.
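For anyone who wants to redo the comparison, here is a minimal Python sketch that averages the run times listed above and computes the speedup and the CPU-time share. The numbers are copied from this post; the helper names `summarize` and `hms` are just for illustration.

```python
from statistics import mean

# (run_time_s, cpu_time_s) pairs copied from the results listed above
cuda = [(29661.90, 3473.71), (29760.98, 6037.36)]
opencl = [(16141.49, 16134.01), (15941.54, 15937.47),
          (15134.54, 15860.28), (15550.60, 15538.15)]


def summarize(runs):
    """Return (mean run time in seconds, mean CPU time / mean run time)."""
    run = mean(r for r, _ in runs)
    cpu = mean(c for _, c in runs)
    return run, cpu / run


def hms(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    m, s = divmod(round(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h}h {m:02d}m {s:02d}s"


cuda_run, cuda_cpu_share = summarize(cuda)
ocl_run, ocl_cpu_share = summarize(opencl)

print(f"CUDA:   {hms(cuda_run)} per WU, CPU share {cuda_cpu_share:.0%}")
print(f"OpenCL: {hms(ocl_run)} per WU, CPU share {ocl_cpu_share:.0%}")
print(f"Speedup OpenCL vs CUDA: {cuda_run / ocl_run:.2f}x")
```

A CPU share near 100% means the task kept roughly one core busy for its whole run, which is the "one core per active task" cost mentioned above.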

-----

Any idea if there will be an option for Nvidia to use the OpenCL application, or do I have to use the app_info.xml to use it on Nvidia in the future also?

And any idea if there will be updates to the Cuda application, as it seems it could be optimized a bit?
6) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68635)
Posted 2350 days ago by Husu*
I put in a 2nd Titan to see if it has any negative side effects, but there are none so far; everything is OK even with SLI enabled.

In the future, the WUs will vary a bit in execution time depending on which card is computing them, as there's a slight MHz difference between the cards' factory clocks / boosting.
7) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68612)
Posted 2351 days ago by Husu*
I also updated to 3.1.2-6 (Windows 32-bit OpenCL), so I'll see if there's any difference tomorrow after a few WUs get done.

It has been running stable so far, with no errors in the WUs, and wingmen running genefercuda have confirmed the results as OK.

Edit: Updated to newer -6 version as Michael just did the build. I'll leave it at that.
8) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68568)
Posted 2352 days ago by Husu*
Tested the new version on Titan, [...] geneferocl 3.1.2-4 (Windows 32-bit OpenCL)

It was faster with some previous versions. But because that's true for some exponents for which the code is similar, the reason is "Windows 64-bit -> 32-bit" or "driver '320.49' => '326.41'"?


I made re-runs with the earlier versions I have available and current settings, Driver version '326.41'.

3.1.2-2 (Windows 64-bit OpenGL)
Generalized Fermat Number Bench
2199064^8192+1 Time: 79.3 us/mul. Err: 0.2344 51956 digits
1798620^16384+1 Time: 78.7 us/mul. Err: 0.2266 102481 digits
1471094^32768+1 Time: 84.2 us/mul. Err: 0.2344 202102 digits
1203210^65536+1 Time: 97.7 us/mul. Err: 0.2656 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.2188 785521 digits
804904^262144+1 Time: 283 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 488 us/mul. Err: 0.2188 3050541 digits
538452^1048576+1 Time: 898 us/mul. Err: 0.2266 6009544 digits
440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2266 11836006 digits
360204^4194304+1 Time: 3.44 ms/mul. Err: 0.2031 23305854 digits
294612^8388608+1 Time: 6.88 ms/mul. Err: 0.1895 45879398 digits


3.1.2-3 (Windows 32-bit OpenGL)
Generalized Fermat Number Bench
2199064^8192+1 Time: 79.3 us/mul. Err: 0.2188 51956 digits
1798620^16384+1 Time: 78.7 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 83 us/mul. Err: 0.2344 202102 digits
1203210^65536+1 Time: 97.7 us/mul. Err: 0.2813 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.2295 785521 digits
804904^262144+1 Time: 273 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 488 us/mul. Err: 0.2266 3050541 digits
538452^1048576+1 Time: 898 us/mul. Err: 0.2188 6009544 digits
440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2188 11836006 digits
360204^4194304+1 Time: 3.44 ms/mul. Err: 0.1953 23305854 digits
294612^8388608+1 Time: 7.19 ms/mul. Err: 0.2070 45879398 digits


3.1.2-4 (Windows 32-bit OpenCL)
Generalized Fermat Number Bench
2199064^8192+1 Time: 79.3 us/mul. Err: 0.2188 51956 digits
1798620^16384+1 Time: 78.7 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 84.2 us/mul. Err: 0.2344 202102 digits
1203210^65536+1 Time: 97.7 us/mul. Err: 0.2813 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.2295 785521 digits
804904^262144+1 Time: 283 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 488 us/mul. Err: 0.2266 3050541 digits
538452^1048576+1 Time: 898 us/mul. Err: 0.2188 6009544 digits
440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2188 11836006 digits
360204^4194304+1 Time: 3.44 ms/mul. Err: 0.1953 23305854 digits
294612^8388608+1 Time: 7.19 ms/mul. Err: 0.2070 45879398 digits


-----

I did more test runs and there's a slight variation in the numbers from run to run. This may be because of the "boosting" behaviour of 1) the CPU and 2) the GPU, so in general for the Titan I'd read the numbers as averages rather than to the letter.

The Titan "boosts" itself depending on temperature and load; I can't make it run at a fixed speed and can't disable the feature either. Double precision "slows the boost down" a bit, so it won't boost that much over the default GPU clock.

Example:
The default GPU clock on my Titan is 837MHz according to GPU-Z; at idle it's 324MHz.

On a double precision -b run it's 849.2MHz (48C temperature); when hotter (79C) it's 836.1MHz on double precision.

On a single precision -b run it's 1006MHz (regardless of temperature); with other GPU load below 78C it's 992.9MHz - 1006MHz, at 79C it's 940.6MHz, etc.

So really depends on the load and temperatures.
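To put those clocks in proportion, here is a short Python sketch that expresses each observed clock as a percentage above or below the default clock. The values are the GPU-Z readings quoted above; the name `boost_pct` is only for illustration.

```python
DEFAULT_MHZ = 837.0  # default GPU clock reported by GPU-Z in this post

# Observed clocks from this post (label -> MHz)
observed = {
    "double precision, 48C": 849.2,
    "double precision, 79C": 836.1,
    "single precision": 1006.0,
    "other load, 79C": 940.6,
}


def boost_pct(clock_mhz, base_mhz=DEFAULT_MHZ):
    """Return how far a clock sits above (+) or below (-) the base, in %."""
    return (clock_mhz / base_mhz - 1) * 100


for label, mhz in observed.items():
    print(f"{label:24s} {mhz:7.1f} MHz  {boost_pct(mhz):+5.1f}%")
```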

NOTE: This is without any overclocking or meddling with the GPU, this is how it works as-is out of the box.

Anyway, the latest 32-bit version is more stable in terms of what the output will be; the 64-bit 3.1.2-2 version has larger variance.

For example, I occasionally get this on 3.1.2-4; usually it's the one I posted before:

Generalized Fermat Number Bench
2199064^8192+1 Time: 79.3 us/mul. Err: 0.2188 51956 digits
1798620^16384+1 Time: 78.7 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 84.2 us/mul. Err: 0.2344 202102 digits
1203210^65536+1 Time: 97.7 us/mul. Err: 0.2813 398482 digits
984108^131072+1 Time: 132 us/mul. Err: 0.2295 785521 digits
804904^262144+1 Time: 283 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 488 us/mul. Err: 0.2266 3050541 digits
538452^1048576+1 Time: 859 us/mul. Err: 0.2188 6009544 digits
440400^2097152+1 Time: 1.72 ms/mul. Err: 0.2188 11836006 digits
360204^4194304+1 Time: 3.44 ms/mul. Err: 0.1953 23305854 digits
294612^8388608+1 Time: 6.88 ms/mul. Err: 0.2070 45879398 digits
9) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68562)
Posted 2352 days ago by Husu*
Tested the new version on the Titan. I'll replace my app_info executable to get a view of a full run.

-----

geneferocl 3.1.2-4 (Windows 32-bit OpenCL)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider

Command line: geneferocl-windows.exe -b


Running on platform 'NVIDIA CUDA', device 'GeForce GTX TITAN', version 'OpenCL 1.1 CUDA' and driver '326.41'.

Generalized Fermat Number Bench
2199064^8192+1 Time: 78.1 us/mul. Err: 0.2188 51956 digits
1798620^16384+1 Time: 78.7 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 83 us/mul. Err: 0.2344 202102 digits
1203210^65536+1 Time: 97.7 us/mul. Err: 0.2813 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.2295 785521 digits
804904^262144+1 Time: 293 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 469 us/mul. Err: 0.2266 3050541 digits
538452^1048576+1 Time: 898 us/mul. Err: 0.2188 6009544 digits
440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2188 11836006 digits
360204^4194304+1 Time: 3.28 ms/mul. Err: 0.1953 23305854 digits
294612^8388608+1 Time: 7.19 ms/mul. Err: 0.2070 45879398 digits
Genefer Mark = 120.
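
To compare benchmark runs without reading the numbers by eye, the output can be parsed with a small Python script. This is a sketch that assumes the line layout shown above (it is not part of Genefer); the names `LINE` and `parse_bench` are made up for illustration.

```python
import re

# Matches one benchmark line, e.g.
# "440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2188 11836006 digits"
LINE = re.compile(
    r"(?P<b>\d+)\^(?P<n>\d+)\+1\s+Time:\s+(?P<t>[\d.]+)\s+(?P<unit>us|ms)/mul\.\s+"
    r"Err:\s+(?P<err>[\d.]+)\s+(?P<digits>\d+)\s+digits"
)


def parse_bench(text):
    """Return {exponent n: microseconds per multiplication}."""
    scale = {"us": 1.0, "ms": 1000.0}
    return {
        int(m["n"]): float(m["t"]) * scale[m["unit"]]
        for m in LINE.finditer(text)
    }


# Three lines copied from the benchmark output above
sample = """
2199064^8192+1 Time: 78.1 us/mul. Err: 0.2188 51956 digits
440400^2097152+1 Time: 1.64 ms/mul. Err: 0.2188 11836006 digits
294612^8388608+1 Time: 7.19 ms/mul. Err: 0.2070 45879398 digits
"""

times = parse_bench(sample)
for n, us in times.items():
    print(f"n={n:>8}  {us:8.1f} us/mul")
```

Because it uses `finditer`, the same function also works if the whole benchmark has been pasted as one long line.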

------

EDIT:

Had to abort the current WU because of the version name difference in the earlier build; it probably won't affect anything after this change :D

Checkpoint saved by genefer Windows 32-bit OpenGL, expected Windows 32-bit OpenCL
10) Message boards : Generalized Fermat Prime Search : Source code of Genefer for OpenCL is available. (Message 68541)
Posted 2352 days ago by Husu*
One of my tasks just got validated, so roughly:
Titan (16,600sec) on OpenCL is about 2x faster than 580 on CUDA (31,596sec).

OpenCL currently uses as much CPU time as GPU time, though. The workunit:
http://www.primegrid.com/workunit.php?wuid=350124153

Also, for comparison, a 670 takes 45,393sec on OpenCL (ran two WUs): http://www.primegrid.com/workunit.php?wuid=350124165

I'll leave the Titan to continue with OpenCL Genefer, as it's way faster than CUDA per workunit.
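
As a rough sketch, the run times above can be turned into workunits per day. This assumes each card runs one task at a time, back to back, with no idle gaps, which is my assumption and not something measured; the labels are the card names as given in this post.

```python
SECONDS_PER_DAY = 86_400

# Run times in seconds, copied from this post
run_times_s = {
    "Titan (OpenCL)": 16_600,
    "580 (CUDA)": 31_596,
    "670 (OpenCL)": 45_393,
}


def wus_per_day(run_time_s):
    """Workunits per day, assuming one task at a time with no gaps."""
    return SECONDS_PER_DAY / run_time_s


baseline = run_times_s["Titan (OpenCL)"]
for name, secs in run_times_s.items():
    print(f"{name:16s} {wus_per_day(secs):4.2f} WU/day  "
          f"{secs / baseline:4.2f}x the Titan's run time")
```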


Copyright © 2005 - 2020 Rytis Slatkevičius and PrimeGrid community.
Generated 27 Jan 2020 | 11:15:09 UTC