Hi, is the new 30 series from Nvidia going to be supported? |
|
|
|
Yes, why wouldn't it be? |
|
|
Dave  Send message
Joined: 13 Feb 12 Posts: 2829 ID: 130544 Credit: 954,747,840 RAC: 0
                     
|
3080 has 3× the TFLOPS of 2080S. |
|
|
|
3080 has 3× the TFLOPS of 2080S.
Hopefully that will put a lot of 2080s on the secondary market! |
|
|
|
He means the 2080 Super.
Also, the FP64 units are removed from Ampere RTX GPUs.
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
|
Also, the FP64 units are removed from Ampere RTX GPUs.
I hadn't noticed this in all the gaming hype. Interesting... does that mean you can't run FP64 at all, or is there some software emulation perhaps at even lower performance?
FP64 perf has been crippled for a long time in consumer cards, and if it isn't needed in a consumer space I guess the silicon budget can better go elsewhere.
BTW I'm almost certain to get one at some point. However, rumours are that AMD will likely go "big VRAM" with their next GPU launch, and Nvidia have designs in reserve to counter that. I think I'd rather hold out for those, for a little more future-proofing, than get one on day 1. |
|
|
tng Send message
Joined: 29 Aug 10 Posts: 398 ID: 66603 Credit: 22,878,263,783 RAC: 0
                                    
|
Also, the FP64 units are removed from Ampere RTX GPUs.
I hadn't noticed this in all the gaming hype. Interesting... does that mean you can't run FP64 at all, or is there some software emulation perhaps at even lower performance?
FP64 perf has been crippled for a long time in consumer cards, and if it isn't needed in a consumer space I guess the silicon budget can better go elsewhere.
BTW I'm almost certain to get one at some point, however rumours are that AMD will likely go "big VRAM" with their next GPU launch, and nvidia have designs in reserve to go against that. I think I'd rather hold out for those for a little more future resistance than get one on day 1.
I'll probably go for 1 or 2 early just to check out, then wait and see what I want to do.
____________
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2165 ID: 1178 Credit: 8,777,295,508 RAC: 0
                                     
|
Debating on a 3070 or waiting another month or so and seeing the 3060 specs first.
Also, if I was reading correctly, the loss of FP64 made room for another FP32 line in these chips. That may be better for the OCL versions we are on for most GFN... I am definitely eager to see these in action across our projects.
|
|
|
James Project administrator Volunteer tester Send message
Joined: 19 Sep 14 Posts: 95 ID: 366225 Credit: 523,713,437 RAC: 0
                   
|
They claim 2x FP32 performance among other things, which sounds really good for now...
I was looking at the lineup and was hesitant on a 3070 given that it runs on GDDR6, as opposed to the 3080 and 3090 running on GDDR6X. If I remember correctly, most genefer tasks are memory bandwidth limited on Turing already, and Turing runs on GDDR6. |
|
|
|
They claim 2x FP32 performance among other things, which sounds really good for now...
I was looking at the lineup and was hesitant on a 3070 given that it runs on GDDR6, as opposed to the 3080 and 3090 running on GDDR6X. If I remember correctly, most genefer tasks are memory bandwidth limited on Turing already, and Turing runs on GDDR6.
Yes, 2x FP32, since there are 2x FP32 compute units.
I will try to get an RTX 3070 if my friend and I pool our budgets.
PG software uses FP32, right?
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
|
FP64 is still at 1/32, and has official performance specs from Nvidia. I've only seen a couple of deep architecture reviews, and they don't really cover it (waiting for Anandtech's take; they usually cover these things). It does seem that the fixed FP64 hardware is gone, but I remember reading somewhere about the A100 that the cards can virtually combine two FP32 units into a single FP64 with little overhead.
For PG, I'd think that 30+ TFLOPS of FP32 OCL3+ is going to win out over 1 TFLOP of FP64 OCL.
For my own part, I'm looking forward to my extra November paycheck and December safety/vacation bonus checks to get a sweet, sweet 3090 to join my new 10980XE. Then I'll start updating the old 900s/1000s with either 3070s or even cheaper used 2080(Ti)/Supers, overtime willing. The massive performance increase at the same price shocked the used market, in that a $1200 GPU is now slower than a $500 one. Turing was priced so high that Pascal held its value. Thankfully, that is no longer the case.
____________
Eating more cheese on Thursdays. |
|
|
|
2080 Tis and other 20xx cards are already hitting eBay and Craigslist. Lots of old mining cards too, though those are usually pretty worn as it were.
____________
My lucky #: 60133106^131072+1 (GFN 17-mega) |
|
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
Speaking of miner problems...
https://www.tomshardware.com/news/geforce-rtx-3080-cryptomining-frenzy-ampere
I hope these cards can be affordable. |
|
|
|
Also, there's a miner at the moment with 56 or 57 cards. |
|
|
Yves GallotVolunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
They claim 2x FP32 performance among other things, which sounds really good for now...
Yes, 2x FP32 since 2x FP32 compute units.
PG softwares use FP32 right?
No, all of them use INT32.
Each Pascal core is a single INT32+FP32 unit. Turing has two separate units: INT32 and FP32. Each RTX-30 Ampere core has two units: INT32+FP32 and FP32. Then RTX-30 can perform 2x FP32 but 1x INT32.
|
|
|
|
They claim 2x FP32 performance among other things, which sounds really good for now...
Yes, 2x FP32 since 2x FP32 compute units.
PG softwares use FP32 right?
No, all of them use INT32.
Each Pascal core is a single INT32+FP32 unit. Turing has two separate units: INT32 and FP32. Each RTX-30 Ampere core has two units: INT32+FP32 and FP32. Then RTX-30 can perform 2x FP32 but 1x INT32.
So that means no big improvement can be expected compared to the 2xxx series, since PG uses INT32?
____________
|
|
|
|
I saw a review yesterday showing "blender cuda performance". How relevant is that?
2080: 4:50
2080ti: 3:53
3080: 2:22 |
|
|
|
According to reviews, the RTX3080 is around 20% stronger than the RTX2080Ti on INT32.
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
|
According to reviews, the RTX3080 is around 20% stronger than the RTX2080Ti on INT32.
Okay, given the much higher power usage it's not really interesting to switch from a 2080 Ti to a 3080.
But at least you now get 2080 Ti performance plus a bit, at lower prices (not taking the power bill into consideration). For those who don't need to pay the power bill, a 3080 may be the better choice. But running 24/7 and paying your own power bills, with only a 20% increase... a 2080 Ti will be the better choice money-wise (if it really is only a 20% increase).
____________
|
|
|
Yves GallotVolunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
No, all of them use INT32.
Each Pascal core is a single INT32+FP32 unit. Turing has two separate units: INT32 and FP32. Each RTX-30 Ampere core has two units: INT32+FP32 and FP32. Then RTX-30 can perform 2x FP32 but 1x INT32.
So that means no big improvement can be expected compared to the 2xxx-series, since the INT32 is used by PG?
It is unclear; I can just say that the GeForce 30 series may not be faster than the 20 series per core.
GA102/104 is different from GA100: the same name for both, but clearly two different architectures.
Only half of the cores of the RTX-30 are true cores. It's a new CUDA compute capability: 8.6 (GA100 is 8.0) and this version is still undocumented. With CUDA (or OpenCL) each core executes the same code, I don't see how it is possible with half of the cores that are not able to execute INT32 instructions.
I think that the RTX 3080 doesn't have 8704 cores but 4352 cores with two FP32 units per core. There is probably a new set of SIMD instructions able to execute two FP32 operations (?).
Then for PrimeGrid applications the comparison is RTX 2080 Ti: 4352 cores @ 1545MHz and RTX 3080: 4352 cores @ 1700MHz + faster memory and larger cache. |
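Under that reading, the per-core comparison reduces to simple arithmetic. A quick sanity-check sketch (figures from the post above; the helper name is made up for illustration):

```python
# Relative INT32 throughput, assuming one INT32 pipe per "true" core and
# ignoring the faster GDDR6X memory and larger cache noted in the post.
def int32_throughput(cores, mhz):
    return cores * mhz

rtx2080ti = int32_throughput(4352, 1545)  # RTX 2080 Ti: 4352 cores @ 1545 MHz
rtx3080   = int32_throughput(4352, 1700)  # RTX 3080:    4352 cores @ 1700 MHz

uplift = rtx3080 / rtx2080ti - 1
print(f"clock-only uplift: {uplift:.1%}")  # clock-only uplift: 10.0%
```

So on this view, most of any gain over a 2080 Ti would have to come from clocks, memory bandwidth and cache rather than core count.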
|
|
|
AP27:
2080: 660-720
3080: 290
PPS Sieve: (2 tasks)
2080: 215
3080: 170
DYFL:
2080: 103000-110000
3080: estimate after 15 mins = ~72000
So if you have a 2080ti don't go selling it just yet.
The 3080 is stock, no tweaking. There's possibly another 10% there with watercooling as it's currently running @ 70C. |
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1893 ID: 352 Credit: 3,141,488,980 RAC: 0
                             
|
Can you run a genefer benchmark?
Similar to this one, so we can have a good comparison over GFN.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
|
Can you run a genefer benchmark?
Similar to this one, so we can have a good comparison over GFN.
Sure, once the GFN-Extreme is finished I'll run the others.
edit: how do I run the benchmarks? |
|
|
|
ok, I worked it out:
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 2080', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '451.67'.
46 computeUnits @ 1710MHz, memSize=8192MB, cacheSize=1472kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
High priority change succeeded.
Generalized Fermat Prime Search benchmarks
100000000^32768+1 262145 digits OCL2 Estimated time: 0:00:22
50000000^65536+1 504560 digits OCL2 Estimated time: 0:00:53
15000000^131072+1 940585 digits OCL2 Estimated time: 0:03:04
50000000^131072+1 1009120 digits OCL2 Estimated time: 0:03:17
6000000^262144+1 1776852 digits OCL2 Estimated time: 0:10:30
2500000^524288+1 3354364 digits OCL5 Estimated time: 0:23:50
1100000^1048576+1 6334860 digits OCL5 Estimated time: 1:42:00
270000^2097152+1 11390396 digits OCL5 Estimated time: 6:26:00
130000^4194304+1 21449434 digits OCL4 Estimated time: 24:30:00
Normal priority change succeeded.
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 3080', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '456.38'.
68 computeUnits @ 1710MHz, memSize=10240MB, cacheSize=1904kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
High priority change succeeded.
Generalized Fermat Prime Search benchmarks
100000000^32768+1 262145 digits OCL2 Estimated time: 0:00:22
50000000^65536+1 504560 digits OCL2 Estimated time: 0:00:48
15000000^131072+1 940585 digits OCL2 Estimated time: 0:02:10
50000000^131072+1 1009120 digits OCL2 Estimated time: 0:02:22
6000000^262144+1 1776852 digits OCL2 Estimated time: 0:07:44
2500000^524288+1 3354364 digits OCL5 Estimated time: 0:17:10
1100000^1048576+1 6334860 digits OCL3 Estimated time: 1:08:00
270000^2097152+1 11390396 digits OCL4 Estimated time: 4:24:00
130000^4194304+1 21449434 digits OCL3 Estimated time: 15:40:00
Normal priority change succeeded.
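For anyone wanting to turn the two estimate lists above into head-to-head numbers, a small sketch (the helper is hypothetical, just for illustration):

```python
# Convert an H:MM:SS estimate into seconds, then take the ratio of the
# 2080's estimate to the 3080's.
def to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# e.g. the 50000000^131072+1 row: 0:03:17 on the 2080 vs 0:02:22 on the 3080
ratio = to_seconds("0:03:17") / to_seconds("0:02:22")
print(f"{ratio:.2f}x")  # 1.39x
```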
|
|
|
|
I have an older Windows machine running with an i7 processor, but it has an outdated 1080 Ti. Would it be worth replacing the 1080 Ti with a 3080? Assuming no power supply issues, is there anything else making the upgrade to a 3080 not make sense? |
|
|
|
I have an older window machine running with an i7 processor but it has an outdated 1080ti, would it be worth replacing the 1080ti with an 3080? Assuming no power supply issues, is there anything else making the upgrade to a 3080 not make sense?
Depends on your funds. Some may consider a used 2080 Ti, with its lower power consumption, a better buy.
|
|
|
|
I have an older window machine running with an i7 processor but it has an outdated 1080ti, would it be worth replacing the 1080ti with an 3080? Assuming no power supply issues, is there anything else making the upgrade to a 3080 not make sense?
depends on your funds. Some may consider a used 2080ti with its lower power consumption a better buy.
I am always hesitant to buy used tech; I know I beat the crap out of my cards running them 24/7 at full load. That is a good option to consider, though. |
|
|
Yves GallotVolunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
Some may consider a used 2080ti with its lower power consumption a better buy.
Did you check power consumption?
Because of the 8 nm process, and as the FP32, RT and Tensor cores are not used, power usage may be lower than the TDP of 320 W.
Thanks for benchmarks. As expected, the real number of cores is 4352 and not 8704. 8704 is the number of ALUs. With similar reasoning, the i9-9900K is not an octa-core but a 32-core processor!
Since both the RTX 2080 and RTX 3080 have the same clock speed, we can easily compare core speed. RTX-30 cores seem to be similar to RTX-20 cores (except the new FP32 unit), and the speed improvement for GFN-22 (per core) must come from GDDR6X bandwidth.
Now, I have to find how to exploit these useless FP32 and Tensor Cores... |
|
|
|
Now, I have to find how to exploit these useless FP32 and Tensor Cores...
if anyone can do it, you can. |
|
|
|
Also, the FP64 units are removed from Ampere RTX GPUs.
I hadn't noticed this in all the gaming hype. Interesting... does that mean you can't run FP64 at all, or is there some software emulation perhaps at even lower performance?
FP64 perf has been crippled for a long time in consumer cards, and if it isn't needed in a consumer space I guess the silicon budget can better go elsewhere.
BTW I'm almost certain to get one at some point, however rumours are that AMD will likely go "big VRAM" with their next GPU launch, and nvidia have designs in reserve to go against that. I think I'd rather hold out for those for a little more future resistance than get one on day 1.
Oh, NVIDIA are definitely holding back a "SUPER" or "Ti" variant to combat AMD when it releases Big Navi in late October.
There will be no stock of 3080s anyway, so I'd rather wait to see what AMD bring to the table... then I'll wait for our crunchers with deep pockets to go out and buy these new cards to see how they perform.
I remember quite a few people racing out to buy the 5700 XT when it came out, and then there were all sorts of issues trying to run Einstein or Milkyway on them. People (BOINC crunchers) started taking them back! It seems the drivers have been ironed out since then, but still... it was a good example of NOT racing out to buy a new card.
I read there's also speculation of a "Titan" type card possibly in the works, with 48 GB of VRAM (and the typical Titan price tag to match, no doubt).
The 3000 series just highlights how STUPIDLY priced the 2000 series cards were. 2080 Tis are now under AUD$1000 on the second-hand market.
I'll watch on with interest... like I said, there will be limited to no stock anyway.
____________
|
|
|
Yves GallotVolunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
Also, the FP64 units are removed from Ampere RTX GPUs.
I hadn't noticed this in all the gaming hype. Interesting... does that mean you can't run FP64 at all, or is there some software emulation perhaps at even lower performance?
FP64 is slightly slower on the RTX 3080
AIDA64 GPGPU Benchmark:
Model        Freq (MHz)  Mem Read   Mem Write  Mem Copy  FP32   FP64  INT32  INT64
RTX 2080 Ti  1635        12.4 GB/s  12.2 GB/s  499 GB/s  16496  507   15556  3486
RTX 3080     1710        12.4 GB/s  12.2 GB/s  643 GB/s  32513  516   16854  3998
507 * 1710 / 1635 = 530 |
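The final line is the 2080 Ti's FP64 score normalized to the 3080's clock; as a sketch:

```python
# Scale the 2080 Ti FP64 score (AIDA64 table above) to the 3080's clock.
fp64_2080ti = 507                  # RTX 2080 Ti FP64 score at 1635 MHz
clk_2080ti, clk_3080 = 1635, 1710  # MHz
scaled = fp64_2080ti * clk_3080 / clk_2080ti
print(round(scaled))  # 530
```

So clock-for-clock the 3080's 516 is indeed slightly below the 2080 Ti.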
|
|
|
Some may consider a used 2080ti with its lower power consumption a better buy.
Did you check power consumption?
Because of the 8 nm process, and as the FP32, RT and Tensor cores are not used, power usage may be lower than the TDP of 320 W.
My UPS shows 64% load with it running GFN and 25% load without. It's a Smart UPS-1000 which is supposed to have 700W maximum load which means the 3080 is using 210W.
Some actual runtimes to confirm the earlier benchmarks. The DYFL time is probably a bit slower than optimum as I had a few hours of doing some other tests without suspending it.
24s Genefer 15 v3.21
52s Genefer 16 v3.21
135s Genefer 17 Low v3.21
145s Genefer 17 Mega v3.21
455s Genefer 18 v3.21
1,020s Genefer 19 v3.21
4,262s Genefer 20 v3.21
65,973s Do You Feel Lucky? v3.21 |
|
|
|
Some may consider a used 2080ti with its lower power consumption a better buy.
Did you check power consumption?
Because of the 8 nm process, and as the FP32, RT and Tensor cores are not used, power usage may be lower than the TDP of 320 W.
My UPS shows 64% load with it running GFN and 25% load without. It's a Smart UPS-1000 which is supposed to have 700W maximum load which means the 3080 is using 210W.
Some actual runtimes to confirm the earlier benchmarks. The DYFL time is probably a bit slower than optimum as I had a few hours of doing some other tests without suspending it.
24s Genefer 15 v3.21
52s Genefer 16 v3.21
135s Genefer 17 Low v3.21
145s Genefer 17 Mega v3.21
455s Genefer 18 v3.21
1,020s Genefer 19 v3.21
4,262s Genefer 20 v3.21
65,973s Do You Feel Lucky? v3.21
WOW those are quick!! |
|
|
Bur Volunteer tester
 Send message
Joined: 25 Feb 20 Posts: 332 ID: 1241833 Credit: 22,611,276 RAC: 0
               
|
Nice side-effect: prices for old generation get low. I bought a GTX 1660 for 165 € incl S/H yesterday. So I'll finally be able to get into GFN.
That is, if the power supply is strong enough, we'll see... :D
____________
Primes: 1281979 & 12+8+1979 & 1+2+8+1+9+7+9 & 1^2+2^2+8^2+1^2+9^2+7^2+9^2 & 12*8+19*79 & 12^8-1979 & 1281979 + 4 (cousin prime) |
|
|
|
You do know these are not real 8nm GPUs, right? They use a special 10nm process created for NVIDIA, and it's actually called 8NM... not even close to being the same as 8nm.
Don't spread misinformation, as it doesn't do anyone but NVIDIA any good. |
|
|
|
You do know these are not real 8nm GPUs right? They a special 10nm process created for NVIDIA and it's actually called 8NM...not even close to being the same as 8nm.
Don't spread misinformation as it doesn't do anyone but NVIDIA any good.
All process node names are technically misinformation/marketing. The number given hasn't been tied to actual feature size for many years. Generally speaking, however, a smaller number usually means smaller features (but not always: compare Intel's 14nm/+/++), lower power and/or higher performance per transistor. Numbers can't even be compared between foundries.
____________
Eating more cheese on Thursdays. |
|
|
Dave  Send message
Joined: 13 Feb 12 Posts: 2829 ID: 130544 Credit: 954,747,840 RAC: 0
                     
|
8NM = 8 nautical miles. |
|
|
Nick  Send message
Joined: 11 Jul 11 Posts: 882 ID: 105020 Credit: 1,318,826,036 RAC: 0
                    
|
8NM = 8 Newton metres?
Hey that might be a lot of torque for a graphics card. |
|
|
|
8NM = 8 nautical miles.
The unit newton·molar confused me. /JeppeSN |
|
|
|
8NM = 8 Newton metres?
Hey that might be a lot of torque for a graphics card.
A newton-metre is a joule (work done = force × distance moved).
For example, if I use 50 N of force (eastwards) to move a heavy desktop 1 metre to the east, the result is 50 N·m of work done (50 J).
Coming back to the 8nm process: yes, despite the naming, 8nm is simply a version of 10nm. But it does have much better stats than 10nm, though its stats are not comparable to 7nm.
Despite its poor performance, Intel is the only one that still sticks close to the actual transistor sizes, and that's why it has the "audacity" to claim its 10nm is better than TSMC's 5nm (which in my view is not true at all).
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
Nick  Send message
Joined: 11 Jul 11 Posts: 882 ID: 105020 Credit: 1,318,826,036 RAC: 0
                    
|
8NM = 8 Newton metres?
Hey that might be a lot of torque for a graphics card.
Newton meters is Joules (work done=force * distance moved)
For example I used 50N of force (eastwards) to move a heavy desktop 1 meter to the east, The result is 50Nm of work done (50J)
You have described work, not torque.
A newton-metre is discouraged for being used for anything other than torque.
One newton-metre is equal to the torque resulting from a force of one newton applied perpendicularly to the end of a moment arm that is one metre long.
Torque represents energy transferred or expended per angle of revolution, one newton-metre of torque is equivalent to one joule per radian.
What was your point? |
|
|
|
8NM = 8 Newton metres?
Hey that might be a lot of torque for a graphics card.
Newton meters is Joules (work done=force * distance moved)
For example I used 50N of force (eastwards) to move a heavy desktop 1 meter to the east, The result is 50Nm of work done (50J)
You have described work, not torque.
A newton-metre is discouraged for being used for anything other than torque.
One newton-metre is equal to the torque resulting from a force of one newton applied perpendicularly to the end of a moment arm that is one metre long.
Torque represents energy transferred or expended per angle of revolution, one newton-metre of torque is equivalent to one joule per radian.
What was your point?
My point was to describe work 😂😂
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
Nick  Send message
Joined: 11 Jul 11 Posts: 882 ID: 105020 Credit: 1,318,826,036 RAC: 0
                    
|
My point was to describe work 😂😂
Nice. :) |
|
|
|
My physics really ISN'T that good, though my friends might rebut this.
I'm still much younger, remember? 😂😂😂
This thread is OFF-TOPIC.
Anyway, my two cents on Ampere:
Why buy it now, when prices are sky-high and RDNA 2 is still a mystery? When RDNA 2 comes out, Ampere will definitely fall in price, whether a little or a lot.
(I still support AMD, albeit with its worse driver support.)
____________
SHSID Electronics Group
SHSIDElectronicsGroup@outlook.com
GFN-14: 50103906^16384+1
Proth "SoB": 44243*2^440969+1
|
|
|
|
I support the brand that is the fastest for my money.
So if AMD is faster I buy that, but for the last few years AMD has had nothing compared to NVIDIA.
So let's see what they come up with in a month or two.
____________
|
|
|
|
I support the brand that is the fastest for my money.
So if AMD is faster i buy that but for the last few years AMD has nothing compared to NVIDIA.
So lets see what they come up with in a month or 2.
My situation is similar but slightly different: mine is "I support the brand that is the fastest for the money I have to spend". Since I use mostly Windows machines it really doesn't matter to me which; I don't game, and if I bring up a Linux machine I just move things around so it gets an Nvidia GPU. |
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1893 ID: 352 Credit: 3,141,488,980 RAC: 0
                             
|
Asus TUF 3090:
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 3090', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '456.55'.
82 computeUnits @ 1695MHz, memSize=24576MB, cacheSize=2296kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
High priority change succeeded.
Generalized Fermat Prime Search benchmarks
100000000^32768+1 262145 digits OCL2 Estimated time: 0:00:24
50000000^65536+1 504560 digits OCL2 Estimated time: 0:00:53
15000000^131072+1 940585 digits OCL2 Estimated time: 0:02:06
50000000^131072+1 1009120 digits OCL2 Estimated time: 0:02:16
6000000^262144+1 1776852 digits OCL2 Estimated time: 0:06:53
2500000^524288+1 3354364 digits OCL5 Estimated time: 0:15:20
1100000^1048576+1 6334860 digits OCL5 Estimated time: 0:58:40
270000^2097152+1 11390396 digits OCL4 Estimated time: 3:24:00
130000^4194304+1 21449434 digits OCL5 Estimated time: 12:30:00
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
|
So about 20% faster than a 3080 for 2x the price.
Can't say that stacks up for me unless you absolutely have to have the fastest thing available. |
|
|
Bur Volunteer tester
 Send message
Joined: 25 Feb 20 Posts: 332 ID: 1241833 Credit: 22,611,276 RAC: 0
               
|
Incredible completion times... But I'm not (yet ... ;)) prepared to spend $2000 just to find a large prime. I hope a lot of people here do though. :D
____________
Primes: 1281979 & 12+8+1979 & 1+2+8+1+9+7+9 & 1^2+2^2+8^2+1^2+9^2+7^2+9^2 & 12*8+19*79 & 12^8-1979 & 1281979 + 4 (cousin prime) |
|
|
|
So about 20% faster than a 3080 for 2x the price.
Can't say that stacks up for me unless you absolutely have to have the fastest thing available.
There is definitely a (brief) benefit to having the fastest, newest card first. I landed a 1080 Ti two days early (thanks Newegg!), and part of my 3 mega-prime finds was from having a faster card that could not only burn through tasks quickly, but pretty much be guaranteed to be the first to report back.
And probably lots of luck too. Luck never hurts :)
Looking at what's starting to trickle into the fastest-GPU stats, I'll keep an eye out for 2080 or 2080 Ti deals, but I'm not getting a 3000 series this time. If my 970ti hadn't burned out I probably wouldn't have leapt on the 1080 Ti at the time. I blundered into it on Newegg before everyone else noticed and still couldn't believe it even after the card showed up :) |
|
|
|
Time to compare an RTX 2080 and an RTX 3080. Both GPUs are slightly overclocked ex works. The 3080 is more or less in front of the 2080; 100% in front means twice as fast:
PPS-Sieve: 27% (both GPUs weren't fully utilized)
AP27: 160% (the 3080 is more than twice (2.6x) as fast !!!)
GFN-15 (OCL2): 9%
GFN-16 (OCL2): 33%
GFN-17 (OCL2): 63%
GFN-17-Mega (OCL2): 64%
GFN-18 (OCL2): 69%
GFN-19 (OCL5): 64%
GFN-20 (OCL5): 64%
GFN-21 (OCL4): 83%
GFN-22 (OCL4): 75%
GFN-extreme (OCL3/OCL5): 68% (the 2080 chooses a different transformation than the 3080)
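For reference, percentages like those above follow from raw run times, where "100% in front" means twice as fast. A minimal sketch (the example times are illustrative, not from this post):

```python
# Percentage by which the new card is "in front" of the old one.
def speedup_pct(t_old, t_new):
    return (t_old / t_new - 1) * 100

# e.g. a task taking 160 s on the 2080 and 100 s on the 3080:
print(f"{speedup_pct(160, 100):.0f}%")  # 60%
```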
____________
DeleteNull |
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1893 ID: 352 Credit: 3,141,488,980 RAC: 0
                             
|
Can you please post benchmark numbers like I did for 3090?
Times are much better for comparison across the board, even with the latest/beta GFN CPU app.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
|
Benchmarks are not comparable because Linux uses different numbers to test. Here are the results of the 2080:
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 2080', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '455.38'.
46 computeUnits @ 1845MHz, memSize=7979MB, cacheSize=1472kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
High priority change succeeded.
Generalized Fermat Prime Search benchmarks
75000000^32768+1 258051 digits OCL2 Estimated time: 0:00:16
27000000^65536+1 487022 digits OCL2 Estimated time: 0:00:45
10000000^131072+1 917505 digits OCL2 Estimated time: 0:02:42
48000000^131072+1 1006796 digits OCL2 Estimated time: 0:02:58
3600000^262144+1 1718696 digits OCL5 Estimated time: 0:05:28
1700000^524288+1 3266550 digits OCL5 Estimated time: 0:22:50
950000^1048576+1 6268098 digits OCL5 Estimated time: 1:38:00
180000^2097152+1 11021106 digits OCL4 Estimated time: 6:21:00
110000^4194304+1 21145134 digits OCL3 Estimated time: 24:20:00
Normal priority change succeeded.
3080:
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 3080', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '455.38'.
68 computeUnits @ 1815MHz, memSize=10015MB, cacheSize=1904kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
High priority change succeeded.
Generalized Fermat Prime Search benchmarks
75000000^32768+1 258051 digits OCL2 Estimated time: 0:00:14
27000000^65536+1 487022 digits OCL2 Estimated time: 0:00:32
10000000^131072+1 917505 digits OCL2 Estimated time: 0:01:36
48000000^131072+1 1006796 digits OCL2 Estimated time: 0:01:45
3600000^262144+1 1718696 digits OCL5 Estimated time: 0:03:35
1700000^524288+1 3266550 digits OCL4 Estimated time: 0:12:10
950000^1048576+1 6268098 digits OCL4 Estimated time: 0:56:50
180000^2097152+1 11021106 digits OCL4 Estimated time: 3:27:00
110000^4194304+1 21145134 digits OCL4 Estimated time: 14:00:00
Normal priority change succeeded.
____________
DeleteNull |
|
|
Yves GallotVolunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
Very interesting, but it is difficult to know why the improvement is more than 50% and why the RTX 3080 likes GFN-21 (but this is good news!).
The theory is:
RTX 3080: 68 SM, 4352 cores, 1710 MHz, 760 GB/sec, 320 W, CC 8.6
RTX 2080: 46 SM, 2944 cores, 1710 MHz, 448 GB/sec, 215 W, CC 7.5
and the ratio is 68/46 = +47.8%.
Compute Capabilities 7.5 and 8.6 are very similar (see arithmetic-instructions)... the memory bandwidth?
A possible explanation is the maximum number of resident threads per SM (see features-and-technical-specifications).
The RTX 3080 can process 68 SM * 1536 threads/SM = 104,448 threads and the RTX 2080 46 SM * 1024 threads/SM = 47,104 threads.
The number of threads for GFN tests is the exponent N (GFN-20 => 2^20 = 1,048,576 threads).
If one thread is waiting for a memory read or the completion of the previous operation, another thread will be executed if its inputs are ready. With 24 vs 16 resident threads per core, the throughput of GeForce 30 series could be the key point.
Note that if AMD GPUs are slower than Nvidia GPUs, it is not because of the number of cores but because these sorts of features, such as the number of resident threads or the workgroup size, are smaller. |
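The resident-thread arithmetic in the post, as a sketch (SM counts and per-SM limits as quoted above):

```python
# Maximum resident threads = SMs * max resident threads per SM
# (1024 for CC 7.5 Turing, 1536 for CC 8.6 Ampere, per the post).
rtx3080 = 68 * 1536
rtx2080 = 46 * 1024
print(rtx3080, rtx2080)  # 104448 47104

# A GFN-20 test launches 2^20 = 1,048,576 threads, so the 3080 covers
# the batch in fewer "waves" of resident threads than the 2080.
print(2**20 // rtx3080, 2**20 // rtx2080)  # 10 22
```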
|
|
|
Hi! Having acquired an RTX 3090 (PNY XLR8), I am benchmarking GFN tasks.
Interestingly, the DYFL returned 56,220 secs.
The 3090 GPU framebuffer was 81% in use, with 99% application usage (from Afterburner).
The CPU, an i9-10980XE, was 50% (all cores) committed to World Community Grid tasks whilst this DYFL was running.
Name genefer_extreme_39824598_0
Workunit 684223025
Created 9 Nov 2020 | 14:05:05 UTC
Sent 16 Nov 2020 | 11:45:55 UTC
Received 17 Nov 2020 | 3:23:05 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 978470
Report deadline 8 Dec 2020 | 12:45:55 UTC
Run time 56,220.15
CPU time 1,093.52
Validate state Initial
Credit 0.00
Application version Do You Feel Lucky? v3.21 (OCLcudaGFNEXTREME)
Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
geneferocl 3.3.3-2 (Windows/OpenCL/32-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 3090', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '456.71'.
82 computeUnits @ 1695MHz, memSize=24576MB, cacheSize=2296kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
Command line: projects/www.primegrid.com/geneferocl_windows_3.3.3-2.exe -boinc -q 921096^4194304+1
Normal priority change succeeded.
Checking available transform implementations...
OCL transform is past its b limit.
OCL4 transform is past its b limit.
A benchmark is needed to determine best transform, testing available transform implementations...
Testing OCL2 transform...
Testing OCL3 transform...
Testing OCL5 transform...
Benchmarks completed (11.799 seconds).
Using OCL3 transform
Starting initialization...
Initialization complete (53.165 seconds).
Testing 921096^4194304+1...
Estimated time for 921096^4194304+1 is 15:40:00
921096^4194304+1 is complete. (25016108 digits) (err = 0.0000) (time = 15:36:47) 03:22:57
03:22:57 (3272): called boinc_finish(0)
</stderr_txt>
]]>
____________
|
|
|
|
That's pretty fast, my 3080 needed 62316s.
____________
DeleteNull |
|
|
|
GFN-22: same GPU as the DYFL below, and very similar timing.
I believe the 3000 series has low FP64 capability(?)
genefer22_29583047_8
Workunit 681452482
Created 16 Nov 2020 | 10:09:27 UTC
Sent 17 Nov 2020 | 3:20:37 UTC
Received 17 Nov 2020 | 17:20:35 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 978470
Report deadline 9 Dec 2020 | 4:20:37 UTC
Run time 50,251.96
CPU time 586.98
Validate state Initial
Credit 0.00
Application version Genefer 22 v3.21 (OCLcudaGFNWR)
Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
geneferocl 3.3.3-2 (Windows/OpenCL/32-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Running on platform 'NVIDIA CUDA', device 'GeForce RTX 3090', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '456.71'.
82 computeUnits @ 1695MHz, memSize=24576MB, cacheSize=2296kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
Command line: projects/www.primegrid.com/geneferocl_windows_3.3.3-2.exe -boinc -q 182870^4194304+1
Normal priority change succeeded.
Checking available transform implementations...
A benchmark is needed to determine best transform, testing available transform implementations...
Testing OCL transform...
Testing OCL2 transform...
Testing OCL3 transform...
Testing OCL4 transform...
Testing OCL5 transform...
Benchmarks completed (17.343 seconds).
Using OCL5 transform
Starting initialization...
Initialization complete (52.324 seconds).
Testing 182870^4194304+1...
Estimated time for 182870^4194304+1 is 13:30:00
182870^4194304+1 is complete. (22071026 digits) (err = 0.0000) (time = 13:57:12) 17:20:29
17:20:29 (18160): called boinc_finish(0)
</stderr_txt>
]]>
____________
|
|
|
|
The 3090 is fab. Hey, I did some sums earlier in this thread as to how long DYFL should take:
https://www.primegrid.com/forum_thread.php?id=8422&nowrap=true#144603
This is the fastest of mine so far, 15:20 (55,208 seconds). No idea why it was so much shorter; maybe I had the windows open that day :)
https://www.primegrid.com/workunit.php?wuid=680624171 |
|
|