PrimeGrid

Message boards : General discussion : Will CUDA 3.1 appear soon?

Nawiedzony
Joined: 8 Aug 10
Posts: 4
ID: 65128
Credit: 2,812,958
RAC: 0
Message 25943 - Posted: 27 Aug 2010 | 20:06:54 UTC

Are there plans to adopt CUDA 3.1 in the project?

Ken_g6
Project donor
Volunteer developer
Joined: 4 Jul 06
Posts: 915
ID: 3110
Credit: 183,164,814
RAC: 0
Message 26113 - Posted: 4 Sep 2010 | 19:04:52 UTC - in response to Message 25943.
Last modified: 4 Sep 2010 | 19:44:56 UTC

I don't see anything CUDA 3.1 can do that CUDA 2.3 isn't already doing equally well in PPSieve-CUDA. I don't need memory accesses except to load and save registers at the start and end of a kernel, and I have no need to run multiple kernels in parallel. The 64-bit multiplies already get faster on Fermi without changing CUDA versions. If you see another particular innovation not involving the memory system (Edit: or floating point), you're welcome to point it out.
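
To illustrate why the speed of a 64-bit multiply can depend on the hardware rather than on the toolkit version: a 64x64-bit product can be assembled from four 32x32-bit partial products, so a chip with a faster 32-bit integer multiplier speeds up the whole operation. Below is a minimal Python sketch of that decomposition. The function name is hypothetical, this is not PPSieve-CUDA's code, and it makes no claim about the exact instruction sequence the CUDA compiler emits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_from_32bit_parts(a, b):
    """Return (high, low): the two 64-bit halves of the 128-bit product a*b.

    Only 32x32 -> 64-bit multiplies are used, mimicking hardware whose
    native integer multiplier is 32 bits wide (an assumption made for
    illustration).
    """
    a &= MASK64
    b &= MASK64
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four partial products, each at most 64 bits wide.
    p0 = a_lo * b_lo  # contributes at bit 0
    p1 = a_lo * b_hi  # contributes at bit 32
    p2 = a_hi * b_lo  # contributes at bit 32
    p3 = a_hi * b_hi  # contributes at bit 64

    # Sum the column that lands on bits 32..63; its overflow carries into bit 64.
    mid = (p0 >> 32) + (p1 & MASK32) + (p2 & MASK32)

    low = (p0 & MASK32) | ((mid & MASK32) << 32)
    high = (p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32)) & MASK64
    return high, low
```

For example, `mul64_from_32bit_parts(1 << 63, 2)` returns `(1, 0)`: the product is exactly 2^64, so only the lowest bit of the high half is set.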

Nawiedzony
Joined: 8 Aug 10
Posts: 4
ID: 65128
Credit: 2,812,958
RAC: 0
Message 26156 - Posted: 6 Sep 2010 | 23:29:00 UTC - in response to Message 26113.

"Math Libraries Performance Improvements, including:
- Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
- Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
- Streaming API now supported in CUBLAS for overlapping copy and compute operations
- CUFFT Real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
- Improved performance for GEMV and SYMV subroutines in CUBLAS
- Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture"


Some people have tried CUDA 3.1 vs. CUDA 2.3 in the Collatz project, and 3.1 is a bit faster...

jjwhalen
Joined: 28 Dec 09
Posts: 78
ID: 52778
Credit: 476,139,121
RAC: 0
Message 26160 - Posted: 7 Sep 2010 | 1:54:59 UTC - in response to Message 26156.
Last modified: 7 Sep 2010 | 2:05:35 UTC


Some people have tried CUDA 3.1 vs. CUDA 2.3 in the Collatz project, and 3.1 is a bit faster...


I can confirm that empirically on my GTX 465SC, though the CPU load is correspondingly higher (probably not a big shocker). By "empirically" I mean comparing the average elapsed time of Collatz v2.05 (CUDA 3.1) against v2.03 (CUDA 2.3).

Mutiny32*
Joined: 17 Aug 10
Posts: 2
ID: 65772
Credit: 1,417,690
RAC: 0
Message 26595 - Posted: 25 Sep 2010 | 18:27:46 UTC

CUDA 3.2 RC is out. It touts a big jump in performance for most libraries and looks more heavily optimized for Fermi than 3.1. Some notes:


  • CUSPARSE, a new library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations
  • CURAND, a new library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random routines in both host and device code
  • CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs
  • CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
  • Support for malloc() and free() in CUDA C compute kernels
  • Added cuda-memcheck support for Fermi architecture GPUs

Copyright © 2005 - 2022 Rytis Slatkevičius and PrimeGrid community.