Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Joined: 5 Feb 08 Posts: 1188 ID: 18646 Credit: 490,016,651 RAC: 0
Until Thursday, CUDA sieving ran without any failures. Since then, every task aborts with the following stderr:
Skipping: 87369
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 0.000000
Skipping: /starting_elapsed_time
Sieve started: 1507576000000000 <= p < 1507579000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GTX+
Detected compute capability: 1.1
Detected 16 multiprocessors.
Computation Error: no candidates found for p=1507576042643183.
called boinc_finish
System is Ubuntu 10.04.1 LTS Linux 64-bit (kernel 2.6.32-26-server)
Could anybody give me a hint ...?
____________
Sysadm@Nbg
my current lucky number: 3749*2^1555697+1
PSA-PRPNet-Stats-URL: http://u-g-f.de/PRPNet/
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Update:
I ran an additional test (from the sieve subforum):
/var/lib/boinc-client/projects/www.primegrid.com $ ./primegrid_tpsieve_1.35_x86_64-pc-linux-gnu__cuda23 -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60 -M2 --device 0
tpsieve version cuda-0.2.2d (testing)
Compiled Nov 8 2010 with GCC 4.3.3
nstart=76, nstep=32
Didn't change nstep from 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000154219 | 6023*2^934790-1
42070000198537 | 3373*2^1046686+1
42070001803331 | 5237*2^486598-1
42070003062431 | 7465*2^1994555-1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070005645821 | 3633*2^119620-1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070007733361 | 7007*2^1691614-1
42070008458437 | 7095*2^1422761-1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
42070010190569 | 5625*2^1903125+1
42070011430123 | 3821*2^1406279+1
42070012209011 | 9405*2^360411-1
42070012301263 | 1957*2^1185814+1
42070013521999 | 1965*2^404493+1
42070013970587 | 7143*2^1462422+1
42070013989247 | 5037*2^838603+1
42070016416499 | 4571*2^466510-1
42070017332953 | 6237*2^1916994+1
42070018235321 | 1941*2^363948+1
42070019117111 | 2523*2^999263-1
42070019542387 | 8587*2^1703626+1
42070021901227 | 6589*2^1149693-1
42070023987581 | 9811*2^318944+1
42070024242289 | 8319*2^1792800-1
42070024339237 | 9257*2^1170495+1
42070024532551 | 4311*2^1690093+1
42070024936837 | 5679*2^1726142+1
42070024995961 | 9111*2^1707153+1
42070026021997 | 4039*2^1819590+1
42070026719239 | 9981*2^629165-1
42070027452199 | 1323*2^854008+1
42070028029061 | 8205*2^1394191-1
42070029006583 | 5943*2^663870+1
Found 40 factors
stderr looks good:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GTX+
Detected compute capability: 1.1
Detected 16 multiprocessors.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070030000000
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 15.38 sec. (0.02 init + 15.36 sieve) at 1962751 p/sec.
Processor time: 0.41 sec. (0.03 init + 0.38 sieve) at 79333053 p/sec.
Average processor utilization: 1.57 (init), 0.02 (sieve)
called boinc_finish
I am more and more confused ...
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
I don't know if it's related, but my CUDA tasks are now taking about 21-22 minutes, vs. the 18-20 minutes they were taking before. It appears as if something changed.
I think John mentioned something about transitioning to a new phase of work.
____________
My lucky number is 75898^524288+1
|
"Michael Goetz" wrote: I don't know if it's related, but my CUDA tasks are now taking about 21-22 minutes, vs. the 18-20 minutes they were taking before. It appears as if something changed. I think John mentioned something about transitioning to a new phase of work.
He has mentioned it here.
|
"Sysadm@Nbg" wrote: Update: done an additional test (from sieve-subforum) ... Found 40 factors ... stderr looks good ... I am more and more confused ...
You could start a test run with the range from your first post (-p 1507576G -P 1507579G) and -k 5 -K 9999 -N 3M
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Done as Ralf suggested:
$ ./primegrid_tpsieve_1.35_x86_64-pc-linux-gnu__cuda23 -p1507576000000000 -P1507579000000000 -k 5 -K 9999 -N 3M -c 60 -M2 --device 0
Sieving stops after reporting some factors. stderr says:
Computation Error: no candidates found for p=1507576375167301.
|
"Sysadm@Nbg" wrote: done as mentioned by ralf ... sieving stops after reporting some factors ... Computation Error: no candidates found for p=1507576375167301.
I just started a test run on the 460. It has already passed the point where it failed on your card. Could you start a second run to see whether it fails at the same point again? In your first post it failed at p=1507576042643183.
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Second test:
Computation Error: no candidates found for p=1507576223882821
I will now try another NVIDIA driver, as suggested by a teammate in our chat.
I will keep you informed.
|
With the caveat that the Fermi GPU code is somewhat different from the non-Fermi code, here is the result of my test run:
C:\Users\Ralf\Desktop\tpsieve-cuda>tpsieve-cuda-x86-windows.exe -p 1507576G -P 1507579G -k 5 -K 9999
-N 3M -c 60 -M2 -q
tpsieve version cuda-0.2.2d (testing)
nstart=86, nstep=38
nstep changed to 32
tpsieve initialized: 5 <= k <= 9999, 86 <= n < 3000000
Sieve started: 1507576000000000 <= p < 1507579000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
p=1507578852913153, 5.266M p/sec, 0.03 CPU cores, 95.1% done. ETA 26 Nov 18:57
Thread 0 completed
Waiting for threads to exit
Sieve complete: 1507576000000000 <= p < 1507579000000000
Found 162 factors
count=85841579,sum=0x7bd6428c20144c75
Elapsed time: 570.27 sec. (0.14 init + 570.13 sieve) at 5262348 p/sec.
Processor time: 18.33 sec. (0.16 init + 18.17 sieve) at 165083027 p/sec.
Average processor utilization: 1.14 (init), 0.03 (sieve)
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Watch out: you are using Windows, I am using Linux.
I am now changing the NVIDIA driver:
old (failing): NVIDIA-Linux-x86_64-260.19.12.run
new (to be installed): NVIDIA-Linux-x86_64-256.53.run
|
The BOINC version delivers the same results:
C:\Users\Ralf\Desktop\tpsieve-cuda>tpsieve-cuda-boinc-x86-windows.exe -p 1507576G -P 1507579G -k 5 -K 9999 -N 3M -c 60 -M2 -q
tpsieve version cuda-0.2.2d (testing)
nstart=86, nstep=38
nstep changed to 32
tpsieve initialized: 5 <= k <= 9999, 86 <= n < 3000000
Found 162 factors
stderr.txt:
18:59:25 (5108): Can't open init data file - running in standalone mode
Sieve started: 1507576000000000 <= p < 1507579000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
Thread 0 completed
Sieve complete: 1507576000000000 <= p < 1507579000000000
count=85841579,sum=0x7bd6428c20144c75
Elapsed time: 569.63 sec. (0.14 init + 569.49 sieve) at 5268336 p/sec.
Processor time: 16.47 sec. (0.14 init + 16.33 sieve) at 183688364 p/sec.
Average processor utilization: 1.00 (init), 0.03 (sieve)
19:08:54 (5108): called boinc_finish
|
"Sysadm@Nbg" wrote: Watch out: you are using Windows, I am using Linux. I am now changing the NVIDIA driver: old (failing) NVIDIA-Linux-x86_64-260.19.12.run, new (to be installed) NVIDIA-Linux-x86_64-256.53.run
I know ;) I'll reboot into Maverick (10.10) and start another test...
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Results should come in around 19:30 CET / 18:30 UTC.
It is still running, so I am hopeful ...
|
I've started a test run with Maverick (10.10) amd64 and the 260.19.21 drivers, and it looks like a driver or Linux problem:
ralf@quadriga:~/Desktop$ ./tpsieve-cuda-boinc-x86_64-linux -p 1507576G -P 1507579G -k 5 -K 9999 -N 3M -c 60 -M2 -q
tpsieve version cuda-0.2.2d (testing)
Compiled Nov 8 2010 with GCC 4.3.3
nstart=86, nstep=38
nstep changed to 32
tpsieve initialized: 5 <= k <= 9999, 86 <= n < 3000000
ralf@quadriga:~/Desktop$ cat stderr.txt
Can't open init data file - running in standalone mode
Sieve started: 1507576000000000 <= p < 1507579000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
Computation Error: no candidates found for p=1507576091573129.
called boinc_finish
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Try my driver, as mentioned before, or my teammate's driver (he is still running successfully):
wget http://de.download.nvidia.com/XFree86/Linux-x86_64/256.53/NVIDIA-Linux-x86_64-256.53.run
wget http://de.download.nvidia.com/XFree86/Linux-x86_64/256.44/NVIDIA-Linux-x86_64-256.44.run
|
Yup. Repeated crashes here too. I've read about problems with the 260.19.xy driver series and the reports seem to be true.
tocx Volunteer tester
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
I think this is a driver problem. I have no problems with my GTX 260 with driver 256.44. After downgrading the driver on my other computer (2x GTX 460) from 260.19.xx to 256.53, I get no errors. I tested both driver versions of the 260.19 series and got errors with both.
Both systems run Debian GNU/Linux testing.
Sysadm@Nbg Volunteer moderator Volunteer tester Project scientist
Done: success, found 162 factors!
stderr:
Will run in standalone mode.
Sieve started: 1507576000000000 <= p < 1507579000000000
Thread 0 starting
Detected GPU 0: GeForce 9800 GTX+
Detected compute capability: 1.1
Detected 16 multiprocessors.
Thread 0 completed
Sieve complete: 1507576000000000 <= p < 1507579000000000
count=85841579,sum=0x7bd6428c20144c75
Elapsed time: 1672.97 sec. (0.11 init + 1672.86 sieve) at 1793476 p/sec.
Processor time: 30.25 sec. (0.12 init + 30.13 sieve) at 99576438 p/sec.
Average processor utilization: 1.08 (init), 0.02 (sieve)
called boinc_finish
So it looks to me like a driver problem with the 260.xx series.
Thanks for your investigations.
|
Interesting. I sieved several thousand G for PSA (PPR3M) with tpsieve and Linux without problems. I'll send Ken a PM.
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
I suspect the bug with 260 drivers is a timing issue. Manual sieving usually uses the non-BOINC version, which has a separate thread for sieving.
I'm using the 260.19.14 driver, and I haven't had too many problems yet; but I haven't tried the newest work.
|
"Ken_g6" wrote: I suspect the bug with 260 drivers is a timing issue. Manual sieving usually uses the non-BOINC version, which has a separate thread for sieving. I'm using the 260.19.14 driver, and I haven't had too many problems yet; but I haven't tried the newest work.
The test that crashed was a "new WU" with an exponent range of 1-3M. My Linux PSA sieving (PPR3M) work had a lower exponent range (-n 2M -N 3M), and the GPU worked without flaws with the 260.19.xy drivers*. Sysadm@Nbg's box worked without problems until yesterday; around that time the new "shorter" WUs with the larger exponent range were introduced.
*With one exception: the -m 64 problem that I reported earlier...
tocx Volunteer tester
With the old workunits (1800 credits) I also had no problems with driver version 260.19.21, but after PG sent out the new work (2314 credits) all WUs ended with errors.
Drivers tested on a GTX 460:
260.19.21: works fine with old workunits, errors with new workunits
260.19.12: errors with new workunits, not tested with old
256.53: works well with new workunits, not tested with old
256.44: works fine with old and new, but is 40 seconds slower on old WUs compared with 260.19.21
tocx Volunteer tester
256.53: works good with new workunits, not tested with old
I received some resends of old workunits and have no errors with 256.53. It also looks like this driver is faster than 256.44.
Ken_g6 Volunteer developer
For those sticking with 260 drivers (or unable to leave them for some reason), it looks like adding "-m 8" to the command line may make the results better for Fermi users. I believe you can add "<cmdline>-m 8</cmdline>" to your app_info.xml file after "</coproc>", but I wouldn't swear that's right.
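For anyone unsure where that element would sit, here is a sketch of an app_info.xml excerpt for the BOINC anonymous platform. The app name, version number, and file name are placeholders (assumptions, not taken from the project); only the <cmdline> placement after </coproc> is the point, and as noted above even that placement is not guaranteed.

```xml
<!-- Hypothetical app_info.xml excerpt; names and numbers are placeholders.
     The <cmdline> element after </coproc> is what adds "-m 8". -->
<app_version>
    <app_name>pps_sr2sieve</app_name>
    <version_num>135</version_num>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
    <cmdline>-m 8</cmdline>
    <file_ref>
        <file_name>primegrid_tpsieve_1.35_x86_64-pc-linux-gnu__cuda23</file_name>
        <main_program/>
    </file_ref>
</app_version>
```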
|
"Ken_g6" wrote: it looks like adding "-m 8" to the command line may make the results better for Fermi users.
Likewise, it would seem that "-m 2" allows my GTS 250 to run these new WUs without errors.
|
it looks like adding "-m 8" to the command line may make the results better for Fermi users.
Likewise, it would seem that "-m 2" allows my GTS 250 to run these new WUs without errors.
Could these cmdlines be implemented project-side pleaaaaaaaaaaase? Otherwise I'll need to do all the app_info kaboodle. :(
BR
Ken_g6 Volunteer developer
I've already asked to get a new client installed with -m 8 default for Fermi. As for -m 2 and a GTS 250, though, that's just way smaller than it should have to be.
By the way, I got a mod at nVIDIA interested but not yet convinced. Can anyone verify that a particular command line works with earlier drivers but not with the current ones? Try one of these programs and a command line like this, for example:
./tpsieve-cuda-(version) -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64
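For anyone running this verification across several -m values, a small shell loop saves retyping. This is only a sketch: BIN is an assumption (point it at whichever tpsieve-cuda binary you downloaded), and the echo makes it a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# Sweep the -m values discussed in this thread over Ken_g6's test range.
# BIN is an assumption: set it to your actual tpsieve-cuda binary.
BIN=./tpsieve-cuda-x86_64-linux
for m in 2 4 8 16 32 64; do
    # Stale checkpoints make a run resume mid-range, skewing comparisons.
    rm -f tpcheck*.txt tpfactors.txt
    # Dry run: prints the command; drop the 'echo' to execute for real.
    echo "$BIN -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m $m"
done
```

Compare the "Elapsed time" lines between runs once you execute for real.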
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2165 ID: 1178 Credit: 8,777,295,508 RAC: 0
"Ken_g6" wrote: I've already asked to get a new client installed with -m 8 default for Fermi. As for -m 2 and a GTS 250, though, that's just way smaller than it should have to be. By the way, I got a mod at nVIDIA interested but not yet convinced. Can anyone verify that a particular command line works with earlier drivers but not with the current ones? Try one of these programs and a command line like this, for example: ./tpsieve-cuda-(version) -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64
I'm not where I can test right now, but the -m 2 option was always fastest on my G92 cards (8800 GS, 9600 GSO, 9800 GTX+), of which the GTS 250 is one...
____________
141941*2^4299438-1 is prime!
tocx Volunteer tester
After work (22:00 UTC) I can run some tests with the 260 driver. I can also run the same tests on the GTX 260 if needed.
2x GTX 460, driver 256.53:
<cmdline> -m 8</cmdline> works (same runtime, same CPU time, but I only had time for some quick tests)
./tpsieve-cuda-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
......
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420700000000000 <= p < 420701000000000
Found 208 factors
count=29703006,sum=0x69ccf6011cb93a06
Elapsed time: 250.82 sec. (0.04 init + 250.78 sieve) at 3987928 p/sec.
Processor time: 4.78 sec. (0.04 init + 4.74 sieve) at 210974032 p/sec.
Average processor utilization: 1.04 (init), 0.02 (sieve)
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
GT240(GT215) driver 195.36.31
<cmdline>-m 6</cmdline> seems to be the fastest
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 2 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
...
Elapsed time: 40.24 sec. (0.02 init + 40.22 sieve) at 749565 p/sec.
Processor time: 0.55 sec. (0.02 init + 0.53 sieve) at 56663058 p/sec.
Average processor utilization: 1.20 (init), 0.01 (sieve)
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 4 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
...
Elapsed time: 39.78 sec. (0.02 init + 39.76 sieve) at 758152 p/sec.
Processor time: 0.53 sec. (0.02 init + 0.51 sieve) at 59340001 p/sec.
Average processor utilization: 1.20 (init), 0.01 (sieve)
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 6 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
...
Elapsed time: 39.77 sec. (0.02 init + 39.76 sieve) at 758261 p/sec.
Processor time: 0.53 sec. (0.02 init + 0.51 sieve) at 58876435 p/sec.
Average processor utilization: 1.20 (init), 0.01 (sieve)
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 8 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
...
Elapsed time: 56.30 sec. (0.02 init + 56.29 sieve) at 535576 p/sec.
Processor time: 0.52 sec. (0.02 init + 0.50 sieve) at 60289382 p/sec.
Average processor utilization: 1.20 (init), 0.01 (sieve)
____________
Best wishes. Knowledge is power. by jjwhalen
|
"rroonnaalldd" wrote: GT240 (GT215), driver 195.36.31: <cmdline>-m 6</cmdline> seems to be the fastest
It depends on the number and organisation of the cores. My GTS 250 (128:64:16) gets really slow with -m 6 and is fastest with -m 16.
|
"Ken_g6" wrote: As for -m 2 and a GTS 250, though, that's just way smaller than it should have to be.
I've tried all values between 2 and 8. Most bail out within 30 seconds, some sooner. Should I try higher values?
Also, I've rebuilt the version in your git repo against CUDA 3.2 and get the exact same behaviour (and roughly the same performance), so the CUDA version doesn't seem to matter.
|
As for -m 2 and a GTS 250, though, that's just way smaller than it should have to be.
I've tried all values between 2 and 8. Most bail-out within 30 seconds, some sooner. Should I try higher values?
Also, I've rebuilt the version in your git repo against CUDA 3.2, and get the exact same behaviour. (and roughly the same performance), so CUDA version doesn't seem to matter.
Maybe the Linux app behaves differently - I'm running Win64...
rroonnaalldd Volunteer developer Volunteer tester
I'm running linux64 with the older kernel 2.6.27-17 and driver 195.36.31 (CUDA 3.0). Your kernel is 2.6.37-3 and we don't know your driver version. IIRC, CUDA 3.2 needs a driver newer than 262.x.
"Ken_g6" wrote: I've already asked to get a new client installed with -m 8 default for Fermi. As for -m 2 and a GTS 250, though, that's just way smaller than it should have to be.
By the way, I got a mod at nVIDIA interested but not yet convinced. Can anyone verify that a particular command line works with earlier drivers but not with the current ones? Try one of these programs and a command line like this, for example:
./tpsieve-cuda-(version) -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64
rroonnaalldd Volunteer developer Volunteer tester
I forgot to post my value for "-m 64":
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 64 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 69 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.30 sec. (0.02 init + 42.28 sieve) at 712943 p/sec.
Processor time: 21.24 sec. (0.02 init + 21.23 sieve) at 1420311 p/sec.
Average processor utilization: 0.96 (init), 0.50 (sieve)
boinc@vmware2k-3:~/Cuda/tpsieve$ ./tpsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 3000000 -c 60 -M2 -m 60 -T -q --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=76, nstep=32
Changed nstep to 31
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 3000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GT 240
Detected compute capability: 1.2
Detected 12 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 69 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.37 sec. (0.02 init + 42.35 sieve) at 711830 p/sec.
Processor time: 19.92 sec. (0.02 init + 19.91 sieve) at 1514503 p/sec.
Average processor utilization: 0.96 (init), 0.47 (sieve)
tocx Volunteer tester
Tests with drivers 256.53 & 260.19.xx on the GTX 460:
driver 256.53 (GTX460):
###################
./tpsieve-cuda-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
Sieve started: 420700000000000 <= p < 420701000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
....
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420700000000000 <= p < 420701000000000
Found 208 factors
count=29703006,sum=0x69ccf6011cb93a06
Elapsed time: 250.62 sec. (0.04 init + 250.58 sieve) at 3991101 p/sec.
Processor time: 4.68 sec. (0.04 init + 4.64 sieve) at 215520875 p/sec.
Average processor utilization: 0.98 (init), 0.02 (sieve)
driver 260.19.12 (GTX460):
#####################
./tpsieve-cuda-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
Sieve started: 420700000000000 <= p < 420701000000000
Resuming from checkpoint p=420700689438721 in tpcheck420700e9.txt
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
....
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420700000000000 <= p < 420701000000000
Found 208 factors
count=29703006,sum=0x69ccf6011cb93a06
Elapsed time: 259.13 sec. (0.04 init + 259.09 sieve) at 3859950 p/sec.
Processor time: 4.66 sec. (0.04 init + 4.61 sieve) at 216829339 p/sec.
Average processor utilization: 1.02 (init), 0.02 (sieve)
driver 260.19.21 (GTX460):
#####################
./tpsieve-cuda-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
Sieve started: 420700000000000 <= p < 420701000000000
Resuming from checkpoint p=420700090963969 in tpcheck420700e9.txt
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
....
Thread 0 completed
Waiting for threads to exit
Sieve complete: 420700000000000 <= p < 420701000000000
Found 208 factors
count=29703006,sum=0x69ccf6011cb93a06
Elapsed time: 235.51 sec. (0.07 init + 235.43 sieve) at 3861472 p/sec.
Processor time: 4.42 sec. (0.04 init + 4.38 sieve) at 207737311 p/sec.
Average processor utilization: 0.59 (init), 0.02 (sieve)
Ken_g6 Volunteer developer
1. You have to use the BOINC version to get errors, it appears. Edit: Yes, I mean tpsieve-cuda-boinc-(version).
2. Please delete any tpcheck*.txt between runs.
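Point 2 matters because a leftover checkpoint makes a fresh run resume mid-range (visible in the "Resuming from checkpoint" lines in the earlier logs). A minimal cleanup between runs, using the file names mentioned in this thread, might look like:

```shell
# Remove tpsieve's leftover state so the next run starts from -p again.
# File names are the ones seen in this thread (tpcheck420700e9.txt etc.).
rm -f tpcheck*.txt tpfactors.txt stderr.txt
```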
tocx Volunteer tester
- Ran 10 tests per driver (5 per card, 2x MSI N460GTX Cyclone OC 1GD5).
- All tasks with the 260.xx drivers error out, but the number of factors found before the task aborts differs.
- Deleted stderr.txt, tpfactors.txt and tpcheck*.txt (if present) between runs.
driver 256.53 (GTX460):
#######################
./tpsieve-cuda-boinc-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
...
Found 208 factors
more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 420700000000000 <= p < 420701000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
Thread 0 completed
Sieve complete: 420700000000000 <= p < 420701000000000
count=29703006,sum=0x69ccf6011cb93a06
Elapsed time: 248.63 sec. (0.04 init + 248.59 sieve) at 4022983 p/sec.
Processor time: 4.57 sec. (0.04 init + 4.52 sieve) at 221047039 p/sec.
Average processor utilization: 1.06 (init), 0.02 (sieve)
called boinc_finish
driver 260.19.12 (GTX460):
##########################
./tpsieve-cuda-boinc-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 420700000000000 <= p < 420701000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
Computation Error: no candidates found for p=420700026453163.
called boinc_finish
driver 260.19.21 (GTX460):
##########################
./tpsieve-cuda-boinc-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0
tpsieve version cuda-0.2.2e (testing)
Compiled Nov 27 2010 with GCC 4.3.3
nstart=82, nstep=36
nstep changed to 32
tpsieve initialized: 1201 <= k <= 9999, 82 <= n < 3000000
more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 420700000000000 <= p < 420701000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 460
Detected compute capability: 2.1
Detected 7 multiprocessors.
Computation Error: no candidates found for p=420700154581337.
called boinc_finish
|
So, basically, we Linux users need to downgrade to NVIDIA driver 256.xx?
Ken_g6 Volunteer developer
That is the best option for now, yes. The second-best option is to include a -m argument with a low value in an app_info.xml file.
|
Maybe a bug should be filed with NVIDIA? If this is a real driver error, it should be fixed by the driver developer, I think.
|
http://www.nvnews.net/vbulletin/showthread.php?t=157563 - a new NVIDIA beta Linux driver was released today. It lacks an updated changelog, so there is no way to know what's different or new.
I just tried this new driver and still have the same problem; in case anyone else is inclined to test the new version, it won't help with this.
They did, however, fix changing clock speeds through nvidia-settings, so some people using the 260 drivers might find it useful.