Author |
Message |
|
I have been getting the same computation error on about half of my Proth Prime Search (Sieve) work units. They all have the same error message, so it is a reproducible bug.
Error output from the WU report.
<core_client_version>6.10.58</core_client_version>
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
|
Yes, I'm having the same problem... at least half of my PPS Sieve WUs error out (SIGPIPE) after running for about the same time it takes to complete successfully. No idea why some work and some don't. It seems to me this started when the executable switched to version 1.29. I'm quite sure I didn't see this problem on earlier versions (e.g. 1.27?). I'm on the latest CUDA version (3.1.17) and OSX 10.6.4.
--Gary
Long shot here... Did your problem start on September 21st?
____________
My lucky number is 75898524288+1 |
|
|
|
Could be... that date sounds about right, but I don't know specifically. There are errored WUs in the DB from me on the 23rd, but the date on the 1.29 executable is Sept 15th. It seemed to be running OK (and a bit faster than the previous version) for at least a few days before I noticed problems. Sorry I can't be more precise.
--Gary |
|
|
|
Hi,
Sorry for the slow reply, this has just been brought to my attention.
Firstly, I think the SIGPIPE stuff is a red herring; the actual first cause is a segmentation fault (SIGSEGV) earlier on. I'm not any closer to knowing the cause, however...
I've had a look through the WUs from both Bernd's and Gary's machines, and there isn't a common factor (different CPU, GPU & OS level...) that is an obvious smoking gun, so I assume there is a code bug of some sort.
We need to isolate this for testing (and hopefully I can recreate it on my machine here) - I'll get hold of the command-line args for some of the failing WUs and get back to you when I have something to test. With a bit of luck, it may be fixed by accident in the latest version of PPSieve, but it would be good to fully understand the cause first.
Cheers
- Iain |
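To tell which signal actually ended a task, as opposed to a SIGPIPE that shows up later, one can look at the child's exit status. This is a minimal Python sketch, assuming a POSIX system; `describe_exit` and `run_and_describe` are hypothetical helpers for illustration, not part of BOINC or PPSieve, and the child here is a stand-in that sends itself SIGSEGV rather than a real sieve run:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description.

    On POSIX, subprocess reports death-by-signal as a negative
    return code whose absolute value is the signal number.
    """
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run a command and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # Stand-in for a crashing app: a child that sends itself SIGSEGV.
    crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_and_describe([sys.executable, "-c", crash]))
```

Running the real failing command line the same way would show whether SIGSEGV or SIGPIPE is what the parent process finally sees.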
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
                           
|
The reason I asked about Sept. 21 is that it was the release date for a game, "Civilization 5", which will trash these WUs if they run while the game is running. That's at least on a PC, but the problem has to do with the GPU (the game grabs all video memory, so the BOINC tasks crash), so it might affect Macs as well.
I said it was a long shot.
____________
My lucky number is 75898524288+1 |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
Gary's first post about this issue was on the 19th. Plus, taking all GPU memory wouldn't cause this kind of error; it would cause an "Insufficient available memory" error.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
                           
|
You are correct.
____________
My lucky number is 75898524288+1 |
|
|
|
I'm seeing this too.
I don't have Civilization 5.
I would be more than happy to help debug this.
|
|
|
|
Just a follow-up: Civ 5 for Mac has not yet been released, much less announced. Civ V for Mac is much wished for; some are saying 3-8 months from now.
Wish it was that simple a cause. I have not been gaming on the computer this month. |
|
|
|
Hi, I've managed to reproduce this locally - it appears that the crash is caused when the application is interrupted (i.e. ctrl-C on command line, or possibly BOINC terminating the executable to switch between tasks?). This does not happen with the previous version of ppsieve-CUDA (1.29) so I'll try and figure out what has changed that causes this.
Cheers
- Iain |
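The thread does not say what the actual bug was, so the following is only a general illustration of interrupt-safe shutdown, not the PPSieve fix. It is a minimal Python sketch, assuming the work can be split into chunks; `StopFlag`, `run_chunks`, and the chunk scheme are hypothetical names for this example. The signal handler only sets a flag, and the loop checks it between chunks so the run stops at a point where resuming is safe:

```python
import signal


class StopFlag:
    """Records that an interrupt arrived; the handler does nothing else."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        self.requested = True


def run_chunks(total_chunks, process_chunk):
    """Process chunks in order, stopping cleanly on SIGINT or SIGTERM.

    Returns the index of the next chunk to process, which equals
    total_chunks when the run finished without interruption.
    """
    flag = StopFlag()
    # Remember the old handlers so they can be restored afterwards.
    previous = {
        sig: signal.signal(sig, flag.handler)
        for sig in (signal.SIGINT, signal.SIGTERM)
    }
    try:
        for chunk in range(total_chunks):
            if flag.requested:
                # A real app would write this index to a checkpoint file.
                return chunk
            process_chunk(chunk)
        return total_chunks
    finally:
        for sig, old in previous.items():
            if old is not None:
                signal.signal(sig, old)
```

A caller would store the returned index and start from it on the next launch, so an interruption costs at most the chunk that was in progress.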
|
|
|
I've tried to capture a core file to help in this effort, but it seems my programming skills are insufficient in that regard... BOINC must be doing something to inhibit them, as I have been able to generate core files (not as easily as a veteran Unix weenie is used to) from other non-BOINC apps, but not from BOINC subprocesses.
I will say that I've had the app error out when it's been interrupted (via a shutdown) and not. And, I've had them complete successfully either way as well.
--Gary |
|
|
|
The interruption may be the cause. I crunch (or did crunch) 2 GPU projects. I just noticed that when the GPU switched from PrimeGrid to Collatz, it would burn 10 or so Collatz WUs with out-of-memory errors, about 1 every 5 minutes. Judging by the time stamps, the PrimeGrid WU from just before the switch also seems to have errored. I only noticed the 3 pages of burned Collatz WUs yesterday; they date from when I started to run PrimeGrid on the GPU again. I have stopped running PrimeGrid on the GPU until some of this is fixed. Since then, all the Collatz GPU WUs have run to completion. |
|
|
|
Bug found and fixed. Expect a new app on PrimeGrid in the next couple of days.
- Iain |
|
|
|
Yay!
Thank you, thank you, thank you!
|
|
|
|
Still waiting. Is there anything that we can do to accelerate the fix? My GPU cycles are still going into computing errors... ;-(
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
You can fix your own computer with an app_info.xml file and the latest Mac binary.
Unfortunately, I hear Rytis is stuck on the wrong side of the Baltic. But when he gets back, I should have an even faster version ready!
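For reference, an app_info.xml for BOINC's anonymous-platform mechanism generally names the app, the binary file, and an app version that ties the two together. Below is a minimal Python sketch that renders one. The app name, file name, version number, and plan class are placeholders, not PrimeGrid's actual values, and the element layout reflects my understanding of the BOINC format, so check it against the BOINC documentation and the project's own instructions before relying on it:

```python
from xml.sax.saxutils import escape

# Layout assumed from BOINC's anonymous-platform format; verify before use.
TEMPLATE = """<app_info>
    <app>
        <name>{app}</name>
    </app>
    <file_info>
        <name>{binary}</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>{app}</app_name>
        <version_num>{version}</version_num>
        <plan_class>{plan_class}</plan_class>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>{binary}</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
"""


def render_app_info(app, binary, version, plan_class):
    """Fill the template, escaping any XML special characters."""
    return TEMPLATE.format(
        app=escape(app),
        binary=escape(binary),
        version=int(version),
        plan_class=escape(plan_class),
    )


if __name__ == "__main__":
    # Every value here is a placeholder; use the names the project gives.
    print(render_app_info("example_sieve_app", "example_sieve_binary",
                          100, "example_plan_class"))
```

The rendered text would be saved as app_info.xml in the project's folder inside the BOINC data directory, next to the downloaded binary, with the client restarted afterwards.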
____________
|
|
|
|
Any updates on the new Science Apps for OSX CUDA? |
|
|
|
I sent the new app to Rytis a couple of weeks ago but haven't had any response. I sent him another mail a few days ago, but as Ken points out he has been out of touch for some reason. All we can do for now is wait I think... |
|
|
Rytis Volunteer moderator Project administrator
Joined: 22 Jun 05 Posts: 2649 ID: 1 Credit: 26,363,112 RAC: 0
                    
|
The apps are now live.
____________
|
|
|
|
Proth Prime Search (sieve) v1.29 (cuda31) <---is that what's now "live" and supposed to be improved?
Because I still get similar computation errors on about half my tasks (take a look at my page and ignore the aborted ones, that's a work buffer overload).
FYI it's OSX 10.6.4 running up-to-date CUDA drivers on a GeForce GT 330M (yes it's a laptop), tasks take ~1h40m to complete regardless of whether they result in a computation error, which I don't find out til they complete anyway.
I'd imagine it is reproducible.
Oddly, I was sent this last batch of tasks on or around the date on which the "new app" went live and I still get these errors. In the interest of not wasting my/your time, and not overheating my machine, I'm gonna stop crunching CUDA tasks altogether. Once I get my BFG GeForce GTX 260 up and running, we'll see...
Just letting you know I still get this problem, and I run it overnight whilst nothing else is using the GPU (or even really using the CPU!) |
|
|
|
It is supposed to be 1.30 CUDA 3.1 for OSX now. |
|
|
|
Ah ok...well I'm still not going to run it on my laptop (heating issues and such) but the custom machine I'm building will run Win7 and (probably) Ubuntu, so there should be no issues there anyway.
Thanks for the info. |
|
|
|
Yes, the current app is 1.30 - I believe if you abort the WUs you currently have, any new ones you download will use the new app.
Cheers
- Iain |
|
|
|
Seems to be working well at this time. Switches nicely between PG and Collatz. I think you killed the bugs in the GPU apps for OSX. Thank you for your time and effort. |
|
|