Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
|
I've turned off the Mac OpenCL ATI GFN app. It's not working. That's really kind of strange because the Mac OpenCL Nvidia GFN app **IS** working -- and they're the exact same app.
This problem is not related to the new 3.2.0 apps released a couple of days ago. This problem also existed with the previous version of the app.
Both the short and WR tasks are turned off. The Mac CPU, Mac CUDA, and Mac Nvidia OpenCL apps are not affected.
If you have a Mac with a double-precision ATI GPU and are willing to help us diagnose the problem, please post here. None of us have a computer on which we can test this app.
____________
My lucky number is 75898524288+1 |
|
|
|
Host 432092
I would try to help but don't have great tech knowledge. My help would probably amount to running WUs and posting the stderr output.
Result from failed task :
Name genefer_1048576_394209_4
Workunit 394093491
Created 24 May 2014 | 5:42:39 UTC
Sent 24 May 2014 | 7:16:50 UTC
Received 24 May 2014 | 7:18:27 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x1)
Computer ID 432092
Report deadline 12 Jun 2014 | 8:16:50 UTC
Run time 2.03
CPU time 0.03
Validate state Invalid
Credit 0.00
Application version Genefer v2.08 (openclGFNMAC)
Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
geneferocl 3.1.2-7 (Apple x86 64-bit OpenCL)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: primegrid_genefer_3_1_2_7_2.08_i686-apple-darwin__openclGFNMAC -boinc -q 425070^1048576+1 --device 1
No OpenCL device found.
08:16:52 (78044): called boinc_finish
</stderr_txt>
]]>
Jim B |
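For anyone reading the stderr above: the `-q 425070^1048576+1` argument names the number under test, which has the form b^N+1 with b = 425070 and N = 1048576 = 2^20. A minimal sketch of how that argument could be split apart and sized is below. The helper names `parse_gfn` and `decimal_digits` are made up for illustration and are not part of Genefer.

```python
import math
import re


def parse_gfn(expr):
    """Split a 'b^N+1' string (as passed after -q) into the integers (b, N).

    Hypothetical helper for illustration only; not Genefer's own parser.
    """
    match = re.fullmatch(r"(\d+)\^(\d+)\+1", expr)
    if match is None:
        raise ValueError(f"not of the form b^N+1: {expr!r}")
    return int(match.group(1)), int(match.group(2))


def decimal_digits(b, n):
    """Number of decimal digits of b^n + 1, computed without building the number."""
    return math.floor(n * math.log10(b)) + 1


if __name__ == "__main__":
    b, n = parse_gfn("425070^1048576+1")
    print(b, n, decimal_digits(b, n))
```

The digit count comes from a logarithm, so the multi-million-digit number itself is never constructed.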
|
|
|
Hi Jim, thanks for the offer of help! I've sent you a PM.
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! |
|
|
|
I have an iMac with an Intel i7-860 CPU and an ATI Radeon HD 4850 (512 MB) GPU. I'd be glad to help in whatever way I can.
Marnix A. van Ammers
Benicia, California, USA
|
|
|
|
I've a new (late-2013) MacPro with the AMD FirePro D700 cards. If I can help, let me know.
James. |
|
|
|
Thanks for all the offers. I have someone running the tests now but will be back in touch if I need further testing.
Cheers
- Iain
|
|
|
Has there been any progress identifying the problem?
Are there other versions of ATI OpenCL, like on Windows or Linux, available, and do they pass or fail? |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Has there been any progress identifying the problem?
Are there other versions of ATI OpenCL, like on Windows or Linux, available, and do they pass or fail?
Mac ATI is the only version affected by this particular bug. All the others work, which is why only the Mac ATI app is disabled. This is not, however, the only bug being fixed in the next release.
This problem has been fixed, and the fix will go live once the next version of the apps is deployed. Several problems were corrected in the upcoming release, including the CPU-hogging issue. That's fixed on Windows, but we're still working on the Linux version. I expect it to be resolved soon, and am hopeful that we'll complete our testing and be able to install the new apps shortly. What does "shortly" mean? That depends on how pernicious the bug is. If we're lucky, it will be less than a week.
The next version of the OpenCL app is also significantly faster. So fast, in fact, that it's faster than the CUDA version on all GPUs. When it goes live, we'll therefore be removing the CUDA app altogether, and exclusively using the OpenCL app on Nvidia as well as ATI GPUs.
Your patience during this downtime for the Mac ATI app will be rewarded with a much faster GFN app.
|
|
Michael Goetz Volunteer moderator Project administrator
|
Genefer 3.2.2 has been released into production (as BOINC app 3.01), and with this the Mac ATI apps are turned back on.
|
|
|
Both of my ATI equipped macs (old iMac & new Mac Pro) are erroring out on all GFN GPU tasks :-(
http://www.primegrid.com/results.php?hostid=113925
http://www.primegrid.com/results.php?hostid=434562 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Both of my ATI equipped macs (old iMac & new Mac Pro) are erroring out on all GFN GPU tasks :-(
http://www.primegrid.com/results.php?hostid=113925
http://www.primegrid.com/results.php?hostid=434562
FirePro GPUs don't work. That's why the second one doesn't work.
Is your other GPU double precision? The error message you're getting is "no GPU". I think you'll get that error if there's no double precision GPU available.
As a side note, it seems we have a problem with Linux and Mac builds. When an error like this is detected, it's supposed to wait an hour before reporting the error. That prevents a single computer from trashing thousands of tasks. The delay seems to be broken. We'll fix that part.
But there's nothing we can do about it not running on Single Precision GPUs. Right now, it won't work on FirePro either, but it's unclear whether that can be fixed.
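To put a rough number on why the one-hour delay matters, here is a back-of-the-envelope sketch. It assumes a host that fails every task after about 2.03 seconds (the run time of the failed task posted earlier in this thread) and that the host is handed new work immediately, which ignores download time and any server-side limits. The function name `tasks_burned` is made up for illustration.

```python
def tasks_burned(hours, fail_seconds, delay_seconds=0.0):
    """Tasks a permanently failing host would error out in `hours`.

    Assumption: every task fails after `fail_seconds`, the app then waits
    `delay_seconds` before reporting, and new work arrives instantly.
    """
    return int(hours * 3600 // (fail_seconds + delay_seconds))


if __name__ == "__main__":
    # Run time of 2.03 s taken from the failed task posted earlier in the thread.
    print("no delay, 24 h:", tasks_burned(24, 2.03))
    print("1 h delay, 24 h:", tasks_burned(24, 2.03, delay_seconds=3600))
```

Under those assumptions the delay caps a broken host at a couple of dozen failed tasks per day instead of tens of thousands.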
|
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 644 ID: 164101 Credit: 305,010,093 RAC: 0

|
FirePro GPUs don't work. That's why the second one doesn't work.
FirePro round-off errors are much larger than on other GPUs.
b ~ 450000 (for N = 2^20) is too large (err ~ 0.2 on IEEE GPU).
But you can try GFN-WR (N = 2^22) : b ~ 30000 and round-off errors are smaller (err ~ 0.002 on IEEE GPU).
|
|
|
|
Both of my ATI equipped macs (old iMac & new Mac Pro) are erroring out on all GFN GPU tasks :-(
http://www.primegrid.com/results.php?hostid=113925
Is your other GPU double precision? The error message you're getting is "no GPU". I think you'll get that error if there's no double precision GPU available.
As a side note, it seems we have a problem with Linux and Mac builds. When an error like this is detected, it's supposed to wait an hour before reporting the error. That prevents a single computer from trashing thousands of tasks. The delay seems to be broken. We'll fix that part.
But there's nothing we can do about it not running on Single Precision GPUs. Right now, it won't work on FirePro either, but it's unclear whether that can be fixed.
The HD 4850 should have DP. In the past (on the Windows side), I had a lot of trouble with the ATI drivers and the OpenCL support for the old Radeon 4XXX series. AMD dropped support some years ago and the legacy driver is horrible. The only stable driver with OpenCL on my side was an old release from 2010 or 2011. But I don't know how this is in the Mac world ;)
Regards Odi
____________
|
|
|
|
I ran a couple of GFN-WR on my D700s in my MacPro and as far as I can tell, they finished running on Friday and were uploaded, but when I look online they report as abandoned.
http://www.primegrid.com/result.php?resultid=562455071
http://www.primegrid.com/result.php?resultid=561316338
Does anyone know why that would be the case? Is it related to the FirePro round off error? (I would think that it would show as invalid/Error while computing if that was the case.)
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
I ran a couple of GFN-WR on my D700s in my MacPro and as far as I can tell, they finished running on Friday and were uploaded, but when I look online they report as abandoned.
http://www.primegrid.com/result.php?resultid=562455071
http://www.primegrid.com/result.php?resultid=561316338
Does anyone know why that would be the case? Is it related to the FirePro round off error? (I would think that it would show as invalid/Error while computing if that was the case.)
I don't know -- from our point of view, the server is saying your host detached from PrimeGrid.
However, with at least one of the two (the first one), the result WAS uploaded, and I can fix the status so that it will validate. I haven't looked at the second one yet, but I'll fix it if possible.
|
|
Yves Gallot Volunteer developer Project scientist
|
I ran a couple of GFN-WR on my D700s in my MacPro and as far as I can tell, they finished running on Friday and were uploaded, but when I look online they report as abandoned.
http://www.primegrid.com/result.php?resultid=562455071
http://www.primegrid.com/result.php?resultid=561316338
However, with at least one of the two (the first one), the result WAS uploaded, and I can fix the status so that it will validate.
Good news, a FirePro is able to check a GFN-WR!
What is the round-off error? (it is 0.0016 / 0.0017 on a Tahiti and GeForce GTX 670).
|
|
|
|
How do I tell what the round off is? |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I ran a couple of GFN-WR on my D700s in my MacPro and as far as I can tell, they finished running on Friday and were uploaded, but when I look online they report as abandoned.
http://www.primegrid.com/result.php?resultid=562455071
http://www.primegrid.com/result.php?resultid=561316338
However, with at least one of the two (the first one), the result WAS uploaded, and I can fix the status so that it will validate.
Good news, a FirePro is able to check a GFN-WR!
What is the round-off error? (it is 0.0016 / 0.0017 on a Tahiti and GeForce GTX 670).
The stderr is missing from those results, but the actual result file also has the round off error amount. For those two tasks they were 0.1501 and 0.1685. The wingmen's roundoff errors were 0.017 and 0.016.
|
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 834 ID: 301928 Credit: 488,476,972 RAC: 0
                       
|
I ran a couple of GFN-WR on my D700s in my MacPro and as far as I can tell, they finished running on Friday and were uploaded, but when I look online they report as abandoned.
Does anyone know why that would be the case? Is it related to the FirePro round off error?
This problem is not related to the tasks; it is a general BOINC "feature". For some reason, the BOINC client and server get out of sync. Each client request carries a sequence number. When the number sent by the client is less than the number expected by the server, the server assumes you did something wrong (for example, copied the whole data directory to another computer), so it restarts the computer identification procedure, but quietly puts all active tasks into the "Abandoned" state before identification is done. In 99.9% of cases the server finds that everything is OK and this is the same computer, but the tasks are already abandoned. The worst part is that the client is not notified: it will continue crunching and return results, which the server will silently ignore.
The out-of-sync condition will happen if you've restored the BOINC data directory from a backup, which is acceptable. But there are enough reports on this and other forums from users who didn't do anything unusual with their computers. It must be a BOINC bug or design flaw that appears quite rarely, but it exists. From my experience, the most suspicious part is the handling of the case where a request was processed by the server but the reply wasn't received by the client due to an unlucky connection loss; this would lead to an off-by-one error in the sequence numbers.
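The lost-reply scenario described above can be sketched as a toy simulation. This is not BOINC's code: `ToyServer`, `ToyClient`, and the rule that the client only advances its counter after it sees a reply are all assumptions made to reproduce the suspected off-by-one, not confirmed behaviour.

```python
class ToyServer:
    """Toy scheduler that tracks the sequence number it expects next."""

    def __init__(self):
        self.expected_seq = 0
        self.task_state = {}

    def assign(self, task):
        self.task_state[task] = "in progress"

    def handle_request(self, seq):
        # A number lower than expected looks like a copied or restored host,
        # so every active task is quietly marked abandoned.
        if seq < self.expected_seq:
            for task, state in self.task_state.items():
                if state == "in progress":
                    self.task_state[task] = "abandoned"
        self.expected_seq = seq + 1

    def report(self, task):
        """Return True if the result is accepted, False if silently ignored."""
        if self.task_state.get(task) == "abandoned":
            return False
        self.task_state[task] = "reported"
        return True


class ToyClient:
    """Toy client that only advances its counter when a reply arrives (assumption)."""

    def __init__(self):
        self.seq = 0

    def contact(self, server, reply_lost=False):
        server.handle_request(self.seq)
        if not reply_lost:
            self.seq += 1


def simulate_lost_reply():
    server, client = ToyServer(), ToyClient()
    server.assign("wu_1")
    client.contact(server)                   # normal request
    client.contact(server, reply_lost=True)  # server processes it, reply is lost
    client.contact(server)                   # client reuses the old number
    accepted = server.report("wu_1")         # client still returns its result
    return accepted, server.task_state["wu_1"]


if __name__ == "__main__":
    print(simulate_lost_reply())
```

In this toy model a single lost reply is enough for the task to be abandoned on the server while the client carries on crunching, which matches the symptom reported in this thread.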
|
|
|
|
Thanks for the explanation. Unfortunately, I seem to have run into the bug as I didn't restore the directory or anything like that. |
|
|