Message boards : Number crunching : Year of the Horse - Pony Express Edition
Welcome to the Year of the Horse Challenge - Pony Express
Special Note: This is the third-ever GFN challenge, and probably the last. For those who are not familiar with GFN: we are searching for exceptionally large prime numbers. If a prime is found during the challenge, it will be the 10th-largest prime number ever found.
A 15-day challenge celebrating the start of a new year is being offered on PrimeGrid's Generalized Fermat Number (GFN) application. Only SHORT work units may be used in this challenge. To participate in the challenge, please select only the Generalized Fermat Prime Search project, and also select Short tasks, in your PrimeGrid preferences section. Although the World Record units can also be completed within 15 days, they do not count, as they have their own challenge. You may leave the Block Size setting at 0.
If you are in it for the challenge score and you have a GPU capable of running GFN WR units, it is worth considering running those as you will get 50% more challenge credit for the same position.
The challenge will start on 3 January 2014 18:00 UTC and end 15 days later on 18 January 2014 18:00 UTC.
Important: The deadline for these WUs is significantly longer than fifteen days, so make sure your computer returns the WUs before the end of the challenge.
Application Builds
Application builds are available for 64-bit CPUs (with a 64-bit OS), with special versions for AVX- and SSE3-capable CPUs. Windows users must select these manually; Linux and Mac users will get them automatically. Clients are ALSO available for double-precision Nvidia AND AMD GPUs. The GPU apps are 32-bit and will run on 32- or 64-bit CPUs and OSs. Apps are available for Linux, Windows, and MacIntel.
Unfortunately, any GPU that lacks double-precision hardware can't be used. Please see the list of compatible GPUs for Nvidia and for AMD/ATI for more details.
A Cautionary Reminder
ATTENTION: The Genefer primality programs, on both CPU and GPU, are computationally intensive, so it is vital to have a stable system with good cooling. They do not tolerate "even the slightest of errors." Please see this post for more details on how you can "stress test" your CPU, and please see this post for tips on running GFN on your GPU successfully.
As with all number crunching, excessive heat can potentially cause permanent hardware failure. Please ensure your cooling system is sufficient.
If you're using an Nvidia GPU and have never run GeneferCUDA before, please read these tips. GeneferCUDA stresses a GPU more than any other program I've seen and is incredibly intolerant of overclocking. It is strongly suggested that you run at least one (Short) Work Unit before the challenge to make sure your computer can handle it. Note that "Overclocking" includes GPUs that come overclocked from the factory, and lowering clock rates to the reference speeds is sometimes necessary. In the case of one particular GPU, the GTX 550 Ti, lowering the memory clock below the reference speed is often necessary.
WUs will take ~10 hours on a GTX 580 GPU and about 2 to 3 days on the fastest CPU core. A GTX 680 will be slower than a GTX 580 for this application. On my computer (GTX 460 & Core 2 Quad Q6600), which is more of an "average" computer and certainly not the fastest, a work unit takes about 20 hours on the GPU and 7 to 10 days on the CPU. For a general idea of how your GPU stacks up, you can have a look at the fastest GPU list.
If your CPU is highly overclocked, please consider "stress testing" it. Overclocking your GPU is not recommended at all for GeneferCUDA. Sieving is an excellent alternative for computers that are not able to run Genefer. :)
Please, please, please make sure your machines are up to the task.
Time zone converter:
The World Clock - Time Zone Converter
NOTE: The countdown clock on the front page uses the host computer's time. Therefore, if your computer's time is off, so will the countdown clock be. For precise timing, use the UTC time in the data section to the left of the countdown clock.
Scoring Information
Scores will be kept for individuals and teams. Only work units issued AFTER 3 January 2014 18:00 UTC and received BEFORE 18 January 2014 18:00 UTC will be considered for credit. For scoring, we will use BOINC credits, which are based on the b and n values (b^(2^n)+1).
Therefore, each completed WU will earn a unique score based on its b value ('n' will be the same for all work units). The higher the b, the higher the score. A quorum of 2 is NOT needed to award challenge score -- i.e., no double checker. Therefore, each returned result will earn a challenge score. Please note that if a result is eventually declared invalid, its score will be removed.
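To get a feel for the size of the numbers involved (this illustrates the scale, not PrimeGrid's actual credit formula), the short tasks test numbers of the form b^(2^20)+1 = b^1048576+1, so the decimal length, and with it the score, grows with b. A quick Python sketch:

import math

def gfn_digits(b, n=20):
    # Decimal digits of b^(2^n) + 1; n=20 for the short (Pony Express) tasks.
    return math.floor(2**n * math.log10(b)) + 1

print(gfn_digits(297788))  # roughly 5.7 million digits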
What to expect at Challenge start
The server always gets hit hard at the beginning of the Challenge. However, GFN tasks are quite long so we don't expect a heavy load. Nevertheless, a few hours before the start (3 January 2014 18:00 UTC) "max cache" will be dropped. Additionally, "max to send" will be dropped. Anyone attempting to request more will be met with "max cache" messages.
We'll raise the front page buffer before the start of the Challenge. These settings will be adjusted as necessary. Afterwards, as conditions allow, we'll raise "max cache". We'll continue to increase "max cache" when the server can handle it.
This method allows the greatest opportunity for the most clients to get work. Even if it's just 1 task for the first few minutes, at least the client is crunching. Past Challenges have shown that clients have an excellent chance to reach "max cache" before their first task is complete. We expect the same this time.
Strategies for starting the Challenge
Depending on a variety of factors, different strategies work for different users. Here are just a few to consider:
- large farm, user can be present at start
-Set "Computer is connected to the Internet about every X days" to 0
-Set "Maintain enough work for an additional X days" to 0
-Change PrimeGrid preferences to a fast WU project such as PPSE (LLR)
-At Challenge start, update PrimeGrid preferences and only select the GFN (SHORT) project
-At worst, you'll only be 1 PPSE (LLR) WU late on each core starting the Challenge.
-After all machines have work, increase Maintain enough work for an additional buffer
- large farm, user NOT able to be present at start
-Same connection settings as above
-At earliest access to hosts after start, switch to the GFN (SHORT) project
- a few computers, user can be present at start
-Change PrimeGrid preferences and only select the GFN (SHORT) project
-Set computers to "No New Tasks" for PrimeGrid
-At Challenge Start, update computers to "Allow New Tasks"
- a few computers, user NOT able to be present at start
-same settings as "large farm, user NOT able to be present at start"
NOTE: This presumes you have all other BOINC projects suspended.
At the Conclusion of the Challenge
We kindly ask users "moving on" to ABORT their WUs instead of DETACHING, RESETTING, or PAUSING.
ABORTING WUs allows them to be recycled immediately, and thus a much faster "clean up" at the end of a challenge. DETACHING, RESETTING, and PAUSING WUs causes them to remain in limbo until they EXPIRE. Therefore, we must wait until WUs expire to send them out to be completed.
Please consider either completing what's in the queue or ABORTING them. Thank you. :)
For those of you who wish to continue to help with GFN sieving, we are still sieving n=22 (World Record Units), as well as the n=20 (short) range and n=21 (next short) range. Please see We need your help with GFN Sieving! for more information.
More information on Generalized Fermat Numbers and the Genefer program
Best of Luck to everyone!
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
Michael Goetz Volunteer moderator Project administrator
General rules for running GFN successfully:
General guidelines:
- While BOINC's time estimates are notoriously unreliable, the Genefer app is very good at predicting how long it will take to run. After a few minutes, it writes a very accurate time prediction to the file "stderr.txt" located in the BOINC "slot" directory where the app is running. You can open this file in your text editor of choice to view this time estimate.
- If you're experiencing screen lag in your web browser, try turning off "Use hardware acceleration". That's what it's called in Chrome; other browsers should have a similar setting. Turning that off can result in drastically better web browser performance if you're crunching on your GPU.
Running on a CPU:
- CPU tasks are ONLY available for the GFN-Short ("Pony Express") project.
- On Mac or Linux, if your CPU supports AVX, the server will automatically send you the AVX version of Genefer, which is significantly faster than the standard app, when you select the CPU checkbox.
- On Windows, you can manually select the AVX app with the "Force AVX" checkbox (you must also uncheck the "CPU" box), but the server won't automatically send you the AVX app when the CPU checkbox is selected. To run the AVX app you MUST have an AVX enabled CPU (i.e., Sandy Bridge or later) and a version of Windows that supports AVX (Windows 7 SP1 or later, or Windows Server 2008 SP1 or later). If you don't have a suitable CPU or have an older version of Windows the AVX app will crash.
- Although recent AMD CPUs support AVX, their implementation is terrible and Genefer (and LLR) do not see any speed improvement over the standard app.
- These tasks take a long time on a CPU. If you have a hyperthreaded CPU, I STRONGLY recommend turning off hyperthreading or setting BOINC to use only 50% of the CPUs. You will complete more tasks during the challenge.
- Unless you have a really old CPU, you should be running either the AVX (plan class AVXGFN, forceAVXGFN, or AVX10GFN) or SSE3 (plan class SSE3cpuGFN) apps. Only very, very old CPUs should be running the plain SSE2 app, aka "genefx64" (plan class cpuGFN). The plan class shows up in parentheses next to the application name in the BOINC manager display, e.g., "Genefer 2.04 (SSE3cpuGFN)". If the server mistakenly sent you an SSE2 task, feel free to abort it and get a new, faster SSE3 task (or AVX if your system supports it).
- Expect about 10 days for a single core of a Core2Quad Q6600 to process these tasks. Newer CPUs may be about 50% faster (with hyperthreading OFF), and AVX gives you about a 40% boost as well, so the best times will be about 3 days.
- Due to cache misses and main memory bandwidth, as you run more copies of Genefer on multiple cores you will see a noticeable slowdown. This is another reason not to use hyperthreading.
- Three excellent reasons NOT to use hyperthreading: (These also apply to LLR challenges.)
- Overall throughput diminishes as the number of copies of Genefer that are running simultaneously increases. Running half as many copies of Genefer (without hyperthreading) will likely mean that they will run MORE than twice as fast as running with hyperthreading.
- You're likely to complete more tasks within the boundaries of the challenge if you're running tasks that take half as long, even though you're running half as many at once. (Think about running 4 tasks at once that take 1 day each vs. 8 tasks at once that take 2 days each when the challenge is 3 days long; see the sketch after this list.)
- Should somebody be lucky enough to find a monster prime during this challenge, it's likely that the person with hyperthreading will be the double checker and the person without hyperthreading will be the prime finder.
- Don't try running this on an Atom or similar netbook type CPU.
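To make the arithmetic behind the second reason explicit, here is a minimal sketch in Python, using the hypothetical task lengths from the example above:

def completed(challenge_days, concurrent_tasks, days_per_task):
    # Each task slot finishes challenge_days // days_per_task tasks
    # within the challenge window.
    return concurrent_tasks * (challenge_days // days_per_task)

print(completed(3, 4, 1))  # HT off: 4 at a time, 1 day each -> 12 tasks done
print(completed(3, 8, 2))  # HT on:  8 at a time, 2 days each -> 8 tasks done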
Running on an Nvidia GPU:
- Your GPU MUST support double precision floating point. Most modern Nvidia GPUs will work. See this thread for a list of compatible GPUs.
- If the second digit in your GPU model number (e.g., the "6" in "GTX 460") is 3 or lower, you have a very slow GPU and it might not be able to finish the WR tasks in time. You should run only the short tasks for the "Pony Express" challenge.
- If you have a GTX Titan, you definitely want to run it in DP mode.
- Do NOT overclock the GPU!!! This includes factory overclocked GPUs -- you should lower the clock speed to the reference speeds. Nvidia GPUs, regardless of model, are incredibly prone to error if overclocked when running Genefer. For a few models, they may need to be underclocked to get stable operation. Increasing cooling (e.g., by increasing fan speed) helps. Lowering the memory clock is most effective in achieving stable operation. This is doubly true of the "top" card in multiple GPU installations.
- If you haven't run GFN before, I strongly recommend running at least one task before the challenge to make sure your GPU can handle it.
- There are two different apps you can use, a CUDA app and an OpenCL app. Use whichever one runs fastest on your computer. In general, the OpenCL app runs faster on Kepler (GTX 6xx) and later GPUs and CUDA runs faster on older GPUs, but there are exceptions. If you don't know which is faster, you might want to try them both before the challenge.
- You don't have to run a full task all the way to the end to determine which app is faster. Genefer prints a message in the stderr.txt file that tells you how long it will take. Start a task with one of the two apps, wait a few minutes (how long varies depending on the speed of your GPU), and look in stderr.txt for a message like this:
Estimated total run time for 282686^1048576+1 is 46:30:02
You can then abort the task and repeat the procedure with the other app. The file will be found in the BOINC slot directory where the app is being run. For example, on Windows this is usually C:\ProgramData\BOINC\slots\#\ where "#" is a small number. (A small script can also collect these estimates for you; see the sketch after this list.)
- Running more than one copy of GFN on your GPU is strongly discouraged. There will likely be little improvement in throughput, but you will make the tasks run about twice as long, which means you're likely to finish fewer tasks during the challenge.
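If you'd rather not dig through slot directories by hand while comparing the two apps, a small script can collect the estimate lines for you. A minimal sketch in Python (the slots path below is an assumption for a default Windows install; adjust it for your system):

import glob
import os
import re

# Assumed default BOINC data directory on Windows; adjust for your install.
SLOTS = r"C:\ProgramData\BOINC\slots"

# Genefer writes lines like:
#   Estimated total run time for 282686^1048576+1 is 46:30:02
PATTERN = re.compile(r"Estimated total run time for (\S+) is (\S+)")

for stderr_path in glob.glob(os.path.join(SLOTS, "*", "stderr.txt")):
    with open(stderr_path) as f:
        for m in PATTERN.finditer(f.read()):
            print(f"{stderr_path}: {m.group(1)} -> {m.group(2)}")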
Running on an ATI/AMD GPU:
- Your GPU MUST support double precision floating point. Most modern ATI/AMD GPUs will work. See this thread for a list of compatible GPUs. Also please read this thread for a discussion of using ATI/AMD GPUs.
- At the present time, the server won't send tasks to the new 290x (Hawaii) GPU. Using app_info is required to get tasks for this GPU. I do not expect this problem to be resolved before the challenge. (This applies to PPS-sieve as well as GFN-WR and GFN-Short tasks.)
- Running more than one copy of GFN on your GPU is strongly discouraged. There will likely be little improvement in throughput, but you will make the tasks run about twice as long, which means you're likely to finish fewer tasks during the challenge.
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
From my own experience with running GFN on various GPUs on PRPnet (N=19--i.e., the size just below the GFN short work for the challenge), I think the following points will be helpful for those preparing for the challenges.
1) If you have a Kepler GPU, you should run the OpenCL app. Across multiple different models, the performance is always better, and usually substantially so. For example, at N=19 a GT 640 will run the CUDA app in about 17 hours, but with the OpenCL app the times drop to 11 to 12 hours. (***Note that not all 6xx cards are Kepler***; GT 610, GT 620, most GT 630, and GT 645 cards are actually renumbered Fermi cards.)
2) The OpenCL advantage on Kepler is such that Kepler cards typically perform with OpenCL as well as the next higher numbered card in the earlier series. For example, my GTX 650 Ti card has similar performance with the OpenCL application as does my GTX 460 with the CUDA app.
3) To reiterate Mike's point, the GTX 460 (or equivalent in other series) is pretty much the bottom end for running the GFN WR units comfortably. I have had success with overclocked GTS 450 or GTX 550 Ti cards working on these, but if you choose to do so, it is at your own risk as the overclocking will increase the error rate and you will still push the deadlines.
4) On the "short" GFN units, the absolute bottom end is probably the various 96 shader Fermi cards (e.g., the GT 440). There are some 48 shader cards with double-precision, but I do not recommend these for the challenge.
5) If you get errors, consider down clocking the card even if it is running at stock clocks. My recommendation, contrary to what one does typically with over/under-clocking, is that you down clock the memory first. This is a known issue on cards such as the GTX 550 Ti (it will error out often with stock memory clocks), but likely will affect others as well.
6) Keep it cool! Heat is a killer on these units. If you do not have adequate cooling (or cannot create temporary challenge cooling solutions), you probably should not run these work units.
____________
141941*2^4299438-1 is prime!
Do you have any data for block sizes or should we just use the default?
____________
@AggieThePew
Michael Goetz Volunteer moderator Project administrator
Do you have any data for block sizes or should we just use the default?
Excellent question!
On the CUDA app, the block size affects two things, speed and screen lag. A higher number generally means faster speeds, but more screen lag.
The app automatically chooses the FASTEST block size, so if you want it to run as fast as possible, just leave it at 0 and let it do all the work.
If you find the screen lag to be unacceptable, then you should use the block size setting to lower the block size. You can look in stderr.txt to see what GeneferCUDA chose to use, and then manually set the block size to be one lower.
On my system, I find that 7 gives a good compromise between speed and lag. (On my computer, 7 is also the fastest setting for the short tasks. 8 is the fastest setting for the WR tasks.)
It IS possible -- and safe -- to change the block size setting in the middle of a task. Use this procedure:
1) Change the block size on the PrimeGrid server. This tells the server what the setting should be.
2) Hit UPDATE on your BOINC manager -- now the BOINC client on your computer gets the block size setting from our server.
3) Finally, in the BOINC manager, SUSPEND and RESUME the Genefer task. Genefer will now pick up the new block size setting from the BOINC client. (Please don't hit ABORT instead of SUSPEND -- you'll be unhappy!)
____________
My lucky number is 75898^524288+1
Thanks Mike. Some very good info. Does the OpenCL app work the same as the CUDA app, in that 0 means fastest?
Oh, and I will remember to avoid Abort!
Michael Goetz Volunteer moderator Project administrator
Does the OpenCL app work the same as the CUDA app, in that 0 means fastest?
No. The block size setting is only for CUDA.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
I've turned on the statistics about a day early. Just in case I get knocked offline by the big storm hitting the east coast of North America, all the important stuff for the challenge is already running.
____________
My lucky number is 75898^524288+1
Now I'm really confused! I set my preferences to run the Pony Express challenge, then checked twice to make sure I'd got it right. So why have 4 units, each saying they are Genefer 2.04 and each with an estimated run time of 1712+ hours, been downloaded to my machine?? All with a return date of 22/01/2014. It doesn't compute. For a 15-day challenge, a max of 360 hrs each would be pushing it, and even with the extra 96 hours to 22/01 this won't see them much over 25% complete! What has gone wrong???
Nothing has gone wrong. BOINC is just notoriously bad at estimating times. Genefer itself is much better at estimating the time it needs. You can check this in the stderr.txt file, which you can find in the relevant subdirectories of C:\ProgramData\BOINC\slots.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
Michael Goetz Volunteer moderator Project administrator
IMPORTANT:
IF you have a Windows AVX-capable machine:
I just discovered that a BOINC feature isn't working the way it is supposed to (or at least not the way I thought it was supposed to).
The instructions on the preferences page USED to say that, in order to get the GeneferAVX app, you must check the ForceAVX box, and that it didn't matter whether you also checked the CPU box. It doesn't seem to actually work that way.
To get the GeneferAVX app, you must check the "Force AVX" box and NOT check the "CPU" box. If you check the "CPU" box you will get the SSE3 app instead of the AVX app.
The instructions on the preferences page have been modified accordingly. Sorry for the inconvenience.
If you've downloaded the SSE3 app and want the AVX app, feel free to correct the preferences and then abort the SSE3 tasks. The difference in speed is well worth the little time wasted by aborting the SSE3 tasks.
____________
My lucky number is 75898^524288+1
It appears if you tick both, you get both! - I did, and, as luck would have it, my new (AVX) i7-2600 got all SSE3 tasks, and my old non-AVX i7-920 got (and trashed) about 16 AVX tasks before I could stop it and reset its preferences! - but we're all working on the right flavours now!
____________
I had a problem recently, and again this morning, that sounds similar to what Michael Goetz and SteveRC described. BIG difference: I run Linux. I have two boxes, an SB 2600K and an IB 3770K. They are running the same version of Ubuntu (13.04) and the same version of BOINC (7.0.65).
When I was testing things out a few days before the start I noticed these issues. I don't think I had ever run GFN-CPU on these machines. Ever.
- When I first downloaded work, I got the SSE3 app downloaded to both machines, and work for each. stderr.txt confirmed that's what it was.
- I set "no new tasks" and aborted them. Spent several minutes poring over the prefs pages to find my error. Found none. Checked BOINC manager preferences... all looked OK.
- Puzzled, tried again without changing anything. All I did was "Allow new tasks". The AVX GFN app downloaded along with work units for each box. Estimated runtimes were appropriately (errr... "awesomely") faster for each. They ran for a little while and it seemed like all was well, so I aborted and moved back to my prior subprojects.
This morning, immediately after the challenge start, within the first couple of minutes, this is what happened:
- The 3770K immediately downloaded a load of AVX work, started running them, and they are still running fine now 14 hours later.
- The 2600K downloaded SSE3 work (it already had both the SSE3 and AVX apps in place). I noticed this after a couple of minutes and aborted them like I did the previous time, assuming that the correct behavior would recur. It did not. Instead, a second, assuredly unrelated problem started. The BOINC manager went into a seemingly endless loop of "scheduler request in progress", "not requesting new work", "scheduler request completed" messages, every 10 seconds. I tried suspending/resuming both the project and BOINC itself, to no avail. I again checked my preferences web page, and the BOINC manager; nothing looked wrong. I tried temporarily changing preferences; still no CPU work (the GPU was running fine from the start on WR). Suspend and exit BOINC, then restart it. Still no work... oddly, from the event log, it looked like the underlying BOINC client (different from the GUI manager) might not have quit... the event log didn't show a break. Okay, nuclear option: suspend and exit BOINC, reboot the whole dang box, and resume. This worked. AVX work was downloaded and has been running since. No more crazy-fast update requests.
Sorry, I know that was long-winded, but I'm trying to be detailed in case it is helpful.
Query for TPTB: Would checking "force avx" be helpful for a Linux box? Potentially harmful? I'm fine now, but a bit worried about when it comes time to grab the next load.
--Gary
Michael Goetz Volunteer moderator Project administrator
Query for TPTB: Would checking "force avx" be helpful for a Linux box? Potentially harmful? I'm fine now, but a bit worried about when it comes time to grab the next load.
--Gary
"ForceAVX" only applies to Windows. It has no efffect at all on Linux hosts.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Challenge: Pony Express
(As of 2014-01-04 13:16:11)
10254 tasks have been sent out.
Of those tasks that have been sent out:
5615 (55%) came back with some kind of an error. (2350 (23%) CPU / 3265 (32%) GPU)
131 (1%) have returned a successful result. (0 (0%) CPU / 131 (1%) GPU)
4508 (44%) are still in progress. (3122 (30%) CPU / 1386 (14%) GPU)
Of the tasks that have been returned successfully:
122 (1%) are pending validation. (0 (0%) CPU / 122 (1%) GPU)
9 (0%) have been successfully validated. (0 (0%) CPU / 9 (0%) GPU)
0 (0%) were invalid. (0 (0%) CPU / 0 (0%) GPU)
0 (0%) are inconclusive. (0 (0%) CPU / 0 (0%) GPU)
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=309270. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 3.86% as much as it had prior to the challenge!
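For anyone curious, that last percentage matches simple arithmetic on the two leading-edge b values (inferred from the numbers above, not taken from the server code):

# Advance during the challenge, as a fraction of where the leading
# edge already stood when the challenge started.
start, now = 297788, 309270
print(f"{100 * (now - start) / start:.2f}%")  # prints 3.86%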
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
ALSO IMPORTANT:
It seems that the same problem affecting AVX selection is also affecting the automatic distribution of SSE3 tasks.
If you have a CPU that doesn't support AVX, unless it's VERY old, it should support SSE3. However, at least in some situations, the server is erroneously sending out SSE2 tasks when it could be sending the faster SSE3 tasks.
If you're running tasks on the CPU, look at the plan class for the tasks (that's the name in parentheses). If it's "(cpuGFN)", that's the slower SSE2 task. It should be "(SSE3cpuGFN)". If you have an SSE2 task and it's been running for a while, it probably pays to let it finish, but if it just started it is probably faster to abort it and get a new task. If you just downloaded a task and it's SSE2, you should abort it and get new tasks until you get the faster SSE3 tasks.
Sorry for the inconvenience.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
I just released a new version of the Windows CUDA app with even better error handling. It will be sent to you automatically when you download the next WR or short tasks.
The newest app has a BOINC version of 2.12 (the internal version is 3.1.2-9).
I do NOT recommend aborting GFN tasks that are already running, but if you have downloaded any 2.11 tasks that haven't started running yet, you should abort them and download newer 2.12 tasks since they're more resilient in certain circumstances than 2.11.
If you're running 2.11 right now, you can replicate the additional error handling by checking in the BOINC slot directory about once a day for a file called "retry.txt" and deleting that file (and ONLY that file!!!).
This only applies to the Windows CUDA application.
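If you want to automate that once-a-day retry.txt check for the 2.11 Windows CUDA app, here is a minimal sketch in Python (the slots path is an assumption for a default Windows install, and it deletes files named retry.txt and nothing else):

import glob
import os

# Assumed default BOINC data directory on Windows; adjust for your install.
SLOTS = r"C:\ProgramData\BOINC\slots"

# Per the post above, deleting retry.txt (and ONLY retry.txt) resets the
# app's retry counter. Schedule this to run about once a day.
for retry_file in glob.glob(os.path.join(SLOTS, "*", "retry.txt")):
    os.remove(retry_file)
    print("removed", retry_file)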
____________
My lucky number is 75898^524288+1
4 Jan 2014 | 19:27:28 UTC 5 Jan 2014 | 4:04:49 UTC Error while computing 30,619.42 26,964.00 --- Genefer v2.10 (OCLcudaGFN)
That's error number 3 (after 2 World Record WUs on the GPU); if I have another error on the GPU, I'm going to leave the challenge!
____________
4 Jan 2014 | 19:27:28 UTC 5 Jan 2014 | 4:04:49 UTC Error while computing 30,619.42 26,964.00 --- Genefer v2.10 (OCLcudaGFN)
That's error number 3 (after 2 World Record WUs on the GPU); if I have another error on the GPU, I'm going to leave the challenge!
I have no problems using OpenCL apps for either GFN-short or GFN-World Record.
What temperature does your GTX TITAN run at when processing Genefer tasks? Is it set to run in double-precision mode?
4 Jan 2014 | 19:27:28 UTC 5 Jan 2014 | 4:04:49 UTC Error while computing 30,619.42 26,964.00 --- Genefer v2.10 (OCLcudaGFN)
That's error number 3 (after 2 World Record WUs on the GPU); if I have another error on the GPU, I'm going to leave the challenge!
I have no problems using OpenCL apps for either GFN-short or GFN-World Record.
What temperature does your GTX TITAN run at when processing Genefer tasks? Is it set to run in double-precision mode?
78-80° C, the same temperature reached with the PPS Sieve GPU task...
____________
4 Jan 2014 | 19:27:28 UTC 5 Jan 2014 | 4:04:49 UTC Error while computing 30,619.42 26,964.00 --- Genefer v2.10 (OCLcudaGFN)
That's error number 3 (after 2 World Record WUs on the GPU); if I have another error on the GPU, I'm going to leave the challenge!
I have no problems using OpenCL apps for either GFN-short or GFN-World Record.
What temperature does your GTX TITAN run at when processing Genefer tasks? Is it set to run in double-precision mode?
78-80° C, the same temperature reached with the PPS Sieve GPU task...
Yes, it's true. But Genefer uses the GPU in a different manner than PPS Sieve (CUDA) does.
Genefer uses the video memory controller much more intensively: you can check that when you run PPS Sieve, the Memory Controller Load stays close to 0%, in contrast to Genefer.
____________
Michael Goetz Volunteer moderator Project administrator
Challenge: Pony Express
(As of 2014-01-05 12:01:56 UTC)
18261 tasks have been sent out. [CPU/GPU/anonymous_platform: 8056 (44%) / 10202 (56%) / 3 (0%)]
Of those tasks that have been sent out:
12739 (70%) came back with some kind of an error. [4614 (25%) / 8122 (44%) / 3 (0%)]
510 (3%) have returned a successful result. [0 (0%) / 510 (3%) / 0 (0%)]
5012 (27%) are still in progress. [3442 (19%) / 1570 (9%) / 0 (0%)]
Of the tasks that have been returned successfully:
425 (83%) are pending validation. [0 (0%) / 425 (83%) / 0 (0%)]
80 (16%) have been successfully validated. [0 (0%) / 80 (16%) / 0 (0%)]
0 (0%) were invalid. [0 (0%) / 0 (0%) / 0 (0%)]
4 (1%) are inconclusive. [0 (0%) / 4 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=311960. The leading edge was at b=303170 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 2.90% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
78-80° C, the same temperature reached with the PPS Sieve GPU task...
My GTX TITAN runs at 64° C when running GFN-WR tasks, and possibly slightly lower when running GFN-Short tasks. I have changed the fan curve profile to be at 85% by 70° C. This card resides in a case with a lot of fans and good airflow. I have not manually overclocked this card, except to raise the Power Target. The following image shows the settings and clock status for this particular card. You neglected to indicate whether your TITAN is set, in the NVIDIA Control Panel, to use double precision.
EDIT: This card is the EVGA GTX TITAN Superclocked Edition, therefore may run at higher default clocks than the 'normal' TITAN, even in double precision mode.
Your mileage may vary.
Michael Goetz Volunteer moderator Project administrator
EDIT: This card is the EVGA GTX TITAN Superclocked Edition, therefore may run at higher default clocks than the 'normal' TITAN, even in double precision mode.
That is exactly what I'm talking about. While we don't have a lot of data on either the newer Nvidia GPUs or the OpenCL app, we absolutely, positively had lots of failures with factory-overclocked cards such as your "superclocked" GPU on older-generation GPUs with the CUDA app.
It's unclear whether that experience applies to your setup, but this was a problem that existed across several Nvidia GPU generations, let alone individual models. So you might see the same behavior -- or not. It's really hard to say.
My recommendation would be, if you see the "MaxErr exceeded" error, to lower your clocks from their "superclocked" values down to the reference speed for a Titan. That might help.
____________
My lucky number is 75898^524288+1
That is exactly what I'm talking about. While we don't have a lot of data on either the newer Nvidia GPUs or the OpenCL app, we absolutely, positively had lots of failures with factory-overclocked cards such as your "superclocked" GPU on older-generation GPUs with the CUDA app.
It's unclear whether that experience applies to your setup, but this was a problem that existed across several Nvidia GPU generations, let alone individual models. So you might see the same behavior -- or not. It's really hard to say.
My recommendation would be, if you see the "MaxErr exceeded" error, to lower your clocks from their "superclocked" values down to the reference speed for a Titan. That might help.
I hope I have not confused things. Just to be clear, I was responding to the post by Gattorantolo [Lombardia] to assure him that the OpenCL apps DO work fine using a GTX TITAN.
My setup using the TITAN Superclocked Edition successfully completed several GFN-WR and GFN-short OpenCL tasks at the default clock speeds before the challenge began. It's entirely possible that other cards, including other TITANs, may need to be downclocked to successfully complete a World Record or short GFN task.
4 Jan 2014 | 19:27:28 UTC 5 Jan 2014 | 4:04:49 UTC Error while computing 30,619.42 26,964.00 --- Genefer v2.10 (OCLcudaGFN)
That's error number 3 (after 2 World Record WUs on the GPU); if I have another error on the GPU, I'm going to leave the challenge!
Error again, this time on CUDA... I'm leaving the challenge with my GPU!
____________
So after an extraordinary amount of messing about, and a good 10 minutes worried that I'd broken one of my GPUs (turned out it wasn't actually in the computer at the time), I managed to get Genefer running on my Radeon HD 8730M and Radeon HD 7770, in addition to my two Radeon HD 7970s. It did start running on my Radeon HD 6550D, but I disabled it on that device as the 6550D wouldn't contribute particularly much and is better serviced actually rendering the display. I'm not even sure it would be able to complete the tasks.
That said, Michael, I'm fairly sure I asked you about the server requiring CAL for OpenCL tasks about 6 months ago, and I discovered a forum post you made from about a fortnight ago saying you knew how to fix it; is it possible to do that? I managed to work around it by manually downloading the genefer.exe and buggering around in app_info, but that's really not optimal.
Anyway, hopefully I can stay in the top 15 now!
Michael Goetz Volunteer moderator Project administrator
Challenge: Pony Express
(As of 2014-01-06 17:27:53 UTC)
25929 tasks have been sent out. [CPU/GPU/anonymous_platform: 10138 (39%) / 15780 (61%) / 11 (0%)]
Of those tasks that have been sent out:
19241 (74%) came back with some kind of an error. [6287 (24%) / 12949 (50%) / 5 (0%)]
1149 (4%) have returned a successful result. [43 (0%) / 1106 (4%) / 0 (0%)]
5539 (21%) are still in progress. [3808 (15%) / 1725 (7%) / 6 (0%)]
Of the tasks that have been returned successfully:
878 (76%) are pending validation. [37 (3%) / 841 (73%) / 0 (0%)]
241 (21%) have been successfully validated. [4 (0%) / 237 (21%) / 0 (0%)]
1 (0%) were invalid. [0 (0%) / 1 (0%) / 0 (0%)]
13 (1%) are inconclusive. [1 (0%) / 12 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=314908. The leading edge was at b=310346 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 1.47% as much as it had prior to the challenge!
NOTE: The first CPU results have come back!
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
That said, Michael, I'm fairly sure I asked you about the server requiring CAL for OpenCL tasks about 6 months ago, and I discovered a forum post you made from about a fortnight ago saying you knew how to fix it; is it possible to do that? I managed to work around it by manually downloading the genefer.exe and buggering around in app_info, but that's really not optimal.
Ask me again after the challenge ends. The interaction between the server and the BOINC clients with regards to GPUs is, at best, complex, and chances are I'll break something horribly when I attempt to change this. I'm definitely not going to try it during the challenge.
____________
My lucky number is 75898^524288+1
Ken_g6 Volunteer developer
19241 (74%) came back with some kind of an error. [6287 (24%) / 12949 (50%) / 5 (0%)]
That's a lot of errors! Can you separate out aborts from actual computation errors?
____________
Michael Goetz Volunteer moderator Project administrator
19241 (74%) came back with some kind of an error. [6287 (24%) / 12949 (50%) / 5 (0%)]
That's a lot of errors! Can you separate out aborts from actual computation errors?
They're not what you would normally think of as "computation errors", such as the errors caused by overclocking.
Most of these are configuration errors. A host gets a task; it can't run and dies; it gets reported back to the server 2 seconds later. Rinse and repeat. That makes for a LOT of errors, even though it might be only a small number of hosts causing all of them. This phenomenon occurs with CPU programs too. All these errors don't have much of an effect on the server, or on the overall progress of the project, but they do produce some impressive (albeit unnerving) statistics.
Yes, people DO run their hosts, rack up thousands and thousands of errors, and never do anything about it. Somehow, they don't notice, or don't care. Maybe the computer's primary use is as a space heater. Actually, this hurts that too: the computer spends more time communicating with the server and very little time actually crunching, so it's not producing nearly as much heat as it should be.
____________
My lucky number is 75898^524288+1
The SSE3 vs. AVX selection problem happens on reloads too. My first load of AVX challenge units finished a few hours ago and are now pending validation. My next load of 3 (it is a 2600K and I am leaving 1 core open with HT off) downloaded as 2 AVX units and 1 SSE3. Unfortunately I did not notice this for 4 hours, but I aborted at that point and the replacement work came as AVX.
This is all on a Linux box, so the "Force AVX" preference is irrelevant.
The SSE3 app is in place, and I'm considering leaving it there but removing execute permission, so that if it tries to run again, it will immediately crash. Presumably that will cause new work to download, which might (correctly) be AVX. Any problems with that strategy? I certainly do not want to inadvertently cause a denial-of-service attack on PrimeGrid! Removing execute permission worked like a charm back in the day when we had the problem of getting PPS Sieve CPU jobs when only GPU was selected.
Best of luck to everyone,
--Gary
Michael Goetz Volunteer moderator Project administrator
The SSE3 app is in place, and I'm considering leaving it there but removing execute permission, so that if it tries to run again, it will immediately crash. Presumably that will cause new work to download, which might (correctly) be AVX. Any problems with that strategy? I certainly do not want to inadvertently cause a denial-of-service attack on PrimeGrid! Removing execute permission worked like a charm back in the day when we had the problem of getting PPS Sieve CPU jobs when only GPU was selected.
Best of luck to everyone,
--Gary
That might -- or might not -- be the cause of an occasional problem we have with hosts continuously downloading large files, which causes problems with bandwidth.
If that's what it does, I'd prefer that you didn't do this. You might also find your IP blocked on the server, because that's how we deal with those hosts. On the other hand, if it just causes the task to fail and a new task is fetched, that's perfectly alright.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Challenge: Pony Express
(As of 2014-01-07 11:45:54 UTC)
32102 tasks have been sent out. [CPU/GPU/anonymous_platform: 11421 (36%) / 20670 (64%) / 11 (0%)]
Of those tasks that have been sent out:
24326 (76%) came back with some kind of an error. [7274 (23%) / 17047 (53%) / 5 (0%)]
1579 (5%) have returned a successful result. [112 (0%) / 1467 (5%) / 0 (0%)]
6197 (19%) are still in progress. [4035 (13%) / 2156 (7%) / 6 (0%)]
Of the tasks that have been returned successfully:
1135 (72%) are pending validation. [82 (5%) / 1053 (67%) / 0 (0%)]
406 (26%) have been successfully validated. [29 (2%) / 377 (24%) / 0 (0%)]
1 (0%) were invalid. [0 (0%) / 1 (0%) / 0 (0%)]
15 (1%) are inconclusive. [0 (0%) / 15 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=317802. The leading edge was at b=313798 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 1.28% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
(As of 2014-01-04 13:16:11)
The leading edge was at b=297788 at the beginning of the challenge.
(As of 2014-01-07 11:45:54 UTC)
The leading edge was at b=313798 at the beginning of the challenge.
Mike, I love the updates on WUs, but I wanted to point out that you seem to be moving the starting line.
____________
Michael Goetz Volunteer moderator Project administrator
(As of 2014-01-04 13:16:11)
The leading edge was at b=297788 at the beginning of the challenge.
(As of 2014-01-07 11:45:54 UTC)
The leading edge was at b=313798 at the beginning of the challenge.
Mike I love the updates on WUs but wanted to point out you seem to be moving the starting line.
Interesting. That's computed dynamically. Perhaps a little bit too dynamically!
I'll look into it, thanks.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Fixed:
Challenge: Pony Express
(As of 2014-01-07 14:19:12 UTC)
33462 tasks have been sent out. [CPU/GPU/anonymous_platform: 11528 (34%) / 21923 (66%) / 11 (0%)]
Of those tasks that have been sent out:
25579 (76%) came back with some kind of an error. [7324 (22%) / 18250 (55%) / 5 (0%)]
1667 (5%) have returned a successful result. [124 (0%) / 1543 (5%) / 0 (0%)]
6216 (19%) are still in progress. [4080 (12%) / 2130 (6%) / 6 (0%)]
Of the tasks that have been returned successfully:
1177 (71%) are pending validation. [89 (5%) / 1088 (65%) / 0 (0%)]
448 (27%) have been successfully validated. [34 (2%) / 414 (25%) / 0 (0%)]
1 (0%) were invalid. [0 (0%) / 1 (0%) / 0 (0%)]
14 (1%) are inconclusive. [0 (0%) / 14 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=318048. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 6.80% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
Lee
Geez these are a long haul for my CPU and GPU. I mean a really long haul...
Fun either way
Michael Goetz Volunteer moderator Project administrator
Geez these are a long haul for my CPU and GPU. I mean a really long haul...
Fun either way
These are REALLY big numbers. :)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
To put the statistics in perspective, we normally receive about 200 to 250 completed results each day, so we're running at about twice that rate so far. HOWEVER, this early in the challenge we're not going to see very many CPU tasks coming back because it's too soon. Essentially, therefore, on GPUs alone we're doing double what we normally do on both GPUs and CPUs -- AND most people with fast GPUs are probably (hopefully) crunching the WR tasks.
____________
My lucky number is 75898^524288+1
Lee
I have a GTX 260 and a GTX 650 in two different boxes crunching Genefer short work units, and they take 46 and 66 hours respectively per work unit. The GTX 650 does not get that hot wide open, but the GTX 260 would catch fire without the fan cranked up to 85%. I have had a little trouble getting the GTX 260 to keep going wide open due to driver timeouts. I tried messing with the TDR registry settings, but that was a dead end. The CPUs are in the hundreds of hours per work unit. I actually think one set of my CPU work units might finish before the deadline, but the other set is in the 660+ hour range per work unit and they probably will not make it.
My GPU times pale in comparison to the higher-end Nvidia and AMD cards.
Michael Goetz Volunteer moderator Project administrator
I have a GTX 260 and a GTX 650 in two different boxes crunching Genefer short work units, and they take 46 and 66 hours respectively per work unit. The GTX 650 does not get that hot wide open, but the GTX 260 would catch fire without the fan cranked up to 85%. I have had a little trouble getting the GTX 260 to keep going wide open due to driver timeouts. I tried messing with the TDR registry settings, but that was a dead end. The CPUs are in the hundreds of hours per work unit. I actually think one set of my CPU work units might finish before the deadline, but the other set is in the 660+ hour range per work unit and they probably will not make it.
My GPU times pale in comparison to the higher-end Nvidia and AMD cards.
Just to be clear, when you're looking at the task times, you're looking at the estimates in stderr.txt and not BOINC's estimates, right? I wouldn't trust BOINC's estimates at all.
The 650 might benefit from using the OpenCL app rather than the CUDA app. The difference in speed might be substantial. (You didn't say which you're running.)
As for temperature, the GTX 2xx GPUs are actually designed to operate safely at slightly over 100 degrees, and the default fan control is designed to not really push a lot of air until the temps are in the 80s. A lot of people prefer more noise and lower temps and raise the fan speed. That being said, if you're seeing temps in the 90s on a 260, that's too high unless the ambient temperature is very warm. That GPU is old and might have a lot of dust built up inside. If it's in the 80s I wouldn't worry too much.
For the CPU tasks, when you said "hundreds", just how many hundreds do you mean? Older (Core2 or Phenom II) non-hyperthreaded CPUs should do those in 200 to 300 hours. The challenge is 360 hours and the deadline is 456 hours.
Older hyperthreaded CPUs will take longer, but newer CPUs, especially if they have AVX and have hyperthreading disabled, can run them a lot faster than that.
____________
My lucky number is 75898^524288+1
GPU (WR = 76 hr. -- the answer from BOINC :-)
Reality is 155 hr. in my box,
with a GTX 660 Ti graphics card (Kepler technology) and OpenCL.
2 WR tasks in this challenge take:
310 hours = 12 days, 21 hours, 16 mins.
I run the WR alongside 2 cores of SoB tasks; that leaves more cooling power for the GPU.
Why SoB? SoB requires little core power, perhaps 60 to 70% per core. This is my experience: the GPU runs faster,
and the cooler runs well, faster, and isn't noisy.
A 650 Ti is not the same.
It's not too late to do this:
anyone who has a similar graphics card can still complete a WR (100% bonus) in the remaining time, plus short tasks.
Feel free -- if you have the same graphics card, one WR takes a guaranteed 155 hr. in this challenge (from today, heh).
____________
GTX 780: after 82% of the work, the GPU load drops from 96% to 0% and the GPU is no longer working… why?
If I stop and resume the task, the GPU load goes to 96% for 1 minute and then back to 0%… what do I have to do?
____________
Lee
What is SOB? I am not familiar with a lot of the PrimeGrid options for processing.
I am using a bottom-of-the-line EVGA GTX 650... it runs okay but does not really crunch that fast using cudaGFN. I will try the OpenCL app.
The WU times on an Intel i7-3770 are 330-ish hours. On an i7-920 they are 660-ish hours. That is with hyperthreading on. I tested a little before the challenge and did not see that much difference in doing one thread or two per core. Over time there probably is a difference. I tested for only two hours.
The GTX 260 is clean inside.
It would probably run in the 70s wide open with just the built-in cooling curve. Heat is not really the problem. My catching-fire comment was a joke, but they do get rather warm. The card draws 50 A at 12 V under full load. In this older box it is hard to keep the Nvidia driver from timing out and recovering. It ran all of the last challenge doing PPS Sieve using CUDA and the driver never timed out. I have a second card, but there is something wrong with it. They are both ASUS TOP cards and almost impossible to get inside of for a look-see or to replace the heat-sink material.
If there is a faster way of doing the work units, I would be interested in trying almost anything.
Michael Goetz Volunteer moderator Project administrator
GTX 780: after 82% of the work, the GPU load drops from 96% to 0% and the GPU is no longer working… why?
If I stop and resume the task, the GPU load goes to 96% for 1 minute and then back to 0%… what do I have to do?
When the program encounters an error, it shuts down for 10 minutes. This allows the GPU to cool off (sometimes the problem is heat related), but sometimes the problem is caused by another program, and hopefully during that 10 minutes, the problem will go away by itself.
If this happens 6 times, the program will give up and the task will abort.
If you're running version 2.11 of the app and you see a file called "retry.txt" in the slot directory, you should delete that file. (If you don't know which version is running, that's OK; it's fine to delete that file if you see it.) DO NOT DELETE ANY OTHER FILE, because doing so will cause the task to fail. (Deleting the retry.txt file sets the retry counter back to 0, which lets the app try to continue more times.)
You should look in the stderr.txt file in the appropriate slot directory to see the specific error message. Post that here and perhaps we can tell you what needs to be done to correct the problem.
If you see "maxErr exceeded", that's the overclocking/overheating problem (or the GPU is faulty.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
What is SOB? I am not familiar with a lot of the PrimeGrid options for processing.
SoB == Seventeen or Bust, also known as The Sierpinski Problem. It's a CPU-only LLR application with very long WUs (longer than the GFN-Short tasks that we're running in this challenge).
The WU times on an Intel i7-3770 are 330-ish hours. On an i7-920 they are 660-ish hours. That is with hyperthreading on. I tested a little before the challenge and did not see that much difference in doing one thread or two per core. Over time there probably is a difference. I tested for only two hours.
I may have missed it, but I don't think you answered my question about where you're getting those time estimates. If that's the numbers you see displayed in the BOINC manager, those numbers are likely wrong.
Also, on the i7-3770, which application is running? cpuGFN, SSE3cpuGFN, AVXGFN, or forceAVXGFN? You should be running either AVXGFN or forceAVXGFN on that machine because it is MUCH faster.
Likewise, on the i7-920, the tasks should be SSE3cpuGFN rather than cpuGFN; SSE3 is faster. (That CPU doesn't have the AVX instructions.)
____________
My lucky number is 75898^524288+1
SoB doesn't take 100% per core, and it makes the GPU run faster!!! I don't know why, but it's true!!!
A magic experience.
Either the cooling makes the GPU faster, or the clock rate on the graphics card goes up. I don't know; it's 5% faster than without SoB.
What's up?
I took 8 SoB tasks in the BOINC manager.
I wrote the time of the running WR task down on paper,
then started 2 cores of SoB.
The first 10% of the WR ran alone;
the next 10% ran with 2 SoB tasks.
What happened?
The GPU time was faster. Then I stopped SoB, wrote the time on my paper,
and the WR ran alone and was slower again.
I waited for the next 10% and started SoB again.
The GPU was faster.
What magic is this?
I think all other tasks run their cores at 100%; only SoB does not.
But the tasks take a long time.
____________
Tyler Project administrator Volunteer tester
Just looked in the stderr for my GFN short forceAVX task... Apparently there were quite a few maxErr messages. Here is the stderr.
EDIT: Here's the task, it just returned. http://www.primegrid.com/result.php?resultid=514588275
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Starting initialization...
Initialization complete (2.724 seconds).
Testing 295440^1048576+1...
Estimated total run time for 295440^1048576+1 is 58:16:00
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18317311 iterations left)
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18317311 iterations left)
Estimated total run time for 295440^1048576+1 is 59:36:02
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (16183295 iterations left)
Estimated total run time for 295440^1048576+1 is 60:48:07
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (14610431 iterations left)
Estimated total run time for 295440^1048576+1 is 64:27:15
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
BOINC client requested that we should suspend.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (9341760 iterations left)
Estimated total run time for 295440^1048576+1 is 59:56:40
BOINC client requested that we should suspend.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
The checkpoint version doesn't match current test. Current test will be restarted
Starting initialization...
Initialization complete (4.181 seconds).
Testing 295440^1048576+1...
Estimated total run time for 295440^1048576+1 is 53:40:39
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
BOINC client requested that we should suspend.
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18522538 iterations left)
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18521830 iterations left)
Estimated total run time for 295440^1048576+1 is 59:28:05
maxErr exceeded for 295440^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18448383 iterations left)
Estimated total run time for 295440^1048576+1 is 59:10:37
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (18336602 iterations left)
Estimated total run time for 295440^1048576+1 is 59:51:35
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (7758518 iterations left)
Estimated total run time for 295440^1048576+1 is 57:23:17
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (4312590 iterations left)
Estimated total run time for 295440^1048576+1 is 57:28:21
maxErr exceeded for 295440^1048576+1, 0.4839 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
Resuming 295440^1048576+1 from a checkpoint (1765375 iterations left)
Estimated total run time for 295440^1048576+1 is 58:02:58
maxErr exceeded for 295440^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
Resuming 295440^1048576+1 from a checkpoint (1765375 iterations left)
Estimated total run time for 295440^1048576+1 is 57:13:26
Terminating because BOINC client requested that we should quit.
geneferavx 3.1.2-0 (Windows 64-bit AVX)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__forceAVXGFN.exe -boinc -q 295440^1048576+1
Resuming 295440^1048576+1 from a checkpoint (1313218 iterations left)
Estimated total run time for 295440^1048576+1 is 59:37:37
maxErr exceeded for 295440^1048576+1, 0.4922 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
Resuming 295440^1048576+1 from a checkpoint (1313218 iterations left)
Estimated total run time for 295440^1048576+1 is 56:43:35
maxErr exceeded for 295440^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
Resuming 295440^1048576+1 from a checkpoint (1313218 iterations left)
Estimated total run time for 295440^1048576+1 is 56:48:40
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
I noticed that there's another new version of Genefer (info link http://www.primegrid.com/forum_thread.php?id=5389&nowrap=true#71972). Will this version resolve the "The checkpoint version doesn't match current test. Current test will be restarted" issue as mentioned in the above post, and will it be implemented soon (during this challenge)? I had a loss of power earlier today and this happened to my GTX 570 after a reboot on a WR task that was ~75% done and ended up as all wasted time. My GTX 580 on the same system restarted fine and is now nearing completion though.
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
I noticed that there's another new version of Genefer (info link http://www.primegrid.com/forum_thread.php?id=5389&nowrap=true#71972). Will this version resolve the "The checkpoint version doesn't match current test. Current test will be restarted" issue as mentioned in the above post, and will it be implemented soon (during this challenge)? I had a loss of power earlier today and this happened to my GTX 570 after a reboot on a WR task that was ~75% done and ended up as all wasted time. My GTX 580 on the same system restarted fine and is now nearing completion though.
3.2.0 will definitely NOT be going into production before the challenge ends. I may put in a change to the preferences screen, but we're not releasing a major upgrade like 3.2.0 in the middle of a challenge.
As for the checkpoint problem, that error occurs for one of two reasons. Either you really did try to restart genefer with a checkpoint file from another version, or, as was probably the case here, the checkpoint file got corrupted by the power loss, which probably happened because the operating system didn't have a chance to flush its disk buffers. (Or I could be completely wrong about the cause; that's pure speculation on my part.)
But, no, there's no fix for that kind of problem on the drawing board at this time.
____________
My lucky number is 75898^524288+1
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
I just accidentally closed a long post window. I will repost tomorrow with more information about the system. If you look at my computer COSMOS_S, shown under my name Lee in the challenge, you will see the CPU task I have tried to run on the 920. More tomorrow.
Here are the two types of errors I have gotten recently (in the last three or four hours) with the GTX 260 card.
This one, many times:
genefercuda 3.1.2-8 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_8_2.11_windows_intelx86__cudaGFN.exe -boinc -q 309340^1048576+1 --device 0
Priority change succeeded.
GPU=GeForce GTX 260
Global memory=939524096 Shared memory/block=16384 Registers/block=16384 Warp size=32
Max threads/block=512
Max thread dim=512 512 64
Max grid=65535 65535 1
CC=1.3
Clock=1400 MHz
# of MP=27
Using project preferences to override SHIFT; using 7 instead of 7
Resuming 309340^1048576+1 from a checkpoint (13610512 iterations left)
Estimated total run time for 309340^1048576+1 is 35:11:41
cuda_subs.cu(329) : cufftSafeCall() CUFFT error: 6.
cuda_subs.cu(271) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(272) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(273) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(274) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(275) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(276) : cudaSafeCall() Runtime API error : unknown error.
An error (2006) occured.
Waiting 10 minutes before attempting to continue from last checkpoint...
This one, 9 times:
genefercuda 3.1.2-8 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_8_2.11_windows_intelx86__cudaGFN.exe -boinc -q 309340^1048576+1 --device 0
Priority change succeeded.
GPU=GeForce GTX 260
Global memory=939524096 Shared memory/block=16384 Registers/block=16384 Warp size=32
Max threads/block=512
Max thread dim=512 512 64
Max grid=65535 65535 1
CC=1.3
Clock=1400 MHz
# of MP=27
Using project preferences to override SHIFT; using 7 instead of 7
Resuming 309340^1048576+1 from a checkpoint (15311736 iterations left)
Estimated total run time for 309340^1048576+1 is 35:45:09
maxErr exceeded for 309340^1048576+1, 0.4727 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
I am pretty sure the default extra performance level (4) clocks are overclocked for this GTX 260, as it is a TOP version. Tomorrow I am going to downclock that level in a copy of the card BIOS and upload it to the card to see if I can get the errors under control. Controlling the heat with higher fan speeds has not helped much.
Still a lot of fun though...;-)
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
I had two tasks go from a good bit of time remaining to no time in one jump.
Here is the message listing from the stderr output of one task where it went to zero time remaining.
genefx64 3.1.2-0 (Windows 64-bit SSE2)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__cpuGFN.exe -boinc -q 303416^1048576+1
Resuming 303416^1048576+1 from a checkpoint (16051796 iterations left)
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
Estimated total run time for 303416^1048576+1 is 266:41:45
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
genefx64 3.1.2-0 (Windows 64-bit SSE2)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_0_2.04_windows_x86_64__cpuGFN.exe -boinc -q 303416^1048576+1
Resuming 303416^1048576+1 from a checkpoint (16049943 iterations left)
BOINC client requested that we should suspend.
BOINC client requested that we should resume.
Estimated total run time for 303416^1048576+1 is 0:00:00
In the boinc_task_state.xml file it is showing <fraction_done>1.000000</fraction_done>.
It still appears to be running, but the remaining time listed in BOINC is "---".
Does it seem to be doomed, or is this a normal situation?
The SSE3 app is in place, and I'm considering leaving it there but removing execute permission, so that if it tries to run again, it will immediately crash. Presumably that will cause new work to download, which might (correctly) be AVX. Any problems with that strategy? I certainly do not want to inadvertently cause a denial-of-service attack on PrimeGrid! Removing execute permission worked like a charm back in the day when we had the problem of getting PPS Sieve CPU jobs when only GPU was selected.
Best of luck to everyone,
--Gary
That might -- or might not -- be the cause of an occasional problem we have with hosts continuously downloading large files, which causes problems with bandwidth.
If that's what it does, I'd prefer that you didn't do this. You might also find your IP blocked on the server, because that's how we deal with those hosts. On the other hand, if it just causes the task to fail and a new task is fetched, that's perfectly alright.
For now, I have taken no action. My next CPU load should be arriving in just a very few hours and I'll just keep an eye on it.
--Gary
GTX 780: after 82% of the work, the GPU load drops from 96% to 0% and the GPU is no longer working… why?
If I stop and resume the task, the GPU load goes to 96% for about a minute and then drops back to 0%… what do I have to do?
When the program encounters an error, it shuts down for 10 minutes. This allows the GPU to cool off (sometimes the problem is heat related), but sometimes the problem is caused by another program, and hopefully during that 10 minutes, the problem will go away by itself.
If this happens 6 times, the program will give up and the task will abort.
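(In other words, it's a plain retry-with-cooldown loop. Purely as an illustration -- this is not Genefer's actual source -- the logic is roughly:)

# Illustration only: the retry-with-cooldown behavior described above.
import time

MAX_ERRORS = 6       # after this many failures, give up and abort the task
COOLDOWN = 10 * 60   # 10 minutes, to let the GPU cool or interference clear

def run_with_retries(resume_from_checkpoint):
    errors = 0
    while True:
        try:
            return resume_from_checkpoint()
        except RuntimeError as exc:  # e.g. maxErr exceeded, CUDA error
            errors += 1
            if errors >= MAX_ERRORS:
                raise SystemExit("Too many errors; aborting task.")
            print("Error %d/%d (%s); waiting 10 minutes before attempting "
                  "to continue from last checkpoint..." % (errors, MAX_ERRORS, exc))
            time.sleep(COOLDOWN)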
I changed from OpenCL to CUDA; I hope this will go better...
____________
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
I went to OpenCL for the GTX 650. Based on early calculations it appears the work unit will finish 33% faster...
3 Jan 2014 | 18:19:35 UTC 5 Jan 2014 | 23:27:40 UTC Completed, can't validate, trying again 168,012.69 167,963.30 0.00 Genefer v2.04 (forceAVXGFN)
…trying again… what do I have to try again?????
____________
3 Jan 2014 | 18:19:35 UTC 5 Jan 2014 | 23:27:40 UTC Completed, can't validate, trying again 168,012.69 167,963.30 0.00 Genefer v2.04 (forceAVXGFN)
…trying again… what do I have to try again?????
I think that means that the validation is trying again. Might be wrong though, I've admittedly never seen that status before!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
I just accidentally closed a long post window.
I do that all the time. Don't you just hate that?
cuda_subs.cu(329) : cufftSafeCall() CUFFT error: 6.
cuda_subs.cu(271) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(272) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(273) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(274) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(275) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(276) : cudaSafeCall() Runtime API error : unknown error.
An error (2006) occured.
Waiting 10 minutes before attempting to continue from last checkpoint...
The CUDA driver died. Not much that can be done about that except to stop and try again -- which is what the Genefer program does. (To allow the program more errors before it gives up, you should delete the "retry.txt" file from the slot directory -- and ONLY the retry.txt file. The next task you get will have the 2.12 app which does that automatically.)
There's not much you can do about that error. However, I've noticed that I haven't seen it once since smoke started coming out of my computer, forcing me to shut it down and eventually reboot. (The smoke was due to a failed power supply fan and had nothing to do with crunching.) I'm not sure if that's a coincidence or if rebooting makes these CUDA driver problems less likely. I did have two of them during the challenge prior to my forced reboot.
This one, 9 times:
maxErr exceeded for 309340^1048576+1, 0.4727 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
That's the typical overclocking/overheating problem. You said your card came overclocked from the factory. That doesn't matter -- it's still overclocked, and calculation errors are very likely with overclocked GPUs. If you're very lucky it will make it through to the end of the calculation, and if you're even luckier it will produce the correct result and pass validation. I strongly recommend reducing the clocks (especially the memory clock) to the reference specifications.
I am pretty sure the default extra performance level (4) clocks are overclocked levels for this GTX260 as it is a TOP version. Tomorrow I am going to down clock that level in a card bios copy and upload it to the card to see if I can get the errors under control. Controlling the heat with higher fan speeds has not helped much.
Both are good ideas.
Still a lot of fun though...;-)
Me too!
____________
My lucky number is 75898^524288+1
For now, I have taken no action. My next CPU load should be arriving in just a very few hours and I'll just keep an eye on it.
--Gary
My second work load just arrived on my 3770K (ubuntu 13.04, boinc 7.0.65). HT is off; 4 cores crunching. 3 WUs came as AVX, one as SSE3. I aborted the SSE3 unit before it started and a replacement came as AVX.
--Gary
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
I had two tasks go from a good bit of time remaining to no time in one jump.
That doesn't sound good. It's also not a problem we're very familiar with.
Here is the message listing from the maxerr file of one task where it went to zero time remaining.
genefx64 3.1.2-0 (Windows 64-bit SSE2)
Resuming 303416^1048576+1 from a checkpoint (16051796 iterations left)
Estimated total run time for 303416^1048576+1 is 266:41:45
genefx64 3.1.2-0 (Windows 64-bit SSE2)
Resuming 303416^1048576+1 from a checkpoint (16049943 iterations left)
Estimated total run time for 303416^1048576+1 is 0:00:00
That time estimate of zero is something I've never seen before. I don't know what to make of that.
In the boinc_task_state.xml file it is showing <fraction_done>1.000000</fraction_done>
It still appears to be running, but the remaining time listed in BOINC is "---".
Does it seem to be doomed, or is this a normal situation?
Probably doomed, but before you kill it, three questions:
1) Before it got stuck like that, how long was it running? The initial estimate was 266 hours. That estimate is probably pretty good. If you started the task during the challenge, obviously it hasn't been running for anywhere close to 266 hours.
2) Is the CPU actually still crunching that task? You should be able to tell by looking at a CPU monitor such as Task Manager. Or is that CPU core idle?
3) In the BOINC project directory (probably C:\ProgramData\BOINC\projects\www.primegrid.com\) is there a file called genefer_1048576_366926_4_0 (that exact name)? If there is, could you send me a PM with the contents of that file? Don't post it on the forum.
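(If anyone wants to check the same thing on their own machine, here's a quick unofficial sketch that pulls fraction_done out of boinc_task_state.xml. The slot number and the Windows data path are assumptions; adjust them for your install.)

# Unofficial check of what BOINC last recorded for a running task.
import re

path = r"C:\ProgramData\BOINC\slots\0\boinc_task_state.xml"  # assumption

with open(path) as f:
    text = f.read()
match = re.search(r"<fraction_done>([\d.]+)</fraction_done>", text)
print("fraction_done =", match.group(1) if match else "not found")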
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
The SSE3 app is in place, and I'm considering leaving it there but removing execute permission, so that if it tries to run again, it will immediately crash. Presumably that will cause new work to download, which might (correctly) be AVX. Any problems with that strategy? I certainly do not want to inadvertently cause a denial-of-service attack on PrimeGrid! Removing execute permission worked like a charm back in the day when we had the problem of getting PPS Sieve CPU jobs when only GPU was selected.
Best of luck to everyone,
--Gary
That might -- or might not -- be the cause of an occasional problem we have with hosts continuously downloading large files, which causes problems with bandwidth.
If that's what it does, I'd prefer that you didn't do this. You might also find your IP blocked on the server, because that's how we deal with those hosts. On the other hand, if it just causes the task to fail and a new task is fetched, that's perfectly alright.
For now, I have taken no action. My next CPU load should be arriving in just a very few hours and I'll just keep an eye on it.
--Gary
I'm close to deploying a revision to the project preferences webpages on the server that would switch to full manual control of app selection rather than having the server automatically send you the fastest app. The automatic selection seems to be completely broken. (I don't know why it's broken, either -- it worked when it was set up last year, or at least it seemed to.)
For now, I'd recommend just aborting any tasks which aren't using the correct app. Once I install the web pages, you may need to change your preferences, depending on your system.
____________
My lucky number is 75898^524288+1
Hello,
One of my Linux boxes, http://www.primegrid.com/results.php?hostid=414618, swapped from avxgfn to sse3cpugfn. It is currently running 2 of each. I don't know why it changed. Can anyone here check, or tell me what to check?
Much appreciated,
Yankton
[edit]Aborted one of the SSE3 units less than an hour in, and it pulled one for AVX. The other SSE3 unit is 13 hours in; would it finish a unit faster if I left it, or if I aborted it and tried to get another on AVX?[/edit]
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Hello,
One of my Linux boxes, http://www.primegrid.com/results.php?hostid=414618, swapped from avxgfn to sse3cpugfn. It is currently running 2 of each. I don't know why it changed. Can anyone here check, or tell me what to check?
Much appreciated,
Yankton
[edit]Aborted one of the SSE3 units less than an hour in, and it pulled one for AVX. The other SSE3 unit is 13 hours in; would it finish a unit faster if I left it, or if I aborted it and tried to get another on AVX?[/edit]
It's a server bug; it sent you the wrong tasks. If they just started, please abort them and download new tasks. Repeat as needed until you get the right ones.
I'm planning on changing the website to let you manually select SSE2, SSE3, or AVX tasks since the server isn't doing it correctly. When I do so, linux boxes will get only SSE3 tasks until you change your preferences.
Watch this thread for an announcement, hopefully sometime today.
I apologize for the inconvenience.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-08 14:45:41 UTC)
41696 tasks have been sent out. [CPU/GPU/anonymous_platform: 13207 (32%) / 28478 (68%) / 11 (0%)]
Of those tasks that have been sent out:
32658 (78%) came back with some kind of an error. [8472 (20%) / 24176 (58%) / 10 (0%)]
2304 (6%) have returned a successful result. [264 (1%) / 2040 (5%) / 0 (0%)]
6734 (16%) are still in progress. [4471 (11%) / 2262 (5%) / 1 (0%)]
Of the tasks that have been returned successfully:
1519 (66%) are pending validation. [172 (7%) / 1347 (58%) / 0 (0%)]
692 (30%) have been successfully validated. [84 (4%) / 608 (26%) / 0 (0%)]
1 (0%) was invalid. [0 (0%) / 1 (0%) / 0 (0%)]
30 (1%) are inconclusive. [3 (0%) / 27 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=321320. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 7.90% as much as it had prior to the challenge!
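(For anyone wondering where that percentage comes from: it's simply the leading edge's advance divided by its pre-challenge value.)

# Reproducing the "advanced 7.90%" figure from the numbers above.
start, now = 297788, 321320
print(round((now - start) / start * 100, 2))  # -> 7.9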
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
IMPORTANT:
New Project Preferences for CPU app selection!
Those of you who have been running CPU apps for this challenge have probably noticed that the server hasn't been sending out the correct version of the app (SSE2/SSE3/AVX). It's supposed to choose the correct version, but it hasn't been. (It worked fine in testing when installed last year, of course.)
I have replaced the automatic method with a manual selection method. For Genefer-Short, you can now explicitly choose the normal CPU SSE3 app, the much faster AVX app (if your CPU and operating system support it), or the slower SSE2 app if you're running on a very old 64 bit Athlon 64.
Here's what you need to know:
WHICH APP SHOULD I USE?
AVX: This is the fastest, but if your CPU or operating system doesn't support AVX the app will crash. To use this app, you must have:
An appropriate CPU, which is any Intel CPU starting with Sandy Bridge (Sandy Bridge, Ivy Bridge, or Haswell). Although recent AMD CPUs support AVX, their implementation is very slow and there's probably no benefit to running the AVX app.
An appropriate operating system: 64 bit Linux, OS X, Windows 7 SP1 or later, or Windows Server 2008 R2 SP1 or later.
SSE3: This will run on almost any 64 bit CPU, so if you can't run the AVX app, run the SSE3 app. Only the very earliest 64 bit Athlon CPUs lack SSE3 support. All Athlon 64s since Revision E (Venice and San Diego), and all other 64 bit CPUs, support SSE3 and can run this version of the app.
SSE2: If you have a 64 bit Athlon 64 that is older than Rev E (see above), you must run this version of the app. This is equivalent to Genefx64.
WHAT YOU NEED TO DO:
WINDOWS:
If you had the FORCE AVX box checked before, the AVX box will be checked now, and you'll be receiving the AVX app. No action is necessary.
If you had the CPU box checked before, the SSE3 box will be checked now, and you will be receiving the SSE3 app. Unless you have one of those ancient Athlon 64 CPUs, no action is necessary on your part.
IF YOU DO have an old Athlon 64, you must uncheck the SSE3 box and check the SSE2 box.
MAC:
If you have an AVX capable CPU, you must uncheck the SSE3 box and check the AVX box to get the AVX app.
If you do not have an AVX capable CPU, no action is necessary on your part.
LINUX:
If you have an AVX capable CPU, you must uncheck the SSE3 box and check the AVX box to get the AVX app.
If you don't have an AVX capable CPU and had the CPU box checked before, the SSE3 box will be checked now, and you will be receiving the SSE3 app. Unless you have one of those ancient Athlon 64 CPUs, no action is necessary on your part.
IF YOU DO have an old Athlon 64, you must uncheck the SSE3 box and check the SSE2 box.
Sorry for all the inconvenience and confusion, but after these adjustments are made you should reliably receive the correct CPU apps.
____________
My lucky number is 75898^524288+1
???
Users with more than one PC will have problems now. "Force AVX" will send AVX WUs to my SSE3 PCs, where they will fail.
The server was "confused" only for the first 5 minutes of the challenge. After 5 minutes, all my Windows/Linux clients with AVX got AVX WUs, and all my SSE3 clients got SSE3 WUs. And the second batch for the fast (AVX) clients was also correct, without forcing anything.
Now I have to check every time a client requests work ("babysitting" with 8 babies)?!?
____________
DeleteNull
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
???
Users with more than one PC will have problems now. "Force AVX" will send AVX WUs to my SSE3 PCs, where they will fail.
The server was "confused" only for the first 5 minutes of the challenge. After 5 minutes, all my Windows/Linux clients with AVX got AVX WUs, and all my SSE3 clients got SSE3 WUs. And the second batch for the fast (AVX) clients was also correct, without forcing anything.
Now I have to check every time a client requests work ("babysitting" with 8 babies)?!?
The server was sending out the tasks randomly. Every time you got the correct task -- that was luck. To get the correct tasks required baby sitting each computer on every download. You have 8 -- what if you had thousands?
This situation isn't ideal, but at least now people can select the correct tasks -- and actually get them -- even if they have to put their computers into two different venues.
For this challenge, this is the best that can be done. Looking down the road a bit, the Genefer 3.2.0 app selects the correct transform internally, so there's only one app instead of 3. The problem will go away permanently once we start using 3.2.0. Until then, this is the best we can do.
____________
My lucky number is 75898^524288+1
Yep, there are four different computer settings (Home, School, Work, and Default) and only three different CPU tasks (AVX, SSE3, and SSE2), so it's going to be a very, very small segment of crunchers who are actually impacted by that. I don't think many people use those settings for things that can't be temporarily achieved by a config .xml.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Yep, there are four different computer settings (Home, School, Work, and Default) and only three different CPU tasks (AVX, SSE3, and SSE2), so it's going to be a very, very small segment of crunchers who are actually impacted by that. I don't think many people use those settings for things that can't be temporarily achieved by a config .xml.
It's really only 2 since very few hosts have a 64 bit CPU which lacks SSE3 instructions.
____________
My lucky number is 75898^524288+1
ALL MY LINUX HOSTS STOPPED RECEIVING AVX, EVEN WHEN SELECTED.
No task available for selected app ....................
Yep, there are four different computer settings (Home, School, Work, and Default) and only three different CPU tasks (AVX, SSE3, and SSE2), so it's going to be a very, very small segment of crunchers who are actually impacted by that. I don't think many people use those settings for things that can't be temporarily achieved by a config .xml.
Actually there are at least 10 different possible competitive profiles:
1) AVX - NVIDIA CUDA
2) AVX - NVIDIA OpenCL
3) SSE3 - NVIDIA CUDA
4) SSE3 - NVIDIA OpenCL
5) SSE2 - NVIDIA CUDA
6) SSE2 - NVIDIA OpenCL
7) non-Genefer subproject - NVIDIA CUDA
8) non-Genefer subproject - NVIDIA OpenCL
9) no CPU - NVIDIA CUDA
10) no CPU - NVIDIA OpenCL
But I agree there is a very, very small segment of crunchers who need more than 4 profiles.
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
ALL MY LINUX HOSTS STOP RECEIVING AVX EVEN WHEN SELECTED.
No task available for selected app ....................
Oh?
http://www.primegrid.com/results.php?hostid=265441
That one is getting AVX tasks, and certainly other Linux hosts are, so the question is why your hosts are not getting anything now. They certainly should be.
Near the top of your BOINC log you should find a line like this:
1/6/2014 11:22:34 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe
Could you post that from one of the hosts that's not getting AVX tasks now? Thanks.
____________
My lucky number is 75898^524288+1
8.1.2014 22:34:25 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
This host:
AVX only selected, SSE3 task aborted.
http://www.primegrid.com/results.php?hostid=265442
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Try now. It should be fixed.
____________
My lucky number is 75898^524288+1
Try now. It should be fixed.
Yes, it works. Thanks.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Try now. It should be fixed.
Yes, it works. Thanks.
You're welcome, and my apologies for that bug. That one was my fault. :(
____________
My lucky number is 75898^524288+1
No problem. Long time remaining.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Yep, there are four different computer settings (Home, School, Work, and Default) and only three different CPU tasks (AVX, SSE3, and SSE2), so it's going to be a very, very small segment of crunchers who are actually impacted by that. I don't think many people use those settings for things that can't be temporarily achieved by a config .xml.
Actually there are at least 10 different possible competitive profiles:
1) AVX - NVIDIA CUDA
2) AVX - NVIDIA OpenCL
3) SSE3 - NVIDIA CUDA
4) SSE3 - NVIDIA OpenCL
5) SSE2 - NVIDIA CUDA
6) SSE2 - NVIDIA OpenCL
7) non-Genefer subproject - NVIDIA CUDA
8) non-Genefer subproject - NVIDIA OpenCL
9) no CPU - NVIDIA CUDA
10) no CPU - NVIDIA OpenCL
But I agree there is a very, very small segment of crunchers who need more than 4 profiles.
And what if you have a GTX 580 and a GTX TITAN in the same computer? You'll want to run GeneferCUDA on the 580 and GeneferOCL on the TITAN. Yes, you can select both the CUDA and OpenCL apps at the same time (the website will warn you that you selected more than one app for Nvidia, but it will let you do it), but I'm not sure if there's a way to tell the BOINC client which GPU gets which app. I know you can specify which sub-project can run on a GPU, but I don't think you can specify it at the plan_class level.
It's not perfect.
____________
My lucky number is 75898^524288+1
Yep, there are four different computer settings (Home, School, Work, and Default) and only three different CPU tasks (AVX, SSE3, and SSE2), so it's going to be a very, very small segment of crunchers who are actually impacted by that. I don't think many people use those settings for things that can't be temporarily achieved by a config .xml.
Actually there are at least 10 different possible competitive profiles:
1) AVX - NVIDIA CUDA
2) AVX - NVIDIA OpenCL
3) SSE3 - NVIDIA CUDA
4) SSE3 - NVIDIA OpenCL
5) SSE2 - NVIDIA CUDA
6) SSE2 - NVIDIA OpenCL
7) non-Genefer subproject - NVIDIA CUDA
8) non-Genefer subproject - NVIDIA OpenCL
9) no CPU - NVIDIA CUDA
10) no CPU - NVIDIA OpenCL
But I agree there is a very, very small segment of crunchers who need more than 4 profiles.
And what if you have a GTX 580 and a GTX TITAN in the same computer? You'll want to run GeneferCUDA on the 580 and GeneferOCL on the TITAN. Yes, you can select both the CUDA and OpenCL apps at the same time (the website will warn you that you selected more than one app for Nvidia, but it will let you do it), but I'm not sure if there's a way to tell the BOINC client which GPU gets which app. I know you can specify which sub-project can run on a GPU, but I don't think you can specify it at the plan_class level.
It's not perfect.
In this case I can exclude the GTX 580 from GeneferOCL and the GTX TITAN from GeneferCUDA in cc_config:
<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>NVIDIA|ATI|intel_gpu</type>]
[<app>appname</app>]
</exclude_gpu>
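Filled in, that might look something like the sketch below. The device number is hypothetical (check the BOINC startup log for your device numbering), and the app short name is a placeholder -- take the real one from your client_state.xml.

<cc_config>
   <options>
      <!-- hypothetical example: keep this app off GPU device 1 -->
      <exclude_gpu>
         <url>http://www.primegrid.com/</url>
         <device_num>1</device_num>
         <type>NVIDIA</type>
         <app>app_name_from_client_state</app>
      </exclude_gpu>
   </options>
</cc_config>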
____________
Unfortunately, the problem you described remains.
Excluding an app for a particular device only keeps it from executing there; it doesn't guarantee you'll be lucky enough to receive a task of the particular plan_class. BOINC will keep requesting tasks for NVIDIA until every GPU device gets a task of its plan_class.
Unfortunately, it sometimes happens that BOINC requests a dozen inapplicable, redundant tasks before it gets the necessary plan_class task.
The sadness.
____________
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
Wow, you had your hands full with work unit selection issues... a bag of worms!
Update on my GTX 260 card funkiness in the COSMOS_S box. I downclocked the extra performance level (4) in the card BIOS by 20%, which is below the default specs for all clocks at that level. I also raised the minimum fan duty cycle from 40% to 50%. I increased the VDDC voltage for that level from 1.125 V to 1.150 V.
I ended the existing work units as they were erroring off anyway. I dumped BOINC and reloaded. I fixed some small errors with SFC and repaired the image health on that box with DISM. I put in the latest Nvidia drivers, 332.31, which came out just today.
I started with new work units and still got a couple of maxErr exceeded errors and several of the CUDA errors (CUFFT error: 6).
So... I took the card out and moved it to the other PCI-E slot, which is full speed. It has run for a while now without any issues, so I am keeping my fingers crossed. Thanks for your input and advice on where to look for resolutions.
Much fun!
I'm very pleased to see over 300 people on the scoring board already! :) That's quite the dedication for such a tough challenge, especially with the Stallion edition running concurrently. It also tells me that expanding the scoring positions to 300 was really needed :)
Keep on crunching!
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-09 12:26:57 UTC)
47100 tasks have been sent out. [CPU/GPU/anonymous_platform: 15137 (32%) / 31952 (68%) / 11 (0%)]
Of those tasks that have been sent out:
37074 (79%) came back with some kind of an error. [9960 (21%) / 27104 (58%) / 10 (0%)]
3045 (6%) have returned a successful result. [506 (1%) / 2539 (5%) / 0 (0%)]
6981 (15%) are still in progress. [4671 (10%) / 2309 (5%) / 1 (0%)]
Of the tasks that have been returned successfully:
1870 (61%) are pending validation. [305 (10%) / 1565 (51%) / 0 (0%)]
1069 (35%) have been successfully validated. [190 (6%) / 879 (29%) / 0 (0%)]
4 (0%) were invalid. [0 (0%) / 4 (0%) / 0 (0%)]
32 (1%) are inconclusive. [5 (0%) / 27 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=324156. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 8.85% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
So this is the CUDA driver dying. Is there any hint from this message of what the fault might be? Do you think running this on a 32 bit version of the OS might help? I have no information about what CUFFT error 6 means.
genefercuda 3.1.2-9 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_1_2_9_2.12_windows_intelx86__cudaGFN.exe -boinc -q 319590^1048576+1 --device 0
Priority change succeeded.
GPU=GeForce GTX 260
Global memory=939524096 Shared memory/block=16384 Registers/block=16384 Warp size=32
Max threads/block=512
Max thread dim=512 512 64
Max grid=65535 65535 1
CC=1.3
Clock=980 MHz
# of MP=27
No project preference set; using AUTO-SHIFT=8
Resuming 319590^1048576+1 from a checkpoint (13893631 iterations left)
Estimated total run time for 319590^1048576+1 is 42:46:27
cuda_subs.cu(329) : cufftSafeCall() CUFFT error: 6.
cuda_subs.cu(271) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(272) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(273) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(274) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(275) : cudaSafeCall() Runtime API error : unknown error.
cuda_subs.cu(276) : cudaSafeCall() Runtime API error : unknown error.
An error (2006) occured.
Waiting 10 minutes before attempting to continue from last checkpoint...
All this does is stop the program for 10 minutes, and then it restarts. The fast Fourier transform DLLs appear to be from the Nvidia toolkit. I remember helping a guy work on an array processor way back in the day, but that was another time and place.
Superior fun!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
So this is the CUDA driver dying.
Yes.
Is there any hint from this message of what the fault might be?
No.
Do you think running this on a 32 bit version of the OS might help?
I doubt it.
I have no information about what CUFFT error 6 means.
Error 6 is "CUDA call failed."
All this does is stop the program for 10 minutes, and then it restarts.
The 10 minute delay is an attempt to allow two different potential causes of errors to correct themselves:
1) If the GPU is overclocked or overheating, letting it cool off a bit may help.
2) If an external program is interfering with the GPU, it may shut off and allow the GPU to run again.
The restart is the second best way to reset the video driver. (The best way is to reboot the computer, but that doesn't seem to be necessary!)
____________
My lucky number is 75898^524288+1
GDB
Joined: 15 Nov 11 Posts: 240 ID: 119185 Credit: 2,577,314,587 RAC: 0
When your result is validated, doesn't it show the number you tested, and whether it's prime or not, like LLR tests?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
When your result is validated, doesn't it show the number you tested, and whether it's prime or not, like LLR tests?
No, it does not.
If it's prime, you'll know. A prime found on either of the GFN projects will be a significant event.
____________
My lucky number is 75898^524288+1
When your result is validated, doesn't it show the number you tested, and whether it's prime or not, like LLR tests?
Once reported, to see the number you tested, you can look in the task's output; follow the link under the "Task" (leftmost) column on your results page. While a task is still running, you can instead look in your {boinc-data}/slots/{N}/stderr.txt file.
But, the final "prime" vs. "not prime" verdict is not directly revealed, as MG implied. Expect "not prime", but hope for "prime" :-)
--Gary
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
If you are experiencing lag in your web browser, try turning off the "Use hardware acceleration" setting. (That's what it's called in chrome. There should be a similar setting in other browsers.)
I was having bad lag in Chrome, but not with anything else. Turning that setting off completely eliminated the lag.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-10 13:26:28 UTC)
51994 tasks have been sent out. [CPU/GPU/anonymous_platform: 17096 (33%) / 34883 (67%) / 15 (0%)]
Of those tasks that have been sent out:
40782 (78%) came back with some kind of an error. [11489 (22%) / 29281 (56%) / 12 (0%)]
3885 (7%) have returned a successful result. [735 (1%) / 3150 (6%) / 0 (0%)]
7327 (14%) are still in progress. [4872 (9%) / 2452 (5%) / 3 (0%)]
Of the tasks that have been returned successfully:
2224 (57%) are pending validation. [416 (11%) / 1808 (47%) / 0 (0%)]
1518 (39%) have been successfully validated. [303 (8%) / 1215 (31%) / 0 (0%)]
6 (0%) were invalid. [0 (0%) / 6 (0%) / 0 (0%)]
53 (1%) are inconclusive. [10 (0%) / 43 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=327248. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 9.89% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
Should I finish these WUs?
____________
wbr, Me. Dead J. Dona
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Should I finish these WUs?
It looks like they're going to take about 24.5 days to finish (that's from dividing the time used so far by the percentage done). The deadline is 19 days.
You'll probably get credit for them if you complete them, even though you'll miss the deadline by a little bit. (We don't guarantee you'll get credit, but it's almost a certainty.) Whether you want to continue running them is up to you.
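(That projection is just elapsed time divided by fraction done. With purely illustrative numbers:)

# Projected total runtime from elapsed time and fraction done (illustrative).
elapsed_days = 9.8      # hypothetical: time used so far
fraction_done = 0.40    # hypothetical: 40% complete
print(elapsed_days / fraction_done)  # -> 24.5 days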
____________
My lucky number is 75898^524288+1
Lee
Joined: 20 Mar 13 Posts: 41 ID: 206688 Credit: 18,268,601 RAC: 0
Okay, thanks Michael!
I have reached the point with this particular card where I will have to live with it when doing Genefer CUDA work units. The work unit stops for 10 minutes every hour or two, but I can live with that, I suppose. It runs all the other types of work units with no issues, especially PPS Sieve.
Now if I just had $700-800 extra I could get one of those Nvidia 780 Ti boards... ;-)
If you are experiencing lag in your web browser, try turning off the "Use hardware acceleration" setting. (That's what it's called in chrome. There should be a similar setting in other browsers.)
I was having bad lag in Chrome, but not with anything else. Turning that setting off completely eliminated the lag.
This was excellent advice. It helped me a lot. Thanks, Michael!
Hi Michael,
Thank you for these daily updates!
Kind Regards,
Philippe
Me again,
I have the following question: what should I expect from this WU?
Or what should I do?
514909319 414411 4 Jan 2014 | 13:02:09 UTC 9 Jan 2014 | 20:22:09 UTC Completed, can't validate, trying again 229,693.91 225,514.00 0.00 Genefer v2.04 (forceAVXGFN)
Thank You
Kind Regards
Philippe
I have the following question: what should I expect from this WU?
These units are set to a maximum of 20 errors and it hit that. Then they are automatically set to 60 errors, and the WU continues. Your unit will still be compared to the second returned unit. Nothing to worry about. They do not show up in Pending, but it is still like a Pending unit.
Crunch on without fear.
____________
My lucky numbers are 121*2^4553899-1 and 3756801695685*2^666669±1
Thank you!
Philippe
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-11 13:28:37 UTC)
56973 tasks have been sent out. [CPU/GPU/anonymous_platform: 18689 (33%) / 38269 (67%) / 15 (0%)]
Of those tasks that have been sent out:
44788 (79%) came back with some kind of an error. [12803 (22%) / 31971 (56%) / 14 (0%)]
4735 (8%) have returned a successful result. [960 (2%) / 3775 (7%) / 0 (0%)]
7450 (13%) are still in progress. [4926 (9%) / 2523 (4%) / 1 (0%)]
Of the tasks that have been returned successfully:
2583 (55%) are pending validation. [516 (11%) / 2067 (44%) / 0 (0%)]
1990 (42%) have been successfully validated. [423 (9%) / 1567 (33%) / 0 (0%)]
11 (0%) were invalid. [0 (0%) / 11 (0%) / 0 (0%)]
65 (1%) are inconclusive. [12 (0%) / 53 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=329562. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 10.67% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
Challenge: Pony Express
(As of 2014-01-11 13:28:37 UTC)
56973 tasks have been sent out. [CPU/GPU/anonymous_platform: 18689 (33%) / 38269 (67%) / 15 (0%)]
Of those tasks that have been sent out:
44788 (79%) came back with some kind of an error. [12803 (22%) / 31971 (56%) / 14 (0%)]
…that's extremely high… why is this?
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-11 13:28:37 UTC)
56973 tasks have been sent out. [CPU/GPU/anonymous_platform: 18689 (33%) / 38269 (67%) / 15 (0%)]
Of those tasks that have been sent out:
44788 (79%) came back with some kind of an error. [12803 (22%) / 31971 (56%) / 14 (0%)]
…that's extremely high… why is this?
That exact question was answered in this very thread. Read the earlier messages.
____________
My lucky number is 75898^524288+1
tng
Joined: 29 Aug 10 Posts: 398 ID: 66603 Credit: 22,925,088,044 RAC: 1
Challenge: Pony Express
(As of 2014-01-11 13:28:37 UTC)
56973 tasks have been sent out. [CPU/GPU/anonymous_platform: 18689 (33%) / 38269 (67%) / 15 (0%)]
Of those tasks that have been sent out:
44788 (79%) came back with some kind of an error. [12803 (22%) / 31971 (56%) / 14 (0%)]
…that's extremely high… why is this?
Because of hosts that error out almost immediately, and return enormous numbers of errors, like this one.
This seems to be a configuration issue -- the message says that no OpenCL device was found.
This can also be due to an overclocked or overheated GPU. Run stock clocks (not even factory overclocking), downclock if needed, make sure your cooling is good, and monitor your systems when running Genefer.
____________
WR not counted
This WR task, downloaded two days after the challenge start, is not counted in the Participants/Teams stats.
But why?
ID
515119910 375538943 Genefer (World Record) 579,303.26 (not counted in challenge)
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
WR not counted
This WR task, downloaded two days after the challenge start, is not counted in the
Did you happen to check the time on the Challenge statistics page? The statistics are generated every 15 minutes, so if you had just returned that task, you would have to wait until the statistics get generated in order to see the result.
As I'm writing this, I do see that you have one task listed for you on the challenge statistics page. Is that the one you're looking for? Or should there be a second?
____________
My lucky number is 75898^524288+1
No, the first task from the first day of this challenge was a Short task.
You can see it in the list of participants - I'm at #414.
This WR task is from this evening; I've waited through 5 of the 15-minute updates,
but it's not counted.
It's a "Pending" task, but that's not the problem, is it? Or what do you mean?
Please look at my account.
____________
WR units count towards the Stallion challenge, not the Pony Express one. You are in position 90 over there with your one unit.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
Damn, was the WR task for nothing?
Your link is for challenges from last year; I returned the task yesterday, not last year.
Five days before the challenge, I read here that both task types would be counted. Now I have checked, and only short tasks are counted.
This is hard for me.
____________
Don't get cross. There are two challenges running at the same time: short GFN and WR GFN. All your WU's count. The short ones for the Pony Express challenge, the WR ones for the Stallion challenge.
You will get challenge points for both challenges.
Thanks Dirk,
all OK
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Pony Express
(As of 2014-01-12 13:26:30 UTC)
60938 tasks have been sent out. [CPU/GPU/anonymous_platform: 19671 (32%) / 41251 (68%) / 16 (0%)]
Of those tasks that have been sent out:
47530 (78%) came back with some kind of an error. [13461 (22%) / 34054 (56%) / 15 (0%)]
5621 (9%) have returned a successful result. [1183 (2%) / 4438 (7%) / 0 (0%)]
7787 (13%) are still in progress. [5027 (8%) / 2759 (5%) / 1 (0%)]
Of the tasks that have been returned successfully:
2890 (51%) are pending validation. [588 (10%) / 2302 (41%) / 0 (0%)]
2537 (45%) have been successfully validated. [570 (10%) / 1967 (35%) / 0 (0%)]
13 (0%) were invalid. [0 (0%) / 13 (0%) / 0 (0%)]
99 (2%) are inconclusive. [16 (0%) / 83 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=332716. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 11.73% as much as it had prior to the challenge!
____________
My lucky number is 75898^524288+1
Hi.
I think I have a problem getting AVX tasks: no jobs come even with the checkbox selected. But the processor is AVX-capable, and the BOINC client sees it. I think the problem is the old client. The system is Oracle Linux 6, 64-bit:
$ cat /proc/cpuinfo | grep avx | tail -1
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx f16c lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
12-Jan-2014 22:29:21 [---] Starting BOINC client version 6.12.22 for x86_64-pc-linux-gnu
12-Jan-2014 22:29:21 [---] log flags: file_xfer, sched_ops, task
12-Jan-2014 22:29:21 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3 c-ares/1.5.1
12-Jan-2014 22:29:21 [---] Data directory: /home/ruslan/1/BOINC
getaddrinfo: Success
12-Jan-2014 22:29:21 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
12-Jan-2014 22:29:21 [---] Processor: 3.00 MB cache
12-Jan-2014 22:29:21 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor
12-Jan-2014 22:29:21 [---] OS: Linux: 2.6.32-358.18.1.el6.x86_64
12-Jan-2014 07:35:33 [PrimeGrid] Sending scheduler request: To fetch work.
12-Jan-2014 07:35:33 [PrimeGrid] Requesting new tasks for CPU
12-Jan-2014 07:35:36 [PrimeGrid] Scheduler request completed: got 0 new tasks
12-Jan-2014 07:35:36 [PrimeGrid] No tasks sent
12-Jan-2014 07:35:36 [PrimeGrid] No tasks are available for Genefer
12-Jan-2014 07:35:36 [PrimeGrid] No tasks are available for the applications you have selected.
12-Jan-2014 07:35:36 [PrimeGrid] Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
Your BOINC client is very old, and your Linux kernel is probably too old as well.
____________
Windows also doesn't show AVX as a processor feature, even with BOINC 7.x, but that doesn't prevent receiving AVX tasks after forcing them in preferences, does it?
(After the last preferences update, it would be more correct to say "after setting".)
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Windows also doesn't show AVX as a processor feature, even with BOINC 7.x, but that doesn't prevent receiving AVX tasks after forcing them in preferences, does it?
(After the last preferences update, it would be more correct to say "after setting".)
The Windows client does not yet show AVX, so on Windows -- and ONLY Windows -- the server is set up to not check to make sure the host supports AVX.
However, on other platforms (Mac and Linux) the AVX reporting mechanism works correctly, and the server expects the BOINC client to report the AVX flag. If it's not reported, you can't get AVX tasks.
Your BOINC client is very old and probably your linux kernel is also too old
Definitely update the BOINC client to see if that helps. From your log, it looks like the BOINC client isn't reporting AVX, which will prevent you from getting AVX tasks.
If the kernel doesn't support AVX, it doesn't matter whether the server sends AVX tasks, because any AVX app you run will crash: if the OS doesn't know about the AVX registers, it won't save and restore them during a context switch, and that will cause errors.
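To check all three pieces on Linux, a rough sketch (boinccmd ships with the BOINC client; the 2.6.30 kernel threshold for AVX state saving is an assumption, and enterprise kernels may backport it):
$ grep -m1 -o ' avx ' /proc/cpuinfo   # prints " avx " only if the CPU has it AND the running kernel knows the flag
$ uname -r                            # kernel version; AVX register saving needs roughly 2.6.30 or later
$ boinccmd --client_version           # old 6.x clients never report the avx flag to the server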
____________
My lucky number is 75898^524288+1
Yes, you will need to update both BOINC and your operating system to get AVX tasks. I had to be on at least BOINC 7.0.64 to get AVX recognized. But the 7.x.x versions of BOINC won't run on a 2.6.xx Linux kernel. I had to go to Ubuntu 12.10, which I think has a 3.5 kernel, to get that version of BOINC to run at all (and of course 12.10 is more than a year out of date, and BOINC is at 7.2.xx now... I'm just quoting minimums).
--Gary
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2165 ID: 1178 Credit: 8,777,295,508 RAC: 0
Been trying to get a Tesla K10 box running, but have had tons of issues with the GFN work on both OpenCL and CUDA apps. Switched to a different shell that cleared up some issues and allowed for some driver upgrades, but now on the OpenCL app I get the following:
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/www.primegrid.com/primegrid_genefer_3_1_2_7_2.07_i686-pc-linux-gnu__OCLcudaGFN: error while loading shared libraries: libOpenCL.so.1: wrong ELF class: ELFCLASS64
</stderr_txt>
]]>
Anyone have any ideas?
ardo
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,690,471,713 RAC: 0
Been trying to get a Tesla K10 box running, but have had tons of issues with the GFN work on both OpenCL and CUDA apps. Switched to a different shell that cleared up some issues and allowed for some driver upgrades, but now on the OpenCL app I get the following:
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/www.primegrid.com/primegrid_genefer_3_1_2_7_2.07_i686-pc-linux-gnu__OCLcudaGFN: error while loading shared libraries: libOpenCL.so.1: wrong ELF class: ELFCLASS64
</stderr_txt>
]]>
Anyone have any ideas?
Based on the error message I'm guessing you are running a 64-bit OS and installed the 64-bit version of the driver, but the executable is 32-bit. If you do 'ldd' on the executable you should see one or more libraries listed as 'not found'. Looks like you need to find a way to install the 32-bit version of the driver and the libraries it needs.
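For example, a quick check could look like this (a sketch; the BOINC data directory path and the exact executable name will differ on your system):
$ cd ~/BOINC/projects/www.primegrid.com
$ ldd primegrid_genefer_3_1_2_7_2.07_i686-pc-linux-gnu__OCLcudaGFN
# Any "not found" line, or a libOpenCL.so.1 that resolves to a 64-bit file
# (check with: file /path/to/libOpenCL.so.1), confirms the 32-bit OpenCL
# library is missing -- consistent with the "wrong ELF class: ELFCLASS64" error.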
Thanks,
Ardo
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Year of the Horse - Pony Express
(As of 2014-01-13 13:19:46 UTC)
65297 tasks have been sent out. [CPU/GPU/anonymous_platform: 21298 (33%) / 43983 (67%) / 16 (0%)]
Of those tasks that have been sent out:
50748 (78%) came back with some kind of an error. [14775 (23%) / 35958 (55%) / 15 (0%)]
6609 (10%) have returned a successful result. [1475 (2%) / 5133 (8%) / 1 (0%)]
7940 (12%) are still in progress. [5048 (8%) / 2892 (4%) / 0 (0%)]
Of the tasks that have been returned successfully:
3207 (49%) are pending validation. [688 (10%) / 2518 (38%) / 1 (0%)]
3186 (48%) have been successfully validated. [761 (12%) / 2425 (37%) / 0 (0%)]
22 (0%) were invalid. [0 (0%) / 22 (0%) / 0 (0%)]
111 (2%) are inconclusive. [16 (0%) / 95 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=335750. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 12.75% as much as it had prior to the challenge! (When task size is considered, that's an increase of 13.75%.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Year of the Horse - Pony Express
(As of 2014-01-14 13:46:50 UTC)
69538 tasks have been sent out. [CPU/GPU/anonymous_platform: 22537 (32%) / 46985 (68%) / 16 (0%)]
Of those tasks that have been sent out:
53913 (78%) came back with some kind of an error. [15824 (23%) / 38074 (55%) / 15 (0%)]
7710 (11%) have returned a successful result. [1813 (3%) / 5896 (8%) / 1 (0%)]
7915 (11%) are still in progress. [4900 (7%) / 3015 (4%) / 0 (0%)]
Of the tasks that have been returned successfully:
3444 (45%) are pending validation. [747 (10%) / 2696 (35%) / 1 (0%)]
4024 (52%) have been successfully validated. [1039 (13%) / 2985 (39%) / 0 (0%)]
34 (0%) were invalid. [1 (0%) / 33 (0%) / 0 (0%)]
124 (2%) are inconclusive. [14 (0%) / 110 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=338652. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 13.72% as much as it had prior to the challenge! (When task size is considered, that's an increase of 14.78%.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
For those of you who are wondering about the large number of errors (that includes me), here's an interesting tidbit:
These are the hosts that have more than 1000 errors each:
+--------+--------+---------------------+
| hostid | errors | last result |
+--------+--------+---------------------+
| 376011 | 4241 | 2014-01-13 21:40:07 |
| 402669 | 3994 | 2014-01-14 03:53:01 |
| 207805 | 3776 | 2014-01-10 04:14:47 |
| 299890 | 3139 | 2014-01-08 18:35:55 |
| 405897 | 1962 | 2014-01-14 10:57:35 |
| 419018 | 1947 | 2014-01-14 02:42:35 |
| 300586 | 1870 | 2014-01-07 11:15:40 |
| 419542 | 1338 | 2014-01-08 08:30:32 |
| 419006 | 1292 | 2014-01-14 14:04:30 |
| 400690 | 1227 | 2014-01-08 05:53:47 |
| 418956 | 1117 | 2014-01-08 06:08:15 |
| 403885 | 1032 | 2014-01-08 06:08:40 |
| 153253 | 1005 | 2014-01-14 14:02:36 |
+--------+--------+---------------------+
Just 13 computers (out of 1926 computers total) account for about half of all the errors.
What's usually happening in this situation is that something is configured wrong on the computer -- there's a driver missing, the directory permissions are bad, etc. -- and the task errors out immediately. The error gets reported back to the server very quickly, and new tasks are sent out. This can repeat at a very high frequency.
What I find interesting is that, presumably, these are computers just added to GFN by people who want to participate in the challenge. You would think they would notice that all of the tasks are failing. As you can see from the timestamps, some of them have been stopped or fixed, but others are still generating errors today.
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2165 ID: 1178 Credit: 8,777,295,508 RAC: 0
What I find interesting is that, presumably, these are computers just added to GFN by people who want to participate in the challenge. You would think they would notice that all of the tasks are failing. As you can see from the timestamps, some of them have been stopped or fixed, but others are still generating errors today.
I didn't quite make the list with one particular machine (mine has had about 350 errors), but I wanted to point out (to everyone in general... not to Mike, who knows this sort of thing) that at least some of the massively error-producing machines are indeed noticed. In my case there have been multiple problems (and a complicated diagnosis, since this particular box was successful in the winter solstice challenge). Several of these problems have been fixed (e.g., switching the Linux shell, a driver update, etc.), each fix exposing new errors to be addressed. As such, it looks like there will be no contribution from this machine for the challenge. Yet the massive number of errors reflects a directed effort at diagnosing problems rather than apathy.
And if you are wondering about my error-prone machine, it looks like the latest error may be due to not having the 32-bit driver installed (I actually forgot that GFN has only 32-bit GPU apps).
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
And if you are wondering about my error-prone machine, it looks like the latest error may be due to not having the 32-bit driver installed (I actually forgot that GFN has only 32-bit GPU apps).
For whatever reason, it seems significantly more difficult in general to get GPU apps (and, to some degree, even CPU apps) running on Linux than on Windows.
____________
My lucky number is 75898^524288+1
For whatever reason, it seems significantly more difficult in general to get GPU apps (and, to some degree, even CPU apps) running on Linux than on Windows.
At least some recent versions of 64-bit Ubuntu do not come with 32-bit libraries installed by default. There's a package called ia32-libs that needs to be installed "by hand" (sudo apt-get install ia32-libs). Also, the default video driver was not the "real" nvidia driver. It was a "free" version called "nouveau", which does not support CUDA, and so must be replaced.
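For reference, on Ubuntu releases of that era the fix looked roughly like this (a sketch; the package names are assumptions that vary by release -- ia32-libs was later removed in favor of multiarch):
$ sudo apt-get install ia32-libs        # 32-bit compatibility libraries, needed by the 32-bit GFN GPU apps
$ sudo apt-get install nvidia-current   # proprietary NVIDIA driver; replaces nouveau, which has no CUDA/OpenCL support
$ sudo reboot                           # the driver swap takes effect after a restart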
--Gary
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Year of the Horse - Pony Express
(As of 2014-01-15 13:19:43 UTC)
74454 tasks have been sent out. [CPU/GPU/anonymous_platform: 23467 (32%) / 50968 (68%) / 19 (0%)]
Of those tasks that have been sent out:
57337 (77%) came back with some kind of an error. [16504 (22%) / 40818 (55%) / 15 (0%)]
8904 (12%) have returned a successful result. [2178 (3%) / 6724 (9%) / 2 (0%)]
8213 (11%) are still in progress. [4785 (6%) / 3426 (5%) / 2 (0%)]
Of the tasks that have been returned successfully:
3644 (41%) are pending validation. [816 (9%) / 2826 (32%) / 2 (0%)]
4992 (56%) have been successfully validated. [1321 (15%) / 3671 (41%) / 0 (0%)]
51 (1%) were invalid. [5 (0%) / 46 (1%) / 0 (0%)]
128 (1%) are inconclusive. [22 (0%) / 106 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=342546. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 15.03% as much as it had prior to the challenge! (When task size is considered, that's an increase of 16.27%.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Year of the Horse - Pony Express
(As of 2014-01-16 12:23:47 UTC)
78346 tasks have been sent out. [CPU/GPU/anonymous_platform: 23969 (31%) / 54356 (69%) / 21 (0%)]
Of those tasks that have been sent out:
60040 (77%) came back with some kind of an error. [16887 (22%) / 43135 (55%) / 18 (0%)]
10236 (13%) have returned a successful result. [2556 (3%) / 7677 (10%) / 3 (0%)]
8070 (10%) are still in progress. [4526 (6%) / 3544 (5%) / 0 (0%)]
Of the tasks that have been returned successfully:
3837 (37%) are pending validation. [866 (8%) / 2969 (29%) / 2 (0%)]
6118 (60%) have been successfully validated. [1643 (16%) / 4474 (44%) / 1 (0%)]
64 (1%) were invalid. [8 (0%) / 56 (1%) / 0 (0%)]
137 (1%) are inconclusive. [24 (0%) / 113 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=345858. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 16.14% as much as it had prior to the challenge! (When task size is considered, that's an increase of 17.45%.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
Challenge: Year of the Horse - Pony Express
(As of 2014-01-17 13:30:23 UTC)
83191 tasks have been sent out. [CPU/GPU/anonymous_platform: 24424 (29%) / 58743 (71%) / 24 (0%)]
Of those tasks that have been sent out:
63581 (76%) came back with some kind of an error. [17335 (21%) / 46228 (56%) / 18 (0%)]
11948 (14%) have returned a successful result. [3069 (4%) / 8874 (11%) / 5 (0%)]
7662 (9%) are still in progress. [4020 (5%) / 3641 (4%) / 1 (0%)]
Of the tasks that have been returned successfully:
4014 (34%) are pending validation. [897 (8%) / 3114 (26%) / 3 (0%)]
7606 (64%) have been successfully validated. [2113 (18%) / 5491 (46%) / 2 (0%)]
72 (1%) were invalid. [8 (0%) / 64 (1%) / 0 (0%)]
167 (1%) are inconclusive. [34 (0%) / 133 (1%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is b=348666. The leading edge was at b=297788 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 17.09% as much as it had prior to the challenge! (When task size is considered, that's an increase of 18.52%.)
____________
My lucky number is 75898^524288+1
With about a day left to go, it's time for the standard end-of-challenge request :)
At the Conclusion of the Challenge
We would prefer that users who are "moving on" finish the tasks they have already downloaded; if that's not possible, please ABORT the WU's instead of DETACHING, RESETTING, or PAUSING.
ABORTING WU's allows them to be recycled immediately, which makes for a much faster "clean up" at the end of a challenge. DETACHING, RESETTING, and PAUSING WU's causes them to remain in limbo until they EXPIRE, so we must wait for them to expire before they can be sent out again to be completed.
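If you prefer the command line, aborting can also be done with boinccmd (a sketch assuming boinccmd is installed; the task name shown is hypothetical):
$ boinccmd --get_tasks | grep name:   # list the task names on this host
$ boinccmd --task http://www.primegrid.com/ genefer_524288_372855_1 abort   # abort one task by name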
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
genefer_1048576_372855_1_0, reported at approximately January 17 01:46:29 UTC (if my conversion from my Event Log's local time is correct), does not appear in the list of my reported tasks. The other two Genefer short tasks, completed just a few minutes later, are reported correctly.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 237,712,514 RAC: 0
genefer_1048576_372855_1_0 reported at approximately January 17 01:46:29 UTC (if my conversion from my Event Log's local time is correct) does not appear in the list of my reported tasks. The other two Genefer short tasks completed just a few minutes later are reported correctly.
Everything looks fine to me. The server shows that task as having completed at 17 Jan 2014 1:46:40 UTC.
____________
My lucky number is 75898^524288+1