Message boards :
Proth Prime Search :
Cuda error: getting factors found: unspecified launch failure
I would like to ask for advice.
My new GPU card (http://www.primegrid.com/forum_thread.php?id=2677) occasionally finishes a WU with an error.
I wonder whether anything can be done about it. I have checked, and the error message is always the same:
Cuda error: getting factors found: unspecified launch failure
So far the card has processed 126 WU: 104 completed successfully and 22 were discarded with an error (about 17%).
The card is running with default settings and is not overclocked.
I see no pattern in the run times: some WU failed after 20 sec, others were almost complete at about 2100 sec.
The card drivers were downloaded from the NVIDIA website: NVIDIA-Linux-x86_64-256.53.run
BOINC detects the card as:
03-Sep-2010 14:01:10 [---] Starting BOINC client version 6.10.56 for x86_64-pc-linux-gnu
03-Sep-2010 14:01:10 [---] Config: GUI RPC allowed from any host
03-Sep-2010 14:01:10 [---] log flags: file_xfer, sched_ops, task
03-Sep-2010 14:01:10 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
03-Sep-2010 14:01:10 [---] Data directory: /home/boinc/BOINC
03-Sep-2010 14:01:10 [---] Processor: 2 GenuineIntel Pentium(R) Dual-Core CPU E5200 @ 2.50GHz [Family 6 Model 23 Stepping 10]
03-Sep-2010 14:01:10 [---] Processor: 2.00 MB cache
03-Sep-2010 14:01:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
03-Sep-2010 14:01:10 [---] OS: Linux: 2.6.27-17-generic
03-Sep-2010 14:01:10 [---] Memory: 1.96 GB physical, 227.44 MB virtual
03-Sep-2010 14:01:10 [---] Disk: 3.74 GB total, 987.09 MB free
03-Sep-2010 14:01:10 [---] Local time is UTC +2 hours
03-Sep-2010 14:01:10 [---] NVIDIA GPU 0: GeForce GT 240 (driver version unknown, CUDA version 3010, compute capability 1.2, 511MB, 280 GFLOPS peak)
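The GPU line in the log above packs several fields into one string. A minimal sketch of pulling them apart, assuming the line keeps the layout printed by this client version (6.10.56); `parse_gpu_line`, `cuda_version_tuple`, and the field names are hypothetical helpers, not part of BOINC:

```python
import re

# Pattern for the GPU detection line as it appears in the log above.
# Assumption: other BOINC client versions may format this line differently.
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<model>.+?) \("
    r"driver version (?P<driver>[^,]+), "
    r"CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<cc>\d+\.\d+), "
    r"(?P<mem_mb>\d+)MB, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Return the GPU fields as a dict, or None if the line does not match."""
    m = GPU_LINE.search(line)
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "model": m["model"],
        "driver": m["driver"],
        "cuda_version": int(m["cuda"]),
        "compute_capability": m["cc"],
        "memory_mb": int(m["mem_mb"]),
        "gflops_peak": int(m["gflops"]),
    }


def cuda_version_tuple(code):
    """CUDA encodes versions as 1000*major + 10*minor, so 3010 means 3.1."""
    major, rest = divmod(code, 1000)
    return major, rest // 10
```

For the line above this yields the model "GeForce GT 240", 511 MB of memory, and CUDA version code 3010, which decodes to 3.1. The driver field comes back as the literal string "unknown", exactly as the client logged it.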
What is the reason for "getting factors found: unspecified launch failure" error message?
Should I suspect a hardware or a software problem?
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13513 ID: 53948 Credit: 236,922,854 RAC: 0
                           
|
Cuda error: getting factors found: unspecified launch failure
03-Sep-2010 14:01:10 [---] Disk: 3.74 GB total, 987.09 MB free
What is the reason for "getting factors found: unspecified launch failure" error message?
Should I suspect a hardware or a software problem?
4GB hard drive? Is this a virtual machine?
If so, I suspect that may be the problem. As far as I was aware, you couldn't run CUDA apps in a VM.
You might want to try running CUDA on the host machine rather than in the VM. Since your computers are hidden, it makes it a little hard to diagnose the problem from afar.
____________
My lucky number is 75898^524288+1 | |
|
|
This is not a virtual machine.
I wanted to install the Dotsch/UX Linux distribution on a pendrive, but I found this drive in my garage and decided to try it.
It worked, as you can see... and now it is my "vintage" pendrive :-)
The hard drive is: MPC3043A 4.3GB U/ATA FUJITSU HARD DRIVE
The other PC components are: an E5200 @ 2.50GHz on a Gigabyte GA-P43-ES3G with 2GB RAM, powered by a 400W power supply.
I have checked the syslog/messages logs for anything suspicious, but found nothing.
This PC is also crunching CPU tasks for other BOINC projects, such as WEP-M+2 and WCG.
There are no problems with WU for those projects, so if this were a hardware issue (memory failure or CPU overheating), those projects should be affected too.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
It looks like "unspecified launch failure" is the equivalent of "segmentation fault", which I've been getting in the OpenCL testing. These errors are caused most often by a bug in the code, but could also be caused by hardware failure. On the other hand, since I haven't heard about many CUDA users having problems, I'm not sure what to think.
Have you tried Collatz yet on that card? It's another simple math app that would provide a good test of the hardware.
It would also be helpful if people who asked for help with their computers didn't hide their computers, or at least posted links to example results.
____________
| |
|
|
The computers on my account are now visible; the one with the problem is hostid=157777.
Is there any other way to test the card, besides running Collatz?
Perhaps some other application under Windows? I have nothing against Collatz; I am just asking about other possibilities.
I could also move the card to another host, for example hostid=147474.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
The fact that most of the WUs are passing makes me think hardware. I'm sure there are better GPU testing programs; I'm just not sure what they are.
____________
| |
|
|
I am not saying it is not a hardware problem, but after reviewing the results it looks like I am not the only one having problems. I checked the results that other computers returned for my tasks to reach quorum. Below are a few examples: the outcome of three attempts, with GPU details added in parentheses.
WU=129821754
Cuda error: getting factors found: unspecified launch failure
Insufficient available memory on GPU 0. (NVIDIA Quadro NVS 160M (255MB))
In progress
WU=129819839
Cuda error: getting factors found: unspecified launch failure
Insufficient available memory on GPU 0. (NVIDIA GeForce 9600 GT (499MB) driver: 25896)
Insufficient available memory on GPU 0. (NVIDIA Quadro NVS 160M (255MB))
WU=129819591
Cuda error: getting factors found: unspecified launch failure
Insufficient available memory on GPU 0. (NVIDIA GeForce 9600 GT (499MB) driver: 25896)
In progress >>without cuda<<
WU=129816514
Cuda error: getting factors found: unspecified launch failure
CreateProcess() failed - (NVIDIA GeForce GTS 250 (512MB) driver: 19107)
CreateProcess() failed - (0x36b1) >>without cuda23<<
WU=129806137
Cuda error: getting factors found: unspecified launch failure
Cuda error: cudaEventCreate: out of memory (NVIDIA GeForce 9500 GT (511MB) driver: 19062)
OK (NVIDIA GeForce 8800 GT (497MB) driver: 25896)
WU=129806440
Cuda error: getting factors found: unspecified launch failure
Insufficient available memory on GPU 0. ([2] NVIDIA GeForce 9600M GT (255MB))
In progress
WU=129690184
Cuda error: getting factors found: unspecified launch failure
Cuda error: getting device properties: CUDA driver version is insufficient for CUDA runtime version (NVIDIA GeForce 8800 GS (511MB))
Detected emulator! We can't use that! ( NVIDIA GeForce GTX 260 (895MB))
WU=129692478
Cuda error: getting factors found: unspecified launch failure
CreateProcess() failed - (0x36b1) (NVIDIA GeForce GTS 250 (512MB) driver: 19107)
OK (NVIDIA GeForce 8500 GT (255MB) driver: 25896)
I frequently see the message about insufficient available memory, including in other tasks that I have not pasted above. What is the minimum amount of memory required to run PPS CUDA? Does it vary by WU, or can the requirement grow beyond its initial value during the calculation? Maybe my card has the same problem, but the driver reports it with a different message?
As for my case, today I will move the card to another host running Windows, to rule out the drivers and the operating system.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
You're not the only one getting errors, but I'm not seeing anyone else getting your specific error.
Short lesson on known error codes:
Detected emulator! We can't use that! ( NVIDIA GeForce GTX 260 (895MB))
For some reason, when a CUDA API older than 3.1 is installed, BOINC thinks the emulator it installs is a real GPU! Well, it's not, and we can't use it. The app could be compiled to use it, but the emulator runs about 1/30 as fast as the CPU code. So just failing the WU seems like the best option.
WU=129819591
...
Insufficient available memory on GPU 0. (NVIDIA GeForce 9600 GT (499MB) driver: 25896)
This is usually the exact same error as the last one, just interpreted differently. v1.28 specifically calls out the emulator attempt, but it's only running on Linux right now. v1.27 reports this generic error code, after reporting "Detected compute capability: 9999.9999". When it's running on a real card, this error means something else used up the video RAM.
WU=129816514
...
CreateProcess() failed - (NVIDIA GeForce GTS 250 (512MB) driver: 19107)
CreateProcess() failed - (0x36b1) >>without cuda23<<
It looks like, in the first case, someone needs to reinstall the Microsoft VCredist 2005 C++ redistributable package. I found BioShock failing with the same error. I don't know what caused the other failure, but I'm pretty sure it wasn't my code.
Cuda error: getting device properties: CUDA driver version is insufficient for CUDA runtime version
Self-explanatory.
Cuda error: cudaEventCreate: out of memory (NVIDIA GeForce 9500 GT (511MB) driver: 19062)
OK, this one's a mystery. I suspect it's a case that some other process ate up all the video RAM while the program was running, but I'm not entirely sure.
Let me know if you find anyone else reporting your specific error. Especially if a single WU has multiple people reporting that error.
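The error lesson above amounts to a lookup from message text to likely cause. A minimal sketch of that lookup, assuming plain substring matching on the messages quoted in this thread; `KNOWN_ERRORS` and `explain` are hypothetical names, and the causes only paraphrase what is said in this thread, not any official list:

```python
# (substring, likely cause) pairs, paraphrased from the explanations in this thread.
# Order matters: the first matching substring wins.
KNOWN_ERRORS = [
    ("Detected emulator",
     "emulator: BOINC offered the CUDA emulator instead of a real GPU"),
    ("Insufficient available memory",
     "emulator or memory: usually the emulator case as reported by v1.27; "
     "on a real card, something else used up the video RAM"),
    ("CreateProcess() failed",
     "process start: the app could not be started on Windows; reinstalling "
     "the VC++ 2005 redistributable was suggested for one case"),
    ("CUDA driver version is insufficient",
     "driver: the installed driver is too old for the CUDA runtime"),
    ("cudaEventCreate: out of memory",
     "uncertain: video RAM exhaustion suspected; a later reply in this "
     "thread points to 190.xx/191.xx drivers"),
    ("unspecified launch failure",
     "crash: comparable to a segmentation fault, so a code bug or a "
     "hardware failure"),
]


def explain(message):
    """Return the likely cause for a result message, or 'unknown' if none matches."""
    for needle, cause in KNOWN_ERRORS:
        if needle in message:
            return cause
    return "unknown"
```

Calling `explain()` on the stderr text of a failed result gives a first guess only. As noted above, the same message can mean different things on an emulated and on a real card.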
____________
| |
|
|
Thank you very much for the explanation of the error codes! It may also help somebody in the future.
I will keep monitoring my card and keep looking for other people reporting the "Cuda error: getting factors found: unspecified launch failure" error.
P.S. So far 108 out of 142 WU have completed successfully, which puts the completion rate at 76%... :-(
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2165 ID: 1178 Credit: 8,777,295,508 RAC: 0
                                     
|
Cuda error: cudaEventCreate: out of memory (NVIDIA GeForce 9500 GT (511MB) driver: 19062)
OK, this one's a mystery. I suspect it's a case that some other process ate up all the video RAM while the program was running, but I'm not entirely sure.
This one is the driver. On multiple GPUs I had the same error with all 190.xx and 191.xx drivers. 195.xx and above solve the problem... if I recall correctly, some new memory stuff was introduced in the 195.xx driver series that slowed down some cards (on Collatz and with the AP26 app here) and also resulted in the larger memory reporting problems (i.e., 191.xx drivers reported 255MB out of 256MB on some cards/OSes, whereas 195.xx reported even less... typically 243MB).
____________
141941*2^4299438-1 is prime!
| |
|
|
After moving the GT 240 card from hostID:157777 (Ubuntu x64) to hostID:147474 (Windows XP x86) I was able to complete 46 out of 48 tasks successfully. That is about 96%, compared with about 75% on the previous computer. Moreover, on the new host I increased the shader speed from 1460MHz to 1650MHz, while lowering the memory speed from 4000MHz to 3400MHz. The two invalid WU were generated at an even higher shader speed, so I think that could be the cause. After those I lowered the speed to a stable 1650MHz, and all WU have passed since then.
I will keep it running there until I complete 100 WU, to have a reasonably representative sample.
So far it does not look like a faulty GPU, but there is still a long way to go to diagnose the root cause...
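Whether 46 out of 48 is really better than the earlier rate, or just luck with a small sample, can be checked with a one-sided Fisher exact test. A minimal sketch using only the standard library; `fisher_one_sided` is a hypothetical helper, and the Linux figure of 108 out of 142 is the last one reported earlier in this thread, which may not be the final count for that host:

```python
from math import comb


def fisher_one_sided(ok_a, n_a, ok_b, n_b):
    """One-sided Fisher exact test.

    Returns the probability of seeing at least ok_a successes in group A
    if both groups shared the same underlying success rate (hypergeometric null).
    """
    total_ok = ok_a + ok_b   # successes in both groups together
    total = n_a + n_b        # all tasks in both groups together
    denom = comb(total, n_a)
    p = 0.0
    for k in range(ok_a, min(n_a, total_ok) + 1):
        # k successes and n_a - k failures land in group A
        p += comb(total_ok, k) * comb(total - total_ok, n_a - k) / denom
    return p


# Group A: Windows host (46 of 48 OK); group B: Linux host (108 of 142 OK).
p_value = fisher_one_sided(46, 48, 108, 142)
print(f"one-sided p-value: {p_value:.4f}")
```

With these figures the p-value comes out well below 0.05, so the difference is unlikely to be pure chance. It does not say which change is responsible, since the operating system, the drivers, and the clock settings all differ between the two runs.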
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 915 ID: 3110 Credit: 183,164,814 RAC: 0
                        
|
Did it work before you lowered the memory speed?
____________
| |
|
|
Yes, it did. At factory settings I finished about 4 WU, then decided to increase the shader speed and did another few. Then I went above 1700MHz and got failures, but not two in a row. I moved back to 1650MHz, did another 10 or so, and finally decided to lower the memory speed (since it should not affect performance) to keep the temperature down. The temperature dropped from 64 to 63 Celsius, so not much really...
____________
| |
|
|
Is there any way to change the core, shader, and memory speeds independently under Linux? I would like to move the card back to the old host, keeping the same settings as under Windows. Zotac does not provide software for Linux similar to its Windows tool. I checked NVIDIA X Server Settings after activating Coolbits, but only the GPU and memory speeds can be changed there. I also checked nvclock, and it does not support the GT 240.
____________
| |
|