PrimeGrid
Message boards : Problems and Help : ap27_2.6_opencl got stuck

Author Message
Profile davidak
Joined: 11 Feb 16
Posts: 3
ID: 438932
Credit: 7,371,259
RAC: 0
PPS LLR Silver: Earned 100,000 credits (364,911)SGS LLR Bronze: Earned 10,000 credits (53,324)PPS Sieve Turquoise: Earned 5,000,000 credits (6,064,429)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (25,804)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,866)AP 26/27 Bronze: Earned 10,000 credits (64,688)GFN Gold: Earned 500,000 credits (776,237)
Message 140352 - Posted: 20 May 2020 | 15:34:11 UTC
Last modified: 20 May 2020 | 15:49:33 UTC

I had an ap27_2.6_opencl WU running that was already past its estimated remaining time. The remaining time was displayed as ---, while the elapsed time kept counting up. It was also stuck at 27% and the progress did not change.

So I followed the advice from another WU thread here and suspended and resumed it. After that it stayed in the status "Waiting to run" forever.

I restarted BOINC, but it could not stop that process. Now the Linux kernel seems to have issues because of that process.

What's going on here? Can I do anything to complete this WU?

May 20 17:11:53 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 122 seconds.
May 20 17:11:53 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 20 17:11:53 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 17:11:53 gaming kernel: ap27_2.6_opencl D 0 1605 1482 0x80004002
May 20 17:11:53 gaming kernel: Call Trace:
May 20 17:11:53 gaming kernel: ? __schedule+0x250/0x6d0
May 20 17:11:53 gaming kernel: ? schedule+0x4a/0xb0
May 20 17:11:53 gaming kernel: ? schedule_timeout+0x20f/0x300
May 20 17:11:53 gaming kernel: ? ttm_bo_move_to_lru_tail+0x28/0xc0 [ttm]
May 20 17:11:53 gaming kernel: ? ttm_eu_backoff_reservation+0x43/0x60 [ttm]
May 20 17:11:53 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 20 17:11:53 gaming kernel: ? dma_fence_release+0x140/0x140
May 20 17:11:53 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 20 17:11:53 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 20 17:11:53 gaming kernel: ? idr_destroy+0x71/0xb0
May 20 17:11:53 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 20 17:11:53 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 20 17:11:53 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 20 17:11:53 gaming kernel: ? __fput+0xb9/0x250
May 20 17:11:53 gaming kernel: ? task_work_run+0x8a/0xb0
May 20 17:11:53 gaming kernel: ? do_exit+0x360/0xaa0
May 20 17:11:53 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 20 17:11:53 gaming kernel: ? do_group_exit+0x3a/0xa0
May 20 17:11:53 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 20 17:11:53 gaming kernel: ? do_syscall_64+0x4e/0x160
May 20 17:11:53 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 20 17:12:34 gaming sudo[25188]: davidak : TTY=pts/1 ; PWD=/home/davidak ; USER=root ; COMMAND=/run/wrappers/bin/su
May 20 17:12:34 gaming sudo[25188]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 20 17:12:34 gaming su[25189]: Successful su for root by root
May 20 17:12:34 gaming su[25189]: pam_unix(su:session): session opened for user root by (uid=0)
May 20 17:12:41 gaming systemd[1]: Stopping BOINC Client...
May 20 17:13:56 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 245 seconds.
May 20 17:13:56 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 20 17:13:56 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 17:13:56 gaming kernel: ap27_2.6_opencl D 0 1605 1 0x80004002
May 20 17:13:56 gaming kernel: Call Trace:
May 20 17:13:56 gaming kernel: ? __schedule+0x250/0x6d0
May 20 17:13:56 gaming kernel: ? schedule+0x4a/0xb0
May 20 17:13:56 gaming kernel: ? schedule_timeout+0x20f/0x300
May 20 17:13:56 gaming kernel: ? ttm_bo_move_to_lru_tail+0x28/0xc0 [ttm]
May 20 17:13:56 gaming kernel: ? ttm_eu_backoff_reservation+0x43/0x60 [ttm]
May 20 17:13:56 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 20 17:13:56 gaming kernel: ? dma_fence_release+0x140/0x140
May 20 17:13:56 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 20 17:13:56 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 20 17:13:56 gaming kernel: ? idr_destroy+0x71/0xb0
May 20 17:13:56 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 20 17:13:56 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 20 17:13:56 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 20 17:13:56 gaming kernel: ? __fput+0xb9/0x250
May 20 17:13:56 gaming kernel: ? task_work_run+0x8a/0xb0
May 20 17:13:56 gaming kernel: ? do_exit+0x360/0xaa0
May 20 17:13:56 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 20 17:13:56 gaming kernel: ? do_group_exit+0x3a/0xa0
May 20 17:13:56 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 20 17:13:56 gaming kernel: ? do_syscall_64+0x4e/0x160
May 20 17:13:56 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 20 17:14:11 gaming systemd[1]: boinc.service: State 'stop-final-sigterm' timed out. Killing.
May 20 17:14:11 gaming systemd[1]: boinc.service: Killing process 1605 (ap27_2.6_opencl) with signal SIGKILL.
May 20 17:14:11 gaming systemd[1]: boinc.service: Failed with result 'timeout'.
May 20 17:14:11 gaming systemd[1]: Stopped BOINC Client.
May 20 17:14:11 gaming systemd[1]: boinc.service: Consumed 1month 3w 6d 1h 18min 20.255s CPU time, received 240.1M IP traffic, sent 7.4G IP traffic.
May 20 17:14:11 gaming systemd[1]: boinc.service: Found left-over process 1605 (ap27_2.6_opencl) in control group while starting unit. Ignoring.
May 20 17:14:11 gaming systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 20 17:14:11 gaming systemd[1]: Started BOINC Client.

I'm not sure if the PCIe connectors of the mainboard are OK. I bought it used and tested it with 4 different GPUs, and all of them had dropouts for 1-3 seconds. That would probably lead to calculation errors in BOINC. But I didn't have that issue with shorter WUs.

Update: I rebooted the system (well, the reboot didn't work, so I pulled the power plug) and the WU runs again. It's at 30% now and shows 30 minutes runtime. It was 1h30m before! It seems to have crashed the whole system.

Does it make sense to run it for a day, or for however long it takes, or is the WU invalid anyway?

Profile davidak
Message 140373 - Posted: 21 May 2020 | 13:00:35 UTC
Last modified: 21 May 2020 | 13:14:24 UTC

I can't edit my post again.

Update: The WU got stuck at 36% again... it seems to be cursed.

The same computer with the same GPU completed 14 validated WUs yesterday and had no errors. So I'm not sure if it is a hardware problem or just this one WU.

https://www.primegrid.com/workunit.php?wuid=660232884
https://www.primegrid.com/result.php?resultid=1099648571

Suspending and resuming doesn't work.

Stopping BOINC and stopping that process breaks the kernel!

Can't even stop the process with

kill -9 32086


The only way to stop the process is to pull the power plug!
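
For anyone hitting the same thing: the kernel lines in this thread show the task in state D (uninterruptible sleep), and a task in that state does not act on signals, SIGKILL included, until the kernel wait it is blocked in returns. That would explain why kill -9 appears to do nothing. A minimal Python sketch for checking a process's state on Linux, assuming the /proc/<pid>/stat field layout described in proc(5); the function names are hypothetical and not part of BOINC or PrimeGrid:

```python
def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read the current state letter of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def is_uninterruptible(pid: int) -> bool:
    """True if the process is in state D and will not react to signals yet."""
    return process_state(pid) == "D"
```

Calling is_uninterruptible(32086) on the affected machine would be one way to confirm the state before deciding whether a reboot is the only option left.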


May 21 14:57:20 gaming systemd[1]: Stopping BOINC Client...
May 21 14:58:50 gaming systemd[1]: boinc.service: State 'stop-final-sigterm' timed out. Killing.
May 21 14:58:50 gaming systemd[1]: boinc.service: Killing process 32086 (ap27_2.6_opencl) with signal SIGKILL.
May 21 14:58:50 gaming systemd[1]: boinc.service: Failed with result 'timeout'.
May 21 14:58:50 gaming systemd[1]: Stopped BOINC Client.
May 21 14:58:50 gaming systemd[1]: boinc.service: Consumed 1month 2d 20h 8min 29.690s CPU time, received 137.6M IP traffic, sent 7.5G IP traffic.
May 21 14:58:50 gaming systemd[1]: boinc.service: Found left-over process 32086 (ap27_2.6_opencl) in control group while starting unit. Ignoring.
May 21 14:58:50 gaming systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 14:58:50 gaming systemd[1]: Started BOINC Client.
May 21 14:58:57 gaming kernel: INFO: task ap27_2.6_opencl:32086 blocked for more than 122 seconds.
May 21 14:58:57 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 21 14:58:57 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 14:58:57 gaming kernel: ap27_2.6_opencl D 0 32086 1 0x80004002
May 21 14:58:57 gaming kernel: Call Trace:
May 21 14:58:57 gaming kernel: ? __schedule+0x250/0x6d0
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? schedule+0x4a/0xb0
May 21 14:58:57 gaming kernel: ? schedule_timeout+0x20f/0x300
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to+0x10d/0x3c0
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 21 14:58:57 gaming kernel: ? dma_fence_release+0x140/0x140
May 21 14:58:57 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 21 14:58:57 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 21 14:58:57 gaming kernel: ? idr_destroy+0x71/0xb0
May 21 14:58:57 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 21 14:58:57 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 21 14:58:57 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 21 14:58:57 gaming kernel: ? __fput+0xb9/0x250
May 21 14:58:57 gaming kernel: ? task_work_run+0x8a/0xb0
May 21 14:58:57 gaming kernel: ? do_exit+0x360/0xaa0
May 21 14:58:57 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 21 14:58:57 gaming kernel: ? do_group_exit+0x3a/0xa0
May 21 14:58:57 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 21 14:58:57 gaming kernel: ? do_syscall_64+0x4e/0x160
May 21 14:58:57 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

stream
Volunteer moderator
Project administrator
Volunteer developer
Volunteer tester
Joined: 1 Mar 14
Posts: 834
ID: 301928
Credit: 488,476,972
RAC: 0
Discovered 1 mega primeFound 1 prime in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de PrimesFound 1 prime in the 2020 Tour de Primes321 LLR Jade: Earned 10,000,000 credits (10,011,570)Cullen LLR Jade: Earned 10,000,000 credits (10,009,374)ESP LLR Jade: Earned 10,000,000 credits (10,009,221)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (10,012,217)PPS LLR Jade: Earned 10,000,000 credits (11,055,307)PSP LLR Jade: Earned 10,000,000 credits (10,044,081)SoB LLR Jade: Earned 10,000,000 credits (10,064,750)SR5 LLR Jade: Earned 10,000,000 credits (10,002,051)SGS LLR Jade: Earned 10,000,000 credits (10,001,215)TRP LLR Jade: Earned 10,000,000 credits (10,002,411)Woodall LLR Jade: Earned 10,000,000 credits (10,013,921)321 Sieve Sapphire: Earned 20,000,000 credits (20,004,228)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (10,000)Generalized Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,047,667)PPS Sieve Sapphire: Earned 20,000,000 credits (20,866,490)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,043,271)TRP Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,015,177)AP 26/27 Sapphire: Earned 20,000,000 credits (20,045,194)GFN Emerald: Earned 50,000,000 credits (53,545,385)PSA Double Silver: Earned 200,000,000 credits (200,301,443)
Message 140376 - Posted: 21 May 2020 | 14:05:42 UTC

The trace suggests that the AMD GPU driver (amdgpu) is the cause of the problem.

The usual answer is to try a newer driver or, alternatively, an older one, or to report the problem to the driver developers.
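
One way to read this from the trace: frames that belong to a loadable kernel module carry the module name in square brackets, and in the traces above the exit path runs through [drm] and [amdgpu] before ending in a dma_fence wait. The frames are all marked with "?", which as far as I know means the unwinder does not consider them fully reliable, so treat this as a hint rather than proof. A minimal Python sketch that counts module names in a pasted call trace, assuming journal-style lines like the ones in this thread; the function name is hypothetical:

```python
import re
from collections import Counter

# Matches e.g. "? amdgpu_vm_fini+0xe7/0x470 [amdgpu]" at the end of a log line.
# Group 1 is the symbol, group 2 the optional module name in brackets.
FRAME_RE = re.compile(
    r"\?\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def modules_in_trace(lines):
    """Count how many call-trace frames belong to each kernel module."""
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if match and match.group(2):
            counts[match.group(2)] += 1
    return counts
```

Feeding it the lines of one trace from this thread and looking at which modules show up is a quick way to decide which driver's bug tracker to report to.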

Profile davidak
Message 140377 - Posted: 21 May 2020 | 14:36:21 UTC - in response to Message 140376.
Last modified: 21 May 2020 | 15:29:29 UTC

OK, thank you.

After starting the WU a third time, it actually finished and is valid. No AP found.

I created a report at http://amd.com/report but it seems they only support games on Windows using the "AMD Radeon Software Adrenalin Edition 2020 Driver".

Note: Only issues affecting the latest driver releases will be investigated.
The latest AMD Radeon Software Adrenalin Edition Drivers can be downloaded from the following link


Support for Linux is always bad or nonexistent. Why do we give them money :(

Copyright © 2005 - 2023 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 0.00, 0.01, 0.01
Generated 23 Sep 2023 | 21:25:11 UTC