PrimeGrid
1) Message boards : Problems and Help : ap27_2.6_opencl got stuck (Message 140377)
Posted 1109 days ago by davidak
OK, thank you.

After starting the WU a third time, it actually finished and is valid. No AP found.

I created a report at http://amd.com/report, but it seems they only support games on Windows using the "AMD Radeon Software Adrenalin Edition 2020 Driver".

Note: Only issues affecting the latest driver releases will be investigated.
The latest AMD Radeon Software Adrenalin Edition Drivers can be downloaded from the following link


Support for Linux is always bad or nonexistent. Why do we give them money? :(
2) Message boards : Problems and Help : ap27_2.6_opencl got stuck (Message 140373)
Posted 1110 days ago by davidak
I can't edit my post again.

Update: The WU got stuck at 36% again... it seems to be cursed.

The same computer with the same GPU completed 14 validated WUs yesterday and had no errors. So I'm not sure whether it is a hardware problem or just this one WU.

https://www.primegrid.com/workunit.php?wuid=660232884
https://www.primegrid.com/result.php?resultid=1099648571

Suspending and resuming doesn't work.

Stopping BOINC and stopping that process breaks the kernel!

Can't even stop the process with
kill -9 32086


The only way to stop the process is to pull the power plug!
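
A process that does not react to kill -9 is usually in uninterruptible sleep (state D), which matches the "ap27_2.6_opencl D" line in the kernel log below: the signal stays pending until the kernel code the task is blocked in returns. As a minimal sketch, the state can be read from the third field of /proc/<pid>/stat. The helper names here are hypothetical, and the sample stat line in the usage note is made up for illustration.

```python
def parse_proc_stat(stat_line: str) -> dict:
    """Extract pid, command name, state and parent pid from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    fields = stat_line[rparen + 1:].split()
    return {
        "pid": int(stat_line[:lparen].strip()),
        "comm": stat_line[lparen + 1:rparen],
        "state": fields[0],   # e.g. R, S, D, Z
        "ppid": int(fields[1]),
    }


def is_uninterruptible(pid: int) -> bool:
    """Return True if the process is currently in state D (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())["state"] == "D"
```

For example, parse_proc_stat("32086 (ap27_2.6_opencl) D 1 32086 32086 0") reports state "D" and parent pid 1. If is_uninterruptible(32086) returns True, sending further signals is unlikely to help until the driver call completes.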


May 21 14:57:20 gaming systemd[1]: Stopping BOINC Client...
May 21 14:58:50 gaming systemd[1]: boinc.service: State 'stop-final-sigterm' timed out. Killing.
May 21 14:58:50 gaming systemd[1]: boinc.service: Killing process 32086 (ap27_2.6_opencl) with signal SIGKILL.
May 21 14:58:50 gaming systemd[1]: boinc.service: Failed with result 'timeout'.
May 21 14:58:50 gaming systemd[1]: Stopped BOINC Client.
May 21 14:58:50 gaming systemd[1]: boinc.service: Consumed 1month 2d 20h 8min 29.690s CPU time, received 137.6M IP traffic, sent 7.5G IP traffic.
May 21 14:58:50 gaming systemd[1]: boinc.service: Found left-over process 32086 (ap27_2.6_opencl) in control group while starting unit. Ignoring.
May 21 14:58:50 gaming systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 21 14:58:50 gaming systemd[1]: Started BOINC Client.
May 21 14:58:57 gaming kernel: INFO: task ap27_2.6_opencl:32086 blocked for more than 122 seconds.
May 21 14:58:57 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 21 14:58:57 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 14:58:57 gaming kernel: ap27_2.6_opencl D 0 32086 1 0x80004002
May 21 14:58:57 gaming kernel: Call Trace:
May 21 14:58:57 gaming kernel: ? __schedule+0x250/0x6d0
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? schedule+0x4a/0xb0
May 21 14:58:57 gaming kernel: ? schedule_timeout+0x20f/0x300
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x40/0x70
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? __switch_to+0x10d/0x3c0
May 21 14:58:57 gaming kernel: ? __switch_to_asm+0x34/0x70
May 21 14:58:57 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 21 14:58:57 gaming kernel: ? dma_fence_release+0x140/0x140
May 21 14:58:57 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 21 14:58:57 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 21 14:58:57 gaming kernel: ? idr_destroy+0x71/0xb0
May 21 14:58:57 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 21 14:58:57 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 21 14:58:57 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 21 14:58:57 gaming kernel: ? __fput+0xb9/0x250
May 21 14:58:57 gaming kernel: ? task_work_run+0x8a/0xb0
May 21 14:58:57 gaming kernel: ? do_exit+0x360/0xaa0
May 21 14:58:57 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 21 14:58:57 gaming kernel: ? do_group_exit+0x3a/0xa0
May 21 14:58:57 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 21 14:58:57 gaming kernel: ? do_syscall_64+0x4e/0x160
May 21 14:58:57 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
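
Read from the bottom up, the trace above shows the task exiting (do_exit), closing its DRM file (drm_release), and then waiting on a DMA fence inside amdgpu_vm_fini. The following sketch pulls out only the frames that belong to a kernel module, which makes that path easier to see. The regular expression and the function name are assumptions written for the log format shown here, not part of any kernel tooling, and the "?" prefix means the unwinder considers a frame unverified, so the result is a hint rather than proof.

```python
import re

# Matches e.g. "kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]".
# The trailing "[module]" part is optional; core-kernel frames have none.
FRAME = re.compile(
    r"kernel:\s+\?\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(trace_lines):
    """Return (module, function) pairs for frames that name a kernel module."""
    frames = []
    for line in trace_lines:
        m = FRAME.search(line)
        if m and m["module"]:
            frames.append((m["module"], m["func"]))
    return frames


# A few lines copied from the trace above.
SAMPLE = [
    "May 21 14:58:57 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0",
    "May 21 14:58:57 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100",
    "May 21 14:58:57 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]",
    "May 21 14:58:57 gaming kernel: ? idr_destroy+0x71/0xb0",
    "May 21 14:58:57 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]",
    "May 21 14:58:57 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]",
    "May 21 14:58:57 gaming kernel: ? drm_release+0x4b/0x80 [drm]",
]

for module, func in module_frames(SAMPLE):
    print(f"{module}: {func}")
```

On the sample lines this lists two amdgpu frames and two drm frames, which fits the suspicion that the hang sits in the GPU driver's cleanup path rather than in BOINC itself.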
3) Message boards : Problems and Help : ap27_2.6_opencl got stuck (Message 140352)
Posted 1110 days ago by davidak
I had an ap27_2.6_opencl WU running that was already past its estimated remaining time, which was displayed as ---, while the running time kept counting up. It was also stuck at 27% and the progress did not change.

So I followed the advice from another WU issue here and suspended and resumed it. Then it had the status "Waiting to run" forever.

I restarted BOINC, but it could not stop that process. Now the Linux kernel seems to have issues because of that process.

What's going on here? Can I do anything to complete this WU?

May 20 17:11:53 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 122 seconds.
May 20 17:11:53 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 20 17:11:53 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 17:11:53 gaming kernel: ap27_2.6_opencl D 0 1605 1482 0x80004002
May 20 17:11:53 gaming kernel: Call Trace:
May 20 17:11:53 gaming kernel: ? __schedule+0x250/0x6d0
May 20 17:11:53 gaming kernel: ? schedule+0x4a/0xb0
May 20 17:11:53 gaming kernel: ? schedule_timeout+0x20f/0x300
May 20 17:11:53 gaming kernel: ? ttm_bo_move_to_lru_tail+0x28/0xc0 [ttm]
May 20 17:11:53 gaming kernel: ? ttm_eu_backoff_reservation+0x43/0x60 [ttm]
May 20 17:11:53 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 20 17:11:53 gaming kernel: ? dma_fence_release+0x140/0x140
May 20 17:11:53 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 20 17:11:53 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 20 17:11:53 gaming kernel: ? idr_destroy+0x71/0xb0
May 20 17:11:53 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 20 17:11:53 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 20 17:11:53 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 20 17:11:53 gaming kernel: ? __fput+0xb9/0x250
May 20 17:11:53 gaming kernel: ? task_work_run+0x8a/0xb0
May 20 17:11:53 gaming kernel: ? do_exit+0x360/0xaa0
May 20 17:11:53 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 20 17:11:53 gaming kernel: ? do_group_exit+0x3a/0xa0
May 20 17:11:53 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 20 17:11:53 gaming kernel: ? do_syscall_64+0x4e/0x160
May 20 17:11:53 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 20 17:12:34 gaming sudo[25188]: davidak : TTY=pts/1 ; PWD=/home/davidak ; USER=root ; COMMAND=/run/wrappers/bin/su
May 20 17:12:34 gaming sudo[25188]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 20 17:12:34 gaming su[25189]: Successful su for root by root
May 20 17:12:34 gaming su[25189]: pam_unix(su:session): session opened for user root by (uid=0)
May 20 17:12:41 gaming systemd[1]: Stopping BOINC Client...
May 20 17:13:56 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 245 seconds.
May 20 17:13:56 gaming kernel: Not tainted 5.6.13 #1-NixOS
May 20 17:13:56 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 17:13:56 gaming kernel: ap27_2.6_opencl D 0 1605 1 0x80004002
May 20 17:13:56 gaming kernel: Call Trace:
May 20 17:13:56 gaming kernel: ? __schedule+0x250/0x6d0
May 20 17:13:56 gaming kernel: ? schedule+0x4a/0xb0
May 20 17:13:56 gaming kernel: ? schedule_timeout+0x20f/0x300
May 20 17:13:56 gaming kernel: ? ttm_bo_move_to_lru_tail+0x28/0xc0 [ttm]
May 20 17:13:56 gaming kernel: ? ttm_eu_backoff_reservation+0x43/0x60 [ttm]
May 20 17:13:56 gaming kernel: ? dma_fence_default_wait+0x15f/0x1f0
May 20 17:13:56 gaming kernel: ? dma_fence_release+0x140/0x140
May 20 17:13:56 gaming kernel: ? dma_fence_wait_timeout+0xdd/0x100
May 20 17:13:56 gaming kernel: ? amdgpu_vm_fini+0xe7/0x470 [amdgpu]
May 20 17:13:56 gaming kernel: ? idr_destroy+0x71/0xb0
May 20 17:13:56 gaming kernel: ? amdgpu_driver_postclose_kms+0x15d/0x230 [amdgpu]
May 20 17:13:56 gaming kernel: ? drm_file_free.part.0+0x210/0x2c0 [drm]
May 20 17:13:56 gaming kernel: ? drm_release+0x4b/0x80 [drm]
May 20 17:13:56 gaming kernel: ? __fput+0xb9/0x250
May 20 17:13:56 gaming kernel: ? task_work_run+0x8a/0xb0
May 20 17:13:56 gaming kernel: ? do_exit+0x360/0xaa0
May 20 17:13:56 gaming kernel: ? handle_mm_fault+0xc4/0x1f0
May 20 17:13:56 gaming kernel: ? do_group_exit+0x3a/0xa0
May 20 17:13:56 gaming kernel: ? __x64_sys_exit_group+0x14/0x20
May 20 17:13:56 gaming kernel: ? do_syscall_64+0x4e/0x160
May 20 17:13:56 gaming kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 20 17:14:11 gaming systemd[1]: boinc.service: State 'stop-final-sigterm' timed out. Killing.
May 20 17:14:11 gaming systemd[1]: boinc.service: Killing process 1605 (ap27_2.6_opencl) with signal SIGKILL.
May 20 17:14:11 gaming systemd[1]: boinc.service: Failed with result 'timeout'.
May 20 17:14:11 gaming systemd[1]: Stopped BOINC Client.
May 20 17:14:11 gaming systemd[1]: boinc.service: Consumed 1month 3w 6d 1h 18min 20.255s CPU time, received 240.1M IP traffic, sent 7.4G IP traffic.
May 20 17:14:11 gaming systemd[1]: boinc.service: Found left-over process 1605 (ap27_2.6_opencl) in control group while starting unit. Ignoring.
May 20 17:14:11 gaming systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 20 17:14:11 gaming systemd[1]: Started BOINC Client.
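
The log above contains two hung-task reports for the same process, the second with a longer blocked time (245 seconds after 122). A small sketch for collecting such reports from a saved journal is shown below. The pattern is an assumption based only on the message format visible in this log, and the function name is hypothetical.

```python
import re

# Matches the kernel's hung-task message as it appears in the log above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for every hung-task report in the text."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            reports.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return reports


# Three lines copied from the log above.
SAMPLE_LOG = """\
May 20 17:11:53 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 122 seconds.
May 20 17:12:41 gaming systemd[1]: Stopping BOINC Client...
May 20 17:13:56 gaming kernel: INFO: task ap27_2.6_opencl:1605 blocked for more than 245 seconds.
"""

for comm, pid, secs in find_hung_tasks(SAMPLE_LOG):
    print(f"{comm} (pid {pid}) blocked for more than {secs} s")
```

The same function could be fed the output of journalctl -k saved to a file. A report that repeats with a growing time for the same pid suggests the task never left the blocked state between the two checks.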

I'm not sure whether the PCIe connectors of the mainboard are OK. I bought it used and tested it with 4 different GPUs, and all had dropouts for 1-3 seconds. That would probably lead to calculation errors in BOINC. But I didn't have that issue with shorter WUs.

Update: I rebooted the system (well, reboot didn't work, so I pulled the power plug) and the WU runs again. It's at 30% now and shows 30 minutes runtime. It was 1h30m before! It seems to have crashed the whole system.

Does it make sense to run it for a day, or for however long it will take, or is the WU invalid anyway?
Copyright © 2005 - 2023 Rytis Slatkevičius and PrimeGrid community.
Generated 5 Jun 2023 | 13:49:38 UTC