1.7.1 gfx1010 hang up #64

Closed
mfu-mcosys opened this issue Jan 24, 2020 · 7 comments
Labels
bug Something isn't working

mfu-mcosys commented Jan 24, 2020

Something seems to go wrong with an AMD RX 5700.
After running without problems for a few minutes, a random gfx1010 GPU hangs while mining ETH:

2020-01-24 21:14:16: [Statistics] Ethereum last 10 min - GPU 0: 50.429 MH/s, GPU 1: 50.430 MH/s, GPU 2: 48.223 MH/s, GPU 3: 48.233 MH/s, GPU 4: 48.219 MH/s, GPU 5: 48.203 MH/s, GPU 6: 50.430 MH/s, GPU 7: 48.241 MH/s, GPU 8: 48.221 MH/s, GPU 9: 35.868 MH/s, GPU 10: 32.575 MH/s, GPU 11: 50.430 MH/s. Total: 559.500 MH/s.
2020-01-24 21:14:34: <warning> GPU 3 hung up
2020-01-24 21:14:34: <info> Miner has not been restarted before
2020-01-24 21:14:34: <info> Self-restarting miner process.

dmesg:
[ 1507.095911] gmc_v10_0_process_interrupt: 10 callbacks suppressed
[ 1507.095923] amdgpu 0000:0b:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32782, for process nanominer pid 2288 thread nanominer pid 2424)
[ 1507.096104] amdgpu 0000:0b:00.0: in page starting at address 0x00000006c899b000 from client 27
[ 1507.096205] amdgpu 0000:0b:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101030
[ 1507.096287] amdgpu 0000:0b:00.0: MORE_FAULTS: 0x0
[ 1507.096346] amdgpu 0000:0b:00.0: WALKER_ERROR: 0x0
[ 1507.096406] amdgpu 0000:0b:00.0: PERMISSION_FAULTS: 0x3
[ 1507.096470] amdgpu 0000:0b:00.0: MAPPING_ERROR: 0x0
[ 1507.096531] amdgpu 0000:0b:00.0: RW: 0x0
I don't know if it's a hardware problem, but ethminer runs fine on the same rig (1 hour+).

memTweak=0 was added to config.ini.

Grumpy-Dwarf added the "bug" label on Jan 24, 2020

Grumpy-Dwarf (Collaborator)

Thanks for reporting. We did not test the RX 5700 on Linux much; the last time I tried, these cards did not work on Linux at all. Will try with updated drivers.

mfu-mcosys (Author) commented Jan 26, 2020

Something new: it works as long as CUDA and OpenCL are not mixed. If I start nanominer with only the NVIDIA (CUDA) cards, it works; if I start nanominer with only the AMD (OpenCL) cards, it also works (tested on Win10).
The problem seems to be the CUDA nanominer instance: when I run two instances of nanominer, only the CUDA instance restarts after a few minutes of mining.

PhoenixMiner has been running for 2 days without problems on Linux with this mix of cards:

GPU1: GeForce RTX 2080 Ti (pcie 1), CUDA cap. 7.5, 11 GB VRAM, 68 CUs
GPU2: GeForce RTX 2080 Ti (pcie 2), CUDA cap. 7.5, 11 GB VRAM, 68 CUs
GPU3: AMD Radeon RX 5700 (pcie 5), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU4: AMD Radeon RX 5700 (pcie 11), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU5: AMD Radeon RX 5700 (pcie 14), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU6: AMD Radeon RX 5700 (pcie 17), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU7: AMD Radeon VII (pcie 20), OpenCL 2.0, 16 GB VRAM, 60 CUs
GPU8: AMD Radeon RX 5700 (pcie 23), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU9: AMD Radeon RX 5700 (pcie 26), OpenCL 2.0, 8 GB VRAM, 36 CUs
GPU10: GeForce RTX 2060 SUPER (pcie 27), CUDA cap. 7.5, 7.8 GB VRAM, 34 CUs
GPU11: GeForce RTX 2080 Ti (pcie 29), CUDA cap. 7.5, 11 GB VRAM, 68 CUs
GPU12: GeForce RTX 2080 Ti (pcie 30), CUDA cap. 7.5, 11 GB VRAM, 68 CUs

So I guess it's not a hardware problem. :)

martinberlin

Any news on this? Did anyone get it working?
I get an error message saying the miner restarts and that I have to increase netTweak.

Grumpy-Dwarf (Collaborator)

This was reproduced on the amdgpu-pro 20.40 Linux driver. Will now recheck on the latest 20.45 as well.

Grumpy-Dwarf (Collaborator)

On 20.45 it does not hang; instead it crashes inside the driver, with the same messages in the kernel log, and terminates.

Grumpy-Dwarf (Collaborator)

I'm negatively impressed with the quality of 20.45. It's the first time I've ever seen my 5700 XT at 98 degrees with default fan settings.

Grumpy-Dwarf (Collaborator)

Fixed in nanominer 3.1.4
