Packaging the ROCK (4.11 + AMD patches) kernel #4

Open
alexanderkjeldaas opened this issue Dec 5, 2017 · 32 comments

Comments

@alexanderkjeldaas

Issue description

The ROCm project needs the ROCK kernel for a good while longer. The current ROCm release is 1.6, and 1.7 is being released now. There will be at least one more release before the required functionality is upstreamed.

It seems like packaging the ROCK kernel is the right thing to do in the meantime.

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver

The last changes were in August, and comments on ROCm/ROCm#256 indicate that it's being rebased on top of 4.13.

@alexanderkjeldaas
Author

The kernel should be called 4.11.0-kfd, following the naming convention the Ubuntu packages use.

@corngood
Owner

corngood commented Dec 6, 2017

I had a look at the default kernel config in the ROCK kernel, and the only obvious relevant difference from Ubuntu's was DRM_AMD_DC=y, so I enabled that and built the kernel from their tree. If you want to try the kernel, you can cherry-pick 76cb381 and set
boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_rock.
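
For reference, a minimal sketch of how that setting might sit in a NixOS `configuration.nix`. It assumes commit 76cb381 has already been cherry-picked into the nixpkgs checkout the system is built from, since `pkgs.linux_rock` comes from that commit and is not an attribute of stock nixpkgs. The module layout around the one line is illustrative only.

```nix
# configuration.nix (fragment)
{ pkgs, ... }:
{
  # Assumption: pkgs.linux_rock exists only after cherry-picking 76cb381.
  # linuxPackagesFor builds the matching set of out-of-tree kernel packages.
  boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_rock;
}
```

After a rebuild and reboot, `uname -r` shows which kernel is actually running.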

It boots without any obvious errors, and DC seems to be working.

I'll get into the userspace stuff next.

@alexanderkjeldaas
Author

Cool.. I also tested this yesterday. Didn't think you'd get around to it.. :-) NixOS#32376

@alexanderkjeldaas
Author

Userspace seems to be a little bit more involved..

@corngood
Owner

corngood commented Dec 6, 2017

@alexanderkjeldaas So you tested this kernel already on NixOS? Do you have any other WIP stuff?

@alexanderkjeldaas
Author

I get some display issues. And wifi doesn't work. So it's not 100%.

@alexanderkjeldaas
Author

roct compiles now

@corngood
Owner

corngood commented Dec 6, 2017

Ok, we should definitely coordinate this, because you're already doing what I was going to do next.

Is there anything in particular you're stuck on, or some work that can be divided?

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 6, 2017

let's create a list of stuff:

I'm working on hcc, just finished hsa-runtime-amd.

Could you try ROCm-OpenCL-Driver?

@corngood
Owner

corngood commented Dec 6, 2017

Sure, I'll continue with it. I will have limited time before the weekend though.

@alexanderkjeldaas
Author

I'm also afraid that I don't have time to finish this.. :-)

@alexanderkjeldaas
Author

ROCm-Device-Libs done.
I can't edit the ROCm board.

@alexanderkjeldaas
Author

Continuing with hcc now. The current issue is finding the right HSA headers for hcc. I tried hsa-runtime-amd, but it looks like rocr-runtime is the one to use. The headers are still not found during some compilation steps.

@alexanderkjeldaas
Author

all in ak/rocm-changes in my tree.

@alexanderkjeldaas
Author

I'm working on ROCm-OpenCL-Runtime

@alexanderkjeldaas
Author

ROCm-OpenCL-Runtime is done.

@corngood
Owner

I'll have some free time over the next few days, so I was going to have a look at your changes. I notice you have amdgpu-pro changes in there. Are you using any of the pro stack when testing ROCm?

Have you got to the point where the CL runtime is actually working?

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 12, 2017 via email

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 12, 2017 via email

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 12, 2017 via email

@corngood
Owner

Like you get literally nothing? I'm getting:

➜  nixpkgs git:(ak/rocm-changes) sudo $(nix-build -A roc-smi)/bin/rocm-smi


====================    ROCm System Management Interface    ====================
================================================================================
 GPU  DID    Temp     AvgPwr   SCLK     MCLK     Fan      Perf    OverDrive  ECC
  0   67b1   45.0c    N/A      483Mhz   1300Mhz  24.71%   auto      0%       N/A
================================================================================
====================           End of ROCm SMI Log          ====================

With an R9 290 using the kernel from my branch.

@corngood
Owner

What's your /sys/module/amdgpu/parameters/dc? Mine is -1. Also, I didn't update any firmwares from master.
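
A hedged aside on that parameter: in the upstream amdgpu driver, `dc` accepts `-1` (auto, the default), `0` (disable) and `1` (enable), so a value of `-1` means the driver decides for itself. A sketch of setting it explicitly from NixOS configuration, assuming the ROCK tree keeps the upstream semantics:

```nix
# configuration.nix (fragment)
{ ... }:
{
  # Assumption: same meaning as upstream amdgpu, where 1 enables Display Core.
  boot.kernelParams = [ "amdgpu.dc=1" ];
}
```

The effective value can then be read back from /sys/module/amdgpu/parameters/dc after a reboot.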

@alexanderkjeldaas
Author

I think my problem is that I'm booting with nomodeset, and then amdgpu refuses to load.

@alexanderkjeldaas
Author

I'm unable to boot without nomodeset.

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 13, 2017

I'm going to try a few other motherboards and see what I get.

@corngood
Owner

I've only been testing in xorg. Do you need to run without it?

@alexanderkjeldaas
Author

rocm-smi now works with kernel 4.15.0-rc3

I've rebased my branch with nixpkgs head and added a few updates.

@corngood
Owner

That's good. Is opencl working? Anything I can help with?

@alexanderkjeldaas
Author

alexanderkjeldaas commented Dec 14, 2017 via email

corngood pushed a commit that referenced this issue Mar 2, 2024
Without the change `unnethack` startup crashes as:

    (gdb) bt
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
    #1  0x00007f734250c0e3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
    #2  0x00007f73424bce06 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
    #3  0x00007f73424a58f5 in __GI_abort () at abort.c:79
    #4  0x00007f73424a67a1 in __libc_message (fmt=fmt@entry=0x7f734261e2f8 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:150
    #5  0x00007f734259b1d9 in __GI___fortify_fail (msg=msg@entry=0x7f734261e2df "buffer overflow detected") at fortify_fail.c:24
    #6  0x00007f734259ab94 in __GI___chk_fail () at chk_fail.c:28
    #7  0x00000000005b2ac5 in strcpy (__src=0x7ffe68838b00 "Shall I pick a character's race, role, gender and alignment for you? [YNTQ] (y)",
        __dest=0x7ffe68838990 "\001") at /nix/store/B0S2LKF593R3585038WS4JD3LYLF2WDX-glibc-2.38-44-dev/include/bits/string_fortified.h:79
    #8  curses_break_str (str=str@entry=0x7ffe68838b00 "Shall I pick a character's race, role, gender and alignment for you? [YNTQ] (y)", width=width@entry=163,
        line_num=line_num@entry=1) at ../win/curses/cursmisc.c:275
    #9  0x00000000005b3f51 in curses_character_input_dialog (prompt=prompt@entry=0x7ffe68838cf0 "Shall I pick a character's race, role, gender and alignment for you?",
        choices=choices@entry=0x7ffe68838d70 "YNTQ", def=def@entry=121) at ../win/curses/cursdial.c:211
    #10 0x00000000005b9ca0 in curses_choose_character () at ../win/curses/cursinit.c:556
    #11 0x0000000000404eb1 in main (argc=<optimized out>, argv=<optimized out>) at ./../sys/unix/unixmain.c:309

which corresponds to `gcc` warning:

    ../win/curses/cursmisc.c: In function 'curses_break_str':
    ../win/curses/cursmisc.c:275:5: warning: '__builtin___strcpy_chk' writing one too many bytes into a region of a size that depends on 'strlen' [-Wstringop-overflow=]
      275 |     strcpy(substr, str);
          |     ^

I did not find a single small upstream change that fixes it. Let's
disable `fortify3` until next release.

Closes: NixOS#292113