Create example compiling and running in raspberry pi with Mesa Vulkan drivers #131

Closed
axsaucedo opened this issue Feb 6, 2021 · 20 comments
Labels: c++, documentation, good first issue, help wanted, python

@axsaucedo (Member)

It seems there are some relatively recent advancements in Vulkan driver support for Raspberry Pis (https://www.raspberrypi.org/blog/vulkan-update-were-conformant/). This issue covers exploring and putting together an end-to-end example, similar to the Android example, that shows how to run Kompute on a Raspberry Pi using the Mesa driver, which enables Vulkan 1.0-conformant processing on the Raspberry Pi (https://gitlab.freedesktop.org/mesa/mesa).

@hpgmiskin (Contributor)

There is a blog post from June 2020 on how to get up and running with the V3DV driver, which I think would be a good starting point: v3dv: quick guide to build and run some demos – infapi00. Although it looks like some of the dependencies might be superfluous for running Vulkan Kompute.

Perhaps a more reusable example would come from being able to deploy Vulkan Kompute to a Raspberry Pi 4 using a service such as balena. Balena is used to deploy Docker containers to IoT devices; the underlying operating system is balenaOS, which is open source. A working Dockerfile that can be deployed with ease to an IoT device would make this library very attractive for those doing edge processing.

On the topic of using Docker, I noticed the Vulkan Kompute Dockerfile makes use of the nvidia/vulkan base image. This image is in turn a combination of glvnd/devel/Dockerfile and vulkan/Dockerfile. Perhaps the latter Dockerfile implementations, coupled with one of the Balena base images, would be a good starting point for getting this deployed as a container to a Raspberry Pi.

I realise this comment is a single perspective given the tools I am familiar with from work (the keen reader might have noticed Balena mentioned a couple of times). I do not work for Balena, but the company I work for makes extensive use of their services. Any feedback on the above thoughts would be most welcome.

@axsaucedo (Member, Author)

@hpgmiskin thank you for the comments! Sounds really interesting. For the example it may be best to start with the simplest possible implementation, which would allow one of the existing examples to run on a Raspberry Pi. Having a Docker image may be worth it as well, but ideally the script that sets up the dependencies would be kept separate (or in a Makefile). Ultimately you can see the dependencies required to run Kompute from scratch (which really are just libvulkan-dev and cmake) here: https://colab.research.google.com/drive/1l3hNSq2AcJ5j2E3YIw__jKy5n6M615GP?authuser=1#scrollTo=1BipBsO-fQRD

The base Docker image that may be the best one to take as an example would be the base runner - there are tons of dependencies installed and set up there that you wouldn't need (pretty much you only need the things outlined above, after setting up the drivers): https://github.com/EthicalML/vulkan-kompute/blob/master/docker-builders/KomputeBuilder.Dockerfile

I would suggest having an example like https://github.com/EthicalML/vulkan-kompute/tree/master/examples/logistic_regression, where the location is relative to the repo and everything can be built by just running cmake . && cmake --build . (once all the Raspberry Pi v3dv driver dependencies are installed).

Happy to share further details on each of these points. Most of the points in the blog post are quite general, except the one where it uses the custom ICD (i.e. export VK_ICD_FILENAMES=/home/pi/local-install/share/vulkan/icd.d/broadcom_icd.armv7l.json) - this is basically what loads the custom driver bindings (similar to what we have in the Docker image I mentioned above, although in that case we use SwiftShader to run on the CPU).
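For reference, a rough sketch of what that can look like from the Python side - assuming the bindings are importable as kp and using the install prefix from the blog post, so adjust the path to your setup:

```python
# Sketch: point the Vulkan loader at the v3dv ICD manifest before Kompute
# creates its instance. The path below is the one from the infapi00 blog
# post and will differ if mesa was installed to another prefix.
import os

os.environ["VK_ICD_FILENAMES"] = (
    "/home/pi/local-install/share/vulkan/icd.d/broadcom_icd.armv7l.json"
)

import kp  # the loader reads VK_ICD_FILENAMES when the instance is created

mgr = kp.Manager()  # should now pick up the v3dv driver
```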

There is quite a lot of nuanced terminology around this, but perhaps the easiest path, if you are up for it, is to create a fork, add a folder, and give it an initial shot. Any errors you hit will probably be either: 1) straightforward errors we've come across before, or 2) obscure errors related to v3dv, for which we can reach out to the Mesa team.

@axsaucedo added the c++, documentation, good first issue, help wanted and python labels on Feb 21, 2021
@hpgmiskin (Contributor)

In some preliminary testing Vulkan Kompute is working on a Raspberry Pi 4.

Starting from 2021-01-11-raspios-buster-armhf and following v3dv: quick guide to build and run some demos – infapi00, the vulkan-kompute/test_array_multiplication.py test passes when run on a Pi 4.

I am going to pare back the dependencies to the bare minimum needed to build mesa, and create an example in the examples folder with a README.md documenting the steps required.

@axsaucedo (Member, Author)

@hpgmiskin awesome! That would be great - let me know if you have any questions or if you need a hand. I just did a pretty significant refactor, so it may be best to build directly from master instead of the 0.6.0 branch. If you get stuck or see any issues, just post the debug logs and I'll be able to point you in the right direction.

@hpgmiskin (Contributor)

I got some time to experiment with this today. I will open a draft PR in order to share the (fairly rudimentary) README.md in the example folder. Currently only 8/12 tests are passing on the master branch. I think some of the failing tests might be due to running on a 32-bit operating system, where float64 is not available.

---------------------------- Captured stdout call -----------------------------
Dtype value float64
---------------------------- Captured stderr call -----------------------------
SPIR-V WARNING:
    In file ../src/compiler/spirv/spirv_to_nir.c:4285
    Unsupported SPIR-V capability: SpvCapabilityFloat64 (10)
    28 bytes into the SPIR-V binary
=========================== short test summary info ===========================
FAILED test_array_multiplication.py::test_array_multiplication - assert [0.0, 0.0, 0.0] == [2.0, 4.0, 6.0]
FAILED test_kompute.py::test_end_to_end - assert [2, 4, 12] == [4, 8, 12]
FAILED test_kompute.py::test_sequence - assert [0.0, 0.0, 0.0] == [2.0, 4.0, 6.0]
FAILED test_tensor_types.py::test_type_double - assert False
========================= 4 failed, 8 passed in 4.62s =========================
Kompute Sequence destroy called with null Device pointer
Kompute Tensor destructor reached with null Device pointer
Kompute Tensor destructor reached with null Device pointer
Kompute Sequence destroy called with null Device pointer
Kompute Tensor destructor reached with null Device pointer
Kompute Tensor destructor reached with null Device pointer
Kompute Tensor destructor reached with null Device pointer
Kompute Tensor destructor reached with null Device pointer

As for next steps:

  1. Try building mesa with the v3d, kmsro and vc4 gallium drivers (currently I am only building with v3d).
  2. Attempt to update the tests as part of the pull request so they do not depend on float64.

Any feedback on the PR or next steps would be much appreciated.

@axsaucedo (Member, Author) commented Mar 7, 2021

@hpgmiskin thank you for the heads-up! Interestingly enough, I added a feature this weekend that will allow us to explicitly circumvent this. The issue you are seeing is not specifically the architecture; it is just that many devices don't support float64 (which, interestingly enough, is Python's default - Python's "float" is a 64-bit double). The feature added lets you specify the type of a tensor with the TensorT class, but it may not even be necessary given that the default is float32.
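To make the dtype point concrete, a minimal NumPy-only sketch (not taken from the test suite):

```python
import numpy as np

# Python floats are 64-bit doubles, and NumPy inherits that default,
# which is what triggers the SpvCapabilityFloat64 warning on v3dv.
data = np.array([2.0, 4.0, 6.0])
print(data.dtype)    # float64 -- not supported by the Pi's driver

# Casting explicitly to float32 stays within core Vulkan 1.0 support,
# which the Pi's driver does provide.
data32 = data.astype(np.float32)
print(data32.dtype)  # float32
```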

It seems a few tests actually pass, so the issue may just be with some of the more specific features that are often not supported in Mesa drivers. If you look at the failing tests, the one with that error is basically test_type_double, which actually has a skip flag in the CI tests for that very reason.

Overall this seems quite positive though! It would be good to understand which specific features are not working - could you re-run the tests and share all the debug logs? You can do this by running the tests with the following:

python -m pytest -s --log-cli-level=DEBUG -v python/test/

Thank you for having a look at this Henry, and for the PR - it looks great. I will actually reference it in the main documentation, similar to how we add the README of the Android example: https://kompute.cc/overview/mobile-android.html

@hpgmiskin (Contributor)

Here is the full log output from running pytest.

pytest.log

Interestingly, the number of failed tests varies from 3 to 8 when I run the test suite multiple times. Could there be some sort of race condition that makes the test pass rate vary?

@axsaucedo (Member, Author)

Ok, very interesting - thank you for sharing the logs. It does indeed look exactly like a race condition. It seems that when we run multiple record commands and then an eval, some of the basic steps are not waiting for each other. I have started exploring a fix to ensure the memory barriers address the issues - could you try re-running the tests on PR #182?
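For context, the pattern in question - several recorded commands followed by a single eval - looks roughly like this in the Python bindings. This is a sketch written against the newer kp API, so the exact call names may differ on the branch being tested here:

```python
import numpy as np
import kp

mgr = kp.Manager()
t_src = mgr.tensor(np.array([2.0, 4.0, 6.0], dtype=np.float32))
t_dst = mgr.tensor(np.zeros(3, dtype=np.float32))

# Several recorded commands, then one eval: without the right memory
# barriers between the steps, a later command can read stale data.
(mgr.sequence()
    .record(kp.OpTensorSyncDevice([t_src, t_dst]))
    .record(kp.OpTensorCopy([t_src, t_dst]))
    .record(kp.OpTensorSyncLocal([t_dst]))
    .eval())

print(t_dst.data())  # expected: [2. 4. 6.]
```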

@hpgmiskin (Contributor) commented Mar 9, 2021

@axsaucedo running the tests on #182 seems to mostly solve the issues. There was one occasion where the array multiplication test failed; however, I was not able to reproduce the failure, even when using pytest-flakefinder. I will try a couple more times to see if something comes up.

As expected, the double test was still failing. I will add a similar pytest skip, as is used with SwiftShader, in order to have a fully passing test suite. Once #182 is merged I can merge upstream master into this branch to update and test against the latest Python test suite.
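A hypothetical sketch of what that skip could look like - the environment flag below is a placeholder, not the mechanism actually used in the suite:

```python
import os
import pytest

# Hypothetical flag: set KOMPUTE_SKIP_FLOAT64=1 on devices (such as the
# Pi's v3dv driver) that do not expose the shaderFloat64 feature.
skip_float64 = pytest.mark.skipif(
    os.environ.get("KOMPUTE_SKIP_FLOAT64") == "1",
    reason="float64 (shaderFloat64) not supported by this Vulkan device",
)

@skip_float64
def test_type_double():
    ...
```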

@axsaucedo (Member, Author) commented Mar 9, 2021

Thank you @hpgmiskin, sounds quite positive! Ok great - I am adding an OpMemoryBarrier that will allow barriers to be added on demand (not really needed for this, but definitely a feature I wanted to add), so I will merge the PR once that is in. It would be interesting to see why the array multiplication test failed, mainly because it could mean there is still a barrier that is not properly configured.

@axsaucedo (Member, Author)

@hpgmiskin I have added another change to #182 and am investigating whether any memory barriers are set incorrectly or missing. I realised that the staging memory was not using the HOST_COHERENT bit; if the memory weren't coherent we would have to flush it explicitly (which we don't currently do, as we restrict ourselves to the coherent memory heap). I've since gathered that the Pi's heap memory is always coherent, so this may not make a difference, but so far it does seem like an important piece. If you get a chance, it would be great if you could test and see whether you can replicate the failure of the multiplication tests; otherwise I will merge that PR.

@hpgmiskin (Contributor)

@axsaucedo I have installed and run the tests against the latest commit on #182, and the only test that fails is the double type test. Even running many repeat tests, there is no issue with the multiplication test.

@axsaucedo (Member, Author)

Awesome - thank you for taking the time to test @hpgmiskin, this sounds quite positive so I'll merge now. I'll do some further investigation to ensure there are no discrepancies with the memory barriers, but it seems this PR addresses quite a lot of issues that would have appeared down the line. Thanks!

@hpgmiskin (Contributor)

@axsaucedo I will merge in master later this weekend and update the tests to ignore double on the Broadcom GPU.

There is just one thing it would be good to get your thoughts on. When testing the other day I ran into an issue with SSH access to the device. Previously I had been using remote login while also logged in through a display, so remote access to the renderer worked. Attempting to run the examples through remote login while only the login page was displayed on the physical screen resulted in an error:

Opening /dev/dri/renderD128 failed: Permission denied

This corresponds with the experience discussed in [Mesa-users] Vulkan VK_ICD_FILENAMES and /dev/dri/renderD128 permissions, where render access is not granted until a non-remote user has logged in:

You need to log into the system, I guess. The login process adds an acl to give you access to /dev/dri/renderD128 when you log in, and removes it when you log out (and probably change vt). If direct rendering works, then it works. Did you try to do your tests remotely with the system still on the login page? Been there, done that, got the T-shirt.

Is this something you have any experience of? I could attempt to brute-force it with chown, but I assume this is bad practice. Another option is to remove the instructions for running headless from the PR, but that raises the barrier to entry and greatly reduces the potential use cases.

@axsaucedo (Member, Author)

I don't think modifying permissions is always bad practice; it just depends on whether it's done in a way that is consistent with the expected access to the device and its files - you should also be able to use chmod to ensure the relevant user/group has access. When you SSH you should be able to do so as a specific user, whether root or another user. What is the user ID/name you are logging in with when you see the errors, and is renderD128 currently accessible to that user? I don't have experience with this specifically, but file permission errors are quite common, especially in non-root Dockerfiles; ultimately there are various ways to address this, and chmod should be a reasonable approach. Is this something that you can try?
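As a quick check, a small illustrative snippet (run it as the same user that hits the "Permission denied" error over SSH):

```python
import grp
import os

node = "/dev/dri/renderD128"

# os.access checks against the real uid/gid of the current process,
# so this reflects what the SSH user can actually open.
print("read/write access:", os.access(node, os.R_OK | os.W_OK))

# The render node is typically owned by the "render" group, so membership
# there (or an ACL added at local login) is what normally grants access.
gid = os.stat(node).st_gid
print("owning group:", grp.getgrgid(gid).gr_name)
```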

@hpgmiskin (Contributor)

Thanks for the hand-holding, and sorry for all the questions.

Running ls -la /dev/dri results in

total 0
drwxr-xr-x  3 root root        120 Mar 13 10:17 .
drwxr-xr-x 17 root root       3680 Mar 13 10:17 ..
drwxr-xr-x  2 root root        100 Mar 13 10:17 by-path
crw-rw----  1 root video  226,   0 Mar 13 10:17 card0
crw-rw----  1 root video  226,   1 Mar 13 10:17 card1
crw-rw----  1 root render 226, 128 Mar 13 10:17 renderD128

Then running sudo chown root:video /dev/dri/renderD128 solved the issue 👍

@axsaucedo (Member, Author)

No worries - great to hear @hpgmiskin !

@axsaucedo (Member, Author)

Although, looking at this, I do think the best practice would be to keep the driver node owned by root and use chmod ugo+r so it's visible to other users/groups.

@hpgmiskin (Contributor)

I have made some updates to the PR which should hopefully close out this issue 🤞

@axsaucedo (Member, Author)

Awesome @hpgmiskin! It does - awesome contribution, just merged! Let me know your thoughts later on if you get any ideas for a potential follow-up; the edge detection use case sounded really interesting.
