Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Adding Bazel Platforms support #3779

Merged
merged 19 commits into from
Mar 25, 2022
Merged

Adding Bazel Platforms support #3779

merged 19 commits into from
Mar 25, 2022

Conversation

nicknezis
Copy link
Contributor

@nicknezis nicknezis commented Feb 27, 2022

Some build script cleanup leveraging the Bazel platforms feature.

https://github.com/aosp-riscv/platform-build-bazel

Context below:

While working through a bug we ran into with MacOS failing to compile, I got a better understanding of some of the Bazel settings. In another thread with Josh, who is compiling on the Apple M1 for the first time, he ran into issues with Linux vs OSX.

It dawned on me that we were explicitly setting the config to modify a number of settings. Some of them include the following items:

  1. Platform libraries (Linux C libraries)
  2. Platform + CPU decision on which binary to use (Helm binary)
  3. OS decision (MacOS unique steps vs Linux as the default)
  4. Optimization setting (We just hardcoded to C03, but Bazel reported as "fastbuild")
  5. Stylecheck toggle (darwin vs darwin_nostyle)

In a recent PR, I collapsed all of the Linux flavors to a standard --config=linux. This was born out of my frustration with the build scripts not jiving with Bazel standards. But it was not actually solving the problem. I'm hoping that this PR will help resolve some of the issues. 

Changes: 

  1. No longer need to set --config. Bazel will detect the OS automatically.
  2. Default is fastbuild, but you can add -c opt to build optimized build. 
    a. You can even build both side by side with Bazel making a folder like darwin-fastbuild and darwin-opt3. If you want to run stylecheck, you can add --config=stylecheck
  3. The Linux libraries (-lm, -lpthread, -lrt) are included in tools/platform/BUILD
  4. As we add support for ARM, we can add more options in tools/platform/BUILD6. Scripts outside of Bazel were updated to remove --config.
  5. Scripts outside of Bazel will specify  -c opt or --config=stylecheck when needed.

@nicknezis
Copy link
Contributor Author

I decided to remove the Travis CI special settings because some of them were outdated settings which were needed to use a modern C++ compiler. We are now on a more modern version of Ubuntu, so it's not needed anymore. Also With the other .bazelrc cleanup, I thought it would be a good opportunity to clean it up.

One of the benefits I see is that the build completed in 1 hr 45 min. This is down from the nearly 3 hour long builds. I'm not sure what the difference was. One theory is maybe it was due to us re-running the style check on each stage of the process (build, non-flaky unit test, flaky unit test). Another theory is that by limiting the memory used and also the jobs to 25, maybe that slowed things down. But I'm taking this opportunity to test TravisCI and it seems like it's working.

@nicknezis nicknezis marked this pull request as ready for review February 27, 2022 19:48
@surahman
Copy link
Member

One of the benefits I see is that the build completed in 1 hr 45 min.

Yes, this is awesome! 🥳

2. Default is fastbuild, but you can add -c opt to build optimized build.

We should document this in the build script as well as the online documentation?

@joshfischer1108 has this alleviated your build issues on Apple Silicone?

@nicknezis
Copy link
Contributor Author

We should document this in the build script as well as the online documentation?

Updated the Compiling overview, compiling linux and compiling MacOS pages in the docs. Also found some other issues with old version of Bazel being referenced (0.26). I updated our docs to use Bazelisk, which automatically manages the verzion of Bazel. This will save us from having to update the docs each time we bump the version. This was especially annoying each time because of the old versioned docs in the repo that should not be updated.

@nicknezis
Copy link
Contributor Author

Some other bazel.rc related cleanup. I removed the tools/docker/bazel.rc file because it was no longer needed. It was limiting the cpu/memory of Bazel when running in Docker. This was primarily due to an issue with JDK locking up due to memory bloat. This is no longer an issue with JDK 11, so we don't need it anymore. Removed the file and removed references to it in the Docker scripts.

Here is an example linked off of the Bazel issue in which they document the fact that JDK 11 resolved many Docker related issues. bazelbuild/bazel#6592

@nicknezis
Copy link
Contributor Author

I also removed mention of Applatix CI scripts. Applatix was a Kubernetes related hosting platform which was acquired by Intuit in 2018. They seemed to be a copy of our other CI scripts, but were not used by Heron and were not maintained. So they are now purged.

@nicknezis nicknezis marked this pull request as draft February 27, 2022 23:09
@surahman
Copy link
Member

surahman commented Feb 28, 2022

I was just looking through the build logs and I am not sure if I am interpreting what I am seeing correctly but the only set of tests that seem to be running are the integration tests.

#10424 passed
===========================================================
heron build	0:59:59
heron test non-flaky	0:04:48
heron test flaky	0:01:18
heron build tarpkgs	0:01:10
heron build binpkgs	0:00:37
heron build docker images	0:01:12
===> Finished scripts/travis/build.sh at 2022-02-27 22:33:10 (1:09:06)
===> Starting scripts/travis/test.sh at 2022-02-27 22:33:14
===> Starting heron build integration_test at 2022-02-27 22:33:14
bazel build integration_test/src/... > heron_build_integration_test.txt
64 seconds 30 log lines
 `bazel build integration_test/src/...` finished with errcode: 0
===> Finished heron build integration_test at 2022-02-27 22:34:18 (0:01:04)
===> Starting heron install at 2022-02-27 22:34:18
bazel run -- scripts/packages:heron-install.sh --user > heron_install.txt
18 seconds 34 log lines
 `bazel run -- scripts/packages:heron-install.sh --user` finished with errcode: 0
===> Finished heron install at 2022-02-27 22:34:37 (0:00:19)
===> Starting heron tests install at 2022-02-27 22:34:37
bazel run -- scripts/packages:heron-tests-install.sh --user > heron_tests_install.txt
6 seconds 34 log lines
 `bazel run -- scripts/packages:heron-tests-install.sh --user` finished with errcode: 0
===> Finished heron tests install at 2022-02-27 22:34:43 (0:00:06)
===> Starting heron integration_test local at 2022-02-27 22:34:43
===> Finished heron integration_topology_test java at 2022-02-27 23:12:55 (0:04:36)
===> Task duration summary for scripts/travis/test.sh
===========================================================
heron build integration_test	0:01:04
heron install	0:00:19
heron tests install	0:00:06
heron integration_test local	0:05:55
heron integration_test http-server initialization	0:00:00
heron integration_test scala	0:02:45
heron integration_test java	0:18:30
heron integration_test python	0:06:25
heron integration_topology_test java	0:04:36
===> Finished scripts/travis/test.sh at 2022-02-27 23:12:55 (0:39:41)
===> Task duration summary for scripts/travis/ci.sh
===========================================================
scripts/travis/build.sh	1:09:06
scripts/travis/test.sh	0:39:41

Are the "regular"/non-integration battery of tests still run? I may have missed the output in the logs. I remember heron test taking at least 30mins before and logs indicate heron test flaky/non-flaky are taking a combined ~6mins.

@nicknezis
Copy link
Contributor Author

Are the "regular"/non-integration battery of tests still run? I may have missed the output in the logs. I remember heron test taking at least 30mins before and logs indicate heron test flaky/non-flaky are taking a combined ~6mins.

Interesting. I see them mentioned, but it's hard to tell with the way to log output is hidden away. I'll do some local testing.

@surahman
Copy link
Member

I think everything is okay with the tests. I did not see anything in the diff which would turn it off and this PR run log has the same sort of output without bazel test being specifically noted.

@nicknezis nicknezis marked this pull request as ready for review March 7, 2022 12:58
Copy link
Member

@surahman surahman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of really important changes in this PR.

scripts/travis/build.sh Show resolved Hide resolved
scripts/get_all_heron_paths.sh Show resolved Hide resolved
scripts/run_integration_test.sh Outdated Show resolved Hide resolved
scripts/run_integration_topology_test.sh Outdated Show resolved Hide resolved
@surahman
Copy link
Member

surahman commented Mar 22, 2022

Great work on this PR, there are some really important changes. I know some of these changes are cascading and build on/require each other but it would be better to introduce them as multiple small PRs. This way if there is an issue with some of the changes, we can revert the culprits without losing all changes.

I would get @nwangtw to once over and approve the changes before merging. It looks good to me. I have a few comments/questions - none of which would hold anything up on my end.

It would be nice if @joshfischer1108, or someone else with an Apple Silicone machine, could let us know if they are encountering any other build issues.

@thinker0
Copy link
Member

# BigSur-11.6.5 (20G527)
%  clang -v
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

% ./docker/scripts/test-unittest.sh darwin 0.20.5
================================================================================
INFO: Elapsed time: 954.681s, Critical Path: 305.35s
INFO: 3328 processes: 1144 internal, 2184 local.
INFO: Build completed successfully, 3328 total actions
Test cases: finished with 1189 passing and 0 failing out of 1189 test cases

Executed 247 out of 247 tests: 247 tests pass.
INFO: Build completed successfully, 3328 total actions
Cleaning up scratch dir

@windhamwong
Copy link
Contributor

Just tested on the ubuntu22.04, and found that,
If docker version is lower than (<20.10.10), the apt update will fail.
Should we mention in somewhere or in doc to remind people to upgrade?

@nicknezis
Copy link
Contributor Author

@windhamwong I added a comment above the apt update for both the compile Dockerfile.ubuntu22.04 and the dist Dockerfile.dist.ubuntu22.04.

@thinker0
Copy link
Member

% ./docker/scripts/test-unittest.sh centos7 0.20.5
================================================================================
INFO: Elapsed time: 1583.309s, Critical Path: 234.97s
INFO: 3608 processes: 1160 internal, 2448 local.
INFO: Build completed successfully, 3608 total actions
Test cases: finished with 1189 passing and 0 failing out of 1189 test cases

Executed 247 out of 247 tests: 247 tests pass.
INFO: Build completed successfully, 3608 total actions
Cleaning up scratch dir

@thinker0
Copy link
Member

-ADD bazelrc /root/.bazelrc
./docker/scripts/test-unittest.sh rocky8 0.20.5
================================================================================
INFO: Elapsed time: 1673.469s, Critical Path: 259.39s
INFO: 3608 processes: 1160 internal, 2448 local.
INFO: Build completed successfully, 3608 total actions
Test cases: finished with 1189 passing and 0 failing out of 1189 test cases

Executed 247 out of 247 tests: 247 tests pass.
INFO: Build completed successfully, 3608 total actions
Cleaning up scratch dir

@thinker0
Copy link
Member

+1

Hmm I don't have a Reviewer, so I don't have an Approved button.

@joshfischer1108
Copy link
Member

+1

Hmm I don't have a Reviewer, so I don't have an Approved button.

We need to get you added to the Apache Foundation Github Org as a contributor. Have you done the necessary steps to get this enabled after you were given committer access to the repo?

@joshfischer1108
Copy link
Member

Great work on this PR, there are some really important changes. I know some of these changes are cascading and build on/require each other but it would be better to introduce them as multiple small PRs. This way if there is an issue with some of the changes, we can revert the culprits without losing all changes.

I would get @nwangtw to once over and approve the changes before merging. It looks good to me. I have a few comments/questions - none of which would hold anything up on my end.

It would be nice if @joshfischer1108, or someone else with an Apple Silicone machine, could let us know if they are encountering any other build issues.

I will attempt to build this weekend.

Copy link
Contributor

@windhamwong windhamwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with the PR
I can build on ubuntu20.04 andubuntu22.04

@thinker0
Copy link
Member

#3779 (comment)

Yes, I Want

@surahman
Copy link
Member

I will attempt to build this weekend.

Thank you, it would be great to know if all is well on Apple Silicone. Choi Se confirmed everything is building on Apple x86_64.

We need to get you added to the Apache Foundation Github Org as a contributor. Have you done the necessary steps to get this enabled after you were given committer access to the repo?

Let us get this set up so he can merge PRs.

Add back missing Dockerfile for Ubuntu22.04
@nicknezis
Copy link
Contributor Author

I think this is now ready to merge. For the Apple M1 support. I think this currently won't yet support it. But primarily because we need to add some missing ARM dependencies. For example the helm binary is currently x86. But we can add M1 support in a future PR. This PR definitely gets us much closer to being able to support it.

@surahman
Copy link
Member

I think this is now ready to merge.

Completely agree and Apple Silicon support should not block this. It has a lot of very broad and required changes.

@surahman
Copy link
Member

Resolved a silly merge conflict in the WORKSPACE related to extra whitespace.

@nicknezis nicknezis merged commit 8841d1c into master Mar 25, 2022
@nicknezis nicknezis deleted the nicknezis/bazel-platforms branch March 25, 2022 20:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants