Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tree-sitter-extrator project to ruby/extractor/codeql-extractor. #15595

Closed
wants to merge 1 commit into from

Conversation

criemen
Copy link
Collaborator

@criemen criemen commented Feb 13, 2024

I investigated porting our ruby extractor build to bazel, and unfortunately, bazels rules_rust only support local dependencies inside a workspace, c.f. bazelbuild/rules_rust#1525.

Therefore, to be able to build the ruby extractor with bazel, I had to move the tree-sitter-extractor project out of shared/, where it morally belongs, into the ruby extractor workspace. Alternatives considered:

  • Provide a symlink from inside the workspace into shared/. This is possible, but due to the poor symlink support on Windows, it'd make working with the entire codeql repo on Windows more difficult. Our internal developers all (have to) have developer mode enabled, so that we can use symlinks in Bazel, but all external contributers on Windows would presumably run into trouble while cloning the repo. Imo, that's not acceptable.
  • Fake an environment that looks like it's a workspace using advanced bazel magic. That's possible, but would make the build much harder to understand. In particular, bazel doesn't (easily) allow writing to the source tree, so this is not an easy/quick hack. It's important to me that the build system can be understood by non-bazel experts, so this isn't acceptable either.
  • Fix the upstream issue. I don't have a good enough understanding of the issue or the code to do that.
  • Drop the Cargo.toml for the ruby extractor, and model all dependencies only on the bazel side. This'd break IDE support, and the bazel<->cargo integration is actually quite good (besides its limitations), so I don't want to do that.

Out of all these alternatives, moving the shared code into the ruby workspace seemed the least bad option.
Note that this works, because we want to build the ruby language pack with bazel, but not the ql-for-ql one - for now, I don't see an advantage of porting that to bazel anyways, so it's okay that it's blocked from being ported to bazel right now.

I'm happy to discuss this change and alternatives further. If we agree that this is the way forward, then this needs to be merged together with a simple corresponding change in the internal repo.

…or`.

I investigated porting our ruby extractor build to bazel, and unfortunately,
bazels `rules_rust` only support local dependencies inside a workspace,
c.f. bazelbuild/rules_rust#1525.

Therefore, to be able to build the ruby extractor with bazel,
I had to move the `tree-sitter-extractor` project out of `shared/`, where
it morally belongs, into the ruby extractor workspace.
Alternatives considered:
- Provide a symlink from inside the workspace into `shared/`. This is possible,
  but due to the poor symlink support on Windows, it'd make working with
  the entire codeql repo on Windows more difficult.
  Our internal developers all (have to) have developer mode enabled, so that
  we can use symlinks in Bazel, but all external contributers on Windows would
  presumably run into trouble while cloning the repo. Imo, that's not acceptable.
- Fake an environment that looks like it's a workspace using advanced
  bazel magic. That's possible, but would make the build much harder to understand.
  In particular, bazel doesn't (easily) allow writing to the source tree,
  so this is not an easy/quick hack. It's important to me that the build system
  can be understood by non-bazel experts, so this isn't acceptable either.
- Fix the upstream issue. I don't have a good enough understanding of the issue
  or the code to do that.
- Drop the `Cargo.toml` for the ruby extractor, and model all dependencies only
  on the bazel side. This'd break IDE support, and the bazel<->cargo integration
  is actually quite good (besides its limitations), so I don't want to do that.

Out of all these alternatives, moving the shared code into the ruby workspace
seemed the least bad option.
Note that this works, because we want to build the ruby language pack with bazel,
but not the ql-for-ql one - for now, I don't see an advantage of porting that to
bazel anyways, so it's okay that it's blocked from being ported to bazel right now.

I'm happy to discuss this change and alternatives further.
If we agree that this is the way forward, then this needs to be merged together
with a simple corresponding change in the internal repo.
@criemen criemen marked this pull request as ready for review February 13, 2024 10:37
@criemen criemen requested review from a team as code owners February 13, 2024 10:37
@hmac
Copy link
Contributor

hmac commented Feb 13, 2024

Another alternative is presumably to publish tree-sitter-extractor on crates.io? Then it would be a normal dependency.

@aibaars
Copy link
Contributor

aibaars commented Feb 13, 2024

What about other users of the shared extractor?

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea.

@criemen
Copy link
Collaborator Author

criemen commented Feb 13, 2024

@hmac Ah yes, publishing it, or moving it to another public repo are other options.
I don't know about publishing at all, and moving it to another repo would presumably increase friction in the development process, and we'd have to go through another open-source procss (although light-weight).

@aibaars There are no other uses of the shared code so far, python doesn't use it, and we have (as of now) no other tree-sitter extractors.

@criemen
Copy link
Collaborator Author

criemen commented Feb 13, 2024

After a long slack conversation with @aibaars , he came up with the idea of specifying the dependency on the shared extractor as git repo. Initial testing seems that this is working (it's just extremely slow, re-pinning the dependencies in bazel takes 5+min on my computer), but as this only has to happen on lockfile updates, I think that's acceptable.

I'll investigate this further, and if it doesn't work out, look at the crates.io and separate repo options instead of moving the extractor.
Thanks all for your input!

@criemen criemen closed this Feb 13, 2024
@tausbn
Copy link
Contributor

tausbn commented Feb 13, 2024

@aibaars There are no other uses of the shared code so far, python doesn't use it, and we have (as of now) no other tree-sitter extractors.

For future reference, QL-for-QL also uses it: https://github.com/github/codeql/blob/main/ql/extractor/Cargo.toml#L20

criemen added a commit that referenced this pull request Feb 19, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
criemen added a commit that referenced this pull request Feb 20, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
criemen added a commit that referenced this pull request Feb 20, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
criemen added a commit that referenced this pull request Feb 20, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
criemen added a commit that referenced this pull request Feb 23, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
criemen added a commit that referenced this pull request Feb 26, 2024
This PR introduces a bazel and `rules_rust`-based build system
for the ruby extractor and language pack.
This replacese the existing, `cargo` and `cross`-based build system.

For local development, nothing changes, and the existing `cargo`-based
build still keeps working as-is.

We no longer need to use `cross` to compile our Linux binaries,
as we now can link against our hermetic C++ toolchain, which ships
with an old enough glibc, so that we don't run into symbol version issues
when deploying the binaries to older systems.
Besides the one change in dependency (explained in detail in `Cargo.toml`
and in #15595), nothing ought to
change in how we build the extractor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants