Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Gitlab inclusion globbing #3500

Merged
merged 7 commits into from
Oct 30, 2024

Conversation

abmussani
Copy link
Contributor

Description:

As on the customer demands, similar to github source, This PR implements inclusion of globs for Gitlab sources. Two new flags are introduce similar to Github, --include-repos and --exclude-repos respectively. Since, glob patterns are validated and ignore if not compiled, therefore, test to check those errors is removed.

Gitlab Soruce test results:

Screenshot 2024-10-24 at 4 26 06 PM

--help results for Gitlab source:

Screenshot 2024-10-24 at 4 19 03 PM

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

…o to support globbing.

Apply globbing filter when repos is not provided.
remove test to check errors if globs are invalid.
@abmussani abmussani requested review from a team as code owners October 24, 2024 11:27
Copy link
Contributor

@camgunz camgunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a pretty faithful reproduction of the behavior of globbing in the GitHub source, which is smart: it's great for consistency and it reduces the chance of weird errors creeping in. I have some suggestions to improve, but they're just suggestions, especially considering the GitHub source behavior here:

  • Defer to @rosecodym here, but what do you think about glob compilation errors becoming error conditions? My thinking here is scanning something you don't want to scan, or not scanning something you definitely wanted to scan is probably something you want to address upfront, so canceling the scan might be more desirable.
  • You could note that exclude takes precedence over include in the CLI flags usage info
  • Not 100% sure about this, but you could also make the condition where the configuration of both include and exclude is such that no repos are included an error condition. The thing this would avoid is having a user typo a glob, get a ✅ from trufflehog, and think everything is peachy when they inadvertently didn't scan anything.

If you do decide to do any of these things (and to be clear IMO it's totally up to you), there's the Validator interface where you could do checking of the glob itself and any effects of the globbing.

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this! Copying the GitHub implementation is a great idea, but that implementation is more complicated than what GitLab needs - in particular, GitLab doesn't need repo caching, as far as I can tell. I've left some comments inline with more details, and feel free to reach out to me directly with questions.

// filteredRepoCache is a wrapper around cache.Cache that filters out repos
// based on include and exclude globs.
type filteredRepoCache struct {
cache.Cache[string]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like the cache part of filteredRepoCache is ever used, so I don't think this is really a cache! That's a good thing - the repo caching in the GitHub source is pretty complicated, and it's good that we don't have to do it here too. You can remove the embedded cache.Cache and rename this structure - or even eliminate it entirely and pass the include/exclude globs directly to buildIgnorer (which would mean making buildIgnorer a little more complicated, but would also mean that you don't have to make anything else more complicated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosecodym the idea is to compile it very first time and just reuse it, similar to Github. Moving compilation in buildignorer will compile globs for all kind of Projects (user projects, group projects etc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I do agree that cache.Cache is not needed in Gitlab source. I will take a look on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosecodym I have removed the concept of caching and renamed the entity 'globRepoFilter'. I did not eliminate the whole structure as I believe having that is just making the code more clear. Let me know what are your thoughts on this.

for _, ig := range include {
g, err := glob.Compile(ig)
if err != nil {
ctx.Logger().V(1).Info("invalid include glob", "include_value", ig, "err", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I definitely think that glob compilation errors should be a fatal error. At the very least we should log it with Error instead of Info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the code back to support compilation code. infact, now compilation will be done at validation, enumeration or chunk func like it was done before.

modified changes to support glob compilation errors.
@abmussani abmussani requested a review from rosecodym October 25, 2024 11:51
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks! i added a note about an unused parameter and a question about some comments, but nothing blocking. can you also confirm my reading that this pr exposes the repo exclusion globbing that was always present in the engine but was not previously exposed on the command line?

include, exclude []glob.Glob
}

func newGlobRepoFilter(ctx context.Context, include, exclude []string, onCompileErr func(err error, pattern string)) *globRepoFilter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like ctx is used here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for highlighting. Right, Its not being used. Let me clean it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -266,6 +266,10 @@ type GitlabConfig struct {
Filter *common.Filter
// SkipBinaries allows skipping binary files from the scan.
SkipBinaries bool
// IncludeRepos is a list of repositories to include in the scan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a list of repositories or repository glob patterns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List of repository glob patterns.

@abmussani
Copy link
Contributor Author

this pr exposes the repo exclusion globbing that was always present in the engine but was not previously exposed on the command line?

Correct. I could not see a way to set IgnoreRepo in Gitlab source.

@abmussani abmussani merged commit f4670aa into trufflesecurity:main Oct 30, 2024
13 checks passed
@abmussani abmussani deleted the gitlab-globbing branch October 30, 2024 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants