Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add find command #99

Merged
merged 21 commits into from
Jun 28, 2024
Merged

Conversation

rebornplusplus
Copy link
Member

@rebornplusplus rebornplusplus commented Oct 12, 2023

This PR adds the find command which can be used to query the chisel releases for available slices.

Usage:
  chisel find [find-OPTIONS] [<query>...]

The find command queries the chisel releases for available slices.

With the --release flag, it queries for slices in a particular branch
of chisel-releases repository[1] or a particular directory. If left
unspecified, it queries with the release info found in /etc/lsb-release.

[1] https://github.com/canonical/chisel-releases

[find command options]
      --release=<branch|dir>      Chisel release branch or directory

Examples

$ chisel find --release ubuntu-20.04 python3 2>/dev/null
Slice                                    Package               Release
libpython3.8-minimal_config              libpython3.8-minimal  ubuntu-20.04
libpython3.8-minimal_libs                libpython3.8-minimal  ubuntu-20.04
libpython3.8-stdlib_all-os               libpython3.8-stdlib   ubuntu-20.04
...
... (trimmed) ...
...
python3.8_core                           python3.8             ubuntu-20.04
python3.8_standard                       python3.8             ubuntu-20.04
python3.8_utils                          python3.8             ubuntu-20.04
$ chisel find libc6_libs 2>/dev/null
Slice       Package  Release
libc6_libs  libc6    ubuntu-22.04
$ chisel find foo bar 2>/dev/null
No matching slices for "foo bar"

Please let me know what you think of the changes. I think I could use some suggestions with naming overall, and particularly the value of maxStrDist constant and logic in matchSlice in /cmd/chisel/cmd_find.go.


  • Have you signed the CLA?

This commit adds the ``find`` command which can be used to query the
chisel releases for available slices.

---

Usage:
  chisel find [find-OPTIONS] [<query>...]

The find command queries the chisel releases for available slices.

With the --release flag, it queries for slices in a particular branch
of chisel-releases repository[1] or a particular directory. If left
unspecified, it queries with the release info found in /etc/lsb-release.

[1] https://github.com/canonical/chisel-releases

[find command options]
      --release=<branch|dir>      Chisel release branch or directory
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
This commit changes the logic of matching a query against a slice.
Following the new behaviour, a slice name will be listed as a match if
the query is a sub-string of the slice name, in pkg_slice format.
Or, it will be listed if the query is atmost 1 (One) Levenshtein
distance [1] away from the slice name, in pkg_slice format.

[1] https://en.wikipedia.org/wiki/Levenshtein_distance
3ab898e introduced new changes in behaviour. Fix the tests accordingly.
Copy link
Contributor

@woky woky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find anything to nitpick on. LGTM!

cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find_test.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks ;)

@rebornplusplus rebornplusplus mentioned this pull request Oct 19, 2023
1 task
Copy link

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just left a couple of nit comments/suggestions.

cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_help.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is looking very good, just left some minor nitpicks and items than can be addressed in other PRs. Some general comment:

I think we do a pretty good job in the help message by directing the user to the chisel-releases repository and telling them that releases match branches. However, I think that we do not do such a great job with error messages. For example:

$ ./chisel find libc6 --release ubuntu
error: invalid release reference: "ubuntu"

There is no mention of the expected format (e.g. name-number), and even with the correct format the error message can be improved, see:

$ ./chisel find libc6 --release ubuntu-11
error: no information for ubuntu-11 release

We could say above what the available releases are or point the user to the repository so he can see the branch names. Because the error messages were not introduced as part of this PR, I think it is best to create an issue and address it in the future, do you agree?

cmd/chisel/cmd_find.go Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
@rebornplusplus
Copy link
Member Author

We could say above what the available releases are or point the user to the repository so he can see the branch names. Because the error messages were not introduced as part of this PR, I think it is best to create an issue and address it in the future, do you agree?

I agree. I will create an issue for this after this PR is closed.

Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good when re-reviewing it. I left a couple of nitpicks. The only thing that I think could be improved in general lines is how we fuzzy match the string but I seem to recall from months ago that we would treat that as a separate PR.

return err
}
if slices == nil {
fmt.Fprintf(Stdout, "No matching slices for \"%s\"\n", query)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use fmt.Printf for simplicity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit sceptical about doing that since the existing codes make use of the Stdout variable for testing purposes and there are no occurrences of directly using fmt.Print*:

$ grep -rni "fmt.Print" ./
./internal/jsonwall/jsonwall.go:28://	        fmt.Println(app.Name, "version:", app.Version)
./internal/jsonwall/jsonwall.go:37://	                        fmt.Println(app.Name, "version:", app.Version)
$ grep -rni "fmt.FPrint.*Stdout" ./
./cmd/chisel/cmd_version.go:31:	fmt.Fprintf(Stdout, "%s\n", cmd.Version)
./cmd/chisel/cmd_help.go:177:	fmt.Fprintln(Stdout, longChiselDescription)
./cmd/chisel/cmd_help.go:178:	fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:179:	fmt.Fprintln(Stdout, chiselUsage)
./cmd/chisel/cmd_help.go:180:	fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:181:	fmt.Fprintln(Stdout, chiselHelpCategoriesIntro)
./cmd/chisel/cmd_help.go:185:	fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:186:	fmt.Fprintln(Stdout, chiselHelpAllFooter)
./cmd/chisel/cmd_help.go:191:	fmt.Fprintln(Stdout, chiselHelpFooter)
./cmd/chisel/cmd_help.go:197:	fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:205:		fmt.Fprintf(Stdout, "%*s: %s\n", maxLen+2, categ.Label, strings.Join(categ.Commands, ", "))
./cmd/chisel/cmd_help.go:230:		fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:231:		fmt.Fprintf(Stdout, "  %s (%s):\n", categ.Label, categ.Description)
./cmd/chisel/cmd_help.go:237:				fmt.Fprintf(Stdout, "    %*s  %s\n", -maxLen, name, cmd.ShortDescription)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's keep it like this if it is the convention. Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out stdout is used in the tests like we discussed weeks ago. This comment can be resolved.

cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
cmd/chisel/cmd_find.go Outdated Show resolved Hide resolved
@cjdcordeiro cjdcordeiro added the Priority Look at me first label Mar 14, 2024
@letFunny letFunny added the Blocked Waiting for something external label Jun 4, 2024
@letFunny
Copy link
Collaborator

letFunny commented Jun 4, 2024

Had a high level discussion with @niemeyer about the search functionality. We agreed that for the first iteration we should try to keep things simple but also make the UX feel natural to the user.

To achieve that we are going to use a string distance of 1 to see if two strings match. Additionally, we are going to take each term passed to the CLI (i.e. chisel find <term1> <term2> ...) and try to match those against package name or slice name but the important thing is that all of the terms have to match in order to add that slice to the results.

@letFunny letFunny removed the Priority Look at me first label Jun 4, 2024
@letFunny letFunny added Priority Look at me first Simple Nice for a quick look on a minute or two and removed Blocked Waiting for something external Priority Look at me first labels Jun 6, 2024
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking reasonable, thanks. Only a few nitpicks.

return err
}
if len(slices) == 0 {
fmt.Fprintf(Stdout, "No matching slices for \"%s\"\n", strings.Join(cmd.Positional.Query, " "))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Stdout/Stderr/ -- It's good practice not to send warnings, errors, and prose in general to Stdout when what is expected is structured output. Another option is to do the exact same output, but present an empty list. What do we do in such cases with snapd?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snap uses stderr, I will change it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w := tabWriter()
fmt.Fprintf(w, "Slice\tPackage\tRelease\n")
for _, s := range slices {
fmt.Fprintf(w, "%s\t%s\t%s\n", s, s.Package, releaseLabel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the typical package_slice syntax I think?

Also, why are we presenting Release in every line when this was an input parameter and will be the same on every line?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is conforming to the output format defined in the spec. Example taken from the doc:

Slice           Package     Release
apache2_bins    apache2     ubuntu-22.04
apache2_libs    apache2     ubuntu-22.04
....

Let's discuss it and potentially change it.

func match(slice *setup.Slice, query string) bool {
const maxStrDist = 1
fuzzyMatch := func(str, query string) bool {
return strdist.Distance(str, query, strdist.StandardCost, maxStrDist+1) <= maxStrDist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this should be a normal function instead of a closure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer applicable with the latest changes.

}

// findSlices goes through the release searching for any slices that match
// the query string. It returns a list of slices that match the query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// findSlices returns slices from the provided release that contain all of the query strings (AND).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1186a98.


// findSlices goes through the release searching for any slices that match
// the query string. It returns a list of slices that match the query.
func findSlices(release *setup.Release, queries []string) (slices []*setup.Slice, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/queries/query/ (it's one query with several required strings, not several independent queries, which would hint at an OR relationship)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1186a98.

fuzzyMatch := func(str, query string) bool {
return strdist.Distance(str, query, strdist.StandardCost, maxStrDist+1) <= maxStrDist
}
return strings.Contains(slice.String(), query) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look right. For example, why are we happy to match "ab" against "abcdefghij", but then do a fuzzy match between them and request that the delta is only 1?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, snap find looks for substrings as well, if you look for "python" you get:

Name                    Version                 Publisher          Notes    Summary
python3-alt             1.0.0                   muqtxdir-m         -        python3 snap packaging
python-ai-toolkit       0.1.5.4                 mauringo           -        Collection of common python packages for Data Analysis and AI
python0                 0.9.1                   zygoon             classic  Ancient version of Python for programming archeologists
python36-jamesh         3.6.4                   jamesh             -        Python language interpreter and standard library
python2-jamesh          2.7.14                  jamesh             -        Python language interpreter and standard library
python38                3.8.0                   om26er             -        Python 3.8
python36-arun           3.6.4                   arun.prasad        -        Python language interpreter and standard library

The difference is that they do not use fuzzy matching at all. Do we want to go that route?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we will take search for "" for each term in the query. That will allow us to get the substring behavior for free. We will also allow * and ? inside the query.

@niemeyer The only thing I am not sure about is whether we still want to allow a distance of 1 with other characters. In previous versions python3.1x would have matched python3.10 but now that we can write python3.1? I am not sure there is a need for that anymore (in the current PR the distance ONLY takes into account globs).

Label: label,
Version: version,
})
releaseLabel = label + "-" + version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we return a releaseLabel that disagrees with release.Label? At the very least, we should not call this the release label, but maybe we should not return it at all? How's that information used, and what's the best approach considering that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I have removed it now that we do not use it for the output so there is no need to think of a clearer name.

}

// readOrFetchRelease takes a release branch name or a release directory path.
// It fetches or reads the chisel-release depending on the nature of input and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/readOrFetch/obtain/?

We always read, maybe local, maybe after fetching, but we don't have to try to convey all these details via the function name.

Then, the documentation needs to follow usual Go conventions, and should describe what it does more clearly. For example:

// obtainRelease returns the Chisel release information matching the provided string,
// fetching it if necessary. The provided string should either be a "<name>-<version>" pair,
// or the path to a directory containing a previously fetched release.

... or something along those lines.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 99b9496.

@letFunny letFunny added the Blocked Waiting for something external label Jun 28, 2024
@letFunny letFunny removed the Blocked Waiting for something external label Jun 28, 2024
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Alberto. Probably final pass.

`

var cutDescs = map[string]string{
"release": "Chisel release directory",
"release": "Chisel release directory or Ubuntu version",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Chisel release name or directory (e.g. ubuntu-22.04)"

The directory case is quite special. We don't need to spend much time on it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in d124e25.

@@ -18,10 +13,13 @@ var shortCutHelp = "Cut a tree with selected slices"
var longCutHelp = `
The cut command uses the provided selection of package slices
to create a new filesystem tree in the root location.

By default it fetches the slices for the latest Ubuntu
version, unless the --release flag is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true? Isn't to actually to fetch slices for the same Ubuntu release as the current host?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right, changed in d124e25.

Globs (* and ?) are allowed in the query.

By default it fetches the slices for the latest Ubuntu
version, unless the --release flag is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right, changed in d124e25.

`

var findDescs = map[string]string{
"release": "Chisel release directory or Ubuntu version",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in d124e25.

// ? - Any one character
// * - Any zero or more characters
// ⁑ - Any zero or more characters
func StandardGlobCost(ar, br rune) Cost {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yet, this standard is not what we use in GlobPath. Any reason why we're not just using GlobPath for our match? We won't have slashes in those strings, so it won't affect anything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not comfortable having some logic that was treating "/" differently. For example: python3.1x matches python3.10 where x is any character except when that character is a "/".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a slash supposed to be seen at all, in either of these strings? These are package and slice names.

In either case, the "Standard" in the StandardCost function comes from the fact that this is how one would get a classic Levenshtein distance from the books. There's nothing standard about this function. It's not even what we use ourselves in our own glob path matching.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@niemeyer niemeyer merged commit 017b616 into canonical:main Jun 28, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Simple Nice for a quick look on a minute or two
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants