Use the user-provided page size when querying repositories #3502

Closed
6 tasks done
prsshini opened this issue Aug 15, 2024 · 15 comments · Fixed by #3507 or #3547

prsshini commented Aug 15, 2024

Checklist

  • I confirm there are no unresolved issues reported on the Chocolatey Status page.
  • I have verified this is the correct repository for opening this issue.
  • I have verified no other issues exist related to my problem.
  • I have verified this is not an issue for a specific package.
  • I have verified this issue is not security related.
  • I confirm I am using official, and not unofficial, or modified, Chocolatey products.

What You Are Seeing?

Our S3 bucket source contains 1612 packages with 194 folders (with different versions of nupkgs in each folder).
Choco source is set as Sleet source, however, when running Chocolatey CLI, we get the following:

The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

PS C:\Users\vradhs2> choco source
Chocolatey v2.2.2
test - https://s3.amazonaws.com/bucket-name/index.json | Priority 0|Bypass Proxy - False|Self-Service - False|Admin Only - False.

The choco search command lists only 30 packages. The rest of the packages in the S3 bucket are not listed.

We tried

choco search --page=100

and

choco search --page-size=100

Both behave the same way and list only 30 packages.

Could you let me know whether any choco setting is preventing all the packages from being listed?

What is Expected?

To list all 194 packages.

How Did You Get This To Happen?

choco source add -n test -s="https://s3.amazonaws.com/bucket-name/index.json"
choco search
package1
package2
...
package 30
30 packages found.
The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

The source contains 194 packages. And the source is an s3 bucket with packages pushed via Sleet.

Using chocolatey version 2.2.2

System Details

  • Operating System:
  • Windows PowerShell version:
  • Chocolatey CLI Version:
  • Chocolatey Licensed Extension version:
  • Chocolatey License type:
  • Terminal/Emulator:

Installed Packages

N/A

Output Log

https://gist.github.com/prsshini/af28447fbce6196f36214515d45b5540

Additional Context

I am able to search a package from Sleet S3 source using the command:

choco search package100 --exact
package100

This lists as expected.

@gep13 added:

See the comment here which provides more information about the investigation into this issue, as well as a plan for steps moving forward.

@prsshini prsshini added the Bug label Aug 15, 2024
@pauby
Member

pauby commented Aug 15, 2024

We only support the latest version for open-source. Please upgrade and update the issue description and logs.

Please also provide the installed packages list and the System Details and the configuration of Sleet.

Your logs don't contain index.json so the source you have provided in the description does not match what you say the source looks like (I understand it's an example but your logs already contain the actual source so there is no point anonymising it now).

@prsshini
Author

prsshini commented Aug 15, 2024

Hello @pauby, thanks for your reply. I upgraded choco to the latest version, 2.3.0, and I still see the same issue: choco search lists only 30 packages even though the Sleet source contains more than 30 packages.
I am running this on Windows Server 2016. In the Chocolatey logs, I do see that the source is set with index.json. The latest logs are here:
https://gist.githubusercontent.com/prsshini/af28447fbce6196f36214515d45b5540/raw/58fd692accf76f7fbb955b5b91d1d895fdef5e81/latest%2520logs%2520with%25202.3.0

The Sleet JSON config used is:

{
  "sources": [
    {
      "name": "Sleetfeed",
      "type": "s3",
      "path": "https://s3.amazonaws.com/mybucket/",
      "bucketName": "mybucket",
      "region": "us-east-1",
      "accessKeyId": "",
      "secretAccessKey": "********"
    }
  ]
}

The nupkgs are pushed to this bucket using the command sleet push d:/nupkgfile

@pauby
Member

pauby commented Aug 16, 2024

Have you tried to query the source using nuget.exe? What was the result?

What happens if you use choco search --all-versions?

@prsshini
Author

Hi @pauby, I just tried using nuget.exe and it can list fewer than 30 packages.
I also tried choco search --all-versions and it's only listing all the versions of the first 30 packages.

@pauby
Member

pauby commented Aug 16, 2024

I just tried using nuget.exe and it can list fewer than 30 packages.

Did you mean more than 30 packages? How many did it list? What command did you use?

and it's only listing all the versions of the first 30 packages.

The 1000 limit is for package versions, not packages. With --all-versions, how many were shown in total?

@prsshini
Author

Our packages have multiple versions. For example, Package1 has 10 versions and Package2 has 20 versions.
So the total number of versions across the first 30 packages is what was listed when I used choco search --all-versions. There are 452 package versions in the first 30 packages, so choco search --all-versions listed 452 entries.

However, the total number of packages in the bucket is 192 and the total number of versions is 1612.

But choco search lists only the first 30 packages, and choco search --all-versions lists only 452 package versions, which is the total for the first 30 packages.

@prsshini
Author

I also tested with a new bucket which has only 32 packages with a total of 400 package versions, and I still got the 1,000 limit message. So choco somehow limits the listing to the first 30 packages. Is there a configuration that I am missing to make it list everything? The choco config does not have any setting for page size or package list limit.

@pauby
Member

pauby commented Aug 16, 2024

There is no configuration option you're missing.

Thanks for testing all of that. Leave it with me as we'll need to reproduce your setup and the issue to see what is going on.

@prsshini
Author

@pauby Thank you for your support.
Just to reiterate the steps to reproduce:
Generate a nupkg.
Install Sleet on your dev machine and save your sleet.json as below.
{
  "sources": [
    {
      "name": "feed",
      "type": "s3",
      "path": "https://s3.amazonaws.com/mybucket/",
      "bucketName": "mybucket",
      "region": "us-east-1",
      "accessKeyId": "yourkey",
      "secretAccessKey": "yourkey"
    }
  ]
}

Run the sleet init command to initialise your bucket.
Use the command "sleet push D:/nupkg-package".
Likewise, push more than 30 packages to your S3 bucket.

Set up your choco source as
choco source add -n test -s="https://s3.amazonaws.com/mybucket/index.json"

Run "choco search" and it will list only 30 packages, with the message:
30 packages found.
The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

@prsshini
Author

Hi @pauby, any luck with this? Were you able to reproduce this issue? Thanks!

@gep13
Member

gep13 commented Aug 21, 2024

@prsshini thank you for bringing up this issue. I have done some investigation work, and I can report that I am able to reproduce this issue. Based on some discussions internally, we have decided on a path forward, which I wanted to lay out here.

To confirm what was done for our internal testing...

  1. Installed sleet on target machine
  2. Brought together a collection of packages (a total of 2545 unique package versions including pre-release packages, with 39 distinct packages)
  3. Ran the following script to bring these packages into the sleet instance (this loop was necessary, since pointing sleet at the full folder of nupkgs then complained about duplicate packages)
Get-ChildItem "C:\temp\packages" -Filter *.nupkg | 
Foreach-Object {
    Write-Host $_.FullName
    sleet push -s myLocalFeed $_.FullName
}
  4. The sleet folder was then hosted on a web server (for the purposes of this test, the Express Visual Studio Code Extension was used)

Now that this was up and running, the following results were observed...

  1. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache and only 30 packages were returned, when the expected number was 39. The default page size for choco.exe is 30, so it is expected that only 30 packages would have been returned.
  2. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=20 and 20 packages were returned, which worked as expected
  3. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=40 and only 30 packages were returned, when the expected number was 39.
  4. Ran nuget search chocolatey -Source http://localhost/index.json and only 20 packages were returned, when the expected number was 39. The default page size for nuget.exe is 20, so it is expected that only 20 packages would have been returned.
  5. Ran nuget search chocolatey -Source http://localhost/index.json -take 10 and 10 packages were returned, which worked as expected
  6. Ran nuget search chocolatey -Source http://localhost/index.json -take 40 and all 39 packages were returned

In summary, based on these tests, it would appear that nuget.exe is working correctly, and choco.exe is not working correctly. However, there are some technical details here that I think need to be explained, as it is a combination of the way that choco.exe is working, and how sleet is working, that cause the problem.

The first thing to point out is that the search in sleet, by default, doesn't actually do anything except return all the packages that exist on the static feed. It doesn't look at the incoming query parameters and filter the results to match what is requested. It simply returns all the packages. This is documented on the sleet GitHub repository, and a link is provided to this blog post to allow true search results to be returned. Am I right in saying that you aren't using the Sleet.Search package, and instead using the default sleet search? I am assuming that this is the case, as that is what matches the replication steps that I have done.

In principle, returning an unfiltered list of packages for each search query would appear to give exactly what you are looking for. However, this causes a problem in how Chocolatey CLI operates. Let me try to explain...

Due to this section of code, the maximum page size for a request from Chocolatey CLI is 30 packages. This is due to some historical problems with larger page sizes against Chocolatey Community Repository as well as some other Repository Managers, like Nexus. As a result, when you attempt to run the following command:

choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=40

There are actually two outgoing queries, which can be seen here:

[image: screenshot showing the two outgoing queries]

The result of these two queries is that a total of 40 packages should be returned. However, since sleet is returning the exact same information from both queries, Chocolatey CLI actually only sees the first 30 packages, since the packages that are returned are essentially duplicates, and they are ignored.

When nuget.exe does the following command:

nuget search chocolatey -Source http://localhost/index.json -take 40

There is only 1 outgoing query:

[image: screenshot showing the single outgoing query]

Which means that nuget.exe can see all the packages, as there are no duplicates.
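To make the difference concrete, here is a minimal Python sketch of the behaviour described above. It is only a model built from the observations in this comment, not Chocolatey CLI or nuget.exe code: the names FEED, static_search, paged_search, and client_search are hypothetical, and the assumption that each response is truncated to the requested size before duplicates are dropped is inferred from the results reported above.

```python
from typing import Callable, List

# Hypothetical stand-in for the test feed: 39 distinct package IDs.
FEED = [f"package{i}" for i in range(1, 40)]


def static_search(skip: int, take: int) -> List[str]:
    """Model of a static feed's default search: skip and take are ignored."""
    return list(FEED)


def paged_search(skip: int, take: int) -> List[str]:
    """Model of a feed that honours skip and take."""
    return FEED[skip:skip + take]


def client_search(search: Callable[[int, int], List[str]],
                  page_size: int, max_request: int) -> List[str]:
    """Ask for page_size results in requests of at most max_request each.

    Assumption: every response is truncated client-side to the requested
    size, and package IDs that were already seen are ignored.
    """
    seen = set()
    results: List[str] = []
    skip = 0
    while skip < page_size:
        take = min(max_request, page_size - skip)
        response = search(skip, take)[:take]  # client-side truncation
        for package_id in response:
            if package_id not in seen:
                seen.add(package_id)
                results.append(package_id)
        skip += take
    return results


if __name__ == "__main__":
    # Two requests (30 + 10) against the static feed: the second only repeats IDs.
    print(len(client_search(static_search, page_size=40, max_request=30)))  # 30
    # One request of 40 against the static feed: all 39 packages are seen.
    print(len(client_search(static_search, page_size=40, max_request=40)))  # 39
    # Two requests against a feed that pages properly: all 39 packages are seen.
    print(len(client_search(paged_search, page_size=40, max_request=30)))   # 39
```

In this model the problem only appears when a request cap smaller than the page size is combined with a feed that ignores skip and take, which matches the six observations listed earlier.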

I hope this serves to illustrate what is going on here, if not, please let me know, and I can try to explain further.

In terms of what we plan to do to improve this experience...

  1. Allow the user to control the page-size directly, without automatically setting it to 30
  2. Output a warning when the user uses something other than the recommended 30
  3. Continue to output an error when user attempts to use a page-size greater than 100

Keep in mind, due to the upper limit of allowed packages in the response to a search query, as well as the upper limit on allowed page sizes, you may still not be able to return all packages from sleet. However, this is purely down to the responses that sleet is providing by default. The change here is to try to make things better, but it is not a guarantee that things will work as you want.
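As a rough illustration of the three planned rules, the following Python sketch shows one way such a check could look. It is not the Chocolatey CLI implementation; the function name resolve_page_size, the constant names, and the message wording are assumptions made for this example. Only the values 30 and 100 come from the plan above.

```python
import warnings

RECOMMENDED_PAGE_SIZE = 30  # recommended value named in the plan above
MAX_PAGE_SIZE = 100         # upper limit named in the plan above


def resolve_page_size(requested=None):
    """Return the page size to use (hypothetical helper, not Chocolatey code)."""
    if requested is None:
        # No page size given: fall back to the recommended value.
        return RECOMMENDED_PAGE_SIZE
    if requested > MAX_PAGE_SIZE:
        # Rule 3: a page size above the upper limit remains an error.
        raise ValueError(
            f"page size {requested} exceeds the maximum of {MAX_PAGE_SIZE}"
        )
    if requested != RECOMMENDED_PAGE_SIZE:
        # Rule 2: warn, but still honour the value the user asked for.
        warnings.warn(
            f"page size {requested} differs from the recommended "
            f"{RECOMMENDED_PAGE_SIZE}; some feeds may not handle it well"
        )
    # Rule 1: the user-provided value is used as-is.
    return requested
```

Under these assumptions, resolve_page_size(40) returns 40 with a warning, resolve_page_size(30) returns 30 silently, and resolve_page_size(101) raises an error. Handling of zero or negative values is not covered by the plan and is left out here.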

The best recommendation would be to introduce the Sleet.Search package, as mentioned earlier.

@gep13 gep13 added this to the 2.4.0 milestone Aug 21, 2024
@prsshini
Author

@gep13 Thank you for your detailed response. Just to confirm your question, we are not using Sleet.Search and just using the default sleet search. So your assumption is right.

In terms of the fix that you are proposing, when you say page-size, it doesn't literally mean "page", correct? It means the number of packages it can display in one query.
If that assumption is right, why do you recommend having an error when page size is greater than 100?

When we use two sources (primary and secondary), the number of distinct packages across both sources can exceed 100. Isn't that a common business use case?

When do you think the proposed fixes will be pushed?

In the meantime, we will try to explore Sleet.Search to see if it fits our need.

Thanks for your time and support.

@gep13
Member

gep13 commented Aug 22, 2024

@prsshini said...
In terms of the fix that you are proposing, when you say page-size, it doesn't literally mean "page", correct? It means the number of packages it can display in one query.

Here, I am referring to the number of results that should be returned from the search query. For example, if I did:

choco search chocolatey --page-size=5

I would expect 5 results to be returned, even if there were more than 5 results available.

While this paging is typically done on the Repository Manager (but as described, Sleet isn't doing this), the truncation to that page size is also done on the client side, to make sure that the correct number of requested results are returned.

@prsshini said...
If that assumption is right, why do you recommend having an error when page size is greater than 100?

The choco search command is a command that does a search. It is attempting to filter down the results of a query to a manageable amount of information. The search command is not intended to be used to enumerate all the packages that are available on a given source. As such, in the 2.x release of Chocolatey CLI, we introduced a number of safeguards to prevent it being used like this. That is why you will see a maximum of 1000 packages returned from each source that is queried, and a maximum of 100 for the page size.

@prsshini said...
When do you think the proposed fixes will be pushed?

I have added this issue to the 2.4.0 milestone, which will be the next release of Chocolatey CLI, but I can't offer any indication on when this will be released.

@prsshini said...
In the meantime, we will try to explore Sleet.Search to see if it fits our need.

I believe that will be the best course of action, given your use case here.

@corbob
Member

corbob commented Oct 31, 2024

While I have a script running to populate my test repository, I want to expand on some of the steps @gep13 mentioned here: #3502 (comment)

Between steps 1 and 3 I found I needed to initialize the sleet configuration. I followed the steps here: https://github.com/emgarten/Sleet/blob/main/doc/feed-type-local.md

In particular I needed:

sleet createconfig
notepad sleet.json # edit the config to match their example
sleet init --config .\sleet.json --source myLocalFeed

The contents of my sleet.json:

{
  "username": "",
  "useremail": "",
  "sources": [
    {
      "name": "myLocalFeed",
      "type": "local",
      "path": "C:\\myFeed",
      "baseURI": "http://localhost/"
    }
  ]
}

Then for step 4, I opened C:\myFeed in VSCode and used Express as Gary mentioned.

corbob added a commit to corbob/choco that referenced this issue Oct 31, 2024
If a page size has been specified, use it. Also emit a warning if it's
something other than 30 as there are known issues with some feeds.
@corbob corbob reopened this Oct 31, 2024
corbob added a commit to corbob/choco that referenced this issue Nov 1, 2024
The integration test for PageSize broke because the message was changed
in chocolatey#3507. This updated the test to use the new wording.
corbob added a commit to corbob/choco that referenced this issue Nov 4, 2024
Add tests to exercise the messages added for page sizes and ensure they
are behaving as expected.
vexx32 added a commit that referenced this issue Nov 5, 2024
(#3502) Return the PageSize if specified
@vexx32 vexx32 added 4 - Done and removed 3 - Review labels Nov 5, 2024
corbob added a commit to corbob/choco that referenced this issue Nov 6, 2024
The CommandName is null when the list method is called during alternate
source commands.
corbob added a commit to vexx32/choco that referenced this issue Nov 7, 2024
The Test Kitchen environment has both `hermes` and `hermes-all` enabled.
This results in a duplication of the warning message. This disables all
the sources and then enables just the `hermes` source.
@vexx32 vexx32 changed the title from "With S3bucket as Sleet source choco cannot list more than 30 packages" to "Use the user-provided page size when querying repositories" Nov 12, 2024
@vexx32
Member

vexx32 commented Nov 12, 2024

🎉 This issue has been resolved in version 2.4.0 🎉

The release is available on:

@vexx32 vexx32 removed the 4 - Done label Nov 13, 2024