HTTP Caching - metadata endpoint #207

Closed
giggsey opened this issue Jun 29, 2020 · 9 comments · Fixed by #229 or #247
Labels: enhancement (New feature or request)
Milestone: 0.5.0

giggsey (Contributor) commented Jun 29, 2020

I've tried running a Laravel project (with a few internal packages) through a self-hosted install, and then through the hosted version.

Both repman installs take around 30 minutes to run with composer v1, compared to about 25 seconds when using packagist directly.

I believe the reason is that there are no cache headers on any of the repman responses.

> curl -I https://packagist.org/p/illuminate/console.json   
HTTP/2 200 
server: nginx
date: Mon, 29 Jun 2020 12:02:11 GMT
content-type: application/json
content-length: 23256452
last-modified: Mon, 29 Jun 2020 05:11:16 GMT
vary: Accept-Encoding
etag: "5ef977f4-162dd84"
access-control-allow-origin: *
access-control-allow-methods: GET
access-control-allow-headers: X-Requested-With,If-Modified-Since
accept-ranges: bytes
> curl -I https://repo.repman.io/p/illuminate/console
HTTP/2 200 
content-type: application/json
content-length: 0
server: nginx
cache-control: no-cache, private
date: Mon, 29 Jun 2020 12:03:07 GMT
x-cache: Hit from cloudfront
via: 1.1 b4d3f424b1e6960b9f71e8cf3b9e1a57.cloudfront.net (CloudFront)
x-amz-cf-pop: ICN51-C1
x-amz-cf-id: XfTtcCR__cGq15VB5rnQGIH7SeG1v7sx_YKjrr28pNzCqcG6rAAOJA==
age: 6
> curl -I https://repo.repman.myinternaldomain.local/p/illuminate/routing
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 29 Jun 2020 12:04:45 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Powered-By: PHP/7.4.5
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block

This one file alone is 25MB, and my Laravel install ends up downloading 500MB every time I run a composer update.
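Just to illustrate what those validators buy: a client that remembers the etag can revalidate instead of re-downloading the 25MB body every time. A rough sketch of that flow (using symfony/http-client purely for illustration; this is not composer's actual code, and the cache path is made up):

```php
<?php
// Sketch only: revalidate a cached metadata file using its ETag instead of
// re-downloading the full body. The cache path and ETag value are illustrative.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
$cachedEtag = '"5ef977f4-162dd84"'; // remembered from a previous response

$response = $client->request('GET', 'https://packagist.org/p/illuminate/console.json', [
    'headers' => ['If-None-Match' => $cachedEtag],
]);

if ($response->getStatusCode() === 304) {
    // Nothing changed upstream: reuse the locally cached file, no body transferred.
    $metadata = file_get_contents('/tmp/metadata-cache/illuminate-console.json');
} else {
    // Changed (or no validator matched): store the fresh body and the new ETag.
    $metadata = $response->getContent();
    file_put_contents('/tmp/metadata-cache/illuminate-console.json', $metadata);
    $cachedEtag = $response->getHeaders()['etag'][0] ?? null;
}
```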

akondas (Member) commented Jun 30, 2020

The size of these files is a real problem. We've noticed it before, but at the moment I don't know what a good solution would be (this applies to several other packages in the illuminate namespace as well).

Yes, there is no HTTP cache on the metadata endpoint at the moment (and for private packages there won't be one). Nevertheless, I find this time comparison hard to believe, because composer caches this metadata itself, so a normal composer install should take a similar amount of time (or even favour repman, since the public instance sits behind a CloudFront cache, which is faster). How were these times measured?

As for the metadata cache (https://repo.repman.io/p/vendor/package), I think we can change the endpoint so that it exists. The downside is that there may be a slight delay before a new package shows up.

@akondas akondas changed the title HTTP Caching HTTP Caching - metadata endpoint Jun 30, 2020
@akondas akondas added the enhancement New feature or request label Jun 30, 2020
giggsey (Contributor, Author) commented Jun 30, 2020

Doing a basic Laravel install using both packagist and repo.repman.io:

Packagist

{
    "require": {
        "php": "^7.2.5",
        "fideloper/proxy": "^4.2",
        "fruitcake/laravel-cors": "^2.0",
        "guzzlehttp/guzzle": "^6.3",
        "laravel/framework": "^7.0",
        "laravel/tinker": "^2.0"
    }
}

 /tmp/composer  composer update --profile --no-plugins
[7.0MiB/0.04s] Loading composer repositories with package information
[7.4MiB/0.18s] Updating dependencies (including require-dev)
[421.6MiB/11.42s] Package operations: 60 installs, 0 updates, 0 removals
[420.4MiB/63.59s] Memory usage: 420.42MiB (peak: 720.59MiB), time: 63.59s

repo.repman.io

{
    "repositories": [
        {"type": "composer", "url": "https://repo.repman.io"},
        {"packagist": false}
    ],
    "require": {
        "php": "^7.2.5",
        "fideloper/proxy": "^4.2",
        "fruitcake/laravel-cors": "^2.0",
        "guzzlehttp/guzzle": "^6.3",
        "laravel/framework": "^7.0",
        "laravel/tinker": "^2.0"
    }
}

/tmp/composer  composer update --profile --no-plugins
[7.1MiB/0.04s] Loading composer repositories with package information
[7.4MiB/1.73s] Updating dependencies (including require-dev)
[246.0MiB/240.48s] Package operations: 60 installs, 0 updates, 0 removals
[244.9MiB/315.79s] Memory usage: 244.91MiB (peak: 545.03MiB), time: 315.79s

The example from my initial comment had more packages in it, as well as some internal ones.

giggsey (Contributor, Author) commented Jun 30, 2020

> The size of these files is a real problem. We've noticed it before, but at the moment I don't know what a good solution would be (this applies to several other packages in the illuminate namespace as well).

I noticed with your docker install that this endpoint often ran out of memory while writing the JSON file locally. Could you increase the memory limit for PHP-FPM? (~~I couldn't find a Dockerfile for the image~~ I'm an idiot, and have now found it.)

> Yes, there is no HTTP cache on the metadata endpoint at the moment (and for private packages there won't be one). Nevertheless, I find this time comparison hard to believe, because composer caches this metadata itself, so a normal composer install should take a similar amount of time (or even favour repman, since the public instance sits behind a CloudFront cache, which is faster). How were these times measured?

composer install is fine, but since we already use a local satis setup, that case is no different for us.

> As for the metadata cache (https://repo.repman.io/p/vendor/package), I think we can change the endpoint so that it exists. The downside is that there may be a slight delay before a new package shows up.

The files are already being written to disk, so there just needs to be an HTTP cache check when serving them. A cron that regularly revalidates the ETag against packagist.org should keep that job quick.
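Roughly what I mean for the serving side, as a sketch only (not repman's actual controller; the class name and file path are made up):

```php
<?php
// Sketch: serve an already-generated provider JSON file with HTTP validators.
// ProviderMetadataController and the /var/repman path are hypothetical.

use Symfony\Component\HttpFoundation\BinaryFileResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

final class ProviderMetadataController
{
    public function provider(Request $request, string $vendor, string $package): Response
    {
        // Hypothetical location of the file that is already written to disk.
        $path = sprintf('/var/repman/p/%s/%s.json', $vendor, $package);

        $response = new BinaryFileResponse($path);
        $response->headers->set('Content-Type', 'application/json');
        $response->setPublic();
        $response->setAutoEtag();          // ETag derived from the file contents
        $response->setAutoLastModified();  // Last-Modified from the file's mtime

        // Turns the response into an empty 304 when the client's
        // If-None-Match / If-Modified-Since validators still match.
        $response->isNotModified($request);

        return $response;
    }
}
```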

akondas (Member) commented Jul 15, 2020

Hey @giggsey, so:

  • @karniv00l is currently working on the memory-limit task for the docker images
  • I'm personally working on the metadata endpoint (with inspiration from @jonnyynnoj): it looks like we'll be able to return these cache headers from the cached files while still keeping the ability to update the files on the fly when needed (as a bonus, this will make it easier to switch to flysystem, which ships with a dozen or so adapters, such as S3; see the sketch just below this list)
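Very roughly, the flysystem idea looks like this (a sketch using the Flysystem v1 API; the paths and bucket name are placeholders, not repman's real configuration):

```php
<?php
// Sketch: the same metadata storage code can target local disk or S3 just by
// swapping the Flysystem adapter (Flysystem v1 API; names are placeholders).
require __DIR__ . '/vendor/autoload.php';

use Aws\S3\S3Client;
use League\Flysystem\Adapter\Local;
use League\Flysystem\AwsS3v3\AwsS3Adapter;
use League\Flysystem\Filesystem;

// Local adapter, e.g. for a self-hosted install:
$storage = new Filesystem(new Local('/var/repman/metadata'));

// ...or an S3 adapter for a hosted instance:
// $s3 = new S3Client(['version' => 'latest', 'region' => 'eu-west-1']);
// $storage = new Filesystem(new AwsS3Adapter($s3, 'example-metadata-bucket'));

// The endpoint code only talks to the Filesystem abstraction:
$path = 'p/illuminate/console.json';
if ($storage->has($path)) {
    $json = $storage->read($path);
    $lastModified = $storage->getTimestamp($path); // feeds the Last-Modified header
}
```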

I intentionally don't want to build a mechanism that refreshes these files on a schedule. First, it's unnecessary. Second, I've seen the composer developers ask that mirrors not poll every second (or at any other fixed interval) for all packages.

giggsey (Contributor, Author) commented Jul 15, 2020

Sounds good.

I know repman mirrors the packages, but will it fall back to the local copy of the provider JSON if it can't reach packagist to check whether the file is stale?

akondas (Member) commented Jul 16, 2020

Good question: not at the moment; only downloaded distribution files work that way. But since you've mentioned it, I'll add this feature straight away: if the fresh metadata file can't be downloaded, the old one will be returned.
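Conceptually something like this (a sketch only; downloadFreshMetadata() and the $storage object are hypothetical stand-ins, not repman's real code):

```php
<?php
// Sketch of the stale-fallback behaviour described above: try to refresh the
// metadata from upstream, and serve the previously stored copy if that fails.
// downloadFreshMetadata() and $storage are hypothetical stand-ins.

function metadataFor(string $vendor, string $package, $storage): string
{
    $path = sprintf('p/%s/%s.json', $vendor, $package);

    try {
        $fresh = downloadFreshMetadata($vendor, $package); // e.g. HTTP GET upstream
        $storage->put($path, $fresh);                      // refresh the stored copy

        return $fresh;
    } catch (\Throwable $e) {
        // Upstream unreachable: fall back to the last copy we stored, if any.
        if ($storage->has($path)) {
            return $storage->read($path);
        }

        throw $e; // nothing cached either, surface the error
    }
}
```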

@akondas akondas added this to the 0.5.0 milestone Jul 16, 2020
akondas (Member) commented Jul 16, 2020

Hey @giggsey, you can test our latest tag with the increased memory limit (#219 merged).

akondas (Member) commented Jul 21, 2020

Hey @giggsey, we've just deployed the HTTP cache changes. You can check whether performance is better now 😉

giggsey (Contributor, Author) commented Jul 27, 2020

I'm getting a 304 when hitting https://repo.repman.io/p/symfony/http-foundation, but the body is still being transmitted, so the response time for me is still ~5s.

TTFB is around 260ms, which is fine.

Testing against my local repman instance (via a VPN) still takes around 1000 seconds for a Laravel project.

akondas pushed a commit that referenced this issue Aug 19, 2020
* Only add the Response stream if the Request is a cache miss

Fixes #207

* Cache downloads / package list

Response::isNotModified() doesn't need to be wrapped in the if statement, as it'll remove the content (even for a StreamedResponse)
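For reference, the pattern described in that commit message looks roughly like this (a hedged sketch, not the actual repman controller code):

```php
<?php
// Sketch of the pattern from the commit message: attach the streamed body only
// on a cache miss, and let isNotModified() strip the content on a 304.

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;

function streamMetadata(Request $request, string $path): StreamedResponse
{
    $response = new StreamedResponse();
    $response->headers->set('Content-Type', 'application/json');
    $response->setPublic();
    $response->setEtag(md5_file($path));
    $response->setLastModified(\DateTime::createFromFormat('U', (string) filemtime($path)));

    // isNotModified() can be called unconditionally: when the validators match,
    // it converts the response into an empty 304, even for a StreamedResponse.
    if (!$response->isNotModified($request)) {
        // Cache miss: only now attach the callback that streams the body.
        $response->setCallback(static function () use ($path): void {
            readfile($path);
        });
    }

    return $response;
}
```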
akondas pushed a commit that referenced this issue Sep 8, 2020
* Only add the Response stream if the Request is a cache miss

Fixes #207

* Cache downloads / package list

Response::isNotModified() doesn't need to be wrapped in the if statement, as it'll remove the content (even for a StreamedResponse)

* Package List Search

Signed-off-by: Joshua Gigg <[email protected]>

* Package List Search

* CS

* Use ILIKE instead

* Separate filter SQL so it's not always required

* Add tests & fix searching

* Fix PHPStan

* CS