stack setup download incredibly slow #2240

Closed
wereHamster opened this issue Jun 4, 2016 · 62 comments

Comments

@wereHamster

I'm trying to set up stack, but downloading GHC is very slow. I'm getting less than 10Kb/s, but my internet connection is capable of much faster speeds; fast.com reports 21Mb/s. Using Chrome or curl to download the GHC release directly from GitHub (/commercialhaskell/ghc/releases/) is equally slow.

stack version: 1.0.4.3 x86_64

@wereHamster
Author

If I proxy all HTTP traffic through my Google Cloud instance (running somewhere in Europe), the download speed increases to more than 2Mb/s, which makes it possible to set up stack in a reasonable time.

I also noticed that stack doesn't respect the OS X proxy settings; I had to set them explicitly on the command line: $ http_proxy=host:port https_proxy=host:port stack setup :(
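
For anyone who wants to script this workaround, here is a minimal Python sketch. It assumes stack is on the PATH and that the proxy accepts the same host:port value for both variables; the helper names proxied_env and run_stack_setup are made up for illustration.

```python
import os
import subprocess


def proxied_env(proxy, base=None):
    """Return a copy of the environment with http_proxy and https_proxy set."""
    env = dict(os.environ if base is None else base)
    env["http_proxy"] = proxy
    env["https_proxy"] = proxy
    return env


def run_stack_setup(proxy):
    # Same effect as: http_proxy=host:port https_proxy=host:port stack setup
    return subprocess.run(["stack", "setup"], env=proxied_env(proxy), check=True)
```

Calling run_stack_setup("host:port") would then start the download through the proxy without touching the shell's own environment.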

@zach007

zach007 commented Oct 17, 2016

With stack version 1.2.0, downloading GHC with stack setup is still very slow.

@tolysz
Collaborator

tolysz commented Oct 18, 2016

How are other services running on S3 or Amazon working for you? Network links have limited capacity, so if someone on your network is saturating the same link, it could slow you down.

@srid

srid commented Oct 20, 2016

FWIW I see this issue as well. I'm running stack setup from a Linode VM in a Singapore data centre.

@wereHamster
Author

The problem with GitHub - or more specifically S3 - is that it's not meant to be a CDN. If the Commercial Haskell SIG has an amazon account, they could set up their own S3 + cloudfront distribution to speed this up.

@vlad-shatskyi

One workaround would be to install GHC using your system's package manager, and then Stack will pick it up.

@zongwu233

@wereHamster is right: setting https_proxy can speed up the download.

@seb314

seb314 commented Feb 16, 2017

Reconnecting your home Internet connection might help, if this gives you a new IP.
At least this got me to 33MB out of 109MB...

@stefjoosten

I got the same on Windows. I ran it with the --verbose switch to see progress. After 90 minutes of seeming inactivity I killed the session...
image

@reverofevil

Still a problem. Consider using bittorrent.

@shreyasbharath

I am having the same issue where downloading GHC is really slow (speeds of < 10 KB/sec).

Also, it takes forever to update the package index; it has been stuck on the step below for hours.

Updating package index Hackage (mirrored at https://s3.amazonaws.com/hackage.fpcomplete.com/)

Are there any workarounds?

@teh-monad

The download is stuck at about 50 KiB/s.
I tested other stack versions (even on Windows); the problem is still there.

Receiving objects: 4% (11988/264752), 1.44 MiB | 51.00 KiB/s

@reverofevil

@shreyasbharath @teh-monad I've posted a link up there with a workaround.

@mgsloan
Contributor

mgsloan commented Oct 2, 2017

It is possible to manually download and install GHC. Things are a little trickier on Windows, since the setup process is more involved and also installs msys and other tools.

I, and I think most others that work on stack see consistently good download speeds. So, it is very difficult for us to work on this problem. If you can please try to diagnose what the problem is, that would be helpful. Perhaps it is just due to geographical location? Or is it possible for the code to do something better that will make the download faster?

Still a problem. Consider using bittorrent.

The problem with this is who will seed? Do we force a particular client? Using bittorrent for things like this is a big can of worms.

@reverofevil

@mgsloan

The problem with this is who will seed?

  1. Your server will seed. When something has just been released, you might enable superseed mode to take the load off the servers fast.
  2. Your clients will seed. It's no big deal to leave torrent client seeding. This is a common practice.
  3. Probably major suppliers like GHC will finally make their software available as torrents. Haskell Platform has the same problem with a 100KiB/s download rate, and currently they don't care either.

Do we force a particular client?

Just add an option for a shell/batch command to run the client.

Using bittorrent for things like this is a big can of worms.

Like what? Why, for example, Ubuntu doesn't mind sharing their ISOs as torrents?

@mgsloan
Contributor

mgsloan commented Oct 3, 2017

Your clients will seed. It's no big deal to leave torrent client seeding. This is a common practice.

Uhh, so all the stack users will seed the torrent? Seems iffy. If you make it opt-in probably few will enable it. If you make it opt-out, we'll frustrate people that don't expect us to start using their upload bandwidth all the time.

Just add an option for a shell/batch command to run the client.

I guess so, we'd have to make some assumptions about the arguments these clients expect, where they put the files etc. etc. If you think it's easy, please open a PR adding it as an optional feature.

Like what? Why, for example, Ubuntu doesn't mind sharing their ISOs as torrents?

That's a very different case. I don't think Ubuntu has its torrent support integrated into an install utility. For example, does apt-get use torrents? I don't think so.

@ravshanof

We should probably try it with a VPN.

@junjihashimoto
Contributor

Could you use a CDN like CloudFront instead of S3 (https://s3.amazonaws.com/hackage.fpcomplete.com/)?
In addition to being faster, CDN traffic is cheaper than S3 traffic:
https://aws.amazon.com/s3/pricing/
Up to 10 TB / month | $0.090 per GB
https://aws.amazon.com/cloudfront/pricing/
First 10 TB / month | $0.085 per GB

By the way, GHC (downloads.haskell.org) uses a CDN (Fastly).
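
To put the two per-GB prices quoted above side by side, here is a small worked example in Python. The 10 TB monthly volume is an assumption chosen only to fill the first pricing tier, and 1 TB is counted as 1,000 GB for simplicity; actual billing details may differ.

```python
from decimal import Decimal

# Per-GB prices as quoted in this comment (first 10 TB / month tier).
S3_USD_PER_GB = Decimal("0.090")
CLOUDFRONT_USD_PER_GB = Decimal("0.085")


def monthly_cost(gigabytes, usd_per_gb):
    """Cost in USD for the given outbound traffic at a flat per-GB price."""
    return Decimal(gigabytes) * usd_per_gb


traffic_gb = 10_000  # assumption: 10 TB of traffic, counted as 10,000 GB
s3 = monthly_cost(traffic_gb, S3_USD_PER_GB)
cf = monthly_cost(traffic_gb, CLOUDFRONT_USD_PER_GB)
print(f"S3: ${s3}  CloudFront: ${cf}  difference: ${s3 - cf}")
```

At these rates the price difference is small, so the main argument for a CDN would be download speed rather than cost.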

@borsboom
Contributor

borsboom commented Dec 4, 2017

Agree, using CloudFront would be a good thing to look into.

@mtomko

mtomko commented Dec 12, 2017

I've had stack updating the package index for 8 hours on a fast internet connection. I can't quite tell what's going on, but it seems like my 100Gbs home internet should be plenty fast.

@cdepillabout
Member

This can be a big problem in Japan. It really depends on the time of day, but I've had stack setup take over an hour at times.

It can be frustrating at work to wait that long to start building the project I'm trying to work on.

I think using a CDN would really help.

@ProofOfKeags

Just ran into this as well. Do we know what is causing it?

@khoparzi

For the moment I'm dealing with this on OSX with
brew install ghc

A better solution would be welcome.

@Programmerino

It looks like there are mirrors for each release at: https://www.haskell.org/ghc/download_ghc_8_2_2.html#linux_x86_64 (8.2.2 for example). I cannot tell if the speeds are much different from GitHub from my current connection...

Also, I think that some kind of torrent solution would be great; however, I'm not sure of the exact implementation. One way is just to use the preexisting magnet URI scheme (magnet:) to open the default torrent client, but I think this should only be activated by a command-line parameter like --torrent or something like that. Of course, it would probably be much easier and possibly faster just to move downloads off GitHub to a faster download source, or at least to provide one.

@rgrinberg

Is there any workaround for this? The download is unbearably slow. I'm willing to download things manually, but I'm not sure how to make stack aware of a local tarball.

@sullyj3

sullyj3 commented Jun 6, 2018

+1 from me, almost every time I've tried to get started learning stack, setup has taken several hours and seemed to get stuck, and I've had to kill it. A workaround would be awesome.

@klausmyrseth

I've been testing out stack, but it's not usable in a commercial environment at the moment because of this. All downloads and installs are slower than syrup; it feels like I'm on a modem, even though I have a 1 Gbit line with an amazing connection to Google, Amazon and others. I also have a fixed IP.

This is an issue you really should focus on. Just installing ghc took 72 minutes for the project and hlint as much as 13 minutes :O

@ProofOfKeags

Is this an issue with the bandwidth of hackage mirrors? If so, would extra mirrors help? I'd be amenable to setting one up assuming the cost can stay bounded for us and that load could be appropriately balanced. The bittorrent idea mentioned up thread might also be a fantastic way to deal with this as it would allow non-organizational support of hackage redundancy. I understand this would probably be a considerable refactor as well, and it may not even belong in stack proper, but rather cabal or something.

Thoughts?

@eyeinsky

eyeinsky commented Dec 8, 2020

Is https://s3.amazonaws.com/hackage.fpcomplete.com/ currently inaccessible? From several machines I get this response to curl https://s3.amazonaws.com/hackage.fpcomplete.com/ :

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>115F1B801E2B1DE4</RequestId><HostId>SU7KgJQXclkgBpoBBrBQIgzqfWML6jo70edlPmhsB9mKLbB2zO0CiRTl6hQQ5g1IVnA80d8CmTU=</HostId></Error>

@debug-ito

I'm having the same problem. stack setup (downloading a GHC package of about 140MB) took 2-3 hours. Although my internet connection is not that fast, this is a lot slower than other traffic. Downloading and updating the package index is also very slow (i.e. stack build is stuck at the Downloading index message). I observed only 10-15 kbps.

@eyeinsky I got the same response. Is this expected behavior?

@eyeinsky

@debug-ito I have no idea, sorry.

Even though this issue happens rarely (for me, maybe once a year), it would be great to get to the bottom of it and to have an easy alternative to use whenever e.g. AWS S3 is not accessible (if that is the reason at all).

@debug-ito

@eyeinsky I used the following mirrors by Tsinghua University, and it worked well.

@martijnbastiaan

martijnbastiaan commented Jan 15, 2021

I've been experiencing similar problems, although not quite as bad as @kim's (over at #5471). In hopes of seeing a pattern, I've set up a script that periodically downloads GHC from GitHub. To establish a baseline, it first downloads a Xubuntu ISO from a server outside the country, then downloads GHC. Here's the raw data collected so far. Plotted:

Figure_1

Figure_2

(This would have been nicer as a scatter plot, but I couldn't quickly figure out how to do that properly.)

A few things to note:

  • Between 2021-01-14 19:00+01:00 and 2021-01-14 23:15+01:00 stack setup would have been completely unusable for me.
  • If it's not completely unusable, download speeds seem to ping-pong between either a slow speed at 1.5-2.5 MiB/s or "high" speeds at 10-15 MiB/s with few datapoints outside of those buckets.
  • When downloading GHC, it ultimately connects to github-production-release-asset-2e65be.s3.amazonaws.com, whose ping times suggest it's located in US-East, while I'm in Europe-West.

This is on my home connection, but I've seen similar speeds on office connections and VPSs in proper datacentres. This behavior is worrisome, as Stack sadly remains the only beginner-friendly way to use Haskell on all platforms IMHO.

@mgsloan Would it be possible for you or other Stack developers to run these scripts too?

  • download.sh: downloads Xubuntu/GHC and logs to a log file
  • systemd-run --on-calendar='*:0/15' --uid=user /home/user/speedtest/download.sh: runs the script above every 15 minutes
  • plot.py: parses log files / plots graphs
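
For readers who cannot open the linked scripts, the measurement idea can be sketched in Python roughly as follows. This is not the actual download.sh; the function names and the log format are assumptions made for illustration, and the URL to fetch is left to the caller.

```python
import time
import urllib.request
from datetime import datetime, timezone

MIB = 1024 * 1024


def throughput_mib_per_s(num_bytes, seconds):
    """Average throughput in MiB/s for a finished download."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / MIB / seconds


def log_line(label, num_bytes, seconds, now=None):
    """Format one log entry: timestamp, label, measured speed."""
    now = now or datetime.now(timezone.utc)
    speed = throughput_mib_per_s(num_bytes, seconds)
    return f"{now.isoformat()} {label} {speed:.2f} MiB/s"


def timed_download(url, chunk_size=1 << 16):
    """Download url, discard the data, return (bytes read, elapsed seconds)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start
```

Running timed_download once for a baseline file and once for the GHC tarball, and appending both log_line results to a file, would give the kind of data plotted above.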

@qrilka
Contributor

qrilka commented Jan 16, 2021

@martijnbastiaan maybe I'm wrong, but it looks like this has to do with how GitHub itself is configured to store release assets.
Also https://github.com/commercialhaskell/ghc is a fork of GHC so Stack developers have no direct control over it.

@martijnbastiaan

martijnbastiaan commented Jan 16, 2021

You're right, this does seem to be an issue with GitHub assets @qrilka. I've made a few VPSs around the world to see if geographical location matters at all. The results:

  • New York: 100 MB/s
  • Amsterdam: 30 MB/s
  • Bangalore: 10 MB/s

So, it always seems to be fetching from US servers, which is bad. My guess is that smaller ISPs have worse peering agreements, leading to the behavior I and other people in this thread have been seeing.

Also https://github.com/commercialhaskell/ghc is a fork of GHC so Stack developers have no direct control over it.

Yeah, I think Stack should just use https://downloads.haskell.org/~ghc/, which is backed by a proper CDN (Fastly).

edit: This commit changed asset fetching from haskell.org to github.com, but it doesn't say why.

@qrilka
Contributor

qrilka commented Jan 16, 2021

@borsboom probably you know the reason GHC is fetched from Github and not Haskell.org?

martijnbastiaan added a commit to martijnbastiaan/stackage-content that referenced this issue Jan 16, 2021
GitHub serves files from a single geographical location, severely reducing
download speeds for non-US users. Even worse, some (European) ISPs seem
to have peering issues limiting download speeds to sub-1 Mbps at times.

This commit therefore changes 'stack-setup-2.yaml' to use
'downloads.haskell.org' instead of GitHub Assets. At the time of
writing, the former uses a proper CDN (Fastly) to deliver content which
should result in consistent speeds around the globe.

Discussion on:

  commercialhaskell/stack#2240
@martijnbastiaan

martijnbastiaan commented Jan 16, 2021

@qrilka @borsboom I've gone ahead and submitted a PR that changes the URLs to use downloads.haskell.org. I believe that should fix the issues experienced in this thread for the majority of users (Windows / "standard" Linux / MacOS). Feel free to close it if using GitHub Assets is more appropriate.

@borsboom
Contributor

probably you know the reason GHC is fetched from Github and not Haskell.org?

Originally this was because downloads.haskell.org was slow and unreliable. Since then they've done a lot of work on it and put it behind a CDN, so I think that's resolved. Really just historical momentum was keeping us using the github.com mirror. I think switching back to downloads.haskell.org makes sense at this point (if there does turn out to be a problem, we can always switch back).

@kim

kim commented Jan 16, 2021

Do you think there is a chance that haskell.org would also host binaries of stack itself (re #5471)?

@martijnbastiaan

martijnbastiaan commented Jan 17, 2021

With commercialhaskell/stackage-content#82 merged, Stack now pulls GHC from downloads.haskell.org which should work properly around the globe :). I tried to test this, but sadly Stack got stuck once more, this time on:

2021-01-17 12:14:16.497805: [info] Selected mirror https://s3.amazonaws.com/hackage.fpcomplete.com/
2021-01-17 12:14:16.499561: [info] Downloading timestamp
2021-01-17 12:14:17.065309: [info] Downloading snapshot
2021-01-17 12:14:17.167117: [info] Downloading mirrors
2021-01-17 12:14:17.358902: [info] Cannot update index (no local copy)
2021-01-17 12:14:17.359185: [info] Downloading index

s3.amazonaws.com/hackage.fpcomplete.com is only served from one geographical location and behaves the same as seen here. I haven't been able to find where Stack gets https://s3.amazonaws.com/hackage.fpcomplete.com/ from. It seems to be present in neither commercialhaskell/stack nor commercialhaskell/stackage-content nor in mirrors.json on Hackage. If one of the Stack developers could point me in the right direction, that would be wonderful.

I very much care about this issue, as I think it's crucial for Haskell to have a simple and reliable way of building projects on all major platforms. Stack is currently the only way to achieve that, but with this issue present, it's only reliable in the US.

@borsboom @qrilka Is there anything I can do to help move this issue along? As I said, I care about this issue and I'm willing to put in work to close it.


To properly close the issue we should remove reliance on all non-CDN sources. This is everything I could find:

@qrilka
Contributor

qrilka commented Jan 18, 2021

@martijnbastiaan the S3 mirror you get by default comes from https://www.stackage.org/haddock/lts-16.31/pantry-0.4.0.2/src/Pantry.html#defaultHackageSecurityConfig
I think there were some arguments for using this S3 mirror (Hackage wasn't reliable enough), but I'm not very familiar with the details.

@m1nhtu99-hoan9

m1nhtu99-hoan9 commented Jan 19, 2021

I found this Stackage GHC mirror.

To fix the problem of Stack taking forever to download GHC when running stack install:

  1. Run stack install --verbose instead of stack install. What you need is this debug message (shown here as it appears on my local machine):
[debug] Downloading from https://github.com/commercialhaskell/ghc/releases/download/ghc-7.10.3-release/ghc-7.10.3-x86_64-fedora24-linux-patch1.tar.xz to /home/mnhthng/.stack/programs/x86_64-linux/ghc-tinfo6-7.10.3.tar.xz ...

Terminate Stack for now with Ctrl + C.

  2. Go to the mirror I mentioned above and look for your desired tar.xz file (in my case, ghc-7.10.3-x86_64-fedora24-linux-patch1.tar.xz). Download it (which is much faster than the default download through the Stack CLI).

  3. Move the downloaded package to your local stack folder and rename it as indicated by the debug message from step 1.

  4. Now when you run stack install, Stack will be smart enough to realise the package has been put there, and it will do the unpacking and the rest for you.
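
The renaming in the steps above can be sketched in Python as follows. This is only an illustration: it assumes the debug line keeps the "Downloading from URL to PATH" shape shown in the first step, and the helper names parse_download_line and place_tarball are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches the verbose-mode line "[debug] Downloading from <url> to <path> ..."
DEBUG_RE = re.compile(r"Downloading from (\S+) to (\S+)")


def parse_download_line(line):
    """Extract the source URL and the destination path Stack expects."""
    match = DEBUG_RE.search(line)
    if match is None:
        raise ValueError("not a 'Downloading from ... to ...' line")
    url, dest = match.groups()
    return url, Path(dest)


def place_tarball(downloaded, dest):
    """Move a manually downloaded tarball to the path Stack expects."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloaded), str(dest))
    return dest
```

The file name to look for on the mirror is the last path segment of the parsed URL, and the second value is where the downloaded file has to end up before running stack install again.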

Peace!

@qrilka
Contributor

qrilka commented Jan 19, 2021

You could achieve the same just by setting a proper value for setup-info-locations @mnhthng-thms without extra tricks with Stack termination (I guess you meant Stack and not Stackage in "Terminate Stackage for now")

@m1nhtu99-hoan9

m1nhtu99-hoan9 commented Jan 19, 2021

You could achieve the same just by setting a proper value for setup-info-locations @mnhthng-thms without extra tricks with Stack termination (I guess you meant Stack and not Stackage in "Terminate Stackage for now")

@qrilka I did point package-indices to another mirror rather than the default one. I didn't know that I could configure setup-info-locations as you recommended.

"The proper value" for setup-info-locations in my case would be:

setup-info-locations:
  - https://mirrors.cloud.tencent.com/stackage/stack-setup.yaml

Thanks for suggesting!

martijnbastiaan added a commit to martijnbastiaan/pantry that referenced this issue Jan 20, 2021
's3.amazonaws.com' is not meant to serve clients around the world, as it
is hosted in the US only. This causes bandwidth to be severely throttled
for, amongst others, European users. This has been observed by users in
the following issue:

  commercialhaskell/stack#2240

At some periods, bandwidth drops to effectively zero, rendering Stack
unusable. Measurements can be found here:

  commercialhaskell/stack#2240 (comment)

'hackage.haskell.org' is served from Fastly's CDN, therefore mitigating
these issues.
@hseg

hseg commented Sep 22, 2021

An additional advantage of @m1nhtu99-hoan9's suggestion is that it permits using
cleverer clients than http-client which are optimized for segmented downloads,
e.g. aria2c.
OTOH, in trying to gather data for an anecdotal example, I was hit by
this inconsistency in download times: between one invocation and another,
download times varied by tens of minutes.
So this is probably something better solved server-side.

@mpilgrem
Member

I am closing this issue as Stack gets modern versions of GHC from downloads.haskell.org.

@andreasabel

@mpilgrem: I notice the issue is still open. Maybe you forgot to hit the "Close" button?

@mpilgrem mpilgrem closed this as completed May 8, 2023