[Question] Consider migration from GitHub #9

Closed
3 tasks done
FireMasterK opened this issue Oct 24, 2020 · 83 comments
Labels
question Further information is requested

Comments

@FireMasterK

FireMasterK commented Oct 24, 2020

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question

Have you considered migrating from GitHub to prevent this entire situation from happening again?

A mirror is now available at https://git.kavin.rocks/kavin/yt-dlc
Tweet regarding the CEO trying to bring back youtube-dl: https://nitter.kavin.rocks/t3rr4dice/status/1320660235363749888

@FireMasterK FireMasterK added the question Further information is requested label Oct 24, 2020
@ssokolow

If so, I'd suggest a European-hosted alternative, like the Gitea instance at Codeberg.org.

@xloem

xloem commented Oct 25, 2020

https://git-annex.branchable.com/ uses a solution where issues etc are stored in the same git repository as the source code.

@FormerlyChucks

just here to say fuck youtube

@Redo11

Redo11 commented Oct 25, 2020

The fact that this is still here is amazing. I really want this source code to be backed up somewhere safe, but be aware that self-hosted options put you on the front line of dealing with DMCA notices. The safest way would be to use something so obscure that they won't even bother claiming it.

@ssokolow

@Redo11 Aside from Codeberg, here are a few more options I've since learned of:

  • Gitee is based in China.
  • Git-SSB is a decentralized solution which self-hosts its own development on the Secure Scuttlebutt overlay network.
  • CodeFuse is apparently a GitHub clone sitting on top of WebIndexP2P, but I haven't yet figured out if it's open-source.

Also, SourceForge is so desperate to stay relevant that they welcome people using SourceForge to set up supplementary services for projects hosted elsewhere, which might help with DMCAs since the code and the non-code hosting would be independent. (For the common case of projects on GitHub, they even have a wizard for creating a SourceForge project to supplement a GitHub repo.)

@xloem

xloem commented Oct 25, 2020

@ssokolow @Redo11

more things, not in a particular order

@Seirdy

Seirdy commented Oct 25, 2020

a simpler alternative would be de-coupling the issue-tracking from the repo. several mirrors could be put in place (github, gitlab, sourcehut, bitbucket, codeberg, notabug.io, repo.or.cz, a bunch of self-hosted instances) so you get a "hydra" effect. Git lets you work with multiple remotes out of the box.

Then the issue becomes figuring out how to keep issue-tracking resilient. The best solution would be mailing lists; since everybody has a copy of everything in their maildir, the mailing list is fully distributed/decentralized just like git. The RIAA is less likely to come for a mail server, and if they do, migration is trivial if you have a maildir or MBOX.

@ssokolow

ssokolow commented Oct 25, 2020

@Seirdy Dear God please no. I despise when circumstances force me to join a mailing list... unless you're willing to guarantee the availability of an NNTP bridge.

Without an NNTP interface, I usually consider reporting bugs to mailing list-based developers too onerous and work around them locally while waiting for someone else to solve things.

It's bad enough when a blog/comment/mail-form spammer lands on a message template well-enough formed to get past my "I wouldn't want this from a human either" filters and I don't have time to update them right away.

@Seirdy

Seirdy commented Oct 26, 2020

@ssokolow the thing is, most mainstream issue trackers require you to create an account. That in itself is "too onerous" for people who already have an email associated with their git identity. Asking everyone to create a codeberg/gitlab/notabug/sf account probably won't fly.

If you consider emailing a patch too onerous, you could just post a pastebin link to IRC instead. There's no shortage of ways to send a file that don't revolve around putting all your eggs in the same basket that's hosting your repo.

My mailing-list suggestion was predicated on the idea that switching a git remote is extremely trivial, while migrating issues/patches is more tricky. Email is already a proven solution for some of the largest git projects in existence.
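For readers who have never emailed a patch, here is a minimal Python sketch of the workflow, assuming git's own `format-patch` and `send-email` commands. The list address, the `outgoing` directory name and the function names are hypothetical, and `git send-email` needs to be installed and configured (the `sendemail.*` settings) before the second command will work.

```python
import subprocess


def patch_email_commands(list_address, commit_count=1, out_dir="outgoing"):
    """Return the git commands that turn the last N commits into emailed patches."""
    return [
        # Write one .patch file per commit into out_dir.
        ["git", "format-patch", f"-{commit_count}", "-o", out_dir],
        # Mail every patch file in that directory to the list.
        ["git", "send-email", f"--to={list_address}", out_dir],
    ]


def send_patches(list_address, commit_count=1):
    """Run the commands in order, stopping at the first failure."""
    for cmd in patch_email_commands(list_address, commit_count):
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to print and review what would be sent before anything leaves your machine.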

If you have another well-tested solution for de-coupling issue tracking from the code forge and distributing it among participants (so everyone gets an offline copy) without requiring contributors to create yet another account, I'm all ears.

@ssokolow

ssokolow commented Oct 26, 2020

@Seirdy As long as there are separate mechanisms for contributors and mere bug reporters, and bug reporters don't have to do something like registering a throwaway account to keep the mailing list from cluttering up their inbox, I'll be happy with it.

That said, I do remember running across a couple of experiments in retrofitting Git with Fossil-style "bug tracker data stored in the repo" issue trackers. I wonder if any of those are still active.

UPDATE: Maybe git-bug. It's got a WIP web interface that could be extended to meet needs and support for importing and exporting GitHub, Gitlab, and JIRA issues in more than just a "one shot in the beginning" sort of way.

(Heck, extend that to support import/export with Gitea and run a Gitea instance for the public to report bugs at and it'd probably be perfect. Bugs come in on the Gitea and get imported into the repo and pushed out to the clones. If you're worried about the account, mod Gitea to accept something like GitHub OAuth sign-in as an option.)

Failing that, maybe whip up a basic web frontend for git-issue or bug.

UPDATE: I think Bugs Everywhere was one of the ones I originally remembered.

UPDATE: Turns out I blogged about it back in 2011.

@Seirdy

Seirdy commented Oct 26, 2020 via email

@ssokolow

ssokolow commented Oct 26, 2020

As long as I'm not being forced to reinvent "just one thread and mentions elsewhere, please" filtering, and there's a simple unsubscribe link/button that doesn't require me to send an UNSUBSCRIBE e-mail that's formatted just right and/or appears to come from just the right inbound-only alias in my "spam defense via e-mail addresses as revokable API keys" mail system, I'm fine.

@xloem

xloem commented Oct 26, 2020

UPDATE: I think Bugs Everywhere was one of the ones I originally remembered.

A first glance at this solution appears to make sense here. It's written in python, just like this project, so contributors would be at home. It already has a web interface and an email interface.

I don't know why you'd want to decouple the bugs from the code when the project may get disrupted. Combining them (i.e. storing the issues in the code, not on some service) helps preservation and access.

One concern was learning to use these things. Somebody has to put in the effort of setting it up.

Some other suggestions were for tools that bridge between github, gitlab, jira. These have the advantage of not requiring anyone to run a server to provide easy user access.

@BiosPlus

I'd recommend notabug.org

Would also recommend looking into setting up a mirror of the primary repo on a self hosted gitea or gitlab instance hidden behind a domain from njal.la

@rain-1

rain-1 commented Oct 26, 2020

a simpler alternative would be de-coupling the issue-tracking from the repo. several mirrors could be put in place (github, gitlab, sourcehut, bitbucket, codeberg, notabug.io, repo.or.cz, a bunch of self-hosted instances) so you get a "hydra" effect. Git lets you work with multiple remotes out of the box.

Then the issue becomes figuring out how to keep issue-tracking resilient. The best solution would be mailing lists; since everybody has a copy of everything in their maildir, the mailing list is fully distributed/decentralized just like git. The RIAA is less likely to come for a mail server, and if they do, migration is trivial if you have a maildir or MBOX.

Seconded. How about git-bug? https://github.com/MichaelMure/git-bug

@vxbinaca
Contributor

Just do Gitee and call it a day, this magic shit about IPFS and decentralized stuff is overkill. The Chinese don't care.

@xloem

xloem commented Oct 26, 2020

If anyone can actually set any of these up so others can use them, that's what the community really needs here.

@jbruchon
Contributor

Removal of the references to downloading copyrighted material is sufficient. The DMCA takedown's only realistic leg to stand on was the test cases showing that the primary intent of this program was to be used to obtain copyrighted material. Clearly, this is a tool that can be used for good or evil, and archivists like myself use it to back up YouTube channels that could be at risk of being lost in the future as YouTube continues its slow shift towards becoming "cable TV, but online." There is no longer any solid footing that a new DMCA takedown notice can stand on. The program doesn't circumvent copyright protection mechanisms (the JavaScript to assemble media stream URLs and the media streams themselves are all sent by YouTube unencrypted without a browse-wrap license agreement and/or mandatory user registration, and no, SSL is transport layer encryption, not DRM) so it now falls into the same category as a tool like HandBrake: it CAN be used to do things that are possibly a violation of copyright, but that is neither its only purpose nor its primary purpose.

If you move from GitHub, you're moving off of the largest open source software hosting platform on the planet. You will lose searchability and a lot of people simply won't register at the other lesser-known sites to contribute to the code base.

I don't think moving is a good idea.

@Seirdy

Seirdy commented Oct 26, 2020 via email

@322997am

I believe the best way is to self-host on a VPS service in a country that ignores the DMCA (Russia, for example). The DMCA is a bad law that is probably unconstitutional, but takedowns like these will continue happening as long as Disney and the like exist. I believe that Gitea is feature-rich and can be self-hosted. I am willing to help with any translation needed if this is ever decided on, as I am fluent in both Russian and English.

@ssokolow

@322997am ...but do mirror everything somewhere else or your bus factor is one.

@xloem

xloem commented Oct 29, 2020

If somebody sets up a mirror, can anybody volunteer to co-maintain or co-administer it?

@FireMasterK
Author

https://git.kavin.rocks/kavin/yt-dlc

Here's a mirror on my personal git server.

@FireMasterK
Author

A relevant tweet to this topic: https://twitter.com/t3rr4dice/status/1320660235363749888

@Seirdy

Seirdy commented Oct 29, 2020

Just thought I’d share my approach to “hydra hosting” since people here seem interested:

I mirror my repos across Sourcehut, Gitlab, and GitHub. Here’s the relevant snippet of my .git/config of my dotfiles repo:

[remote "origin"]
	url = git@git.sr.ht:~seirdy/dotfiles
	fetch = +refs/heads/*:refs/remotes/origin/*
[remote "gl_mirror"]
	url = git@gitlab.com:Seirdy/dotfiles.git
	fetch = +refs/heads/*:refs/remotes/gl_mirror/*
[remote "gh_mirror"]
	url = git@github.com:Seirdy/dotfiles.git
	fetch = +refs/heads/*:refs/remotes/gh_mirror/*

Pushing to three remotes, one after the other, can be slow. To speed things up, I created an alias to push to all remotes in parallel in my gitconfig:

[alias]
	pushall = !git remote | grep -E 'origin|mirror' | xargs -L1 -P 0 git push --all --follow-tags

This pushes to all remotes that have "origin" or "mirror" in their names, and skips the rest. Now, I can pull from all repos I'm following and push to the ones I have access to.
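For anyone who would rather script this than add an alias, the same idea can be sketched in Python. This is only an illustration of the approach above, not part of the original setup; the function names are made up, and it assumes `git` is on the PATH and is run inside the repository.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Same filter as the grep in the alias: names containing "origin" or "mirror".
REMOTE_PATTERN = re.compile(r"origin|mirror")


def select_push_remotes(remote_names):
    """Keep only the remotes whose name contains 'origin' or 'mirror'."""
    return [name for name in remote_names if REMOTE_PATTERN.search(name)]


def push_command(remote):
    """Build the push command the alias runs for a single remote."""
    return ["git", "push", "--all", "--follow-tags", remote]


def push_all(repo_dir="."):
    """Push to every selected remote in parallel; return exit codes by remote."""
    listed = subprocess.run(
        ["git", "remote"], cwd=repo_dir, check=True,
        capture_output=True, text=True,
    ).stdout.split()
    remotes = select_push_remotes(listed)
    if not remotes:
        return {}
    with ThreadPoolExecutor(max_workers=len(remotes)) as pool:
        codes = pool.map(
            lambda r: subprocess.run(push_command(r), cwd=repo_dir).returncode,
            remotes,
        )
        return dict(zip(remotes, codes))
```

Calling `push_all()` returns a mapping such as remote name to exit code, so a failed mirror is visible without stopping the other pushes.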

Core developers can post a list of a few upstream remotes that only they can push to. Community members can set up extra remotes for resiliency and for forking/personal development.

This raises an issue: when there are multiple git remotes, where do people file tickets or submit patches?

The best solution is to have one canonical place to do issue tracking, separate from git remotes. I've previously explained why I think a mailing list (with a Sourcehut-style frontend for those not used to mailing lists) would be the best option for this, but a plethora of other solutions exist as well, from GitHub issue trackers to Bugzilla.

Edit: also, it's a good idea to advertise the remotes in the project README. Example.

@FireMasterK
Author

FireMasterK commented Oct 29, 2020

Why not have a cronjob do the same? This way even actions taken on GitHub (such as merging a PR) could be mirrored, and the developers don't need to change their workflow or modify their git configs when pushing commits.
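A cron-driven mirror along these lines could look roughly like the following Python sketch. It is an assumption about how such a job might be written, not something anyone in this thread has deployed: the function names, URLs and cache directory are hypothetical, and only standard git commands (`clone --mirror`, `remote update --prune`, `push --mirror`) are used.

```python
import os
import subprocess


def mirror_commands(source_url, mirror_urls, cache_dir):
    """Commands that refresh a bare mirror clone and push it to every mirror."""
    if os.path.isdir(cache_dir):
        # Cache already exists: fetch all refs and drop ones deleted upstream.
        commands = [["git", "-C", cache_dir, "remote", "update", "--prune"]]
    else:
        # First run: create a bare clone that tracks every ref of the source.
        commands = [["git", "clone", "--mirror", source_url, cache_dir]]
    for url in mirror_urls:
        # --mirror overwrites the destination's refs, so use dedicated repos.
        commands.append(["git", "-C", cache_dir, "push", "--mirror", url])
    return commands


def sync(source_url, mirror_urls, cache_dir):
    """Run every command; a failing mirror does not block the remaining ones."""
    failures = []
    for cmd in mirror_commands(source_url, mirror_urls, cache_dir):
        if subprocess.run(cmd).returncode != 0:
            failures.append(cmd)
    return failures
```

A crontab entry would then call a small script that invokes `sync(...)` on whatever schedule suits the project. Note that this still depends on one machine running the job, which is the single point of failure the multi-remote approach tries to avoid.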

@ssokolow

ssokolow commented Oct 29, 2020

@FireMasterK What I've been meaning to set up for my own projects is:

  • Use something like a cronjob or a hook to automatically merge updates on my GitHub issue trackers into the repo using something like git-bug so my users get the interface they're used to, but the issues get merged into whatever backup/mirroring regime I choose.
  • Set up automatic mirroring between GitHub, GitLab, and BitBucket (either using ready-made support like Gitlab offers or by adapting the code I use for rebuilding gh-pages on git push to drive the syncing from CI).

@Seirdy

Seirdy commented Oct 29, 2020

@FireMasterK The whole point of this approach is not to depend on one service, be it a git remote or a CI/CD provider, especially a proprietary one like GitHub Actions. These features have been baked into git and successfully used for a long time.

edit: neutralized a rogue comma

@ddevault

SourceHut admin here. I wrote this up today, is relevant:

https://sourcehut.org/blog/2020-10-29-how-mailing-lists-prevent-censorship/

@ohnonot

ohnonot commented Oct 31, 2020

@ohnonot there are a lot of proposed decentralized solutions but not much evidence of anybody putting work into setting them up to use.

It's not hard to set a repository up to be in sync with a github repo; I propose framagit or any other gitlab-based git site: https://docs.gitlab.com/ee/user/project/repository/repository_mirroring.html (not sure how to make that go both ways though)
The most important thing is to leave github behind IMO.
AFAIU this repo owner is working on that.

The main youtube-dl repo owner is currently tied up in proceedings with EFF and github's ceo, and has said they can't communicate about the details. i'm not in touch myself with blackjack4494, the owner of this community fork.

Thanks for that additional info. Appreciated.

I'm not sure it's necessary to go TOR on this; IMO just moving out of the RIAA's reach is enough.

@Nekun

Nekun commented Oct 31, 2020

@vxbinaca

Anything else is wasted energy rn.

It's naive to think that copyright holders will give up immediately after simple countermeasures and won't make trouble for the project in the near future. Just look at the Popcorn Time app: how many domains have they already changed? Why not stay more than one step ahead in this whack-a-mole game and raise the resistance standard for this kind of project?

@SoniEx2

SoniEx2 commented Oct 31, 2020

You could use git-bug to set up a decentralized/federated GAnarchy-based issue tracker, fwiw, while also doing decentralized development on GAnarchy. It's kinda your best bet to continue development while not excluding any potential contributors. Setting up a centralized git instance (such as gitlab or gitea) on tor or i2p is a good way to exclude a lot of contributors, when there's no reason said contributors can't temporarily push their fork on github/gitlab.com/bitbucket/etc despite the risk of their repo getting taken down, so that other developers can quickly grab a copy and merge it.

@Seirdy

Seirdy commented Oct 31, 2020 via email

@ssokolow

@Seirdy All the methods I've seen for using git remotes involve either client-side intervention or something like a server-side cronjob.

Gitlab's mirroring support is a much more setup-and-forget solution and has support for event-driven configurations on the server side.

@xloem

xloem commented Oct 31, 2020

to clarify, it sounds like users are interested in helping but aren't sure how to run a mirror. i heard gitolite can help with this, i haven't looked it up. there are a lot of options, easiest of which are places like codeberg and sourcehut. what's important is to start a norm of listing the mirrors in the repo; we can write scripts to sync them after that
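As a rough illustration of what "listing the mirrors in the repo" plus a sync script could mean, here is a small Python sketch. The file format (one `name url` pair per line, `#` for comments) and the function names are assumptions made for this example; no such file exists in the project as far as this thread shows. The first URL in the sample is the mirror posted earlier in this thread, the second is a placeholder.

```python
def parse_mirror_list(text):
    """Parse 'name url' lines; blank lines and lines starting with '#' are skipped."""
    mirrors = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'name url', got: {raw!r}")
        name, url = parts
        mirrors[name] = url
    return mirrors


def remote_add_commands(mirrors, existing=()):
    """Build 'git remote add' commands for mirrors not yet configured locally."""
    return [
        ["git", "remote", "add", name, url]
        for name, url in mirrors.items()
        if name not in existing
    ]


SAMPLE = """\
# hypothetical MIRRORS file
kavin https://git.kavin.rocks/kavin/yt-dlc
backup https://git.example.org/yt-dlc.git
"""
```

With a file like this checked into the repository, every clone carries the full list of mirrors, so losing any one host does not lose the pointers to the others.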

@SoniEx2

SoniEx2 commented Oct 31, 2020

you can also use plain HTTPS to host git repos.

@virtadpt

Some of us use Fossil to avoid these problems. But use whatever people are willing to use.

@blackjack4494
Owner

I haven't forgotten this topic if some may wonder.

@xloem

xloem commented Nov 1, 2020

@blackjack4494 i've been putting some work into learning some of the internals of git-bug while playing around with making a python hack to import issues from sausage's gharchive dumps ( https://github.com/xloem/youtube-dl-1/blob/gitbugideas/devscripts/sausage2gitbug.py i'm working on that roughly constantly right now, but i don't think it's the most efficient approach ). curious if any other projects or plans are in the works.

@Redo11

Redo11 commented Nov 1, 2020

Coming back, @xloem and @ssokolow: I know one more good service for hosting git repositories, encrypted etc., called Keybase. It's kind of like Slack+LinkedIn+git, but it is encrypted and everything stored there is safe. That would mean we can communicate and work on code etc., but one problem is that it is also private. The next solution would be IPFS, which is P2P file hosting (I don't know how to describe it better, check it out); it is great, but its pros and cons should be considered too. The next option is going onto a darknet like ZeroNet or I2P (Tor and Freenet are too slow).

@FrickTheRIAA

FrickTheRIAA commented Nov 1, 2020

Keybase is great. I'm just unsure how long it will last since Zoom bought it to get end-to-end encryption specialists to fix their security mess and save their reputation. I wouldn't be surprised if Keybase gets shut down eventually with all of the employees working on Zoom full time. Then there's the anti-end-to-end encryption legislation that the US government is pushing that might impact Keybase, though the EU is apparently working on anti-end-to-end encryption legislation as well, so the free world is shrinking. Keybase is great, just don't get shocked if you have to migrate from there within a few years.

@FrickTheRIAA

FrickTheRIAA commented Nov 1, 2020

I just noticed that Disroot has a Gitea instance. While I still firmly believe that self-hosting with a DMCA ignored hosting provider is better, Disroot's Gitea instance would be one of the best already existing platforms for a project like this. It's probably not immune to the RIAA long-term, but based on the links below I have a feeling that both Disroot and Greenhost (Disroot's hosting provider) would be as upset as we are if they got a DMCA takedown order for an open source project and that they would probably fight it. Extra plus to Disroot for being Copyleft and having a good privacy policy. Both Disroot and Greenhost are based in the Netherlands, which is a pretty good jurisdiction where courts have stood up to the US on occasion.
Disroot's About page
Disroot's Mission Statement
Greenhost - Internet Freedom
Privacy at Greenhost

@FrickTheRIAA

FrickTheRIAA commented Nov 2, 2020

@Nekun I made a big mistake in my guide regarding Tor over VPN, it's not a good idea at all. Can you please update your mirror one last time? I've added links back to the original Reddit and Raddle threads into the guide now, so you won't have to update your mirror after this thanks to that.
(Sorry for this off-topic post, but I don't know how else to contact Nekun. This will be the last post like this.)

By the way, a mod managed to restore the original Reddit thread and confirmed that the thread was not removed by the /r/youtubedl mods, which means that it was indeed removed by Reddit.

@xloem

xloem commented Nov 2, 2020

@blackjack4494 , I'm not sure how to get in contact with you. The following is from the guy who runs https://youtube-dl-sources.org/

Does youtube-dlc have any sort of DNS presence? youtube-dlc.org seems to
be unregistered. e.g. https://bugs.youtube-dlc.org could be pointed to a
hosted Gitea instance, as you mention, with some links to mirrors,
places to report bugs, etc. I'd suggest that Tom/blackjack4494 register
it, could be useful. I'm also happy to register it on his behalf...

@FrickTheRIAA

FrickTheRIAA commented Nov 2, 2020

I'd highly suggest not including youtube in the domain name. We don't want it to get suspended because of trademark infringement.
Particularly .is domains are extremely hard to take down if registered via ISNIC directly (a lot cheaper, but doesn't accept cryptocurrency) or using a resilient registrar. It would be pretty much impossible for a properly registered .is domain to get taken down in our case since this project is perfectly legal and the only way to get it taken down is via a court order issued by an Icelandic court.
.ch and .ws are said to be pretty resilient as well in case you want a cheaper option when paying with cryptocurrency (pro-tip: Monero).
No matter what TLD you go with, use a resilient domain registrar/reseller that won't take it down the minute they get a complaint. See the "Resilient domain registrars/resellers" section in my guide for more info about that.

@Nekun

Nekun commented Nov 2, 2020

@FrickTheRIAA I added links to the original posts. For contact details, see the UIDs of my PGP key 55FF381A18B168D3B8C798BDBCF985EA5454B928

@real-andrewr

Hi, I tested out git-bug just out of interest. It's fairly straightforward; it took maybe 30 mins to get the hang of it (in terms of setting up a bridge, pushing, pulling).

The way git-bug works is that it stores metadata in the .git directory. I was expecting something else, but this is interesting because you can just store the bugs in the code repository.

There are basically 3 models you could use, if you actually want to use git-bug to offer central bug/issue tracking.

  1. Keep issues on GitHub which is effectively authoritative. Users can create issues using git-bug if they want to (you can already do this now). Perhaps someone could run a git-bug cron job and host a backup repository separately, in case GitHub goes down; if this is the case you effectively have to go to (3).

  2. Predominantly use git-bug for bug tracking. Users can still raise bugs in GitHub. A cron job would be set up which pulls and pushes bugs between git-bug and GitHub. The advantage is obviously that users can still post bugs and reply in GitHub. The significant downside is that whichever GitHub user account is configured for the git-bug bridge will be the author of any issues and comments. For example, if xloem created an issue in git-bug and the bridge was running from my account, a corresponding issue would be created in GitHub by my account (real-andrewr). I guess you could run the bridge using an intentionally named GitHub account to avoid some confusion, ytdlcbugbot, and basically just offer a degraded experience for GitHub users.

  3. Use git-bug to import all issues from GitHub, and then just use git-bug going forward. Downside: users have to figure out how to use git-bug.
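The cron job mentioned in models 1 and 2 could be sketched in Python as below. This is a hedged illustration, not the setup described in this comment: the bridge name, remote name and function name are hypothetical, and the subcommands (`git bug bridge pull`, `git bug bridge push`, `git bug push`) are written as I understand them from git-bug's documentation, so check `git bug --help` for the version you have installed.

```python
import subprocess


def bug_sync_commands(bridge_name, backup_remote, two_way=False):
    """Commands for one sync run between GitHub issues and git-bug.

    two_way=False matches model 1 (GitHub stays authoritative, git-bug is a backup).
    two_way=True matches model 2 (bugs created in git-bug are exported to GitHub).
    """
    # Import new issues and comments from GitHub through the configured bridge.
    commands = [["git", "bug", "bridge", "pull", bridge_name]]
    if two_way:
        # Export locally created bugs; they appear under the bridge's account.
        commands.append(["git", "bug", "bridge", "push", bridge_name])
    # Publish the bug data to a separate git remote that serves as the backup.
    commands.append(["git", "bug", "push", backup_remote])
    return commands


def run_sync(bridge_name, backup_remote, two_way=False, repo_dir="."):
    """Run the sync inside repo_dir, stopping at the first failing step."""
    for cmd in bug_sync_commands(bridge_name, backup_remote, two_way):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Model 3 needs only a single import followed by normal git-bug use, so it does not need a recurring job at all.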

P.S. I will put up a git-bug mirror of the issues on yt-dlc anyway, so at least there is a backup of the issues against this new account. Probably will do this in the next day or so.

P.P.S. I'm hosting the mirror https://youtube-dl-sources.org. I'll move it to Gitea in the next day or two (depending on some real life things, like getting my car serviced and work) and put up a little guide on how I set it up, with scripts, cron jobs etc. in case it's useful to anyone. It's nothing too complicated, just takes time to go through all of the doco and actually do it.

@FrickTheRIAA

@Nekun Thanks! However, my old, bad suggestion of running Tor over VPN is still there in your mirror. Could you update the whole guide so that it's a copy of the current version on Reddit? That's what I was trying to say in my previous post, but it might have sounded unclear. Thanks and sorry for the inconvenience!

@quyleanh

quyleanh commented Nov 4, 2020

Heard something bad on XDA. Please consider this ASAP, @blackjack4494.

@ssokolow

ssokolow commented Nov 4, 2020

@quyleanh That was covered on TorrentFreak two days ago.

“Please note that re-posting the exact same content that was the subject of a takedown notice without following the proper process is a violation of GitHub’s DMCA Policy and Terms of Service.”

(Emphasis mine)

It's about the flurry of people reuploading the same repo that got taken down without adding any new commits to make an effort to remedy the reason for the takedown.

@w3bb

w3bb commented Nov 4, 2020

I'm pro-email. It's decentralised and git is designed for it. If we choose any centralised solution we're right back where we started.

@SoniEx2

SoniEx2 commented Nov 4, 2020

mailing lists are, in fact, NOT decentralized. and if you ditch mailing lists you become completely undiscoverable.

@ssokolow

ssokolow commented Nov 5, 2020

Yeah. What makes mailing lists valuable is that they can be self-hosted (like Gitea), that they replicate all pieces of data to all clients (like git with something like git-bug to merge in tracked bugs), and that they have open-source clients available that don't have any sort of remote kill switch or revocation override.

@ohnonot

ohnonot commented Nov 5, 2020

It seems that the "real" youtube-dl is back in business, and off github (on that other git-something).
Unfortunately, bandcamp still doesn't work on 2020.11.01.1 :(
That makes me sad, because I'd much prefer to keep (main) development centralised.
Where is the code base on this one? Will it be able to re-base itself on the "original", again and again, to avoid forking hell?

@Seirdy

Seirdy commented Nov 5, 2020 via email

@blackjack4494
Owner

It seems that the "real" youtube-dl is back in business, and off github (on that other git-something).
Unfortunately, bandcamp still doesn't work on 2020.11.01.1 :(
That makes me sad, because I'd much prefer to keep (main) development centralised.
Where is the code base on this one? Will it be able to re-base itself on the "original", again and again, to avoid forking hell?

The initial idea was actually to keep a similar/same structure as youtube-dl so you could easily merge changes back into the main project. However, addressing issues in youtube-dl on GitHub got me banned. I offered them help and so did a few others, but they refused, saying there are enough people with write access.
What forking hell? A fork is for people developing their own solutions and creating PRs to merge back changes. Another case is that you start your own project on that fork and later set it up as a dedicated repository.
As far as I know there is no real active development in any other forks.
Bandcamp should work with youtube-dlc tho.

siikamiika pushed a commit to siikamiika/yt-dlc that referenced this issue Feb 15, 2021