Investigate peer-to-peer AppImage distribution #175

Open · probonopd opened this issue Jun 27, 2016 · 82 comments

@probonopd
Member

probonopd commented Jun 27, 2016

AppImage is all about easy and secure software distribution for Linux right from the original upstream application author directly to the end user, without any intermediaries such as Linux distributions. It also supports block-based binary delta updates using AppImageUpdate, allowing for AppImages that can "update themselves" by using information embedded into them (like the Sparkle framework for macOS). Consistent with this vision, we would like to enable peer-to-peer software distribution, so that we would not need central hosting (such as GitHub Releases, etc.), while ideally maintaining some notion of a "web of trust" in which it is clear who the original author of the software is, and that the AppImage is distributed in the way the original author wants it to be distributed.

In this ticket, let's collect and discuss various peer-to-peer approaches that could ideally be woven into the AppImageUpdate system as well.

"IPFS is the Distributed Web. A peer-to-peer hypermedia protocol to make the web faster, safer, and more open." https://ipfs.io

Should we use it to distribute AppImages?

@davidak

davidak commented Jun 28, 2016

That would be a really cool feature. When someone on my local network has already downloaded the app, I can download it from them. But it needs to be verified. Is there something like a hash for every AppImage from upstream? Otherwise cheap IoT devices from China could send you infected AppImages.

@probonopd
Member Author

Like, a GPG signature? Currently these are separate files (outside of the AppImage), but we could also append them to the AppImage (=make them part of the AppImage).

@davidak

davidak commented Jun 30, 2016

I tried IPFS out over the last two days and read a lot about it. It has hashes integrated to find the right content, so you get the content you request.

Downloading files is easy. Here you get the latest official Subsurface AppImage:
ipfs get QmUH4SZVdBPekZXkE77ntLknAtAjuiKsHgEW6eJzioyQyD
(you need to have the IPFS daemon running)

There is also ipget, which includes an IPFS node.
ipget QmUH4SZVdBPekZXkE77ntLknAtAjuiKsHgEW6eJzioyQyD -o Subsurface-4.5.6-x86_64.AppImage
https://github.com/ipfs/ipget

@probonopd probonopd added the idea label Aug 19, 2016
@probonopd
Member Author

probonopd commented Oct 23, 2016

Also check Ethereum https://www.ethereum.org/

@probonopd probonopd changed the title Investigate IPFS for peer-to-peer AppImage distribution Investigate peer-to-peer AppImage distribution Oct 23, 2016
@davidak

davidak commented Oct 24, 2016

@probonopd How would that help to distribute AppImages?

Another technology like IPFS is WebTorrent. You seed while you are on the website.

@probonopd
Member Author

@davidak not sure yet; didn't check it in detail yet.

Regarding WebTorrent, who stays on a single webpage for so long? Probably more suited to video distribution than apps.

@probonopd
Member Author

probonopd commented Nov 13, 2016

Check the Keybase filesystem: Public, signed directories for everyone in the world. https://keybase.io/docs/kbfs, very promising.

Every file you write in there is signed. There's no manual signing process, no taring or gzipping, no detached sigs. Instead, everything in this folder appears as plaintext files on everyone's computers. You can even open /keybase/public/yourname in your Finder or Explorer and drag things in.

And

Keybase can't be coerced to lie about your public keys, because each one needs to be announced, using a previous device or paper key. Together, these announcements form a chain that is announced in the bitcoin block chain.

But:

We're giving everyone 10 gigabytes. (...) There is no paid upgrade currently. The 10GB free accounts will stay free, but we'll likely offer paid storage for people who want to store more data.

@probonopd
Member Author

probonopd commented Nov 2, 2017

Also see the Dat project https://datproject.org/ and the Beaker Browser https://beakerbrowser.com/ built on top of it. Also see https://twitter.com/probonopd/status/925106318796578818

@pfrazee

pfrazee commented Nov 2, 2017

A few thoughts from the Beaker Browser team

  • Dat uses pubkey-addressed archives, so anything distributed with it is signed. (There's also potential support for static hash-addressed archives.)
  • Dat also maintains a changelog in its archive metadata, so it's good for tracking history & versions.
  • Dat does not currently use differential diffs in its own updates, but there are plans to investigate that in the future.
  • We use Dat in Beaker to act as a website, but it can be any form of data storage. In the next version (0.8) we will have a built in user identity concept which will use Dat archives to represent users. That will eventually be a foundation for webs of trust in the application layer -- but it will take some time for the WoT networks to mature.
  • We're switching over to electron-updater in Beaker right now, which is great because it has pluggable transports. For us, the main reason we haven't distributed Beaker over dat is the need for auto-updates. I think now that it'd be fairly trivial to write a Dat transport for electron-updater and get all the behaviors we need.

@TheAssassin
Member

@pfrazee sounds promising. If you want to investigate binary delta updating, you can check out zsync(2), which is based on the same algorithms that rsync uses. It calculates a meta file for an existing file, containing a set of hashes (calculated by chunking a file into blocks with a specified blocksize and hashing the blocks using a specified hashing algorithm).

I'm sure it's possible for you to make use of the functionality in this library. Heck, I could even imagine zsync2 supporting Dat as a URL scheme.
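For reference, the basic zsync round trip looks roughly like this (a sketch using the classic zsyncmake/zsync tools; zsync2 follows the same workflow, and the URL and file names are placeholders):

# Publisher: create the .zsync meta file (block hashes) for the new release
zsyncmake -u https://example.com/Some-2.0-x86_64.AppImage Some-2.0-x86_64.AppImage

# User: reuse unchanged blocks from the old file, fetch only the changed blocks over HTTP
zsync -i Some-1.0-x86_64.AppImage https://example.com/Some-2.0-x86_64.AppImage.zsync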

@TheAssassin
Member

We use Dat in Beaker to act as a website, but it can be any form of data storage. In the next version (0.8) we will have a built in user identity concept which will use Dat archives to represent users. That will eventually be a foundation for webs of trust in the application layer -- but it will take some time for the WoT networks to mature.

I've thought a lot about webs of trust recently for application deployment (AppImage related), and they apply to generic content as well.

Often, PGP's WoT is used as a reference for a working web of trust.
Its trust model works like this: "I trust user A, and users A, B and C trust user Z, so I can trust Z, too, I guess."
However, this trust model is only used to verify the authenticity of the key a mail you receive is signed with; the crypto itself does not depend on it, and even if a key has no third-party signatures, that doesn't mean much for the security of the communication itself. In most cases, the users know each other anyway, and trust the keys in their mail clients by validating the keys' fingerprints manually.
It's a nice idea, but it isn't used by many people. Nowadays, you'd rather put your key ID into all mails you send, send it over a second channel (like a chat service or phone), or put it on your website, where people can get it and download and trust the key before writing and after receiving mails.

When building a WoT from scratch, one can use pretty much the same methods and structures PGP established. Sure, it'll take a while to get people to use it and to build a large base of trusted users so that a certain level of security is reached. The algorithms and structures are proven in the real world, and even though they have never reached the majority of email users, they are secure and work fine.

However, no WoT is really immune against malicious attacks. It's fairly easy to manipulate a WoT. Let me give you an example:
By creating a few thousand keys which then sign each other's keys (not everyone's, that'd be too obvious) and the keys of all the other users (which makes them look even more valid), you can create accounts that appear trustworthy but were in fact generated by some software. Time is not a factor here; the software could have been running for weeks or months.
The problem is that it is really hard to detect these as malicious (attackers are pretty good at finding flaws in your code, especially when it's open source), and once they're in the network, there is no way to get rid of them unless you have some central "blacklist" (which undermines the decentralization aspects of a WoT). Even if you supported some decentralized "anti-trust" feature (like a second kind of signature which discredits a key rather than making it look trustworthy), 10 minutes of an attack could be enough to do a lot of harm in dependent systems.

Transferring those thoughts to application distribution: as said, 10 minutes can be enough for an attack to do a lot of harm to your users. As research in the field of anti-virus shows, 10 minutes can be enough for something like ransomware or a computer worm to spread across a lot of computers. This is similar to zero-days: they can be fixed within the same 10 minutes, but even if the fix were deployed immediately, the ransomware could already have infected hundreds of thousands of computers and thus dealt a lot of damage. I could provide a list of references, but as we've all heard of it before, I don't think it's necessary.

Therefore, I am trying to construct some more secure trust models for the AppImage ecosystem.
For AppImage's updating mechanism specifically, we could inspect the key the old AppImage is signed with, and then check whether the new AppImage's key matches the old one. In that case, we can trust it this time, and perform the update. Otherwise, we can either reject the update, or show a big yellow warning and have the user decide on it. As long as the key won't change, everything will work smoothly, but if there should be an issue, we can protect the user from any kind of attacks.
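As a very rough sketch of that check (assuming detached GPG signatures next to each file for simplicity; real AppImage signatures are embedded in the image and would have to be extracted first):

# Fingerprint of the key that signed the currently installed AppImage
old_fpr=$(gpg --status-fd 1 --verify Old.AppImage.sig Old.AppImage 2>/dev/null | awk '$2 == "VALIDSIG" {print $3}')
# Fingerprint of the key that signed the update candidate
new_fpr=$(gpg --status-fd 1 --verify New.AppImage.sig New.AppImage 2>/dev/null | awk '$2 == "VALIDSIG" {print $3}')
if [ -n "$old_fpr" ] && [ "$old_fpr" = "$new_fpr" ]; then
    echo "Same signing key, proceed with the update"
else
    echo "Key changed or missing, warn the user before updating"
fi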

For the desktop integration via appimaged or (even better) the desktop environment itself, I'd imagine a trust model similar to the one PPAs on Ubuntu established. We'd allow users to trust keys AppImages are signed with by adding them to a separate user-specific keyring. (Distributions could even ship with a global keyring, such as openSUSE with its Open Build Service, which builds AppImages and signs them with the OBS key.)
Whenever it finds a new AppImage with an unknown key, it could ask the user whether they want to trust the key or not. AppImages provided by the same developer would then be trusted automatically; however, new AppImages (i.e., the ones not yet marked as executable) could show a "first use" warning, asking the user whether they want to run the AppImage when they double-click it.
When implementing the trust model I suggested, an additional security layer is put on top of this very basic security mechanism. Whenever an unknown key is encountered, the pop-up could also ask whether you want to trust the key. If you, e.g., check a checkbox, it'd suppress further warnings for this specific key; otherwise the AppImage would still be executable, but the DE could still show a warning that the key cannot be trusted.

AppImageUpdate could eventually implement the same idea, by issuing a warning for unknown keys; once they are trusted and the new file's key matches the old one, the upgrade will just be performed. On a key change, it should clearly state that the new key differs from the old one, and ask the user whether to trust the new one and whether the old one should be removed.

I think that's a fairly secure trust model for AppImage, using some established structures, which is not too complicated and easy for users to adopt with our existing zsync-based infrastructure.

TL;DR: Coming back to Dat, I don't think a web of trust will provide any real security to your users, for the reasons stated above. People should not ultimately rely on it, and for application deployment, where foreign code is supposed to be executed on others' machines, I would never ever rely solely on a web of trust. For static websites and other harmless content, it might work to some extent, but thinking of a browser, when it comes to JavaScript, things get problematic again.

So, if you design a web of trust which is not subject to any of those issues, please make sure to notify us, because I'm really interested in the topic. If it fits our needs, I'll consider using it for AppImageUpdate, too!

@pfrazee

pfrazee commented Nov 3, 2017

We need to redefine the WoT away from how PGP defined it. The pure "human friends only" model is way too slow-moving, and the measure of transitive trust was a fairly limited form of graph analysis.

The new definition should be based on a set of features:

  • Pubkeys are used to identify all agents
  • App activity is published in signed cryptographic networks (BitTorrent, Dat, SSB, IPFS)
  • Rather than relying on in-person signatures, we bootstrap from existing CA-secured channels and rely on multiple overlapping signals to provide reasonable confidence

Cryptographic networks like Dat give a richer dataset to analyze. All interactions are logged in the network, and become signals for graph analysis. So, inconsistencies should be more detectable.

For instance, if multiple "Paul Frazees" start getting followed, a crawler should be able to notify me and I can react by flagging them. Then, as with any graph analysis, the computed trust is a matter of choosing good starting nodes (and doing good analysis).

For bootstrapping trust, we use auditable key distribution nodes, which ought to be the job of orgs and services. We can use auditable contract systems like nodevms to back these kinds of servers. They will then use CAs to identify themselves. So, again: a combination of CA-secured channels and app-generated trust signals.

Direct in-person signatures could still be used, perhaps initially only for high-risk tasks like software distribution. That would be the sort of thing where the user accounts of the org and devs have published special "trust" objects on Dat, which are in turn used by software-installers.

But-- that question is basically pushed into application space, since any app can decide how to do its trust analysis on top of the crypto networks. So, perhaps instead of calling it a Web of Trust, we need to think of it as a "Trust Platform," because we're putting trust signals into the application space as a primitive to work with.

Regarding the risk of the attack window, with any automated decision based on trust, such as installing software, there's always the option of putting in a time delay. "This software must be published for 24 hours with no 'withdrawal' signals from X set of users before being installable."

@probonopd
Member Author

probonopd commented Nov 3, 2017

What I mean with "web of trust" is really not specific to applications but I guess has been/needs to be solved for a peer-to-peer web browser as well. After all, an AppImage is just a file, like an HTML file. In both cases I want to have certainty that what claims to be coming from, e.g., @pfrazee (just standing in as an example here) is actually coming from @pfrazee and has not been altered in between, be it an HTML page or an AppImage. The more difficult question is whether @pfrazee can be trusted, be it with information or software originating from him. An indication may be who else is following him.

So in summary, I believe a peer-to-peer Web browser needs to address the very same questions somehow, and if they are adequately solved for Web browsing, then we can also use the very same concepts for software distribution.

Agree?

@pfrazee

pfrazee commented Nov 3, 2017

I believe a peer-to-peer Web browser needs to address the very same questions somehow, and if they are adequately solved for Web browsing, then we can also use the very same concepts for software distribution.

@probonopd I think that's exactly right.

@TheAssassin
Member

You're right, it's probably better to avoid calling this "Web of Trust", as I guess many people associate PGP's model with that term. I have to admit I'm not too much into blockchain technology or stuff like smart contracts which are built on top of it.

All this sounds quite interesting, but also far from being mature right now, unfortunately. Is there a roadmap, set of definitions or specifications or any other data where interested people could get informed about your plans?

I'll be thinking about what you said about the trust model that an application scenario like "app update distribution" could put on top of it. I see what you mean with the withdrawal signals, but I can't imagine how to realize that, since there's a paradox: you don't want to publish updates until a "crowd-sourced" trust has been reached, but how would that be possible without pushing updates to at least some users? A/B-like testing might work, but the, e.g., 10% of users who would receive the update right away would be put at an unacceptable risk of getting malware on their systems (they might not even be able to push a withdrawal request into the network, depending on the effects of the malware).

Right now I'm not 100% sold on the concept, but I'm confident that a constructive discussion might lead to a working model. If you could point me to a place where you discuss those things, I'll have a look as soon as possible.

By the way, I think it might be worth talking to some bigger projects like openSUSE, too, who provide trustworthy AppImages (they sign the AppImages they publish with their pubkeys), so they might be a reasonable institution to "seed trust" in the network.

All in all, Dat and Beaker sound interesting for distribution right now, but I'd leave aside their web of trust when implementing them in AppImageUpdate; I'd rather continue to use a more conservative trust model like the one I suggested.

@TheAssassin
Member

By the way, I'd like to invite you into our IRC channel, #AppImage on Freenode.

@probonopd
Member Author

probonopd commented Nov 3, 2017

What establishes trust today?

  • An "official" domain. Downside: anyone can go register getbeakerbrowsernow.org - which user checks the whois records?
  • https certificates. Downside: in reality, anyone who was able to register getbeakerbrowsernow.org will get https certificates for it
  • Google rank. Downside: SEO experts can game the system and get getbeakerbrowsernow.org in the top spot
  • GitHub stars. Downside: Given enough evil intentions, these can likely be faked too, but it takes much more effort because we can check out who starred a project. This has "web of trust" aspects
  • GPG or other types of signatures. Downside: Do I really know whether the signature belongs to who claims to be the person? Should I trust a GPG signature belonging to getbeakerbrowsernow.org?
  • Software in a distro repository. Downside: Does not scale. By far not every software on the planet will be in a distro repository in all versions, including continuous builds. For enterprise stable distributions, there are only outdated applications in the repositories. Hence, most software ends up in additional third-party repositories like PPAs or personal repositories on OBS. Who really checks their integrity?

What might establish trust in the future?

@pfrazee

pfrazee commented Nov 3, 2017

All this sounds quite interesting, but also far from being mature right now, unfortunately. Is there a roadmap, set of definitions or specifications or any other data where interested people could get informed about your plans?

No, this is just a set of ideas we're forming as we build with dat & beaker. I agree that it's too early to go into production-mode with using a new trust model on top of Dat. I think Dat's a great protocol to distribute images, but I'd still use existing code signature techniques on top of using Dat.

You don't want to publish updates until a "crowd-sourced" trust has been reached, but how would that be possible without pushing updates to at least some users?

That's not what I'm suggesting there. You'd already have a trust network established for the release: that is, the pubkeys you trust to publish or withdraw a release. The purpose of the delay would be to give the owning orgs a chance to notice and react to a compromise of those trusted actors.

So, a simple example scenario that could work right now: you have an app you build, and the .appimage is signed by your dev laptop (1 sig). Somebody steals your laptop and publishes a compromised version. If there was a 24 hour delay before clients auto-download the update, that'd give you time to access the .appimage host and take down the bad version.

Same idea here.
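As a toy illustration of such a hold-off (a sketch only; the timestamp source, the 24-hour window, and the withdrawal check are simplified assumptions):

published_at=1509667200   # example: Unix timestamp taken from the release metadata
now=$(date +%s)
if [ $(( now - published_at )) -ge 86400 ]; then
    echo "Release is older than 24h and was not withdrawn, safe to auto-update"
else
    echo "Release is too fresh, waiting for the hold-off window to pass"
fi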

By the way, I'd like to invite you into our IRC channel, #AppImage on Freenode.

Joined!

@pfrazee

pfrazee commented Nov 3, 2017

I wrote an article a while back, when I was working on SSB, that tried to summarize a lot of reading I did on trust and reputation analysis. It's overly dense, but the research I linked to was good http://ssbc.github.io/docs/articles/using-trust-in-open-networks.html

Reacting to some of your points @probonopd

HTTPS & DNS do have the problem you mention -- you can phish using "close enough" domain names. It happens pretty frequently.

Graph & reputation analysis - The issue of "SEO gaming" is real. The Advogato project (see my article) had decent success. It depends on the use case; if false positives/negatives are dangerous, then you can use graph analysis more as a suggestion.

Stars & user signals - If you filter the stars/signals by "people you follow" or "people in your network" or some similar tool, you improve the value of that signal, but lose potentially good sources that you're just not connected to. This is why you might want a single node to try to globally crawl and rank everybody -- they can potentially tell you which stars to trust and which ones not. How?
Basically, what you're doing is having the crawler try to define the "best people in my network," and then use that set to filter signals such as stars (and therefore cut out the spam). Again, check out Advogato or PageRank (in my article). Graph analysis is a way to expand your network of trust without having to manually evaluate each new connection.

@probonopd
Member Author

Someone already mentioned this idea 12 years ago in an article about klik (an AppImage predecessor):

it's a good idea to integrate a p2p network on it, such as bittorrent, so that once it's popular, the servers aren't down because of too much people downloading, or you start getting problems of connection. It would be nive to kind of force people using p2p in this case.

https://dot.kde.org/comment/44508#comment-44508

@probonopd
Member Author

probonopd commented Dec 4, 2017

User stories

  • As a user in a local network, I want downloads (and ideally delta updates!) to be fast and efficient because not everything gets downloaded from the Internet; instead, as much as possible is downloaded from the local network. Ideally, I don't notice anything special when downloading and/or updating, except that things are fast even though my Internet line is not. This may be extremely important for users on local networks with slow Internet connections
  • As a user in a country from where access to GitHub Releases and similar locations is slow, I want to still have fast downloads and updates
  • As a creator of an AppImage, I want to have the option of just sharing it with friends without needing to upload it to some server. I want to be able to share AppImages without having to sign up somewhere, and without having to make the AppImage public to the world (i.e., only who knows the download link/hash/... should be able to download)
  • As an author of a popular software, I want to make it available to millions of users without having to worry about server cost or being billed or blocked from GitHub Releases and the like for over-usage
  • As a user, all of this needs to be super simple for me. I should not have to do anything other than switch on the "Use p2p" switch
  • As another user, I don't want to use p2p for whatever reason; for me, the whole system should also function without it. I am fine with downloading from p2p-to-http gateways like https://ipfs.io/ipfs/ though because that is normal traffic for me
  • As a LibreOffice tester, I want to go from LibreOffice nightly to the next nightly without having to re-download the whole thing every day - just the few parts that have changed... (note: for this we have a working solution, currently using zsync2 and HTTP Range Requests)

This implies:

  • The same AppImage should always result in the same hash, so that if two users share the same AppImage (without knowing from each other) the network should be intelligent enough to treat the shared file as one and the same (ipfs does that)
  • The p2p mechanism should work with AppImageUpdate (the local and public ipfs daemons appear to support HTTP Range Requests so much of the existing logic could stay in place)
  • The zsync2 file needs to be stored at a mutable location, i.e., the author must be able to update the content behind the same URI (e.g., echo "blah" > /ipns/local/blah.txt, see ipfs-inactive/faq#232 (comment)); a minimal IPNS sketch follows below
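A minimal IPNS sketch for such a mutable pointer (assuming the go-ipfs CLI; file names are placeholders, and IPNS resolution can be slow in practice):

# Publish the current .zsync file under this node's mutable /ipns/<peer-id> name
ZSYNC_HASH=$(ipfs add -q Some.AppImage.zsync | tail -n 1)
ipfs name publish "/ipfs/$ZSYNC_HASH"
# Consumers keep resolving the same /ipns/<peer-id> address to get the latest version
ipfs name resolve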

Option 1: ipfs

Written in Golang, which means one single binary runs without much hassle pretty much anywhere.

There is even a C implementation: https://github.com/Agorise/c-ipfs

To be investigated: Just running the ipfs daemon without using it seems to significantly slow down other download traffic on the machine/in the network.

Setting up ipfs

'/home/me/Downloads/go-ipfs/ipfs' init
'/home/me/Downloads/go-ipfs/ipfs' daemon

Adding an AppImage to ipfs

Of course, appimaged would do this automatically if it detects ipfs is on the $PATH and/or is a running process.

/home/me/Downloads/go-ipfs/ipfs add -q '/isodevice/Applications/AppImageUpdate-8199a82-x86_64.AppImage' | tail -n 1
QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB

# Everyone who would add the exact same version of `AppImageUpdate-8199a82-x86_64.AppImage` would get the exact same hash
# TODO: Find out how the hash is calculated
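One way to sanity-check that reproducibility (note that the resulting hash also depends on the chunker and CID defaults of the ipfs binary used):

# Compute the hash only, without copying anything into the local IPFS store
ipfs add -q --only-hash '/isodevice/Applications/AppImageUpdate-8199a82-x86_64.AppImage'
# Should print QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB again with the same defaults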

Download this through the browser

http://localhost:8080/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB

https://ipfs.io/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB <-- global link

Works! But only as long as the machine is online. To change that:

http://ipfsstore.it/submit.php?hash=QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB

This will store it for 30 days and longer if someone sends BTC to the address displayed.

Now, to make this into a redundant cluster, we could set up
https://github.com/ipfs/ipfs-cluster/ - since one can set up redundancy and automatic replication, we could probably use the cheapest hosting we can find...

Range requests are apparently supported:
https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Byte_serving.html

See web interface

http://localhost:5001/webui

ZeroConf

_ipfs-discovery._udp is already implemented for looking up other ipfs daemons on the local network, see ipfs/kubo#520. Code: https://github.com/libp2p/go-libp2p/blob/e4966ffb3e7a342aaf5574d9a5c0805454c07baa/p2p/discovery/mdns.go#L24

It is not used to announce files on the LAN, however (we would need to do this ourselves).

Delta updates

https://ipfs.io/blog/17-distributions/ says:

It may also make downloading new versions much faster, because different versions of large binary files often have lots of duplicated data. IPFS represents files as a Merkle DAG (a datastructure similar to a merkle tree), much like Git or BitTorrent. Unlike them, when IPFS imports files, it chunks them to deduplicate similar data within single files. So when you need to download a new version, you only download the parts that are new or different - this can make your future downloads faster!

So it looks like, while we can continue to use zsync, it may not even be needed?

Deduplication between different AppImage files

Asked for opinions re. intelligent chunking for better deduplication on the IPFS forum, https://discuss.ipfs.io/t/ipfs-for-appimage-distribution-of-linux-applications/1553

On IRC #ipfs, someone pointed out:

probono > Could we have IPFS do the chunking of the Live ISO's squashfs based on the individual files that make up a Linux Live ISO? (Or AppImage)
whyrusleeping > kinda like the tar importer
probono > whyrusleeping: with the tar importer, can i get the "original tar" back out of the system?
probono > with a matching checksum?
whyrusleeping > probono: yeah, with the tar export command

Similar: ipfs/kubo#3604

Potential AppImage workflow

  1. User installs ipfs using whatever method he wants (e.g., we could also bundle it in the appimaged AppImage)
  2. User opts into p2p sharing
  3. We could optionally check the AppImage for metadata (e.g., license information) that allows p2p sharing
  4. appimaged execs ipfs add -q 'Some.AppImage' if it is on the $PATH
  5. appimaged gets back QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
  6. For LAN: appimaged announces QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB on the local network with Zeroconf (probably in a JSON feed together with some metadata such as the filenames etc.)
  7. For WAN: zsyncmake2 calculates QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB as well and puts it into a custom header like X-ipfs-hash
  8. For WAN: zsync2, when seeing X-ipfs-hash and when having ipfs on the $PATH, tries downloading from http://localhost:8080/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB; otherwise downloads as usual; if that fails, downloads from https://ipfs.io/ipfs/QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB (see the sketch below this list)

ipfs-inactive/faq#59
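A rough sketch of the fallback chain in step 8 above (assumptions: the hash comes from the X-ipfs-hash header, the mirror URL is a placeholder, and curl stands in for the download logic inside zsync2):

HASH=QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB
MIRROR_URL=https://example.com/Some.AppImage

# Prefer the local IPFS gateway, then the plain HTTP mirror, then the public gateway
curl -fLo Some.AppImage "http://localhost:8080/ipfs/$HASH" \
  || curl -fLo Some.AppImage "$MIRROR_URL" \
  || curl -fLo Some.AppImage "https://ipfs.io/ipfs/$HASH"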

Option 2: Hook libp2p into zsync2

Viable? Pros? Cons?
AppImageCommunity/zsync2#15

Option 3: dat

To be written

Written in nodejs, which means npm and friends are needed to set it up.
A C library is still in a very early stage: https://github.com/mafintosh/libdat

Pros and cons

From https://docs.datproject.org/faq:

How is Dat different than IPFS?

IPFS and Dat share a number of underlying similarities but address different problems. Both deduplicate content-addressed pieces of data and have a mechanism for searching for peers who have a specific piece of data. Both have implementations which work in modern Web browsers, as well as command line tools.

The two systems also have a number of differences. Dat keeps a secure version log of changes to a dataset over time which allows Dat to act as a version control tool. The type of Merkle tree used by Dat lets peers compare which pieces of a specific version of a dataset they each have and efficiently exchange the deltas to complete a full sync. It is not possible to synchronize or version a dataset in this way in IPFS without implementing such functionality yourself, as IPFS provides a CDN and/or filesystem interface but not a synchronization mechanism.

Dat has also prioritized efficiency and speed for the most basic use cases, especially when sharing large datasets. Dat does not make a duplicate of the data on the filesystem, unlike IPFS in which storage is duplicated upon import (Update: This can be changed for IPFS too, ipfs/kubo#3397 (comment)). Dat's pieces can also be easily decoupled for implementing lower-level object stores. See hypercore and hyperdb for more information.

In order for IPFS to provide guarantees about interoperability, IPFS applications must use only the IPFS network stack. In contrast, Dat is only an application protocol and is agnostic to which network protocols (transports and naming systems) are used.

For investigation

Deduplication between different packages

Wouldn't it be cool if e.g., all AppImages containing Qt could deduplicate data? Check ipfs/notes#84 where it talks about deduplication.

AppImageHub data

Could probably be decentralized in a database as well, e.g., using https://github.com/orbitdb/orbit-db/blob/master/API.md

@whyrusleeping

@probonopd the chunking we talked about in IRC could be pretty useful here. As a quick hack I would be interested to see what sort of deduplication you get across different images using rabin fingerprinting: ipfs add -s=rabin. This uses content-defined chunking and should ideally produce a better layout than the default fixed-width chunking (at least for this use case).

If you add two different files with the rabin fingerprinting, you could do ipfs refs -r <hash> on each (which lists each block) and see how many hashes are the same between each file.
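Putting those two commands together, a quick way to measure the overlap could be (a sketch; the AppImage file names are placeholders):

# Add both files with content-defined (rabin) chunking
HASH_A=$(ipfs add -q -s=rabin AppA.AppImage | tail -n 1)
HASH_B=$(ipfs add -q -s=rabin AppB.AppImage | tail -n 1)

# List every block of each DAG and count how many blocks both files share
ipfs refs -r "$HASH_A" | sort -u > a.refs
ipfs refs -r "$HASH_B" | sort -u > b.refs
comm -12 a.refs b.refs | wc -l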

@KurtPfeifle
Contributor

@whyrusleeping:
If that indeed would work, it would be a pretty cool feat!

@whyrusleeping

@KurtPfeifle do you have a list of images somewhere I could download and try this out on?

@KurtPfeifle
Contributor

@whyrusleeping:

A list of crowd-sourced AppImages and their respective download locations is here:

@pinnaculum

@pinnaculum Well, it's not WebTorrent (where the client runs entirely in the web browser); it's actually BitTorrent with web seeding, but I guess WebTorrent is no different. You see, now we have a very decentralized way to distribute and release updates for AppImages. All you need is an HTTP web server for the delta update (using zsync). zsync and BT are very similar, so I chose BitTorrent, since AppImages are static anyway. The update process is simple: the updater downloads the zsync meta file from an HTTP server (the same as downloading a .torrent file from somewhere) and constructs the new file using the old file as a seed via zsync (torrent clients resume downloads, but they do not construct files from unrelated files), then downloads the remaining blocks from somewhere, like an HTTP server. So all I did was replace the HTTP server part with the torrent swarm.

Thanks for explaining it, great job. What's concerning is this part:

"The updater downloads the zsync meta file from HTTP server"

So there's an HTTP server involved that you depend upon for this process. Which also means that you'll depend on DNS, and that you'll have an address/URL hard-coded somewhere? Or am I wrong?

the main one being that you can easily add a mechanism inside the AppRun to automatically import/verify the AppImage that is being run (using ipfs -n on $APPIMAGE, no need for a daemon).

Does this require the user to be connected to the Internet? Because that would be a deal breaker for me.

The user doesn't need to be connected. Agreed, this would have been a deal breaker!

The other advantage is that you could use pubsub (IPFS publish/subscribe) to publish messages (probably JSON) about AppImage metadata on an official topic. For example, you'd have a message to announce a new AppImage, providing the image CID (the hash), software name, author, etc. That way it's trivial to build a fully distributed index web service of all AppImages, using a pubsub service that analyzes these messages and creates an index, ideally using raw DAGs, which can be associated with an IPNS key to have a single "index" address. This service could analyze the AppImage and verify its authenticity, then pin it. There would need to be a policy for how long, and how many, AppImages are kept pinned, etc.

This way you reduce the workload of having to maintain a centralized index, and you get a fully distributed index service with very little maintenance involved (a rough pubsub sketch follows below).
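For illustration, with go-ipfs's experimental pubsub this could look roughly like the following (a sketch; the topic name and JSON fields are made up):

# The daemon must be started with pubsub enabled
ipfs daemon --enable-pubsub-experiment &

# A publisher announces a new AppImage on an agreed-upon topic
ipfs pubsub pub appimage-announce '{"name":"Some.AppImage","cid":"QmZKVvm9jdF7TTfg8LEWMMsoinDxJEFMVybfzGUfs3dkKB","author":"example"}'

# An index service subscribes to the same topic, then verifies and pins what it sees
ipfs pubsub sub appimage-announce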

Are you talking about AppImageHub? But AppImages were intended to be distributed by authors from their own websites, like how you distribute .exe or .dmg files.

AppImageHub was built just to showcase all AppImages, IIRC; not a repository of any kind, which is what AppImage stands against.

Yes, I was referring to AppImageHub and how you could build a decentralized hub, but this was just to explain the kind of things you could easily build using pubsub.

The distribution of images is the best part: AppImage creators then don't need to depend upon centralized services (cloud providers, for example) to store the images; they're content-addressed, easily replicated, and can be retrieved either directly from IPFS or through any public IPFS HTTP gateway. Creators would just reference the CIDs. With torrents you also have easy replication, but I don't see how you ensure replication of the zsync meta files. Would users require a BitTorrent client to download the AppImage?

I'm not advocating any solution over another, just sharing ideas, it's an interesting topic.

@antony-jr

So there's an HTTP server involved that you depend upon for this process. Which also means that you'll depend on DNS, and that you'll have an address/URL hard-coded somewhere? Or am I wrong?

Yep. The update information is embedded into the AppImage (hard-coded), see https://github.com/AppImage/AppImageSpec/blob/master/draft.md#update-information

Would users require a BitTorrent client to download the AppImage?

No, they will not require a BitTorrent client; they just need to use the AppImage updater, which uses libtorrent-rasterbar under the hood. But it is worth noting that the torrent client is only used to download the remaining blocks; the zsync algorithm is used to construct the new file from the old version. So AppImage update = zsync + BitTorrent.

I'm not advocating any solution over another, just sharing ideas, it's an interesting topic.

Well, I would like to get into IPFS too. Maybe I will research it more to make a zsync + IPFS combination, so users get to decide which one to use.

@pinnaculum

OK, I just saw the update information section in the specs. So in the update information field you put an HTTP link to a .zsync metadata file? This is a good system because HTTP uses location-addressing and the URL will not change. The only downside is that these links are not allowed to break (or you can't update/sync). You could have .zsync files stored in IPFS, but it wouldn't work to store the CIDs in the update information, since once the zsync metadata changes, the associated CID will change (and using IPNS would be way too fragile).

zsync looks awesome and i'm sure it's lightweight.

@antony-jr

zsync looks awesome and i'm sure it's lightweight.

Yes. It's very lightweight.

@pinnaculum

Now regarding security, let's take one of the worst-case scenarios. Someone publishes an AppImage, the update information in the image contains an http/https address to a .zsync file, let's say zsync|https://banana.org/software/Banana-latest-x86_64.AppImage.zsync. Somehow the banana.org server gets hacked (or someone gets control of the domain and changes the IPs in the DNS) and the hacker replaces the .zsync metadata file with a carefully crafted one which will make the user who updates the software, download/sync an "infected" image via zsync.

Is there anything in the updating process, or in the way zsync works, that can prevent such an attack? If not, that's something to think about.

@probonopd
Member Author

probonopd commented Oct 24, 2020

Looks like some of the larger applications that are already using AppImage are also already using FossTorrents:
https://fosstorrents.com/softwares/

https://twitter.com/FossTorrents/status/1320121510389075968

Maybe this could be integrated.

@TheAssassin
Member

@pinnaculum why is that "something to think about"? This "remote server gets hacked" scenario is omnipresent. What if the initial AppImage you download was already altered by an attacker? I don't see what's "to be thought about" there. The vast majority of all AppImages are located next to the .zsync file anyway. As a downloader, you have no chance to see if the file was altered. But you also have no reason to believe so.

Not to mention, if you sign the AppImage, AppImageUpdate will require any update to be signed properly with the same key as the existing one.

I don't see the big threat you seem to see there... this reminds me a lot of that weird "let's create digests for our downloads and include them in the index.html we place next to the files" approach. No security is gained; users just waste time and energy checking the digests against the files they downloaded.
They only ever made sense on systems like Mirrorbrain, where the main instance redirected you to other servers.
And actually, AppImageUpdate also supports RFC 3230 and RFC 5843, which secure such scenarios as well (if the redirecting server supports them, of course).
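For reference, such an instance digest exchange looks roughly like this on the wire (a sketch; example.com is a placeholder and this only works if the server actually implements RFC 3230):

# Ask the server to include a SHA-256 digest of the resource in its response headers
curl -sI -H 'Want-Digest: SHA-256' https://example.com/Some.AppImage.zsync | grep -i '^digest:'
# A compliant server answers with e.g. "Digest: SHA-256=<base64 value>", which the client
# can compare against the data it actually received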

Any form of "distributed updates" has rather serious privacy issues. If all your AppImages are shared through BitTorrent, IPFS or whatever, an attacker can easily guess what applications you have and tailor attacks on you. I don't see the issue with having centralized update servers. Very often, it's cheap people who refuse to set up hosting themselves (because it costs money or whatever) who vouch for "distributed updates" because "they can't be taken down" or whatever. Well, self-hosted files are also not "taken down" by some large company.
What bugs me even more is, with IPFS/BitTorrent/..., the storage and bandwidth required doesn't just magically pop up. With a central server, it's easy to handle those costs. But with "distributed" systems, people with e.g., metered bandwidth would be excluded easily. I've seen discussions about this in the context of PeerTube on mobile devices, where the upstream is not only metered but also limited usually. The pragmatic response from the authors of PeerTube AFAICS is to disable the torrent part on mobile devices.

I would never enable any such mechanism, and I don't see how they would help the world. They add complexity, they put users' privacy at risk, they open new attack vectors, they don't work with metered bandwidth, and they also require infrastructure and maintenance.

@antony-jr

antony-jr commented Oct 25, 2020

Somehow the banana.org server gets hacked (or someone gets control of the domain and changes the IPs in the DNS) and the hacker replaces the .zsync metadata file with a carefully crafted one which will make the user who updates the software, download/sync an "infected" image via zsync.

@pinnaculum If a server ever gets hacked, then that's the end of that organization. What happens if GitHub gets hacked? Everyone is screwed. See Firefox Monitor: some of the top sites have already been hacked, and passwords and personal info have been exposed to the hackers.

Is there anything in the updating process, or in the way zsync works, that can prevent such an attack? If not, that's something to think about.

There are GPG checks, which compare the signature in the old AppImage and the new one. But if the server itself is hacked, then the newly released AppImages can be crafted to do anything. But you can run AppImages in Firejail.

Any form of "distributed updates" has rather serious privacy issues. If all your AppImages are shared through BitTorrent, IPFS or whatever, an attacker can easily guess what applications you have and tailor attacks on you.

@TheAssassin Yes, it does, but it is virtually impossible to share data between peers without knowing the peers' IP addresses. You can also use a VPN or Tor with BT.

EDIT: So you can guess what AppImage I'm downloading right now through BitTorrent? How did you associate a specific IP with my identity? If I used a VPN or Tor, what does my IP reveal about my identity?

I don't see the issue with having centralized update servers. Very often, it's cheap people who refuse to set up hosting themselves (because it costs money or whatever) who vouch for "distributed updates" because "they can't be taken down" or whatever. Well, self-hosted files are also not "taken down" by some large company.

Even with Hetzner you get 20 TB bandwidth. And there is time required to maintain the server. Hetzner does take down your server without notice if they think something is bad. See https://www.reddit.com/r/hetzner/comments/j8jid3/hetzner_poor_service/

With AWS you can get a bill of $2000 in just one month if you screw something up. Or if your API key is somehow exposed.
Also not everyone has that kind of money to host their own server for open source software.

What bugs me even more is, with IPFS/BitTorrent/..., the storage and bandwidth required doesn't just magically pop up.

You can limit bandwidth usage in BitTorrent. I don't know about IPFS. I don't see any storage usage with BitTorrent other than the space required to store the file you are downloading.

The pragmatic response from the authors of PeerTube AFAICS is to disable the torrent part on mobile devices.

Any P2P solution used will ask the user for permission, and will be disabled by default.

I would never enable any such mechanism, and I don't see how they would help the world.

@TheAssassin It's your opinion and it's a free world. You can choose whatever you want, and that's the good thing about Free Software. Everyone is different. I'm not saying BT/IPFS is perfect, but nothing is perfect, not even HTTP.

They add complexity, they take users' privacy at risk, they open new attack vectors, they don't work with metered bandwidth and they also require infrastructure and maintenance.

There is no complexity on the part of users. If you fetch from a self-hosted server or GitHub, does that not put the user's privacy at risk? I mean, the organization or the user hosting the server could sell the data. If you don't want to use P2P on metered bandwidth, then don't; that's what I'd recommend. What infrastructure and maintenance are you talking about? With BT it's just one file to share, or a magnet link.

@antony-jr

@pinnaculum I tried your IPFS browser (https://github.com/pinnaculum/galacteek) and it seems that IPFS needs a lot of time to start. IPFS is a filesystem, which is overkill for one file, but it might be a good choice for pmOS (postmarketOS). But I like that you have added a seed option for the AppImage itself. IPFS does seem to have more features than BT.

@antony-jr

Looks like some of the larger applications that are already using AppImage are also already using FossTorrents:
https://fosstorrents.com/softwares/

https://twitter.com/FossTorrents/status/1320121510389075968

Maybe this could be integrated.

This is interesting, because the zsync meta file and everything else is just included in the torrent file. So a new update mechanism with just a magnet link might be possible. Imagine embedding just the torrent magnet link; everything is decentralized then. No need to ping GitHub, which logs everything. The only downside is that the torrent has to stay alive (i.e., it should have decent seeders for the magnet link to work).

@TheAssassin
Member

There are GPG checks, which compare the signature in the old AppImage and the new one.

As I pointed out before, AppImageUpdate validates the signature, then compares the keys. And it's entirely optional; it's not the norm.

But you can run AppImages in Firejail.

Firejail doesn't really add a lot of value. The default profile is quite permissive. What would really be required are app-specific profiles.

I proposed a concept for how this could be implemented without having to trust the AppImage you've downloaded, similar to what Android provides. This, however, requires some integration into the system, which currently only AppImageLauncher can provide. TheAssassin/AppImageLauncher#99 (comment)

No need to ping GitHub, which logs everything

So? Then you allow anybody on the planet to track everyone accessing that torrent, both uploaders and downloaders. And you ignore the fact that your model still relies on web seeds, i.e., HTTP servers.

You can also just self-host your AppImages on your servers, and disable logging or be GDPR compliant. On my servers, all logs are pseudonymized (I zero out some bytes of every IP address; I can't disable logs entirely, otherwise I couldn't handle DoS attacks other than by banning all traffic).
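(Such pseudonymization can be as simple as zeroing the last octet in a post-processing step; a sketch assuming the client IP is the first field of a common-format access log:)

# Replace the last octet of every IPv4 client address with 0
awk '{ sub(/\.[0-9]+$/, ".0", $1); print }' access.log > access.pseudonymized.log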

If you fetch from a self-hosted server or GitHub, does that not put the user's privacy at risk?

Don't constantly limit the discussion to GitHub. I've talked about self-hosting as well. That argument is pretty hypocritical, as we have the discussion on GitHub and we all apparently trust them to some extent.

If it were my choice, we'd host all AppImage software (the build artifacts) on my server for users to download. But @probonopd doesn't want that, claiming GitHub was so much better...

I mean, the organization or the user hosting the server could sell the data.

First, that would be illegal without consent, and at least they have a privacy policy I can inspect. Torrents don't have such facilities. At least you know who you can sue if you notice violations.

Second, only they create logs. In a distributed world, everyone can just track all accesses to a resource. The danger is not the logging, but the attack vectors which are opened subsequently. Show me the apps you use, and I'll be able to check for exploits and infect your computer with malware. Especially with older AppImages (whose libraries are never updated, of course), there are surely plenty of bugs that could be exploited.

My argument is that this information should not be leaked. The average user does not know about it, yet you force them to share it with the world.

And if you don't make them upload (e.g., while the computer is idle), then you can just continue to use some HTTP server; BitTorrent just adds unnecessary complexity on top.

You also didn't explain how you'd handle metered connections. You can't easily recognize them and stop uploading content in those cases? Metered connections are omnipresent. There's 4G routers in households in rural areas for instance. Some people might just connect via a smartphone using tethering. People will likely not notice that their bandwidth is consumed by some distributed stuff, which you don't even seem to be willing to make transparent to the user (like that'll be "too complicated" and will overwhelm the users or whatever).

Any P2P solution used will ask the user for permission, and will be disabled by default.

Really? The entire thread reads like that was the holy grail solution, and it could/should be set up without any alternatives as fallback. How would you ensure the fallback anyway? In the end, it's the application authors who would have to maintain two systems then.

Are you really going to provide all pros and especially the cons in some dialog asking the user to enable such an option? I highly doubt it, looking at the systems built previously in the AppImage context. It'll probably be just "do you want to enable BitTorrent", without any additional information.

You're talking about non-tech-savvy users here, who tend to just click "yes" on every dialog without understanding the consequences. I'm sure you know more than one relative who doesn't read messages or really understand what they're doing, but just replays behavioral patterns without thinking about what they do...

How did you associate a specific IP with my identity? If I used a VPN or Tor, what does my IP reveal about my identity?

Do I need to know your identity? No. I just need to understand how I can intrude into your computer. Once in, I can read all your personal data easily.

And using a VPN or Tor only protects you against small attackers. Most VPN providers keep extensive logs (there have been many examples, and even if they claim they don't, you can't prove it, you can only hope so). Only Tor can protect you really well, but Tor doesn't want P2P traffic in its network; it just slows down people who rely on it to get in touch with the rest of the world. Using Tor to mitigate the design flaws of a poorly thought out P2P scheme is just abuse.

Even with Hetzner you get 20 TB bandwidth. And there is time required to maintain the server. Hetzner does take down your server without notice if they think something is bad. See https://www.reddit.com/r/hetzner/comments/j8jid3/hetzner_poor_service/

With AWS you can get a bill of $2000 in just one month if you screw something up. Or if your API key is somehow exposed.
Also not everyone has that kind of money to host their own server for open source software.

First of all, I have no traffic limit at all on my server, but it costs a lot more than yours. Even 20 TB is quite a lot for the few euros you pay. (I mean, you can't expect unlimited traffic for those prices; it all costs money. GitHub etc. surely have costs as well.)

That Reddit link actually shows the opposite of what you wanted to prove. It shows it was the user's own fault, and they were able to recover it. And there are even statements from happy customers...
Don't try to prove your point with unconfirmed Reddit links. It just damages your own credibility.

And: everyone hosting with AWS must never have made any comparison to other hosters. They're just very much overpriced. That's not an argument at all.

What infrastructure and maintenance are you talking about? With BT it's just one file to share. Or a magnet link.

There's the web seed that needs to be maintained as well, potentially also trackers (as they really speed things up, DHT alone is really slow). With IPFS, there are servers to be set up as well, unless you want to just rely on your users to provide all the upload bandwidth (which you probably shouldn't). Then there's the full download you need to offer as well, for people who want to initially download the files. It's not like with BT/IPFS you'd have absolutely no infrastructure. In fact, you likely just rely on other people's systems to keep everything running.

@pinnaculum

@pinnaculum why is that "something to think about"? This "remote server gets hacked" scenario is omnipresent. What if the initial AppImage you download was already altered by an attacker? I don't see what's "to be thought about" there. The vast majority of all AppImages are located next to the .zsync file anyway. As a downloader, you have no chance to see if the file was altered. But you also have no reason to believe so.

Agreed, this scenario is not specific to the usecase of zsync. I suppose you just have to trust the update you're getting.

@antony-jr

antony-jr commented Oct 25, 2020

There's the web seed that needs to be maintained as well, potentially also trackers (as they really speed things up, DHT alone is really slow).

We can use trackers from archive.org and the web seeds can be hosted on GitHub or really cheap hosting with low bandwidth. There are also cheap seedboxes, which can be easier to maintain and cheaper than a server. Then you don't really need GitHub at all; it's just a magnet link away (see the sketch below).
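A sketch of what publishing such a torrent could look like (assuming the mktorrent tool; the tracker, web seed URL, and file names are placeholders):

# Create a torrent that announces to a public tracker and uses the HTTP release as a web seed
mktorrent \
  -a udp://tracker.example.org:6969/announce \
  -w https://example.com/releases/Some-x86_64.AppImage \
  -o Some-x86_64.AppImage.torrent \
  Some-x86_64.AppImage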

Do I need to know your identity? No. I just need to understand how I can intrude into your computer. Once in, I can read all your personal data easily.

So you can intrude into my computer just by knowing what software I use? There is really only a very small chance you could do this. Well, it is only possible with users who configured their router or computer improperly. IPs are dynamic anyway, so you have to find a vulnerability in a very limited time.

EDIT: There are static IPs too. I recommend not using BT with a static IP.

@TheAssassin, are you suggesting that we never distribute with P2P? I mean, it is virtually impossible to share data without knowing the IPs of peers, and the IPs have to be made public.

@TheAssassin
Member

TheAssassin commented Oct 25, 2020

We can use trackers from archive.org and the web seeds can be hosted on GitHub or really cheap hosting with low bandwidth.

See, now that's yet again infrastructure. It's not infrastructure-less. It's "not just a magnet link away": you need to upload several files to the right locations with static URLs, and only then can you craft your magnet URL.

And using other people's infrastructure to power your updates is also not very nice. Self-hosting should be preferred.

So you can break into my computer just by knowing what software I use?? There is a really very small chance you could do this.

We've seen such an attack ourselves recently in appimaged and libappimage. A detailed analysis will follow soon-ish. The attacker just had to make you download something that looked like an MP3 file but in fact was an AppImage, which triggered appimaged. Upon integration, due to a bug, it could overwrite files in the system. The author of the exploit crafted their attack by overwriting e.g., the system file manager's desktop file and icon, waiting for a user to click it, and et voilà, you've got malware running. But there are also other ways to get such malicious software to run on the system.

Such bugs are not new at all. For example, the Wireshark developers don't recommend running it on a production system, as there's a history of bugs which allowed attackers to take over the system. The code base is large and flaws are to be expected. They themselves recommend using e.g., tcpdump (a tool with much less code) to capture the data, then moving the capture to e.g., a VM and running Wireshark on it while not connected to the Internet.

So yeah, knowing which versions of which AppImages you have gives me a lot of context information for free which I usually would not have.

Another example where such information is useful is web application security. It's much easier to search for known exploits if I know the version of a web application which is running somewhere. Many applications therefore no longer publish the version a server is running on its index page.

Minimizing the amount of meta information is key in making an attacker's life harder. After all, even encryption is just about making it hard for an attacker to get the plaintext. It's not impossible, just very, very difficult.

Well, it is only possible with users who configured their router or computer improperly. IPs are dynamic anyway, so you have to find a vulnerability in a very limited time.

As just pointed out, the idea is not to intrude your network by your IP. And I'm also not describing a mass attack. I'm describing an attack that would be tailored to a small amount of users.

The point is, why even publish something as sensitive as the list of applications?

By the way, getting your IP isn't that hard. If I want to attack you, I'd do what phishing has always done: try to get you to visit some website.

By the way, I have not denied that there might be advantages, too. I just see security and privacy issues which I consider unacceptable.

@pinnaculum
Copy link

We can use trackers from archive.org, and the web seeds can be hosted on GitHub or on really cheap hosting with low bandwidth.

See, now that's yet again infrastructure. It's not infrastructure-less. It's "not just a magnet link away": you need to upload several files to the right locations with static URLs, and only then can you craft your magnet URL.

The AppImage creator chooses the distribution mechanism and what it implies in terms of "infrastructure" (which in the case of zsync is almost nothing, let's be honest). Since the "distributor" has that choice, there just needs to be good documentation on what is needed to maintain reliable distribution of the AppImage.

I don't see any problems here. HTTP works for you? Go ahead. Want to try BT? You have that choice.

And using other people's infrastructure to power your updates is also not very nice. Self-hosting should be preferred.

That's a never-ending debate, and most people don't have the means to do pure self-hosting; they'll just use GitHub or whatever comes next.

@antony-jr
Copy link

I don't see any problems here. HTTP works for you? Go ahead. Want to try BT? You have that choice.

Yeah. I think it's the choice of the user and the distributor. On a local network, decentralized distribution is more helpful and very secure.

@antony-jr
Copy link

@TheAssassin I need to make a statement. First of all, you need to associate an IP with an identity in very little time to know whether a person uses a specific piece of software, so saying that using BT publishes the list of software you use to the world is just nonsense. You say you don't need the identity because your attacks will be IP-based, i.e., you only need to know whether an IP uses software which has a vulnerability such that it can be exploited externally with just the IP address. Can you please give an example that such software exists in the first place?? I mean, software with a vulnerability such that a person can be hacked externally with just an IP must not be used at all on a normal computer (maybe on a virtual one).

In your example you demonstrate the vulnerability in appimaged. Let's say you know that IP 10.0.0.2 is downloading a vulnerable version of appimaged. How can you convince someone at 10.0.0.2 that they need to download the malicious MP3 file, by social engineering or some other means? How would you get this software to someone at IP 10.0.0.2 in the short time before the IP changes? After a minute or two, whoever is behind IP 10.0.0.2 is going to change.

Also, I proposed the use of a VPN because it's cheaper and better than GitHub tracking you. You said that VPNs keep logs and track you, but you contradict yourself:

First, that would be illegal without consent, and at least they have a privacy policy I can inspect. Torrents don't have such facilities. At least you know who you can sue if you notice violations.

Most VPN providers have extensive logs (there's been many examples, and even if they claim they don't, you can't prove it, you can only hope so)

Why can't you sue VPN providers? VPN providers also have a privacy policy that you can inspect.

Even when using GitHub, I advise everyone to use a VPN or Tor if they care about privacy.

And if you say that some old version of a library or piece of software inside an AppImage could enable a hacker to intrude with just your external IP, then I think the big companies already have a backdoor (like with Windows) into every Linux user's system, not just AppImage users, because the LTS versions of Ubuntu ship old libraries and software, and some users don't update often.

@antony-jr
Copy link

Even if you know the IP and timestamp, to identify the actual person you need the help of the ISP, and AFAIK that kind of power lies only with the government. In some countries ISPs explicitly block or throttle BT traffic, like in the US (only some ISPs do this), but BT has a lot of ways to hide its signature. A VPN can solve all of these problems, but a government can quite literally do anything to you.

@TheAssassin
Copy link
Member

TheAssassin commented Oct 26, 2020

You only need to know whether an IP uses software which has a vulnerability such that it can be exploited externally with just the IP address.

I never said that, and I don't want to repeat myself. It's described in the previous comments. Your "example" is completely unrelated and describes some other attack that I did not refer to. I never said you need the exact identity, you just need to be able to match IP addresses. Anyway, this is off topic, and I don't think you understand what I mean...

Your description is flawed anyway. The IP address of a person typically doesn't change that often. Not even with Tor (the exit node doesn't change, to protect your anonymity, actually). External IPv6 addresses are nowadays often static even for single computers, as routers don't do NAT any more. And due to the lack of new IPv4 addresses, many providers now use "DS lite" (dual stack lite), where the NAT gateway is shared by many users and remains the same for at least one day (but, with many providers around here, it can be weeks).

Also, the more special the resource you're trying to access (i.e., the fewer people access it), the easier it is to recognize and match related requests. There's a reason the Tor Browser uses a predefined window size: to make sure all users share one of only 3-4 fingerprints. Deanonymizing Tor users isn't that hard, unless they really just use larger websites. The larger the sea you swim in, the harder it is to find you.
Edit: you might want to watch https://media.ccc.de/v/SHA2017-102-tor_de-anonymization_techniques.
Edit 2: here's the article I had in mind regarding fingerprinting in Tor browser and why it's inadvisable to tamper with the Tor browser window: https://blog.torproject.org/browser-fingerprinting-introduction-and-challenges-ahead

Your "it's safe in LAN" argument is invalid anyway. Is a public WiFi a LAN? Unless you put in the same effort like e.g., Android (which can, to some extent, differentiate between private and public networks), you can't easily differentiate between such networks.

The point is, all your models rely on users sharing the AppImages they have on their computer, all the time. Otherwise, you don't have enough peers to download from. You cannot really do this selectively well, or you end up with a highly complex HTTP(S) model, as you fall back to the web seed anyway. The chance of this updating scheme being all that useful is quite low.

Oh, and please stop narrowing the discussion to updates only. This issue is titled "Investigate peer-to-peer AppImage distribution". Integration in AppImageUpdate is just a secondary goal. All models proposed originally wanted to share all AppImages through e.g., appimaged, all the time.

Peer-to-peer over VPN adds a dependency on one central service (the VPN provider), which might even disallow it. And using Tor for this is abusive.

And having to use a third-party service to somewhat fix/secure your model shows to me, at least, that it's not mature enough.

@antony-jr
Copy link

Okay, let's agree that P2P is still not mature and that it compromises privacy and security to some extent. I've come to the conclusion that, for now, we need to investigate this more.

@TheAssassin But one thing we do gain from using P2P is reduced load on the central server (balancing it when traffic is high). Think about the GitHub rate limits on range requests, which you solved by reducing the number of requests.
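
For context, a hedged sketch of what "reducing the number of requests" can look like on the HTTP side: instead of issuing one request per changed block, a zsync-style client can coalesce several block ranges into a single Range header. Whether the server answers with a multipart/byteranges body or just the whole file depends on the host; the URL and byte offsets below are made up:

# two byte ranges fetched in one request instead of two separate requests
curl -L -r 0-65535,1048576-1114111 -o blocks.bin \
  https://example.com/releases/MyApp-x86_64.AppImage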

@TheAssassin
Copy link
Member

TheAssassin commented Oct 26, 2020

Indeed, I never denied the advantages. And I'm not saying P2P is insecure by definition. But one has to raise awareness that it does have serious privacy implications, which average users won't necessarily see. It's always better to reduce the amount of information you share to a minimum. Just look at the manic ads industry and the crazy amount of information they use to fingerprint browsers. It's insane.

Perhaps we can implement such a feature, but just dumping everything into a P2P service isn't going to lead to success. For selected AppImages it might make sense (although I think that just self-hosting is still easier and better).

Nevertheless, I can really recommend the SHA2017 talk I linked above. It's really insightful.

Edit: Oh, I totally forgot this link: https://blog.torproject.org/bittorrent-over-tor-isnt-good-idea

@antony-jr
Copy link

Nevertheless, I can really recommend the SHA2017 talk I linked above. It's really insightful.

Edit: Oh, I totally forgot this link: https://blog.torproject.org/bittorrent-over-tor-isnt-good-idea

I will definitely take a look at those.
