Opting Out #47
The archive will only access rooms where the history is But it is possible that we add an additional signal/control to determine whether search engines should index it. I imagine that you would still be able to access the room via the Matrix Public Archive, but we would tell Google not to index the room if that signal is present.
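Here is a minimal Python sketch of the idea in the comment above. The archive keeps serving the page but adds an `X-Robots-Tag: noindex` response header when an opt-out signal is present. The state event type `example.archive.noindex` is a hypothetical placeholder; no such event is defined in the Matrix spec. `room_state` is assumed to map state event types to their content.

```python
from typing import Any, Dict

# Hypothetical opt-out state event type (placeholder, not part of the Matrix spec).
NOINDEX_EVENT_TYPE = "example.archive.noindex"


def indexing_headers(room_state: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Extra HTTP headers for an archive page of one room.

    `room_state` maps a state event type to its content (empty state_key assumed).
    The page stays reachable either way; only the search-engine hint changes.
    """
    if NOINDEX_EVENT_TYPE in room_state:
        # Standard robots directives: don't index this page, don't follow its links.
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}


if __name__ == "__main__":
    room_state = {
        "m.room.history_visibility": {"history_visibility": "world_readable"},
        NOINDEX_EVENT_TYPE: {},
    }
    print(indexing_headers(room_state))
```

A `<meta name="robots" content="noindex">` tag in the rendered HTML would carry the same hint. Either way, it only asks well-behaved crawlers not to index the page and does not restrict access.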
Will removing world readability affect the archive, or will the past history still be visible through it? Is there any mechanism in Matrix for redacting past public history that any room administrator can use without resorting to running bots/servers or unspecced hacks (
Another concern: do tombstoned rooms automatically get excluded by both this and matrix-static? They may be inaccessible if they have weird join rules and no one to invite.
@Mikaela I'm not sure exactly how the
Best to create an issue and ask elsewhere about this.
Any world-readable rooms, including tombstoned ones, will be accessible. I don't have any details on how the other project, matrix-static, works around this.
Related to matrix-org/synapse#14127
I have a few rooms that are set to What if the archive backend were to query all the rooms to check whether its bot has access? If the bot is banned from a room, the backend would remove that room from the list on the website. I imagine this check would be in addition to whatever setting is used to opt out.
@Cyberes The app is stateless, so we can't hold onto that kind of access/ban information across requests. It needs to be something we can query from the API. For the room directory, we wouldn't want to make a separate lookup for every single room we're trying to display in the grid, so it needs to come from the room directory itself. A new state event like If you can share the details, what's the distinction that your rooms should be
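The following illustrates why the signal has to come from the room directory. A stateless app can decide what to show from a single `GET /_matrix/client/v3/publicRooms` page, because each entry already carries `room_id` and `world_readable`. This is a minimal sketch that assumes a hypothetical `example.archive.opt_out` flag on each directory entry. No such field exists in the spec'd response today, so a homeserver would have to expose one.

```python
from typing import Any, Dict, List

# Hypothetical directory flag; the spec'd /publicRooms entries have no such field.
OPT_OUT_FIELD = "example.archive.opt_out"


def rooms_for_grid(public_rooms_page: Dict[str, Any]) -> List[str]:
    """Choose room IDs to display from one /publicRooms response page.

    Decided purely from the directory entries: no per-room lookups and no
    state kept between requests.
    """
    room_ids = []
    for entry in public_rooms_page.get("chunk", []):
        if not entry.get("world_readable", False):
            continue  # history isn't world-readable, so nothing could be shown
        if entry.get(OPT_OUT_FIELD, False):
            continue  # the room asked to be left out (hypothetical flag)
        room_ids.append(entry["room_id"])
    return room_ids


if __name__ == "__main__":
    page = {
        "chunk": [
            {"room_id": "!open:example.org", "world_readable": True, "num_joined_members": 12},
            {"room_id": "!members:example.org", "world_readable": False, "num_joined_members": 40},
            {"room_id": "!optout:example.org", "world_readable": True,
             "num_joined_members": 7, OPT_OUT_FIELD: True},
        ]
    }
    print(rooms_for_grid(page))
```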
The rooms that fit the
Really not sure about how fit for purpose using Changing the setting to At least for now, banning the bot seems like the only solution, but the resulting 403 really doesn't look good for the archive.
After the bot joined a room with history visibility set to
"archived" is a bit of an overloaded term here, but given that this project is called "Matrix Public Archive", I can see where the confusion may be coming from. Any public room with The Matrix Public Archive doesn't hold onto any data (it's stateless) and requests the messages from the homeserver every time (it archives nothing). The archive.matrix.org instance has some caching in place, 5 I've tried to clarify more of this in the FAQ document and added more details on why not guest access/peeking. Banning
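The "stateless, requests the messages every time, with some caching" model can be sketched as a small time-to-live cache in front of the homeserver request. This is a minimal Python illustration, not the archive's actual code. The 300-second TTL is only an example value, and `fetch` stands in for whatever call asks the homeserver for messages.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny in-memory cache whose entries expire `ttl_seconds` after being fetched."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], V]) -> V:
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # still fresh: no request to the homeserver
        value = fetch()  # stale or missing: ask the homeserver again
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache: TTLCache[str] = TTLCache(ttl_seconds=300)
    print(cache.get_or_fetch("!room:example.org/2023-01-01", lambda: "messages from the homeserver"))
```

Because entries expire, once the homeserver stops returning a room's messages (for example, after the bot is banned), nothing from that room is served beyond the cache lifetime.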
…FAQ (#241)
Add context and demystify public/world_readable/guest/peeking in the FAQ
Spawning from:
- #47 (comment)
- https://matrix.to/#/!SzoPnANsRYxITaDPaJ:matrix.org/$Zwr_GzklOjhRoAY-D9Ekh0qYhZYUWI_d5HUcJ1180zM?via=matrix.org&via=evulid.cc&via=t2l.io
- https://matrix.to/#/!QQpfJfZvqxbCfeDgCj:matrix.org/$ZKgZ6oPhW39gByfORAxp-zfL_g6lISL73Ms_6D16SPQ?via=matrix.org&via=element.io&via=envs.net
- https://matrix.to/#/!SzoPnANsRYxITaDPaJ:matrix.org/$eC8O8zFwsvEkoy2kdwrNowKhvCg_kCU7zZhjSlSEGto?via=matrix.org&via=evulid.cc&via=t2l.io
Link FAQ about indexing in the right-panel footer so people can more easily understand what goes into the result and find issues to track about opting out.
- #47
- https://github.com/matrix-org/matrix-public-archive/blob/5caf9dc1b8d5e98fdc99ab06ab9b9d09d2f546ed/docs/faq.md#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines
Why was my comment marked off-topic? But also, it was not obvious to many of us that this project aims for 5 minutes/2 days ephemeral caching. Judging by the documentation and the name, most users seem to associate it with some kind of crawling, mass-joining bot with unlimited storage and resources that slows federation to a crawl and eternally captures all of their content to use against them or to train a mastermind that will take over the world. Many moderators are also suffering from PTSD after encountering unidentified bots almost every week that mass-join basically every room they can, and about half of them start to flood or spam thousands of rooms at a time after a few days' delay. This bot and the system itself fall under a different category as long as it is true that it is an interactive agent acting on behalf of a given human visitor who is browsing the calendar or clicking on a permalink they got from another platform. That would be seen as highly beneficial and a quality-of-life improvement by most, and it would be much more palatable to all stakeholders. If this were true, the spirit of the robots.txt exclusion standard similarly wouldn't apply. I recommend renaming the project to something that makes this more obvious, such as It seems logical and uncontested that we should allow crawling and indexing of a message by search engines as long as it is linked this way from a personal web page. However, enabling spidering (following intra-site links with indefinite scrolling) again falls under a different philosophical debate, as can be seen from the relevant MSCs.
Just an additional data point: has anyone considered that it may be a bad idea to allow anyone to activate this on any room? Shouldn't it be something only room admins can do? Basically because a) a room admin may not be aware of this, and b) a room admin may not want this but also doesn't want to clutter the room state with yet another ban. Another thing is the right to be forgotten, which exists in the EU: a user has the right to be forgotten under the GDPR. It's not clear how an individual can opt out even if the room itself has decided to allow the bot. Mass redactions have in the past been the equivalent of a denial-of-service attack, so I don't think they are a sensible way to do this.
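If the opt-in/opt-out were a room state event rather than something any visitor can trigger, the existing `m.room.power_levels` rules would already restrict who can set it. Here is a minimal sketch of that check. It assumes the room has an `m.room.power_levels` event. The event type `example.archive.opt_out` is hypothetical.

```python
from typing import Any, Dict

OPT_OUT_EVENT_TYPE = "example.archive.opt_out"  # hypothetical state event type


def can_send_state(power_levels: Dict[str, Any], user_id: str, event_type: str) -> bool:
    """Apply the m.room.power_levels rules for sending a state event.

    Required level: events[event_type] if listed, else state_default (50 if omitted).
    User's level: users[user_id] if listed, else users_default (0 if omitted).
    """
    required = power_levels.get("events", {}).get(
        event_type, power_levels.get("state_default", 50)
    )
    level = power_levels.get("users", {}).get(user_id, power_levels.get("users_default", 0))
    return level >= required


if __name__ == "__main__":
    power_levels = {"users": {"@admin:example.org": 100}, "users_default": 0}
    for user in ("@admin:example.org", "@member:example.org"):
        print(user, can_send_state(power_levels, user, OPT_OUT_EVENT_TYPE))
```

With default settings, only moderators and admins could set such an event, which addresses the "anyone can activate this" concern for the room-level part. It does not cover the per-individual opt-out raised in the same comment.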
@MTRNord Could you please read the past messages? Or at least mine #47 (comment)
I want to never, ever, ever feel the need to write an entire essay to this organization about something that should be obvious to an organization developing infrastructure for communities and open-source projects. No matter whether you think you can implement it in a way that is better or more ethical, the existence of a service joining channels out of nowhere (while leaving everyone else to put the pieces together as to why it exists, how it joined, whether somebody invited it, etc.) wastes the time of volunteers using Matrix, who have to figure out what changed in their channel and what implications it may have. There's a clear power imbalance: you have the matrix.org infrastructure that cannot have any accountability whatsoever and are also being paid to clean the entire thing with the "move fast, break things" approach. On the other hand, you have a lot of volunteer-run communities that depend on you and have to figure out what's going on with their channel with zero communication whatsoever. I spent at least 2 hours of limited volunteering time examining what was happening, and there were more people involved (EDIT: Just to be clear, this is a strictly personal opinion and I do not speak for them.). Discord uses announcements to explain changes in the way people run their communities. You can announce stuff server-wide on IRC. The UX problems that exist in Matrix are not the fault of the communities that use your infrastructure, and the "move fast, break things" approach to this is in every way super annoying, especially given the Trust & Safety implications that were dismissed with a series of whataboutisms.
Do I understand correctly that the whole XMPP protocol is also opted-in to the bot with no method of opting out?
I had just another thought about this: would it help admins, or in general, if the bot announced what is happening instead of silently joining, and linked to its privacy policy as well as the FAQ? Currently it joins silently, which feels like it wants to stay under the radar, even if that's not the intention. I think that, at least in some cases, having the bot explain itself might help with acceptance and would also let room admins opt out more easily and quickly if they wish to do so.
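The suggested announcement could be a single `m.notice` message sent right after joining. It would go through the client-server API's `PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}`. Here is a minimal sketch that only builds the request. The homeserver URL, FAQ URL and privacy-policy URL are placeholders, and the message wording is illustrative.

```python
import json
import uuid
from typing import Tuple
from urllib.parse import quote


def join_notice_request(homeserver: str, room_id: str, faq_url: str,
                        privacy_url: str) -> Tuple[str, str, str]:
    """Return (method, URL, JSON body) for a one-off notice sent after joining."""
    # Transaction IDs should be unique per access token; the server uses them
    # to make retried requests idempotent.
    txn_id = uuid.uuid4().hex
    url = (f"{homeserver}/_matrix/client/v3/rooms/{quote(room_id, safe='')}"
           f"/send/m.room.message/{txn_id}")
    content = {
        # m.notice is the msgtype intended for automated senders such as bots.
        "msgtype": "m.notice",
        "body": (
            "Hello, I'm the archive bot. "
            f"How it works and how to opt out: {faq_url} "
            f"Privacy policy: {privacy_url}"
        ),
    }
    return "PUT", url, json.dumps(content)


if __name__ == "__main__":
    print(join_notice_request("https://hs.example", "!abc:example.org",
                              "https://faq.example", "https://privacy.example"))
```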
Hi, I'm a Libera channel op and my channels' policy is to kickban anyone with
I'd also point to Libera's own public logging policy: https://libera.chat/policies/#public-logging
And what do you want in a Matrix-related issue?
Whether they hate it or not is one thing, and I won't comment on it. But it's completely legitimate even for non-users to comment on such matters. Because of the various bridges, even non-users are affected by things such as public logs available to everyone, or public indexing/crawling/scraping that produces logs searchable via search engines. For example, a pure IRC user whose channel has many Matrix users in it (and you can translate that to XMPP or pretty much any other protocol Matrix bridges to) is completely affected by these choices, pretty much against their will and by design (not to mention all the users who aren't even aware of this).
This affects me because I participate in IRC networks that Matrix bridges to. I never opted in to being archived publicly by Matrix, nor do I want to be.
@akierig It looks like the The archive bot will still join the room because it doesn't know the history visibility before it joins, but it won't show any content from the room in that case (only It seems useful if we had an endpoint that would return the history visibility information without joining (
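What "won't show any content" comes down to can be sketched as a check on the room's current `m.room.history_visibility` state after joining. This is a simplified Python sketch. It treats only `world_readable` as publishable and falls back to `shared`, the spec's default when no such event exists. It ignores that visibility can change over a room's lifetime and is really evaluated per event.

```python
from typing import Any, Dict


def may_publish_history(room_state: Dict[str, Dict[str, Any]]) -> bool:
    """True only if the room's current history visibility is world_readable.

    `room_state` maps state event type -> content. Without an
    m.room.history_visibility event, the spec's default ("shared") applies.
    """
    content = room_state.get("m.room.history_visibility", {})
    return content.get("history_visibility", "shared") == "world_readable"


if __name__ == "__main__":
    for value in ("world_readable", "shared", "invited", "joined"):
        state = {"m.room.history_visibility": {"history_visibility": value}}
        print(value, may_publish_history(state))
```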
I would like to request increasing the priority of resolving this issue, as there are currently at least two instances of the bot requiring manual banning:
@Mikaela oops, I didn't see a question in CME. I do have this set up, but my instance can only join two specific rooms (which I admin), so it is not contributing to this problem.
For what it's worth, we've had to ban the archive bot on the bridge because it joined too many rooms. The IRC bridge (at least the libera.chat bridge) requires that all Matrix-side users are joined to the IRC channels they are bridged to. However, most IRC networks limit how many channels one user may join at any time. The bot exceeded this limit, and as a result the bridge became very unstable. We're hoping there might be a solution to this problem at some point, because I believe the archive provides value to some users of the bridge, but ultimately, in its present incarnation, it's not suitable.
As there is even more development on opting more rooms into archiving, I would like to ask whether there is also development on opting out. I also wonder whether this issue could be pinned, given its significance. This tracker doesn't currently use that feature, and as far as I am aware, three issues can be pinned at once on GitHub. Additionally, I have been wondering whether declaring Free Tibet and Slava Ukraini in public rooms gets them opted out from Baidu and Yandex.
I just learned that https://staging.archive.matrix.org/ is a thing and using a separate account You could be even better and be opt-in for room administrators, which would be in the spirit of privacy friendliness, not to mention modern privacy legislation and directives.
I apologise for multiple comments in a row and heated emotions, but if staging.archive.matrix.org is meant for internal testing/staging, why is it publicly accessible? Surely the Matrix Foundation would have the resources to at least put HTTP Basic Auth in front of it? I also question it having a different account. If the bot is truly stateless, why can't it share the account, so that one ban would affect both instances?
Evading a ban with a second “testing” bot that publishes data publicly isn’t the smartest thing to do.
I've also updated it to use the The end goal is to have
Currently Element gives room admins the option to make a room visible to "anyone" or "members only." I don't see anything wrong with making a public archive of rooms that are already readable by anyone. But the archive bot is joining rooms where the admins have already set the room to not be publicly visible, and it's archiving their history. My room is set to be visible only to members, but the archive bot joined today and made the history visible on the public archive. It's gone now, since I banned the archive bot, but I shouldn't have had to do that when I had already "opted out" by setting my room to not be publicly visible. View.matrix.org does this right.
Even if a room is publicly readable, that doesn't mean that its admins consent to conversations in that room being systematically scraped. Yes, anyone could come by and archive the room against the will of its participants. But this requires a targeted or coordinated effort. I am very pro the idea of archiving Matrix rooms, by the way. So much information gets lost on the internet every day because it got shared once in a Discord or Matrix or whatever room and will disappear with time. But this should be an opt-in process. Make a UI element for it. Show people why they should have their rooms archived, make people care about archiving. Don't force this on people. Making privacy decisions for the user, not trusting them to understand what is best for them, is one of the ways that the tech industry has been eroding the trust of users over the last 20 years. Please don't follow Silicon Valley's footsteps. It won't lead you anywhere nice.
Funnily enough, it doesn't actually archive; it just makes the content scrapable by search engines. Once the bot is banned, any content stored there is gone.
I never used any Matrix products, but I might be affected since I use XMPP and IRC. Can I request a listing of my personal data stored by Matrix under the GDPR?
The GDPR thing is really important. Especially inside the EU, people can really harm admins by taking legal action if a room gets archived against their will. A court would probably not have the knowledge to understand what this is. It would blame the room admin, or the admin of the server the room was created on, for the fact that everyone's messages get archived (yes, it's not really archived as in stored, but archived as in search engines can scrape it). To be compatible with the GDPR, it would probably have to be opt-in, and opt-in on a per-user basis instead of a per-room basis. As a room admin, I can never decide whether all my users would be okay with having their (public) messages archived and scanned by search engines, so I can't willingly make that decision. It would probably have to work so that I, as a Matrix user, can say "yes, archive all the messages I write in all public rooms by default, but don't archive my messages in rooms X, Y and Z". This would only cause part of a room's conversation to be archived, but it would be GDPR-compatible because it'd be opt-in on a per-user basis. Don't get me wrong: I love the idea of public rooms being indexed by search engines so that knowledge can be shared in a better way and also archived for the future. BUT it's not good if some random user takes legal action against an administrator because of it at some point, and this administrator then ends up in court without knowledge or technical understanding of this topic and eventually gets punished without any real reason. -> Very hypothetical example, but it's still possible IMO.
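Here is a minimal sketch of the per-user opt-in proposed above: show a user's messages only if that user explicitly opted in and has not excluded the room. `ArchivePreference` and the way preferences would be stored and looked up are hypothetical; nothing like this exists in Matrix today. Events are assumed to be plain dicts with a `sender` key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ArchivePreference:
    """Hypothetical per-user setting; nothing like this exists in Matrix today."""
    archive_public_rooms: bool = False  # opt-in: off unless the user turns it on
    excluded_rooms: Set[str] = field(default_factory=set)


def events_to_show(room_id: str, events: List[dict],
                   prefs: Dict[str, ArchivePreference]) -> List[dict]:
    """Keep only events whose sender opted in and has not excluded this room."""
    shown = []
    for event in events:
        pref = prefs.get(event["sender"])
        if pref is None or not pref.archive_public_rooms:
            continue  # no explicit opt-in: the message is not shown
        if room_id in pref.excluded_rooms:
            continue  # user opted in generally but excluded this room
        shown.append(event)
    return shown
```

Senders with no stored preference are treated as not opted in, which is what makes this opt-in rather than opt-out.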
An additional complication with the GDPR is that nothing prevents archiving the archive. From the Forĝejo discussion on the bots: There are also other archives like archive.ph, view there. Edit: I also forgot that IPFS Companion provides decentralised archives/snapshots too, so whack-a-mole with archives may not work any better on the archive.matrix.org side than users trying to opt out of these archive bots. @wojtekLs,
I try to keep the issue clean-ish by marking my own comments as off-topic when they are not strictly relevant to the issue at hand.
Has there been any thought given to how room admins or homeserver admins could opt out their room or server?
My thought would be some sort of state event in a room and the already standard X-Robots-Tag header for servers. I know this project is in very early stages, but given the distributed nature of the Matrix network, I believe this to be a very important thing, especially coming from the Matrix core team (or at least that's who it looks like it's being driven by).
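For the server-level signal suggested above, the archive would need to read an `X-Robots-Tag` value from the homeserver's HTTP responses. Which response it would check is an open question and an assumption here. The sketch below only shows interpreting the header value. It is deliberately simplified: it honours only directives without a crawler-name prefix (so `googlebot: noindex` is ignored) and treats `none` as equivalent to `noindex, nofollow`.

```python
from typing import Optional


def header_blocks_indexing(x_robots_tag: Optional[str]) -> bool:
    """True if an X-Robots-Tag value asks all crawlers not to index.

    Simplified: only directives without a crawler-name prefix are honoured
    (e.g. "googlebot: noindex" is ignored), and "none" counts as noindex.
    """
    if not x_robots_tag:
        return False
    directives = {part.strip().lower() for part in x_robots_tag.split(",")}
    return bool(directives & {"noindex", "none"})


if __name__ == "__main__":
    for value in ("noindex, nofollow", "none", "nofollow", None):
        print(repr(value), header_blocks_indexing(value))
```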
Related MSCs: