Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure deleted files aren't added to messages #20

Merged
merged 2 commits into from
Feb 6, 2023

Conversation

waocats
Copy link
Collaborator

@waocats waocats commented Oct 23, 2022

Although this seems(?) to be undocumented behavior on Slack's part, I noticed in our own backups that files may not necessarily have a url_private attribute. These are denoted with "mode": "tombstone", which (I assume) means that they were deleted. I found an example here of someone else encountering the same thing. Mine looked something like this:

"type": "message",
"text": "...",
"files": [
    {
        "id": "F023WGH1WF7",
        "mode": "tombstone"
    }
],
"upload": false,
"user": "...",
"display_as_bot": false,
"ts": "1622764865.023000"

Without this change, SlackDownloader.download would fail on those files, since they doesn't have a URL associated with them. However, I added a line in SlackParser.parse_message that checks for this, and doesn't call ParsedMessage.add_file if this is the case.

Hope this helps, and thanks for all of your great work!

@richfromm
Copy link
Owner

Sorry I didn't see this earlier. I didn't have notifications set up properly for this repo, hopefully that's fixed now.

Yeah, Slack's documentation isn't really complete, most of this I just figured out via example. And my testcases are limited, my motivation for doing this all in the first place is a small personal slack.

@richfromm
Copy link
Owner

I assume that when url_private is missing, that url_private_download is also missing?

@@ -477,7 +477,8 @@ def parse_message(self, message, filename, channel_msgs_dict):

if 'files' in message:
for file in message['files']:
parsed_message.add_file(file)
if file['mode'] != 'tombstone':
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should also explicitly check that url_private is present.

And if so, whether that's best done here, or in ParsedMessage.add_file()

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the question of what's the "right" thing to do if the file is not present.

We could (like you have here) just skip past it. But might it be better to instead attach some file that says that the file is not actually present? I'm not sure. That could be a little awkward.

Or mark that situation in the destination discord in some other way? We could for example add some text to the message (perhaps in italics), notifying about the missing attached file. There's a somewhat related precedent for putting out of band meta-messages in a discord message, like where I create a fake root of a thread:

fake_message_text = SlackParser.format_message(

TBH, I'm undecided. I'm also trying to decide on the "right" way to respond to the URL being present in the export, but then the file not being found. See #21

Either way, this situation is probably something worthy of noting for the user with logger.warning(), rather than skipping over it completely silently.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: adding something to notify in Discord that there's a missing file

See my suggested at #21 (comment) to handle the case where a file is not found when we do have a filename and URL in the export, but get a 404 trying to fetch it.

We could do something similar in this case, but it has somewhat less value, since we don't even know what the filename is. The only thing we have besides mode of tombstone is the id. I suppose we could attach a text file that said something like WARNING: File with id F023WGH1WF7 deleted from Slack, but does that have sufficient value to the end user in Discord vs. doing nothing? Maybe a little bit, b/c it does indicate that something is missing, and that the state in Discord is somewhat incomplete. But if so, is it any more useful than something generic like WARNING: File attachment deleted from Slack?

Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure whether the lack of a url_private attribute can occur without tombstone as the file mode (though I will assume it can for the rest of this post). However, tombstone is a guarantee that the file has been deleted from Slack, while I would assume some error on Slack's end for the former. I know Discord doesn't have deletion notifications, and I don't think Slack does either--so I believe an informational log message would suffice.

As far as whether this belongs in ParsedMessage.add_file() or not, I also feel a bit conflicted. I suppose it comes down to what a "file" represents--and here it seems there isn't really much choice other than using Slack's nomenclature (where deleted files are still files), as the for loop directly above implies. However, I think that the name add_file suggests that it will end up adding something to the ParsedMessage object, and by ignoring files, the behavior of the method wouldn't really be well-defined. I think the best way to get around this is to raise an exception when files can't be added, and handle these in a try block in place of where I made my initial modification.

Ultimately, I suppose it wouldn't hurt to define a couple of conditions here--if a file has a mode of tombstone (defined behavior on Slack's part), and then if there isn't a url_private attribute and the file's mode is not tombstone, or you get a 404 (undefined behavior on Slack's part). For undefined behavior(s), you have a warning message in the log (and Discord) that a file wasn't found, and you could continue based on the command-line options you discussed in #21. Otherwise, you could just continue as normal with a call to logger.info(). What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in responding, I'm currently job hunting (feel free to pass me any leads :), and that kind of needs to take precedence.

I did a few experiments about deleted files, removing them from both Slack and Discord to see what it looked like in both the UI and the Slack export.

Slack's behavior is somewhat weirdly inconsistent:

Screen Shot 2022-11-10 at 10 38 50 AM

If I attach multiple files to a single message, when I delete one, it's just gone from the UI.

But if I attach just one file at a time, a deleted file shows up with a "This file was deleted." placeholder.

Not directly related, but I also note that the former case (multiple files in a message) does not give inline previews for text files, but the latter case (one file per message) does inline preview text files. Which makes me somewhat wonder if the complete disappearance of the deleted file in the former case, with no mention in the UI, might be a bug.

Regardless, in either scenario in Discord, there is nothing in the UI that explicitly shows a deleted file, other than noting that the post has been "(edited)":

Screen Shot 2022-11-10 at 10 39 18 AM

One other item worth noting, in both deletion cases in Slack, the export not only marked that the file was deleted, but included a timestamp of when (but not the filename):

        "files": [
            {
                "id": "F04AMA34TPT",
                "mode": "tombstone",
                "date_deleted": 1668024977
            }
        ],

Regarding where this belongs in the code, I think it comes down to what we want the end user visible state to be. If we wanted to attach some placeholder file indicating that there was a file but it was deleted, then I think we consider this a special case of a file, and it's treated as a file. But if we wanted to treat this as just something that's not really in Slack and therefore does not need any kind of reference in Discord, then I think it's fine to pass over it when parsing, (that is, in SlackParser.parse_message(), and never call ParsedMessage.add_file().

I somewhat keep changing my mind, but right now I'm leaning towards treating the tombstone case (this PR) and the 404 case (the other PR) differently.

In this case, the export is very clear. There was a file, but it's now been deleted. This is not an error condition. It's perhaps worthy of informing the user (maybe not even rising to the level of logger.warning(), and logger.info() is sufficient; not sure), but doesn't need to be placed into Discord in any way, and there's no reason to prompt the user or fail or anything like that. Fwiw, one of the things that prodded me in this direction was that the export doesn't even include the filename. Just saying that some unknown file was deleted from Slack adds so little info to Discord, it feels like clutter. Maybe I'd feel differently if we could say file "foo.txt" was deleted.

But the 404 case is different. Here, there is some kind of error, and I think giving the user the option to do something about it or at least look into it and/or see if it's consistent feels reasonable. Maybe 3 separate choices (fail/ask/ok) is overkill, but it's not that hard to implement. Which then still leaves open the question of do we attach a placeholder text file in Discord noting that the file wasn't found. Note that in this case we can include the name of the file. So maybe. I'm still undecided.

But if the tombstone and 404 cases are different, then I'd probably keep the PR's separate and deal with the tombstone first, and can defer a final decision on the 404 case until after this merges. (If they were going to be treated the same, I was considering opening a new PR to deal with both of them together, which would ultimately replace both of these PR's.)

Feedback welcome.

On a logistical note, do you know if I have the priv's to push to your fork? Or would you need to explicitly list me as a Contributor for that to work?

Copy link
Collaborator Author

@waocats waocats Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I wrote a reply a while back, but I must have forgotten to send it. At the time, I should have given you Contributor rights to my fork, so you're welcome to work on it there if you end up preserving this PR. Please let me know if it doesn't work :)

Since Discord doesn't show when anything is deleted (files included) anyway, I think it would make sense to pass over it, and give no mention--in Discord at least--that there's a deleted file. Having no information on what the file is definitely strengthens that case, and I think a call to logger.info() would be just fine. I suppose the user could specify if they want this or not, in order to give them peace of mind that the export went through (without having to match up instances of discontinuity with the logs)? Personally, that's about the only case I can think of, and I doubt it's worth implementing just for that--though I'm not sure if there are any other reasons you had in mind.

BTW, good luck on the job search!

@richfromm
Copy link
Owner

I found a reference to tombstone messages in Slack's documentation wrt deleting remote files:
https://api.slack.com/messaging/files/remote#removing

Which I suspect is not directly what's going on here (I think this is the case of a file directly uploaded to slack, but those also have a "Delete file" option), but it's probably related enough that this seems likely how slack handles an uploaded file that has been deleted.

@waocats
Copy link
Collaborator Author

waocats commented Nov 7, 2022

I found a reference to tombstone messages in Slack's documentation wrt deleting remote files: https://api.slack.com/messaging/files/remote#removing

Which I suspect is not directly what's going on here (I think this is the case of a file directly uploaded to slack, but those also have a "Delete file" option), but it's probably related enough that this seems likely how slack handles an uploaded file that has been deleted.

I saw something else on this, actually:
https://api.slack.com/changelog/2019-03-wild-west-for-files-no-more

It seems "tombstoned" messages may not have a mode of tombstone, after all. While that makes things more confusing, I still think the mode is reserved just for messages that were deleted by a user, but their use of the term in the docs is inconsistent and should probably be taken with a grain of salt. I'm wondering if what they described in the link you sent results in messages with a mode of tombstone when its archived (though once this PR is merged we should have our bases covered, here).

* Log a warning if deleted file is encountered. But still just pass
  over it, since we don't even know the name, it's not worth noting
  this state in Discord.
* If the date deleted is included in the Slack export (empirically
  this seems to be the case, but none of this is documented), include
  that in the warning log.
* Change the polarity of the check. Since the various file modes
  aren't documented, just skip over this on "tombstone", assume
  anything else is a normal attachment. Empirically, I've seen both
  "hosted" and "snippet".
@richfromm
Copy link
Owner

OMG, So sorry for the extreme delay in returning to this. I made some minor changes, have a look if you're still at all paying any attention to this repo.

@waocats
Copy link
Collaborator Author

waocats commented Feb 6, 2023

LGTM!

@richfromm
Copy link
Owner

See #27 , which deals with related issues, and is now a replacement for #21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants