This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Suggest tags when editing a document #264

Closed
jovandeginste opened this issue Jan 3, 2021 · 14 comments
Labels
feature request (New feature or request), fixed in next release (This is stuff that I've already addressed on the development branch.)
Milestone

Comments

@jovandeginste
Contributor

jovandeginste commented Jan 3, 2021

Would it be possible to suggest appropriate tags when editing a document?

I know I can add and create new tags, but those are all the tags known. It might be nice to have some kind of ranking for "appropriate tags". I'm not sure how the document classification system now works under the hood, but since there are "auto" tags available, I think this should be possible?

Something like "Tags used for similar documents: <tag1> <tag2>"
(Click tags to add them to this document)

@Philmo67

Philmo67 commented Jan 4, 2021

Suggesting tags that have not been added by the user implies using an external source or some kind of pre-existing lexicon (which would have to be adapted for each document language). I'm not sure that the components used here contain such things (@jonaswinkler will be able to tell us).

This subject inspires me another suggestion: sometimes we forget to tag documents with a tag that might have been appropriate.
Would it be possible, from the tag list screen for example, to suggest a list of additional documents, for which the tag would be appropriate? (and to provide a way of bulk assigning the tag, of course :-) )

@jovandeginste
Contributor Author

I'm not talking about unknown tags; I'm talking about tags assigned to other documents (but not the one currently being edited).

In fact, your second paragraph comes close to what I mean, but from the tag-perspective.

@jonaswinkler
Owner

First of all, "Auto" tags work roughly like this:

  • For each tag set to "Auto", Paperless will extract some key terms from documents to which this tag has been assigned. It will also extract key terms from documents to which this tag has not been assigned. Paperless then assigns "Auto" tags to new documents according to which key terms are present (or not present) in that new document.
  • Paperless collects this data only for "Auto" tags, and no other tags.
  • The other matching algorithms are similar in that they assign tags / types / correspondents based on certain criteria (manually specified).
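To make the "Auto" idea above more concrete, here is a toy sketch in Python. It is not Paperless's actual classifier: the tokenizer, the smoothed log-ratio weights, and the names `terms`, `train` and `matches` are all assumptions made for illustration. It also only scores terms that are present, whereas the description above mentions absent terms as well.

```python
import math
import re
from collections import Counter


def terms(text):
    """Lowercase word tokens; a stand-in for real key-term extraction."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(tagged_docs, untagged_docs):
    """Weight each term: positive if it is more common in tagged documents."""
    pos = Counter(t for d in tagged_docs for t in terms(d))
    neg = Counter(t for d in untagged_docs for t in terms(d))
    weights = {}
    for term in set(pos) | set(neg):
        # Add-one smoothing keeps the ratio finite for terms seen on one side only.
        p = (pos[term] + 1) / (len(tagged_docs) + 2)
        n = (neg[term] + 1) / (len(untagged_docs) + 2)
        weights[term] = math.log(p / n)
    return weights


def matches(weights, text):
    """Assign the tag when the summed weights of the known terms are positive."""
    return sum(weights.get(t, 0.0) for t in terms(text)) > 0


tagged = ["electricity invoice amount due", "invoice for water, amount due"]
untagged = ["holiday photo album", "meeting notes and photo"]
weights = train(tagged, untagged)
```

With these made-up documents, a new text containing "invoice" and "amount due" would match, while one about holiday photos would not.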

Providing tag suggestions is certainly possible, however:

  • Paperless has already assigned all matching tags to new documents. Therefore, the list of suggested tags on new documents would be exactly the list of tags that are already assigned to the document.
  • This could be useful when editing older documents, and when the matching rules changed after adding them.

> This subject inspires me another suggestion: sometimes we forget to tag documents with a tag that might have been appropriate.
> Would it be possible, from the tag list screen for example, to suggest a list of additional documents, for which the tag would be appropriate? (and to provide a way of bulk assigning the tag, of course :-) )

This is only possible for tags / correspondents / types for which a matching algorithm has been set. This is also a rather expensive operation, since I have to run the matching algorithms on all documents to discover document suggestions.

I like the idea, but I'll have to think about how to properly integrate that.

@jovandeginste
Contributor Author

This is exactly the issue I'm trying to solve: I imported all my documents into paperless, and am now trying to make sense of them by tagging some. It would help a lot if I could tag a number of documents, then have the system suggest other documents that might be similar. Indeed, those documents were already consumed, so they won't be auto-tagged.

Do I understand that any document not tagged with a certain auto-tag, is actually anti-tagged? Maybe a middle ground might be a solution: yes, no, undecided.

@jonaswinkler
Owner

> Do I understand that any document not tagged with a certain auto-tag, is actually anti-tagged? Maybe a middle ground might be a solution: yes, no, undecided.

As far as the "Auto" matching algorithm is concerned: Anything in your inbox (marked with inbox tags) is regarded as undecided. Apart from that, you understand correctly that anything not tagged with a particular "Auto" matching tag is "anti-tagged". I think that should be enough and works in most use cases without requiring the user to know all the details about how the system works internally. For that reason, I don't want to add a third state to tags.
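As a rough illustration of that rule, the sketch below derives training labels for one "Auto" tag. The data layout (a dict mapping document ids to sets of tag names) and the function name `training_labels` are hypothetical, not taken from the Paperless code base.

```python
def training_labels(documents, tag, inbox_tags):
    """Map document id -> True (tagged) or False ("anti-tagged").

    Documents carrying any inbox tag are skipped: they count as undecided.
    """
    labels = {}
    for doc_id, doc_tags in documents.items():
        if doc_tags & inbox_tags:
            continue  # still in the inbox, so neither a positive nor a negative example
        labels[doc_id] = tag in doc_tags
    return labels


# Hypothetical example data: document id -> tags currently assigned.
documents = {
    1: {"invoice"},
    2: {"inbox"},
    3: {"letter"},
    4: {"inbox", "invoice"},
}
labels = training_labels(documents, "invoice", inbox_tags={"inbox"})
```

Here documents 2 and 4 are left out entirely, document 1 is a positive example and document 3 a negative one.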

> This is exactly the issue I'm trying to solve: I imported all my documents into paperless, and am now trying to make sense of them by tagging some. It would help a lot if I could tag a number of documents, then have the system suggest other documents that might be similar. Indeed, those documents were already consumed, so they won't be auto-tagged.

Hm. You could try using the retagger. Define some tags with manual matching algorithms and see what the retagger comes up with. If you don't like the outcome, bulk-remove the data and try again, or use -f with the retagger.

I'll think about some way to do tag suggestions, but this will definitely take a while.

jonaswinkler added the "feature request" label Jan 7, 2021
@jovandeginste
Contributor Author

Thank you. I will see what is possible from my side. An external classification system through the API (if one exists) might be a nice workaround.

@Philmo67

In order not to consume resources unnecessarily, I wouldn't be bothered if the action to suggest missing tags (or missing documents) was done on demand, with a progress bar to keep me waiting.
I suppose users would be aware that this is a complex operation.

@jonaswinkler
Owner

In terms of required CPU time, this is actually a very simple operation.

jonaswinkler added this to the 1.1 milestone Jan 23, 2021
jonaswinkler added the "fixed in next release" label Jan 29, 2021
@jonaswinkler
Owner

jonaswinkler commented Jan 29, 2021

@jovandeginste How does this look?

image

Clicking would add the tag / type / correspondent to the form and hide the suggestions.

jonaswinkler pushed a commit that referenced this issue Jan 29, 2021
@shamoon
Contributor

shamoon commented Jan 29, 2021

This is awesome!

@jovandeginste
Contributor Author

That would be great, if it actually works ? 😄 Can I test it? (AKA do you have a docker image ready?)

@jonaswinkler
Owner

jonaswinkler commented Jan 29, 2021

This will simply run the matching algorithms again on the current document and suggest any missing tag or alternative correspondent / type.
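In other words, the suggestions are roughly "whatever matches now, minus what is already assigned". A minimal sketch of that idea for tags, assuming each matcher is a plain callable on the document text (the name `suggest_tags` and the lambda matchers are hypothetical, not the actual implementation):

```python
def suggest_tags(document_text, assigned, matchers):
    """Return tags whose matcher fires on the text but which are not yet assigned."""
    return sorted(
        tag
        for tag, matcher in matchers.items()
        if tag not in assigned and matcher(document_text)
    )


# Hypothetical matchers: tag name -> predicate on the document text.
matchers = {
    "invoice": lambda text: "invoice" in text.lower(),
    "insurance": lambda text: "policy" in text.lower(),
    "tax": lambda text: "tax" in text.lower(),
}
suggestions = suggest_tags(
    "Invoice for your insurance policy",
    assigned={"invoice"},
    matchers=matchers,
)
```

For this sample text only "insurance" would be suggested: "invoice" also matches but is already assigned, and "tax" does not match.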

@jovandeginste The 'dev' version on the Hub has this feature. (I'm really happy about the CI/CD pipeline)

@jovandeginste
Contributor Author

I rebuilt my container from the latest dev image, and I see suggestions on occasion, including for the correspondent. Nice! I'll try a bit more, but I'm happy so far!

@shamoon
Contributor

shamoon commented Feb 7, 2021

This is great, bravo!
