Priority inbox doesn't do what it's supposed to do #3968

Open
2 of 5 tasks
ChristophWurst opened this issue Nov 3, 2020 · 18 comments
Labels
2. developing bug feature:priority inbox Features and bugs related to the "priority inbox" feature

Comments

@ChristophWurst
Member

ChristophWurst commented Nov 3, 2020

Expected behavior

PI should help users organize their email into the important ones and the rest. This algorithm is based on ML, so it's a bit of a black box and performs differently depending on the input. Some people say it does not work but that doesn't give us any input on how to iron out the issues.

For those users, the PI does more harm than good.

Actual behavior

PI should deliver acceptable results for almost everyone. It's not supposed to be perfect. But it shouldn't be terrible.

Mail app

v1.4+

TODO

  • Refactor persistence -> replace DB + file system with a versioned memory cache
  • Finalize my classification work based on a TF-IDF transformer and a KNN classifier
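To make the second item more concrete, here is a minimal sketch of a TF-IDF transformer feeding a KNN classifier. It is only an illustration: the Mail app does this in PHP (the thread mentions Rubix), the features it actually uses are not stated here, and the toy subject lines, the cosine similarity and `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency per term, learned from training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    counts = Counter(tokens)
    size = len(tokens) or 1
    return {t: (c / size) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vectors, labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vectors, labels),
        key=lambda pair: cosine(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (subject line, is_important)
TRAIN = [
    ("invoice overdue please pay", True),
    ("meeting tomorrow please confirm", True),
    ("contract signature needed today", True),
    ("weekly newsletter deals discount", False),
    ("discount sale newsletter offer", False),
    ("social network notification digest", False),
]

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]


def classify(text, k=3):
    """Classify a new subject line with the toy model above."""
    return knn_predict(vectors, labels, tfidf(tokenize(text), idf), k)


print(classify("please confirm the meeting"))
print(classify("newsletter discount offer"))
```

With this toy data the first subject is voted important and the second is not, because their nearest neighbours share the same label.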

Future work

  • Add a feature to the classifier to be able to distinguish between user interaction and automated tagging, e.g. user manually assigned or unassigned importance to an email
  • Investigate if importance flag changes from external clients could be picked up -> check our sync logic
  • Investigate if training could be done "online" -> depends on Rubix

Context: https://nextcloud.com/blog/nextcloud-mail-introduces-machine-learning-for-priority-inbox/ and #3265

@ChristophWurst ChristophWurst added bug 1. to develop feature:priority inbox Features and bugs related to the "priority inbox" feature priority:medium labels Nov 3, 2020
@ChristophWurst ChristophWurst self-assigned this Nov 3, 2020
@ChristophWurst
Member Author

The great debugging

1 – Debug training

Okay, let's look at what might be going wrong on the affected instances/accounts. Let's start with the account training command from the CLI, which prints details about the training process:

$ php -f occ mail:account:train 1393
[debug] found 10 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 737 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0.810126582278481,"precision":0.90140845070422537,"f1Score":0.85333333333333339}
[debug] classifier validated: recall(important)=0.81012658227848, precision(important)=0.90140845070423 f1(important)=0.85333333333333
[debug] classifier 3067 persisted
14MB of memory used
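For anyone unsure how to read the classification report: the three numbers are related as sketched below. The confusion counts (64 true positives, 7 false positives, 15 false negatives) are not printed by the command. They are an assumption, chosen as the smallest counts that reproduce the ratios above on a 100-message validation set.

```python
def classification_report(tp, fp, fn):
    """Precision, recall and F1 for the 'important' class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1Score": f1}


# Assumed counts, inferred from the ratios in the log (not reported by occ itself)
report = classification_report(tp=64, fp=7, fn=15)
print(report)
```

Read this way, the classifier found about 81% of the important validation messages (recall), and about 90% of the messages it flagged were really important (precision).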

@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic please help shed some light on this by running occ mail:account:train for your account and posting the results here. The output doesn't contain anything sensitive as you can see above.

Let's see if there is a pattern, then I'll suggest the next step.

Thanks everyone ✌️

@ChristophWurst
Member Author

@guzzisti @jasond2020 @dcrobertson01 @Karamelmar @umrath @Ornias1993 @LukaPitamic did you have time to try this? Do you need any help?

@jasond2020

jasond2020 commented Nov 10, 2020

I get the feeling my instance won't learn ... there are many, many messages but no 'automatic' training - just the few manually flagged "important" and nothing else; here is the CLI output:
[debug] found 6 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 0 are important
[warning] not enough messages to train a classifier
34MB of memory used

maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything

@ChristophWurst
Member Author

ChristophWurst commented Nov 12, 2020

Thanks a lot for your help, @jasond2020. The output looks fine.

just the few manually flagged "important"

Are you sure you have some? The output says there are 0 important messages. We take the 1000 most recent emails. Are your important ones perhaps a bit older?

maybe something wrong with accessing the data(-base)? But with this output there is no question why the priority box is not working for me... because it does not do anything

You're right. On the other hand if the numbers printed are correct, the app does what it's supposed to do. No important message -> nothing to learn. So let's find out if there should be important messages.

@jasond2020

Yes, they may be older - older than the last 1000 I don't know, have not counted. I stopped marking them important for two reasons: too much effort for no noticeable effects (no mails that I would think have the same 'pattern' as the ones I marked manually as important have been marked important automatically); no (otherwise triggered) automated process started to mark (other) mails important.
But, ok, I'll try to make the effort again and start marking mails as important the next days and post the output here afterwards.

@ChristophWurst
Member Author

ChristophWurst commented Nov 12, 2020

no (otherwise triggered) automated process started to mark (other) mails important.

Good point actually. There should be a fallback logic with a rule-based importance classification for just this case. You don't have to do all the manual work. I'll see if I can find out why it wouldn't do this for your emails …

But, ok, I'll try to make the effort again and start marking mails as important the next days and post the output here afterwards.

Hold up until we know why the rules don't apply.

@JensKillermann

I use 7 mailboxes. Here is my data from occ mail:account:train:

#15 - my large imap mailbox

[debug] found 26 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 37 are important
[debug] data set split into 900 training and 100 validation sets
[debug] classification report: {"recall":0,"precision":0,"f1Score":0}
[debug] classifier validated: recall(important)=0, precision(important)=0 f1(important)=0
[debug] classifier 7690 persisted
24MB of memory used

#3

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 63 messages of which 11 are important
[info] not enough messages to train a classifier
20MB of memory used

#2

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 2 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

#18

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 15 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

#19

[debug] found 7 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 322 messages of which 36 are important
[debug] data set split into 290 training and 32 validation sets
[debug] classification report: {"recall":1,"precision":1,"f1Score":1}
[debug] classifier validated: recall(important)=1, precision(important)=1 f1(important)=1
[debug] classifier 7691 persisted
22MB of memory used

#20

[debug] found 2 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 298 messages of which 86 are important
[debug] data set split into 269 training and 29 validation sets
[debug] classification report: {"recall":1,"precision":0.9655172413793104,"f1Score":0.9824561403508771}
[debug] classifier validated: recall(important)=1, precision(important)=0.96551724137931 f1(important)=0.98245614035088
[debug] classifier 7692 persisted
24MB of memory used

#7

[debug] found 5 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 0 messages of which 0 are important
[info] not enough messages to train a classifier
20MB of memory used

@ChristophWurst
Member Author

Thanks a lot @JensKillermann. For the accounts with not enough messages to train a classifier there isn't much we can do. The ML needs a good amount of data to work reliably, hence a minimum of 20 messages is set as a requirement for the ML training. Accounts with fewer than that get some generic rules applied to detect importance.
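The logs posted so far leave open what exactly the threshold of 20 counts. As a rough sketch, the snippet below checks two hypothetical readings (20 messages in total vs. 20 important messages) against the outcomes reported in this thread up to this point. Both rule functions are assumptions made for illustration and are not taken from the app's code.

```python
# (total messages, important messages, classifier trained?) as reported in the logs above
OBSERVED = [
    (1000, 737, True),   # account 1393
    (1000, 0, False),    # jasond2020
    (1000, 37, True),    # JensKillermann #15
    (63, 11, False),     # #3
    (2, 0, False),       # #2
    (15, 0, False),      # #18
    (322, 36, True),     # #19
    (298, 86, True),     # #20
    (0, 0, False),       # #7
]

THRESHOLD = 20  # value quoted in the comment above


def by_total(total, important):
    """Hypothetical reading: train when at least 20 messages exist."""
    return total >= THRESHOLD


def by_important(total, important):
    """Hypothetical reading: train when at least 20 important messages exist."""
    return important >= THRESHOLD


def mismatches(rule):
    """Accounts whose reported outcome the rule fails to predict."""
    return [(t, i) for t, i, trained in OBSERVED if rule(t, i) != trained]


print("threshold on total messages:", mismatches(by_total))
print("threshold on important messages:", mismatches(by_important))
```

On these nine data points only the second reading matches every log, for example the account with 63 messages of which 11 are important. That is consistent with the logs, not proof of how the check is implemented.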

Account 15 seems to run into overfitting. That is, the classifier learns patterns that are too specific to the input data. This can explain bad performance when that classifier is used to classify new messages.

Account 20 is also close to overfitting but I assume it works a tad more reliably.

The ratio of messages total to important messages might play a role.
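One way the ratio could matter, sketched under assumptions: with 37 important messages out of 1000 (account 15), a 100-message validation set holds roughly 4 important ones. A hypothetical classifier that never predicts "important" then looks accurate overall while scoring zero on every metric the training command reports. Whether this is what actually happens on account 15 is not confirmed here.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus precision/recall/F1 for the positive ('important') class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0/0 treated as 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1Score": f1}


# Assumed validation set: 4 important and 96 other messages (about 3.7% important)
y_true = [True] * 4 + [False] * 96
# Hypothetical degenerate classifier: always predicts "not important"
y_pred = [False] * 100

metrics = evaluate(y_true, y_pred)
print(metrics)
```

The sketch yields 96% accuracy with recall, precision and F1 all at 0, which is why the report focuses on the "important" class rather than on overall accuracy.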

@ChristophWurst
Member Author

I checked a few other production instances and it turns out that many accounts are in fact overfitting these days. Whoopsie and totally my bad, I should have paid more attention to how this develops. I'll see if I can reproduce it with an account on my development instance because debugging in production is anything but easy.

@MrManor

MrManor commented Apr 11, 2021

Hi - ended up here in search of the possibility to disable Priority Inbox. It has never had any content and I just have a spinner just above "Other" going on forever. For other users with fewer folders and less historical mail, the spinner disappears, but there is still only mail in the "Other" category. I have attached some examples of the requested debug log. Although I don't know how to map the mailbox number to a user, based on the number of inboxes I guess mine is one of the first shown.
train.txt

Anyway we would like to just be allowed to disable and/or hide this AI annoyance, I guess we are just old school here too.

@ChristophWurst
Member Author

Anyway we would like to just be allowed to disable and/or hide this AI annoyance, I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one ✌️

@mat-m

mat-m commented Jul 8, 2021

@ChristophWurst : how do I find the account id ? I tried with the account inbox box id, but that didn't work

@mat-m

mat-m commented Jul 8, 2021

I went to PI, browsed the Other category, and marked some mails as important.
I marked some as unread so they could be in Important and unread.
They do appear in I&U in a new private window.
They do not in the current tab, even after a full reload (Ctrl+F5).

If it's another issue, I can open a new one.

@pbanj

pbanj commented Nov 29, 2021

Anyway we would like to just be allowed to disable and/or hide this AI annoyance, I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one ✌️

I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5

@miaulalala
Contributor

Anyway we would like to just be allowed to disable and/or hide this AI annoyance, I guess we are just old school here too.

You can disable the annoyance with v1.10. Have a good one ✌️

I'm trying to find the disable option but can't seem to find it. I'm on 1.10.5

Settings in the bottom left > Automatically classify importance of new email

@olivn

olivn commented Jun 8, 2022

NC 24.0.1 / Mail 1.13.2

[debug] found 39 incoming mailbox(es)
[debug] found 1 outgoing mailbox(es)
[debug] found 1000 messages of which 9 are important
[info] not enough messages to train a classifier
17MB of memory used

"Automatically classify importance of new email" is unchecked but half of the new mails are still randomly tagged as important!

How do I disable automatic classification?

@jancborchardt jancborchardt moved this to 🧭 Planning evaluation / ideas in 🖍 Design team Apr 10, 2024
@jancborchardt jancborchardt moved this from 🧭 Planning evaluation / ideas to 🏗️ At engineering in 🖍 Design team Apr 29, 2024
@nimishavijay nimishavijay moved this from 🏗️ At engineering to 👓 Design review in 🖍 Design team Aug 21, 2024
@st3iny
Member

st3iny commented Oct 15, 2024

We had a brainstorming session and planned some improvements to importance classification. I added the to-dos to the PR's description.

Projects
Status: 🏗️ In progress
Status: 👓 Design review