The problem the component solves
mailbagit should preserve the original arrangement of email exports, particularly the account's folder structure, and use that arrangement when writing derivatives. mailbagit currently supports this by parsing the internal folder structure of PST files and reading the X-Folder header when present for MBOX and EML sources. This information is stored in the model using the Message-Path field (which is then converted to Derivatives-Path), as documented in the Input-Output Examples doc.
While X-Folder is often used to preserve email folder information, Gmail instead seems to use an X-Gmail-Labels header, which mailbagit does not currently parse. Unlike X-Folder (which could be Inbox/Listservs), X-Gmail-Labels appears to be a comma-separated list that includes both the folder of the email and labels like "Important" or "Unread," which we need to exclude. Common examples are:
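X-Gmail-Labels: Unread,Inbox
X-Gmail-Labels: Unread,Important,Inbox
X-Gmail-Labels: Important,Inbox
X-Gmail-Labels: Inbox
X-Gmail-Labels: Inbox,Category Promotions,Unread
X-Gmail-Labels: Inbox,Important,Category Updates,Unread
X-Gmail-Labels: Spam,Category Personal,Unread
X-Gmail-Labels: Inbox,Opened,Category Updates
X-Gmail-Labels: Inbox,Category Social,Unread
X-Gmail-Labels: Inbox,Category Forums,Unread
X-Gmail-Labels: Inbox,Important,Opened,Category Personal
X-Gmail-Labels: Archived,Sent,Opened
X-Gmail-Labels: Sent,Notes
X-Gmail-Labels: Sent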
The Spam label appears to be a folder, not a label:
X-Gmail-Labels: Unread,Spam
While the Category labels are arguably arrangement, I don't think we can treat them as such reasonably. Thus, I expect to ignore labels starting with "Category."
It looks like the X-GM-LABELS header is also used for this, so we should try to read both.
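As a rough sketch of what "read both" could look like, here is a helper built on Python's standard email package. The function name raw_gmail_labels and the order in which the two headers are checked are my assumptions for illustration, not existing mailbagit code.

```python
from email import message_from_string

# Header names taken from this issue; the lookup order is an assumption.
GMAIL_LABEL_HEADERS = ("X-Gmail-Labels", "X-GM-LABELS")


def raw_gmail_labels(message):
    """Return the first non-empty Gmail label header value, or None."""
    for name in GMAIL_LABEL_HEADERS:
        # Message.get() matches header names case-insensitively and
        # returns None when the header is absent.
        value = message.get(name)
        if value and value.strip():
            return value.strip()
    return None


msg = message_from_string("X-Gmail-Labels: Unread,Spam\nSubject: test\n\nbody\n")
print(raw_gmail_labels(msg))  # prints "Unread,Spam"
```

Returning None when neither header is present would let the caller fall back to X-Folder or whatever else is available.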
It does not appear that the folder is consistently the first or last item in the list or anything reliable like that to help with parsing.
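Since position can't be relied on, one option is to filter by name instead. The sketch below is only an illustration: folder_labels and DEFAULT_EXCLUDED are hypothetical names, the default set is guessed from the examples in this issue rather than from any Gmail documentation, and a plain comma split assumes label names never contain commas.

```python
# Hypothetical default set of status labels to drop. This is an assumption
# drawn from the examples in this issue, not a documented Gmail list.
DEFAULT_EXCLUDED = {"Important", "Unread", "Opened", "Archived"}


def folder_labels(header_value, excluded=DEFAULT_EXCLUDED):
    """Return the labels that look like folders, keeping their original order."""
    labels = [label.strip() for label in header_value.split(",")]
    return [
        label
        for label in labels
        if label                                  # skip empty entries
        and label not in excluded                 # drop status labels
        and not label.startswith("Category")      # drop Category labels
    ]


print(folder_labels("Inbox,Important,Opened,Category Personal"))  # ['Inbox']
print(folder_labels("Unread,Spam"))                               # ['Spam']
print(folder_labels("Sent,Notes"))                                # ['Sent', 'Notes']
```

Note that a value like Sent,Notes still leaves two candidates, so we would need to decide how to turn more than one remaining label into a single path.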
I can't find clear documentation on these headers, so I'm not confident that this is consistent practice. Thus, I'm thinking we need to make this user-overridable, probably along the lines of the plugins. Perhaps a user could put a list of labels to exclude in the plugin directory. This way, if there's another label we're not excluding, a user could put it in a Gmail-Labels.txt file and mailbagit would use those labels instead of the default excluded labels. A line-separated list should work across platforms by relying on Python's .readlines() or similar, so the line breaks can be whatever is native to the OS.
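A minimal sketch of the override idea, assuming the file lives directly in a plugin directory passed in by the caller. The function name load_excluded_labels, the plugin_dir parameter, and the default set are hypothetical; only the Gmail-Labels.txt filename comes from this proposal.

```python
from pathlib import Path

# Hypothetical defaults, guessed from the examples in this issue.
DEFAULT_EXCLUDED = {"Important", "Unread", "Opened", "Archived"}


def load_excluded_labels(plugin_dir, filename="Gmail-Labels.txt"):
    """Return the user's excluded labels if the file exists, else the defaults."""
    path = Path(plugin_dir) / filename
    if not path.is_file():
        return set(DEFAULT_EXCLUDED)
    # Text mode uses universal newlines by default, so files with \n or
    # \r\n line endings are read the same way on any platform.
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

As proposed above, the file replaces the defaults rather than adding to them, so a user who only wants to exclude one more label would need to list the default ones as well.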
Relevant part of mailbag spec?
N/A
Type of component
Core
Input
Attachments
Derivatives conversion
Reporting/Exporting
GUI
Distribution
Expected contribution
Pull Request
Comment with proposed solution
Major challenges or things to keep in mind
Inconsistent arrangement structure is the worst.