Outlook MailItem Date Objects Wrong #519
I think this problem is getting to the crux of It relates to the mapping of As pointed out in #236 (comment), there is an inconsistency problem between
Additionally, the COM library parser that was used to generate Python code treats
Any or all of these factors may be relevant. Currently, I am dedicating resources to refactoring, static typing, and However, please note that issues like this have been deep-rooted challenges. If you have other ideas, such as helper functions, feel free to share them.
Hmm, after reviewing your links, the conversation, and the related code comments, I found my way to some interesting content that may explain part of the mystery I was facing. The brief summary seems to be related to the fact that I was using However, the longer explanation, with references and some of my investigative code, might explain why I say this only seemingly explains "part" of the issue. First, some of the reference material I used (note that anything about Excel apparently applies to Outlook too):
So, despite all this research, I still have some fields with even stranger values. The following code is very similar to what I first posted, but it naively assumes that any floating-point value is likely a Serial Date and then tries to parse it with a variety of methods (the messy code for a nice printout can be ignored). The

```python
import comtypes
import comtypes.client

# ==============================================================================
o = comtypes.client.CreateObject("Outlook.Application")


# ==============================================================================
def get_collection(countable, reverse=False):
    # COM collections are 1-indexed and expose Count/Item()
    if not hasattr(countable, "Count"):
        raise AttributeError(f'{countable} does not have attribute "Count"')
    _range = range(countable.Count, 0, -1) if reverse else range(1, countable.Count + 1)
    return (countable.Item(i) for i in _range)


# ==============================================================================
def get_emails():
    if hasattr(get_emails, "cache"):
        return get_emails.cache
    get_emails.cache = list(get_collection(o.GetNamespace("MAPI").Folders))
    return get_emails.cache


# ==============================================================================
def get_folders(email):
    if not hasattr(get_folders, "cache"):
        get_folders.cache = {}
    if (cache := get_folders.cache) and email in cache:
        return cache[email]

    def walk(parent):
        # Depth-first walk over every subfolder
        for folder in get_collection(parent.Folders):
            yield folder
            yield from walk(folder)

    cache[email] = list(walk(email))
    return cache[email]


# ==============================================================================
def get_mail(folder):
    def is_valid(item):
        if not isinstance(item, comtypes._compointer_base):
            return False
        elif item.__com_interface__.__name__ != "_MailItem":
            return False
        elif not hasattr(item.Sender, "Address"):
            return False
        elif not item.Subject:
            return False
        return True

    return filter(is_valid, get_collection(folder.Items))


# ==============================================================================
def get_senders():
    # Collect every float-valued attribute, assuming those are Serial Dates
    interesting_attrs = {}
    for e in get_emails():
        for f in get_folders(e):
            for m in get_mail(f):
                print(m)
                print(type(m))
                for attr in dir(m):
                    try:
                        v = getattr(m, attr)
                        if isinstance(v, float):
                            interesting_attrs[attr] = v
                    except Exception:
                        continue
    return interesting_attrs


# ==============================================================================
def investigate(checks):
    from collections import namedtuple
    from datetime import datetime, date, time, timedelta

    # --------------------------------------------------------------------------
    def try_fromtimestamp(x):
        return date.fromtimestamp(x)

    # --------------------------------------------------------------------------
    def try_fromordinal_date(x):
        return date.fromordinal(int(x))

    # --------------------------------------------------------------------------
    def try_fromordinal_datetime(x):
        return datetime.fromordinal(int(x))

    # --------------------------------------------------------------------------
    def try_utcfromtimestamp(x):
        import datetime
        return datetime.datetime.fromtimestamp(x, datetime.UTC)

    # --------------------------------------------------------------------------
    def try_math(x):
        # ----------------------------------------------------------------------
        def serial_mod(f, m):
            d, rem = divmod(f * m, 1)
            return int(d), rem

        # ----------------------------------------------------------------------
        def parse_serial_date(d):
            if not hasattr(parse_serial_date, "sentinel"):
                parse_serial_date.sentinel = date(year=1899, month=12, day=31)
            # Skip Excel's nonexistent 1900-02-29 (serial 60)
            return parse_serial_date.sentinel + timedelta(days=d - 1 if d > 59 else d)

        # ----------------------------------------------------------------------
        def parse_serial_time(t):
            h, rem = serial_mod(t, 24)
            m, rem = serial_mod(rem, 60)
            s, rem = serial_mod(rem, 60)
            return time(hour=h, minute=m, second=s, microsecond=int(rem * 10 ** 6))

        # ----------------------------------------------------------------------
        d, t = serial_mod(x, 1)
        return datetime.combine(parse_serial_date(d), parse_serial_time(t))

    # --------------------------------------------------------------------------
    # Ignore the code below here for testing
    # Wasn't going to bother installing tabulate in this venv
    # --------------------------------------------------------------------------
    aw = max(map(len, checks))
    ds, fs = zip(*(map(len, str(t).split('.')) for t in checks.values()))
    dsw = max(ds)
    fsw = max(fs)
    sw = dsw + fsw + 1
    fnw = max(len(name) for name in locals() if name.startswith("try_"))
    dt_fmt = "%Y-%m-%d %I:%M %p"  # %I (12-hour clock) to match %p
    tw = len(datetime.now().strftime(dt_fmt))
    Row = namedtuple("Row", ("attr", "serial", "fn", "result"))
    fmt = f" {{attr: <{aw}}} | {{serial: >{sw}}} | {{fn: <{fnw}}} | {{result: <{tw}}}"

    def check(attr, t, fn):
        d, f = str(t).split('.')
        try:
            result = fn(t).strftime(dt_fmt)
        except Exception as e:
            result = f"Error: {e}"
        serial_fmt = f"{{days: >{dsw}}}.{{frac: <{fsw}}}"
        serial = serial_fmt.format(days=d, frac=f)
        print(fmt.format(**Row(attr=attr, serial=serial, fn=fn.__name__, result=result)._asdict()))

    print(fmt.format(**Row(attr='', serial='', fn="Correct Timestamp ----->", result="2023-12-31 08:01 AM")._asdict()).replace('|', ' '))
    print('-' * 100)
    for attr, stamp in sorted(checks.items(), key=lambda d: d[1]):
        for fn in (fn for name, fn in locals().items() if name.startswith("try_")):
            check(attr, stamp, fn)
        print('-' * 100)


# ==============================================================================
def main():
    checks = get_senders()
    investigate(checks)


# ==============================================================================
if __name__ == "__main__":
    main()
```

Output (times rounded to seconds since Outlook doesn't seem to display anything more granular):
You'll notice that the Further, you'll notice So, anyway, that's just an update on my findings so far. I'm still not sure whether anything is actually wrong here or whether it's just ignorance on my part. However, if it's possible to know that a date comes from a Microsoft product, it might be nice to offer utility methods that convert from the Serial Date format to a normal datetime object. Alternatively, comtypes could parse to a datetime object by default, either assuming 1904 mode is off or querying the application itself to find out whether that mode is on. Just some thoughts.
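For reference, here is a minimal sketch of the Serial Date conversion discussed above. It assumes standard Excel semantics. In the 1900 date system, serial 1 is 1900-01-01 and serial 60 is the nonexistent 1900-02-29. In the 1904 system, serial 0 is 1904-01-01. The function name and the `date1904` flag are hypothetical and not part of comtypes. The COM `DATE` type itself always counts from 1899-12-30, so the flag only matters for raw Excel cell serials.

```python
from datetime import datetime, timedelta

# 1900 system: from serial 61 (1900-03-01) onward, days since 1899-12-30
EPOCH_1900 = datetime(1899, 12, 30)
# 1900 system below serial 60: serial 1 is 1900-01-01
EPOCH_1900_EARLY = datetime(1899, 12, 31)
# 1904 system: serial 0 is 1904-01-01, no leap-year quirk
EPOCH_1904 = datetime(1904, 1, 1)


def excel_serial_to_datetime(serial: float, date1904: bool = False) -> datetime:
    """Convert an Excel-style serial (days + fractional day) to a naive datetime."""
    if date1904:
        return EPOCH_1904 + timedelta(days=serial)
    if 60 <= serial < 61:
        # Serial 60 is the fake 1900-02-29 kept for Lotus 1-2-3 compatibility
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    epoch = EPOCH_1900_EARLY if serial < 60 else EPOCH_1900
    return epoch + timedelta(days=serial)
```

For example, `excel_serial_to_datetime(45291.5)` gives 2023-12-31 12:00. The same day is serial 43829 in the 1904 system.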
It appears there is yet more information on the confusion regarding the Serial Date According to this resource this is expected for when So, I guess this boils down to a misunderstanding of the data source and of how to convert the data to a proper representation. However, I'll leave this open in case we feel it is worth discussing automatic conversions to proper datetime/NoneType objects for users.
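Here is a minimal sketch of the automatic datetime/None conversion discussed here. It assumes the raw value is an OLE Automation `DATE`: whole days since 1899-12-30, with the fractional part giving the time of day. It also assumes 4501-01-01 00:00 is Outlook's "no date" value, as the wrapper later in this thread does. The name `ole_to_datetime` is hypothetical. Negative values get special handling because, for dates before 1899-12-30, the fractional part still counts forward from midnight.

```python
from datetime import datetime, timedelta
from typing import Optional

# Base date of the OLE Automation DATE type
OLE_EPOCH = datetime(1899, 12, 30)
# Assumed Outlook "no date" sentinel (949998.0 as a raw DATE)
OUTLOOK_NULL_DATE = datetime(4501, 1, 1)


def ole_to_datetime(value: float, null_date: datetime = OUTLOOK_NULL_DATE) -> Optional[datetime]:
    """Convert a COM DATE float to a naive datetime, mapping null_date to None."""
    whole = int(value)  # truncates toward zero, e.g. -1.25 -> -1
    frac = abs(value - whole)  # time of day, always counted forward
    result = OLE_EPOCH + timedelta(days=whole) + timedelta(days=frac)
    return None if result == null_date else result
```

With this, `ole_to_datetime(949998.0)` returns `None` and `ole_to_datetime(45291.5)` returns 2023-12-31 12:00.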
Thank you for your thorough investigation. It seems that, possibly to address legacy bugs related to Windows dates, the A similar item, Changing the default behavior during type conversion could have unpredictable impacts, so it's not a decision to be taken lightly. However, creating a utility function to cast from Do you have any thoughts or opinions on this matter?
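If such a casting utility were written, it might also want the inverse direction, for example to compare a datetime against raw `DATE` values. Here is a minimal sketch. It assumes naive datetimes on or after 1899-12-30. `datetime_to_ole` is a hypothetical name, not a comtypes function.

```python
from datetime import datetime

# Base date of the OLE Automation DATE type
OLE_EPOCH = datetime(1899, 12, 30)


def datetime_to_ole(dt: datetime) -> float:
    """Convert a naive datetime on or after 1899-12-30 to a COM DATE float."""
    if dt < OLE_EPOCH:
        # Earlier dates store a negative day count with a forward-counting fraction
        raise ValueError("dates before 1899-12-30 are not handled in this sketch")
    delta = dt - OLE_EPOCH
    # Whole days plus the elapsed fraction of the day
    return delta.days + delta.seconds / 86400 + delta.microseconds / 86_400_000_000
```

For instance, `datetime_to_ole(datetime(4501, 1, 1))` gives `949998.0`, the raw value behind Outlook's null date.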
When it comes to the That said, I think the "ideal" solution would be to recognize when a call like this is made to the module:

```python
o = comtypes.client.CreateObject("Outlook.Application")
```

Presumably the However, if a simple utility method can be added to objects using one of the two introspective methods mentioned above to convert from the serial date to a Regardless, I think for both of these we'd query what environment we're in to determine whether we need to use 1900 or 1904 mode, so that Mac users get accurate dates too. That said, I'm not too familiar with 1904 mode. I don't know whether it should depend on the environment the application is running in or on the environment the data was saved in (e.g., a spreadsheet made on a Windows PC [1900 mode?] but opened on a Mac [1904 mode?], or vice versa). So, that could be another interesting discussion point.
Thanks for sharing your thoughts.
Those are separate matters from this discussion. COM objects are unique to the Windows environment. Therefore, Mac users are out of scope for this project.
I cannot agree with adding this implementation. COM object libraries are assigned GUIDs, so In other words, to map "this COM library's null date is that date", I believe that For your project, I think you can create a wrapper like the one below that encapsulates and hides the tricky date calculation.

```python
from dataclasses import dataclass
from datetime import datetime, date, timedelta, time

import comtypes.client

# Generate (if needed) and import the wrapper module for Outlook's type library
comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))
from comtypes.gen import Outlook


@dataclass
class MailItemWrapper:
    _item: Outlook._MailItem

    @property
    def received_time(self) -> datetime | None:
        def try_math(x: float):
            # ----------------------------------------------------------------------
            def serial_mod(f: float, m: int) -> tuple[int, float]:
                d, rem = divmod(f * m, 1)
                return int(d), rem

            # ----------------------------------------------------------------------
            def parse_serial_date(d: int) -> date:
                if not hasattr(parse_serial_date, "sentinel"):
                    parse_serial_date.sentinel = date(year=1899, month=12, day=31)
                return parse_serial_date.sentinel + timedelta(days=d - 1 if d > 59 else d)

            # ----------------------------------------------------------------------
            def parse_serial_time(t: float) -> time:
                h, rem = serial_mod(t, 24)
                m, rem = serial_mod(rem, 60)
                s, rem = serial_mod(rem, 60)
                return time(hour=h, minute=m, second=s, microsecond=int(rem * 10 ** 6))

            # ----------------------------------------------------------------------
            d, t = serial_mod(x, 1)
            return datetime.combine(parse_serial_date(d), parse_serial_time(t))

        t = try_math(self._item.ReceivedTime)
        # Outlook's "no date" value is 4501-01-01 00:00
        if t == datetime(4501, 1, 1, 0, 0):
            return None
        return t

    @property
    def sender_name(self) -> str:
        return self._item.SenderName

    # ... and other properties and methods
```

I hope this helps.
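The nested helpers in `try_math` multiply the fractional day by 24, 60 and 60 in turn. Because of that, float rounding error can surface as, say, 08:00:59.999999 instead of 08:01:00. Here is a shorter alternative sketch. It assumes non-negative `DATE` values and that second-level precision is enough, since Outlook does not seem to display anything finer per the earlier output. `ole_to_datetime_seconds` is a hypothetical helper that could stand in for `try_math` inside the property.

```python
from datetime import datetime, timedelta

# Base date of the OLE Automation DATE type
OLE_EPOCH = datetime(1899, 12, 30)


def ole_to_datetime_seconds(value: float) -> datetime:
    """Convert a non-negative COM DATE float to a datetime rounded to whole seconds."""
    if value < 0:
        raise ValueError("negative COM DATE values are not handled in this sketch")
    days = int(value)
    # Round the time of day once, instead of truncating hour/minute/second separately
    seconds = round((value - days) * 86400)
    # timedelta normalizes seconds == 86400 into the next day
    return OLE_EPOCH + timedelta(days=days, seconds=seconds)
```

The property body would then reduce to converting `self._item.ReceivedTime` and comparing the result against `datetime(4501, 1, 1)`.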
I guess the only reason I brought this up was not knowing precisely how the mode is determined. If the mode were determined by the file's originating OS, it might be relevant to consider even in a Windows-centric library. That said, I'm still likely very ignorant about all this, so my assumptions may be wrong.
I'm not very familiar with COM at all, so I can't attest to how prevalent it is. That said, I don't believe you'd need an extensive mapping framework. I could be completely wrong, but I'd assume the number of applications with oddities like the ones I've now encountered in the Microsoft applications is quite small. Further, I'd imagine the mappings could be grouped to minimize bloat. Perhaps this idea is too naive, but here's a bit of untested code to convey it anyway:

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Optional


@dataclass
class SerialNullCorrection:
    name: str  # or maybe make this a tuple if needed for multiple application suites
    guids: frozenset[str]
    converter: Callable[[float], datetime]
    null_date: date


corrections = (
    SerialNullCorrection(
        name="Microsoft",
        guids=frozenset({"...", }),
        converter=try_math,  # assumes try_math from my earlier script is defined at module level
        null_date=datetime(4501, 1, 1),
    ),
)


def get_corrector(guid: str) -> Optional[SerialNullCorrection]:
    return next((c for c in corrections if guid in c.guids), None)


def correct_serial_date(guid: str, x):
    if not (corrector := get_corrector(guid)):
        return x
    if not isinstance(x, float):
        raise TypeError
    x = corrector.converter(x)
    return None if x == corrector.null_date else x
```

However, like I said earlier, my knowledge of COM and this library is minimal. If you feel there's no good way to handle stuff like this in a general, or at least collective, sense, then I completely understand. My only goal was to see if there was a way to add some QoL behaviors or methods to the library natively. That way, naive devs wouldn't need to know abstract or niche details of their target application that surface through the dynamic nature of the COM interface. That said, this code you wrote intrigues me:

```python
import comtypes.client

comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))
from comtypes.gen import Outlook
```

I'm not familiar with this approach. Is there a difference in behavior relative to my approach besides

```python
import comtypes.client

o = comtypes.client.CreateObject("Outlook.Application")
```
Because anyone can create a COM library, there can in theory be an unlimited number of them. We do not know how many COM libraries will use a null date in the future, and the codebase could become endlessly bloated, so I cannot agree that I think it is the responsibility of the application's provider to document what that application's null date is. While the purpose of
I will explain the code for the In practice, you would pass an instance of

```python
from dataclasses import dataclass

import comtypes.client

comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))
from comtypes.gen import Outlook


@dataclass
class MailItemWrapper:
    _item: Outlook._MailItem
    ...


def main():
    o = comtypes.client.CreateObject(Outlook.Application, interface=Outlook._Application)
    ns: Outlook._NameSpace = o.GetNamespace("MAPI")
    fld: Outlook.MAPIFolder = ns.GetDefaultFolder(Outlook.olFolderInbox)
    for item in fld.Items:
        if not isinstance(item, Outlook._MailItem):
            raise TypeError
        print(MailItemWrapper(item))  # HERE!
    o.Quit()


if __name__ == "__main__":
    main()
```

After that, it's just iterating over the emails in the Inbox. The runtime benefit of explicitly importing the module is that we can determine the type with
Ah, I see. That makes sense. So, it'll have to be on the dev to know the quirks of their COM application interface. A shame, but understandable I suppose.
I do really like the ability to leverage
Please refer to the MS documentation for information about COM and GUIDs. Various kinds of objects can be passed to `GetModule`. Instead, I recommend using GUIDs, but raw GUIDs hurt readability, as you pointed out, so you can bind them to a named constant:

```python
import comtypes.client

OUTLOOK_TYPELIB_GUID = '{00062FFF-0000-0000-C000-000000000046}'
comtypes.client.GetModule((OUTLOOK_TYPELIB_GUID,))
```
Is there an update on this issue?
Apologies, I haven't been able to find much time to work on the project recently, so I don't have any additional questions at the moment. This can probably be closed out unless we feel there is a way to provide some extension to the
In this case, it has been determined that a special branch configuration for the specifications of a specific application, which is outside the responsibility of However, when there are inconsistencies in the values handled by the COM library, it is difficult to determine whether they are due to the specifications of the application, or a bug in the application/Python/ Also, from the issues posted recently, I have come to understand that adding more detailed type hints to method return values would be beneficial.
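To illustrate the kind of return-type detail mentioned here from the user side, one option is a `typing.NewType` that marks a float as holding a COM `DATE`. A static type checker such as mypy would then flag places where an unmarked float is passed. `OleDate` and `to_datetime` are hypothetical names, not comtypes API. The conversion assumes non-negative values counted from 1899-12-30.

```python
from datetime import datetime, timedelta
from typing import NewType

# Hypothetical marker type: a float known to hold an OLE Automation DATE
OleDate = NewType("OleDate", float)

OLE_EPOCH = datetime(1899, 12, 30)


def to_datetime(raw: OleDate) -> datetime:
    """Convert a non-negative COM DATE; type checkers reject plain floats here."""
    return OLE_EPOCH + timedelta(days=raw)
```

At runtime `OleDate(45291.5)` is just the float `45291.5`, so the marker costs nothing beyond the annotation.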
Unless there's something I'm missing or I'm going about this entirely wrong, I'm under the impression that every method that should return a DateTime object (or at least some kind of date/datetime/timestamp, according to the docs) is returning untranslatable values, or at least values I have no idea how to translate to the proper timestamp.
Environment:
Here is my code:
Output:
Expected (Approximately):
Expected Alt (Approximately):
If this is a bug then I hope this helps figure out how to resolve it.
If this is not a bug then please help me understand what I need to do to get the expected value.