Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify datetime format used in connectors framework #2067

Closed
jedrazb opened this issue Jan 16, 2024 · 1 comment · Fixed by #2586
Closed

Unify datetime format used in connectors framework #2067

jedrazb opened this issue Jan 16, 2024 · 1 comment · Fixed by #2586
Assignees
Labels
enhancement New feature or request v8.15.0

Comments

@jedrazb
Copy link
Member

jedrazb commented Jan 16, 2024

Problem Description

We use different timestamp datetime formats for different fields. For programatic control perspective it's a bad UX to enforce different formats for different fields:

Example: .elastic-connector index (created in 8.12 deployment via Kibana, no use of Connector APIs). Different datetime formats in filtering and last_seen stats - first is populated by Kibana, last sync data is populated by framework

{

   "filtering":[
      {
         "domain":"DEFAULT",
         "draft":{
            "advanced_snippet":{
               "updated_at":"2024-01-16T11:02:39.245Z",
               "created_at":"2024-01-16T11:02:39.245Z",
               "value":{
                  
               }
            },
            ... 
   ],
   "index_name":"search-mysql",
   "last_seen":"2024-01-16T11:06:40.233382+00:00",
   "last_sync_status":"completed",
   "last_synced":"2024-01-16T11:00:06.809586+00:00"

}

Note: setting "2024-01-16T11:02:39.245Z" in last seen will break the framework in https://github.com/elastic/connectors/blob/main/connectors/protocol/connectors.py#L596

Python 3.10.13 (main, Aug 24 2023, 22:36:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime.fromisoformat("2024-01-16T11:02:39.245Z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2024-01-16T11:02:39.245Z'

It's even more interesting with sync jobs, note different formats between created_at and started_at.

{
  "created_at": "2024-01-16T10:59:35.334Z",
   ...
  "job_type": "full",
  "last_seen": "2024-01-16T11:00:06.544795+00:00",
  "metadata": {},
  "started_at": "2024-01-16T11:00:04.397788+00:00",
  "status": "completed",
  ...
}

Proposed Solution

Have framework-level method to handle common datetime formats parsing,

Alternatives

For now I'm just hardcoding serializing different datetime formats on the Connectors API level, but this is error-prone and hard to properly QA.

@jedrazb jedrazb added the enhancement New feature or request label Jan 16, 2024
@jedrazb jedrazb changed the title Unify timestamp format used in connectors framework Unify datetime format used in connectors framework Jan 16, 2024
@jedrazb
Copy link
Member Author

jedrazb commented May 17, 2024

I realised that the service is actually writing two different timestamps formats:

  • e.g.2024-05-11T06:45:44.639748+00:00 comes from iso_utc() defined here, example field with this format last_synced
  • e.g. 2024-05-11T06:45:00.386594, it lacks tz info, it comes from update_last_sync_scheduled_at_by_job_type defined here, example last_sync_scheduled_at

ES had issues with processing the later one, I realised that a few days ago this was fixed by:
elastic/elasticsearch#106486

I confirmed that on main and latest SNAPSHOT in cloud the processing issues are gone. I'm looking into unifying that from connector side to always include TZ info

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request v8.15.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant