Support user-defined and incomplete date formats #273

Merged


GumpacG commented Jun 1, 2023

Description

  • Added support for user-defined (custom) date formats (backed by Java date-time pattern syntax)
  • Updated support for predefined formats (doc, list, another doc)
  • Combinations of custom and named formats (separated by ||) are also supported
  • Values produced by incomplete date formatters (e.g. year, week_year) are returned as TIMESTAMP

TODOs

Sample for test

Mapping:

{
  "mappings" : {
    "properties" : {
      "custom_date" : {
        "type" : "date",
        "format" : "yyyy-MM-dd"
      }
    }
  }
}

Data:

{"index": {}}
{"custom_date": "1984-04-12"}
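Since custom formats use java.time pattern syntax, the Sample 1 value can be sanity-checked locally with plain `java.time`. The sketch below is a hypothetical helper (`CustomDateDemo` is not part of the plugin) and assumes the OpenSearch formatter treats `yyyy-MM-dd` the way `DateTimeFormatter.ofPattern` does, which this PR does not confirm. It also shows why the samples mix `yyyy` (year-of-era) and `uuuu` (proleptic year): under `ResolverStyle.STRICT`, `yyyy` cannot be resolved to a date without an era field.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical helper for checking custom patterns locally; not part of the plugin.
final class CustomDateDemo {
  // Parses a value with a java.time pattern using the given resolver style.
  static LocalDate parse(String pattern, String value, ResolverStyle style) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withResolverStyle(style);
    return LocalDate.parse(value, formatter);
  }

  public static void main(String[] args) {
    // SMART (the ofPattern default): year-of-era "yyyy" resolves without an era.
    System.out.println(parse("yyyy-MM-dd", "1984-04-12", ResolverStyle.SMART)); // 1984-04-12
    // STRICT: the proleptic year "uuuu" still resolves.
    System.out.println(parse("uuuu-MM-dd", "1984-04-12", ResolverStyle.STRICT)); // 1984-04-12
    try {
      // STRICT: "yyyy" needs an era ("G") to build a date, so this throws.
      parse("yyyy-MM-dd", "1984-04-12", ResolverStyle.STRICT);
    } catch (DateTimeParseException e) {
      System.out.println("STRICT yyyy-MM-dd rejected: " + e.getMessage());
    }
  }
}
```

Whether the plugin's formatters resolve strictly is not shown in this PR; if they do, `uuuu` is the safer choice for custom date patterns.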

Sample 2

Mapping

{
    "mappings":
    {
        "properties":
        {
            "custom_time" :
            {
                "type" : "date",
                "format" : "::: k-A || A    "
            },
            "incomplete_1" :
            {
                "type" : "date",
                "format" : "year"
            },
            "incomplete_2" :
            {
                "type" : "date",
                "format" : "E-w"
            },
            "incomplete_custom_date" :
            {
                "type" : "date",
                "format" : "uuuu"
            },
            "incomplete_custom_time" :
            {
                "type" : "date",
                "format" : "HH"
            },
            "incorrect" :
            {
                "type" : "date",
                "format" : "'___'"
            },
            "epoch_sec" :
            {
                "type" : "date",
                "format" : "epoch_second"
            },
            "epoch_milli" :
            {
                "type" : "date",
                "format" : "epoch_millis"
            },
            "custom_no_delimiter_date" :
            {
                "type" : "date",
                "format" : "uuuuMMdd"
            },
            "custom_no_delimiter_time" :
            {
                "type" : "date",
                "format" : "HHmmss"
            },
            "custom_no_delimiter_ts" :
            {
                "type" : "date",
                "format" : "uuuuMMddHHmmss"
            }
        }
    }
}

Data

{"index": {}}
{ "custom_time":  "85476321", "incomplete_1" : 1984, "incomplete_2": null, "incomplete_custom_date": 1999, "incomplete_custom_time" : 10, "incorrect" : null, "epoch_sec" : 42, "epoch_milli" : 42, "custom_no_delimiter_date" : "19841020", "custom_no_delimiter_time" : "102030", "custom_no_delimiter_ts" : "19841020153548" }
{"index": {}}
{ "custom_time":  "::: 9-32476542", "incomplete_1" : 2022, "incomplete_2": null, "incomplete_custom_date": 3021, "incomplete_custom_time" : 20, "incorrect" : null, "epoch_sec" : 100500, "epoch_milli" : 100500, "custom_no_delimiter_date" : "19610412", "custom_no_delimiter_time" : "090700", "custom_no_delimiter_ts" : "19610412090700" }
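The `custom_time` format `::: k-A || A    ` has two alternatives separated by `||`: a literal `::: ` prefix followed by clock-hour (`k`), a dash and milli-of-day (`A`), or milli-of-day alone. The sketch below tries each alternative in order with plain `java.time` and reproduces the `custom_time`, `epoch_sec` and `epoch_milli` values in the result set. It is an approximation: the plugin delegates parsing to OpenSearch's date formatters, trimming each alternative is an assumption of this sketch, and `CustomTimeDemo` is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper; a java.time-only approximation of "a || b" formats.
final class CustomTimeDemo {
  // Tries each "||"-separated alternative in order and returns the first match.
  static LocalTime parseTime(String format, String value) {
    for (String alternative : format.split("\\|\\|")) {
      try {
        return LocalTime.parse(value, DateTimeFormatter.ofPattern(alternative.trim()));
      } catch (DateTimeParseException e) {
        // This alternative does not match the value; try the next one.
      }
    }
    throw new IllegalArgumentException("No alternative of '" + format + "' matches " + value);
  }

  public static void main(String[] args) {
    String format = "::: k-A || A    ";
    // Only "A" matches: 85476321 ms after midnight.
    System.out.println(parseTime(format, "85476321"));       // 23:44:36.321
    // "::: k-A" matches; k=9 agrees with the hour derived from A=32476542 ms.
    System.out.println(parseTime(format, "::: 9-32476542")); // 09:01:16.542
    // The epoch columns are plain offsets from 1970-01-01T00:00:00Z.
    System.out.println(Instant.ofEpochSecond(100500));       // 1970-01-02T03:55:00Z
    System.out.println(Instant.ofEpochMilli(100500));        // 1970-01-01T00:01:40.500Z
  }
}
```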

Result set

SELECT * FROM test
+--------------------------+-----------+-------------------------+---------------------+------------------------+--------------+------------------------+--------------------------+--------------+------------------------+---------------------+ 
| custom_no_delimiter_date | incorrect | epoch_milli             | epoch_sec           | incomplete_custom_time | custom_time  | custom_no_delimiter_ts | custom_no_delimiter_time | incomplete_2 | incomplete_custom_date | incomplete_1        |
+--------------------------+-----------+-------------------------+---------------------+------------------------+--------------+------------------------+--------------------------+--------------+------------------------+---------------------+
| date                     | timestamp | timestamp               | timestamp           | time                   | time         | timestamp              | time                     | timestamp    | date                   | timestamp           |
+--------------------------+-----------+-------------------------+---------------------+------------------------+--------------+------------------------+--------------------------+--------------+------------------------+---------------------+
| 1984-10-20               | null      | 1970-01-01 00:00:00.042 | 1970-01-01 00:00:42 | 10:00:00               | 23:44:36.321 | 1984-10-20 15:35:48    | 10:20:30                 | null         | 1999-01-01             | 1984-01-01 00:00:00 |
| 1961-04-12               | null      | 1970-01-01 00:01:40.5   | 1970-01-02 03:55:00 | 20:00:00               | 09:01:16.542 | 1961-04-12 09:07:00    | 09:07:00                 | null         | 3021-01-01             | 2022-01-01 00:00:00 |
+--------------------------+-----------+-------------------------+---------------------+------------------------+--------------+------------------------+--------------------------+--------------+------------------------+---------------------+
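The result set suggests a pattern-based typing rule: date-only patterns (`uuuu`, `uuuuMMdd`) map to DATE, time-only patterns (`HH`, `HHmmss`, `::: k-A || A`) map to TIME, and everything else (mixed date and time, week-based such as `E-w`, or literal-only such as `'___'`) falls back to TIMESTAMP. The sketch below approximates that rule by scanning pattern letters outside single quotes. The letter sets, the `||` handling, and the names `GuessedType` and `FormatTypeGuess` are assumptions chosen to reproduce the table above, not the plugin's actual code; named formats such as `year` or `epoch_second` would need a separate lookup before this check.

```java
import java.util.Set;

// Hypothetical result type; the plugin has its own type enum.
enum GuessedType { DATE, TIME, TIMESTAMP }

final class FormatTypeGuess {
  // Assumed letter sets: only these pattern letters count as "date" or "time".
  private static final Set<Character> DATE_LETTERS = Set.of('y', 'u', 'M', 'L', 'd', 'D');
  private static final Set<Character> TIME_LETTERS =
      Set.of('H', 'h', 'K', 'k', 'm', 's', 'S', 'A', 'n', 'N', 'a');

  // Classifies one java.time pattern, skipping literal text inside single quotes.
  static GuessedType classifyPattern(String pattern) {
    boolean hasDate = false;
    boolean hasTime = false;
    boolean quoted = false;
    for (char c : pattern.toCharArray()) {
      if (c == '\'') {
        quoted = !quoted;
      } else if (!quoted) {
        hasDate |= DATE_LETTERS.contains(c);
        hasTime |= TIME_LETTERS.contains(c);
      }
    }
    if (hasDate && !hasTime) return GuessedType.DATE;
    if (hasTime && !hasDate) return GuessedType.TIME;
    return GuessedType.TIMESTAMP; // both kinds, or neither (e.g. "E-w", "'___'")
  }

  // "a || b": all alternatives must agree on a type, otherwise fall back to TIMESTAMP.
  static GuessedType classify(String format) {
    GuessedType result = null;
    for (String alternative : format.split("\\|\\|")) {
      GuessedType type = classifyPattern(alternative.trim());
      if (result == null) {
        result = type;
      } else if (result != type) {
        return GuessedType.TIMESTAMP;
      }
    }
    return result == null ? GuessedType.TIMESTAMP : result;
  }

  public static void main(String[] args) {
    for (String format : new String[] {"uuuuMMdd", "HHmmss", "uuuuMMddHHmmss", "E-w", "'___'"}) {
      System.out.println(format + " -> " + classify(format));
    }
  }
}
```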

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@GumpacG marked this pull request as draft June 2, 2023 15:44

{"index": {}}
{"epoch_millis": "450608862000.123456", "epoch_second": "450608862.000123456", "date_optional_time": "1984-04-12T09:07:42.000Z", "strict_date_optional_time": "1984-04-12T09:07:42.000Z", "strict_date_optional_time_nanos": "1984-04-12T09:07:42.000123456Z", "basic_date": "19840412", "basic_date_time": "19840412T090742.000Z", "basic_date_time_no_millis": "19840412T090742Z", "basic_ordinal_date": "1984103", "basic_ordinal_date_time": "1984103T090742.000Z", "basic_ordinal_date_time_no_millis": "1984103T090742Z", "basic_time": "090742.000Z", "basic_time_no_millis": "090742Z", "basic_t_time": "T090742.000Z", "basic_t_time_no_millis": "T090742Z", "basic_week_date": "1984W154", "strict_basic_week_date": "1984W154", "basic_week_date_time": "1984W154T090742.000Z", "strict_basic_week_date_time": "1984W154T090742.000Z", "basic_week_date_time_no_millis": "1984W154T090742Z", "strict_basic_week_date_time_no_millis": "1984W154T090742Z", "date": "1984-04-12", "strict_date": "1984-04-12", "date_hour": "1984-04-12T09", "strict_date_hour": "1984-04-12T09", "date_hour_minute": "1984-04-12T09:07", "strict_date_hour_minute": "1984-04-12T09:07", "date_hour_minute_second": "1984-04-12T09:07:42", "strict_date_hour_minute_second": "1984-04-12T09:07:42", "date_hour_minute_second_fraction": "1984-04-12T09:07:42.000", "strict_date_hour_minute_second_fraction": "1984-04-12T09:07:42.000", "date_hour_minute_second_millis": "1984-04-12T09:07:42.000", "strict_date_hour_minute_second_millis": "1984-04-12T09:07:42.000", "date_time": "1984-04-12T09:07:42.000Z", "strict_date_time": "1984-04-12T09:07:42.000123456Z", "date_time_no_millis": "1984-04-12T09:07:42Z", "strict_date_time_no_millis": "1984-04-12T09:07:42Z", "hour": "09", "strict_hour": "09", "hour_minute": "09:07", "strict_hour_minute": "09:07", "hour_minute_second": "09:07:42", "strict_hour_minute_second": "09:07:42", "hour_minute_second_fraction": "09:07:42.000", "strict_hour_minute_second_fraction": "09:07:42.000", "hour_minute_second_millis": "09:07:42.000", "strict_hour_minute_second_millis": "09:07:42.000", "ordinal_date": "1984-103", "strict_ordinal_date": "1984-103", "ordinal_date_time": "1984-103T09:07:42.000123456Z", "strict_ordinal_date_time": "1984-103T09:07:42.000123456Z", "ordinal_date_time_no_millis": "1984-103T09:07:42Z", "strict_ordinal_date_time_no_millis": "1984-103T09:07:42Z", "time": "09:07:42.000Z", "strict_time": "09:07:42.000Z", "time_no_millis": "09:07:42Z", "strict_time_no_millis": "09:07:42Z", "t_time": "T09:07:42.000Z", "strict_t_time": "T09:07:42.000Z", "t_time_no_millis": "T09:07:42Z", "strict_t_time_no_millis": "T09:07:42Z", "week_date": "1984-W15-4", "strict_week_date": "1984-W15-4", "week_date_time": "1984-W15-4T09:07:42.000Z", "strict_week_date_time": "1984-W15-4T09:07:42.000Z", "week_date_time_no_millis": "1984-W15-4T09:07:42Z", "strict_week_date_time_no_millis": "1984-W15-4T09:07:42Z", "weekyear_week_day": "1984-W15-4", "strict_weekyear_week_day": "1984-W15-4", "year_month_day": "1984-04-12", "strict_year_month_day": "1984-04-12", "yyyy-MM-dd": "1984-04-12", "HH:mm:ss": "09:07:42", "yyyy-MM-dd_OR_epoch_millis": "450608862000.123456", "hour_minute_second_OR_t_time": "T09:07:42.000Z"}
{"epoch_millis": "450608862000.123456", "epoch_second": "450608862.000123456", "date_optional_time": "1984-04-12T09:07:42.000Z", "strict_date_optional_time": "1984-04-12T09:07:42.000Z", "strict_date_optional_time_nanos": "1984-04-12T09:07:42.000123456Z", "basic_date": "19840412", "basic_date_time": "19840412T090742.000Z", "basic_date_time_no_millis": "19840412T090742Z", "basic_ordinal_date": "1984103", "basic_ordinal_date_time": "1984103T090742.000Z", "basic_ordinal_date_time_no_millis": "1984103T090742Z", "basic_time": "090742.000Z", "basic_time_no_millis": "090742Z", "basic_t_time": "T090742.000Z", "basic_t_time_no_millis": "T090742Z", "basic_week_date": "1984W154", "strict_basic_week_date": "1984W154", "basic_week_date_time": "1984W154T090742.000Z", "strict_basic_week_date_time": "1984W154T090742.000Z", "basic_week_date_time_no_millis": "1984W154T090742Z", "strict_basic_week_date_time_no_millis": "1984W154T090742Z", "date": "1984-04-12", "strict_date": "1984-04-12", "date_hour": "1984-04-12T09", "strict_date_hour": "1984-04-12T09", "date_hour_minute": "1984-04-12T09:07", "strict_date_hour_minute": "1984-04-12T09:07", "date_hour_minute_second": "1984-04-12T09:07:42", "strict_date_hour_minute_second": "1984-04-12T09:07:42", "date_hour_minute_second_fraction": "1984-04-12T09:07:42.000", "strict_date_hour_minute_second_fraction": "1984-04-12T09:07:42.000", "date_hour_minute_second_millis": "1984-04-12T09:07:42.000", "strict_date_hour_minute_second_millis": "1984-04-12T09:07:42.000", "date_time": "1984-04-12T09:07:42.000Z", "strict_date_time": "1984-04-12T09:07:42.000123456Z", "date_time_no_millis": "1984-04-12T09:07:42Z", "strict_date_time_no_millis": "1984-04-12T09:07:42Z", "hour": "09", "strict_hour": "09", "hour_minute": "09:07", "strict_hour_minute": "09:07", "hour_minute_second": "09:07:42", "strict_hour_minute_second": "09:07:42", "hour_minute_second_fraction": "09:07:42.000", "strict_hour_minute_second_fraction": "09:07:42.000", "hour_minute_second_millis": "09:07:42.000", "strict_hour_minute_second_millis": "09:07:42.000", "ordinal_date": "1984-103", "strict_ordinal_date": "1984-103", "ordinal_date_time": "1984-103T09:07:42.000123456Z", "strict_ordinal_date_time": "1984-103T09:07:42.000123456Z", "ordinal_date_time_no_millis": "1984-103T09:07:42Z", "strict_ordinal_date_time_no_millis": "1984-103T09:07:42Z", "time": "09:07:42.000Z", "strict_time": "09:07:42.000Z", "time_no_millis": "09:07:42Z", "strict_time_no_millis": "09:07:42Z", "t_time": "T09:07:42.000Z", "strict_t_time": "T09:07:42.000Z", "t_time_no_millis": "T09:07:42Z", "strict_t_time_no_millis": "T09:07:42Z", "week_date": "1984-W15-4", "strict_week_date": "1984-W15-4", "week_date_time": "1984-W15-4T09:07:42.000Z", "strict_week_date_time": "1984-W15-4T09:07:42.000Z", "week_date_time_no_millis": "1984-W15-4T09:07:42Z", "strict_week_date_time_no_millis": "1984-W15-4T09:07:42Z", "weekyear_week_day": "1984-W15-4", "strict_weekyear_week_day": "1984-W15-4", "year_month_day": "1984-04-12", "strict_year_month_day": "1984-04-12", "yyyy-MM-dd": "1984-04-12", "custom_time": "09:07:42 PM", "yyyy-MM-dd_OR_epoch_millis": "450608862000.123456", "hour_minute_second_OR_t_time": "T09:07:42.000Z", "custom_timestamp": "1984-04-12 10:07:42 ---- PM", "custom_date_or_date": "1984-04-12"}


This is getting really big.
We should split this out into a separate file when we do the real fix.

@Yury-Fridlyand force-pushed the dev-custom-datetime-formats branch from 3b0c2d2 to 8e446ce June 29, 2023 04:30
@Yury-Fridlyand changed the title from "[POC] Support user-defined date formats" to "Support user-defined and incomplete date formats" Jun 29, 2023
@Yury-Fridlyand marked this pull request as ready for review June 29, 2023 04:30
acarbonetto commented Jul 6, 2023

Found one issue: the year 1999 is missing from the response.

Mapping:

{
  "mappings" : {
    "properties" : {
      "custom_time" : {
        "type" : "date",
        "format" : "yyyy HH:mm"
      }
    }
  }
}

data:

{"index": {}}
{"custom_time": "1999 01:01"}

returns:

{
    "schema": [
        {
            "name": "custom_time",
            "type": "timestamp"
        }
    ],
    "datarows": [
        [
            "1970-01-01 01:01:00"
        ]
    ],
    "total": 2,
    "size": 2,
    "status": 200
}

GumpacG (author) commented Jul 6, 2023

When a mapping with an incomplete time or date format is loaded and queried, the return type should match the kind of format: a time-only format should still yield TIME, and a date-only format DATE.
For example:

"incomplete_custom_time" :
{
    "type" : "date",
    "format" : "HH"
}

should remain a TIME, but it is currently returned as a TIMESTAMP. Incomplete custom dates show the same behaviour.
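One way to keep an incomplete format from being widened is to ask the parser for the most specific type it can build. The sketch below uses `DateTimeFormatter.parseBest` to try `LocalDateTime`, `LocalDate`, `LocalTime` and `Year` in that order, and widens a bare year to January 1st as the result set does for `incomplete_custom_date`. It is a plain java.time illustration with a hypothetical class name (`IncompleteFormatDemo`), not necessarily how the plugin implements the fix.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

// Hypothetical helper; illustrates java.time behaviour only.
final class IncompleteFormatDemo {
  // Returns the most specific java.time value the pattern can produce:
  // LocalDateTime first, then LocalDate, then LocalTime, then Year.
  static Object parseMostSpecific(String pattern, String value) {
    TemporalAccessor parsed = DateTimeFormatter.ofPattern(pattern)
        .parseBest(value, LocalDateTime::from, LocalDate::from, LocalTime::from, Year::from);
    if (parsed instanceof Year) {
      // A bare year is widened to January 1st of that year (a DATE).
      return ((Year) parsed).atDay(1);
    }
    return parsed;
  }

  public static void main(String[] args) {
    System.out.println(parseMostSpecific("HH", "10"));                             // 10:00
    System.out.println(parseMostSpecific("uuuu", "1999"));                         // 1999-01-01
    System.out.println(parseMostSpecific("uuuuMMddHHmmss", "19841020153548"));     // 1984-10-20T15:35:48
  }
}
```

A time-only pattern such as `HH` resolves to a `LocalTime` with minutes and seconds defaulted to zero, so the value can be reported as TIME rather than as a TIMESTAMP on the epoch date.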

GumpacG (author) commented Jul 6, 2023


Also test that CAST to TIMESTAMP from incomplete_custom_time returns today's date with the custom time.

GumpacG (author) commented Jul 7, 2023

Checkstyle is failing, and a few ITs need to be updated.

@Yury-Fridlyand force-pushed the dev-custom-datetime-formats branch from b75d2bc to c086dad July 7, 2023 20:07
Yury-Fridlyand commented Jul 7, 2023

> Found one issue: the year 1999 is missing from the response.

This behaviour comes from org.opensearch.common.time.DateFormatters, so it may not be a bug. To avoid it, we would probably need to split the custom format into date and time parts and then split the values accordingly.
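As an alternative to splitting the format, missing date fields could be defaulted instead of falling back to the epoch date. The sketch below assumes a plain java.time formatter rather than `org.opensearch.common.time.DateFormatters`, and appends `parseDefaulting` for month and day so that `yyyy HH:mm` keeps the year from the reported example. `PartialFormatDemo` and `withDefaults` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper; a plain java.time stand-in for the OpenSearch formatter.
final class PartialFormatDemo {
  // Appends defaults for month and day so that a pattern without them still
  // resolves to a full date in the parsed year, instead of the epoch date.
  static DateTimeFormatter withDefaults(String pattern) {
    return new DateTimeFormatterBuilder()
        .appendPattern(pattern)
        .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
        .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
        .toFormatter();
  }

  public static void main(String[] args) {
    // The value from the report: the year 1999 is kept.
    System.out.println(LocalDateTime.parse("1999 01:01", withDefaults("yyyy HH:mm"))); // 1999-01-01T01:01
    // Fields present in the pattern take precedence over the defaults.
    System.out.println(LocalDateTime.parse("1984-04-12 09:07", withDefaults("yyyy-MM-dd HH:mm"))); // 1984-04-12T09:07
  }
}
```

The defaults only apply when the pattern does not supply those fields, but they would conflict with patterns that use day-of-year (`D`) and could conflict with week-based fields, and a time-only pattern still cannot produce a `LocalDateTime`; those cases would need separate handling.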


> Should remain a TIME, but it is currently returned as a TIMESTAMP. Incomplete custom dates show the same behaviour.

Fixed in 6d214a5


> Also test that CAST to TIMESTAMP from incomplete_custom_time returns today's date with the custom time.

Works as expected: converting a TIME to DATE, DATETIME or TIMESTAMP uses today's date (not the epoch date).

opensearchsql> select CAST(incomplete_custom_time as TIMESTAMP) from dt_formats;
fetched rows / total rows = 2/2
+---------------------------------------------+
| CAST(incomplete_custom_time as TIMESTAMP)   |
|---------------------------------------------|
| 2023-07-07 10:00:00                         |
| 2023-07-07 20:00:00                         |
+---------------------------------------------+
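The conversion described above, widening a TIME by attaching the current date rather than 1970-01-01, can be sketched with `java.time` as below. Passing a `Clock` is an assumption made for testability; which clock and time zone the plugin actually uses is not shown here, and `TimeToTimestamp` is a hypothetical name.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical helper illustrating TIME -> TIMESTAMP widening.
final class TimeToTimestamp {
  // Attaches the current date of the given clock to the time value,
  // rather than the epoch date 1970-01-01.
  static LocalDateTime toTimestamp(LocalTime time, Clock clock) {
    return LocalDateTime.of(LocalDate.now(clock), time);
  }

  public static void main(String[] args) {
    // A fixed clock makes the example reproducible.
    Clock clock = Clock.fixed(Instant.parse("2023-07-07T12:00:00Z"), ZoneOffset.UTC);
    System.out.println(toTimestamp(LocalTime.of(10, 0), clock)); // 2023-07-07T10:00
    System.out.println(toTimestamp(LocalTime.of(20, 0), clock)); // 2023-07-07T20:00
  }
}
```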

> Checkstyle is failing, and a few ITs need to be updated.

Fixed in c086dad, thanks for noticing.

@Yury-Fridlyand merged commit 56e5621 into integ-custom-datetime-formats Jul 8, 2023
@Yury-Fridlyand deleted the dev-custom-datetime-formats branch July 8, 2023 01:51
matthewryanwells pushed a commit that referenced this pull request Jul 11, 2023
…roject#1821)

* Support user-defined and incomplete date formats (#273)

* Check custom formats for characters

Signed-off-by: Guian Gumpac <[email protected]>

* Removed duplicated code

Signed-off-by: Guian Gumpac <[email protected]>

* Reworked checking for exprcoretype

Signed-off-by: Guian Gumpac <[email protected]>

* Changed check for time

Signed-off-by: Guian Gumpac <[email protected]>

* Rework processing custom and incomplete formats and add tests.

Signed-off-by: Yury-Fridlyand <[email protected]>

* Values of incomplete and incorrect formats to be returned as `TIMESTAMP` instead of `STRING`.

Signed-off-by: Yury-Fridlyand <[email protected]>

* Complete fix and update tests.

Signed-off-by: Yury-Fridlyand <[email protected]>

* More fixes for god of fixes.

Signed-off-by: Yury-Fridlyand <[email protected]>

---------

Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Co-authored-by: Yury-Fridlyand <[email protected]>

* Refactoring.

Signed-off-by: Yury-Fridlyand <[email protected]>

---------

Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Co-authored-by: Guian Gumpac <[email protected]>
MitchellGale pushed a commit that referenced this pull request Jul 11, 2023
…roject#1821)

MitchellGale pushed a commit that referenced this pull request Jul 12, 2023
…roject#1821) (opensearch-project#1830)

(cherry picked from commit a60b222)
Yury-Fridlyand added a commit that referenced this pull request Aug 22, 2023
…roject#1821) (opensearch-project#1840)

(cherry picked from commit a60b222)
andy-k-improving pushed a commit that referenced this pull request Nov 16, 2024
andy-k-improving pushed a commit that referenced this pull request Nov 16, 2024