[Feature Request] Star tree validations #15491

Closed
bharath-techie opened this issue Aug 29, 2024 · 0 comments · Fixed by #15533
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance v2.17.0

Comments

@bharath-techie
Contributor

bharath-techie commented Aug 29, 2024

Is your feature request related to a problem? Please describe

We are adding the following restrictions (blocks) when a user configures a star tree index:

  • Unsigned long is not supported yet, because it needs special comparator logic that the star tree does not handle. For details, see: [Star tree] Handle 'unsigned long' as part of star tree #15231

  • We need to limit the maximum number of base metrics in the 2.17 experimental release.

  • Documents with array values: the star tree index currently cannot handle these, because the 'star' property is not satisfied (the star rows double-count the document; see the example below).

  • Since flush is heavy for a star tree index, limiting the maximum number of documents during flush will help indexing throughput.

Example

Dimension fields
Timestamp
Status

Metric fields
Size

Document 1 :
{"Timestamp": 1999, "status": [200,300], "size" :1000 }

 Queries for above doc:
1. Count of size = 1
2. Sum of size = 1000
3. Count of size where timestamp = 1999 => 1
4. Sum of size where timestamp = 1999   => 1000

Star tree index :
          Dimensions            |  Metrics
DocId    Timestamp     Status     Sum(Size)    Count(Size)     Correct ?
1         1999          200        1000         1               Yes
2         1999          300        1000         1               Yes
3         *             200        1000         1               Yes
4         *             300        1000         1               Yes
5         1999          *          2000         2               NO
6         *             *          2000         2               NO
    
With the above star tree documents, the same queries return:

1. Count of size
    Since there are no filters , we will query for * , * ==> "Doc ID 6"
   Answer =   2
   Expected = 1
   
2. Sum of size
   Since there are no filters , we will query for * , * ==> "Doc ID 6"
   Answer =   2000
   Expected = 1000
   

3. Count of size where timestamp = 1999
   we will query for 1999 , * ==> "Doc ID 5"
   Answer =   2
   Expected = 1
   
4. Sum of size where timestamp = 1999
   we will query for 1999 , * ==> "Doc ID 5"
   Answer =   2000
   Expected = 1000
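The double counting above can be reproduced with a small sketch. This is not the OpenSearch implementation; it only assumes that an array-valued dimension is expanded into one row per value and that every row is then aggregated into each concrete-or-star key. The helper names `flatten` and `star_tree_rows` are hypothetical.

```python
from itertools import product

STAR = "*"

def flatten(doc, dimensions):
    """Expand array-valued dimensions into one row per value combination (assumed behavior)."""
    value_lists = [doc[d] if isinstance(doc[d], list) else [doc[d]] for d in dimensions]
    return [dict(zip(dimensions, combo)) for combo in product(*value_lists)]

def star_tree_rows(doc, dimensions, metric):
    """Return {dimension key: (sum, count)} for every concrete-or-star key of one document."""
    rows = {}
    for base in flatten(doc, dimensions):
        # Each expanded row contributes to every key obtained by starring any subset of dimensions.
        for mask in product([False, True], repeat=len(dimensions)):
            key = tuple(STAR if starred else base[d] for d, starred in zip(dimensions, mask))
            total, count = rows.get(key, (0, 0))
            rows[key] = (total + doc[metric], count + 1)
    return rows

doc = {"Timestamp": 1999, "status": [200, 300], "size": 1000}
rows = star_tree_rows(doc, ["Timestamp", "status"], "size")

for key, (total, count) in sorted(rows.items(), key=str):
    print(key, "sum =", total, "count =", count)
```

The keys `(1999, "*")` and `("*", "*")` come out with sum 2000 and count 2, because the single source document was expanded into two rows before aggregation. That is the reason array values are blocked.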

Describe the solution you'd like

  • We will block unsigned long fields from being used as star tree dimensions or metrics in the star tree mapping.
  • We will block documents with array values for indices with star tree index enabled (blocked during bulk / indexing).
  • We will limit the maximum number of base metrics to 100 in the 2.17 experimental release.
  • Limit index.translog.flush_threshold_size to a maximum of 512 MB; the maximum can be configured via another final setting, indices.composite_index.translog.max_flush_threshold_size
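The four checks above could look roughly like the sketch below. It is illustrative only and is not the code from the fixing PR. The function names, the error type, and the shape of the inputs (a field-name to type map, metrics as `(field, stat)` pairs, sizes in bytes) are assumptions. It also assumes the array check applies only to fields that are part of the star tree.

```python
MAX_BASE_METRICS = 100                          # limit from the proposal
MAX_FLUSH_THRESHOLD_BYTES = 512 * 1024 * 1024   # 512 MB cap from the proposal


class StarTreeValidationError(ValueError):
    """Hypothetical error type for rejected star tree configs or documents."""


def validate_mapping(field_types, dimensions, metrics):
    """Reject unsigned_long star tree fields and too many base metrics."""
    metric_fields = [field for field, _stat in metrics]
    for name in list(dimensions) + metric_fields:
        if field_types.get(name) == "unsigned_long":
            raise StarTreeValidationError(f"unsigned_long field '{name}' is not supported")
    if len(metrics) > MAX_BASE_METRICS:
        raise StarTreeValidationError(
            f"{len(metrics)} base metrics exceed the limit of {MAX_BASE_METRICS}"
        )


def validate_document(doc, star_tree_fields):
    """Reject documents that carry array values in star tree fields (assumed scope)."""
    for name in star_tree_fields:
        if isinstance(doc.get(name), list):
            raise StarTreeValidationError(f"array value in star tree field '{name}'")


def validate_flush_threshold(requested_bytes, max_bytes=MAX_FLUSH_THRESHOLD_BYTES):
    """Reject a flush threshold above the configured maximum."""
    if requested_bytes > max_bytes:
        raise StarTreeValidationError(
            f"flush threshold {requested_bytes} exceeds maximum {max_bytes}"
        )


try:
    validate_document({"Timestamp": 1999, "status": [200, 300], "size": 1000},
                      ["Timestamp", "status", "size"])
except StarTreeValidationError as err:
    print("rejected:", err)
```

Failing at mapping or indexing time keeps unsupported data out of the star tree entirely, so queries never see incorrect aggregates.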

Related component

Indexing:Performance

Describe alternatives you've considered

No response

Additional context

No response
