[FEATURE] Push mode streaming support #29

dai-chen · 2023-08-30T20:30:47Z

Is your feature request related to a problem?

Currently, refreshing a Flint index depends on polling within the Spark FileStreamSource operator. This approach can cause performance issues, especially when the source table contains a large number of partitions and files.

What solution would you like?

The proposal is to allow users to provide an SNS topic for an S3 data source. This way, the streaming execution can find the "delta" (the list of changed files) efficiently.
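To make the idea concrete, here is a minimal sketch of how a changed-file list could be derived from one notification. It assumes the S3 bucket publishes event notifications to the SNS topic and that the consumer receives the standard SNS JSON envelope (raw message delivery disabled). The function name `changed_files_from_sns` is hypothetical and not part of Flint or Spark, and how the message reaches the streaming job (for example, through a queue subscribed to the topic) is left out.

```python
import json
from urllib.parse import unquote_plus


def changed_files_from_sns(sns_body: str) -> list[str]:
    """Return s3:// paths of newly created objects in one SNS notification.

    Hypothetical helper for illustration only; not a Flint or Spark API.
    """
    envelope = json.loads(sns_body)
    # SNS wraps the S3 event as a JSON string in the "Message" field.
    s3_event = json.loads(envelope["Message"])

    paths = []
    # Test events sent by S3 carry no "Records", so default to an empty list.
    for record in s3_event.get("Records", []):
        # Only object-creation events add files to the delta.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        paths.append(f"s3://{bucket}/{key}")
    return paths
```

With this shape, each micro-batch would only need to read the paths returned for the notifications it consumed, instead of listing every partition of the source table.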

Questions to think about:

Is this option provided on the source table or on the Flint index DDL statement?
Do we only handle new changes via notification, or can we also load cold data?

What alternatives have you considered?

Provide some way for users to refresh source table metadata periodically. However, we still need to figure out how, because:

Spark Hive table: the MSCK REPAIR TABLE statement works for this purpose, but Hive tables don't support Spark Structured Streaming
Spark data source table: as mentioned above, FileStreamSource polls the S3 file list

Do you have any additional context?

N/A

dai-chen added the feature New feature label Aug 30, 2023

github-actions bot added the untriaged label Aug 30, 2023

dai-chen removed the untriaged label Aug 30, 2023

dai-chen mentioned this issue Aug 30, 2023

[Feature] OpenSearch and Apache Spark Integration #3

Closed

dai-chen assigned penghuo Sep 18, 2023

dai-chen added the 0.2 label Oct 31, 2023

penghuo mentioned this issue Dec 8, 2023

[EPIC] Zero-ETL - AWS ALB Logs Integration #186

Open

penghuo added 0.3 and removed 0.2 labels Feb 23, 2024

dai-chen mentioned this issue Jun 3, 2024

[FEATURE] Performance and Scalability Enhancements for Flint Index #365

Open

dai-chen added DataSource:File and removed 0.3 labels Sep 30, 2024

dai-chen mentioned this issue Nov 4, 2024

[FEATURE] Cost-effective materialized view for high cardinality data #765

Open
