
V1.3.4 #472

Merged: 37 commits into main, Dec 27, 2024
Conversation

@flarco (Collaborator) commented Dec 23, 2024

Task Storage Refactoring

  • Consolidated StoreInsert and StoreUpdate into a single StoreSet function
  • Improved state management and storage operations
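The insert/update consolidation can be pictured with a minimal sketch. The state type, the backing store, and the map-based persistence below are illustrative stand-ins, not the actual sling internals; only the `StoreSet` name comes from the notes above.

```go
package main

import "fmt"

// taskState is a hypothetical stand-in for the task execution state
// that sling persists between runs.
type taskState struct {
	StreamID string
	Value    string
}

// store is an illustrative in-memory backend; the real implementation
// persists elsewhere.
var store = map[string]taskState{}

// StoreSet inserts the state when the key is new and updates it when it
// already exists -- one code path replacing a separate
// StoreInsert / StoreUpdate pair.
func StoreSet(s taskState) {
	store[s.StreamID] = s // map assignment is insert-or-update
}

func main() {
	StoreSet(taskState{StreamID: "orders", Value: "2024-12-01"}) // insert
	StoreSet(taskState{StreamID: "orders", Value: "2024-12-02"}) // update
	fmt.Println(store["orders"].Value)
}
```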

Connection Handling

  • Added new connection context handling methods (AsDatabaseContext, AsFileContext)
  • Improved connection pooling and caching mechanisms
  • Enhanced TLS configuration for MySQL connections

Delete Missing Feature

  • Added new DeleteMissing functionality for incremental mode
  • Introduced slingDeletedAtColumn for tracking deleted records
  • Added soft/hard delete options
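Roughly, delete-missing works by removing (or stamping) target rows whose keys no longer appear in the freshly loaded data. The sketch below is an assumption about the generated SQL -- the table names, the `_sling_deleted_at` column spelling, and the helper itself are illustrative:

```go
package main

import "fmt"

// deleteMissingSQL sketches the statement that could run after an
// incremental load: target rows whose keys are absent from the newly
// loaded temp table are soft-deleted (stamped with a deleted-at
// column) or hard-deleted. Names are illustrative.
func deleteMissingSQL(target, temp, key, mode string) string {
	notIn := fmt.Sprintf("%s not in (select %s from %s)", key, key, temp)
	if mode == "soft" {
		return fmt.Sprintf(
			"update %s set _sling_deleted_at = now() where %s", target, notIn)
	}
	return fmt.Sprintf("delete from %s where %s", target, notIn)
}

func main() {
	fmt.Println(deleteMissingSQL("public.orders", "orders_tmp", "id", "soft"))
	fmt.Println(deleteMissingSQL("public.orders", "orders_tmp", "id", "hard"))
}
```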

DuckDB Improvements

  • Added HTTP-based import method alongside existing CSV and named pipes methods
  • Enhanced partitioned file writing for both Parquet and CSV formats
  • Improved temporary table handling
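The HTTP-based import path can be sketched as follows: stand up a local server that streams the rows as CSV, then point DuckDB's `read_csv` at it. The endpoint path, sample data, and generated SQL are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// startCSVServer starts a local HTTP server on an ephemeral port that
// streams data as CSV, and returns the URL DuckDB would read from.
func startCSVServer() string {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/data.csv", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/csv")
		fmt.Fprint(w, "id,name\n1,alice\n2,bob\n") // stream rows here
	})
	go http.Serve(ln, mux)
	return fmt.Sprintf("http://%s/data.csv", ln.Addr())
}

func main() {
	url := startCSVServer()
	// The kind of SQL DuckDB would run against the local server:
	fmt.Printf("insert into tmp select * from read_csv('%s', header=true)\n", url)
}
```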

Incremental Processing

  • Enhanced state management for incremental updates
  • Added support for incremental state with update keys
  • Improved handling of incremental values via state storage

Dependencies

  • Updated various dependencies including:
    • github.com/flarco/g to v0.1.133
    • github.com/microsoft/go-mssqldb to v1.8.0
    • Added github.com/labstack/echo/v4 for HTTP handling

- Updated github.com/microsoft/go-mssqldb from v1.7.2 to v1.8.0.
- Added handling for casting string columns to fixed-type columns in Snowflake.
- Addresses an issue where string-to-fixed type casting was not properly handled, leading to potential errors.
- Improved error handling during replication to prevent cascading failures.
- Added `FailErr` field to `ReplicationConfig` to store the error encountered when a connection issue occurs.
- Modified `replicationRun` to set `FailErr` when a connection error is detected, stopping further execution.
- Added a unique connection ID to the MySQL connection to improve TLS configuration management.
- The `Init()` function now registers the TLS configuration using the unique connection ID.
- Added error handling for TLS configuration registration.
- Modified `GetURL` to use the unique connection ID as the key for custom TLS configurations.
- This prevents issues with multiple connections potentially using the same TLS configuration.
- Added functionality to expand environment variables within the envfile.
- Improved flexibility and ease of configuration.
- Enhanced security by allowing sensitive information to be stored in environment variables rather than directly in the envfile.
- Replaced `g.Rme` with `g.Rmd` in `LoadEnvFile` function to fix a potential bug in environment variable processing.
- Updated the `github.com/flarco/g` dependency to version v0.1.133.
- Fixed a typo in the `json` type mapping for `nvarchar(max)` in `types_general_to_native.tsv`. The previous entry incorrectly listed `nvarchar(65535)` twice; this commit corrects it to the consistent `nvarchar(max)`.
- Replaced manual map population with a loop for better readability and maintainability.
- Improved code clarity and reduced redundancy.
- Use `osext.Executable()` to reliably get the executable path, replacing previous method which relied on string matching.  This improves accuracy and robustness across different environments.
- Moved `osext.Executable()` call to `init()` function to ensure it's run once during initialization.
- Updated `checkUpdate` function to use the updated `env.Executable` variable for more accurate package detection.  This removes the need to directly call `osext.Executable()` within this function.
- Replaced connection name with connection hash in cache key to fix caching issues.
- This ensures that different connections with the same name but different configurations are cached separately.
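One way to derive such a key is to hash the connection's full property set; the sketch below is an assumed derivation (the real key computation may differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// connCacheKey hashes the full connection properties, so two
// connections sharing a name but differing in configuration get
// distinct cache entries.
func connCacheKey(name string, props map[string]string) string {
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order before hashing
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s;", k, props[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return name + "/" + hex.EncodeToString(sum[:8])
}

func main() {
	a := connCacheKey("mydb", map[string]string{"host": "a", "port": "5432"})
	b := connCacheKey("mydb", map[string]string{"host": "b", "port": "5432"})
	fmt.Println(a != b) // same name, different config => different keys
}
```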
- increased precision of timestamp layout in clickhouse template to 9 decimal places for improved accuracy
- enhanced sqlserver timestamp layout to handle microseconds and timezone information more reliably, improving data consistency
- Updated datetime format strings in `dateLayouts` to use 9 digits for nanosecond precision instead of 6.
- Modified `CastToString`, `CastToStringSafe`, and `CastToStringSafeMask` functions to reflect the change in precision.  This ensures consistent and accurate handling of datetime values across different functions and contexts.
- Added new functions to extract partition levels from file paths and truncate timestamps based on the specified partition level.
- Added tests to cover all scenarios for partition level extraction and timestamp truncation.
- Improved validation of partition levels and added a test case to cover invalid partition levels.
- Created PartitionLevel type to represent the available partition levels, and added corresponding methods: IsValid and TruncateTime.
- Added tests to cover all scenarios for partition level truncation.

✨ feat(cmd/sling): improve sling CLI test and add new tests

- Added a check to see if the sling binary exists before running tests.
- Added new test cases for different scenarios, including CSV source with single quote and $symbol quote, direct insert full-refresh, and incremental with delete missing (soft and hard).
- Added a test case for writing to partitioned parquet files, both locally and on AWS S3.
- Added test cases to cover incremental writing to partitioned parquet files.
- Added new test cases to cover all scenarios for different partitioning options.

🐛 fix(core/dbio/database): improve duckdb log message

- Changed the log level for the "The file ... does not exist" message from debug to trace to avoid cluttering the logs

♻️ refactor(core/dbio/filesys): improve copy from local recursive function

- Changed concurrency handling for file copying with a context to allow for proper cancellation
- Added error handling when processing files in a recursive copy operation
- Added debug log when writing partitions

🐛 fix(core/dbio/iop): improve validation and handling of partition levels in DuckDB

- Changed partition fields from a string array to an array of PartitionLevel to improve validation and error handling.
- Added new enum `PartitionLevel` for improved type safety and readability of partition levels.
- Fixed bug where partition expressions for month, week, and day were not correctly formatted.
- Added validation to prevent invalid PartitionLevels being used.
- Added new `PartitionLevelsAscending` and `PartitionLevelsDescending` constants for consistency and clarity.

🐛 fix(core/dbio/scripts): update test script for better code coverage

- Updated test script to run all test cases of the `iop` module.

♻️ refactor(core/sling): improve incremental value handling

- Added `IncrementalGTE` flag to config to allow >= comparison for incremental mode.
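The operator switch amounts to this (the helper and its signature are illustrative, not sling's actual code): `>` only picks up rows strictly newer than the stored state, while `>=` re-reads rows sharing the last-seen update-key value, which matters when the key is coarse (e.g. a date).

```go
package main

import "fmt"

// incrementalWhere builds the incremental filter, using >= when the
// IncrementalGTE-style flag is set and > otherwise.
func incrementalWhere(updateKey, lastValue string, gte bool) string {
	op := ">"
	if gte {
		op = ">="
	}
	return fmt.Sprintf("where %s %s %s", updateKey, op, lastValue)
}

func main() {
	fmt.Println(incrementalWhere("updated_at", "'2024-12-23'", false))
	fmt.Println(incrementalWhere("updated_at", "'2024-12-23'", true))
}
```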

🐛 fix(core/sling): improve handling of incremental mode with update keys

- Improved handling of incremental mode with update keys to use SLING_STATE environment variable for better state management.
- Improved error handling when `SLING_STATE` is not set but using `update_key` field in incremental mode.

♻️ refactor(core/sling): remove unnecessary function

- Removed the `extractPartFields` function as its functionality was superseded by the newly introduced `iop.ExtractPartitionFields` function.

🐛 fix(core/sling): improve handling of incremental writes to files

- Improved handling of incremental writes to files by using `>=` instead of `>` when comparing update keys, and updating the query accordingly.
- updated `ReadFromDB` function to use `>=` for `incremental_gte` property and added tests to cover this functionality
- updated sling state handling to support update key with incremental mode and sling state
- improved logic for determining whether to use duckdb for writing data
- optimized condition check for incremental state with update key.
- Added a new import method using a local HTTP server to improve performance and handle large datasets.
- The HTTP server serves data in CSV format, allowing DuckDB to efficiently import the data using `read_csv`.
- Implemented error handling for server startup and data streaming.
- Improved logging to track import progress and handle potential issues.
- Added support for configuring the CSV import parameters.
- added a small delay after waiting for the local server to start to improve stability
- added check for runtime variables in ObjectHasStreamVars()
- set Single to true if no runtime variables are found in wildcard replication
- improves handling of wildcard replication scenarios without runtime variables
- Correctly set `Single` flag for wildcard targets without runtime variables, considering `FileMaxBytes` and `FileMaxRows` settings.
- Improves accuracy of replication configuration processing for wildcard targets.
- corrected the prefix assertion to include "file://" to accurately reflect the stream path.
- Updated the `table_incremental_from_postgres` test to use the `email` column instead of the `code` column for incremental updates.  This aligns with recent schema changes and ensures the test continues to function correctly.
- changed primary key for StarRocks to `id,email` to resolve incompatibility with decimal primary keys
- updated `suite.db.template.tsv` to reflect the change in primary key for the `table_incremental_from_postgres` test case.  This ensures that tests run correctly with StarRocks.
- StarRocks doesn't support decimal columns as primary keys; fixed the primary key to `id` only
- Increased timeout for database tests from 15m to 25m to prevent intermittent failures due to long-running operations.
- Addresses an issue where wildcard streams were not processed correctly, leading to errors.
- Improved wildcard stream handling by using a clone stream to apply defaults.
- Added default setting for single streams with zero file max variables.
- Ensured that the correct stream configuration is used for wildcard streams.
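The wildcard `Single`-flag decision described above can be sketched as a small predicate; the parameter names are illustrative stand-ins for the config fields mentioned in the notes:

```go
package main

import "fmt"

// shouldSetSingle: a wildcard stream whose target object has no runtime
// variables resolves to one fixed object, so it should be written as a
// single file -- unless file_max_bytes / file_max_rows settings require
// splitting the output into parts.
func shouldSetSingle(hasRuntimeVars bool, fileMaxBytes, fileMaxRows int64) bool {
	if hasRuntimeVars {
		return false // each stream resolves to its own object
	}
	return fileMaxBytes == 0 && fileMaxRows == 0
}

func main() {
	fmt.Println(shouldSetSingle(false, 0, 0))      // true: one fixed target
	fmt.Println(shouldSetSingle(false, 0, 500000)) // false: max rows splits output
	fmt.Println(shouldSetSingle(true, 0, 0))       // false: runtime vars per stream
}
```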
@flarco flarco merged commit 9f2fdc5 into main Dec 27, 2024
8 checks passed
@flarco flarco deleted the v1.3.4 branch December 27, 2024 16:10