qsv 2.0.0 #203333

Merged
merged 2 commits into from
Jan 6, 2025

BrewTestBot
Member


Created with brew bump-formula-pr.

prerelease notes
## qsv v2.0.0 is here! 🎉
It took 193 releases to get to v1.0.0, and we're already at v2.0.0 a month later!?!

Yes! We wanted a running start for 2025, and qsv 2.0.0 marks qsv's biggest release yet!

  • It fully enables the "Data Resource Upload First (DRUF)" workflow, allowing Datapusher+ to infer "automagical metadata" from the data itself. It exposes two Domain Specific Language (DSL) options - Luau and MiniJinja - to enable powerful data transformation and validation capabilities. This allows data stewards to upload data first, then use qsv's DSL capabilities inside DP+ to automatically generate rich metadata - including data dictionaries, field descriptions, data quality rules, and data validation schemas. This "automagical metadata" approach dramatically reduces the friction in compiling high-quality, high-resolution metadata (using the DCAT-US 3.0 specification as a reference) that would otherwise be a manual, laborious, and error-prone process.
    Under the hood, the fetchpost, template, stats, and luau commands now have the necessary scaffolding to fully support this workflow inside Datapusher+ and ckanext-scheming.
  • It adds a new pivotp command, powered by Polars, to enable fast pivot operations on large datasets. You can now pivot your data in seconds by simply specifying the columns to pivot on while blowing past Excel's PivotTable limitations.
  • stats now computes geometric mean and harmonic mean and adds string length stats, all while getting a performance boost.
  • join and joinp got a lot of love in this release, with several new options:
    • joinp: non-equi join support! 🎉💯🥳
      See "Lightning Fast and Space Efficient Inequality Joins" paper and this Polars non-equi join tracking issue.
    • join & joinp: --right-anti and --right-semi joins
    • joinp: --ignore-leading-zeros option for join keys
    • joinp: --maintain-order option to maintain the order of either the left or right dataset in the output
    • joinp: expanded --cache-schema options to make joinp smarter/faster by leveraging the stats cache
    • join: --keys-output option to write successfully joined keys to a separate output file.
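
For readers unfamiliar with the right-semi and right-anti join types mentioned above, here is a minimal plain-Python sketch of the usual semantics: a right-semi join keeps the rows of the right dataset that have a matching key in the left dataset, and a right-anti join keeps the right rows that have no match. This is not qsv code. The helper names, the `id` key, and the sample rows are all hypothetical, and qsv's actual output format is not shown.

```python
# Illustrative only: generic semantics of right-semi and right-anti joins.
# Helper names and sample rows are hypothetical, not taken from qsv.

def right_semi(left, right, key):
    """Return rows of `right` whose key value appears in `left`."""
    left_keys = {row[key] for row in left}
    return [row for row in right if row[key] in left_keys]


def right_anti(left, right, key):
    """Return rows of `right` whose key value does not appear in `left`."""
    left_keys = {row[key] for row in left}
    return [row for row in right if row[key] not in left_keys]


# Hypothetical sample data: two small "datasets" keyed on "id".
left = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
right = [{"id": "2", "value": "x"}, {"id": "3", "value": "y"}]

semi = right_semi(left, right, "id")  # right rows with a match in left
anti = right_anti(left, right, "id")  # right rows without a match in left
```

Only columns from the right dataset appear in either result, which is what distinguishes semi and anti joins from an ordinary inner join.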

This release lays the groundwork for the outliers "smart" command to quickly identify outliers using stats/frequency info.

It also sets the stage for an initial implementation of our "Data Concierge" that leverages all the high-quality, high-res metadata we automagically compile with DRUF to enable Metadata Gardening Agents to proactively link seemingly unrelated data and glean insights as it constantly grooms the Data Catalog - effectively making it a FAIR Data Factory.


Added

Changed

Fixed

Full Changelog: dathere/qsv@1.0.0...2.0.0

@github-actions github-actions bot added the rust (Rust use is a significant feature of the PR or issue) and bump-formula-pr (PR was created using `brew bump-formula-pr`) labels Jan 6, 2025
@chenrui333 chenrui333 added the pre-release label (Artifact is pre-release) Jan 6, 2025
@jqnatividad
Contributor

qsv maintainer here!

Just wanted to confirm that qsv 2.0.0 has been released!
https://github.com/dathere/qsv/releases/tag/2.0.0

@chenrui333 chenrui333 added the ready to merge label (PR can be merged once CI is green) and removed the pre-release label (Artifact is pre-release) Jan 6, 2025
qsv: update repo location

Signed-off-by: Rui Chen <[email protected]>

qsv: update test

Signed-off-by: Rui Chen <[email protected]>
@chenrui333 chenrui333 added the CI-no-fail-fast label (Continue CI tests despite failing GitHub Actions matrix builds.) Jan 6, 2025
Contributor

github-actions bot commented Jan 6, 2025

🤖 An automated task has requested bottles to be published to this PR.

@github-actions github-actions bot added the CI-published-bottle-commits label (The commits for the built bottles have been pushed to the PR branch.) Jan 6, 2025
@BrewTestBot BrewTestBot enabled auto-merge January 6, 2025 23:16
@BrewTestBot BrewTestBot added this pull request to the merge queue Jan 6, 2025
Merged via the queue into master with commit 16c6e53 Jan 6, 2025
15 checks passed
@BrewTestBot BrewTestBot deleted the bump-qsv-2.0.0 branch January 6, 2025 23:24
@chenrui333 chenrui333 added the repo-location-update label and removed the CI-no-fail-fast label (Continue CI tests despite failing GitHub Actions matrix builds.) Jan 6, 2025