Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sortBy(ColumnReference) accept pathOf without extra cast #779

Merged
merged 1 commit into from
Jul 19, 2024

Conversation

koperagen
Copy link
Collaborator

@koperagen koperagen commented Jul 15, 2024

Before this change only df.sortBy(pathOf().cast()) would work. df.sortByDesc(pathOf().cast()) also didn't! Was very hard to figure out correct way to call these functions. PR solves it.
I found this problem while working with this dataset in Datalore which can be found in their sample database. It's public domain and seems nice, let's add it.
https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries. It shows a good use case

@koperagen koperagen added the enhancement New feature or request label Jul 15, 2024
@koperagen koperagen added this to the 0.14.0 milestone Jul 15, 2024
@koperagen koperagen requested a review from zaleslaw July 15, 2024 17:09
@koperagen koperagen self-assigned this Jul 15, 2024
@@ -67,4 +72,27 @@ class SortDataColumn {
col.sortWith { df1, df2 -> df1[a] - df2[a] } shouldBe sortedCol
col.sortWith(compareBy { it[a] }) shouldBe sortedCol
}

@Test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool new dataset for unit tests! Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to validate on that some high-level integration tests with more complex logic instead of low-level A.a.b


@Suppress("unused")
@DataSchema
interface DsSalaries {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add somewhere the link to the initial source of dataset (for example here, on that data schema), the origin, the license, the possible transformation which we did

It's a gentleman set of actions for copying of external dataset to your codebase. If we missed it earlier, let's start from that

Copy link
Collaborator Author

@koperagen koperagen Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a link. It was downloaded as is, without modifications and is public domain. So, do we need to add anything else here?

Copy link
Collaborator

@zaleslaw zaleslaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved in advance, but please add minor information about dataset

Copy link
Contributor

Generated sources will be updated after merging this PR.
Please inspect the changes in here.

@koperagen koperagen merged commit 89db8e6 into master Jul 19, 2024
4 of 5 checks passed
@koperagen koperagen deleted the sort-by-columnreference branch August 26, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants