Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include indices of duplicate rows in the Non_Unique_Primary_Key error #6437

Closed
3 tasks
radeusgd opened this issue Apr 26, 2023 · 2 comments · Fixed by #6604
Closed
3 tasks

Include indices of duplicate rows in the Non_Unique_Primary_Key error #6437

radeusgd opened this issue Apr 26, 2023 · 2 comments · Fixed by #6604
Assignees
Labels
-libs Libraries: New libraries to be implemented l-db-write Libraries: database writer p-lowest Should be completed at some point
Milestone

Comments

@radeusgd
Copy link
Member

As discussed, it could be useful to include the indices of duplicate rows in the error indicating that the selected primary key was not unique.

To not stall that PR, since this has a lower priority, it's separated into this task.

  • Add duplicate_rows : Vector Integer field to Non_Unique_Primary_Key.Error.
  • Adapt its to_display_text to display what rows were duplicated.
  • Fill in this field when reporting the error.
    • We should fill in row indices of one group that has duplicate entries. See below.

If we have a table

# X
0 a
1 b
2 a
3 a
4 b
5 c

and set the primary key to be X, we have values a and b duplicated - so duplicated_rows should contain rows of one of these groups - i.e. it should either be [0, 2, 3] or [1, 4] (and it is unspecified which one). It shouldn't be [0, 1, 2, 3, 4] - if we include duplicates from all groups, in some cases we could just list all rows of a table (if every row has some duplicate) and that would not be helpful.

@radeusgd radeusgd added p-lowest Should be completed at some point -libs Libraries: New libraries to be implemented l-db-write Libraries: database writer labels Apr 26, 2023
@jdunkerley jdunkerley added this to the Beta Release milestone May 2, 2023
@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board May 2, 2023
@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board May 5, 2023
@radeusgd
Copy link
Member Author

radeusgd commented May 5, 2023

After further thought and discsussion with @jdunkerley we decided to display the values associated with the primary key of the first row that has a clashing primary key. That is because it can be easily used to find related rows, is easy to display and is portable (row ids were not viable in Database as they are ill-defined there).

So for the table above, the result would be: clashing_primary_key=["a"] and the following message: The primary key [X] is not unique. For example, the key [a] corresponds to more than one row.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board May 8, 2023
@mergify mergify bot closed this as completed in #6604 May 9, 2023
mergify bot pushed a commit that referenced this issue May 9, 2023
…nto read/write, improve SQLite format detection (#6604)

Closes #6437
Related to #6410

- Add example duplicate row to `Non_Unique_Primary_Key`.
- Ensure `File.read` fails if the file does not exist, always.
- Ensure SQLite fails if file is empty or nonexistent or malformed.
- Split file format detection into read and write modes, so that the read mode can depend on actual file _contents_.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board May 9, 2023
@enso-bot
Copy link

enso-bot bot commented May 9, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-05-08):

Progress: Improved Non_Unique_Primary_Key error. Improved SQL File Format detection and in general format detection. It should be finished by 2023-05-09.

Next Day: Next day I will be working on the #6543 task. Work on Date_Range

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-db-write Libraries: database writer p-lowest Should be completed at some point
Projects
Archived in project
2 participants