Disambiguating and filling in headers similar to Pandas #1436
Hi, thanks for the suggestion :) We had a lot of feature creep with …

Manually

To do this manually, you could get the titles and data with code similar to the following (where …):

```python
from gspread import utils

# `spreadsheet` is a worksheet; `rename_unique` is a helper you provide (not shown)
header = spreadsheet.get("1:1")[0]  # the first row holds the titles
header_unique = rename_unique(header)
cells = spreadsheet.get("6:9")  # the data rows you want
some_records = utils.to_records(header_unique, cells)
```

…as described in the v5 to v6 migration guide. See documentation for …

As part of …

```diff
     allow_underscores_in_numeric_literals=False,
     empty2zero=False,
+    rename_identical_headers=False,
 ) -> List[Dict[str, Union[int, float, str]]]:
```
and then the implementation somewhere like here (lines 567 to 568 in 8d55a6b):

```diff
 keys = entire_sheet[head - 1]
 values = entire_sheet[head:]
+
+if rename_identical_headers is True:
+    keys = utils.rename_unique(keys)
```
Opinion
Personally, I think this change is "simple enough" that I would be happy for it to be added.
I will ask @lavigne958's opinion too, on the added complexity.
Implementation
I will not implement this. As you say:
> Happy to potentially contribute.
I am happy to help you contribute if you desire this feature :)
Let us wait for @lavigne958's opinion; then you can read the contributing guide, and we can help if you are interested in implementing this feature.
good day :)
Hi @imrehg, thank you for this insight. I really like the idea. It could solve our problem and offer users some flexibility. When I think about it, the major problem is:
From your proposal and the background of issues we have faced, we don't actually need the user to change the data or skip any columns. What we need is to create a unique name whenever a header is not unique. What I propose as a flexible solution is:
In the end a user can choose to have …

As mentioned by @alifeee, if you need any help contributing we are happy to help 🙃 I also agree it would be useful to put it all in a single function in `utils` so that it could benefit everyone.
Hi @imrehg, let us know if you want us to make this feature or if you'd rather contribute; if so, we can guide you too.
Hey @lavigne958 @alifeee, sorry that I missed the original replies. Yes, I'm still happy to contribute. Your messages have quite a bit of detail to start from; I'll go through them first and look at making the changes in that spirit.
The most useful link for you will be: contributing guide 😁
Hi @imrehg, just wondering if you are still interested in making the changes?
Hey @lavigne958, definitely still interested, life just got in the way a bit, apologies! 😰 I'm going to look at this soon, unless I'm bottlenecking you. In that case just let me know!
No problem, we have other issues to solve for now. So far so good, thanks 👍
Is your feature request related to a problem? Please describe.
In the real world it is quite common to have spreadsheets which have non-unique column headers (often the case when the sheet is intended to be used by people rather than code). There are also cases when column headers might be missing, even if there are available data in the columns.
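Why duplicate or blank headers are a problem for record-style output: whatever the library does about it, a plain `dict` per row cannot hold a repeated key, so the last value wins and earlier columns vanish, and every blank header maps to the same `""` key. A minimal illustration in plain Python (hypothetical sheet contents, not gspread's code):

```python
# Hypothetical sheet contents: a header row with a duplicate and a blank name.
header = ["Name", "Score", "Score", ""]
row = ["Ada", "10", "12", "note"]

# Zipping into a dict keeps only the last value for each repeated key.
record = dict(zip(header, row))
print(record)  # {'Name': 'Ada', 'Score': '12', '': 'note'}
print(len(header), len(record))  # 4 3
```

Four columns go in but only three keys come out; the first `Score` value is silently lost.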
Currently these wouldn't work when running `get_all_records` (e.g. before passing data into a Pandas DataFrame), as the tool expects unique headers in general, and still expects unique headers even when the expected headers are passed in.
Describe the solution you'd like
Pandas' `read_excel` tool handles these cases and does a reasonable job with them:
- duplicate column names: `Duplicate` at the first encounter; on subsequent ones it is `Duplicate.1`, `Duplicate.2`, ...
- missing column names: `Unnamed: 0`, `Unnamed: 1`, ... so there's always a column name that can be used (and likely renamed)

Something similar for the two types of disambiguation could be useful.
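A minimal sketch of a helper covering both kinds of disambiguation, loosely following the Pandas conventions described above. The name `rename_unique` matches the name used earlier in this thread, but this body is an illustrative assumption, not gspread's implementation. It numbers blank headers by column position (as Pandas does) and keeps bumping the suffix if a generated name would collide with one already present.

```python
from typing import Dict, List, Set


def rename_unique(headers: List[str]) -> List[str]:
    """Make header names unique, loosely following pandas' conventions.

    - an empty header at position i becomes "Unnamed: i"
    - the first occurrence of a name is kept; later ones become
      "name.1", "name.2", ...
    """
    seen: Set[str] = set()
    counts: Dict[str, int] = {}
    result: List[str] = []
    for index, name in enumerate(headers):
        if name == "":
            name = f"Unnamed: {index}"
        base = name
        # Bump the suffix until the candidate doesn't clash with any name
        # already emitted (e.g. a literal "A.1" earlier in the row).
        while name in seen:
            counts[base] = counts.get(base, 0) + 1
            name = f"{base}.{counts[base]}"
        seen.add(name)
        result.append(name)
    return result
```

For example, `rename_unique(["Duplicate", "Duplicate", "", "Duplicate", ""])` returns `['Duplicate', 'Duplicate.1', 'Unnamed: 2', 'Duplicate.2', 'Unnamed: 4']`. Renaming only the repeats (rather than every column) means sheets that already have unique headers come back unchanged.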
Describe alternatives you've considered
I've considered using manually rolled `worksheet.col_values(...)` to get the individual columns and manipulate them as needed, rather than using `get_all_records`, though this would be a lot of manual shuffling.
Additional context
Happy to potentially contribute.