-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
engine_options
vs read_options
in read_excel
is confusing
#15596
Comments
It used to be that the only engine was xlsx2csv and so there was a parameter to pass options to xlsx2csv and then there was another parameter to pass options to pl.read_csv and the parameter names reflected that. The parameters got renamed since there are more options so they're less clear. For the newer engines that have a workbook class and a load_sheet method, the workbook class gets the engine_options and the load_sheet method gets the read_options. For the old xlsx2csv, its engine_options go to xlsx2csv (including which sheet) and then read_options goes to pl.read_csv so unfortunately it can't be delineated as workbook and worksheet options. It seems like it might just need to be options1 and options2 with a description of what that means per engine since they're not consistent enough to have a more generic meaning. To be clear, I'm not arguing to do that, I'm just lamenting the lack of good options. |
In this, case I think the clearest would be to have separate input argument names for each engine, which only apply when that engine is selected, e.g.: pl.read_excel(file, engine="calamine", calamine_options = {...})
pl.read_excel(file, engine="xlsx2", xlsx2_options = {...}) We could then tailor each to the appropriate engine, and the user does not have to mentally separate engine and read options. |
For that to work, polars has to take the user inputs in calamine_options and manually separate them into the ones that go to |
That isn't really any more ergonomic. Imagine if we took the same approach to "execute_options" in Also, I'm not sure if you were looking at older code, but the "engine_options" parameter docstring does actually tell you exactly what it's going to pass those options to:
However, "read_options" needs some updates to achieve the same level of clarity, and linking directly to the underlying engine's docs also sounds like a good idea 👌 |
Description
It is not clear from reading the docs what the functional difference between
engine_options
andread_options
is:engine_options
is declared as "additional options passed to the underlying engine’s primary parsing constructor".read_options
is "Extra options passed to the function that reads the sheet data".Is the function that reads the sheet data not the same as the engine that reads the sheet data? If not, what does the engine do, how does one find the underlying function, and where does one find the API for each of those?
Additionally, I think that we should directly link to the APIs of the underlying engines so that the user can know explicitly what can be passed. See #15486 where this caused confusion elsewhere. There are also many different
fastexcel
repositories.The text was updated successfully, but these errors were encountered: