Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop converting characters to factors in pkAnalysesAsDataFrame()? #673

Closed
IndrajeetPatil opened this issue Dec 4, 2021 · 6 comments
Closed

Comments

@IndrajeetPatil
Copy link
Member

Parameters, units, etc. are not categorical data, and treating them as such makes it difficult to work with them further.

If you agree, the calls to readr::col_factor() here can be replaced with readr::col_character():

colTypes <- list(
IndividualId = readr::col_integer(),
QuantityPath = readr::col_factor(),
Parameter = readr::col_factor(),
Value = readr::col_double(),
Unit = readr::col_factor()
)

The only downside to this is that character/string columns will take up more space than factors, which are integers in disguise, so the returned dataframes will be slightly heavier.

@msevestre
Copy link
Member

Is this a breaking change?

@msevestre
Copy link
Member

I agree with this change. The only reason we have it as factor is because....euh....wait... no idea

@IndrajeetPatil
Copy link
Member Author

Is this a breaking change?

I have no clue.

@PavelBal Do you have any sense about this? Do you find yourself using this function?

@PavelBal
Copy link
Member

All pk analyses workflow is not straightforward to me, so go ahead.

IndrajeetPatil added a commit that referenced this issue Dec 13, 2021
@msevestre
Copy link
Member

All pk analyses workflow is not straightforward to me

What does this mean? If something is not clear, it should be mentioned and discussed don't you think?

As far as breaking change is concerned, my question is simple, independently from how this is being used:
If someone was relying on the dataframe from v10 with factors to do some stuff, and in v11 it is not factor anymore, will it break the code or not? If yes, we need to mark this as a breaking change in our release note

@IndrajeetPatil Can you create a label: Breaking change so that we can tag those changes that we are adding in v11 (for instance I would not export all the validatexxx again)

@IndrajeetPatil
Copy link
Member Author

If someone was relying on the dataframe from v10 with factors to do some stuff, and in v11 it is not factor anymore, will it break the code or not?

This is hard to predict because of the coercion rules, but since, in most contexts, R will convert factor to character and vice versa, I don't think it will be a breaking change. But I will make a note of it as such, just to be safe.

x <- factor(c("z", "y", "x"), levels = c("x", "y", "z"))
as.character(x)
#> [1] "z" "y" "x"

y <- c("z", "y", "x")
as.factor(y)
#> [1] z y x
#> Levels: x y z

Created on 2021-12-15 by the reprex package (v2.0.1)

Can you create a label: Breaking change

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants