Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable users to pass functions to readSource #222

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

SallyDa
Copy link

@SallyDa SallyDa commented Oct 18, 2024

Hi all,

It would be useful for users to be able to directly pass the wrapped functions as arguments to readSource. This can make it easier to debug, because users can then jump through the code by clicking on the functions. This pull request only extends functionality and is fully backwards compatible.

Example:
There is a bug in readworldsteel, if we call it using readSource("worldsteel"), then a) the person who found the bug must know the naming conventions used and b) this still doesn't enable them to easily locate where readworldsteel is defined.
With the updates in this pull request, we can instead call readSource(readworldsteel) and (IDE dependent) can jump from there into the readworldsteel function.

I'm new here, so not sure how pull requests are handled. Please let me know if I need to do anything else to get it accepted!

@tscheypidi
Copy link
Member

Hi @SallyDa,

thank your very much for your PR. I am always happy to receive suggested additions in this form.

Concerning your proposed changes I was wondering whether the same behavior cannot be already achieved via the config setting globalenv which allows to override existing functions with functions currently loaded in the global environment. In your example I thought one should get the intended behavior by running the following lines of code:

library(madrat)
readworldsteel <- function() ....
setConfig(globalenv=TRUE)
a <- readSource("worldsteel")

Does that lead to the behavior your are looking for or is there difference in this and what you would like to have?

@JakobBD
Copy link

JakobBD commented Oct 18, 2024

We think there is a difference, so we're afraid this does not solve it.

When we work through existing code from others and try to understand what's happening, we usually jump from one function to the next based on these function calls.

For example, trying to understand the mrremind function calcFeDemandIndustry, I stumble upon the line

stationary <- readSource("Stationary")

and since I don't know anything about it, I'd like to jump into its source code, read it's documentation, and so on.
Currently, I have to open it manually in order to do that, provided I know which package it's in.

With the present PR, the call in calcFeDemandIndustry.R would look like

stationary <- readSource(readStationary)

and I could click on the function passed as type to directly jump to it, or mouse-hover over it to get information on it.

For us, this would significantly ease understanding and debugging legacy code.
It seems like a small speed-up, but it's something we do all the time, so it really adds up.

We'd like to suggest a similar addition for calcOutput calls, for the same reasons.

@tscheypidi
Copy link
Member

After your comments and some discussions I had with @pascal-sauer I have now a better understanding of what you are trying to achieve. The bad news is that to my understanding the current implementation does not achieve the desired behavior. The main issue at the moment is that the function itself is not further processed in readSource but instead only the name of the function is used again (to be compatible to the rest what is happening in the wrapper). This opens the possibility that the function you are providing and the function which is later executed in the wrapper are two completely separate things. This could totally mislead the user during debugging.

I think there are possibilities to implement the desired feature, but it would require significantly more work and would need to be implemented analog to the globalenv implementation in order to ensure that also aspects like caching work properly in that context. We have for instance some function hash tables which are produced when the package is loaded. With the dynamic injection of functions these tables would need to be updated on the fly in order to still be correct. In addition, we have some function duplicate checks which would need to run as well when injecting code through this pipeline. And for readSource you would need to find a concept to properly address that you sometimes need to provide more than one function (e.g. read and convert) to work.

So in short: It is possible to achieve what you want to achieve but it would be considerably more work to get it compatible to the functionality of the madrat universe. My feeling is that it would be too much work relative to the benefits this interface would provide.

If you like we can meet with a cup of coffee to discuss that further.

@SallyDa
Copy link
Author

SallyDa commented Oct 22, 2024

Hi, nice to hear that you understand what we want to achieve! I would argue that we actually do achieve it with the current pr, but I'm fully aware that it's more of a hack than an actual solution, and that it relies completely on the conversion back from string to function. I'm OOO this week, but maybe let's chat over a coffee next week.

@SallyDa SallyDa marked this pull request as draft November 6, 2024 08:47
Copy link
Contributor

@pascal-sauer pascal-sauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't go through in too much detail, just some things I noticed :)

R/readSourceAlternative.R Outdated Show resolved Hide resolved
R/readSourceAlternative.R Outdated Show resolved Hide resolved
@SallyDa
Copy link
Author

SallyDa commented Nov 13, 2024

Hello,

I just want to record my thoughts on this here. I came up with a roadmap for some refactoring of madrat.

  1. Write alternative functions which take functions as arguments. Advantages: improving readability and traceability, making it easier to debug the code (as discussed in previous comments).
readSourceAlternative <- function(read, correct, convert, ...)  # with read, correct, convert as (optional) functions
  1. Make the original functions wrappers to the alternative versions, to avoid duplication. This enables old/existing users to continue using madrat in the same way they always did (backwards compatibility).
readSource <- function(type, ..., convert=True)  # with arguments unchanged, i.e. strings and bool
{ # get functions from arguments and call readSourceAlternative}
  1. Propose a class structure for users. Advantages: improved organisation (all functions related to the same dataset are in the same place, and at the same time we maintain the separation of the data preprocessing steps), decluttering of namespaces.
WorldSteelData <- setRefClass(
    methods = list(
        read <- function() {# similar to current readWorldSteel}
        convert <- function() {# similar to current convertWorldSteel}
        prepare_data <- function() {# run readSourceAlternative with read=read, etc}
    )
)

Note that we would also need alternative caching functions that work with functions as arguments, but that otherwise the caching would not need to be changed (as far as I understand).

This proposal brings improvements to usability, while maintaining (or at least having the option to maintain) backwards compatibility and current caches. Of course, it would be a significant amount of work to implement and test everything. I am happy to help with this, but only if you, @pascal-sauer, @tscheypidi, are also motivated to refactor madrat! I understand that it might not be a priority at the moment, and/or that you might have different ideas for improvements (e.g. outsourcing caching), but still I thought it would be helpful to write down my thoughts for your reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants