Enable users to pass functions to readSource #222

SallyDa · 2024-10-18T08:30:09Z

Hi all,

It would be useful for users to be able to directly pass the wrapped functions as arguments to readSource. This can make it easier to debug, because users can then jump through the code by clicking on the functions. This pull request only extends functionality and is fully backwards compatible.

Example:
There is a bug in readworldsteel, if we call it using readSource("worldsteel"), then a) the person who found the bug must know the naming conventions used and b) this still doesn't enable them to easily locate where readworldsteel is defined.
With the updates in this pull request, we can instead call readSource(readworldsteel) and (IDE dependent) can jump from there into the readworldsteel function.

I'm new here, so not sure how pull requests are handled. Please let me know if I need to do anything else to get it accepted!

tscheypidi · 2024-10-18T09:59:15Z

Hi @SallyDa,

thank your very much for your PR. I am always happy to receive suggested additions in this form.

Concerning your proposed changes I was wondering whether the same behavior cannot be already achieved via the config setting globalenv which allows to override existing functions with functions currently loaded in the global environment. In your example I thought one should get the intended behavior by running the following lines of code:

library(madrat)
readworldsteel <- function() ....
setConfig(globalenv=TRUE)
a <- readSource("worldsteel")

Does that lead to the behavior your are looking for or is there difference in this and what you would like to have?

JakobBD · 2024-10-18T10:43:32Z

We think there is a difference, so we're afraid this does not solve it.

When we work through existing code from others and try to understand what's happening, we usually jump from one function to the next based on these function calls.

For example, trying to understand the mrremind function calcFeDemandIndustry, I stumble upon the line

stationary <- readSource("Stationary")

and since I don't know anything about it, I'd like to jump into its source code, read it's documentation, and so on.
Currently, I have to open it manually in order to do that, provided I know which package it's in.

With the present PR, the call in calcFeDemandIndustry.R would look like

stationary <- readSource(readStationary)

and I could click on the function passed as type to directly jump to it, or mouse-hover over it to get information on it.

For us, this would significantly ease understanding and debugging legacy code.
It seems like a small speed-up, but it's something we do all the time, so it really adds up.

We'd like to suggest a similar addition for calcOutput calls, for the same reasons.

tscheypidi · 2024-10-21T09:33:25Z

After your comments and some discussions I had with @pascal-sauer I have now a better understanding of what you are trying to achieve. The bad news is that to my understanding the current implementation does not achieve the desired behavior. The main issue at the moment is that the function itself is not further processed in readSource but instead only the name of the function is used again (to be compatible to the rest what is happening in the wrapper). This opens the possibility that the function you are providing and the function which is later executed in the wrapper are two completely separate things. This could totally mislead the user during debugging.

I think there are possibilities to implement the desired feature, but it would require significantly more work and would need to be implemented analog to the globalenv implementation in order to ensure that also aspects like caching work properly in that context. We have for instance some function hash tables which are produced when the package is loaded. With the dynamic injection of functions these tables would need to be updated on the fly in order to still be correct. In addition, we have some function duplicate checks which would need to run as well when injecting code through this pipeline. And for readSource you would need to find a concept to properly address that you sometimes need to provide more than one function (e.g. read and convert) to work.

So in short: It is possible to achieve what you want to achieve but it would be considerably more work to get it compatible to the functionality of the madrat universe. My feeling is that it would be too much work relative to the benefits this interface would provide.

If you like we can meet with a cup of coffee to discuss that further.

SallyDa · 2024-10-22T09:29:31Z

Hi, nice to hear that you understand what we want to achieve! I would argue that we actually do achieve it with the current pr, but I'm fully aware that it's more of a hack than an actual solution, and that it relies completely on the conversion back from string to function. I'm OOO this week, but maybe let's chat over a coffee next week.

This reverts commit 7ff37be.

pascal-sauer

Didn't go through in too much detail, just some things I noticed :)

R/readSourceAlternative.R

Co-authored-by: Pascal Sauer <[email protected]>

SallyDa · 2024-11-13T17:11:58Z

Hello,

I just want to record my thoughts on this here. I came up with a roadmap for some refactoring of madrat.

Write alternative functions which take functions as arguments. Advantages: improving readability and traceability, making it easier to debug the code (as discussed in previous comments).

readSourceAlternative <- function(read, correct, convert, ...)  # with read, correct, convert as (optional) functions

Make the original functions wrappers to the alternative versions, to avoid duplication. This enables old/existing users to continue using madrat in the same way they always did (backwards compatibility).

readSource <- function(type, ..., convert=True)  # with arguments unchanged, i.e. strings and bool
{ # get functions from arguments and call readSourceAlternative}

Propose a class structure for users. Advantages: improved organisation (all functions related to the same dataset are in the same place, and at the same time we maintain the separation of the data preprocessing steps), decluttering of namespaces.

WorldSteelData <- setRefClass(
    methods = list(
        read <- function() {# similar to current readWorldSteel}
        convert <- function() {# similar to current convertWorldSteel}
        prepare_data <- function() {# run readSourceAlternative with read=read, etc}
    )
)

Note that we would also need alternative caching functions that work with functions as arguments, but that otherwise the caching would not need to be changed (as far as I understand).

This proposal brings improvements to usability, while maintaining (or at least having the option to maintain) backwards compatibility and current caches. Of course, it would be a significant amount of work to implement and test everything. I am happy to help with this, but only if you, @pascal-sauer, @tscheypidi, are also motivated to refactor madrat! I understand that it might not be a priority at the moment, and/or that you might have different ideas for improvements (e.g. outsourcing caching), but still I thought it would be helpful to write down my thoughts for your reference.

Enable users to pass functions to readSource

7ff37be

Revert "Enable users to pass functions to readSource"

f89255b

This reverts commit 7ff37be.

SallyDa marked this pull request as draft November 6, 2024 08:47

Alternative readSource, takes functions as args

a7b939c

SallyDa force-pushed the readsources branch from 4e263c8 to a7b939c Compare November 7, 2024 14:26

pascal-sauer reviewed Nov 7, 2024

View reviewed changes

R/readSourceAlternative.R Outdated Show resolved Hide resolved

R/readSourceAlternative.R Outdated Show resolved Hide resolved

Apply first suggestions from code review

9d8880b

Co-authored-by: Pascal Sauer <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable users to pass functions to readSource #222

Enable users to pass functions to readSource #222

SallyDa commented Oct 18, 2024

tscheypidi commented Oct 18, 2024

JakobBD commented Oct 18, 2024

tscheypidi commented Oct 21, 2024

SallyDa commented Oct 22, 2024

pascal-sauer left a comment

SallyDa commented Nov 13, 2024

Enable users to pass functions to readSource #222

Are you sure you want to change the base?

Enable users to pass functions to readSource #222

Conversation

SallyDa commented Oct 18, 2024

tscheypidi commented Oct 18, 2024

JakobBD commented Oct 18, 2024

tscheypidi commented Oct 21, 2024

SallyDa commented Oct 22, 2024

pascal-sauer left a comment

Choose a reason for hiding this comment

SallyDa commented Nov 13, 2024