-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor getDbCohortMethodData() #134
Comments
I've added the following functions in the DataLoadingSaving.R:
The way Instead of all that, I've opted to setup an environment at the start of |
I must admit I'm not sure this is really an improvement. Functions are normally a good way to reduce complexity because the only way a function can interact with your environment is through its arguments and return value. For example, I think this still works really well with your But it breaks down for example in the Instead, the
and return a list with two items:
|
I agree, an object like
Cool! I will update the functionality to use
I have actually done exactly that, but never pushed that variant. Mostly for consistency's sake as to how the sub-functions are implemented. I'll change |
I ran in to the issue where If I specify a dummy version of
ignoring the default parameters, which is similar to this issue. In this issue the following is the accepted answer:
Which gets all the variables in the current environment (nth frame of sys.frame) through Which will result into a named list of all parameters with either the passed, or default values, which can then be passed to So my question is, what do you prefer:
|
I think I actually prefer option 4:
Let me try to explain why:
The only thing I don't like is that we have code above the parameter check block. This should ideally not happen. I think we can achieve that by no longer allowing |
I've pushed the following updates:
to:
|
Sounds good! Could you create a PR? That makes it easier to view all the changes. |
Hi @schuemie as per this comment:
I implemented a version of I split the functionality in two classes:
Both classes are called within the new I also benchmarked the new implementation and compared it with the old one to see if the speed and memory usage are at least similar with BenchmarkDummy Setup
MicroBenchmark
There seems to be no significant speed difference in either implementation. The maximum and minimum time it took to execute are however slightly shorter.
Profvis
Memory usage seems to be nearly identical as well.
|
Thanks! Could you help me understand your overall design here? You have
The Any reason for having the two R6 objects? Code like this makes me wonder if that is adding anything. (The new solution also seems to have more than doubled the lines of code, making it hard to argue it has simplified things ;-) ) |
Yes, so my goal for this implementation was to remain compatible with the current implementation of CohortMethod. This means that the input and output of the implementation should remain the same, so functionality that depends on it doesn't fail.
Initially I created one massive R6 class. While doing that I then realized that half the parameters:
Are exclusively used for interfacing with the database, thus splitting the class in two. I do agree that this implementation is redundant. I initially only wanted to call methods from As to simplifying things; I guess it depends how you define simplified. I agree that the code is scattered around more across the two classes, however the trade off is that the cyclomatic complexity of each function drastically decreases. Current implementation
R6 implementationCohortMethodData
CohortDbInterface
|
The
getDbCohortMethodData()
is quite long and hard to read. It could use some refactoring. Some ideas:Separate code for fetching all the data, and constructing the attrition table (e.g. using a private function call).
This code looks like it could be written to be shorter and clearer. All it is trying to do is make sure counts are 0 when a cohort is empty.
In case the study calendar time is restricted, make sure there's a top row in the attrition object that just counts the cohorts prior to any restriction (in the current code the top row already restricts to calendar time)
The text was updated successfully, but these errors were encountered: