Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split cstwMPC into general tools and example code #449

Closed
sbenthall opened this issue Dec 9, 2019 · 22 comments · Fixed by #666
Closed

split cstwMPC into general tools and example code #449

sbenthall opened this issue Dec 9, 2019 · 22 comments · Fixed by #666
Assignees
Milestone

Comments

@sbenthall
Copy link
Contributor

#415 (comment)
From @llorracc

I agree that there are components to cstwMPC that could be made into more general purpose tools to be incorporated in the main toolkit. I'm working on some other things right now but let's get back to a discussion of this soon.

This issue is for this cstwMPC related task.

@sbenthall
Copy link
Contributor Author

The cstwMPC code is now in an awkward position.

They have been slated for removal for almost 10 months, but blocking on @llorracc making some kinds of changes that are not otherwise ticketed up.

The problem is that as this is now unsupported code with no automated tests, it may not work as the HARK library is updated.

This code should be moved into a private repository and worked into a reusable HARK contribution,

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

Actually, it should become a REMARK. Just haven't had time to figure out what bits might be worth importing into HARK as functions, and had a nagging feeling that some of the DemARKs or REMARKs might rely on certain things being in the cstwMPC directory structure. However, to the extent that that was true, it may have just been parameters and the data for the Survey of Consumer Finances; the former have now been reorganized, but I don't think we've designated a standard location for the SCF data (or any other data that we want to provide for general-purpose use). Probably we should do that, then change the [Dem]/[REM]ARKs that use the SCF data, then we can remove cstwMPC from main HARK.

@MridulS
Copy link
Member

MridulS commented Apr 8, 2020 via email

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

These data are likely to be used by future REMARKs and DemARKs and projects that people develop with the toolkit. They are kind of like the utility tools in HARK.utilities in that sense -- provide a standard benchmark. We should probably give some thought about how to store a small set of standardized datasets like this. Am sure there is lots of "best practice" out there to take advantage of.

@sbenthall
Copy link
Contributor Author

How about I put it as-is into a branch of the REMARK repository, then remove it from HARK.

Then it can be refined into a proper REMARK and committed to REMARK master.

Then later, when you get to it, you can figure out what parts of it can be contributed reusably into HARK.

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

That's fine -- except that before doing that we need to figure out how any existing REMARKs/DemARKs that use stuff from it should be fixed so they won't break when we move it.

@sbenthall
Copy link
Contributor Author

Well, the REMARKs should be pegged to a fixed version of HARK.
I believe the current procedure for the DemARKs is to make the change and then see if the tests fail.

@MridulS
Copy link
Member

MridulS commented Apr 8, 2020 via email

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

Eventually, yes, a "completed" remark should be pegged. The only REMARK about which we can say for sure that it is "completed" is the Portfolio blog post for TFI -- at least, it will be completed in the sense that we should have a frozen archival version. However, that gets us back to the other issue we've been round and round about: Versioning for REMARKs and DemARKs, because the Porftolio blog post uses the CGMPortfolio REMARK.

@sbenthall
Copy link
Contributor Author

We can create a datasets/ folder in HARK to store this data, this is used to many other projects to distribute data (scikit-learn comes to mind).

This is a great idea. +1 to it from me.

This is what it looks like in scikit-learn:
https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

Sounds good to me.

But not sure we want to standardize on csv -- that's a very limited data structure. SQL would be overkill. Does pandas have a standardized file structure? Maybe we should standardize on that.

@MridulS
Copy link
Member

MridulS commented Apr 8, 2020 via email

@sbenthall
Copy link
Contributor Author

If CSV is good enough for sci-kit learn, I think it's good enough for us.

@sbenthall
Copy link
Contributor Author

Regarding software versions and releases... I don't know how I can make my views on this any clearer, but I'll try again.

If every time there's a change to HARK, it's necessary to check every downstream use of it in some other library, that quickly makes it impossible to make any changes to HARK.

I think that if there's an issue with REMARK's dependency management and contribution process, we should discuss it and fix it as part of our determination of what REMARKs are.

But these processes need to be decoupled if HARK is going to go anywhere.

@llorracc
Copy link
Collaborator

llorracc commented Apr 8, 2020

Your views are clear, and I am convinced they are right. My point is basically the same as yours: We have not achieved a comparable degree of clarity or discipline about version control and management for REMARKs and DemARKs, and we need to do so.

@sbenthall
Copy link
Contributor Author

Ok, here is an issue for discussion the REMARK release and dependency locking process.

econ-ark/REMARK#56

@sbenthall
Copy link
Contributor Author

It looks like cstwMPC is already a REMARK, via a submodule link, to a private llorrocc repository:

https://github.com/llorracc/cstwMPC

Is the version in the HARK/ library different from this REMARK version significantly?

@sbenthall
Copy link
Contributor Author

had a nagging feeling that some of the DemARKs or REMARKs might rely on certain things being in the cstwMPC directory structure.

One possibility we haven't considered yet in this thread is for whatever DEMARKs (or REMARKs?) that depend on this get the data and code from the cstwMPC REMARK itself.

@sbenthall
Copy link
Contributor Author

The cstwMPC code in HARK is currently breaking, because it is not getting updated, because it is deprecated.

These are the offending lines:
https://github.com/econ-ark/HARK/blob/master/HARK/cstwMPC/cstwMPC.py#L15

This means some downstream DemARKS that use this code are broken as well.
See econ-ark/DemARK#115

I could make a PR to fix these before the 0.10.6 release, but it will take a little time. I have other obligations today.

We could also formally deprecated this code for now and delay the next release of the DemARKs until this issue with cstwMPC migration is resolved.

This latter option would be my preference.

  • The COVID paper takes priority
  • COVID needs a HARK release
  • The problem of the legacy cstwMPC dependency hasn't been solved for 10 months and it shouldn't block HARK releases
  • If DemARK progress is decoupled from HARK, then DemARK can stay pegged to HARK 0.10.5 until this is resolved.

@llorracc
Copy link
Collaborator

Agreed. But fixing the DemARKs is top priority after the release.

@sbenthall
Copy link
Contributor Author

Ok. Fixing the DemARKs will require either:

  • patching HARK.cstwMPC to work with the changed HARK libraries ( [WIP] updates to cstwMPC for 0.10.6 #626 ), or
  • removing cstwMPC from HARK and having the DemARKs get the necessary cstwMPC components in another way.

In my view, the latter is more preferable. Either way, it is going to depend on some critical input/action from Chris.

@llorracc
Copy link
Collaborator

I agree, the latter is preferable.

The chief thing that needs to be done, I think, is to define a place in HARK for the SCF dataset that these tools all use.

Many toolkits have a few 'sample' datasets that are distributed with the toolkit, and for which there are easy-to-use commands to obtain the data because the toolkit "knows" where the data should be. That's the preferred endpoint. We don't need to get all the way to that endpoint right now, but it might be useful to know what the endpoint is as we decide how to do this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants