Update CmdStan read and logic #1565
Conversation
I thought the pandas reader was written in C to be fast. Do you have some advice on which cases make having a custom reader worth it?
I will give you an example later today.
Run duration was approximately 40 minutes.
Codecov Report
@@ Coverage Diff @@
## main #1565 +/- ##
==========================================
- Coverage 90.28% 90.25% -0.04%
==========================================
Files 105 105
Lines 11405 11419 +14
==========================================
+ Hits 10297 10306 +9
- Misses 1108 1113 +5
Results (in seconds): [table contents not recoverable]
Difference against pandas (x - pandas) (in seconds): [table contents not recoverable]
I don't know, maybe go with the manual handling? All CSV files should be "good" (we don't need to consider ill-formed ones).
d6b93a5 to 568a296
@OriolAbril are you happy with the changes? The code should now be a bit clearer than before.
Looks good, thanks!
* rewrite cmdstan logic
* clean sample_stats
* fix
* Handle empty lines
* use numbers
* fix typo
* temporarily downgrade pylint
* downgrade astroid
* fix
* fix errors
* change dict kw order
* fix handling
* update test
* combine pandas and manual
* remove requirement restrictions
* use numpy gentext for fileloading
* remove pandas import
* fix typo
* change test
* update csv reader
* clean file
* add info to changelog
Description
Read CmdStan csv files manually. This enables us to parse large models (100k parameters) much faster than pandas.
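As a rough illustration of what reading such a file manually can look like, here is a minimal sketch. It is not the code from this PR. It assumes well-formed input in which comment lines start with `#`, the first non-comment line is the comma-separated header, and every remaining non-empty line is a row of numbers. The function name `read_cmdstan_csv` and the sample data are hypothetical.

```python
import io


def read_cmdstan_csv(handle):
    """Parse a well-formed CmdStan-style CSV from an open text handle.

    Hypothetical helper for illustration only.
    Returns (comments, columns, rows).
    """
    comments = []   # text of every line that starts with '#'
    columns = None  # header names, taken from the first non-comment line
    rows = []       # one list of floats per draw
    for line in handle:
        line = line.strip()
        if not line:
            # Empty lines carry no data, so skip them.
            continue
        if line.startswith("#"):
            comments.append(line.lstrip("#").strip())
            continue
        if columns is None:
            columns = line.split(",")
            continue
        rows.append([float(value) for value in line.split(",")])
    return comments, columns, rows


# Made-up sample input, not real CmdStan output.
example = io.StringIO(
    "# model = example_model\n"
    "lp__,accept_stat__,theta.1,theta.2\n"
    "# Adaptation terminated\n"
    "\n"
    "-7.5,0.98,0.25,1.5\n"
    "-6.25,0.91,0.5,1.25\n"
)
comments, columns, rows = read_cmdstan_csv(example)
```

Because the files are assumed to be well formed, the sketch does no quoting, escaping, or type inference, which is the kind of work a general-purpose CSV reader has to do for every column.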
This PR also adds a dtypes argument, which users can use to convert the dtypes of specific parameters.
Checklist