Faster import 2 #171
Conversation
No significant time gain on my test dataset, but it might help on others. https://transport.data.gouv.fr/datasets/tisseo-offre-de-transport-gtfs
Looks like this function consumes 5 seconds out of 13 for my control dataset "toulouse". This memoization does not win back those 5 seconds, but could nevertheless help.
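For context, a minimal sketch of the kind of memoization meant here; the function name, its argument, and the time-parsing logic are illustrative assumptions, not the actual code from this PR:

```js
// Hypothetical sketch: cache the result of an expensive per-row computation
// so repeated inputs (common in GTFS stop_times) are only computed once.
// `calculateSecondsFromMidnight` is an illustrative name, not from the patch.
const cache = new Map();

function calculateSecondsFromMidnight(time) {
  if (cache.has(time)) {
    return cache.get(time);
  }
  // GTFS times like "25:30:00" are allowed to exceed 24 hours.
  const [h, m, s] = time.split(':').map(Number);
  const result = h * 3600 + m * 60 + s;
  cache.set(time, result);
  return result;
}
```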
Step result: for a rather large dataset (Bretagne), we're down to 40 seconds of processing compared to 1 minute 04 seconds on the master branch. The last optimisation attempt, db.transaction(), does not appear to speed things up, but is cleaner IMHO. Note: some tests are failing, one about shapes (I didn't look into it) and one, more worrisome, about get-stops.
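A sketch of what grouping inserts in a transaction looks like, assuming the better-sqlite3 API (whose `db.transaction()` matches the call mentioned above); the table and column names are illustrative:

```js
import Database from 'better-sqlite3';

const db = new Database('gtfs.db');
db.exec(
  'CREATE TABLE IF NOT EXISTS stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER)'
);

const insert = db.prepare(
  'INSERT INTO stop_times (trip_id, stop_id, stop_sequence) VALUES (?, ?, ?)'
);

// db.transaction() wraps the loop so all rows commit at once, instead of
// SQLite opening and fsyncing one implicit transaction per INSERT.
const insertMany = db.transaction((rows) => {
  for (const row of rows) {
    insert.run(row.trip_id, row.stop_id, row.stop_sequence);
  }
});

const parsedRows = [
  { trip_id: 't1', stop_id: 's1', stop_sequence: 1 },
  { trip_id: 't1', stop_id: 's2', stop_sequence: 2 },
];
insertMany(parsedRows);
```

Even when it doesn't change wall-clock time on an already-batched import, the transaction wrapper makes the commit boundary explicit, which is the cleanliness argument above.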
@brendannee I can't figure out why this test fails. Could you look into it?
Result for my whole GTFS collection so far (one part of France, let's say a third).
Final DB size: 9.4 GB / 11 GB.
Wait @brendannee, some tests did not pass 😅
Thanks so much for this big improvement. I released a new version: https://github.com/BlinkTagInc/node-gtfs/releases/tag/4.15.0 (with a few other reorganizations). Let me know what you think and if you have other ideas for improvements.
I fixed the issue with the tests that did not pass, so it should be good to go.
Closes #170 to drop the first technique of using the CLI sqlite3 .import option, as it resulted in check errors and the I/O to write the CSV was too slow.
In this PR, we keep the in-JS sqlite3 import along with the other optimizations developed in the first PR.
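A sketch of the retained in-JS import path, assuming better-sqlite3 and the csv-parse package; the file, table, and column names are illustrative, not the actual node-gtfs code:

```js
import { createReadStream } from 'node:fs';
import { parse } from 'csv-parse';
import Database from 'better-sqlite3';

const db = new Database('gtfs.db');
db.exec(
  'CREATE TABLE IF NOT EXISTS stops (stop_id TEXT, stop_name TEXT, stop_lat REAL, stop_lon REAL)'
);

const insert = db.prepare(
  'INSERT INTO stops (stop_id, stop_name, stop_lat, stop_lon) VALUES (?, ?, ?, ?)'
);
// better-sqlite3 transaction callbacks must be synchronous, so rows are
// collected first and inserted in one batch afterwards.
const insertAll = db.transaction((rows) => {
  for (const r of rows) {
    insert.run(r.stop_id, r.stop_name, r.stop_lat, r.stop_lon);
  }
});

// Stream-parse stops.txt and insert the rows directly, skipping the
// intermediate CSV write and shell-out that the dropped `.import`
// approach required.
const rows = [];
const parser = createReadStream('stops.txt').pipe(parse({ columns: true }));
for await (const record of parser) {
  rows.push(record);
}
insertAll(rows);
```

Keeping the whole pipeline in-process avoids the disk round-trip for the temporary CSV and lets the importer validate or transform rows before they reach SQLite, which the CLI `.import` path could not do.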