Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/replace csv parser by papa parse #14

Merged
merged 11 commits into from
May 15, 2019

Conversation

lvancraen
Copy link
Contributor

Feature

gtfsNodeLib nows uses Papa Parse to parse through the CSV instead of the internal CSV parser.

@lvancraen lvancraen requested review from gcamp and iliasbhal May 2, 2019 14:04
@lvancraen
Copy link
Contributor Author

lvancraen commented May 2, 2019

Processing the CSV as is currently done is taking considerably more time than with the old method. I'm going to see what I can do to improve the efficiency of the parsing.

I would wait to review this until I have found a better solution.

@lvancraen lvancraen changed the title Feat/replace csv parser by papa parse (WAIT FOR REVIEW) Feat/replace csv parser by papa parse May 2, 2019
@lvancraen lvancraen changed the title (WAIT FOR REVIEW) Feat/replace csv parser by papa parse Feat/replace csv parser by papa parse May 2, 2019
@lvancraen
Copy link
Contributor Author

lvancraen commented May 2, 2019

I have updated the code to run MUCH MUCH faster. Instead of processing each row individually within Papa parse, I parse the entire CSV then process each row. This increased the speed by a factor of 10. I tested on WINDON, and previously it was taking ~550secs to run CLE. Now, it only takes ~45secs.

You can now review 🙏

Copy link
Member

@gcamp gcamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively small thing but necessary. Everything else looks good! 🙌

}

row = row.trim();
const trimmedRow = row.map(value => value.trim());
const gtfsRow = new GtfsRow(trimmedRow);
Copy link
Member

@gcamp gcamp May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're recreating the GtfsRow for each row, which defeats the memory saving purpose (creating the class n time + object n time, instead of class 1 time + object n time). The keys will be copied the same way than before the optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I will update this

@lvancraen lvancraen merged commit f576813 into master May 15, 2019
@lvancraen lvancraen deleted the feat/replace_csv_parser_by_papa_parse branch May 15, 2019 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants