Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalizing point base features, a few other fixes #318

Merged
merged 11 commits into from
Aug 20, 2023
Merged

Conversation

bpben
Copy link
Collaborator

@bpben bpben commented Dec 30, 2021

Main thing trying to do here - enable a config to specify that there are multiple point-based features in one file (rather than repeating the information again for each feature).

There are also some documentation updates here, part of ongoing process

Also - noticed a major issue with how crashes were being joined to segments related to segment_ids being mixed types. Should now be fixed.

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2021

Codecov Report

Merging #318 (ca49f8f) into master (bf9267e) will increase coverage by 0.40%.
The diff coverage is 88.04%.

@@            Coverage Diff             @@
##           master     #318      +/-   ##
==========================================
+ Coverage   64.02%   64.42%   +0.40%     
==========================================
  Files          33       33              
  Lines        3263     3314      +51     
==========================================
+ Hits         2089     2135      +46     
- Misses       1174     1179       +5     

Copy link
Collaborator

@j-t-t j-t-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boston, philly, dc and chicago configs use point based features. Do these configs need to be updated, or is the formatting still the same?

Did you use this to add any features to any city?

Can you add a test for create_segment that shows adding multiple features in one file?

Also looks like something's going on with the pinned mac version

src/data/config.py Outdated Show resolved Hide resolved
},
'feat_agg': 'latest',
'value': 87679
'feature': 'aggtest',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation here also looks weird

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, this one seems fine to me? Indent within brackets?

@bpben
Copy link
Collaborator Author

bpben commented Jan 2, 2022

Thanks for the comments, needed actually do some more edits, handling categorical variables a bit differently, requiring less preprocessing. Added docs.

Also - this should still work fine for old configs, it will just process them differently if they have multiple features per additional source. README has deets.

@bpben
Copy link
Collaborator Author

bpben commented Jan 2, 2022

@j-t-t also, do you think the test in create_segments is necessary? I have the test for multi feature in standardize_points:

def test_read_file_info_multi(tmpdir):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants