Add functionality to ignore selfloops #49

tlarock · 2019-05-06T17:44:21Z

This PR adds the ability to ignore self-loops in path data to the Paths.add_path instance method and the Paths.read_file class method. It also removes some redundant code from paths.py by changing read_file to use add_path with the expand_subpaths option.

I implemented this because I have a dataset that includes (apparently) meaningless self-loops that I want to be able to remove/include in my pipeline without preprocessing the data manually. The two main changes are:

- Add a remove_selfloops option to add_path that collapses consecutively repeated symbols into a single symbol. For example, the sequence ('a', 'a', 'b', 'b', 'c') collapses to ('a', 'b', 'c').
  - Note: a sanity check in add_path already loops through the elements of the path to ensure the separator character is safe to use, so I only add comparisons and list appends to that loop. The change should not meaningfully impact computational complexity.
- Change Paths.read_file to call cls.add_path rather than reproducing the same functionality. This moves the expand_subpaths functionality into add_path rather than calling it on the whole object at the end.
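
To illustrate the collapsing behaviour described in the first change, here is a minimal standalone sketch. It is not the code from this PR: the function name `collapse_selfloops` and the default `separator` value are hypothetical, and the separator check only stands in for the existing sanity check in add_path.

```python
def collapse_selfloops(path, separator=","):
    """Collapse consecutive repeats in `path` while checking symbols in one pass.

    Hypothetical helper for illustration only; not part of the library's API.
    """
    collapsed = []
    for symbol in path:
        # Stand-in for the existing sanity check: a symbol must not
        # contain the separator character.
        if separator in symbol:
            raise ValueError(f"symbol {symbol!r} contains separator {separator!r}")
        # Keep the symbol only if it differs from the last one kept.
        if not collapsed or collapsed[-1] != symbol:
            collapsed.append(symbol)
    return tuple(collapsed)


print(collapse_selfloops(("a", "a", "b", "b", "c")))  # ('a', 'b', 'c')
```

Only consecutive repeats are collapsed, so a path such as ('a', 'b', 'a') is left unchanged. The extra work per element is one comparison and at most one append, which matches the complexity argument in the note.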

While working with this code, I couldn't understand why Paths.read_edges is a static method while Paths.read_file is a class method. I think it makes sense for them to be consistent, and I think they should both be static, since they both populate and return a new Paths object. I can change the decorator in this PR if desired.
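
For reference when deciding on the decorator, the sketch below shows the one practical difference between the two in plain Python. The class `PathsSketch`, its subclass, and both reader names are hypothetical and do not reflect the library's actual implementation.

```python
class PathsSketch:
    """Hypothetical stand-in for a container of paths."""

    def __init__(self):
        self.paths = []

    @staticmethod
    def read_static(lines):
        # A static method has no access to the calling class, so it
        # always builds the class that is named explicitly here.
        obj = PathsSketch()
        obj.paths.extend(tuple(line.split(",")) for line in lines)
        return obj

    @classmethod
    def read_class(cls, lines):
        # A class method receives the calling class, so a subclass
        # gets an instance of itself back.
        obj = cls()
        obj.paths.extend(tuple(line.split(",")) for line in lines)
        return obj


class SubPaths(PathsSketch):
    pass


print(type(SubPaths.read_static(["a,b"])).__name__)  # PathsSketch
print(type(SubPaths.read_class(["a,b"])).__name__)   # SubPaths
```

If the class is never subclassed, the two behave the same for callers, so either choice works as long as both readers use the same one.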

Let me know if you have questions/comments/changes!

Edit: Meant to include a note that this is completely separate from my other open PR (#47).

tlarock added 6 commits May 6, 2019 11:44

Updated Paths.add_path() to allow removal of selfloops from input paths. Updated Paths.read_file to use add_path.

48eaa13

Added option to read_file for remove_selfloops.

3916457

Changed read_file with frequency to also use add_path.

f5384e9

Removed duplicate line left in by accident.

8297340

Minor typo fix in docstring.

4d1d75e

Merge branch 'master' into remove_selfloops

8a42554
