Add functionality to ignore selfloops #49
This PR adds the ability to ignore self-loops in path data to the `Paths.add_path` instance function and the `Paths.read_file` class function. It also removes some redundant code from `paths.py` by changing `read_file` to use `add_path` with the `expand_subpaths` option.

I implemented this because I have a dataset that includes (apparently) meaningless self-loops that I want to be able to remove or include in my pipeline without preprocessing the data manually. The two main changes are:
- Adds a `remove_selfloops` option to `add_path` that collapses consecutively repeated symbols into a single symbol. For example, the sequence ('a', 'a', 'b', 'b', 'c') will collapse to just ('a', 'b', 'c'). Because `add_path` already loops through the elements of the path to ensure the separator character is safe to use, I only add comparisons and list appends to that loop, so the change should not meaningfully impact computational complexity.
- Changes `Paths.read_file` to call `cls.add_path` rather than reproducing the same functionality. Moves the `expand_subpaths` functionality to `add_path` rather than calling it on the whole object at the end.

In working with this code, I couldn't understand why `Paths.read_edges` is a static method while `Paths.read_file` is a class method. I think it makes sense for them to be consistent, and I think they should both be static, since they both populate and return a new `Paths` object. I can change the decorator in this PR if desired.

Let me know if you have questions/comments/changes!
Edit: Meant to include a note that this is completely separate from my other open PR (#47).
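
For reviewers who want a quick picture of the `remove_selfloops` behavior described above, here is a minimal standalone sketch of the collapsing step. It is not the code from the diff: the helper name `collapse_selfloops` is hypothetical, and in the PR the same comparison and append happen inside the existing loop in `add_path` rather than in a separate function.

```python
def collapse_selfloops(path):
    """Collapse runs of consecutively repeated symbols into one symbol.

    Hypothetical helper for illustration only; it is not part of the
    Paths API.
    """
    collapsed = []
    for symbol in path:
        # Keep a symbol only if it differs from the last one kept,
        # so each run of repeats contributes exactly one element.
        if not collapsed or symbol != collapsed[-1]:
            collapsed.append(symbol)
    return tuple(collapsed)


print(collapse_selfloops(('a', 'a', 'b', 'b', 'c')))  # ('a', 'b', 'c')
```

Only consecutive repeats are collapsed, so a path such as ('a', 'b', 'a') is left unchanged. The extra work is one comparison and at most one append per element, which is why the single pass stays linear in the path length.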