Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(disordered_tracing): Work with skan-0.12.2 #1052

Merged
merged 3 commits into from
Dec 16, 2024

Conversation

ns-rse
Copy link
Collaborator

@ns-rse ns-rse commented Dec 16, 2024

Closes #986

With the release of skan-0.12.2 we can remove the code which converts - in the variable names of Pandas DataFrames to instead use _ which allows the use of "dot-notation" when referring to the variables of data frames.

We therefore unpin the skan version in pyproject.toml, explicitly state the separator="_" argument to skan.summarize() and adapt the hard-coded variable names appropriately. As noted in the release notes the default will be _ from v0.13.0 and the flag allows backwards compatibility but since we have gone with the use of _ in variable names it will do no harm leaving these in place (although they could be removed when skan-v0.13.0 is released).

In addressing #986 I tested in a fresh virtual environment. This pulled in matplotlib-3.10.0 which was released 2024-12-14 (two days ago!). Our dependency on topoly doesn't work with this new version because of the removal of inset_location.InsetPosition in favour of inset_axes.

This has been reported to the Topoly developers, see ImportError: cannot import name 'InsetPosition' from
'mpl_toolkits.axes_grid1.inset_locator' · Issue #5 · ilbsm/topoly_tutorial
and hopefully won't take long to resolve. For now though we pin our dependency matplotlib~=3.9.4 and all tests pass.


Before submitting a Pull Request please check the following.

  • Existing tests pass.
  • Pre-commit checks pass.

Closes #986

With the release of `skan-0.12.2` we can remove the code which converts `-` in the variable names of Pandas DataFrames
to instead use `_` which allows the use of "dot-notation" when referring to the variables of data frames.

We therefore unpin the `skan` version in `pyproject.toml`, explicitly state the `separator="_"` argument to
`skan.summarize()` and adapt the hard-coded variable names appropriately. As noted in the [release
notes](https://github.com/jni/skan/releases/tag/v0.12.2) the default will be `_` from `v0.13.0` and the flag allows
backwards compatibility but since we have gone with the use of `_` in variable names it will do no harm leaving these in
place (although they could be removed when `skan-v0.13.0` is released).
In addressing #986 I tested in a fresh virtual environment. This pulled in
[matplotlib-3.10.0](https://github.com/matplotlib/matplotlib/releases/tag/v3.10.0) which was released 2024-12-14 (two
days ago!). Our dependency on [topoly](topoly.cent.uw.edu.pl/documentation.html) doesn't work with this new version
because of the removal of `inset_location.InsetPosition` in favour of `inset_axes`.

This has been reported to the Topoly developers, see [ImportError: cannot import name 'InsetPosition' from
'mpl_toolkits.axes_grid1.inset_locator' · Issue #5 ·
ilbsm/topoly_tutorial](ilbsm/topoly_tutorial#5) and hopefully won't take long to resolve. For
now though we pin our dependency `matplotlib~=3.9.4` and all tests pass.
@ns-rse ns-rse added the v2.3.0 label Dec 16, 2024
@ns-rse ns-rse added this to the v2.3.0 milestone Dec 16, 2024
@ns-rse ns-rse added DNATracing Issues pertaining to the DNATracing class v2.3.0 and removed v2.3.0 labels Dec 16, 2024
@MaxGamill-Sheffield
Copy link
Collaborator

Hey Neil, thanks for following up on this issue with the skan and topoly people!

I've not had time to test but does this then change any of the disordered_stats column names?
This might be an issue for the paper as we have a bunch of analysis scripts relying on certain names and it might be more of an overhead to change this in the v2.3.0 release along with all the analysis scripts than to cite the version once released.

@ns-rse
Copy link
Collaborator Author

ns-rse commented Dec 16, 2024

I've not had time to test but does this then change any of the disordered_stats column names?

Too Long; Did not Read

No, the PR doesn't include any changes to test files where changes in column names would have an impact, ergo nothing in the scripts needs changing because none of the column names have actually changed.

Long; Did Read

Skan originally used - in column names to the dataframe that skan.summarize() produces.

This means that you can't use the notation df.<colname> when referring to columns that are produced (its a useful convenience to be able to do so).

In discussion with the skan developer it was agreed to switch this out and I made a PR to skan to use _ and this was incorporated into release skan-0.12.0, but there were some issues with data types so we had to pin skan~=0.11.0. I helped upstream fix these and they are in skan-0.12.2 release that came out a few days ago.

When disordered_tracing() was being refactored and the results of skan.summarize() were used I mentioned that there was this pending change coming and that we should use _ rather than - in column names.

This PR removes the line that made that change to the returned data frame (see line itself and associated comment) and also explicitly calls skan.summarize() with the separator="_" so that the returned data frame uses _ in the column names and we don't need to manually make the switch. A few other changes were required where comparisons to keys or arguments are hard coded.

Thus the output doesn't actually change at all, which is why all the tests pass without any need to modify any of the test files.

It also threw up the mentioned problem with Matplotlib 3.10.0 causing problems with Topoly which was pure chance but I've addressed it as otherwise the tests in Continuous Integration would pull in the new version and throw errors.

Copy link
Collaborator

@MaxGamill-Sheffield MaxGamill-Sheffield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ace! In that case this looks clean and good to go, thanks Neil!

@ns-rse ns-rse added this pull request to the merge queue Dec 16, 2024
Merged via the queue into main with commit 5ec56ec Dec 16, 2024
11 checks passed
@ns-rse ns-rse deleted the ns-rse/986-update-skan-separator branch December 16, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DNATracing Issues pertaining to the DNATracing class v2.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use new skan.summarize(separator="_")
4 participants