Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDEP-17: Backwards compatibility and deprecation policy #59125

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Aloqeely
Copy link
Member

cc @pandas-dev/pandas-core @pandas-dev/pandas-triage

An implementation of the PandasDeprecationWarning variable mentioned here can be seen in #58169

@Aloqeely Aloqeely requested a review from datapythonista as a code owner June 27, 2024 19:21
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this @Aloqeely . Interested to see how this conversation goes - this is an important topic!


## General policy

This policy applies to the public API. Anything not part of the public API or is marked as "Experimental" may be changed or removed at anytime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some random thoughts:

  1. We should state here what the public API is - I assume just anything in the pd. namespace. We've talked about that before but not sure we've ever stated officially
  2. We should probably also clarify what "experimental" means in this PDEP. A recent example has been the nullable integer types. Those came out in January 2019 as part of the 0.24 release but are still labeled experimental

For the sake of proposing something on point 2 I'd say we should drop the experimental label when something survives a full major release cycle

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume just anything in the pd. namespace.

I understand what you're going for, but core is in the pd. namespace. I think we'll have to use "documented as public", or explicitly list it out in this PDEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this page in the docs that defines the public API: https://pandas.pydata.org/docs/reference/index.html, going to hyperlink to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we should drop the experimental label when something survives a full major release cycle

Should we really enforce something like this? I feel like the duration depends on the type of functionality and I can see some cases where we'd want to keep something as experimental for a bit longer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would starting with 2 full release cycles in the proposal mitigate that concern? I don't have a strong point of view on what that duration is; I just think its important to have something that we all agree upon so things don't stay "experimental" forever

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds more reasonable but do we really mind if things stay experimental for a long period? I personally don't think we should have rules on anything experimental as that sort of defeats the point, but will leave it up for discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd leave some room for ambiguity here.

Maybe we should just say that we should review experimental features every major release to decide whether they should be promoted to a regular feature.

@Aloqeely Aloqeely mentioned this pull request Jun 28, 2024
@Aloqeely Aloqeely added the PDEP pandas enhancement proposal label Jun 29, 2024
Copy link
Contributor

github-actions bot commented Aug 7, 2024

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 7, 2024
@Aloqeely
Copy link
Member Author

Aloqeely commented Aug 9, 2024

@pandas-dev/pandas-core I'd like to start a vote soon but I'd really appreciate more input from the team before doing that.

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this - and once again sorry for the long silence.

This looks mostly good to me, just had some minor nits.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aloqeely thanks a lot of working on this!

For the abstract or motivation, I would reference our existing policy https://pandas.pydata.org/pandas-docs/version/2.2/development/policies.html and summarize the main addition (my current understanding is that we are mostly following what we had written before, but expanding it / clarifying some details about timelines and warning classes to use)

## Background

pandas uses a loose variant of semantic versioning.
A pandas release number is written in the format of ``MAJOR.MINOR.PATCH``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that MAJOR.MINOR.PATCH is the terminology typically used in semver, but how do people feel about using a terminology like major.feature.bugfix ?

That feels closer to how we also communicate about it (e.g. when announcing it, we didn't call pandas 2.2.0 a "minor" release)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely easier to understand, but don't you think it might be a bit misleading? Because we do release features even in bugfix (patch) releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging into past release notes, it seems like v2.2.1 is the only patch release with a new 'feature', so maybe that was just an odd case. I'm OK with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose that was an odd case (and it was also only a packaging feature, i.e. a new extra, not an actual code feature)

There might always be exceptions (when there is a good reason), but I think at least the general rule is that bug-fix / patch releases only contain bug fixes (and then even mostly regression fixes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth clarifying that we consider "features" from the perspective of a user and not a developer? The latter can happen at any time

Copy link
Member

@rhshadrach rhshadrach Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "bugfix" misleading. We typically do not backport mere bugfixes, only those that fix regressions.

@jorisvandenbossche
Copy link
Member

I think one of the main changes in this proposal compared to our current practice, is to more consistently use first a DeprecationWarning and then FutureWarning for higher visibility (this was one of the points that came out of the discussion with a downstream package author in #54970, and triggered doing this PDEP I think).
And to be explicit, I am +1 on that proposal.

@Aloqeely
Copy link
Member Author

I think one of the main changes in this proposal compared to our current practice, is to more consistently use first a DeprecationWarning and then FutureWarning for higher visibility (this was one of the points that came out of the discussion with a downstream package author in #54970, and triggered doing this PDEP I think).

Exactly.

@Aloqeely
Copy link
Member Author

Planning to start the vote in 15 days if there is no unresolved discussion by then.

@attack68
Copy link
Contributor

attack68 commented Nov 4, 2024

I just voted for this but wanted to add a comment here: As a developer I think it would be helpful to guide users what to do in these situations in docs, hyper linked from the console-message. My users may have any version of pandas installed (often corporates use quite old versions packaged internally) which means a solution must work across multiple versions. My code often contains version "ifs" and then different branches with "TODO"s as remnants to tidy it up once my software has migrated to a minimum versions of pandas.

Sometimes deprecations can be avoided using current code, other times it uses new implementations which need different types of solutions.

@Aloqeely
Copy link
Member Author

Aloqeely commented Nov 5, 2024

As a developer I think it would be helpful to guide users what to do in these situations in docs, hyper linked from the console-message.

It is usually the case that we guide users how to deal with a deprecation in the warning message if possible, as explicitly mentioned in the PDEP:

A deprecation's warning message should: - Mention how to achieve similar behavior if an alternative is available.

@simonjayhawkins
Copy link
Member

Thanks @Aloqeely for working on this.

As a reader of this document, I would ask, "What is the main purpose of PDEP-17?" The abstract states, "This PDEP defines pandas' backwards compatibility and deprecation policy." This seems somewhat unclear as the motivation behind the PDEP since we already have a Version policy. It becomes apparent from following the in-depth discussion that the actual motivation was #54970. This should probably be conveyed in the document for future readers, so they don't need to dig deep into the discussion.

On a related note, to avoid ambiguity, does this document wholly supersede our current policy? The abstract mentions "The main additions ... ," but there is also a removal from the current policy. Specifically, the warning should include the pandas version in which the deprecation will be enforced is being removed. This is my understanding from the discussion, but I cannot see this explicitly mentioned.

Am I correct in thinking that the published policy on the web will be the text in the General policy section only and in its entirety, and not any modifications to the current policy?

Regarding the Public API, a common practice in the past has been to revert changes that break another established package and discuss the issue. Other packages may use experimental features or the internal API, and we do our best to accommodate that, even if sometimes reluctantly.

All that said... +1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PDEP pandas enhancement proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants