API: concat on sparse values #25719

TomAugspurger · 2019-03-13T23:00:37Z

API breaking change to concat(List[DataFrame[Sparse]]) to
return a DataFrame with sparse values, rather than a SparseDataFrame.

Doing an outright break, rather than deprecation, because I have a
followup PR deprecating SparseDataFrame. We hit this internally in
a few places (e.g. get_dummies on all-sparse data).

Closes #25702

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

API breaking change to `concat(List[DataFrame[Sparse]])` to return a DataFrame with sparse values, rather than a SparseDataFrame. Doing an outright break, rather than deprecation, because I have a followup PR deprecating SparseDataFrame. We return this internally in a few places (e.g. get_dummies on all-sparse data). Closes pandas-dev#25702

codecov · 2019-03-13T23:46:44Z

Codecov Report

Merging #25719 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25719      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52967    52970       +3     
==========================================
+ Hits        48332    48334       +2     
- Misses       4635     4636       +1

Flag	Coverage Δ
#multiple	`89.82% <100%> (ø)`	⬆️
#single	`41.75% <33.33%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/concat.py	`96.58% <ø> (ø)`	⬆️
pandas/core/groupby/generic.py	`87.05% <100%> (+0.05%)`	⬆️
pandas/util/testing.py	`88.98% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1017382...96508ca. Read the comment docs.

codecov · 2019-03-13T23:46:46Z

Codecov Report

Merging #25719 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25719      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52967    52970       +3     
==========================================
+ Hits        48332    48334       +2     
- Misses       4635     4636       +1

Flag	Coverage Δ
#multiple	`89.82% <100%> (ø)`	⬆️
#single	`41.75% <33.33%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/concat.py	`96.58% <ø> (ø)`	⬆️
pandas/core/groupby/generic.py	`87.05% <100%> (+0.05%)`	⬆️
pandas/util/testing.py	`88.98% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1017382...96508ca. Read the comment docs.

jreback

tiny change, otherwise lgtm.

jreback · 2019-03-14T21:20:49Z

doc/source/whatsnew/v0.25.0.rst

@@ -64,6 +64,42 @@ is respected in indexing. (:issue:`24076`, :issue:`16785`)
    df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
    df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']

+Concatenating Sparse Values


I have seen you make this review comment before, but IMO, there is not really much added value in asking for this. It doesn't change anything in the built documentation, and is only needed if you make a link to it from somewhere else in our docs (and in this case, I think it is rather unlikely that we will ever link to it).

disagree. as a matter of course if we have subsections we should have links. its not much of a burden and promotes consistency.

It's just that you are asking something (and we are often already asking a lot of contributors in many rounds) that is basically useless, except for the consistency argument that you give (IMO that's not worth the overhead of always adding it, but OK, we can disagree on that).

as a matter of course if we have subsections we should have links

Note that this does not introduce links. Things like anchors so that people can link to that section on the html docs, that is done automatically by sphinx, you don't need this label for that.

again disagree. consistency is SO important in the pandas docs / code. If it can be automated (either the actual code / docs or the check of it great), otherwise the purpose is NOT to special case the world. Again having to make that judgement call on each PR would make a big inconsistency across PR's & maintainers.

having to make that judgement call on each PR would make a big inconsistency across PR's & maintainers.

The thing is that you don't need to make a judgement call here: it simply is not needed for new sections being added. Only when you ask to add somewhere an internal link, you need to add a label.

you are still missing the point

pls add the ref here - arguing about reviews is a waste of time

TomAugspurger · 2019-03-15T20:53:04Z

Changing the convention might save time in the long run, if we avoid unnecessary CI runs, abandoned PRs, etc.

…

On Thu, Mar 14, 2019 at 6:11 PM Jeff Reback ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In doc/source/whatsnew/v0.25.0.rst <#25719 (comment)>: > @@ -64,6 +64,42 @@ is respected in indexing. (:issue:`24076`, :issue:`16785`) df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific')) df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00'] +Concatenating Sparse Values you are still missing the point pls add the ref here - arguing about reviews is a waste of time — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#25719 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHImm40yVgqRq7eV83ucR_5OytS_6dks5vWtcJgaJpZM4bxfDE> .

jreback · 2019-03-19T23:40:34Z

thanks @TomAugspurger

* origin/master: DOC: clean bug fix section in whatsnew (pandas-dev#25792) DOC: Fixed PeriodArray api ref (pandas-dev#25526) Move locale code out of tm, into _config (pandas-dev#25757) Unpin pycodestyle (pandas-dev#25789) Add test for rdivmod on EA array (GH23287) (pandas-dev#24047) ENH: Support datetime.timezone objects (pandas-dev#25065) Cython language level 3 (pandas-dev#24538) API: concat on sparse values (pandas-dev#25719) TST: assert_produces_warning works with filterwarnings (pandas-dev#25721) make core.config self-contained (pandas-dev#25613) CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721) TST: Check pytables<3.5.1 when skipping (pandas-dev#25773) DOC: Fix typo in docstring of DataFrame.memory_usage (pandas-dev#25770) Replace dicts with OrderedDicts in groupby aggregation functions (pandas-dev#25693) TST: Fixturize tests/frame/test_missing.py (pandas-dev#25640) DOC: Improve the docsting of Series.iteritems (pandas-dev#24879) DOC: Fix function name. (pandas-dev#25751) Implementing iso_week_year support for to_datetime (pandas-dev#25541) DOC: clarify corr behaviour when using a callable (pandas-dev#25732) remove unnecessary check_output (pandas-dev#25755) # Conflicts: # doc/source/whatsnew/v0.25.0.rst

* upstream/master: (55 commits) PERF: Improve performance of StataReader (pandas-dev#25780) Speed up tokenizing of a row in csv and xstrtod parsing (pandas-dev#25784) BUG: Fix _binop for operators for serials which has more than one returns (divmod/rdivmod). (pandas-dev#25588) BUG-24971 copying blocks also considers ndim (pandas-dev#25521) CLN: Panel reference from documentation (pandas-dev#25649) ENH: Quoting column names containing spaces with backticks to use them in query and eval. (pandas-dev#24955) BUG: reading windows utf8 filenames in py3.6 (pandas-dev#25769) DOC: clean bug fix section in whatsnew (pandas-dev#25792) DOC: Fixed PeriodArray api ref (pandas-dev#25526) Move locale code out of tm, into _config (pandas-dev#25757) Unpin pycodestyle (pandas-dev#25789) Add test for rdivmod on EA array (GH23287) (pandas-dev#24047) ENH: Support datetime.timezone objects (pandas-dev#25065) Cython language level 3 (pandas-dev#24538) API: concat on sparse values (pandas-dev#25719) TST: assert_produces_warning works with filterwarnings (pandas-dev#25721) make core.config self-contained (pandas-dev#25613) CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721) TST: Check pytables<3.5.1 when skipping (pandas-dev#25773) DOC: Fix typo in docstring of DataFrame.memory_usage (pandas-dev#25770) ...

TomAugspurger added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Sparse Sparse Data Type labels Mar 13, 2019

TomAugspurger added this to the 0.25.0 milestone Mar 13, 2019

jorisvandenbossche approved these changes Mar 14, 2019

View reviewed changes

jreback approved these changes Mar 14, 2019

View reviewed changes

jreback merged commit 8311214 into pandas-dev:master Mar 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: concat on sparse values #25719

API: concat on sparse values #25719

TomAugspurger commented Mar 13, 2019

codecov bot commented Mar 13, 2019

codecov bot commented Mar 13, 2019 •

edited

Loading

jreback left a comment

jreback Mar 14, 2019

jorisvandenbossche Mar 14, 2019

jreback Mar 14, 2019

jorisvandenbossche Mar 14, 2019

jreback Mar 14, 2019

jorisvandenbossche Mar 14, 2019

jreback Mar 14, 2019

TomAugspurger commented Mar 15, 2019 via email

jreback commented Mar 19, 2019

API: concat on sparse values #25719

API: concat on sparse values #25719

Conversation

TomAugspurger commented Mar 13, 2019

codecov bot commented Mar 13, 2019

Codecov Report

codecov bot commented Mar 13, 2019 • edited Loading

Codecov Report

jreback left a comment

Choose a reason for hiding this comment

jreback Mar 14, 2019

Choose a reason for hiding this comment

jorisvandenbossche Mar 14, 2019

Choose a reason for hiding this comment

jreback Mar 14, 2019

Choose a reason for hiding this comment

jorisvandenbossche Mar 14, 2019

Choose a reason for hiding this comment

jreback Mar 14, 2019

Choose a reason for hiding this comment

jorisvandenbossche Mar 14, 2019

Choose a reason for hiding this comment

jreback Mar 14, 2019

Choose a reason for hiding this comment

TomAugspurger commented Mar 15, 2019 via email

jreback commented Mar 19, 2019

codecov bot commented Mar 13, 2019 •

edited

Loading