Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: concat on sparse values #25719

Merged
merged 1 commit into from
Mar 19, 2019

Conversation

TomAugspurger
Copy link
Contributor

API breaking change to concat(List[DataFrame[Sparse]]) to
return a DataFrame with sparse values, rather than a SparseDataFrame.

Doing an outright break, rather than deprecation, because I have a
followup PR deprecating SparseDataFrame. We hit this internally in
a few places (e.g. get_dummies on all-sparse data).

Closes #25702

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

API breaking change to `concat(List[DataFrame[Sparse]])` to
return a DataFrame with sparse values, rather than a SparseDataFrame.

Doing an outright break, rather than deprecation, because I have a
followup PR deprecating SparseDataFrame. We return this internally in
a few places (e.g. get_dummies on all-sparse data).

Closes pandas-dev#25702
@TomAugspurger TomAugspurger added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Sparse Sparse Data Type labels Mar 13, 2019
@TomAugspurger TomAugspurger added this to the 0.25.0 milestone Mar 13, 2019
@codecov
Copy link

codecov bot commented Mar 13, 2019

Codecov Report

Merging #25719 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25719      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52967    52970       +3     
==========================================
+ Hits        48332    48334       +2     
- Misses       4635     4636       +1
Flag Coverage Δ
#multiple 89.82% <100%> (ø) ⬆️
#single 41.75% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/concat.py 96.58% <ø> (ø) ⬆️
pandas/core/groupby/generic.py 87.05% <100%> (+0.05%) ⬆️
pandas/util/testing.py 88.98% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1017382...96508ca. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Mar 13, 2019

Codecov Report

Merging #25719 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25719      +/-   ##
==========================================
- Coverage   91.24%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52967    52970       +3     
==========================================
+ Hits        48332    48334       +2     
- Misses       4635     4636       +1
Flag Coverage Δ
#multiple 89.82% <100%> (ø) ⬆️
#single 41.75% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/concat.py 96.58% <ø> (ø) ⬆️
pandas/core/groupby/generic.py 87.05% <100%> (+0.05%) ⬆️
pandas/util/testing.py 88.98% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1017382...96508ca. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny change, otherwise lgtm.

@@ -64,6 +64,42 @@ is respected in indexing. (:issue:`24076`, :issue:`16785`)
df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']

Concatenating Sparse Values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ref

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen you make this review comment before, but IMO, there is not really much added value in asking for this. It doesn't change anything in the built documentation, and is only needed if you make a link to it from somewhere else in our docs (and in this case, I think it is rather unlikely that we will ever link to it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disagree. as a matter of course if we have subsections we should have links. its not much of a burden and promotes consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that you are asking something (and we are often already asking a lot of contributors in many rounds) that is basically useless, except for the consistency argument that you give (IMO that's not worth the overhead of always adding it, but OK, we can disagree on that).

as a matter of course if we have subsections we should have links

Note that this does not introduce links. Things like anchors so that people can link to that section on the html docs, that is done automatically by sphinx, you don't need this label for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again disagree. consistency is SO important in the pandas docs / code. If it can be automated (either the actual code / docs or the check of it great), otherwise the purpose is NOT to special case the world. Again having to make that judgement call on each PR would make a big inconsistency across PR's & maintainers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having to make that judgement call on each PR would make a big inconsistency across PR's & maintainers.

The thing is that you don't need to make a judgement call here: it simply is not needed for new sections being added. Only when you ask to add somewhere an internal link, you need to add a label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are still missing the point

pls add the ref here - arguing about reviews is a waste of time

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Mar 15, 2019 via email

@jreback jreback merged commit 8311214 into pandas-dev:master Mar 19, 2019
@jreback
Copy link
Contributor

jreback commented Mar 19, 2019

thanks @TomAugspurger

sighingnow added a commit to sighingnow/pandas that referenced this pull request Mar 20, 2019
* origin/master:
  DOC: clean bug fix section in whatsnew (pandas-dev#25792)
  DOC: Fixed PeriodArray api ref (pandas-dev#25526)
  Move locale code out of tm, into _config (pandas-dev#25757)
  Unpin pycodestyle (pandas-dev#25789)
  Add test for rdivmod on EA array (GH23287) (pandas-dev#24047)
  ENH: Support datetime.timezone objects (pandas-dev#25065)
  Cython language level 3 (pandas-dev#24538)
  API: concat on sparse values (pandas-dev#25719)
  TST: assert_produces_warning works with filterwarnings (pandas-dev#25721)
  make core.config self-contained (pandas-dev#25613)
  CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721)
  TST: Check pytables<3.5.1 when skipping (pandas-dev#25773)
  DOC: Fix typo in docstring of DataFrame.memory_usage  (pandas-dev#25770)
  Replace dicts with OrderedDicts in groupby aggregation functions (pandas-dev#25693)
  TST: Fixturize tests/frame/test_missing.py (pandas-dev#25640)
  DOC: Improve the docsting of Series.iteritems (pandas-dev#24879)
  DOC: Fix function name. (pandas-dev#25751)
  Implementing iso_week_year support for to_datetime (pandas-dev#25541)
  DOC: clarify corr behaviour when using a callable (pandas-dev#25732)
  remove unnecessary check_output (pandas-dev#25755)

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
thoo added a commit to thoo/pandas that referenced this pull request Mar 20, 2019
* upstream/master: (55 commits)
  PERF: Improve performance of StataReader (pandas-dev#25780)
  Speed up tokenizing of a row in csv and xstrtod parsing (pandas-dev#25784)
  BUG: Fix _binop for operators for serials which has more than one returns (divmod/rdivmod). (pandas-dev#25588)
  BUG-24971 copying blocks also considers ndim (pandas-dev#25521)
  CLN: Panel reference from documentation (pandas-dev#25649)
  ENH: Quoting column names containing spaces with backticks to use them in query and eval. (pandas-dev#24955)
  BUG: reading windows utf8 filenames in py3.6 (pandas-dev#25769)
  DOC: clean bug fix section in whatsnew (pandas-dev#25792)
  DOC: Fixed PeriodArray api ref (pandas-dev#25526)
  Move locale code out of tm, into _config (pandas-dev#25757)
  Unpin pycodestyle (pandas-dev#25789)
  Add test for rdivmod on EA array (GH23287) (pandas-dev#24047)
  ENH: Support datetime.timezone objects (pandas-dev#25065)
  Cython language level 3 (pandas-dev#24538)
  API: concat on sparse values (pandas-dev#25719)
  TST: assert_produces_warning works with filterwarnings (pandas-dev#25721)
  make core.config self-contained (pandas-dev#25613)
  CLN: replace %s syntax with .format in pandas.io.parsers (pandas-dev#24721)
  TST: Check pytables<3.5.1 when skipping (pandas-dev#25773)
  DOC: Fix typo in docstring of DataFrame.memory_usage  (pandas-dev#25770)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat([sparse]) should be a Series / DataFrame
3 participants