Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: code samples for reset_index and sort_values #282

Merged
merged 7 commits into from
Dec 22, 2023
161 changes: 161 additions & 0 deletions third_party/bigframes_vendored/pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -1138,6 +1138,93 @@ def reset_index(

Reset the index of the DataFrame, and use the default one instead.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> import numpy as np
>>> df = bpd.DataFrame([('bird', 389.0),
... ('bird', 24.0),
... ('mammal', 80.5),
... ('mammal', np.nan)],
... index=['falcon', 'parrot', 'lion', 'monkey'],
... columns=('class', 'max_speed'))
>>> df
class max_speed
falcon bird 389.0
parrot bird 24.0
lion mammal 80.5
monkey mammal <NA>
<BLANKLINE>
[4 rows x 2 columns]

When we reset the index, the old index is added as a column, and a new sequential index is used:

>>> df.reset_index()
index class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal <NA>
<BLANKLINE>
[4 rows x 3 columns]

We can use the ``drop`` parameter to avoid the old index being added as a column:

>>> df.reset_index(drop=True)
class max_speed
0 bird 389.0
1 bird 24.0
2 mammal 80.5
3 mammal <NA>
<BLANKLINE>
[4 rows x 2 columns]

You can also use ``reset_index`` with ``MultiIndex``.

>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
... ('bird', 'parrot'),
... ('mammal', 'lion'),
... ('mammal', 'monkey')],
... names=['class', 'name'])
>>> columns = ['speed', 'max']
>>> df = bpd.DataFrame([(389.0, 'fly'),
... (24.0, 'fly'),
... (80.5, 'run'),
... (np.nan, 'jump')],
... index=index,
... columns=columns)
>>> df
speed max
class name
bird falcon 389.0 fly
parrot 24.0 fly
mammal lion 80.5 run
monkey <NA> jump
<BLANKLINE>
[4 rows x 2 columns]

>>> df.reset_index()
class name speed max
0 bird falcon 389.0 fly
1 bird parrot 24.0 fly
2 mammal lion 80.5 run
3 mammal monkey <NA> jump
<BLANKLINE>
[4 rows x 4 columns]

>>> df.reset_index(drop=True)
speed max
0 389.0 fly
1 24.0 fly
2 80.5 run
3 <NA> jump
<BLANKLINE>
[4 rows x 2 columns]


Args:
drop (bool, default False):
Do not try to insert index into dataframe columns. This resets
Expand Down Expand Up @@ -1347,6 +1434,80 @@ def sort_values(
) -> DataFrame:
"""Sort by the values along row axis.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> df = bpd.DataFrame({
... 'col1': ['A', 'A', 'B', bpd.NA, 'D', 'C'],
... 'col2': [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 <NA> 8 4 D
4 D 7 2 e
5 C 4 3 F
<BLANKLINE>
[6 rows x 4 columns]

Sort by col1:

>>> df.sort_values(by=['col1'])
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]

Sort by multiple columns:

>>> df.sort_values(by=['col1', 'col2'])
col1 col2 col3 col4
1 A 1 1 B
0 A 2 0 a
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]

Sort Descending:

>>> df.sort_values(by='col1', ascending=False)
col1 col2 col3 col4
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
3 <NA> 8 4 D
<BLANKLINE>
[6 rows x 4 columns]

Putting NAs first:

>>> df.sort_values(by='col1', ascending=False, na_position='first')
col1 col2 col3 col4
3 <NA> 8 4 D
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
<BLANKLINE>
[6 rows x 4 columns]

Args:
by (str or Sequence[str]):
Name or list of names to sort by.
Expand Down
110 changes: 110 additions & 0 deletions third_party/bigframes_vendored/pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,53 @@ def reset_index(
when the index is meaningless and needs to be reset to the default
before another operation.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([1, 2, 3, 4], name='foo',
... index=['a', 'b', 'c', 'd'])
>>> s.index.name = "idx"
>>> s
idx
a 1
b 2
c 3
d 4
Name: foo, dtype: Int64

Generate a DataFrame with default index.

>>> s.reset_index()
idx foo
0 a 1
1 b 2
2 c 3
3 d 4
<BLANKLINE>
[4 rows x 2 columns]

To specify the name of the new column use ``name`` param.

>>> s.reset_index(name="bar")
idx bar
0 a 1
1 b 2
2 c 3
3 d 4
<BLANKLINE>
[4 rows x 2 columns]

To generate a new Series with the default index set param ``drop=True``.

>>> s.reset_index(drop=True)
0 1
1 2
2 3
3 4
Name: foo, dtype: Int64

Args:
drop (bool, default False):
Just reset the index, without inserting it as a column in
Expand Down Expand Up @@ -699,6 +746,69 @@ def sort_values(
Sort a Series in ascending or descending order by some
criterion.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([np.nan, 1, 3, 10, 5])
>>> s
0 <NA>
1 1.0
2 3.0
3 10.0
4 5.0
dtype: Float64

Sort values ascending order (default behaviour):

>>> s.sort_values(ascending=True)
1 1.0
2 3.0
4 5.0
3 10.0
0 <NA>
dtype: Float64

Sort values descending order:

>>> s.sort_values(ascending=False)
3 10.0
4 5.0
2 3.0
1 1.0
0 <NA>
dtype: Float64

Sort values putting NAs first:

>>> s.sort_values(na_position='first')
0 <NA>
1 1.0
2 3.0
4 5.0
3 10.0
dtype: Float64

Sort a series of strings:

>>> s = bpd.Series(['z', 'b', 'd', 'a', 'c'])
>>> s
0 z
1 b
2 d
3 a
4 c
dtype: string

>>> s.sort_values()
3 a
1 b
4 c
2 d
0 z
dtype: string

Args:
axis (0 or 'index'):
Unused. Parameter needed for compatibility with DataFrame.
Expand Down