Commit 84a60db

DOC: Harmonize column selection to bracket notation (pandas-dev#27562)
* Harmonize column selection to bracket notation

  As suggested by https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428#46f9

katrinleinweber authored and proost committed Dec 19, 2019
1 parent 892233e commit 84a60db
Showing 9 changed files with 54 additions and 51 deletions.
2 changes: 1 addition & 1 deletion doc/source/getting_started/10min.rst

@@ -278,7 +278,7 @@ Using a single column's values to select data.

    .. ipython:: python

 -      df[df.A > 0]
 +      df[df['A'] > 0]

    Selecting values from a DataFrame where a boolean condition is met.
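The hunk above swaps attribute access for brackets inside a boolean mask. A minimal sketch of why the bracket form is the safer habit (the data here is invented for illustration, not part of the commit):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]})

# df["A"] works for any column label; df.A breaks for labels with
# spaces or labels that shadow DataFrame attributes.
positive = df[df["A"] > 0]
print(positive["B"].tolist())
```

Both spellings select the same column here, but only the bracket form survives a rename to, say, `"A col"`.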
12 changes: 6 additions & 6 deletions doc/source/getting_started/basics.rst

@@ -926,7 +926,7 @@ Single aggregations on a ``Series`` this will return a scalar value:

    .. ipython:: python

 -      tsdf.A.agg('sum')
 +      tsdf['A'].agg('sum')

    Aggregating with multiple functions

@@ -950,13 +950,13 @@ On a ``Series``, multiple functions return a ``Series``, indexed by the function

    .. ipython:: python

 -      tsdf.A.agg(['sum', 'mean'])
 +      tsdf['A'].agg(['sum', 'mean'])

    Passing a ``lambda`` function will yield a ``<lambda>`` named row:

    .. ipython:: python

 -      tsdf.A.agg(['sum', lambda x: x.mean()])
 +      tsdf['A'].agg(['sum', lambda x: x.mean()])

    Passing a named function will yield that name for the row:

@@ -965,7 +965,7 @@ Passing a named function will yield that name for the row:

        def mymean(x):
            return x.mean()

 -      tsdf.A.agg(['sum', mymean])
 +      tsdf['A'].agg(['sum', mymean])

    Aggregating with a dict
    +++++++++++++++++++++++

@@ -1065,7 +1065,7 @@ Passing a single function to ``.transform()`` with a ``Series`` will yield a sin

    .. ipython:: python

 -      tsdf.A.transform(np.abs)
 +      tsdf['A'].transform(np.abs)

    Transform with multiple functions

@@ -1084,7 +1084,7 @@ resulting column names will be the transforming functions.

    .. ipython:: python

 -      tsdf.A.transform([np.abs, lambda x: x + 1])
 +      tsdf['A'].transform([np.abs, lambda x: x + 1])

    Transforming with a dict
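The ``agg`` hunks change only spelling, not behavior. A quick sketch showing the two forms agree, and where brackets are strictly more general (``tsdf`` and the ``"B col"`` label are invented for illustration):

```python
import pandas as pd

tsdf = pd.DataFrame({"A": [1.0, -2.0, 3.0]})

# Attribute and bracket access hit the same column...
assert tsdf.A.agg("sum") == tsdf["A"].agg("sum")

# ...but only brackets work for labels that aren't valid identifiers.
tsdf["B col"] = [4.0, 5.0, 6.0]
print(tsdf["B col"].agg(["sum", "mean"]))
```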
8 changes: 4 additions & 4 deletions doc/source/getting_started/comparison/comparison_with_r.rst

@@ -81,7 +81,7 @@ R pandas

    =========================================== ===========================================
    ``select(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})['col_one']``
    ``rename(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})``
 -  ``mutate(df, c=a-b)``                       ``df.assign(c=df.a-df.b)``
 +  ``mutate(df, c=a-b)``                       ``df.assign(c=df['a']-df['b'])``
    =========================================== ===========================================

@@ -258,8 +258,8 @@ index/slice as well as standard boolean indexing:

       df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
       df.query('a <= b')
 -     df[df.a <= df.b]
 -     df.loc[df.a <= df.b]
 +     df[df['a'] <= df['b']]
 +     df.loc[df['a'] <= df['b']]

    For more details and examples see :ref:`the query documentation
    <indexing.query>`.

@@ -284,7 +284,7 @@ In ``pandas`` the equivalent expression, using the

       df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
       df.eval('a + b')
 -     df.a + df.b # same as the previous expression
 +     df['a'] + df['b'] # same as the previous expression

    In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
    evaluation in pure Python. For more details and examples see :ref:`the eval
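The R-comparison hunks assert that the string forms and the bracket forms are interchangeable; that claim is easy to check directly (deterministic data substituted for the ``randn`` calls so the result is reproducible):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 2.0], "b": [3.0, 3.0, 3.0]})

# query() parses the string; brackets are the plain-Python equivalent.
assert df.query("a <= b").equals(df[df["a"] <= df["b"]])

# eval() likewise mirrors column arithmetic done with brackets.
assert df.eval("a + b").equals(df["a"] + df["b"])
```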
2 changes: 1 addition & 1 deletion doc/source/user_guide/advanced.rst

@@ -738,7 +738,7 @@ and allows efficient indexing and storage of an index with a large number of dup

       df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
       df
       df.dtypes
 -     df.B.cat.categories
 +     df['B'].cat.categories

    Setting the index will create a ``CategoricalIndex``.
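The ``.cat`` accessor chains onto either selection style; a self-contained sketch of the snippet above (the column data is invented):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"B": list("abca")})
df["B"] = df["B"].astype(CategoricalDtype(list("cab")))

# The categories keep the order given to CategoricalDtype.
print(list(df["B"].cat.categories))
```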
6 changes: 3 additions & 3 deletions doc/source/user_guide/cookbook.rst

@@ -592,8 +592,8 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to

    .. ipython:: python

       df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A'])
 -     df.A.groupby((df.A != df.A.shift()).cumsum()).groups
 -     df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum()
 +     df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).groups
 +     df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).cumsum()

    Expanding data
    **************

@@ -719,7 +719,7 @@ Rolling Apply to multiple columns where function calculates a Series before a Sc

       df

       def gm(df, const):
 -        v = ((((df.A + df.B) + 1).cumprod()) - 1) * const
 +        v = ((((df['A'] + df['B']) + 1).cumprod()) - 1) * const
          return v.iloc[-1]

       s = pd.Series({df.index[i]: gm(df.iloc[i:min(i + 51, len(df) - 1)], 5)
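The first cookbook hunk is the classic consecutive-run idiom, which is easier to follow in isolation: comparing each value with its shifted self marks run boundaries, and ``cumsum`` turns those marks into run ids. A sketch using the same data as the doc:

```python
import pandas as pd

df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=["A"])

# (value != previous value) is True at the start of each run of equal
# values; cumsum() then assigns one integer id per run.
run_id = (df["A"] != df["A"].shift()).cumsum()

# Cumulative sum restarts inside each run.
within_run = df["A"].groupby(run_id).cumsum()
print(within_run.tolist())
```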
12 changes: 6 additions & 6 deletions doc/source/user_guide/enhancingperf.rst

@@ -393,15 +393,15 @@ Consider the following toy example of doubling each observation:

    .. code-block:: ipython

       # Custom function without numba
 -     In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba) # noqa E501
 +     In [5]: %timeit df['col1_doubled'] = df['a'].apply(double_every_value_nonumba) # noqa E501
       1000 loops, best of 3: 797 us per loop

       # Standard implementation (faster than a custom function)
 -     In [6]: %timeit df['col1_doubled'] = df.a * 2
 +     In [6]: %timeit df['col1_doubled'] = df['a'] * 2
       1000 loops, best of 3: 233 us per loop

       # Custom function with numba
 -     In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
 +     In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy())
       1000 loops, best of 3: 145 us per loop

    Caveats

@@ -643,8 +643,8 @@ The equivalent in standard Python would be

    .. ipython:: python

       df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
 -     df['c'] = df.a + df.b
 -     df['d'] = df.a + df.b + df.c
 +     df['c'] = df['a'] + df['b']
 +     df['d'] = df['a'] + df['b'] + df['c']
       df['a'] = 1
       df

@@ -688,7 +688,7 @@ name in an expression.

       a = np.random.randn()
       df.query('@a < a')
 -     df.loc[a < df.a] # same as the previous expression
 +     df.loc[a < df['a']] # same as the previous expression

    With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it
    isn't defined in that context. ``pandas`` will let you know this if you try to
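The last hunk touches the ``@`` prefix, which disambiguates a local Python variable from a column of the same name inside ``query``. A runnable sketch with deterministic data in place of ``np.random.randn()``:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.5, 1.5, 2.5]})
a = 1.0  # local variable, referenced as @a inside query()

# '@a < a' compares the local variable with column 'a'.
assert df.query("@a < a").equals(df.loc[a < df["a"]])
print(df.query("@a < a")["a"].tolist())
```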
39 changes: 21 additions & 18 deletions doc/source/user_guide/indexing.rst

@@ -210,7 +210,7 @@ as an attribute:

    See `here for an explanation of valid identifiers
    <https://docs.python.org/3/reference/lexical_analysis.html#identifiers>`__.

 -  - The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed.
 +  - The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed, but ``s['min']`` is possible.

    - Similarly, the attribute will not be available if it conflicts with any of the following list: ``index``,
      ``major_axis``, ``minor_axis``, ``items``.
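The added clause ("but ``s['min']`` is possible") is the heart of the whole commit; a two-line demonstration (the index labels are invented):

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["min", "max", "mean"])

# s.min is the Series *method*, not the row labelled 'min'...
assert callable(s.min)

# ...while bracket selection always means the label.
assert s["min"] == 3
```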
@@ -540,7 +540,7 @@ The ``callable`` must be a function with one argument (the calling Series or Dat

                         columns=list('ABCD'))
       df1
 -     df1.loc[lambda df: df.A > 0, :]
 +     df1.loc[lambda df: df['A'] > 0, :]
       df1.loc[:, lambda df: ['A', 'B']]
       df1.iloc[:, lambda df: [0, 1]]

@@ -552,7 +552,7 @@ You can use callable indexing in ``Series``.

    .. ipython:: python

 -     df1.A.loc[lambda s: s > 0]
 +     df1['A'].loc[lambda s: s > 0]

    Using these methods / indexers, you can chain data selection operations
    without using a temporary variable.

@@ -561,7 +561,7 @@ without using a temporary variable.

       bb = pd.read_csv('data/baseball.csv', index_col='id')
       (bb.groupby(['year', 'team']).sum()
 -        .loc[lambda df: df.r > 100])
 +        .loc[lambda df: df['r'] > 100])

.. _indexing.deprecate_ix:
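Callable indexers, as changed above, receive the object being sliced, which is what lets selections chain without a temporary variable. A self-contained sketch (``df1`` built with deterministic values rather than the doc's random data):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4) - 5, columns=list("ABCD"))

# The lambda is called with df1 itself, so brackets inside it
# select columns exactly as they would outside.
out = df1.loc[lambda df: df["A"] > 0, :]
print(out)
```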

@@ -871,9 +871,9 @@ Boolean indexing

    Another common operation is the use of boolean vectors to filter the data.
    The operators are: ``|`` for ``or``, ``&`` for ``and``, and ``~`` for ``not``.
    These **must** be grouped by using parentheses, since by default Python will
 -  evaluate an expression such as ``df.A > 2 & df.B < 3`` as
 -  ``df.A > (2 & df.B) < 3``, while the desired evaluation order is
 -  ``(df.A > 2) & (df.B < 3)``.
 +  evaluate an expression such as ``df['A'] > 2 & df['B'] < 3`` as
 +  ``df['A'] > (2 & df['B']) < 3``, while the desired evaluation order is
 +  ``(df['A'] > 2) & (df['B'] < 3)``.

    Using a boolean vector to index a Series works exactly as in a NumPy ndarray:

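The precedence rule described above is easy to get wrong, so a minimal demonstration of the parenthesized form (column values invented):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 2, 4]})

# & binds tighter than > in Python, so each comparison
# must be parenthesized before combining.
mask = (df["A"] > 2) & (df["B"] < 3)
print(df[mask]["A"].tolist())
```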
@@ -1134,7 +1134,7 @@ between the values of columns ``a`` and ``c``. For example:

       df

       # pure python
 -     df[(df.a < df.b) & (df.b < df.c)]
 +     df[(df['a'] < df['b']) & (df['b'] < df['c'])]

       # query
       df.query('(a < b) & (b < c)')

@@ -1241,7 +1241,7 @@ Full numpy-like syntax:

       df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=list('abc'))
       df
       df.query('(a < b) & (b < c)')
 -     df[(df.a < df.b) & (df.b < df.c)]
 +     df[(df['a'] < df['b']) & (df['b'] < df['c'])]

    Slightly nicer by removing the parentheses (``query`` makes comparison
    operators bind tighter than ``&`` and ``|``).
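That precedence remark can be verified: inside a ``query`` string the parentheses are optional, and chained comparisons work too. A sketch with fixed data in place of the doc's random ints:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 2], "b": [2, 1, 3], "c": [3, 0, 1]})

# query's parser gives comparisons higher precedence than &,
# so all three spellings agree.
r1 = df.query("(a < b) & (b < c)")
r2 = df.query("a < b & b < c")
r3 = df.query("a < b < c")
assert r1.equals(r2) and r2.equals(r3)
print(r1.index.tolist())
```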
@@ -1279,12 +1279,12 @@ The ``in`` and ``not in`` operators

       df.query('a in b')

       # How you'd do it in pure Python
 -     df[df.a.isin(df.b)]
 +     df[df['a'].isin(df['b'])]

       df.query('a not in b')

       # pure Python
 -     df[~df.a.isin(df.b)]
 +     df[~df['a'].isin(df['b'])]

    You can combine this with other expressions for very succinct queries:

@@ -1297,7 +1297,7 @@ You can combine this with other expressions for very succinct queries:

       df.query('a in b and c < d')

       # pure Python
 -     df[df.b.isin(df.a) & (df.c < df.d)]
 +     df[df['b'].isin(df['a']) & (df['c'] < df['d'])]

    .. note::

@@ -1326,7 +1326,7 @@ to ``in``/``not in``.

       df.query('b == ["a", "b", "c"]')

       # pure Python
 -     df[df.b.isin(["a", "b", "c"])]
 +     df[df['b'].isin(["a", "b", "c"])]

       df.query('c == [1, 2]')

@@ -1338,7 +1338,7 @@ to ``in``/``not in``.

       df.query('[1, 2] not in c')

       # pure Python
 -     df[df.c.isin([1, 2])]
 +     df[df['c'].isin([1, 2])]
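The ``in``/``not in`` equivalence asserted in these hunks checks out directly; a sketch with small invented data:

```python
import pandas as pd

df = pd.DataFrame({"a": list("aabbccddeeff"),
                   "b": list("aaaabbbbcccc")})

# 'a in b' means: keep rows whose df['a'] value occurs
# anywhere in df['b']; isin is the pure-Python spelling.
assert df.query("a in b").equals(df[df["a"].isin(df["b"])])
assert df.query("a not in b").equals(df[~df["a"].isin(df["b"])])
```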
    Boolean operators

@@ -1352,7 +1352,7 @@ You can negate boolean expressions with the word ``not`` or the ``~`` operator.

       df['bools'] = np.random.rand(len(df)) > 0.5
       df.query('~bools')
       df.query('not bools')
 -     df.query('not bools') == df[~df.bools]
 +     df.query('not bools') == df[~df['bools']]

    Of course, expressions can be arbitrarily complex too:

@@ -1362,7 +1362,10 @@ Of course, expressions can be arbitrarily complex too:

       shorter = df.query('a < b < c and (not bools) or bools > 2')

       # equivalent in pure Python
 -     longer = df[(df.a < df.b) & (df.b < df.c) & (~df.bools) | (df.bools > 2)]
 +     longer = df[(df['a'] < df['b'])
 +                 & (df['b'] < df['c'])
 +                 & (~df['bools'])
 +                 | (df['bools'] > 2)]

       shorter
       longer
@@ -1835,14 +1838,14 @@ chained indexing expression, you can set the :ref:`option <options>`

       # This will show the SettingWithCopyWarning
       # but the frame values will be set
 -     dfb['c'][dfb.a.str.startswith('o')] = 42
 +     dfb['c'][dfb['a'].str.startswith('o')] = 42

    This however is operating on a copy and will not work.

    ::

       >>> pd.set_option('mode.chained_assignment','warn')
 -     >>> dfb[dfb.a.str.startswith('o')]['c'] = 42
 +     >>> dfb[dfb['a'].str.startswith('o')]['c'] = 42
       Traceback (most recent call last)
       ...
       SettingWithCopyWarning:
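Both spellings in the hunk above remain chained-indexing anti-patterns; the recommended form is a single ``.loc`` call. A sketch with invented data (the doc's ``dfb`` is not shown in this diff):

```python
import pandas as pd

dfb = pd.DataFrame({"a": ["one", "two", "ohio"], "c": [0, 0, 0]})

# One indexing operation: no intermediate copy,
# no SettingWithCopyWarning.
dfb.loc[dfb["a"].str.startswith("o"), "c"] = 42
print(dfb["c"].tolist())
```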
10 changes: 5 additions & 5 deletions doc/source/user_guide/reshaping.rst

@@ -469,7 +469,7 @@ If ``crosstab`` receives only two Series, it will provide a frequency table.

                         'C': [1, 1, np.nan, 1, 1]})
       df

 -     pd.crosstab(df.A, df.B)
 +     pd.crosstab(df['A'], df['B'])

    Any input passed containing ``Categorical`` data will have **all** of its
    categories included in the cross-tabulation, even if the actual data does

@@ -489,21 +489,21 @@ using the ``normalize`` argument:

    .. ipython:: python

 -     pd.crosstab(df.A, df.B, normalize=True)
 +     pd.crosstab(df['A'], df['B'], normalize=True)

    ``normalize`` can also normalize values within each row or within each column:

    .. ipython:: python

 -     pd.crosstab(df.A, df.B, normalize='columns')
 +     pd.crosstab(df['A'], df['B'], normalize='columns')

    ``crosstab`` can also be passed a third ``Series`` and an aggregation function
    (``aggfunc``) that will be applied to the values of the third ``Series`` within
    each group defined by the first two ``Series``:

    .. ipython:: python

 -     pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum)
 +     pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum)

    Adding margins
    ~~~~~~~~~~~~~~

@@ -512,7 +512,7 @@ Finally, one can also add margins or normalize this output.

    .. ipython:: python

 -     pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum, normalize=True,
 +     pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum, normalize=True,
                   margins=True)

.. _reshaping.tile:
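``crosstab`` takes Series positionally, so the bracket change here is purely cosmetic. A frequency-table sketch with invented data:

```python
import pandas as pd

a = pd.Series(["x", "x", "y", "y"], name="A")
b = pd.Series(["u", "v", "u", "u"], name="B")

# Two Series in, a frequency table out; the Series names
# become the row and column axis names.
tab = pd.crosstab(a, b)
print(tab)
```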
14 changes: 7 additions & 7 deletions doc/source/user_guide/visualization.rst

@@ -1148,10 +1148,10 @@ To plot data on a secondary y-axis, use the ``secondary_y`` keyword:

    .. ipython:: python

 -     df.A.plot()
 +     df['A'].plot()

       @savefig series_plot_secondary_y.png
 -     df.B.plot(secondary_y=True, style='g')
 +     df['B'].plot(secondary_y=True, style='g')

    .. ipython:: python
       :suppress:

@@ -1205,7 +1205,7 @@ Here is the default behavior, notice how the x-axis tick labeling is performed:

       plt.figure()

       @savefig ser_plot_suppress.png
 -     df.A.plot()
 +     df['A'].plot()

    .. ipython:: python
       :suppress:

@@ -1219,7 +1219,7 @@ Using the ``x_compat`` parameter, you can suppress this behavior:

       plt.figure()

       @savefig ser_plot_suppress_parm.png
 -     df.A.plot(x_compat=True)
 +     df['A'].plot(x_compat=True)

    .. ipython:: python
       :suppress:

@@ -1235,9 +1235,9 @@ in ``pandas.plotting.plot_params`` can be used in a `with statement`:

       @savefig ser_plot_suppress_context.png
       with pd.plotting.plot_params.use('x_compat', True):
 -        df.A.plot(color='r')
 -        df.B.plot(color='g')
 -        df.C.plot(color='b')
 +        df['A'].plot(color='r')
 +        df['B'].plot(color='g')
 +        df['C'].plot(color='b')

    .. ipython:: python
       :suppress:
