Docstring Best Practice
Dataprep uses a few Sphinx packages to speed up docstring writing, which brings in some additional best practices. This page lists them; please give it a read.
Automatic parameter type inference.
Dataprep strictly enforces type annotations for all functions, classes, and variables. When documenting function parameters, the NumPy docstring convention says you should write the parameter type after a colon (`:`). We don't, as long as the type is annotated correctly in the function signature. Take `dataprep.eda.basic.plot` as an example. Its signature is fully typed:

    def plot(
        df: Union[pd.DataFrame, dd.DataFrame],
        x: Optional[str] = None,
        y: Optional[str] = None,
        *,
        bins: int = 10,
        ngroups: int = 10,
        largest: bool = True,
        nsubgroups: int = 5,
        bandwidth: float = 1.5,
        sample_size: int = 1000,
        value_range: Optional[Tuple[float, float]] = None,
        yscale: str = "linear",
        tile_size: Optional[float] = None,
    ) -> Figure:
        ...

Since the signature already carries the types, the docstring can follow the conventions below.
No Type for Function Parameters
In the docstring, you don't need to write the type of a parameter:

    Parameters
    ----------
    df
        Dataframe from which plots are to be generated.

The type of `df` is already known from the signature, and the documentation is still generated correctly, with the type filled in from the signature.
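The sketch below is not Dataprep code; `column_mean` is a hypothetical function. It keeps the types only in the signature and leaves them out of the `Parameters` section. `typing.get_type_hints` shows that the types are still available at runtime, which is the information a type-hint-aware Sphinx package can use to fill them in. Which package Dataprep relies on for this is an assumption not spelled out on this page.

```python
from typing import List, Optional, get_type_hints


def column_mean(values: List[float], ndigits: Optional[int] = None) -> float:
    """Compute the mean of a column of numbers.

    Parameters
    ----------
    values
        Numbers from which the mean is computed.
    ndigits
        Number of decimal places to round to. If None, no rounding is done.
    """
    # The docstring omits types; they are taken from the annotations above.
    mean = sum(values) / len(values)
    return mean if ndigits is None else round(mean, ndigits)


# The annotations are kept at runtime, which is what lets documentation
# tooling fill in the parameter and return types automatically.
hints = get_type_hints(column_mean)
print(hints["values"], hints["ndigits"], hints["return"])
```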
Give the Type for Default Values
Alternatively, you can still write the parameter type yourself to override the auto-generated one. A good use case is documenting a default value:

    Parameters
    ----------
    x: Optional[str], default None
        A valid column name from the dataframe.

In the generated documentation, notice how the parameter type changes from bold to italic: italic is the sign of an **overridden** parameter type.
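Once the type is written by hand, the default stated in the docstring can drift away from the real one. As a minimal sketch, the hypothetical helper `signature_default` below reads the declared default with `inspect.signature`, so a test can confirm that the docstring and the signature agree. `select_column` is a made-up function that only mirrors the `x` parameter from the example above.

```python
import inspect
from typing import Optional


def select_column(x: Optional[str] = None) -> str:
    """Return the column to plot.

    Parameters
    ----------
    x: Optional[str], default None
        A valid column name from the dataframe.
    """
    # Hypothetical behaviour: None means "use every column".
    return x if x is not None else "<all columns>"


def signature_default(func, param: str):
    """Return the default value of ``param`` as declared in the signature."""
    return inspect.signature(func).parameters[param].default


# The default written in the docstring should match the one in the signature.
print(signature_default(select_column, "x"))  # None
```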
No Returns Unless for Comments
The function return type can also be inferred from the signature! This means there is no need for a docstring section like this:

    Returns
    -------
    Figure
        An object of figure

unless you want to write a meaningful comment about the return value:

    Returns
    -------
    Figure
        A meaningful message.
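A small sketch with two hypothetical functions: `column_range` has no `Returns` section because its annotation already says it returns a `float`, while `column_summary` keeps one because the comment explains what the dictionary keys mean. `get_type_hints` confirms that both return types are recoverable from the signatures alone.

```python
from typing import Dict, List, get_type_hints


def column_range(values: List[float]) -> float:
    """Compute the difference between the largest and smallest value.

    Parameters
    ----------
    values
        Numbers from which the range is computed.
    """
    # No Returns section: the return type comes from the annotation.
    return max(values) - min(values)


def column_summary(values: List[float]) -> Dict[str, float]:
    """Summarize a column of numbers.

    Parameters
    ----------
    values
        Numbers to summarize.

    Returns
    -------
    Dict[str, float]
        Maps 'min', 'max' and 'mean' to the corresponding statistic.
    """
    # The Returns section stays because it explains the keys.
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


print(get_type_hints(column_range)["return"])
print(get_type_hints(column_summary)["return"])
```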
Make class members private with a leading underscore (`_`). Remember that all members without a leading underscore will be shown in the documentation!
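Sphinx autodoc skips members whose names start with an underscore unless the `:private-members:` option is set; whether Dataprep's docs configuration changes that default is not stated on this page. The sketch below uses a hypothetical `Report` class and a hypothetical helper `documented_members` that roughly mimics the rule with a plain name filter. It does not call Sphinx itself.

```python
import inspect


class Report:
    """A summary report for one dataframe column (hypothetical example)."""

    def __init__(self, column: str) -> None:
        self.column = column
        self._cache: dict = {}  # private attribute: hidden from the docs

    def render(self) -> str:
        """Render the report as text."""
        return f"Report for {self._title()}"

    def _title(self) -> str:
        # Private helper: the leading underscore keeps it out of the docs.
        return self.column.upper()


def documented_members(cls: type) -> list:
    """Return the names on ``cls`` that have no leading underscore."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls)
        if not name.startswith("_")
    )


print(documented_members(Report))  # ['render']
```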
Module Docstring
Write one short description of the main purpose of the file, e.g.,

    """Clean and validate a DataFrame column containing geographic coordinates."""
Function Docstring
a. Start with a high-level, one-sentence description of the function, e.g.,

    """Clean and standardize latitude and longitude coordinates.

b. Optionally, give further relevant information in paragraphs below the first sentence.
c. If there is an associated User Guide, the last sentence before the parameter descriptions should reference it, e.g.,

    Read more in the :ref:`User Guide <clean_lat_long_user_guide>`.

    Parameters
    ----------
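Rules a and c can be checked mechanically. The hypothetical helper `layout_problems` below is a rough sketch, not a Dataprep tool: it approximates "one sentence" as a first line that ends with a period and contains no `. `, so abbreviations such as "e.g. " can trigger false alarms.

```python
import inspect
from typing import List

REF = ":ref:`User Guide"


def layout_problems(doc: str) -> List[str]:
    """Check the summary-line and User Guide rules on a docstring (rough sketch)."""
    lines = inspect.cleandoc(doc).splitlines()
    problems = []
    summary = lines[0] if lines else ""
    # Rule a: one sentence on the first line (heuristic, see above).
    if not summary.endswith(".") or ". " in summary:
        problems.append("summary should be one sentence ending with a period")
    # Rule c: if a User Guide reference exists, it is the last text before Parameters.
    if "Parameters" in lines:
        before = [ln for ln in lines[: lines.index("Parameters")] if ln.strip()]
        if any(REF in ln for ln in before) and REF not in before[-1]:
            problems.append("User Guide reference should come right before Parameters")
    return problems


GOOD = """Clean and standardize latitude and longitude coordinates.

Read more in the :ref:`User Guide <clean_lat_long_user_guide>`.

Parameters
----------
"""

BAD = """Clean coordinates. Also validate them.

Read more in the :ref:`User Guide <clean_lat_long_user_guide>`.

Further details about the accepted formats.

Parameters
----------
"""

print(layout_problems(GOOD))  # []
print(layout_problems(BAD))
```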
Parameter Descriptions
a. If a parameter defines a format, give an example of it, e.g.,

    output_format
        The desired format of the coordinates.
            - 'dd': decimal degrees (51.4934, 0.0098)
            - 'ddh': decimal degrees with hemisphere ('51.4934° N, 0.0098° E')
            - 'dm': degrees minutes ('51° 29.604′ N, 0° 0.588′ E')
            - 'dms': degrees minutes seconds ('51° 29′ 36.24″ N, 0° 0′ 35.28″ E')

        (default: 'dd')

b. Specify the default value after a blank line at the end of the parameter description, e.g.,

    report
        If True, output the summary report. Otherwise, no report is outputted.

        (default: True)

c. If a parameter has exactly the same functionality as in other functions, its description should be identical, e.g., the `report` parameter above.
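Rule b also makes the documented defaults machine-readable. As a sketch under the assumption that every parameter block follows the `(default: ...)` layout above, the hypothetical helper `default_mismatches` compares each note with the signature. `format_coordinate` is a made-up, simplified function (non-negative values, 'dd' and 'dm' only), not Dataprep's API.

```python
import ast
import inspect
import re


def format_coordinate(value: float, output_format: str = "dd", report: bool = True) -> str:
    """Format a non-negative coordinate given in decimal degrees.

    Parameters
    ----------
    value
        The coordinate in decimal degrees.
    output_format
        The desired format of the coordinate.
            - 'dd': decimal degrees ('51.4934')
            - 'dm': degrees minutes ('51° 29.604′')

        (default: 'dd')
    report
        If True, output the summary report. Otherwise, no report is outputted.

        (default: True)
    """
    if output_format == "dd":
        text = f"{value}"
    elif output_format == "dm":
        degrees = int(value)  # truncation is only correct for value >= 0
        text = f"{degrees}° {(value - degrees) * 60:.3f}′"
    else:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if report:
        print(f"Formatted 1 coordinate as {output_format!r}.")
    return text


def default_mismatches(func) -> list:
    """Return parameters whose '(default: ...)' note is missing or wrong."""
    doc = inspect.getdoc(func) or ""
    mismatches = []
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            continue  # required parameters need no default note
        # A parameter's block runs from its name line to the next unindented line.
        block = re.search(rf"^{re.escape(name)}\n((?:[ \t]+.*\n?|\n)*)", doc, flags=re.M)
        note = re.search(r"\(default: (.+)\)", block.group(1)) if block else None
        if note is None or ast.literal_eval(note.group(1)) != param.default:
            mismatches.append(name)
    return mismatches


print(default_mismatches(format_coordinate))  # []
print(format_coordinate(51.4934, "dm", report=False))
```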
Examples: after defining the parameters, include a short example that demonstrates the function. E.g.

    Examples
    --------
    Split a column containing latitude and longitude strings into separate
    columns in decimal degrees format.

    >>> df = pd.DataFrame({'coordinates': ['51° 29′ 36.24″ N, 0° 0′ 35.28″ E', '51.4934° N, 0.0098° E']})
    >>> clean_lat_long(df, 'coordinates', split=True)
                                coordinates  latitude  longitude
    0  51° 29′ 36.24″ N, 0° 0′ 35.28″ E   51.4934     0.0098
    1             51.4934° N, 0.0098° E   51.4934     0.0098
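Because the Examples section uses `>>>` prompts, it can be executed with the standard `doctest` module so that the documented output does not drift from the code. The sketch below uses a hypothetical pure-Python helper, `to_decimal_degrees`, instead of `clean_lat_long`, so it runs without pandas; the conversion matches the '51° 29′ 36.24″' value used above.

```python
import doctest


def to_decimal_degrees(degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees, minutes and seconds to decimal degrees.

    Parameters
    ----------
    degrees
        Whole degrees.
    minutes
        Whole minutes.
    seconds
        Seconds, possibly fractional.

    Examples
    --------
    Convert the latitude 51° 29′ 36.24″ to decimal degrees.

    >>> to_decimal_degrees(51, 29, 36.24)
    51.4934
    """
    # Round to 4 decimal places, the precision used in the example above.
    return round(degrees + minutes / 60 + seconds / 3600, 4)


# Run the Examples section as a test; globs makes the function visible to it.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(to_decimal_degrees, globs={"to_decimal_degrees": to_decimal_degrees}):
    result = runner.run(test)
    print(result)
```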
Notes:
- Each statement should begin with a capital letter and end with a period.
- All internal functions should begin with an underscore so they do not appear in the documentation.
- Please use single quotes for text in the docstring (e.g., 'US', not "US").
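Two of these notes are easy to check automatically. The hypothetical helper `note_violations` below is only a rough sketch: it checks the summary line for a leading capital and a trailing period, and flags any double-quoted text. It does not check every statement, and it would also flag legitimate double quotes inside code snippets.

```python
import inspect
import re
from typing import List


def note_violations(doc: str) -> List[str]:
    """Flag a badly punctuated summary and double-quoted text (rough sketch)."""
    text = inspect.cleandoc(doc)
    problems = []
    summary = text.splitlines()[0] if text else ""
    if not (summary[:1].isupper() and summary.endswith(".")):
        problems.append("summary should start with a capital letter and end with a period")
    # Double-quoted text such as "US" should use single quotes: 'US'.
    for quoted in re.findall(r'"[^"\n]*"', text):
        problems.append(f"use single quotes instead of {quoted}")
    return problems


print(note_violations("Clean country names.\n\nUse 'US' for the United States."))  # []
print(note_violations('clean country names\n\nUse "US" here.'))
```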
To make a file appear in the API reference section of the documentation, add it in alphabetical order here.
To create a link to a user guide from a docstring, follow the instructions here.
To link to the API reference of a function from a user guide, first set the raw NBConvert format of the cell to reST, as explained in the previous section. Then use the syntax :func:`Text you want to link <full path to function>` to reference the function's API docstring, for example :func:`clean_country() <dataprep.clean.clean_country.clean_country>`. Please link to a function when it is first introduced in the user guide.
To preview the documentation, run `poetry run sphinx-build -M html docs/source docs/build` in your dataprep directory. A local copy of the main page can then be opened from `docs/build/html/index.html`.