Validate PEP-8 in docstring examples #23154

datapythonista · 2018-10-14T18:27:42Z

At the moment we are automatically validating PEP-8 in our code (using flake8 . in the CI), and we also have the script scripts/validate_docstrings.py that reports errors for different formatting errors in our docstrings, and it also runs the tests to see if they work and generate the expected input. But we are not validating whether the examples follow PEP-8.

In the next example:

def head():
    """
    Return the first n rows of the Series.

    Examples
    --------
    >>> s=pd.Series([1, 2, 3, 4])
    >>> s.head(2)
    0    1
    1    2
    dtype: int64

Everything will be reported as correct, because the formatting of the docstring is the expected, and when running the examples, they work and they generated the presented output.

According to PEP-8, there should be spaces around the = in an assignment. So, we should have s = pd.Series([1, 2, 3, 4]) instead of s=pd.Series([1, 2, 3, 4]). So, the validation should report it as an error.

What we need to do is:

Add a method in scripts/validate_docstrings.py that obtains the code in the examples (and if possible their line in the code to be presented to the user if there are errors). For what I know, the module doctest in the Python standard library provides a way to extract the code.
Add another method that given some code, returns possible PEP-8 errors. We currently have pyflakes as a dependency, so using it is probably the best option. But if it's better to use pycodestyle or something else, that could be an option.
Use these methods in the validation, so errors are reported as the rest of the errors from the validation
Add unit tests for the implemented features in scripts/tests/test_validate_docstrings.py

The text was updated successfully, but these errors were encountered:

dubielt1 · 2018-10-16T00:16:07Z

I'll make an attempt at this

FHaase · 2018-10-18T10:05:58Z

As flake8 is already being used, wouldn't it make sense to use flake8 plugins to check for those things?

flake8-rst For code in docstrings and *.rst files
Pull request created on conda-forge
flake8-docstrings A plugin to flake8 to include checks provided by pep257
Already added to conda-forge

datapythonista · 2018-10-25T05:49:56Z

Using a plugin makes sense, better not reinvent the wheel. But it'd be good if the plugin is called from the validator script, so devs working on docstrings don't need to execute more than a command to validate that the docstring is all right.

FHaase · 2018-10-25T11:14:08Z

I'm currently including it in ./ci/code_checks.sh under lint section.
However as it finds many PEP-8 violations in the *.rst files it might take a while until it can be merged.

datapythonista · 2018-10-25T11:18:14Z

Feel free to open issues for the PEP-8 problems if in your local branch you can already get the list of problems. If there are many I'd have more than one issue, so different people can work on it at the same time.

FHaase · 2018-10-25T11:48:30Z

commit
is all it takes to include it.

so it would get run with ./ci/code-checks.sh lint

Many mistakes are:

F821 undefined name 'Substitution' -> I would ignore them in documentation context
E999 SyntaxError: unexpected EOF while parsing
- mostly because of missing bodys of functions. [missing pass]
- however E999 also warns about false indentations which actually lead to usefull warnings.
e.g. pandas/doc/source/gotchas.rst:322:8: E999 IndentationError: unexpected indent
where the code-block should not be python but console

Also it doesn't catch all occurences of python only

no ::
no code-blocks without python or pycon
currently no >>> outside of code-blocks [Added a PR]

jorisvandenbossche · 2018-10-26T10:57:44Z

@FHaase thanks for looking into it!

F821 undefined name 'Substitution' -> I would ignore them in documentation context

Where is this one coming from?

FHaase · 2018-10-26T13:53:46Z

./doc/source/contributing_docstring.rst:956:9: F821 undefined name 'Substitution'
./doc/source/contributing_docstring.rst:957:9: F821 undefined name 'Appender'
./doc/source/contributing_docstring.rst:965:14: F821 undefined name 'Parent'
./doc/source/contributing_docstring.rst:966:14: F821 undefined name 'ChildA'
./doc/source/contributing_docstring.rst:967:14: F821 undefined name 'ChildB'

.. code-block:: python

   class Parent:
       def my_function(self):
           """Apply my function to %(klass)s."""
           ...

   class ChildA(Parent):
       @Substitution(klass="ChildA")                                
       @Appender(Parent.my_function.__doc__)
       def my_function(self):
           ...

   class ChildB(Parent):
       @Substitution(klass="ChildB")                   
       @Appender(Parent.my_function.__doc__)
       def my_function(self):
           ...

The resulting docstrings are

The problem is that in a context of a documentation it would make sense to use names that are not defined, however the plugin checks the code-block as if it was a normal python file.

Forcing to include all names in the documentation would in my opinion lead to cluttered documentation and therefore I would ignore F821 completely.

datapythonista · 2018-10-27T03:12:00Z

I think we should import Substitution and Appender in that example. Wouldn't that fix the problem?

FHaase · 2018-10-27T07:58:41Z

Short answer: Yes
Long answer: F821 validations are a quite common issue.

flake8-rst doc --filename=*.rst --select=F821 --bootstrap="import pandas as pd; import numpy as np"

doc/source/dsintro.rst:162:9: F821 undefined name 's'
doc/source/dsintro.rst:661:7: F821 undefined name 'df'
doc/source/dsintro.rst:661:12: F821 undefined name 'df'
doc/source/dsintro.rst:668:7: F821 undefined name 'df'
doc/source/dsintro.rst:668:14: F821 undefined name 'df'
doc/source/basics.rst:318:12: F821 undefined name 'df'
doc/source/basics.rst:318:19: F821 undefined name 'df2'
doc/source/basics.rst:729:8: F821 undefined name 'f'
doc/source/basics.rst:729:10: F821 undefined name 'g'
doc/source/basics.rst:729:12: F821 undefined name 'h'
doc/source/basics.rst:729:14: F821 undefined name 'df'
doc/source/basics.rst:830:4: F821 undefined name 'df'
doc/source/basics.rst:830:13: F821 undefined name 'subtract_and_divide'
doc/source/advanced.rst:1038:13: F821 undefined name 'df'
doc/source/advanced.rst:1042:14: F821 undefined name 'df'
doc/source/text.rst:444:9: F821 undefined name 's'
doc/source/comparison_with_stata.rst:173:4: F821 undefined name 'tips'
doc/source/comparison_with_stata.rst:179:4: F821 undefined name 'tips'
doc/source/extending.rst:159:28: F821 undefined name 'ExtensionArray'
doc/source/extending.rst:159:44: F821 undefined name 'ExtensionScalarOpsMixin'
doc/source/extending.rst:256:27: F821 undefined name 'Series'
doc/source/extending.rst:266:30: F821 undefined name 'DataFrame'
doc/source/extending.rst:326:31: F821 undefined name 'DataFrame'
doc/source/extending.rst:341:13: F821 undefined name 'SubclassedDataFrame2'
doc/source/indexing.rst:1355:7: F821 undefined name 'df'
doc/source/indexing.rst:1744:4: F821 undefined name 'data'
doc/source/indexing.rst:1744:17: F821 undefined name 'index'
doc/source/indexing.rst:1799:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1799:35: F821 undefined name 'value'
doc/source/indexing.rst:1801:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1801:59: F821 undefined name 'value'
doc/source/indexing.rst:1807:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1807:28: F821 undefined name 'value'
doc/source/indexing.rst:1809:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1809:50: F821 undefined name 'value'
doc/source/indexing.rst:1834:21: F821 undefined name 'value'
doc/source/io.rst:1094:4: F821 undefined name 'read_csv'
doc/source/io.rst:1094:13: F821 undefined name 'path'
doc/source/io.rst:1102:4: F821 undefined name 'read_csv'
doc/source/io.rst:1102:13: F821 undefined name 'path'
doc/source/io.rst:1108:4: F821 undefined name 'read_csv'
doc/source/io.rst:1108:13: F821 undefined name 'path'
doc/source/io.rst:1114:4: F821 undefined name 'read_csv'
doc/source/io.rst:1114:13: F821 undefined name 'path'
doc/source/io.rst:2354:27: F821 undefined name 'url'
doc/source/io.rst:2363:23: F821 undefined name 'url'
doc/source/io.rst:2369:23: F821 undefined name 'url'
doc/source/io.rst:2375:23: F821 undefined name 'url'
doc/source/io.rst:2382:23: F821 undefined name 'url'
doc/source/io.rst:2388:24: F821 undefined name 'url'
doc/source/io.rst:2389:24: F821 undefined name 'url'
doc/source/io.rst:2396:23: F821 undefined name 'url'
doc/source/io.rst:2404:23: F821 undefined name 'url'
doc/source/io.rst:2425:23: F821 undefined name 'url'
doc/source/io.rst:2431:22: F821 undefined name 'randn'
doc/source/io.rst:2442:23: F821 undefined name 'url'
doc/source/io.rst:2448:23: F821 undefined name 'url'
doc/source/io.rst:2456:23: F821 undefined name 'url'
doc/source/io.rst:2692:4: F821 undefined name 'read_excel'
doc/source/io.rst:2740:26: F821 undefined name 'read_excel'
doc/source/io.rst:2741:26: F821 undefined name 'read_excel'
doc/source/io.rst:2744:12: F821 undefined name 'read_excel'
doc/source/io.rst:2766:4: F821 undefined name 'read_excel'
doc/source/io.rst:2773:4: F821 undefined name 'read_excel'
doc/source/io.rst:2780:4: F821 undefined name 'read_excel'
doc/source/io.rst:2787:4: F821 undefined name 'read_excel'
doc/source/io.rst:2794:4: F821 undefined name 'read_excel'
doc/source/io.rst:2861:4: F821 undefined name 'read_excel'
doc/source/io.rst:2868:4: F821 undefined name 'read_excel'
doc/source/io.rst:2882:4: F821 undefined name 'read_excel'
doc/source/io.rst:2893:4: F821 undefined name 'read_excel'
doc/source/io.rst:2905:4: F821 undefined name 'read_excel'
doc/source/io.rst:2919:4: F821 undefined name 'read_excel'
doc/source/io.rst:2937:4: F821 undefined name 'df'
doc/source/io.rst:2950:4: F821 undefined name 'df'
doc/source/io.rst:2957:9: F821 undefined name 'ExcelWriter'
doc/source/io.rst:2958:8: F821 undefined name 'df1'
doc/source/io.rst:2959:8: F821 undefined name 'df2'
doc/source/io.rst:2989:13: F821 undefined name 'ExcelWriter'
doc/source/io.rst:2990:4: F821 undefined name 'df'
doc/source/io.rst:3039:4: F821 undefined name 'df'
doc/source/io.rst:3042:13: F821 undefined name 'ExcelWriter'
doc/source/io.rst:3048:4: F821 undefined name 'df'
doc/source/io.rst:3733:7: F821 undefined name 'store'
doc/source/io.rst:3740:7: F821 undefined name 'store'
doc/source/io.rst:3750:7: F821 undefined name 'store'
doc/source/io.rst:3750:42: F821 undefined name 'string'
doc/source/io.rst:4203:4: F821 undefined name 'store'
doc/source/io.rst:4203:23: F821 undefined name 'df'
doc/source/io.rst:4734:9: F821 undefined name 'engine'
doc/source/io.rst:4845:30: F821 undefined name 'engine'
doc/source/io.rst:4846:30: F821 undefined name 'engine'
doc/source/io.rst:4861:4: F821 undefined name 'df'
doc/source/io.rst:4861:23: F821 undefined name 'engine'
doc/source/io.rst:4862:31: F821 undefined name 'engine'
doc/source/io.rst:4904:44: F821 undefined name 'engine'
doc/source/io.rst:4905:58: F821 undefined name 'engine'
doc/source/io.rst:4993:4: F821 undefined name 'data'
doc/source/io.rst:4993:24: F821 undefined name 'cnx'
doc/source/io.rst:4994:44: F821 undefined name 'con'
doc/source/io.rst:5213:9: F821 undefined name 'do_something'
doc/source/enhancingperf.rst:233:9: F821 undefined name 'apply_integrate_f'
doc/source/enhancingperf.rst:233:27: F821 undefined name 'df'
doc/source/enhancingperf.rst:233:36: F821 undefined name 'df'
doc/source/enhancingperf.rst:233:45: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:9: F821 undefined name 'apply_integrate_f'
doc/source/enhancingperf.rst:239:27: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:43: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:59: F821 undefined name 'df'
doc/source/contributing.rst:645:9: F821 undefined name 'warnings'
doc/source/contributing.rst:646:9: F821 undefined name 'new_func'
doc/source/contributing.rst:896:9: F821 undefined name 'tm'
doc/source/contributing.rst:897:8: F821 undefined name 'df'
doc/source/contributing.rst:912:5: F821 undefined name 'pytest'
doc/source/contributing.rst:926:9: F821 undefined name 'warch'
doc/source/contributing.rst:927:8: F821 undefined name 'warnings'
doc/source/contributing_docstring.rst:28:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:29:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:30:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:461:16: F821 undefined name 'random'
doc/source/contributing_docstring.rst:481:18: F821 undefined name 'random'
doc/source/contributing_docstring.rst:482:27: F821 undefined name 'random'
doc/source/contributing_docstring.rst:482:41: F821 undefined name 'string'
doc/source/contributing_docstring.rst:503:19: F821 undefined name 'random'
doc/source/contributing_docstring.rst:957:9: F821 undefined name 'Substitution'
doc/source/contributing_docstring.rst:958:9: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:964:9: F821 undefined name 'Substitution'
doc/source/contributing_docstring.rst:965:9: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:973:14: F821 undefined name 'Parent'
doc/source/contributing_docstring.rst:974:14: F821 undefined name 'ChildA'
doc/source/contributing_docstring.rst:975:14: F821 undefined name 'ChildB'
doc/source/contributing_docstring.rst:994:5: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:994:14: F821 undefined name 'template'
doc/source/contributing_docstring.rst:994:25: F821 undefined name '_shared_doc_kwargs'
doc/source/groupby.rst:86:19: F821 undefined name 'obj'
doc/source/groupby.rst:86:31: F821 undefined name 'key'
doc/source/groupby.rst:87:19: F821 undefined name 'obj'
doc/source/groupby.rst:87:31: F821 undefined name 'key'
doc/source/groupby.rst:88:19: F821 undefined name 'obj'
doc/source/groupby.rst:88:32: F821 undefined name 'key1'
doc/source/groupby.rst:88:38: F821 undefined name 'key2'
doc/source/groupby.rst:1277:4: F821 undefined name 'df'
doc/source/groupby.rst:1277:42: F821 undefined name 'report_func'
doc/source/comparison_with_sas.rst:171:4: F821 undefined name 'tips'
doc/source/timeseries.rst:902:16: F821 undefined name 'DateOffset'

bootstrapping pandas as pd, numpy as np reduces the output already.

The question that rises is: Does a reader of a documentation require all these names to be defined in order to understand the example?

In my opinion: No. And the boilerplate code around every example in the doc makes it even more unreadable and confusing. Documented code-blocks are not meant to be understood by a computer, but by a human. Most of the variables explain themeselves by what they are called: url is a url, key is a key ...

I think from all the output above Substitution and Appender are the most debatable examples.
If I would read the documentation my first guess would be they were imported anyway.

So as a conclusion: Forcing the documentation to have all names defined would force a kind of documentation that is less readable without being more clear.
Deciding weather to specify a name or not should be a choice of the writer for the documentation, when the semantic is clear it's all fine. F821 issues come purely from syntax.

datapythonista · 2018-10-27T09:24:58Z

I disagree. I think all examples should be explicit in defining or importing all objects, so they run, and users can reproduce them excatly in the way they are presented on the documentation.

If I'm new to pandas and I see in the documentation something like:

s.head()

I don't see how this can be useful to me. I'd rather have:

import pandas
s = pandas.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
s.head()

Which provides all the required information to understand, run, and edit and play with what is being shown.

We need to ignore F821 now, because it will take some time until all examples are fixed, and it'll be good to validate the rest of pep8 in the meantime, but I don't think we should accept or encourage having examples with missing context.

FHaase · 2018-10-27T10:55:56Z

Okay I see what you mean. My point is forcing every use of .. code-block:: python anywhere in the project to be a copy and pasteable example might lead to a more or less confusing situation for the reader. Let me quote some examples:

from pandas/core/accessor.py

Notes
-----
When accessed, your accessor will be initialized with the pandas object
the user is interacting with. So the signature must be

.. code-block:: python

    def __init__(self, pandas_object):  # noqa: E999
        ...
        
For consistency with pandas methods, you should raise an ``AttributeError``
if the data passed to your accessor has an incorrect dtype.

If I read that, it's absolutely clear what is meant, although it's invalid Syntax [E999]. If I imagine a class, imports, and something written within the function itself, it no longer correlates to So the signature must be.

Having to use invalid Syntax is less common.

from doc/source/groupby.rst

pandas objects can be split on any of their axes. The abstract definition of
grouping is to provide a mapping of labels to group names. To create a GroupBy
object (more on what the GroupBy object is later), you may do the following:

.. code-block:: python

    # default is axis=0
    >>> grouped = obj.groupby(key)
    >>> grouped = obj.groupby(key, axis=1)
    >>> grouped = obj.groupby([key1, key2])

what exactly obj, key, key1, key2 is is not important at this point of the documentation. And I think it should be the choice of the writer how abstract the code should be to be most clear.

After writing a section with mostly comments like this explaining how it works, someone could include a complete working example in a regular .py file included with .. literalinclude::

That way it's possible to run tests on the code so it can get updated when the api changes. And its flake8 and not flake8-rst that checks the code_style.

jorisvandenbossche · 2018-10-27T12:18:18Z

Note that a bunch of the warnings you listed above in #23154 (comment) are due to not properly importing the used packages / not properly using the imported name (eg all the read_excel warnings should be solved by using pd.read_excel).
See #9886 for this (already a lot has been cleaned up, but io.rst is clearly still to be done)

But, that said, I agree with @FHaase that not all code blocks should be exactly running code. Often, it can be more educational to show a certain snippet.

But other things:

The flake-rst plugin does not check code blocks in .. ipython:: python directives, is that correct? This is the majority of the code blocks
It seems to me (from the output you show above), that code-block:: directives are also not checked if they have like IPython input/output prompts in them (like In [8]: )

FHaase · 2018-10-27T14:09:54Z

Yes, currently not everything gets checked. The regex to find the parts is
\.\. (code-block|sourcecode):: (python|pycon)\n
I opened a PR to also include ipython within the checks as well as >>> ... outside of code-block directives.
So I hope shortly .. ipython:: python directives will be covered.

In order to support In [8]: the regex needs to be modified.
.. code-block:: or the very short :: without a language specification might contain ruby, java etc. code so linting would be problematic. -> solution could be to disallow these directives.

I've marked the PR to close this issue, because once flake8-rst finds some python-code it gets checked by flake8 just like any other .py file.
Further improvements could be done by extending the amount of code that flake8-rst finds, but that is in the scope of that plugin, not this project.

datapythonista · 2018-11-09T14:31:36Z

Fixed in #23399

datapythonista added Docs CI Continuous Integration Code Style Code style, linting, code_checks Effort Low good first issue labels Oct 14, 2018

jorisvandenbossche mentioned this issue Oct 14, 2018

DOC: docstring validation script improvements #20298

Open

19 tasks

datapythonista mentioned this issue Oct 14, 2018

DOC: Updating operators docstrings #20415

Merged

5 tasks

dubielt1 mentioned this issue Oct 17, 2018

[WIP] Validate PEP-8 in docstrings #23194

Closed

5 tasks

FHaase mentioned this issue Oct 26, 2018

ENH: Tox as main entry point for development #23356

Closed

FHaase mentioned this issue Oct 27, 2018

Integrate flake8_rst into ./ci/code_check.sh #23381

Merged

2 tasks

FHaase mentioned this issue Oct 29, 2018

ENH: Use flake8 to check for PEP8 violations in doctests #23399

Merged

2 tasks

FHaase mentioned this issue Nov 5, 2018

Discussion: General ignore F821 within documentation #23508

Closed

datapythonista closed this as completed Nov 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate PEP-8 in docstring examples #23154

Validate PEP-8 in docstring examples #23154

datapythonista commented Oct 14, 2018 •

edited

Loading

dubielt1 commented Oct 16, 2018

FHaase commented Oct 18, 2018

datapythonista commented Oct 25, 2018

FHaase commented Oct 25, 2018

datapythonista commented Oct 25, 2018

FHaase commented Oct 25, 2018

jorisvandenbossche commented Oct 26, 2018

FHaase commented Oct 26, 2018 •

edited

Loading

datapythonista commented Oct 27, 2018

FHaase commented Oct 27, 2018

datapythonista commented Oct 27, 2018

FHaase commented Oct 27, 2018

jorisvandenbossche commented Oct 27, 2018

FHaase commented Oct 27, 2018

datapythonista commented Nov 9, 2018

Validate PEP-8 in docstring examples #23154

Validate PEP-8 in docstring examples #23154

Comments

datapythonista commented Oct 14, 2018 • edited Loading

dubielt1 commented Oct 16, 2018

FHaase commented Oct 18, 2018

datapythonista commented Oct 25, 2018

FHaase commented Oct 25, 2018

datapythonista commented Oct 25, 2018

FHaase commented Oct 25, 2018

jorisvandenbossche commented Oct 26, 2018

FHaase commented Oct 26, 2018 • edited Loading

datapythonista commented Oct 27, 2018

FHaase commented Oct 27, 2018

datapythonista commented Oct 27, 2018

FHaase commented Oct 27, 2018

jorisvandenbossche commented Oct 27, 2018

FHaase commented Oct 27, 2018

datapythonista commented Nov 9, 2018

datapythonista commented Oct 14, 2018 •

edited

Loading

FHaase commented Oct 26, 2018 •

edited

Loading