DOC: Fix flake8 issues in doc/source/groupby.rst #24178

Closed
datapythonista opened this issue Dec 9, 2018 · 12 comments

Comments

@datapythonista
Member

Until recently, we did not validate PEP 8 and other code standards in the documentation examples. Some files still contain errors, so for now they are excluded from validation; we should fix them so that they can be validated too.

The first step is to edit setup.cfg in the pandas root directory and, in the flake8-rst section, remove doc/source/groupby.rst from the exclude list.

After that, running the following command will report the errors in the file. Note that a syntax error usually prevents other errors from being detected, so the list of errors to fix can become much longer once the syntax error is fixed. Please make sure that you are using flake8-rst version 0.7.0 or higher:

$ flake8-rst doc/source/groupby.rst 
doc/source/groupby.rst:242:15: E225 missing whitespace around operator
doc/source/groupby.rst:242:15: E999 SyntaxError: invalid syntax
doc/source/groupby.rst:242:19: E225 missing whitespace around operator

Once all the errors are addressed, please open a pull request that fixes the file and removes it from the exclude list in setup.cfg. If fixing an error requires something that feels wrong, please ask in a comment on this issue. Please avoid unrelated changes; they can be addressed in a separate pull request.
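
As a rough illustration of the first step, the sketch below parses a made-up `setup.cfg` fragment and drops one entry from the `exclude` list of its `flake8-rst` section. The fragment's contents (the second file name in particular) are hypothetical and are not copied from the pandas repository, and `exclude_without` is a helper name invented for this example.

```python
import configparser

# Hypothetical setup.cfg fragment; the real file lists other entries.
SETUP_CFG = """\
[flake8-rst]
exclude =
    doc/source/groupby.rst
    doc/source/some_other_file.rst
"""


def exclude_without(cfg_text, path):
    """Return the flake8-rst exclude entries with ``path`` removed."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("flake8-rst", "exclude")
    # One entry per line; tolerate trailing commas and blank lines.
    entries = [line.strip().rstrip(",") for line in raw.splitlines()]
    return [entry for entry in entries if entry and entry != path]


remaining = exclude_without(SETUP_CFG, "doc/source/groupby.rst")
print(remaining)
```

In practice the edit is made by hand in the file; the code only shows which entry is meant to disappear.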

@LJArendse
Contributor

@datapythonista I would like to give this issue a try :)

@datapythonista
Member Author

please do, and let me know if you have questions or need help, thanks @LJArendse

LJArendse added a commit to LJArendse/pandas that referenced this issue Dec 10, 2018
@LJArendse
Contributor

@datapythonista
I have a question about the following 'syntax error' found on line 242:

doc/source/groupby.rst:242:15: E225 missing whitespace around operator
doc/source/groupby.rst:242:15: E999 SyntaxError: invalid syntax
doc/source/groupby.rst:242:19: E225 missing whitespace around operator

Line 242 looks as follows:

239		.. ipython::
240
241		@verbatim
242		In [1]: gb.<TAB>
243		gb.agg        gb.boxplot    gb.cummin     gb.describe   gb.filter     gb.get_group  gb.height     gb.last       gb.median     gb.ngroups    gb.plot       gb.rank       gb.std        gb.transform
244		gb.aggregate  gb.count      gb.cumprod    gb.dtype      gb.first      gb.groups     gb.hist       gb.max        gb.min        gb.nth        gb.prod       gb.resample   gb.sum        gb.var
245		gb.apply      gb.cummax     gb.cumsum     gb.fillna     gb.gender     gb.head       gb.indices    gb.mean       gb.name       gb.ohlc       gb.quantile   gb.size       gb.tail       gb.weight

The <TAB> is throwing the SyntaxError. My fix is the following:

239		.. ipython::
240
241		@verbatim
242		# After typing "gb." in the IPython terminal, press the <Tab> key to tab-complete any of the commands below.
243		In [1]: gb.ClickTab
244		gb.agg        gb.boxplot    gb.cummin     gb.describe   gb.filter     gb.get_group  gb.height     gb.last       gb.median     gb.ngroups    gb.plot       gb.rank       gb.std        gb.transform
245		gb.aggregate  gb.count      gb.cumprod    gb.dtype      gb.first      gb.groups     gb.hist       gb.max        gb.min        gb.nth        gb.prod       gb.resample   gb.sum        gb.var
246		gb.apply      gb.cummax     gb.cumsum     gb.fillna     gb.gender     gb.head       gb.indices    gb.mean       gb.name       gb.ohlc       gb.quantile   gb.size       gb.tail       gb.weight

Do you have any suggestions for how we can address this better? I don't think gb.ClickTab is an ideal way to practically show the tab completion...
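
To see why the `<TAB>` placeholder trips E999, here is a small sketch that asks Python's own parser whether a snippet is valid. It assumes the checker parses the text that follows the `In [1]:` prompt as ordinary Python; `is_valid_python` is a hypothetical helper name, not part of flake8 or flake8-rst.

```python
import ast


def is_valid_python(source):
    """Return True if ``source`` parses as Python code."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid_python("gb.<TAB>"))  # the placeholder is not valid attribute access
print(is_valid_python("gb.agg"))    # ordinary attribute access
```

A renamed attribute such as `gb.ClickTab` would also parse, which is why it silences the error, but it no longer shows the reader what to type.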

@datapythonista
Member Author

I think we already fixed this with a noqa comment in another file. Running grep "<TAB>" *.rst should find it quickly.
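
For anyone without grep at hand, the same search can be sketched in Python. `find_marker` is a hypothetical helper, and the `doc/source` path in the usage comment is an assumption about where the search would be run.

```python
from pathlib import Path


def find_marker(directory, marker="<TAB>"):
    """Yield (file name, line number, line) for each .rst line containing marker."""
    for path in sorted(Path(directory).glob("*.rst")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                yield path.name, number, line.strip()


# Usage, assuming the script is run from the pandas repository root:
# for hit in find_marker("doc/source"):
#     print(*hit)
```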

@LJArendse
Contributor

Thanks for the help, that's awesome. I will give it a try.

@LJArendse
Contributor

@datapythonista thanks for the help, I found it in computation.rst.
Could you explain what the noqa does in:

266                        In [14]: r.<TAB>                                          # noqa: E225, E999

@datapythonista
Member Author

When flake8 finds a noqa comment on a line, it does not report the error codes listed in that comment for that line.
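
A simplified model of that behaviour, for illustration only: the sketch pulls the codes out of a `# noqa:` comment and filters a list of reported codes against them. The regular expression and both helper names are assumptions made for this example and are not flake8's actual implementation.

```python
import re

# Simplified pattern for "# noqa: CODE, CODE"; flake8's own matching differs.
NOQA_PATTERN = re.compile(
    r"#\s*noqa:\s*([A-Z][0-9]+(?:[,\s]+[A-Z][0-9]+)*)",
    re.IGNORECASE,
)


def suppressed_codes(line):
    """Return the set of error codes named in a noqa comment on ``line``."""
    match = NOQA_PATTERN.search(line)
    if match is None:
        return set()
    return {code.upper() for code in re.split(r"[,\s]+", match.group(1))}


def remaining_errors(line, reported):
    """Drop the reported codes that the line's noqa comment suppresses."""
    suppressed = suppressed_codes(line)
    return [code for code in reported if code not in suppressed]


line = "In [14]: r.<TAB>    # noqa: E225, E999"
print(remaining_errors(line, ["E225", "E999", "F821"]))
```

Only the listed codes are silenced, so any other problem on the same line would still be reported.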

LJArendse added a commit to LJArendse/pandas that referenced this issue Dec 13, 2018
@LJArendse
Contributor

@datapythonista What do you suggest is the best way to fix the following errors:

flake8-rst doc/source/groupby.rst
doc/source/groupby.rst:72:18: F821 undefined name 'obj'
doc/source/groupby.rst:72:30: F821 undefined name 'key'
doc/source/groupby.rst:73:18: F821 undefined name 'obj'
doc/source/groupby.rst:73:30: F821 undefined name 'key'
doc/source/groupby.rst:74:18: F821 undefined name 'obj'
doc/source/groupby.rst:74:31: F821 undefined name 'key1'
doc/source/groupby.rst:74:37: F821 undefined name 'key2'

The lines in question are:

   >>> grouped = obj.groupby(key)
   >>> grouped = obj.groupby(key, axis=1)
   >>> grouped = obj.groupby([key1, key2])

It is a very good generic explanation of how to group an object. Should I keep it as is, or should I show the same example with an actual DataFrame and dummy data?
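
The F821 reports can be reproduced in miniature. The sketch below is a deliberately naive undefined-name check built on the standard `ast` module, assuming a flat snippet with no functions or nested scopes; it is not how pyflakes works internally, and `undefined_names` is a hypothetical helper.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read in ``source`` but never bound in it."""
    bound = set()
    loaded = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            # "import a.b" binds "a"; "import a as b" binds "b".
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return sorted(loaded - bound - set(dir(builtins)))


snippet = "grouped = obj.groupby([key1, key2])"
print(undefined_names(snippet))
```

Because `obj`, `key1` and `key2` are never assigned anywhere in the snippet, a checker has no way to tell them apart from typos, which is why defining real data makes the warnings go away.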

@LJArendse
Contributor

LJArendse commented Dec 15, 2018

@datapythonista A suggested actual example would be something like this:

A groupby can be applied to a pandas object in the following ways:

grouped = obj.groupby(key)
grouped = obj.groupby(key, axis=1)
grouped = obj.groupby([key1, key2])

Below, groupby is applied to a DataFrame object:

import pandas as pd
import numpy as np

n = 1000

df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                   'Product': np.random.choice(['Product_1',
                                                'Product_2'], n),
                   'Revenue': (np.random.random(n) * 50 + 10).round(2),
                   'Quantity': np.random.randint(1, 10, size=n)})
key = 'Product'
key1 = 'Store'
key2 = 'Product'

grouped = df.groupby(key)
grouped = df.groupby(key, axis=1)
grouped = df.groupby([key1, key2])

@datapythonista
Member Author

Looks good, but I'm not a big fan of random data. Maybe you can use the same example as in: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html

Also,

  • I wouldn't define key, key1, etc.; just use the values directly
  • axis='columns' is more explicit than axis=1
  • You don't need the imports, they are defined at the beginning of the file already
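
Following those suggestions, one possible shape for the example is sketched below. The frame is modelled on the `reset_index` documentation example, but its exact rows and the extra `order` column are assumptions made here, not the content of the change that was eventually merged. The imports are included only so the snippet runs on its own (the documentation file already provides them), and the `axis` variant is left out because the `axis` argument of `groupby` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Small, deterministic frame (hypothetical rows) instead of random data.
df = pd.DataFrame(
    [('bird', 'Falconiformes', 389.0),
     ('bird', 'Psittaciformes', 24.0),
     ('mammal', 'Carnivora', 80.5),
     ('mammal', 'Primates', np.nan)],
    index=['falcon', 'parrot', 'lion', 'monkey'],
    columns=['class', 'order', 'max_speed'],
)

# Group by a single column, passing the value directly.
by_class = df.groupby('class')

# Group by several columns at once.
by_class_and_order = df.groupby(['class', 'order'])

print(by_class['max_speed'].max())
print(by_class_and_order.ngroups)
```

With fixed data the rendered output is the same on every documentation build, and every name in the snippet is defined, so F821 has nothing to report.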

LJArendse added a commit to LJArendse/pandas that referenced this issue Dec 19, 2018
@LJArendse
Contributor

@datapythonista Done, flake8-rst doc/source/groupby.rst is not reporting any more errors. I will open a pull request for you to review my changes.

@datapythonista
Member Author

Closed by #24363
