[pycodestyle] Add E12 rules (continuation lines indentation rules)
#8557
Add E122 rule and base for adding the other E12 rules, based on the pycodestyle implementation. Autofix and configuration are not included.
…le implementation is broken.
This reverts commit 239c164.
Change indent_chances into a HashMap to keep track of what kind of indent was on which line.
CodSpeed Performance Report: merging #8557 will degrade performances by 7.6%.
Also add the rule examples to KNOWN_FORMATTING_VIOLATIONS.
c53a28f to f398c96
As a note, fixes for these would bring Ruff closer to parity with autopep8: #9057 (comment)
This is nearly the last missing pycodestyle check. Having it would allow us to throw away the slow
@MichaReiser sorry to ping you here, but I would appreciate some feedback/advice on this PR. I had implemented the E12 rules for completeness (to get closer to having all the boxes checked in #2402), but I went at it lazily and used the same logic as pycodestyle, which resulted in the PR not even passing the CI. Do you think it would be worth it to spend time trying to fix/rework it, or should this PR be closed and forgotten?
@hoel-bagard I'll try to take a look on Monday. I have a lot of catching up to do on PRs and, unfortunately, I'm not as fast as @charliermarsh.
Hey @hoel-bagard
I haven't reviewed the code yet but I read through the rules.
My main concern is that not all rules are formatter-compatible. For example, `ClosingBracketNotMatchingOpeningBracketVisualIndentation` is guaranteed to conflict with all formatted code. This is problematic because many users use `--select ALL`, which enables these new rules, and they then get a lot of errors if they use our formatter (and see an incompatibility warning when running format). That's why we want to hold off from adding new formatting-related lint rules that are formatter-incompatible until we're done with the rules re-categorization, where we are likely to have a `format` (or similar) related category that we only recommend for users that do not use our formatter.
I'm having a hard time judging if other rules conflict with the formatter, too, from just reading their documentation. E.g. I'm unsure about `ContinuationLineOverIndentedForHangingIndent` and `ContinuationLineOverIndentedForVisualIndent`. I believe `VisuallyIndentedLineWithSameIndentAsNextLogicalLine` and `ContinuationLineUnalignedForHangingIndent` are incompatible, too (at least today, we are considering changing that).
I'm sorry that this means we can't move forward with the rules as they are now. We could explore limiting the rules to the formatter-compatible ones or finding a way to make them formatter-compatible.
}

/// ## What it does
/// Checks for continuation line over-indented for visual indent.
The difference between this and `ContinuationLineOverIndentedForHangingIndent` and `ContinuationLineUnderIndentedForVisualIndent` isn't clear to me from reading the documentation. Are these different styles and users should only enable one of those rules?
It seems to me like they could both apply in one codebase, although that would not be a very consistent codebase.
Looking at the fixtures from pycodestyle, `E126` (`ContinuationLineOverIndentedForHangingIndent`) seems to apply when there is a newline after the symbol causing the continuation line (`(`, `\`, etc.), whereas `E128` (`ContinuationLineUnderIndentedForVisualIndent`) applies when there is at least one value after the start of the continuation line.
For example, this is an `E126` error:
print("E126", (
        "1",
        "2",
))
Whereas this is an `E128` error:
print("E128", ("1",
    "2",
))
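The distinction described above (newline right after the opening bracket versus a value right after it) can be sketched with Python's standard `tokenize` module. This is only an illustration of the hanging/visual split, not the logic used by pycodestyle or by this PR; the function name `bracket_styles` and the indentation used in the sample inputs are assumptions.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def bracket_styles(source: str) -> dict[tuple[int, int], str]:
    """Map the (row, col) of each multi-line bracket to "hanging" or "visual".

    Hypothetical helper: "hanging" means the opening bracket is directly
    followed by a line break (or a comment), "visual" means a value follows
    it on the same line.
    """
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    open_stack: list[tuple[tuple[int, int], str]] = []
    styles: dict[tuple[int, int], str] = {}
    for index, tok in enumerate(tokens):
        if tok.type != tokenize.OP:
            continue
        if tok.string in OPENERS:
            # Inside brackets, line breaks are reported as NL tokens.
            following = tokens[index + 1]
            hanging = following.type in (tokenize.NL, tokenize.COMMENT)
            open_stack.append((tok.start, "hanging" if hanging else "visual"))
        elif tok.string in CLOSERS and open_stack:
            start, style = open_stack.pop()
            # Only brackets that span several rows have continuation lines.
            if tok.start[0] != start[0]:
                styles[start] = style
    return styles


# Sample inputs; the exact indentation is an assumption.
hanging_example = 'print("E126", (\n        "1",\n        "2",\n))\n'
visual_example = 'print("E128", ("1",\n    "2",\n))\n'
print(bracket_styles(hanging_example))
print(bracket_styles(visual_example))
```

In both samples the inner bracket sits at row 1, column 14; it is classified as hanging in the first and visual in the second, which matches the rule families (E126 versus E128) named above.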
for ch in line.chars() {
    if ch == '\t' {
        indent = indent / 8 * 8 + 8;
This would need to support the new `indent_width` setting.
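As a rough illustration of the request above, the hard-coded tab stop of 8 can be turned into a parameter. The sketch below is in Python rather than the PR's Rust, and the function name `expand_indent` and the way the parameter would be wired to the actual setting are assumptions.

```python
def expand_indent(line: str, indent_width: int = 8) -> int:
    """Return the visual indentation of `line`, expanding tabs.

    Hypothetical sketch: a tab advances to the next multiple of
    `indent_width` instead of the hard-coded 8.
    """
    indent = 0
    for ch in line:
        if ch == "\t":
            # Round down to the previous tab stop, then move to the next one.
            indent = indent // indent_width * indent_width + indent_width
        elif ch == " ":
            indent += 1
        else:
            break
    return indent
```

With `indent_width=4`, two spaces followed by a tab count as 4 columns, whereas the default of 8 reproduces the arithmetic in the snippet under review.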
Hello, I want to emphasize that this addition would be very welcome. I had many instances where I would have liked ruff to tell me about over-indentation (E127) when I refactored my code (removing arguments from a function that changed the indentation). I am interested in this PR. If I have the time and the code is not yet ready to be merged, I'll look into this in the next month.
Having rules that conflict with the formatter would be quite annoying indeed (I already noticed the formatter disagreeing with flake8, and that meant having to add an exception to the flake8 config).
I tried running ruff with the
I tried to make an example that would cause the formatter and the
The following triggers
if (var in "one_looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_string"
    and var not in "another_looooooooooooooooooooooooooooooooooooooooooooooooooooong_string"):
    ...
But it gets formatted into the
if (
    var
    in "one_looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_string"
    and var
    not in "another_looooooooooooooooooooooooooooooooooooooooooooooooooooong_string"
):
    ...
The docstring examples and fixtures would definitely need to be simplified; I'll try to do that today/tomorrow. I wanted to run the formatter on the
@MichaReiser I ran ruff format on the
The errors that did not get fixed automatically are both
if """
""":
    pass
for foo in """
abc
def
""".strip().split():
    print(foo)
They can be fixed by manually formatting them to:
if """""":
    pass
for foo in """
abc
def
""".strip().split():
    print(foo)
(this changes the content of the string, but that's not really the point here)
At which point the formatter does not attempt to change the file anymore, and both pycodestyle and the rules added in this PR do not detect any
So it seems like the
I haven't forgotten about this PR, but we first need to decide internally if we want to add more formatting lint rules, and my priority for now is to fix incompatibilities with existing rules.
Thank you @hoel-bagard for your patience and sorry that it took me so long. I started reviewing the rules but stopped after E125 because discovering all possible formatter incompatibilities took a very long time, and I generally struggled to understand the rules or how they differ (this isn't a critique, it's just me struggling and not being familiar with the rules).
For
a = (
    a
    + b
    + len([
        1,
        2,
    ])  # NOK
    + more([
        a,
    ])  # NOK
)
I'm generally open to accepting the rules
The other thing we need to consider is that the rule seems to regress performance somewhat significantly. I think we need to figure out a design that regresses performance less. This could involve refactoring existing pycodestyle rules to see if performance can be improved.
What do you think? Do you want to pursue the rules in individual PRs (please don't open all PRs at once, let's do one at a time)?
Thanks for taking time to look into it. Implementing the rules one by one sounds good to me, that would also likely make understanding what each rule does easier.
I suspect that working directly on tokens would help with the performance issue. I used logical lines because that's what pycodestyle does, but I don't think that was a good idea.
I can try to start working on
Logical lines might be fine, I'm not sure. One problem I noticed while skimming over the code is that we allocate multiple vectors and hash sets for each line. I think we should look into how we can avoid that. It can mean that we should use something other than logical lines, reuse the allocations across check calls, etc.
Thanks a lot! I'll close this PR. We can still use it as a reference. I hope that's okay with you.
I think the main issue with logical lines is that they have to be sliced to look for new lines (like here for example). This currently happens for every token, which I assume is the slow part.
If you're referring to this, then it's being done for every token, which is most likely more than needed indeed.
No issue at all. I'd like to keep using this PR as a reference / discussion point to avoid going in the wrong direction.
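One possible way to avoid slicing the line text for every token, sketched here in Python rather than the PR's Rust: collect the newline offsets once per logical line and binary-search them for each token. The helper names `newline_offsets` and `row_of` are hypothetical, and whether this removes the measured regression is untested.

```python
from bisect import bisect_left


def newline_offsets(text: str) -> list[int]:
    """Collect the offsets of all newlines in one pass over the text."""
    return [i for i, ch in enumerate(text) if ch == "\n"]


def row_of(offset: int, newlines: list[int]) -> int:
    """Return the 0-based row of `offset`: the number of newlines before it."""
    return bisect_left(newlines, offset)


def row_of_naive(offset: int, text: str) -> int:
    """Per-token slicing, the pattern the discussion identifies as slow."""
    return text[:offset].count("\n")


text = "a = (\n    b,\n    c)\n"
newlines = newline_offsets(text)  # computed once, reused for every token
for name in ("a", "b", "c"):
    offset = text.index(name)
    print(name, row_of(offset, newlines), row_of_naive(offset, text))
```

The precomputed list costs one pass over the text, after which each lookup is logarithmic in the number of newlines instead of linear in the offset.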
Summary
This PR is part of #2402; it adds the `E12` rules (continuation lines indentation rules).
Test Plan
The test fixture uses the one from pycodestyle, except for `E133`, for which there is no fixture since it is an opt-in config option (same as pycodestyle) and I do not know how to handle it.
Discussion
This PR replaces the add E122 PR.
Performance issue
The changes from the PR seem to slow down ruff quite a bit. I'm guessing that the `get_token_infos` function is the slow part, but it was needed to follow the pycodestyle implementation. Is there a way to implement it that would be faster?
Extra rules
The PR is essentially a port of the pycodestyle implementation from Python to Rust. I implemented all the errors present in that function before realizing that E121, E123, E126 and E133 are not in the #2402 list. Should I remove them from the PR?
Test error
When running the tests, the following deserialization test fails:
I'm not sure what to do about it, although I suppose the issue would disappear if `E123` is removed from the PR.