-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485
Conversation
…parsing of tabbed .litcoffee files; and more accurate stack traces (assuming you don’t do your own word wrapping within list items)
src/helpers.coffee
Outdated
# `marked` really doesn’t process tabs too well. All this code is headed | ||
# into JavaScript anyway, so replace the tabs with two spaces to make | ||
# `marked`’s job a little easier. | ||
code = code.replace /^(\t)+/gm, (match) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about multiline strings and regexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code
here is the source code of an entire file. This function is intending to replace all leading tabs (between the start of any line and the first non-tab character) with 2 spaces × the number of leading tabs on that line. Does it not do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m worried that it changes the meaning of mulitline strings and regexes.
Here’s a multiline string:
myString = """
line one
line two
"""
eq myString, "line one\n\tline two"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps only replacing the first tab on each line is enough? code.replace /^\t/gm, ' '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the previous approach, replacing the first tab with a tab plus a magic token. Then after the parsing by Marked, the tab plus token were replaced back with a tab. For some reason I was getting lots of compiler errors regarding indentation when I did that, that I couldn't figure out. I'll look into it some more.
Many text editors have a function to split long lines and insert newlines to make all lines fit within, say, 80 columns. In my experience, I’ve seen just as much “soft wrapped” as “hard wrapped” markdown in the wild. |
Throw a Parsing |
Just an aside: Almost every style guide for every language recommends spaces over tabs these days. Officially, Python only tolerates tabs to support existing codebases. There's an opportunity now to mandate spaces. It's obviously going to upset some people, but it would simplify things if the new spec simply disallowed tabs in source code entirely. |
@carlsmith That would be a breaking change with no upside for users. Sure it would make our lives easier, but supporting tabs isn't so hard. The compiler shouldn't also be a linter. |
How bad would it be to return to the v1 implementation? What concrete bugs does using Marked solve? At the very least it makes stack traces difficult to keep correct. @billymoon |
Most languages are trying to phase out the use of tabs, but if it's a non-starter, that's cool. It was worth a shot. Thanks for everything you've been doing @GeoffreyBooth. Appreciate it. |
Yet an approach would be to switch to another markdown parser, such as markdown-it. |
@GeoffreyBooth I am actually in favour of removing support for literate coffee from the project, and creating a more general project to handle any file-type, which is what I was experimenting with a POC for (https://github.com/billymoon/illiterate). The new workflow would be something like...
... and for other files ...
... etc Ultimately, I really like literate style, and want to be able to use it on all my code, not just coffee-script. Actually, I remember when adding a |
Just a thought: If the CoffeeScript compiler allows scripts to import other CoffeeScript modules, wouldn't it need to understand literate files? I may be wrong. I'm not familiar with how imports work, but in interpreted languages, the interpreter would need to understand literate syntax to import those files. |
CoffeeScript doesn’t care what you try to import. |
@billymoon take a look at my illiterate branch. I agree that that’s a good approach long-term, but it would be a pretty major change to CoffeeScript at the moment; not least because part of the compiler itself is written in .litcoffee. One big problem that needs solving is source maps, or rather how to preserve accurate line numbers in stack traces. Check out that branch and introduce a syntax error near the end of Also it really should understand tabs properly. Regardless of whatever movement I may be unaware of to get rid of tabs across the world, they make up substantial percentages of repos on GitHub, and I would only expect that to increase as more people realize there are no technical limitations anymore to working with tabs and source control. Tabs provide the benefit that different members of a team can choose their own indentation level (i.e. tell their editor to render a tab as being 2 spaces or 4) and that has really kept the peace on many of the teams I manage. I wouldn’t want to lose that option. |
Please let’s not discuss tabs vs spaces. Markdown supports both spaces and tabs for code blocks, so we have to as well. Period. :) https://daringfireball.net/projects/markdown/syntax#precode |
@GeoffreyBooth - I never said anything about any movements across the world. I said...
...which is objectively true, these days. Tons of code exists that was written with tabs, which is why they're a bitch to phase out, but the general consensus, at least amongst the authors of official style guides, is that we should stop using them. The Python Style Guide says don't use tabs. Same for Ruby, PHP and Java, as well as Google's style guides for CSS, C++ and JavaScript. They're the first examples I could find yesterday during a similar conversation. I just looked at Google's Markdown Style Guide as well, and it says "use four spaces". You made it clear that you aren't interested, which I happily accepted, and thanked you for your hard work:
I don't understand why you mocked me for that, but it doesn't really matter. If you think tabs are better than spaces, and that they're going to grow in popularity (despite the style guides), then you should obviously support them. There's no point countering your argument for tabs over spaces, as we've all heard both sides of that debate a thousand times. |
@carlsmith My apologies if I came across as mocking, that wasn’t my intent. And I agree with @lydell, I don’t want yet another flame war about tabs vs spaces. I was trying to avoid taking a side; my point was just that tabs are still quite popular, despite style guides advising against them, and so we need to support them. I doubt many of those projects are just legacy codebases that haven’t been updated to the latest styles. Spaces have been most languages’ recommended style for a long, long time. |
@GeoffreyBooth - No problem. To be fair, I'm tired, and probably shouldn't have let your comment bother me as much as I did. Sorry mate. I have these days sometimes, and can only apologise. I understand what you're saying about tabs, and it does make sense. I'm a four-spaces guy, so there's a good chance I'm guilty of an unconscious bias here. The thing is, when you don't really contribute to a project, but care about it a lot, you end up in a tricky situation. On one hand, you always want to chip in with your own perspective, but on the other hand, you don't want to be the ideas-guy that argues for a particular position, but then expects someone else to implement it. |
@carlsmith And I bet it irks you to read the CoffeeScript codebase with its 2-space indentation, doesn’t it? If only it had been written in tabs, that you could choose to display as 4 spaces . . . 😛 I wouldn’t say you don’t contribute to the project. You’ve written a lot in many of the issue threads, and your comments have been valuable and helped guide the discussion and consensus. That in and of itself is a positive contribution, and I appreciate it. At work I manage teams of developers. I’ve had to mediate this dispute way too many times. I’ve adopted a policy of letting each team decide for themselves per language what they want. Most people—certainly not everyone, but more than half—prefer spaces, and often they can agree on either 2 spaces or 4 for indentation; but sometimes there are fierce disagreements over 2 spaces versus 4. When that happens the solution is to compromise on tabs, so that each camp can see the code the way they prefer. I once had someone insist that even that wasn’t good enough, that she needed not just 4-space indentation but 4 actual space characters as well, and I wrote her some Sublime Text macros to auto-format her code to 4-spaces on file open and back to tabs on save. That’s pretty extreme, but it gets to the larger point that we’re not in the 1970s or 1990s anymore: we’re not constrained by technical limitations that force one way or another, so each team member should get to see their own preference, just like each developer gets to choose their own editor or syntax highlighting. There’s nothing wrong with tabs, like there’s nothing wrong with 2 spaces or 4. They’re all reasonable choices, and we shouldn’t make people preprocess their code to our standard before the compiler will accept it. I don’t know about Python, but Go is going in the opposite direction: http://stackoverflow.com/a/19094725/223225 |
You've got me there :)
Thank you. That's very kind of you. The Go approach is really interesting. |
@GeoffreyBooth - I hope your team appreciate how well you deal with people. People are difficult. You have the qualities a good lead should have, but all too often lack. Thanks again mate, and sorry for taking things the wrong way. |
@lydell thanks for the tip, I updated illiterate to use markdown-it @GeoffreyBooth I broke the back of adding source maps to illiterate, currently as inline data uri string. I need to think about the whole cli command structure to allow for optional source maps and on a per file basis, and how to deal with input from stdio, but the source map writing part is done (mapping full source line to destination line, disregarding columns, which are going to be identical in this case). Does that help at all with your experimental illiterate branch? |
@billymoon I’m not sure how I would pass along and process the sourcemaps generated by Illiterate in the main CoffeeScript parser. The I at least figured out why the “replace tabs with a magic token” approach was failing: Marked itself replaces tabs with spaces! See: node -e 'console.log(require("marked").lexer("\t\tfoo();\t\t\tbar();", {}))'
[ { type: 'code', text: ' foo(); bar();' },
links: {} ] |
…eScript files; use MarkdownIt’s `map` property to preserve correct line numbers
…tarting indentation
…s must be unindented
…ed a blank line separating them from the list item text
Okay, I think I’ve got it! Including correct line numbers in stack traces! The function is so short now I’ll just paste it here: exports.invertLiterate = (code) ->
out = []
md.renderer.rules =
code_block: (tokens, idx) ->
startLine = tokens[idx].map[0]
lines = tokens[idx].content.split '\n'
for line, i in lines
out[startLine + i] = line
md.render code
out.join '\n' Yes it could squeeze even shorter, but I’m not trying to golf here. Anyway let’s unpack this. First, @lydell’s suggestion of MarkdownIt ( Second, @billymoon’s commit had the brilliant idea of using MarkdownIt’s rules engine for pulling out the code blocks. Much easier than my first idea, which was to use its I inspected what the There is one caveat: the new library introduces a two minor breaking changes to Literate CoffeeScript:
I can live with these changes. I had to modify two tests to accommodate them, but nothing in |
@GeoffreyBooth just checking, is it still desirable to externalise literate feature and pass source maps from illiterate? I can have a bash at that if it's useful. p.s. you know the code is right when there are only 10 lines :) |
@billymoon Yes and no 😄 I think at this point people expect the CoffeeScript compiler to process There’s actually no need for source maps using the final implementation I have now, as every line of code is at the same row and column as the original Markdown; Illiterate could take the same approach, which might simplify things. |
@GeoffreyBooth well, if that's the case, I don't really see much use in putting illiterate in. Keep up the good work :) |
@billymoon sorry. I support what you’re doing though, I think you’re right and Illiterate is the way Literate CoffeeScript should’ve been implemented in the first place. If you want to propose a way to replace our current hodgepodge with Illiterate, and it wouldn’t be a breaking change, I would heartily support that. |
Fixes #4463.
So since #4345, code blocks in .litcoffee files that were indented with tabs were simply skipped over, and the code never executed. That’s why the tests were “passing,” because they were never running. Oops!
I wanted the tests to be a bit more thorough, so I duplicated the
literate.litcoffee
file to a version where all the code blocks are indented with tabs. Now not only are we testing that tabbed code blocks work, but that they work in all the same scenarios as spaced code blocks.After a lot of investigation, and a detour in trying to implement this with Illiterate (see branch) I think the root cause is that our Markdown parsing library Marked just doesn’t handle tabbed code blocks properly. @billymoon was aware of this in his initial implementation, which had a magic token substituting for tabs, but it was buggy (for one thing, it was only replacing the first tab in the file, not all tabs or all leading tabs). I realized that since this input is just getting turned into JavaScript anyway, we might as well replace the leading tabs with spaces before Marked does its parsing, to sidestep Marked‘s bug. This seems to have solved it.
I went a little further to try to preserve correct line numbers for stack traces, which have also been completely blown away since we switched to using Marked. I pretty much have it fixed if you don’t do your own word wrapping for list items, which of course we’re doing in our tests. (Maybe we shouldn’t; I doubt it’s common for people to manually word-wrap their Markdown.) Anyway improving it further could be the subject of another PR if anyone is interested.