Automatic Dialect Switching using Dialect Tags Embedded in a Comment. #6453
So, this has very little to do with Linguist. Let's start with a few facts:
With that out of the way: you're not going to get automatic dialect detection working in code blocks or files the way I think you're trying to, because that would need major changes to the markup and to GitHub's syntax highlighting engine. The closest I think you can get is one or both of the following:
OR
So with that in mind, I can comment on some of your comments:
You already know this, but this has nothing to do with Linguist and is entirely down to the third-party grammar.
You'll only be able to get close to this using the methods I detailed above.
This has nothing to do with Linguist. As previously stated, Linguist supplies the grammars needed by the highlighting engine, and these need to be in a TextMate-compatible format. How to write and maintain TextMate-compatible grammars is outside the scope of Linguist, though @Alhadis is quite the expert and may be able to offer tips and help. TextMate has its own documentation (though it was a bit poor the last time I looked), as does VS Code.
The only place you can do this is within the grammar, assuming it's possible to support dialects.
You can't. Linguist has a one-to-one mapping between languages and grammars (the Footnotes
Hi there,
I have several Modula-2 projects, one of which is a multi-dialect compiler. The sources are in different dialects, and there are other Modula-2 projects on GitHub that use different dialects as well.
In some cases the differences between dialects are minimal, for example when a compiler implements one or two additional built-in functions that aren't standard. In those cases one might argue that it is tolerable not to have this highlighted correctly.
HOWEVER, there are major differences between the three main language dialects, and for those differences it is extremely annoying when the sources are incorrectly highlighted. For this reason, I wrote a multi-dialect plugin for Pygments that chose the dialect automatically by examining a comment at the beginning of the source file that contained a dialect tag.
The dialect tagged comments were:
I even added support for various compiler-specific extensions over time, like:
All one had to do was add such a tag comment at the top of one's source files, and everything was rendered properly for the given dialect.
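To illustrate the idea, here is a rough sketch in Python of how such a tag comment could be detected. This is not the code of the Pygments plugin. The tag syntax (`(*!m2pim*)`, `(*!m2iso+xds*)` and so on), the set of dialect names, the function name `detect_dialect`, and the limit of five leading lines are all assumptions made for the example.

```python
import re

# Assumed tag syntax for this sketch: a Modula-2 comment whose body starts
# with "!" followed by a dialect name and an optional "+extension" suffix,
# e.g. (*!m2pim*) or (*!m2iso+xds*).
DIALECT_TAG = re.compile(r"\(\*!\s*([a-z0-9]+)(?:\+([a-z0-9]+))?\s*\*\)")

# Hypothetical set of recognised dialect names.
KNOWN_DIALECTS = {"m2pim", "m2iso", "m2r10", "objm2"}


def detect_dialect(source, default="unknown", max_lines=5):
    """Return (dialect, extension) from a tag comment near the top of source.

    Only the first max_lines lines are examined, so a tag that appears
    later in the file is ignored. Unknown dialect names fall back to
    (default, None).
    """
    head = "\n".join(source.splitlines()[:max_lines])
    match = DIALECT_TAG.search(head)
    if match is None or match.group(1) not in KNOWN_DIALECTS:
        return (default, None)
    return (match.group(1), match.group(2))
```

A highlighter would call something like `detect_dialect(text)` once per file and then pick the keyword and builtin tables for the returned dialect, falling back to a generic table when no tag is found.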
The very same dialect tags are also supported by Vim and Emacs. I contributed the support to Vim, and the maintainer of GNU Modula-2 contributed the support to Emacs.
So, this is a kind of de facto standard that has been in place for years now.
Unfortunately though, both Bitbucket and GitHub moved away from using Pygments. As a result, this scheme no longer works there.
Worse still, the current support for Modula-2 in Linguist is broken. It doesn't recognise interface modules at all, and the list of reserved words and predefined identifiers that it highlights is a mish-mash that reflects neither any dialect nor any compiler extensions. It is just thrown together with various reserved words and identifiers from Pascal. It doesn't reflect Modula-2. It's just all wrong.
I would like to help make this auto-selecting multi-dialect rendering work again on GitHub, but I find Linguist extremely confusing, in particular since it doesn't seem to have its own way to define grammars but somehow uses the grammar schemes of various third-party software. It seems to me that in the time one needs to learn all the different components and understand how they work together, I could write a lexer and parser with an HTML renderer from scratch. But hey, it's not my decision how GitHub wants to render code. I will, however, need a bit of help to be able to contribute functionality that once worked on GitHub and was inadvertently removed when GitHub moved off Pygments.
So, I would appreciate it if somebody could give me some pointers, in particular on where one would add code to search for and parse the aforementioned dialect tag comments in order to choose the corresponding grammar, and also on how to make Linguist accept multiple grammars for the same language. At the moment, I cannot see anything in the documentation that gives me any idea how to do those two essential things.
thanks in advance
regards
benjamin