-
-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add regex matching for Snippets #2525
base: main
Are you sure you want to change the base?
Add regex matching for Snippets #2525
Conversation
2ceaee1
to
b252d7f
Compare
I think it would be nice if it was possible to specify the string to join the match groups with somehow, but I doubt anyone will ever need that. |
I'm questioning how useful this would be to the general population. I personally have no need for such functionality, and I question how generally useful it is. I don't doubt that some may use it for some niche purposes, but it is inherently more fragile and could cause people frustration with false positive matches, gathering too many lines across the file, etc. I think, while powerful, it is hard to use and likely impractical in most cases. Snippets is purposely designed to be more explicit when declaring what you wish to include: lines, sections, files, etc. I'm not sure how comfortable I am in opening it up to a, what I personally view as, very niche behavior that is not generally useful to the average user. To be frank, I personally don't feel this is a good way to manage snippets and a lot harder to support for the average user as subtle changes in a linked file could unexpectedly include things in your Markdown you did not intend to include. This is, of course, a subjective opinion, but I would need a lot of convincing. |
Thanks for the quick and thorough response! I hope I can answer it satisfyingly. I think you have some really good points. The reason I want to add this feature is that the pymdown-extensions are used a lot in technical documentation, for example through the popular Material for MkDocs. To me, regex is not a niche utility - it's the main way to extract text from files, so I was surprised it's not part of the Snippets code. I use regex constantly, and I rarely if ever encounter a tool related to technical writing that doesn't support some form of it. The advantage of being able to use regex in Snippets is that I can parse configuration documents without requiring significant line numbers. In most configuration languages, you wouldn't expect the order and number of lines to be significant. From my point of view, using line numbers is a much more fragile solution than regex. Example: I always want to extract the username from the config file. With line numbers, I'd have to make sure that it never moves to another line if the config file is updated. With regex, I'd know that I always get what I asked for. I think you are completely right that regex can be impractical in many cases, but the same goes for line numbers and the sections system. I think they are for different use cases. Another example: Lets say I have a system where I manage authorization through a plain text file in git. Every time I make a commit to the main branch, the system pulls the configuration, and finally mkdocs builds the documentation. It could be a csv file that looks like this:
Now I want to document what users have read access, and what users have admin access. With this feature in Snippets, all I'd have to do would be to write a markdown file like this:
This would be really convenient for me. I could achieve the same using the sections syntax, but that would make the CSV file invalid. There are so many similar use cases, like generating an inventory of servers with some server role or whatever. And sure, I could make the mistake of writing But why would anyone complain to you about that? If that's something that worries you, I could add a line to the documentation where it says that it uses the python Without the regex feature in Snippets, I'd have to write a CI script that parses the csv file, outputs the list of users and have Snippets add that to the rendered markdown file. This would not be as good a solution, as now part of the text processing happens in the CI script and half of the text processing happens in the markdown file. CI pipelines are for security reasons often not editable for all users - frequently you just include the pipeline from some other repository where there are tight security controls. This means that for people to be able to add stuff to the documentation this way, they need CI pipeline access. And it also means I now have to maintain a CI pipeline per config file format I need to parse, which really sucks and adds a lot of complexity. And from my point of view, it is much worse to have documentation writers destroy the entire pipeline instead of making a mistake in the documentation. Finally, I completely agree that adding this functionality lets the user make some confusing mistakes. However I think this is entirely expected by anyone who chooses to use the regex functionality. The same goes for the line number syntax - you have to include the correct lines and make sure the lines you want to include don't change positions. I also think my contribution is quite elegant and simple, as it is convention in regex to enclose the regex string in slashes, and that the Snippets code would never be able to include a file that ends in / anyway, since that would not be a valid filename. It doesn't force the user to use regex for line matching - it just adds it as an option. If the user chooses the wrong tool for the job, of course they are going to have a bad time. But I think it would be really nice to have for those of us that need it! |
I added this in the latest commits |
Let me be clear so this is not taken out of context. I am not saying regex is niche, I am a heavy user of regex, but that I feel usage of regex in the context of Snippets would be niche. I am not trying to develop a grep tool for inserts and I don't think this is what the majority of users are desiring as well.
While I don't deny that managing lines for Snippets is fragile, this was my argument when it was requested, but I relented only because many include extensions allow this and there were a significant number of people asking for it despite my reservations. I disagree with your assessment of sections though as it provides a clear explicit way to define what you want and I see no evidence that it is impractical. Keep in mind that Snippets was designed to allow users to define reusable content and insert it where they need to. Moving forward, people have used it for all sorts of things, and yes, I have obliged with expanding my scope to some degree, but keep in mind, I had (and have) no interest in creating a templating language. Snippets was always meant for easy, dumb, and predictable inserts. This regex approach is a bit problematic as well as you can single out arbitrary lines, but you cannot target blocks. It's usefulness is greatly diminished which relegates it to ver niche cases.
While I do not deny you may have this one niche case where you may want to grab users from a CSV, many people are targeting blocks of text from snippets, this is a lot more difficult for people to use in a practical sense for that purpose as often the kind of content that is targetted is not regular. I do believe this example you've given is far from a common case. I also believe this approach does not apply well to how the majority of people use this extension. |
Thanks for your response. I have a couple comments:
I think the text you grab with regex is just as much a snippet as any other (section of a) file is. I don't think it's a templating language, all I do with this is grab one or more lines from a file which I think is pretty in line with how Snippets works.
I implemented multiline regex matches just now! :)
Disagree (but of course you can't use the Snippets extension like that right now since it doesn't support regex). I think this is a huge use case from my point of view. I personally can't really see the benefit of including specific lines, but I know for sure that generating documentation this way is gaining massive traction because of the entire gitops approach to doing stuff is such a big influence on the devops field, and because mkdocs is a big player in generating documentation, I naturally end up in your project. I guess I could write a plugin that does the same thing, but my code just fits in with snippets so nicely. I think it's a very natural extension of Snippets and I don't really understand why Snippets can't support both the line-selecting approach and the full file including approach. Isn't it just a matter of how you structure your text snippets - in files or inside files? So tl;dr: Snippets is the de facto way to include sections of files in mkdocs-rendered documentation. I think there actually is a very large amount of users that would love to have this functionality but have to look elsewhere because it isn't available in mkdocs. |
I didn't mean to insinuate it is per see, but I'm trying to illustrate there is a limit to the scope of what I am willing to add. Even Jinja2, which is a templating language, I don't think provides inclusion of lines or blocks per regex. To do so, you have to construct programmatic blocks of logic to attempt such a feat because it is a far from a common use case, most likely due to practical limitations.
That's fine, but in a practical sense, I'm not quite sure how useful this would be. I don't deny that it can be useful if your narrow case lends well to it, but to be fair, I think capturing content programmatically would be far more useful as much content is not regular, though I'm not advocating that we do this either. You can often include regular expression in the process, but pure regex can be cumbersome, and often inefficient when singling out non-regular targets. In a small set of cases, this might work, but I can think of many that would be difficult for a user to target well, even with multiline regex. I support a couple of tools that employ regular expression and I get a number of questions on how to effectively use regular expression to do specific tasks. It should be noted that this is something I also take into consideration: what will it cost me in support time? The more difficult it is to apply, the more questions I get. The return has to be significant enough to offset this cost. I guarantee I will get questions such as: I have this Snippet where I am trying to use regex and I want to capture this block of text. Everything I've tried so far doesn't work, how do I do it? A small number of users will navigate this just fine and quickly identify what they can and cannot do, but there will be a large number who don't fully understand the limitations of what they can and can't do (and yes, there are limitations that limit the scope of what multiline regex can do). And there will be a number of people who just need help using regex effectively. In the vast majority of cases, I think regex would not be the answer and I would direct them to the other approaches. I think the need to use regex would be niche.
Disagreement is fine, but you are the first to ever request such a feature, and outside the CSV example, which I don't think is a common case, I cannot see this being used generally to a significant degree. I certainly could be wrong, but I was honest from the start, I would likely need a lot of convincing. Anything I allow added to the code adds complexity that I must maintain, long after the requester is gone. So I am, at times, very critical and will question strongly the necessity of any addition, big or small. Adding certain code also opens the doors for other related requests as well. I have to determine what the scope of what I offer is. Sometimes, suggestions seem like a clear win for me, others I feel require more scrutiny to assess the real payoff for the general population of users vs the cost of supporting the feature.
I am open to having people chime in and champion this request in order to note how useful this would really be. I'm not sure I'm convinced there is such a strong desire for this. I am open to being proven wrong. |
Oops, we commented at the same time. I've made my case and I think this is entirely within the scope of the already existing functionality (As per the example in my previous comment), even though it's more than your original intention. I do understand what you're saying and respect that, I don't have anything more to add. How many people do you need to chime in before you think there is a use-case? I could easily gather a quite large amount of people who agree this is very useful. Just looking through open issues, my PR would solve #2385 quite easily:
Just one comment:
I don't think you have any responsiblity to support that but it's nice of you that you do :) |
I still have to take time to respond to all of these, even if it is to say I have no intention to help them. But sure, I don't have to give them helpful answers.
I think you missed a lookahead as you would grab the first character in the next function or whatever. You should probably also look for
This only illustrates how easy it is for people to accidentally grab what they don't want, but sure that's not really my problem, but theirs.
In the context of certain coding languages, yes, I can see this being true, less true in prose, HTML, and various other places, but yes, when targeting something with a regular structure, yes, it works fine. Let me think about this some more. If this was going to be included there are some things that should be addressed. I'm not promising acceptance, only that I will take some more time to consider the possibility.
|
yep i wrote it too fast, but it would have given me an error if i had
Of couse. I love this.
You mean the case where you don't have any match groups? I think it already works like how I interpret your
It's entirely supported - the regex string just has to start and end with a |
If it does, great, but tests should confirm this.
I don't think I understand what match groups have to do with greedy. If there is some baked in behavior here, I don't think it sounds intuitive. |
For sure.
I think I'm confused, it's a bit late here. Can you give an example of the behavior you expect and the behavior you want to avoid? |
It's possible I don't understand the implementation (I have not looked at the code closely). Does |
Marking this as needing more information per the earlier questions. Also, this would need testing as well. |
I think this would be really useful.