Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HEREDOC generation in hclwrite #540

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

m4dcoder
Copy link

Currently, the hclwrite generator has a TODO on rendering HEREDOC. This patch will format multi-line string ending in a newline as a HEREDOC instead.

Currently, the hclwrite generator has a TODO on rendering HEREDOC. This
patch will format multi-line string ending in a newline as a HEREDOC
instead.
@hashicorp-cla
Copy link

hashicorp-cla commented Jun 23, 2022

CLA assistant check
All committers have signed the CLA.

@apparentlymart
Copy link
Contributor

Hi @m4dcoder! Thanks for working on this. It looks like a great start!

I believe I was that one that left that TODO in previous work, and I think I had the following concerns about robustly writing out an idiomatic heredoc:

  • We need to make sure that we don't choose an end marker that appears inside the given string as a literal line. One way to do that would be to scan the string first to see if it has our default marker (EOH in your patch here, although I think EOT is a more typical choice in examples I've seen) and, only if so, try concatenation consecutive numbers to the end of the marker until we find a marker we can safely use. While it is admittedly pretty rare for a string to have a line containing EOT on it, part of the contract of this package is being able to faithfully represent whatever literal string was given without potentially generating something invalid or that HCL's parser would misinterpret.
  • HCL's heredoc syntax forces each line of the given string, including the last line, to end with either \n or \r\n. It looks like you already took care of this problem by only generating the heredoc syntax if the string ends with \n, so that's great.
  • It's more idiomatic to generate an indented heredoc whenever that's possible, by using the <<-EOT introducer instead of just <<EOT, and then indenting all of the lines by two spaces more than the line containing the <<-EOT marker and the closing EOT marker. It isn't always possible to do that -- in particular, it can't work if there isn't at least one line in the file that doesn't start with whitespace -- but I'd prefer to use that form where possible so that hclwrite generates a similar style as what a human might typically write, and so that the result is as readable as possible.

These additional requirements do make the generation problem a little more "finicky" than what you did here, but hopefully not prohibitively so. Changes to what hclwrite generates can potentially be disruptive for applications relying on it (e.g. if it changes the "expected output" of some of their test cases) and so I'd prefer to introduce full heredoc support with all of these cases handled in a single step, rather than gradually changing it over the course of several releases.

Would you be interested in trying for extending this to meet the above goals?


One minor note I saw on first read, aside from the extra requirements, is that I think the expected token type for the heredoc end marker is TokenCHeredoc rather than TokenOHeredoc. For these code-generation needs we can sometimes get away with not getting those quite right -- since typically the final step is to reduce everything down to a single flat []byte anyway -- but getting the token types correct means that the whitespace tweaks made during formatting can make different layout decisions based on the token types, and so using the wrong token types can sometimes cause some subtle oddities in the auto-formatter.

@m4dcoder
Copy link
Author

@apparentlymart I can extend the work here to include these additional requirements you mentioned. I think I understand the first two points. Can you provide an example here on your last point regarding indenting extra two spaces?

@apparentlymart
Copy link
Contributor

Hi @m4dcoder! Sorry for not being clearer about what I meant with that last point.

Let's consider the example input string "hello\n world\n". I think your current implementation would generate something like this:

    example = <<EOH
hello
  world
EOH

I think the more idiomatic way to write that would be as follows:

    example = <<-EOT
      hello
        world
    EOT

That extra - in the opening sequence tells the parser that it should search all of the lines of the literal strings that follow for the one that has the least amount of leading whitespace, and then trim the same amount of whitespace from all of the other lines in the string so that the result would still be "hello\n world\n" rather than " hello\n world\n". This form allows staying flush with the prevailing indentation, and so the typical convention is for the closing EOT to align with the opening line and the actual string content between to be indented too more spaces, similar to if that were content inside a braced block { ... }.


Now that I've written this out I think I've remembered why I didn't implement this on the first pass: the codepath that deals with automatic indentation and horizontal alignment currently avoids touching literal template content at all, and so whatever whitespace is in the raw tokens between TokenOHeredoc and TokenCHeredoc will be left unchanged by the formatter. But that's a problem for this requirement, because the appendTokensForValue function has no awareness of the current indentation level and instead it relies entirely on the automatic formatter to fix up the indentation retroactively after we've finished generating the tokens.

At the time I wasn't feeling sure about how best to address this, and honestly I'm still not feeling certain about it. One way forward would be to change the auto-formatter so that it treats the indented heredoc syntax (<<-EOT rather than just <<EOT) differently than other string templates so that it does apply automatic formatting to those, effectively treating the TokenOHeredoc as if it were a TokenOBrace, TokenCHeredoc as a TokenCBrace, and the content in between as lines subject to the current indentation level. However, I think that's a bit easier said than done because normally the formatter works just by changing the SpacesBefore of the tokens, but I think the string literal tokens inside a heredoc have the leading spaces effectively built into the content (in the Bytes []byte field) rather than accounting for them in SpacesBefore. (I may be misremembering though, since it's honestly been several years since I looked at this.)

I think if I were to go attack this now I would probably try for the following strategy:

  • Try generating the individual lines inside the generated heredoc as a separate TokenStringLit for each line, with each one having only the spaces that are part of the real value (the two spaces in front of world in the above example, and not the six spaces of indentation).
  • Teach the formatter to recognize that and set SpacesBefore of each line to 6 in the above example so that we end up with six spaces before "hello" and six spaces before " world" (where there are two additional spaces built into Bytes).

I want to be up front that I don't feel sure that this path is viable -- there may be some other assumptions in the parser that make it impossible for us to generate TokenStringLit tokens in that same way when reading from existing files. This is just my best first initial plan to try.

I understand that this is now probably a lot more work than you had in mind when you submitted this PR, so of course I'll totally understand if you'd rather not wander down this unknown road that isn't certain to yield a viable outcome. The uncertainty here is, I think, why I left this TODO here rather than implementing this fully in the first place. 😖

@m4dcoder
Copy link
Author

The implementation of <<-EOT is a bit more work than I anticipated. This is also not the use case that my team is pursuing. We are ok with the generated HCL spec with <<EOT and not having the indentation for the text block. Can this be an option at runtime to format one way or the other? Can the support of <<-EOT and required indentation be in a separate PR? I can make the other changes purposed here.

@apparentlymart
Copy link
Contributor

Hi @m4dcoder!

I understand that this is more work than you had anticipated doing here, and that is totally reasonable. I don't think we want to support multiple optional different ways of generating the same data here, because the goal of the functions we are talking about is to produce a good default compromise serialization of a value. For more fine control we have the capability to insert tokens directly using methods like SetAttributeRaw.

While I would like to split the change to use <<-EOT instead of <<EOTinto a separate PR, my reservation is that making any change here at all will be somewhat disruptive to any existing systems relying on the wayhclwrite` currently generates multi-line strings. I think the readability benefits are worth that cost, but it would be more disruptive to change it twice rather than change it just once. The indented form is the idiomatic form recommended in application docs and the most readable, so if our argument for this change is to maximize readability then I think we should go all the way.

The one compromise that comes to my mind is that if you were to work on the rest of the changes I could potentially sign up to finish the indented heredoc mode before we merge it. However, unfortunately my focus is currently elsewhere and so I can't commit to when I'd be available to work on that, and so this compromise may involve your willingness to use HCL from the branch associated with this PR, rather than from a tagged release, until I can devote enough time to remind myself how the formatter works enough to implement that part of the solution I described; at the moment I can't say when that will be. Are you intending to use HCL in a situation where it would be viable to select an untagged commit as your HCL version in your go.mod file?

@netmarkjp
Copy link

Is there any progress on this PR?
I'm interesetg in generating heredoc but also using <<-EOT is not my use case.
I hope that is there any way to generate heredoc that isn't completely perfect?
(Just an idea example: use the PREFER_HEREDOC flag/environment variable).

Generating heredoc is becoming more important every day as terraform use cases expand. Especially in the use cases that using terraform to maintain SaaS. However, there is still no way to generate heredoc and we feels unhappy with this situation.
We feel that this PR implementation is sufficient to start using it.

We know done is better than perfect, but we also know breaking changes are very horrible/cofusing things. But if we have a small way to generate heredoc as progressive improvement, we'll be very happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants