-
-
Notifications
You must be signed in to change notification settings - Fork 642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for TOML config files (pants.toml
)
#9052
Conversation
This reverts commit 008bacc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewers: I recommend starting by looking at 008bacc as an example of what our config files look like with TOML. (Those work now when loaded with --pants-config-files
- I only reverted them for a smaller diff.)
Then, move on to config_test.py
to see what the PR "Solution" section means by being able to swap between _IniValues
and _TomlValues
without the rest of Pants having any knowledge of what format was used. I tried to be incredibly exhaustive but possibly missed some edge cases. Please feel free to propose more tests if you can think of any!
Finally, move on to config.py
to see how we implement this all. I tried to keep it as readable as possible, but the implementation is a bit tricky due to how TOML stores config as a single nested dictionary + all the edge cases we must handle to behave identically to _IniValues
. Lots of recursion and special casing.
Even though the TOML implementation is fairly involved, I hope that that does not result in us deciding to not add TOML support. I tried to encapsulate the complexity, and argue that the complexity in our implementation is justified by the benefits this will bring our users.
def defaults(self) -> Mapping[str, str]: | ||
return self.parser.defaults() | ||
|
||
|
||
_TomlPrimitve = Union[bool, int, float, str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't (yet?) account for TOML's native support of date times. We don't have any options that use that type
so I left it off.
def defaults(self) -> Mapping[str, str]: | ||
return self.parser.defaults() | ||
|
||
|
||
_TomlPrimitve = Union[bool, int, float, str] | ||
_TomlValue = Union[_TomlPrimitve, List[_TomlPrimitve]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also technically be a Dict
to represent native Tables, but I left this off because we restrict Tables to solely being used for distinct option scopes/sections, rather than also for dict values. See the PR description.
pattern=r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s", | ||
repl=r"{\g<interpolated>}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, it's best practice to use re.compile
. But it looks like that's not necessary in modern Python due to automatic caching? https://docs.python.org/3/library/re.html#re.compile
Lmk if I should use re.compile
and any tips for where to put that - I'm not sure if it needs to be declared as a module constant, rather than a variable belonging to the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If python3 automatically caches regexes, and these regexes are only used once on options parsing and then not needed for the rest of the run, then it's probably best not to re.compile
at the module level, so I think this is good as-is.
def _stringify_val( | ||
self, raw_value: _TomlValue, *, interpolate: bool = True, list_prefix: Optional[str] = None, | ||
) -> str: | ||
"""For parity with configparser, we convert all values back to strings, which allows us to | ||
avoid upstream changes to files like parser.py. | ||
|
||
This is clunky. If we drop INI support, we should remove this and use native values.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I did not take this approach and preserved the parsed data types. But, I realized for the initial prototype that this complicates things too much because it means we have a leaky API. For example, it would be really hard to preserve the my_list_option.append
and my_list_option.filter
information.
If we end up removing INI, we could remove this all.
|
||
def has_option(self, section: str, option: str) -> bool: | ||
try: | ||
self.get_value(section, option) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, has_option
is doing more work than necessary by relying on the get_value()
implementation, e.g. it will interpolate values and _stringify
them. But, I think that's fine to make this method much simpler. Otherwise, it has extremely high duplication of get_value()
.
[a] | ||
list: [1, 2, 3, %(answer)s] | ||
list2: +[7, 8, 9] | ||
list3: -["x", "y", "z"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new. It's meant to test:
- Filter syntax
- Correctly quoting list members. Things don't work with
-[x, y, z]
.
list: [1, 2, 3, %(answer)s] | ||
list2: +[7, 8, 9] | ||
list3: -["x", "y", "z"] | ||
list4: +[0, 1],-[8, 9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new. It's meant to test that we correctly handle both appending and filtering to a list option at the same time.
@property | ||
def default_file1_values(self): | ||
return {**super().default_file1_values, "disclaimer": "Let it be known\nthat."} | ||
|
||
@property | ||
def expected_file1_options(self): | ||
return { | ||
**super().expected_file1_options, | ||
"a": { | ||
**super().expected_file1_options["a"], "list": '["1", "2", "3", "42"]', | ||
}, | ||
"b.nested": { | ||
"dict": '{\n "a": 1,\n "b": "42",\n "c": ["42", "42"],\n}' | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These only vary because of formatting of the multiline strings. We could change the TOML value to mirror the INI values, but I wanted the test TOML file to look like how we'd actually configure things in the wild, i.e. with sane indentation.
51a4bea
to
7306439
Compare
…se `.append` and `.filter`
Without this change, users would need to explicitly set `--pants-config-files=pants.toml` every time. If `pants.toml` exists, we will ignore the default of `pants.ini` unless the user explicitly adds `pants.ini` via `--pants-config-files`.
f50b6ec
to
fb49453
Compare
The list and dict stuff here is pretty weird and magical... The list stuff is particularly weird - rather than inventing a new mechanism, I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files... For dicts, is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files? If we were designing this from scratch, what would dicts look like? |
I don't follow what you mean. Pants has a long history of allowing you to either replace, append, or filter a list value, including from the command line, env vars, and config files, or a mix of the three. See https://www.pantsbuild.org/options.html#list-options What do you mean by generating a file? From what I can tell, we must keep support for appending to and filtering from a list option in the config file, even if we don't think that feature is a good one because of "un-coordinated modifications to accumulated state across files". We could make users wrap appends and filters in quotes: [jvm]
options = ["-Xmx1g"]
[isort]
config = """+[".isort.cfg"],-["bad.cfg"]""" But that seems much worse to me. We lose the benefits of native support of TOML lists (like syntax highlighting) and there's a gotcha why you use quotes for appending and filtering but not for replacing.
Yes, there is, through tables and inline tables. That was the original approach I took. But it means that we have no structured way to disambiguate between option scopes vs. dict values, because tables would be used for both purposes. We could make this work, but it means the The other reason I didn't like using tables for dict values is that it creates ambiguity for the human reader. Right now, the human reader knows unambiguously that a table header like [jvm-platform.platforms]
java7 = { source = 7, target = 7, args = [] }
java8 = { source = 8, target = 8, args = [] }
java9 = { source = 9, target = 9, args = [] } Worth repeating that the TOML dict support is still an improvement from the INI dict support because of indentation. Possible in TOML: [jvm-platform]
platforms = """
{
"java7": {"source": 7, "target": 7, "args": []},
"java8": {"source": 8, "target": 8, "args": []},
"java9": {"source": 9, "target": 9, "args": []},
}
""" You could also the below in TOML. In INI, you must use the below : [jvm-platform]
platforms = {
"java7": {"source": 7, "target": 7, "args": []},
"java8": {"source": 8, "target": 8, "args": []},
"java9": {"source": 9, "target": 9, "args": []},
} |
The And at that point, I'm not sure we need the append/filter functionality. It's just more ways to do things. Also, I think +/- works for dict-valued options as well? |
Re a conversion script, wouldn't this be easy to get right basically 100% of the time, just by parsing the ini file using the standard python ini parser and writing out the resulting data as properly formatted TOML? Why the lowball estimate of 65%-75%? |
FWIT, there's nothing stopping a user from using
I wasn't sure of this, as it's not mentioned in our docs. https://www.pantsbuild.org/options.html#dict-options Regardless, it's irrelevant to offer syntatic sugar for dict appending/filtering because we always use strings to represent dict values, given the ambiguity of sections vs. dict values. So, users can still use this without issue: [jvm-platform]
platforms = """
+{
"java7": {"source": 7, "target": 7, "args": []},
"java8": {"source": 8, "target": 8, "args": []},
"java9": {"source": 9, "target": 9, "args": []},
}
"""
|
We could do this, but we would end up stripping all comments and whitespace. See #9054 for the implementation. After running it on all of Pants' INI files, I estimate it at converting 80-85%, rather than 60-70%. The remaining issues are pretty simple to fix thanks to https://www.toml-lint.com/ and validation from editors like Pycharm. |
Regardless of niggles on the list stuff, this change is overall awesome! .ini is awful. |
I'm also fine not having such a script, TBH. It is very little work to
manually port an ini file to toml. I wouldn't put too much effort into
perfecting it.
…On Mon, Feb 3, 2020 at 11:52 AM Eric Arellano ***@***.***> wrote:
Re a conversion script, wouldn't this be easy to get right basically 100%
of the time, just by parsing the ini file using the standard python ini
parser and writing out the resulting data as properly formatted TOML? Why
the lowball estimate of 65%-75%?
We could do this, but we would end up stripping all comments and
whitespace.
See #9054 <#9054> for the
implementation. After running it on all of Pants' INI files, I estimate it
at converting 80-85%, rather than 60-70%. The remaining issues are pretty
simple to fix thanks to https://www.toml-lint.com/ and validation from
editors like Pycharm.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#9052?email_source=notifications&email_token=AAD5F7HTY33AHCWXKI2ZQFDRBBYWBA5CNFSM4KOXSA4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKVFI7Y#issuecomment-581588095>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAD5F7EBV3OW56A6DUWX6YDRBBYWBANCNFSM4KOXSA4A>
.
|
Hence settling for 80% conversion, rather than trying to get to 100% conversion. Pareto principle. Originally, I converted all of our INI files by hand and it was frustratingly tedious. Took probably 10-15 minutes and lots of copy and paste + find and replace. With the script in #9054, took about 3 minutes and only fixing a couple small, remaining issues with multiline options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
(Really important note: I'm not saying we shouldn't do the things currently in this PR, and getting away from ini is a great thing, I'm just trying to explore the design space - the easiest way to not have to support and document and teach custom conventions is to simply not have them!)
Yeah, I understand that we have that history. I'm trying to look at this from a perspective of "If I approached Pants for the first time (without the history of knowing about how we had ini files), and I was reading the documentation of how to write a pants.toml file, how would I feel?" and this feels to me like pretty magical non-standard syntax... I've not used another tool which added its own DSL for flexible list manipulation like this inside config files, and I'm not sure what's unique about Pants in this respect. So my real question here is: Is this feature so important that we want to diverge from the standard for it? If we were designing this from scratch, for a completely new system, I'm not sure what we would choose it to look like; personally, I'd probably be proposing that we read a standard file with no custom DSLs, we can read multiple files if they obviously combine in a standard way (either replacing or extending lists) and that if someone wanted to combine two files in a non-standard way, they should use an (ideally simple) programming language to read the two files and merge them in their non-standard way, outputting a file they can pass to pants (I'd generally recommend something like https://jsonnet.org/ - I suspect Danny would probably recommend coffeescript), rather than inventing a DSL to serialise inside the config file.
This sounds like a pretty reasonable middle-ground of encoding a DSL in magic values (which is something we already document/teach), without modifying the structure of keys, if it works :)
Yeah, there's definitely not a clear winner between the two in my mind... I think I prefer leaning into the toml-native way of doing this, rather than using magic multi-line strings, but I haven't convinced myself either way... I guess the multiline-strings approach is aiming to provide a form of consistency when reading ("I can easily grok what the sections are") at the expense of familiarity when reading ("I understand the general format I'm reading"). The toml-standard way feels like it's better for writing ("I understand what to write"), but between reading and writing, we should probably be optimising for reading. So I guess the question is: Do we want to favour people reading the file as toml, or people reading the file as a set of pants option scopes? |
Thanks for saying this and, more importantly, for asking great questions! This is a big decision and I appreciate you making sure we do due diligence.
We're not diverging very far from the specification. In fact, TOML recommends using this
It's fine to allow users to use
Good analysis! I do agree with erring on the side of favoring readers over writers. Generally, only one person or team writes to I think "reading the file as a set of Pants option scopes" is the correct mental model, here. It's ~an implementation detail that this config is implemented in TOML. Really, what our config is is a way to set options for their corresponding option scopes. I remember when learning Pants I was a bit confused with the idea of options subscopes, especially because command line args make them appear ambiguous: does |
Now that you mention it, I do like |
|
With us now on 1.26.x, this is now ready to review (and hopefully to land). |
I think that this kind of "deviation from the standard" (similar to putting The original motivating usecases for adding/removing still stand: allowing for adding or removing from a default value rather than replacing it is useful (...particularly adding... I could maybe do without removing). I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical. Thanks! |
Agreed. I tried to make the tests pretty comprehensive in that PR, along with manually testing on Pants and Toolchain. It'd be very helpful if someone from Twitter could try running it on your config files and confirm that conversion takes less than 5-10 minutes. |
Realistically, this won't be before next week. But yes: will do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore the naming nits!
Great work, thanks!
@@ -182,10 +193,11 @@ def get_source_for_option(self, section: str, option: str) -> Optional[str]: | |||
class _ConfigValues(ABC): | |||
"""Encapsulates resolving the actual config values specified by the user's config file. | |||
|
|||
Beyond providing better encapsulation, this allows us to support alternative config file formats | |||
in the future if we ever decide to support formats other than INI. | |||
Due to encapsulation, this allows us to support both TOML and INI config files without any of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Futureproofing that actually worked! Wooho!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach here looks solid to me, though I'm unlikely to have time to closely review any of the code :) Thanks!
Borja suggested created a File1 and File2 class. This works well. It centralizes the raw config content with the expected parsed values. It also makes it easier to add a test case for TOML being the main config and INI being optional.
Noting a small inaccuracy:
Our impl only behaves identically for string values - no others. ConfigParser supports substitution of any value since it treats all values as strings. I ran into this in the PEX upgrade PR where I had factored a shared parallelism integer value up into |
This is true. But, this should work: [DEFAULT]
num_workers = 4
[python-setup]
parallelism = "%(num_workers)" Why? We stringify all TOML values before passing to pants/src/python/pants/option/config.py Lines 350 to 385 in d690ab2
|
Erm. OK - thanks. I'm not honestly convinced the toml switch was a win - our use of it is now just about as unintuitive / unspecified as ini. |
What do you think might be a more appropriate reason for / feature of toml
for use to use? Apologies if it’s stated above, I attempted to check the
thread.
…On Fri, Feb 21, 2020 at 08:13 John Sirois ***@***.***> wrote:
Erm. OK - thanks. I'm not honestly convinced the toml switch was a win -
our use of it is now just about as unintuitive / unspecified as ini.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9052?email_source=notifications&email_token=AAJ6UTZLAYFVGLLBJ5E5P6LRD74T7A5CNFSM4KOXSA4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMTHCNQ#issuecomment-589721910>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJ6UT3ITCGHEP4R3BL5S3DRD74T7ANCNFSM4KOXSA4A>
.
|
I think Eric did the best you can do with the impedance mismatch between an untyped config format and a typed one. The tradeoff is our use of toml no longer conforms to the types you expect in the toml file itself when you want to substitute. That use case is probably rare, so its probably a fine tradeoff. I just meant to point out that our use of toml deviates from standard toml files enough at this point that any argument for switching to toml for standardization benefits is probably washed away at this point. Onward. |
As decided in #9052, TOML brings several benefits over INI config files (despite some weirdness around things like dict values). This updates our docs to refer to `pants.toml`. A followup will deprecate `pants.ini`.
Problem
INI files have some issues that make them frustrating for users:
version: ipython==4.8
.config: [".isort.cfg"]
configparser
.Why TOML?
PEP 518's thoughts on file formats
Python core developers recently researched modern config values to choose the format for
pyproject.toml
. They compiled their results into PEP 518, comparing INI, JSON, YAML, and TOML.Reasons they rejected INI:
Reasons they rejected JSON:
Reasons they rejected YAML:
Reasons they liked TOML:
Reasons for us to use TOML
Solution
Implement a
config._TomlValues
class that behaves identically to_IniValues
. Both of these are subclassses of_ConfigValues
, meaning that the two config formats may be interchanged without any file outside ofconfig.py
knowing which config format was originally used.We use extremely comprehensive unit tests to ensure that
_TomlValues
behaves identically to_IniValues
.How we implement interpolation
TOML does not have native support for string interpolation, i.e. substituting
%(foo)s
for theDEFAULT
valuefoo
. So, we provide our own implementation that behaves identically.How we implement list options
TOML has native list support, but does not have syntax for
+[]
and-[]
.We could require users to use multiline strings, like
config = """+[".isort.cfg"],-["bad.cfg"]"""
. This will work, and in fact the format we convert into inconfig.py
so that the rest of Pants understands the value.But,
config = """+[".isort.cfg"],-["bad.cfg"]"""
is clunky, so we introduce syntatic sugar:This sugar allows users to use native TOML lists (more ergonomic than strings) and makes it easy to both add to and remove from a list option at the same time.
How we implement dict options
TOML has native support for dicts through both Tables and Inline Tables.
However, we already use Tables to implement the distinct option scopes. So, if we also used Tables to implement dictionary values, we would introduce an ambiguity whether a table refers to an options scope or a dictionary value. Originally, this PR tried to take this approach, but quickly rejected support for using Tables because it dramatically complicates the solution and would cause a leaky API.
Instead, users should use a multiline string for dictionary options:
(Note the indentation - this would error in INI because of being indented too much to the left, but it works in TOML 🎉)
Remaining followup
migrate_to_toml_config.py
script to automatically update INI config files to TOML #9054.