-
Notifications
You must be signed in to change notification settings - Fork 12
Refactor the spec #17
Conversation
…erly support subclassing.
…on to iterate over.
- In other words, ensure that two match objects in a row with the same “index” will stop the iteration.
…coerce it to a regex).
@allenwb @littledan @anba @bterlson any thoughts? It's not stage 2 yet, so detailed spec review isn't needed (tho I always welcome it!), but any overarching direction comments would be helpful :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an official review (or technically in depth) but the direction seems fine.
spec.emu
Outdated
<emu-note>_matchAll_ does not visibly mutate the provided _regexp_ in any way.</emu-note> | ||
<p>When the _matchAll_ method is called, the following steps are taken:</p> | ||
|
||
<p>When the `matchAll` method is called, the following steps are taken:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually enumerates the arguments and their expected values here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially had that, but the arguments are in the heading above, so it felt redundant to repeat them here. As for "expected values", do you mean like, prose that describes them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess value is often encoded in the var name as it is in this case, so it's fine. Agree about just repeating the inputs; if you can come up with more descriptive prose ala exec it would be good, but it's not a major deal of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely do that, update incoming.
spec.emu
Outdated
1. Let _matcher_ be ? GetMethod(_R_, @@matchAll). | ||
1. If _matcher_ is not *undefined*, then | ||
1. Return ? Call(_matcher_, _R_, « _O_ »). | ||
1. Return ? <emu-xref href="#sec-matchalliterator">MatchAllIterator</emu-xref>(_R_, _O_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't require an explicit xref.
1. Return ! CreateIterResultObject(*null*, *true*). | ||
1. Else, | ||
1. Return ! CreateIterResultObject(_match_, *false*). | ||
1. Let _previousIndex_ be _O_.[[PreviousIndex]]. | ||
1. Assert: Type(_previousIndex_) is Number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Least surprising assert ever :-P
spec.emu
Outdated
<emu-alg> | ||
1. Let _R_ be the *this* value. | ||
1. If ? IsRegExp(_R_) is not *true*, throw a *TypeError* exception. | ||
1. Return ? <emu-xref href="#sec-matchalliterator">MatchAllIterator</emu-xref>(_R_, _string_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Why include this link explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed per #17 (comment)
spec.emu
Outdated
</ins> | ||
|
||
<ins class="block"> | ||
<emu-clause id="sec-matchalliterator" aoid-"MatchAllIterator"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably you want a = after aoid, rather than - (so then you might not need the explicit link).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, thanks, good catch
spec.emu
Outdated
1. Let _S_ be ? ToString(_O_). | ||
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%). | ||
1. Let _flags_ be ? ToString(? Get(_R_, *"flags"*)). | ||
1. Let _matcher_ be ? Construct(_C_, « _R_, _flags_ »). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW seems like the single-argument form of the RegExp constructor would work too (assuming other RegExp subclasses will do something similar, and that IsRegExp will be true for the relevant subclasses).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single-argument form, when given a RegExp, returns it unmodified - the only way I'm aware of to unconditionally "clone" a regex with the constructor is by also passing the flags
argument.
spec.emu
Outdated
1. If _global_ is not *true*, throw a *TypeError* exception. | ||
1. Let _iterator_ be ObjectCreate(<emu-xref href="#%RegExpStringIteratorPrototype%">%RegExpStringIteratorPrototype%</emu-xref>, « [[IteratingRegExp]], [[IteratedString]] »). | ||
1. Assert: Type(_S_) is String. | ||
1. Assert: IsRegExp(_R_) is *true*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsRegExp is really dynamic, and you're running this on whatever came out of the species constructor. I don't think this assertion is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I guess that's true - it's a status that could change over time.
In that case, I think I should make it a runtime exception if it stops being a RegExp. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeatedly reading @@match
and throwing exceptions if it dissapears seems like rather surprising behavior. We don't do that in split, for example. I think you can just delete this assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we do in existing regex methods is done for legacy compat, not because it's necessarily the proper thing to do.
I suppose RegExpExec
would fall back to the builtin behavior if the Symbol later disappeared, so it wouldn't be the worst thing in the world to delete this check, but what's the benefit of removing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole IsRegExp thing was just added in ES2015. I proposed changing it after it was spec'd, before it was implemented, but the committee rejected that. It seems a little early to go around calling it all legacy already.
What's the benefit of removing it? Because it doesn't make sense to put in tons of defensive checks all over the place that things are a certain way. We don't do that in any of the JS standard library. Subclassable builtins just does the appropriate operations at the right time, and if they're not there, calling them throws. These checks will not ensure that anything is a particular way; the exec
method can still disappear at any time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point :-) what I mean is, the behavior for unmodified RegExp
instances is legacy, and there's a lot of spec machinery to make that work. A non-default Symbol.match
, however, is indeed not legacy.
Fair enough; I'm just trying to make things as strict and throwy as possible - but I'll remove the IsRegExp
checks (after the iterator is created, at least) and let RegExpExec
handle them.
1. Return ! CreateIterResultObject(_match_, *false*). | ||
1. Let _previousIndex_ be _O_.[[PreviousIndex]]. | ||
1. Assert: Type(_previousIndex_) is Number. | ||
1. Let _index_ be ? ToLength(? Get(_match_, *index*)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is index here supposed to be "lastIndex"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, match objects have an "index" and "input" property - regexes have "lastIndex".
1. Let _previousIndex_ be _O_.[[PreviousIndex]]. | ||
1. Assert: Type(_previousIndex_) is Number. | ||
1. Let _index_ be ? ToLength(? Get(_match_, *index*)). | ||
1. If _previousIndex_ is equal to _index_, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised by this special case. It's saying, if there's a zero-length match, then exit the iteration early? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's saying "if i've seen the match before, exit" - per https://github.com/tc39/proposal-string-matchall/pull/17/files#r132509509
spec.emu
Outdated
1. Set _iterator_.[[IteratingRegExp]] to _R_. | ||
1. Set _iterator_.[[IteratedString]] to _S_. | ||
1. Set _iterator_.[[PreviousIndex]] to *-1*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of maintaining the [[PreviousIndex]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't, then a non-sticky, non-global regex will give an infinite iterator with the same match repeated over and over again - the index is the method I chose to ensure that a regex with one match only yields one match object.
1. Set _iterator_.[[IteratingRegExp]] to _R_. | ||
1. Set _iterator_.[[IteratedString]] to _S_. | ||
1. Set _iterator_.[[PreviousIndex]] to *-1*. | ||
1. Set _iterator_.[[Done]] to *false*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the purpose here to ensure that, even with a poorly behaved RegExp subclass, the iterator doesn't come back to life?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly - without caching the "done" status, it will call RegExpExec
every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is also problematic with non-sticky non-global regexes, which either always or never return "no match")
@@ -52,13 +92,24 @@ contributors: Jordan Harband | |||
1. Let _O_ be the *this* value. | |||
1. If Type(_O_) is not Object, throw a *TypeError* exception. | |||
1. If _O_ does not have all of the internal slots of a RegExp String Iterator Object Instance (see <emu-xref href="#PropertiesOfRegExpStringIteratorInstances"></emu-xref>), throw a *TypeError* exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What problem are you trying to solve with your changes to next() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to merge this soon if there's no further objections; at which point further review can be done via issues :-) |
Highlights:
Symbol.matchAll
String.prototype.matchAll
stringifies and coerces to a regex when notIsRegExp
(the canonical test for "it's a regex")Symbol.matchAll
method, it calls itMatchAllIterator
abstract operationRegExp.prototype[Symbol.matchAll]
throws ifthis
isn't a regex, and otherwise calls intoMatchAllIterator
MatchAllIterator
usesSpeciesConstructor
to construct a copy of the regex using its.flags
and.lastIndex
and then calls intoCreateRegExpStringIterator
lastIndex
is respected, which is particularly important for "sticky" regexesCreateRegExpStringIterator
creates a new iterator, with some internal slots to hold on to the regex, the string, the last matched index, and whether the iteration is finished or not.exec
ing it, before being "done".I have similar changes prepared on my polyfill/shim (which I've moved to a separate repo) and will push those up and release them on npm as soon as this is merged.
Closes #6. Closes #4. Closes #3. Closes #2. Closes #15.