Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Refactor the spec #17

Merged
merged 14 commits into from
Sep 23, 2017
Merged

Refactor the spec #17

merged 14 commits into from
Sep 23, 2017

Conversation

ljharb
Copy link
Member

@ljharb ljharb commented Aug 1, 2017

Highlights:

  • Symbol.matchAll
  • String.prototype.matchAll stringifies and coerces to a regex when not IsRegExp (the canonical test for "it's a regex")
    • If the regex has a Symbol.matchAll method, it calls it
    • if not, it calls into the MatchAllIterator abstract operation
  • RegExp.prototype[Symbol.matchAll] throws if this isn't a regex, and otherwise calls into MatchAllIterator
  • MatchAllIterator uses SpeciesConstructor to construct a copy of the regex using its .flags and .lastIndex and then calls into CreateRegExpStringIterator
    • this means that the existing lastIndex is respected, which is particularly important for "sticky" regexes
  • CreateRegExpStringIterator creates a new iterator, with some internal slots to hold on to the regex, the string, the last matched index, and whether the iteration is finished or not
  • the effect is that a non-sticky/non-global regex will always yield one match before being "done", and a sticky and/or global one would yield all the matches you'd get from repeatedly .execing it, before being "done".

I have similar changes prepared on my polyfill/shim (which I've moved to a separate repo) and will push those up and release them on npm as soon as this is merged.

Closes #6. Closes #4. Closes #3. Closes #2. Closes #15.

@ljharb
Copy link
Member Author

ljharb commented Aug 1, 2017

@allenwb @littledan @anba @bterlson any thoughts?

It's not stage 2 yet, so detailed spec review isn't needed (tho I always welcome it!), but any overarching direction comments would be helpful :-)

Copy link
Member

@bterlson bterlson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an official review (or technically in depth) but the direction seems fine.

spec.emu Outdated
<emu-note>_matchAll_ does not visibly mutate the provided _regexp_ in any way.</emu-note>
<p>When the _matchAll_ method is called, the following steps are taken:</p>

<p>When the `matchAll` method is called, the following steps are taken:</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually enumerates the arguments and their expected values here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially had that, but the arguments are in the heading above, so it felt redundant to repeat them here. As for "expected values", do you mean like, prose that describes them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess value is often encoded in the var name as it is in this case, so it's fine. Agree about just repeating the inputs; if you can come up with more descriptive prose ala exec it would be good, but it's not a major deal of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely do that, update incoming.

spec.emu Outdated
1. Let _matcher_ be ? GetMethod(_R_, @@matchAll).
1. If _matcher_ is not *undefined*, then
1. Return ? Call(_matcher_, _R_, &laquo; _O_ &raquo;).
1. Return ? <emu-xref href="#sec-matchalliterator">MatchAllIterator</emu-xref>(_R_, _O_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't require an explicit xref.

1. Return ! CreateIterResultObject(*null*, *true*).
1. Else,
1. Return ! CreateIterResultObject(_match_, *false*).
1. Let _previousIndex_ be _O_.[[PreviousIndex]].
1. Assert: Type(_previousIndex_) is Number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Least surprising assert ever :-P

spec.emu Outdated
<emu-alg>
1. Let _R_ be the *this* value.
1. If ? IsRegExp(_R_) is not *true*, throw a *TypeError* exception.
1. Return ? <emu-xref href="#sec-matchalliterator">MatchAllIterator</emu-xref>(_R_, _string_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why include this link explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed per #17 (comment)

spec.emu Outdated
</ins>

<ins class="block">
<emu-clause id="sec-matchalliterator" aoid-"MatchAllIterator">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you want a = after aoid, rather than - (so then you might not need the explicit link).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, thanks, good catch

spec.emu Outdated
1. Let _S_ be ? ToString(_O_).
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
1. Let _flags_ be ? ToString(? Get(_R_, *"flags"*)).
1. Let _matcher_ be ? Construct(_C_, &laquo; _R_, _flags_ &raquo;).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW seems like the single-argument form of the RegExp constructor would work too (assuming other RegExp subclasses will do something similar, and that IsRegExp will be true for the relevant subclasses).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single-argument form, when given a RegExp, returns it unmodified - the only way I'm aware of to unconditionally "clone" a regex with the constructor is by also passing the flags argument.

spec.emu Outdated
1. If _global_ is not *true*, throw a *TypeError* exception.
1. Let _iterator_ be ObjectCreate(<emu-xref href="#%RegExpStringIteratorPrototype%">%RegExpStringIteratorPrototype%</emu-xref>, &laquo; [[IteratingRegExp]], [[IteratedString]] &raquo;).
1. Assert: Type(_S_) is String.
1. Assert: IsRegExp(_R_) is *true*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsRegExp is really dynamic, and you're running this on whatever came out of the species constructor. I don't think this assertion is valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I guess that's true - it's a status that could change over time.

In that case, I think I should make it a runtime exception if it stops being a RegExp. Will fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeatedly reading @@match and throwing exceptions if it dissapears seems like rather surprising behavior. We don't do that in split, for example. I think you can just delete this assertion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do in existing regex methods is done for legacy compat, not because it's necessarily the proper thing to do.

I suppose RegExpExec would fall back to the builtin behavior if the Symbol later disappeared, so it wouldn't be the worst thing in the world to delete this check, but what's the benefit of removing it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole IsRegExp thing was just added in ES2015. I proposed changing it after it was spec'd, before it was implemented, but the committee rejected that. It seems a little early to go around calling it all legacy already.

What's the benefit of removing it? Because it doesn't make sense to put in tons of defensive checks all over the place that things are a certain way. We don't do that in any of the JS standard library. Subclassable builtins just does the appropriate operations at the right time, and if they're not there, calling them throws. These checks will not ensure that anything is a particular way; the exec method can still disappear at any time.

Copy link
Member Author

@ljharb ljharb Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point :-) what I mean is, the behavior for unmodified RegExp instances is legacy, and there's a lot of spec machinery to make that work. A non-default Symbol.match, however, is indeed not legacy.

Fair enough; I'm just trying to make things as strict and throwy as possible - but I'll remove the IsRegExp checks (after the iterator is created, at least) and let RegExpExec handle them.

1. Return ! CreateIterResultObject(_match_, *false*).
1. Let _previousIndex_ be _O_.[[PreviousIndex]].
1. Assert: Type(_previousIndex_) is Number.
1. Let _index_ be ? ToLength(? Get(_match_, *index*)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is index here supposed to be "lastIndex"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, match objects have an "index" and "input" property - regexes have "lastIndex".

1. Let _previousIndex_ be _O_.[[PreviousIndex]].
1. Assert: Type(_previousIndex_) is Number.
1. Let _index_ be ? ToLength(? Get(_match_, *index*)).
1. If _previousIndex_ is equal to _index_, then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised by this special case. It's saying, if there's a zero-length match, then exit the iteration early? Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's saying "if i've seen the match before, exit" - per https://github.com/tc39/proposal-string-matchall/pull/17/files#r132509509

spec.emu Outdated
1. Set _iterator_.[[IteratingRegExp]] to _R_.
1. Set _iterator_.[[IteratedString]] to _S_.
1. Set _iterator_.[[PreviousIndex]] to *-1*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of maintaining the [[PreviousIndex]]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't, then a non-sticky, non-global regex will give an infinite iterator with the same match repeated over and over again - the index is the method I chose to ensure that a regex with one match only yields one match object.

1. Set _iterator_.[[IteratingRegExp]] to _R_.
1. Set _iterator_.[[IteratedString]] to _S_.
1. Set _iterator_.[[PreviousIndex]] to *-1*.
1. Set _iterator_.[[Done]] to *false*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose here to ensure that, even with a poorly behaved RegExp subclass, the iterator doesn't come back to life?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly - without caching the "done" status, it will call RegExpExec every time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is also problematic with non-sticky non-global regexes, which either always or never return "no match")

@@ -52,13 +92,24 @@ contributors: Jordan Harband
1. Let _O_ be the *this* value.
1. If Type(_O_) is not Object, throw a *TypeError* exception.
1. If _O_ does not have all of the internal slots of a RegExp String Iterator Object Instance (see <emu-xref href="#PropertiesOfRegExpStringIteratorInstances"></emu-xref>), throw a *TypeError* exception.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem are you trying to solve with your changes to next() here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ljharb
Copy link
Member Author

ljharb commented Sep 22, 2017

I'm going to merge this soon if there's no further objections; at which point further review can be done via issues :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants