Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Always start from the beginning of string #5

Closed
RReverser opened this issue Nov 21, 2015 · 10 comments
Closed

Always start from the beginning of string #5

RReverser opened this issue Nov 21, 2015 · 10 comments

Comments

@RReverser
Copy link

[Continuation of Twitter discussion]

Currently proposal uses stored lastIndex instead of starting with the beginning of string, which makes it confusing when used in combination with <=ES5 methods that do make use of lastIndex and thus iterator will start from an implicit internal position from unrelated call and won't really "match all" occurences as naming implies.

IMO, it would make more sense to always start from the beginning of the string, so that iterator wouldn't conflict with earlier exec calls and would behave in the same way independently from context.

@ljharb
Copy link
Member

ljharb commented Nov 22, 2015

One interesting collision between this and #6 is that the whole point of the sticky flag is to respect lastIndex - if matchAll respected sticky but ignored lastIndex, I'm not sure that would make sense or be consistent.

I wonder if it would be better, instead of ignoring lastIndex, to take an optional second parameter startingIndex, which when omitted, defaulted to the lastIndex property? Then you could str.matchAll(regex, 0) and always be guaranteed it would start at zero, without needing to cripple someone who wanted to rely on the lastIndex property to govern initial behavior?

@RReverser
Copy link
Author

One interesting collision between this and #6 is that the whole point of the sticky flag is to respect lastIndex - if matchAll respected sticky but ignored lastIndex, I'm not sure that would make sense or be consistent.

Not really - you still do use and respect sticky flag and respect lastIndex in between calls, so regex will still follow semantics. Take a string 1,2,3.

  • matchAll(/\d+/) will return only first match - 1.
  • matchAll(/\d+/g) will return all the numbers - 1, 2, 3.
  • matchAll(/^\d+/y) will return only first match - 1 because lastIndex in between points to ,2,3 and sticky regexp tries to match ^\d+ starting from it, which obviously fails - so sticky would mean "^ has to match following regular expression right after the last match and not from the beginning of original string nor anywhere else".

In all those case, matchAll still does match all - like, entire string, and at the same times all the flags are respected in the context of each iterator.

@RReverser
Copy link
Author

(I believe this comment / discussion belongs to #6 more, so let's continue there)

@ljharb
Copy link
Member

ljharb commented Nov 22, 2015

OK, given the direction from #6, I think this question still applies.

I think the possible ways to handle lastIndex are the following:

  • ignore the provided regex's lastIndex, always start at zero
  • respect the provided regex's lastIndex, start there
  • Do either one of the above, and optionally receive a second startingIndex argument that when provided overrides the previous behavior.

In all cases, lastIndex would not be mutated on the provided regex.

One thing I think it's important to consider here is that the trivial case - a literal regex on a small string - is not the only thing this needs to account for. RegExp subclasses might exist that would match infinitely, for example - or, this might simply be a very very large portion of text, and you might know you already want to skip it - and simply slicing the large string might not be friendly to memory or performance.

I think there does need to be some way for the user to provide a starting index, and my initial thought was to just use the conventional method for this - the lastIndex property. My assumption is that in the common case, in a world where you don't need exec anymore, most every regex's lastIndex will be at 0 anyways.

@RReverser
Copy link
Author

simply slicing the large string might not be friendly to memory or performance

Not really - as I pointed, none of engines actually copies substrings on slicing as they're immutable, instead they create minimalisting object that just points to the same memory and contains offset + length of the chunk. There were even a lot of complaints from V8 users that one-character slice of a huge string prevents GC from collection original string at all during the lifetime of this slice. Of course, this can be considered implementation-specific, but if we're taking performance into account, it's impl-specific anyway, and currently any engine deals well with it.

My assumption is that in the common case, in a world where you don't need exec anymore, most every regex's lastIndex will be at 0 anyways.

Probably you're right... It's just that taking lastIndex into account by default feels like a potential root of more "gotchas" than if defaulting to beginning of string and asking developer to explicitly slice original string or pass 2nd argument with r.lastIndex should you really want to use it - it's not too hard to do, and will be more transparent IMO for cases when developer really wants it.

@ljharb
Copy link
Member

ljharb commented Nov 23, 2015

@RReverser Can you file a new issue for that last one?

@RReverser
Copy link
Author

Deleted comment here, raised as #7.

@littledan
Copy link
Member

littledan commented Apr 25, 2017

What's the conclusion here--should we copy the lastIndex over, or always start from 0? (Note: I imagine a common use case will be to use a particular RegExp only with matchAll, and that would make this issue not come up.) Personally, I would've expected the always-start-from-0 behavior; lastIndex in the source RegExp feels more like something to trip up on than anything else.

@ljharb
Copy link
Member

ljharb commented Apr 25, 2017

I think that the existence of the sticky flag makes lastIndex something we can't ignore; "sticky" is something for people to trip up on already, and we can't just gloss over it.

Per #5 (comment), I think we have to respect lastIndex, and I also think that virtually nobody will accidentally pass in a regex with a nonzero lastIndex.

@ljharb
Copy link
Member

ljharb commented Jul 29, 2017

Closing, since we're going to stick with respecting lastIndex.

@ljharb ljharb closed this as completed Jul 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants