Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved performance of String.repeat() #64489

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

MewPurPur
Copy link
Contributor

@MewPurPur MewPurPur commented Aug 16, 2022

I was interested in implementing Array.repeat for my game so I looked into whether the method is already optimized in String.repeat; it wasn't, and also, it didn't avoid expensive computations in simple cases.

Optimization is like the fast power algorithm but with string concatenation via memcpy. Scales much better with big strings or more repeats.
Benchmark (updated):

Test Before After
len 1, x4 6.826ms 4.946ms
len 1, x40 25.656ms 5.388ms
len 1, x400 188.476ms 7.304ms
len 1, x4000 1764.399ms 8.035ms
len 6, x4 13.289ms 5.610ms
len 6, x40 99.884ms 7.016ms
len 6, x400 941.088ms 6.594ms
len 45, x4 66.852ms 6.724ms
len 45, x40 638.473ms 6.383ms
len 250, x4 381.928ms 7.406ms

@MewPurPur MewPurPur requested a review from a team as a code owner August 16, 2022 10:15
@Mickeon
Copy link
Contributor

Mickeon commented Aug 16, 2022

Would it be an unreasonable loss of performance to switch between the two algorithms depending on how long the String is?

@MewPurPur
Copy link
Contributor Author

MewPurPur commented Aug 16, 2022

Sure, could do it after the basic case checks. The optimization performs better for longer strings as well as for more repeats, and performance is not uniform (2^n will perform the best, 2^n - 1 will perform the worst for example). So no.

I'm not sure the method itself is as optimized as it could be, I don't know why it's slower for small inputs... But idk how to improve it. Maybe somehow avoid using a zero terminator for the intermediate strings, or somehow do a single resize at the beginning, or something else? I'm open to ideas.

@kleonc
Copy link
Member

kleonc commented Aug 16, 2022

String::repeat seems to be kinda the same as Image::fill (string = color, repetition count = image size), I wonder how approach taken there would perform here. If you want you could benchmark it, I think something like this would be equivalent:

String String::repeat(int p_count) const {
	ERR_FAIL_COND_V_MSG(p_count < 0, "", "Parameter count should be a non-negative number.");

	if (p_count == 0 || is_empty()) {
		return "";
	}

	if (p_count == 1) {
		return *this;
	}

	int len = length();

	String new_string = *this;
	new_string.resize(p_count * len + 1);

	char32_t *dst = new_string.ptrw();
	int offset = 1;
	int stride = 1;
	while (offset < p_count) {
		memcpy(dst + offset * len, dst, stride * len * sizeof(char32_t));
		offset += stride;
		stride = MIN(stride * 2, p_count - offset);
	}
	dst[p_count * len] = _null;

	return new_string;
}

@MewPurPur MewPurPur force-pushed the fast-string-repeat branch 2 times, most recently from b0bc971 to a18cae4 Compare August 16, 2022 14:59
@MewPurPur

This comment was marked as outdated.

@kleonc

This comment was marked as outdated.

@kleonc
Copy link
Member

kleonc commented Aug 16, 2022

@MewPurPur You need to fix the formatting (you have spaces instead of tabs), see Code style guidelines for how-to (or go directly to hooks).

@MewPurPur
Copy link
Contributor Author

No idea how that happened

@MewPurPur
Copy link
Contributor Author

MewPurPur commented Aug 17, 2022

No idea why the checks don't pass. But I want to try to re-implement this with memcpy anyway. I've figured the resizes from the concatenation are most likely the reason for the performance drop on small strings and inputs.

I don't want it merged like this, because 80% of the use cases would see a performance drop.

@MewPurPur MewPurPur marked this pull request as draft August 17, 2022 13:03
@kleonc
Copy link
Member

kleonc commented Aug 17, 2022

No idea why the checks don't pass.

It happens because String::operator+=(const String &) doesn't properly handle case when the same string is passed in (e.g. like in str += str). Basically in this line:

memcpy(dst, src, (rhs_len + 1) * sizeof(char32_t));

in such case destination and source memory blocks overlap as src + rhs_len * sizeof(char32_t) == dst (last char32_t to copy from src is at the same address as first char32_t in dst). And that's an undefined behavior as memcpy requires these to not overlap.

@MewPurPur
Copy link
Contributor Author

Ended up doing a condition for different lengths after all. Thankfully, it turns out for strings longer than 4 characters, this solution is always beneficial.

@kleonc
Copy link
Member

kleonc commented Aug 19, 2022

Ended up doing a condition for different lengths after all. Thankfully, it turns out for strings longer than 4 characters, this solution is always beneficial.

Looks way overcomplicated to me. I've edited my previous comment with a new version using memcpy, check it out. This one should be correct. 😉

@MewPurPur
Copy link
Contributor Author

@kleonc Oh, my, god.

The performance difference is stunning, especially compared to how it used to be. All of the tests now take between 5-8 ms in the benchmark.

@MewPurPur MewPurPur marked this pull request as ready for review August 19, 2022 14:20
core/string/ustring.cpp Outdated Show resolved Hide resolved
@akien-mga
Copy link
Member

I'm not sure why the godot-cpp test is failing. I restarted the build, if it fails again it might require a rebase if somehow this PR was made at a time where godot-cpp was in a broken state.

@akien-mga
Copy link
Member

I'm not sure why the godot-cpp test is failing. I restarted the build, if it fails again it might require a rebase if somehow this PR was made at a time where godot-cpp was in a broken state.

This will need a rebase indeed to fix the CI.

@MewPurPur
Copy link
Contributor Author

Never done rebases, hope I did well. Also I removed the conditions for basic cases since the default algorithm handles them fast now.

@akien-mga akien-mga merged commit b368985 into godotengine:master Aug 23, 2022
@akien-mga
Copy link
Member

akien-mga commented Aug 23, 2022

Thanks!

@Calinou
Copy link
Member

Calinou commented Aug 23, 2022

Benchmark (updated):

@MewPurPur Do you have the benchmark script available? I'm working on backporting this optimization to 3.x 🙂

@MewPurPur
Copy link
Contributor Author

MewPurPur commented Aug 23, 2022

@Calinou Yes, just gimme a few mins to get to my laptop! I'll pass it to you on rocketchat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants