resolvers: optimize "uniq" iteration #5914
Merged
TL;DR - It profiles quite nicely.

    from random import randint
    from timeit import timeit

    import pandas as pd
    import matplotlib.pyplot as pyplot

    pyplot.style.use('ggplot')


    def old_uniq(iterable):
        # Old approach: each membership test scans the list of items kept so far.
        ret = []
        for item in iterable:
            if item not in ret:
                ret.append(item)
        return ret


    def new_uniq(iterable):
        # New approach: a set gives constant-time (on average) membership tests.
        cache = set()
        for item in iterable:
            if item not in cache:
                cache.add(item)
                yield item

Generate samples

    samples = {}
    for samplesize in [10, 50, 100, 500, 1000, 5000, 10000, 50000]:
        samples[samplesize] = [randint(1, 10) for i in range(samplesize)]

Run tests

Iterable size

    results = {}
    # Use a distinct loop variable so the "samples" dict is not shadowed.
    for size, sample in samples.items():
        res = {}
        res['before'] = timeit(lambda: old_uniq(sample), number=100)
        # new_uniq is a generator: consume it with list() so the work is actually timed.
        res['after'] = timeit(lambda: list(new_uniq(sample)), number=100)
        results[size] = res

    pd.DataFrame(results).T

    pd.DataFrame(results).T.plot(xlabel='Iterable Size', ylabel='Time(s)', title='Time v. Iterable Size')
Selection variability

    selectionsize = {}
    for size in [10, 50, 100, 500, 1000, 5000, 10000, 50000]:
        # Fixed length (1000 items); only the range of possible values grows.
        selectionsize[size] = [randint(1, size) for i in range(1000)]

    results2 = {}
    for size, samples in selectionsize.items():
        res = {}
        res['before'] = timeit(lambda: old_uniq(samples), number=100)
        # new_uniq is a generator: consume it with list() so the work is actually timed.
        res['after'] = timeit(lambda: list(new_uniq(samples)), number=100)
        results2[size] = res

    pd.DataFrame(results2).T

    pd.DataFrame(results2).T.plot(xlabel='Iterable Variability', ylabel='Time(s)', title='Time v. Iterable Variability')
Overall

    sizeandvar = {}
    for size in [10, 50, 100, 500, 1000, 5000]:
        # Both the length and the range of possible values grow together.
        sizeandvar[size] = [randint(1, size) for i in range(size)]

    results3 = {}
    for size, samples in sizeandvar.items():
        res = {}
        res['before'] = timeit(lambda: old_uniq(samples), number=100)
        # new_uniq is a generator: consume it with list() so the work is actually timed.
        res['after'] = timeit(lambda: list(new_uniq(samples)), number=100)
        results3[size] = res

    pd.DataFrame(results3).T

    pd.DataFrame(results3).T.plot(xlabel='Iterable Size & Variability', ylabel='Time(s)', title='Time v. Iterable Size & Variability')
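A note on timing generator functions such as new_uniq: calling the function only creates a generator object, and the loop body runs when the generator is iterated. The sketch below illustrates this, assuming the same new_uniq shape as in the profiling code; `counting` is a hypothetical helper added purely for the demonstration.

```python
def new_uniq(iterable):
    """Yield unique items in first-seen order (same shape as new_uniq above)."""
    cache = set()
    for item in iterable:
        if item not in cache:
            cache.add(item)
            yield item


def counting(iterable, counter):
    """Hypothetical helper: pass items through while counting how many were read."""
    for item in iterable:
        counter["read"] += 1
        yield item


data = [1, 2, 1, 3, 2]
counter = {"read": 0}

# Calling the generator function does no work yet.
gen = new_uniq(counting(data, counter))
reads_after_call = counter["read"]

# Consuming the generator is what drives the iteration.
result = list(gen)
reads_after_list = counter["read"]
```

So a benchmark probably needs to consume the generator (for example with `list(...)`) for the comparison with a list-returning function to be like for like.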
wxtim approved these changes Jan 12, 2024
hjoliver approved these changes Jan 16, 2024
BLOCKED.. No merge till ...
* Add a more efficient method for stripping duplicate items whilst maintaining iteration order.
PR merged
oliver-sanders force-pushed the iter_uniq branch from 3e94637 to d72f460 (January 17, 2024 11:29)
Unrelated linkcheck failure
Requires: #5769
In the data store we sometimes need to strip duplicate items from an iterable whilst maintaining iteration order.
If we didn't need to maintain order, we would use sets. This new method is more efficient than the old one for iteration use cases.
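As a minimal sketch of the idea (not necessarily the exact code in this PR), an order-preserving de-duplicating iterator can track seen items in a set and yield each item the first time it appears. The name `iter_uniq` is an assumption borrowed from the branch name, and items are assumed to be hashable.

```python
from typing import Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T", bound=Hashable)


def iter_uniq(iterable: Iterable[T]) -> Iterator[T]:
    """Yield each item once, in the order it is first seen.

    Assumes items are hashable, since they are stored in a set.
    """
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


names = ["b", "a", "b", "c", "a"]

# First-seen order is kept: "b" comes before "a".
ordered = list(iter_uniq(names))

# A plain set also removes duplicates but makes no promise about order.
unordered = set(names)
```

Because it is a generator, a caller that only iterates over the result never builds an intermediate list, which is where the benefit for iteration use cases would come from.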
Simple performance test:
The real world impact of this optimisation is probably quite small.
Check List
CONTRIBUTING.md and added my name as a Code Contributor.
setup.cfg (and conda-environment.yml if present).
CHANGES.md entry included if this is a change that can affect users
?.?.x branch.