-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dvc.data: save and try loading raw dir objects #7597
dvc.data: save and try loading raw dir objects #7597
Conversation
86c491e
to
d6152f7
Compare
8d4fc81
to
73e1716
Compare
73e1716
to
a207fcf
Compare
This comment was marked as outdated.
This comment was marked as outdated.
a207fcf
to
22fe50b
Compare
@dtrifiro Thanks for the benches! 🙏 Btw, could you take a look at what is still taking that long in relation to this PR? I suppose it is mostly the overhead of those ref obj recreations? Just wondering if there are any low hangers that we could pick until a proper solution with persistent refobjects arrives. |
This comment was marked as outdated.
This comment was marked as outdated.
22fe50b
to
6124165
Compare
@dtrifiro Thanks for the traces 🙏 Any particular points you were able to identify? Any particular ideas? |
352517c
to
27b7927
Compare
27b7927
to
3f24f91
Compare
3f24f91
to
b8362bc
Compare
Updated benchmarks can be found here: A summary for ----------------------------------------------------------------------------------- benchmark 'test_status-status': 3 tests ------------------------------------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_status-status (test_status/0001_fix-739) 25.2779 (1.0) 25.2779 (1.0) 25.2779 (1.0) 0.0000 (1.0) 25.2779 (1.0) 0.0000 (1.0) 0;0 0.0396 (1.0) 1 1
test_status-status (test_status/0002_main) 36.9391 (1.46) 36.9391 (1.46) 36.9391 (1.46) 0.0000 (1.0) 36.9391 (1.46) 0.0000 (1.0) 0;0 0.0271 (0.68) 1 1
test_status-status (test_status/0003_2.10.0) 36.9755 (1.46) 36.9755 (1.46) 36.9755 (1.46) 0.0000 (1.0) 36.9755 (1.46) 0.0000 (1.0) 0;0 0.0270 (0.68) 1 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_status-status-changed': 3 tests ------------------------------------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_status-status-changed (test_status/0001_fix-739) 14.9064 (1.0) 14.9064 (1.0) 14.9064 (1.0) 0.0000 (1.0) 14.9064 (1.0) 0.0000 (1.0) 0;0 0.0671 (1.0) 1 1
test_status-status-changed (test_status/0002_main) 26.9573 (1.81) 26.9573 (1.81) 26.9573 (1.81) 0.0000 (1.0) 26.9573 (1.81) 0.0000 (1.0) 0;0 0.0371 (0.55) 1 1
test_status-status-changed (test_status/0003_2.10.0) 27.1236 (1.82) 27.1236 (1.82) 27.1236 (1.82) 0.0000 (1.0) 27.1236 (1.82) 0.0000 (1.0) 0;0 0.0369 (0.55) 1 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------ benchmark 'test_status-status-changed-noop': 3 tests -----------------------------------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_status-status-changed-noop (test_status/0001_fix-739) 3.9865 (1.0) 3.9865 (1.0) 3.9865 (1.0) 0.0000 (1.0) 3.9865 (1.0) 0.0000 (1.0) 0;0 0.2508 (1.0) 1 1
test_status-status-changed-noop (test_status/0002_main) 26.6765 (6.69) 26.6765 (6.69) 26.6765 (6.69) 0.0000 (1.0) 26.6765 (6.69) 0.0000 (1.0) 0;0 0.0375 (0.15) 1 1
test_status-status-changed-noop (test_status/0003_2.10.0) 27.0214 (6.78) 27.0214 (6.78) 27.0214 (6.78) 0.0000 (1.0) 27.0214 (6.78) 0.0000 (1.0) 0;0 0.0370 (0.15) 1 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
dvc/data/stage.py
Outdated
tree = Tree.load(odb, hash_info.as_raw()) | ||
tree.check(odb) | ||
except ObjectFormatError: | ||
raise FileNotFoundError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise FileNotFoundError | |
raise FileNotFoundError from exc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are here:
raise FileNotFoundError | |
raise FileNotFoundError(errno.ENOENT, "No such object", odb.hash_to_path(hash_info.value)) |
Or, just use hash_info.value
for simplicity.
b8362bc
to
3c7d08e
Compare
We could push performance a little bit more by avoiding calling Hacking around, I was able to get a working prototype and ran some benchmarks with dvc-bench's |
3c7d08e
to
42f87aa
Compare
When staging a directory, always save a "raw dir object" to the odb. If the corresponding ".dir" object has not been added to the odb, `stage()` calls can load the tree from the raw dir object instead of rebuilding it by walking the directory. This can lead to significant speed improvements when calling `dvc status` for modified directories. Fixes iterative#7390
42f87aa
to
1f92c4b
Compare
@dtrifiro, are those improvements on |
I forgot to update the https://github.com/dtrifiro/dvc-bench/runs/6404025103?check_suite_focus=true#step:8:258
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it looks like we could clean this up a bit more, but let's run with it for now and get back to it when we safe ref objects.
obj = load(odb, hash_info) | ||
check(odb, obj, check_hash=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, this PR introduced a typo: it should be _odb
here 🙁
Introduced in iterative/dvc#7597
Introduced in iterative/dvc#7597
When staging a directory, always save a "raw dir object" to the odb. If the corresponding ".dir" object has not been added to the odb,
stage()
calls can load the tree from the raw dir object instead of rebuilding it by walking the directory.This can lead to significant speed improvements when calling
dvc status
for modified directories.Fixes #7390