Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AL-1619][AL-1586] Removes cachable, makes Info more dict like #1469

Merged
merged 59 commits into from
Feb 15, 2022

Conversation

AbhinavTuli
Copy link
Contributor

@AbhinavTuli AbhinavTuli commented Feb 1, 2022

🚀 🚀 Pull Request

Checklist:

  • My code follows the style guidelines of this project and the Contributing document
  • I have commented my code, particularly in hard-to-understand areas
  • I have kept the coverage-rate up
  • I have performed a self-review of my own code and resolved any problems
  • I have checked to ensure there aren't any other open Pull Requests for the same change
  • I have described and made corresponding changes to the relevant documentation
  • New and existing unit tests pass locally with my changes

Changes

  • Removes Cachable, should increase stability, reduce memory leaks and speed up certain writes.
  • Info object is now much more dict like and supports things like:-
ds.info.keys()
ds.info.values()
ds.info.items()
"xyz" in ds.info
ds.info.setdefault(0, "hello")
ds.info.popitem()
ds.info.pop(0)
del ds.info["abc"]
for it in ds.info:
    pass

@AbhinavTuli AbhinavTuli changed the title [AL-1619][AL-1586] Remove cachable [AL-1619][AL-1586] Removes cachable, makes Info more dict like Feb 1, 2022
@tatevikh tatevikh requested a review from jraman February 8, 2022 16:19
jraman
jraman previously requested changes Feb 9, 2022
hub/api/info.py Outdated Show resolved Hide resolved
hub/api/info.py Outdated Show resolved Hide resolved
hub/api/info.py Show resolved Hide resolved
@@ -41,13 +42,14 @@ def test_dataset(local_ds_generator):
assert len(ds.info) == 7
assert ds.info.test == [99]

ds.info.delete("test")
ds.info.pop("test")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this change doesn't look to be backward compatible, hub version needs to be appropriately bumped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight change in API for info, delete is barely used atm, we should be fine.

assert ds.info[0] == "yo"
assert ds.info[1] == "world"

ds.info.popitem()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate tests for some of these methods (like popitem) might be better -- LIFO popitem not obvious here :)

hub/core/tensor.py Show resolved Hide resolved
hub/core/tensor.py Outdated Show resolved Hide resolved
hub/core/version_control/commit_diff.py Show resolved Hide resolved
hub/util/diff.py Show resolved Hide resolved
hub/util/transform.py Show resolved Hide resolved

def popitem(self):
with self:
popped = self._info.popitem()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In python < 3.7, popitem() pops and returns an arbitrary entry while in 3.7+ it returns LIFO. Have you taken that into account wherever popitem is called? (Not just info, but lru_sizes.popitem etc as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For info, it shouldn't matter, it will behave like the corresponding version's dictionary.
For LRUCache, I don't think it should be an issue even in 3.6 as the Cpython implementation retains order.

From StackOverflow (https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6):-
"They are insertion ordered. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature."

@AbhinavTuli AbhinavTuli merged commit aada0a9 into main Feb 15, 2022
@AbhinavTuli AbhinavTuli deleted the remove/cachable branch February 15, 2022 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants