Json, List, Text #1214

Merged
merged 23 commits into from
Oct 12, 2021

farizrahman4u
Contributor

@farizrahman4u farizrahman4u commented Sep 30, 2021

🚀 🚀 Pull Request

Checklist:

  • My code follows the style guidelines of this project and the Contributing document
  • I have commented my code, particularly in hard-to-understand areas
  • I have kept the coverage-rate up
  • I have performed a self-review of my own code and resolved any problems
  • I have checked to ensure there aren't any other open Pull Requests for the same change
  • I have described and made corresponding changes to the relevant documentation
  • New and existing unit tests pass locally with my changes

Changes

https://activeloop.atlassian.net/browse/AL-1081

@farizrahman4u farizrahman4u marked this pull request as ready for review October 7, 2021 14:32
@codecov

codecov bot commented Oct 7, 2021

Codecov Report

Merging #1214 (b32da45) into main (6a7f272) will decrease coverage by 0.09%.
The diff coverage is 88.71%.

@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
- Coverage   91.64%   91.55%   -0.10%     
==========================================
  Files         135      138       +3     
  Lines        9074     9425     +351     
==========================================
+ Hits         8316     8629     +313     
- Misses        758      796      +38     
Impacted Files Coverage Δ
hub/api/tests/test_api.py 99.37% <ø> (ø)
hub/htype.py 100.00% <ø> (ø)
hub/core/sample.py 86.71% <80.00%> (+0.53%) ⬆️
hub/core/serialize.py 92.22% <83.33%> (-1.08%) ⬇️
hub/util/json.py 83.78% <83.78%> (ø)
hub/core/meta/tensor_meta.py 87.58% <85.18%> (-1.31%) ⬇️
hub/core/chunk_engine.py 88.49% <88.46%> (-0.06%) ⬇️
hub/core/tensor.py 78.28% <91.30%> (+1.81%) ⬆️
hub/__init__.py 100.00% <100.00%> (ø)
hub/api/tests/test_json.py 100.00% <100.00%> (ø)
... and 3 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a7f272...b32da45.

@CLAassistant

CLAassistant commented Oct 7, 2021

CLA assistant check
All committers have signed the CLA.

Comment on lines 783 to 796
    assert ds.list.shape == (4, 3)
    for i in range(4):
        assert list(ds.list[i].numpy()) == items[i % 2]


def test_text(memory_ds):
    ds = memory_ds
    ds.create_tensor("text", htype="text")
    items = ["abcd", "efgh", "0123456"]
    with ds:
        for x in items:
            ds.text.append(x)
        ds.text.extend(items)
    assert ds.text.shape == (6, 1)

Contributor

So the shape for the basic list test is (4, 3), but the text shape is (6, 1)? Why? I feel like computing the proper shape for text would be easier than computing it for lists.

Contributor Author

As we discussed, strings are treated as scalars in numpy, so each text sample is stored as a one-element array with shape (1,), while np.array(list) has shape (len(list),). Basically, we want tensor.shape to be consistent with tensor.numpy().shape.
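A minimal sketch of this reasoning in plain numpy (this is not hub's internal code). The text items come from test_text; the list items are hypothetical, since the list test's items are not shown in the snippet, and are chosen as two length-3 lists to match items[i % 2] and the (4, 3) shape.

```python
import numpy as np

# A bare Python string becomes a 0-d numpy array, i.e. a scalar with shape ().
assert np.array("abcd").shape == ()

# Wrapping a text sample in a one-element array gives it the per-sample shape (1,).
text_sample = np.array(["abcd"])
print(text_sample.shape)  # (1,)

# A list sample maps to a 1-d array of shape (len(list),).
list_sample = np.array([1, 2, 3])
print(list_sample.shape)  # (3,)

# Stacking samples adds a leading sample axis.
# Text items as in test_text: appended once, then extended once (6 samples).
text_items = ["abcd", "efgh", "0123456"]
text_tensor = np.stack([np.array([t]) for t in text_items * 2])

# Hypothetical list items: two length-3 lists, repeated to 4 samples.
list_items = [[1, 2, 3], [4, 5, 6]]
list_tensor = np.stack([np.array(x) for x in list_items * 2])

print(text_tensor.shape)  # (6, 1)
print(list_tensor.shape)  # (4, 3)
```

Under this convention the leading axis always counts samples, and the trailing axis is 1 for text and len(list) for lists, which is what keeps shape and numpy().shape in agreement.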

Comment on lines 756 to 759
def test_json_basic(memory_ds):
    ds = memory_ds
    ds.create_tensor("json", htype="json")
    items = [

Contributor

Can you write a test for the dtype=schema case?

hub/util/json.py Outdated
    return replacements.get(typ, typ)


def _parse_schema(schema: Union[str, GenericMeta]) -> Tuple[str, List[str]]:

Contributor

Please add a docstring.
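The body of _parse_schema is not shown in the review. As a hypothetical sketch of what a documented function with that return type (Tuple[str, List[str]]) might look like, assuming the schema is a typing-style string such as "Dict[str, List[int]]" that is split into its outer type name and its top-level arguments, here is a string-only version. The name parse_schema and this splitting behavior are assumptions; hub's real function also accepts typing objects and may work differently.

```python
from typing import List, Tuple


def parse_schema(schema: str) -> Tuple[str, List[str]]:
    """Split a typing-style schema string into its outer type and arguments.

    Hypothetical sketch: hub's real _parse_schema also accepts typing
    objects (GenericMeta), and its exact output format is not shown here.

    Example: "Dict[str, List[int]]" -> ("Dict", ["str", "List[int]"]).
    A bare name has no arguments: "int" -> ("int", []).
    """
    schema = schema.replace(" ", "")
    if "[" not in schema:
        return schema, []
    if not schema.endswith("]"):
        raise ValueError(f"Malformed schema: {schema!r}")
    outer, inner = schema.split("[", 1)
    inner = inner[:-1]  # drop the outer closing bracket
    args: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(inner):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            # Only split on commas at the top nesting level.
            args.append(inner[start:i])
            start = i + 1
    args.append(inner[start:])
    return outer, args


print(parse_schema("Dict[str, List[int]]"))
print(parse_schema("int"))
```

Tracking bracket depth is what keeps a nested argument such as "List[int]" from being split on its inner commas.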

Comment on lines 408 to 410
    def data(self) -> Any:
        # TODO
        pass

Contributor

Is this returning JSON data only (as an alternative to numpy)? If so, can we instead have text and json? I feel like data would be confusing, because it can be mistaken for numpy.

Contributor

text, if called on JSON data, would return the actual text of the JSON.

Contributor Author

For now, .data() returns whatever was put in for json, list and text. For other htypes, the numpy array is returned. The internal JSON text shouldn't be exposed, as it can contain encoded numpy arrays and hub samples.

Comment on lines 760 to 761
        {"x": [1, 2, 3], "y": [4, [5, 6]]},
        {"x": [1, 2, 3], "y": [4, {"z": [0.1, 0.2, []]}]},

Contributor

also, could we write a test for hub.read("path/to/file.json")?

Contributor Author

Implementing hub.read() for text files requires some extra work, and we would have to refactor sample.py to support text mode. We should do this in a separate ticket.

Contributor

@verbose-void verbose-void left a comment

At minimum, re-review the design decision and update the tests.

Contributor

@verbose-void verbose-void left a comment

beeeeefy tests, i love to see it

@verbose-void verbose-void merged commit 872ce9f into main Oct 12, 2021
@verbose-void verbose-void deleted the fr_json branch October 12, 2021 18:45
4 participants