Json, List, Text #1214

farizrahman4u · 2021-09-30T10:41:56Z

🚀 🚀 Pull Request

Checklist:

My code follows the style guidelines of this project and the Contributing document
I have commented my code, particularly in hard-to-understand areas
I have kept the coverage-rate up
I have performed a self-review of my own code and resolved any problems
I have checked to ensure there aren't any other open Pull Requests for the same change
I have described and made corresponding changes to the relevant documentation
New and existing unit tests pass locally with my changes

Changes

https://activeloop.atlassian.net/browse/AL-1081

codecov · 2021-10-07T16:05:07Z

Codecov Report

Merging #1214 (b32da45) into main (6a7f272) will decrease coverage by 0.09%.
The diff coverage is 88.71%.

@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
- Coverage   91.64%   91.55%   -0.10%     
==========================================
  Files         135      138       +3     
  Lines        9074     9425     +351     
==========================================
+ Hits         8316     8629     +313     
- Misses        758      796      +38

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| hub/api/tests/test_api.py | `99.37% <ø> (ø)` | |
| hub/htype.py | `100.00% <ø> (ø)` | |
| hub/core/sample.py | `86.71% <80.00%> (+0.53%)` | ⬆️ |
| hub/core/serialize.py | `92.22% <83.33%> (-1.08%)` | ⬇️ |
| hub/util/json.py | `83.78% <83.78%> (ø)` | |
| hub/core/meta/tensor_meta.py | `87.58% <85.18%> (-1.31%)` | ⬇️ |
| hub/core/chunk_engine.py | `88.49% <88.46%> (-0.06%)` | ⬇️ |
| hub/core/tensor.py | `78.28% <91.30%> (+1.81%)` | ⬆️ |
| hub/__init__.py | `100.00% <100.00%> (ø)` | |
| hub/api/tests/test_json.py | `100.00% <100.00%> (ø)` | |
| ... and 3 more | | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6a7f272...b32da45.

CLAassistant · 2021-10-07T17:48:29Z

All committers have signed the CLA.

verbose-void · 2021-10-08T17:50:52Z

hub/api/tests/test_api.py

+    assert ds.list.shape == (4, 3)
+    for i in range(4):
+        assert list(ds.list[i].numpy()) == items[i % 2]
+
+
+def test_text(memory_ds):
+    ds = memory_ds
+    ds.create_tensor("text", htype="text")
+    items = ["abcd", "efgh", "0123456"]
+    with ds:
+        for x in items:
+            ds.text.append(x)
+        ds.text.extend(items)
+    assert ds.text.shape == (6, 1)


So the shape for the basic list test is (4, 3), but the text shape is (6, 1)? Why is that, again? I feel like gathering the proper shape for text is easier than gathering the shape for a list.

As we discussed, strings are treated as scalars in NumPy, so a text sample will have shape (1,), while np.array(list) will have shape (len(list),). Basically, we want tensor.shape to be consistent with tensor.numpy().shape.
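
The shape convention described in this reply can be checked with plain NumPy. The following is a minimal sketch that makes no hub calls; wrapping each string in a one-element array is an assumption about how a text sample is represented, inferred from the (1,) shape stated above, and the list items are hypothetical stand-ins for the ones in the test.

```python
import numpy as np

# A bare string is a 0-d scalar to NumPy; wrapped in a
# one-element array it becomes a sample of shape (1,).
scalar_shape = np.array("abcd").shape
text_sample = np.array(["abcd"])

# A list sample keeps one axis entry per element.
list_sample = np.array([1, 2, 3])

# Collecting samples adds a leading "number of samples" axis.
texts = ["abcd", "efgh", "0123456"]
text_tensor = np.array([[t] for t in texts + texts])  # 3 appended + 3 extended
list_tensor = np.array([[1, 2, 3], [4, 5, 6]] * 2)    # hypothetical items: 4 samples of length 3

print(scalar_shape, text_sample.shape, list_sample.shape)  # () (1,) (3,)
print(text_tensor.shape, list_tensor.shape)                # (6, 1) (4, 3)
```

Under this convention the trailing axis of a text tensor is always 1, while the trailing axis of a list tensor is the list length, which matches the two assertions in the tests.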

verbose-void · 2021-10-08T17:52:49Z

hub/api/tests/test_api.py

+def test_json_basic(memory_ds):
+    ds = memory_ds
+    ds.create_tensor("json", htype="json")
+    items = [


can you write a test for the dtype=schema?
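
For illustration, a schema test would need to check that conforming samples are accepted and non-conforming ones are rejected. The sketch below is stand-alone and standard-library only: `conforms` is a hypothetical helper, not the validation code in hub/util/json.py, and it assumes the schema is written with `typing` generics such as `Dict[str, List[int]]`.

```python
from typing import Any, Dict, List, get_args, get_origin


def conforms(value: Any, schema: Any) -> bool:
    """Return True if `value` matches a typing-style schema (hypothetical helper)."""
    if schema is Any:
        return True
    origin = get_origin(schema)
    if origin is None:
        # Plain class such as int, str, list or dict.
        return isinstance(value, schema)
    args = get_args(schema)
    if origin is list:
        elem = args[0] if args else Any
        return isinstance(value, list) and all(conforms(v, elem) for v in value)
    if origin is dict:
        key, val = args if args else (Any, Any)
        return isinstance(value, dict) and all(
            conforms(k, key) and conforms(v, val) for k, v in value.items()
        )
    raise TypeError(f"unsupported schema: {schema!r}")


schema = Dict[str, List[int]]
good = {"x": [1, 2, 3]}
bad = {"x": [1, "2", 3]}  # a string where an int is expected

print(conforms(good, schema), conforms(bad, schema))  # True False
```

A test along these lines would append `good` successfully and expect an exception for `bad`; the exact exception type raised by hub is not stated in this thread.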

verbose-void · 2021-10-08T17:54:58Z

hub/util/json.py

+    return replacements.get(typ, typ)
+
+
+def _parse_schema(schema: Union[str, GenericMeta]) -> Tuple[str, List[str]]:


verbose-void · 2021-10-08T17:56:19Z

hub/core/tensor.py

+    def data(self) -> Any:
+        # TODO
+        pass


Is this returning JSON data only (as an alternative to numpy)? If so, can we instead expose separate accessors, text and json? I feel like data would be confusing because it can be mistaken for numpy. text, if called on json data, would return the actual text for the JSON.

For now, .data() returns whatever was put in for json, list, and text. For other htypes, the NumPy array is returned. The internal JSON text shouldn't be exposed, as it can contain encoded NumPy arrays and hub samples.
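
To illustrate why the stored JSON text is not something to hand back to users, here is a minimal sketch of a JSON round trip with an embedded NumPy array. The `__ndarray__` tag, `_Encoder`, and `_decode` are hypothetical names chosen for this example; hub's actual internal encoding is not shown in this thread and may differ.

```python
import base64
import json
from typing import Any

import numpy as np

_TAG = "__ndarray__"  # hypothetical marker, not hub's actual format


class _Encoder(json.JSONEncoder):
    """Serialize NumPy arrays as tagged dicts so they survive json.dumps."""

    def default(self, obj: Any) -> Any:
        if isinstance(obj, np.ndarray):
            return {
                _TAG: base64.b64encode(obj.tobytes()).decode("ascii"),
                "dtype": str(obj.dtype),
                "shape": list(obj.shape),
            }
        return super().default(obj)


def _decode(d: dict) -> Any:
    """Turn tagged dicts back into arrays; leave ordinary dicts alone."""
    if _TAG in d:
        raw = base64.b64decode(d[_TAG])
        return np.frombuffer(raw, dtype=d["dtype"]).reshape(d["shape"])
    return d


sample = {"x": [1, 2, 3], "y": np.arange(4, dtype="int64")}

# Internal representation: contains the tag and base64 payload.
stored_text = json.dumps(sample, cls=_Encoder)

# What a .data()-style accessor would return: the original structure.
restored = json.loads(stored_text, object_hook=_decode)

print(stored_text)
print(restored["x"], restored["y"])
```

The stored text carries encoding details that are meaningless to a caller, whereas the decoded value matches what was originally appended.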

verbose-void · 2021-10-08T17:58:37Z

hub/api/tests/test_api.py

+        {"x": [1, 2, 3], "y": [4, [5, 6]]},
+        {"x": [1, 2, 3], "y": [4, {"z": [0.1, 0.2, []]}]},


Also, could we write a test for hub.read("path/to/file.json")?

Implementing hub.read() for text files requires some extra work, and we will have to refactor sample.py to support text mode. We should do this in a separate ticket.

verbose-void

at minimum re-review design decision, update tests

verbose-void

beeeeefy tests, i love to see it

farizrahman4u added 8 commits September 30, 2021 12:12

json validation

679bf16

updates

de91ae0

initial

6db2394

merge main

47d445a

fixes

e4823e6

test

d6fc7fb

format

3cefdff

Merge branch 'main' of https://www.github.com/activeloopai/hub into f…

367d570

…r_json

farizrahman4u marked this pull request as ready for review October 7, 2021 14:32

farizrahman4u added 3 commits October 7, 2021 18:36

py38 fix

daf0ec2

test list

f127d1a

test text

523bcee

davidbuniat requested review from aliubimov and verbose-void October 8, 2021 03:11

verbose-void reviewed Oct 8, 2021

hub/api/tests/test_api.py Outdated

verbose-void reviewed Oct 8, 2021

hub/api/tests/test_api.py Outdated

verbose-void reviewed Oct 8, 2021

hub/core/meta/tensor_meta.py Outdated

verbose-void reviewed Oct 8, 2021

verbose-void suggested changes Oct 8, 2021

farizrahman4u added 5 commits October 11, 2021 02:20

refac

025e2f4

better exception

7c4ee2a

numpy tests

d68d448

hub.read support tests

ac62f86

list + numpy test

7110eee

farizrahman4u added 7 commits October 11, 2021 02:58

list + hub sample

ce9c2e8

schema tests

f402375

.data()

362c899

format

42ec0e3

smol fix

1cb89ed

Merge branch 'main' of https://www.github.com/activeloopai/hub into f…

a0fe88d

…r_json

mypy

b32da45

verbose-void approved these changes Oct 11, 2021

aliubimov approved these changes Oct 12, 2021

verbose-void merged commit 872ce9f into main Oct 12, 2021

verbose-void deleted the fr_json branch October 12, 2021 18:45
