-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json, List, Text #1214
Json, List, Text #1214
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1214 +/- ##
==========================================
- Coverage 91.64% 91.55% -0.10%
==========================================
Files 135 138 +3
Lines 9074 9425 +351
==========================================
+ Hits 8316 8629 +313
- Misses 758 796 +38
Continue to review full report at Codecov.
|
hub/api/tests/test_api.py
Outdated
assert ds.list.shape == (4, 3) | ||
for i in range(4): | ||
assert list(ds.list[i].numpy()) == items[i % 2] | ||
|
||
|
||
def test_text(memory_ds): | ||
ds = memory_ds | ||
ds.create_tensor("text", htype="text") | ||
items = ["abcd", "efgh", "0123456"] | ||
with ds: | ||
for x in items: | ||
ds.text.append(x) | ||
ds.text.extend(items) | ||
assert ds.text.shape == (6, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the shape for list basic is (4, 3) but the text shape is (6, 1)? why again? I feel like gathering the proper shape for text is easier than gathering the shape for list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, string are treated as scalars in numpy, so a text sample will have a shape (1,). np.array(list)
will have shape (len(list),). Basically we want tensor.shape() to be consistent with tensor.numpy().shape.
hub/api/tests/test_api.py
Outdated
def test_json_basic(memory_ds): | ||
ds = memory_ds | ||
ds.create_tensor("json", htype="json") | ||
items = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write a test for the dtype=schema?
hub/util/json.py
Outdated
return replacements.get(typ, typ) | ||
|
||
|
||
def _parse_schema(schema: Union[str, GenericMeta]) -> Tuple[str, List[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstr
hub/core/tensor.py
Outdated
def data(self) -> Any: | ||
# TODO | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this returning json data only (alternative to numpy?) if so, can we instead do:
text
, json
? i feel like it would be confusing because data
can be mistaken for numpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
text if called on json data would return the actual text for the json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now .data()
return whatever was put in for json, list and text. For other htypes, the numpy array is returned. The internal json text shouldn't exposed, as it can contain encoded numpy arrays and hub samples.
hub/api/tests/test_api.py
Outdated
{"x": [1, 2, 3], "y": [4, [5, 6]]}, | ||
{"x": [1, 2, 3], "y": [4, {"z": [0.1, 0.2, []]}]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, could we write a test for hub.read("path/to/file.json")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing hub.read()
for text files require some extra work + we will have to refac sample.py to support text mode. We should do this in a separate ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at minimum re-review design decision, update tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beeeeefy tests, i love to see it
🚀 🚀 Pull Request
Checklist:
coverage-rate
upChanges
https://activeloop.atlassian.net/browse/AL-1081