Showing 12 changed files with 195 additions and 96 deletions.

(construct-array)=
# Construct

You can construct a `DocumentArray` in several ways.

## Construct an empty array

```python
from docarray import DocumentArray

da = DocumentArray.empty(10)
```

```text
<DocumentArray (length=10) at 4456123280>
```

## Construct from list-like objects

You can construct a DocumentArray from a `Sequence`, `List`, `Tuple` or `Iterator` that yields `Document` objects.

````{tab} From list of Documents
```python
from docarray import DocumentArray, Document

da = DocumentArray([Document(text='hello'), Document(text='world')])
```
```text
<DocumentArray (length=2) at 4866772176>
```
````
````{tab} From generator
```python
from docarray import DocumentArray, Document

da = DocumentArray((Document() for _ in range(10)))
```
```text
<DocumentArray (length=10) at 4866772176>
```
````

As a DocumentArray is itself a list-like object that yields `Document`s, you can also construct a DocumentArray from another DocumentArray:

```python
from docarray import DocumentArray, Document

da = DocumentArray((Document() for _ in range(10)))
da1 = DocumentArray(da)
```

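The "anything list-like that yields Documents" contract can be illustrated with a small pure-Python sketch. `ToyDoc` and `ToyArray` are hypothetical stand-ins, not DocArray classes; the sketch only shows one plausible way such a constructor can work: it materializes whatever iterable it receives into an internal list. That is also why a generator can be consumed only once, and why an array built from another array holds the same element objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document: just carries optional text."""
    text: Optional[str] = None


class ToyArray:
    """Hypothetical list-like container; NOT the DocArray implementation."""

    def __init__(self, docs: Iterable[ToyDoc] = ()):
        # A single ToyDoc is wrapped into a one-element list;
        # any other iterable (list, tuple, generator, another ToyArray) is materialized.
        if isinstance(docs, ToyDoc):
            docs = [docs]
        self._docs: List[ToyDoc] = list(docs)

    def __len__(self) -> int:
        return len(self._docs)

    def __iter__(self):
        return iter(self._docs)

    def __getitem__(self, idx: int) -> ToyDoc:
        return self._docs[idx]


from_list = ToyArray([ToyDoc('hello'), ToyDoc('world')])
from_tuple = ToyArray((ToyDoc('a'), ToyDoc('b'), ToyDoc('c')))

gen = (ToyDoc() for _ in range(10))
from_gen = ToyArray(gen)
from_exhausted_gen = ToyArray(gen)  # the generator was already consumed above

from_other = ToyArray(from_list)  # a ToyArray is itself iterable

print(len(from_list), len(from_tuple), len(from_gen), len(from_exhausted_gen), len(from_other))
# prints: 2 3 10 0 2
```

Note that `from_other[0] is from_list[0]` holds in this sketch: building one container from another copies references, not the elements themselves.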
## Construct from a single Document

```python
from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1)
```

```text
<DocumentArray (length=1) at 4452802192>
```

## Deep copy on elements

Note that, as with a Python list, adding a Document object to a DocumentArray only adds a reference to it. The original Document is *not* copied, so if you change the original Document afterwards, the one inside the DocumentArray changes too. For example:

```python
from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1)

print(da[0].text)
d1.text = 'world'
print(da[0].text)
```

```text
hello
world
```

This may surprise some users, but it is exactly how built-in Python containers behave. Consider the following plain Python code:

```python
d = {'hello': None}
a = [d]

print(a[0]['hello'])
d['hello'] = 'world'
print(a[0]['hello'])
```

```text
None
world
```

To make a deep copy, pass `copy=True`, i.e. `DocumentArray(..., copy=True)`. All Documents in the resulting DocumentArray are then completely new objects with the same contents as the originals.

```python
from docarray import DocumentArray, Document

d1 = Document(text='hello')
da = DocumentArray(d1, copy=True)

print(da[0].text)
d1.text = 'world'
print(da[0].text)
```

```text
hello
hello
```

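One way to picture the difference between the default behavior and `copy=True` is a small pure-Python sketch built on the standard library's `copy.deepcopy`. `ToyDoc`, `ToyArray` and its `copy` parameter are hypothetical and only mimic the behavior described above; this is not how DocArray implements it.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document."""
    text: str = ''


class ToyArray:
    """Hypothetical container that mimics the `copy` flag; NOT DocArray's code."""

    def __init__(self, docs: Union[ToyDoc, Iterable[ToyDoc]], copy: bool = False):
        if isinstance(docs, ToyDoc):  # allow a single document, like DocumentArray(d1)
            docs = [docs]
        # copy=False keeps references to the caller's objects;
        # copy=True stores independent deep copies instead.
        self._docs: List[ToyDoc] = [deepcopy(d) if copy else d for d in docs]

    def __getitem__(self, idx: int) -> ToyDoc:
        return self._docs[idx]


d1 = ToyDoc(text='hello')
shared = ToyArray(d1)             # stores a reference to d1
copied = ToyArray(d1, copy=True)  # stores a deep copy of d1

d1.text = 'world'
print(shared[0].text)  # world
print(copied[0].text)  # hello
```

The trade-off is the usual one for deep copies: the array is isolated from later changes to the originals, at the cost of extra time and memory for every element copied.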
## Construct from local files

You may recall the common pattern {ref}`I mentioned here<content-uri>`. With {meth}`~docarray.document.generators.from_files`, you can easily construct a DocumentArray from all file paths matched by one or more glob expressions.

```python
from docarray import DocumentArray

da_jpg = DocumentArray.from_files('images/*.jpg')
da_png = DocumentArray.from_files('images/*.png')
da_all = DocumentArray.from_files(['images/**/*.png', 'images/**/*.jpg', 'images/**/*.jpeg'])
```

This scans all filenames that match the expression(s) and constructs Documents with the `.uri` attribute filled in. You can control whether each file is read as text or binary with the `read_mode` argument.

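To see what glob expressions like the ones above match, here is a stdlib-only sketch that expands patterns with `glob.glob(..., recursive=True)` and records each match as a `uri`, optionally reading the content as text (`'r'`) or bytes (`'rb'`). `files_to_records` is a hypothetical helper, and treating `read_mode` as an `open()` mode is an assumption made for illustration, not a description of DocArray's internals.

```python
import glob
import os
import tempfile
from typing import Dict, List, Optional, Sequence, Union


def files_to_records(
    patterns: Union[str, Sequence[str]],
    read_mode: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Hypothetical helper: expand glob pattern(s) into records with a 'uri' field.

    If read_mode is 'r' or 'rb', the file content is also loaded into
    'text' or 'blob' respectively; otherwise only the uri is filled.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    records = []
    for pattern in patterns:
        # recursive=True lets '**' match zero or more nested directories
        for path in sorted(glob.glob(pattern, recursive=True)):
            record: Dict[str, object] = {'uri': path}
            if read_mode == 'r':
                with open(path, 'r') as f:
                    record['text'] = f.read()
            elif read_mode == 'rb':
                with open(path, 'rb') as f:
                    record['blob'] = f.read()
            records.append(record)
    return records


# Build a tiny throwaway directory tree to glob against (left on disk for inspection).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'images', 'nested'))
for rel in ['images/a.jpg', 'images/b.png', 'images/nested/c.png']:
    with open(os.path.join(root, rel), 'wb') as f:
        f.write(b'\x89fake-image-bytes')

top_png = files_to_records(os.path.join(root, 'images', '*.png'))
all_png = files_to_records(os.path.join(root, 'images', '**', '*.png'), read_mode='rb')

print(len(top_png), len(all_png))  # 1 2
```

The difference between the two calls mirrors `da_png` and `da_all` above: `images/*.png` only looks in the top-level folder, while `images/**/*.png` also descends into subfolders.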
## What's next?

In the next chapter, we will see how to construct a DocumentArray from binary bytes, JSON, CSV, a dataframe, or a Protobuf message.

# Serialization

DocArray is designed to be "ready-to-wire" at any time, which makes serialization important. DocumentArray provides the following methods for importing from and exporting to different formats, so that you can transfer a DocumentArray object over the network and across different microservices:

| Description                       | Export Method                                                    | Import Method                                 |
|-----------------------------------|------------------------------------------------------------------|-----------------------------------------------|
| LZ4-compressed binary string/file | `.to_bytes()` (or `bytes(...)`, more Pythonic), `.save_binary()` | `.load_binary()`                              |
| JSON string/file                  | `.to_json()`, `.save_json()`                                     | `.load_json()`, `.from_ndjson()`              |
| CSV file                          | `.save_csv()`                                                    | `.load_csv()`, `.from_lines()`, `.from_csv()` |
| `pandas.DataFrame` object         | `.to_dataframe()`                                                | `.from_dataframe()`                           |
| Local files                       |                                                                  | `.from_files()`                               |
| `numpy.ndarray` object            |                                                                  | `.from_ndarray()`                             |
| Jina Cloud Storage (experimental) | `.push()`                                                        | `.pull()`                                     |

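The "ready-to-wire" idea boils down to a lossless round trip: encode a collection of documents into a string or bytes, send or store it, and decode it back. The stdlib-only sketch below illustrates that round trip on plain dictionaries, once as JSON and once as compressed JSON bytes. It is an analogy, not DocArray's wire format: the table above says the binary format is LZ4-compressed, and `zlib` is swapped in here only because it ships with Python. The `docs_to_*`/`docs_from_*` helpers are hypothetical.

```python
import json
import zlib
from typing import Dict, List

Doc = Dict[str, object]  # hypothetical: a document modeled as a plain dict


def docs_to_json(docs: List[Doc]) -> str:
    return json.dumps(docs)


def docs_from_json(payload: str) -> List[Doc]:
    return json.loads(payload)


def docs_to_bytes(docs: List[Doc]) -> bytes:
    # JSON-encode, then compress; zlib stands in for DocArray's LZ4 here
    return zlib.compress(docs_to_json(docs).encode('utf-8'))


def docs_from_bytes(payload: bytes) -> List[Doc]:
    return docs_from_json(zlib.decompress(payload).decode('utf-8'))


docs = [{'id': str(i), 'text': 'hello world ' * 20} for i in range(100)]

as_json = docs_to_json(docs)
as_bytes = docs_to_bytes(docs)

# Both encodings decode back to an equal list of documents.
print(docs_from_json(as_json) == docs, docs_from_bytes(as_bytes) == docs)  # True True
# Compression pays off on repetitive text like this sample.
print(len(as_json.encode('utf-8')), len(as_bytes))
```

JSON keeps the payload human-readable, while compressed bytes trade readability for size, which matters when the payload travels between microservices.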
```{seealso}
`.from_*()` functions often use generators internally. Using these generators on their own can be more memory-efficient. See {mod}`~docarray.document.generators`.
```

## From/to JSON

## From/to bytes

## From/to Protobuf

## From/to list

## From/to dataframe

# Data Sharing

```{caution}
This is an experimental feature introduced in Jina `2.5.4`. Its behavior might change in the future.
```

Since Jina `2.5.4`, we have introduced a new IO feature, {meth}`~jina.types.arrays.mixins.io.pushpull.PushPullMixin.push` and {meth}`~jina.types.arrays.mixins.io.pushpull.PushPullMixin.pull`, which allows you to share a DocumentArray object across machines.

Suppose you are working on a GPU machine via Google Colab or Jupyter. After preprocessing and embedding, you have everything you need in a DocumentArray. You can easily transfer it to your local laptop via:

```python
from jina import Document, DocumentArray

# Stands in for the real result of heavy lifting: preprocessing, GPU embedding, ...
da = DocumentArray([Document(text='hello'), Document(text='world')])
da.push(token='myda123')
```

Then, on your local laptop, simply run:

```python
from jina import DocumentArray

da = DocumentArray.pull(token='myda123')
```

Now you can continue working locally, analyzing or visualizing `da`. Friends and colleagues who know the token `myda123` can also pull that DocumentArray, which is useful when you want to quickly share results with them.

For more information on this feature, please refer to {class}`~jina.types.arrays.mixins.io.pushpull.PushPullMixin`.

```{danger}
The lifetime of the storage is not guaranteed at the moment: it could be a day, it could be a week. Do not use it for persistence in production; treat it only as a temporary transmission channel or a clipboard.
```

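Two properties matter when using this feature: the token is the only key to the shared data, and the storage is explicitly temporary. The in-memory sketch below models just those two properties. `ToyClipboard`, its `ttl_seconds` parameter and the injectable clock are hypothetical and say nothing about how the real push/pull service is implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ToyClipboard:
    """Hypothetical token-keyed, expiring store; NOT the real push/pull backend."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def push(self, token: str, payload: bytes) -> None:
        # Whoever knows the token can later read (or overwrite) this entry.
        self._store[token] = (self._clock(), payload)

    def pull(self, token: str) -> Optional[bytes]:
        entry = self._store.get(token)
        if entry is None:
            return None
        pushed_at, payload = entry
        if self._clock() - pushed_at > self._ttl:
            # Expired entries are dropped: this store is not for persistence.
            del self._store[token]
            return None
        return payload


# A fake clock makes expiry deterministic for the example.
now = [0.0]
board = ToyClipboard(ttl_seconds=60, clock=lambda: now[0])

board.push('myda123', b'serialized DocumentArray bytes')
print(board.pull('myda123'))  # b'serialized DocumentArray bytes'
print(board.pull('other'))    # None: unknown token
now[0] = 120.0
print(board.pull('myda123'))  # None: expired
```

Because the token alone grants access, a hard-to-guess token is preferable to a short, memorable one like `myda123` when the data is sensitive.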
# Similar Image Search with ResNet50

# QA Matching with Transformer

# What is DocArray