doc: Add documentation+FAQs for Parquet DataSource #153
…' into feat/issue#109-add-parquet-source-documentation
Added the configurations, plus some common gotchas for setting them up. Also added some FAQs and instructions for creating the stream config for Parquet. I haven't touched the diagrams yet, as redrawing them will take some effort since the original resources are missing. However, I have changed the description in relevant places.
docs/docs/guides/create_dagger.md
Outdated
or data partitioned, such as:
Small typo here: I believe we meant "date partitioned".
Fixed via commit 31f5ec3
…' into feat/issue#109-add-parquet-source-documentation
docs/docs/concepts/architecture.md
Outdated
- A Stream defines a logical grouping of a data source and its associated [`protobuf`](https://developers.google.com/protocol-buffers)
schema. All data produced by a source follows the protobuf schema. The source can be a bounded one such as `KAFKA_SOURCE` or `KAFKA_CONSUMER`
in which case, a single stream can consume from one or more topics all sharing the same schema. Otherwise, the source
can be an unbounded one such as `PARQUET_SOURCE` in which case, one or more parquet files as provided are consumed in a single stream.
Kafka is an unbounded data source and Parquet is a bounded one. Fixing this as well.
good catch
Fixed via commit 91a73a5
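For readers following this thread, here is a minimal Python sketch of the classification the review comment describes: the Kafka sources are unbounded and the Parquet source is bounded. The source names come from the docs under review. Everything else (the `Stream` class, its field names, the example schema name, topics, and the truncated-looking file path) is hypothetical and only for illustration; it is not Dagger's actual configuration format or code.

```python
from dataclasses import dataclass
from enum import Enum


class Boundedness(Enum):
    BOUNDED = "BOUNDED"
    UNBOUNDED = "UNBOUNDED"


# Source names are taken from the docs under review; the mapping follows
# the review comment (Kafka is unbounded, Parquet is bounded).
SOURCE_BOUNDEDNESS = {
    "KAFKA_SOURCE": Boundedness.UNBOUNDED,
    "KAFKA_CONSUMER": Boundedness.UNBOUNDED,
    "PARQUET_SOURCE": Boundedness.BOUNDED,
}


@dataclass(frozen=True)
class Stream:
    """Hypothetical model: one data source plus its protobuf schema."""

    source_name: str
    proto_schema: str  # illustrative schema name, not from the docs
    inputs: tuple      # Kafka topics or Parquet file paths

    @property
    def boundedness(self) -> Boundedness:
        # Raises KeyError for a source name this sketch does not know.
        return SOURCE_BOUNDEDNESS[self.source_name]


def describe(stream: Stream) -> str:
    """Summarise what a stream consumes, in the wording of the docs."""
    kind = "topics" if stream.source_name.startswith("KAFKA") else "parquet files"
    return (
        f"{stream.source_name} ({stream.boundedness.value}): "
        f"{len(stream.inputs)} {kind}, schema {stream.proto_schema}"
    )


kafka_stream = Stream("KAFKA_SOURCE", "com.example.Booking", ("topic-a", "topic-b"))
parquet_stream = Stream("PARQUET_SOURCE", "com.example.Booking", ("/data/part-0.parquet",))

print(describe(kafka_stream))
print(describe(parquet_stream))
```

In both cases a single stream carries one schema; only the nature of the input (an open-ended set of topics versus a fixed set of files) differs.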
docs/docs/concepts/basics.md
Outdated
A Stream defines a logical grouping of a data source and its associated [`protobuf`](https://developers.google.com/protocol-buffers)
schema. All data produced by a source follows the protobuf schema. The source can be a bounded one such as `KAFKA_SOURCE` or `KAFKA_CONSUMER`
in which case, a single stream can consume from one or more topics all sharing the same schema. Otherwise, the source
can be an unbounded one such as `PARQUET_SOURCE` in which case, one or more parquet files as provided are consumed in a single stream.
Same as https://github.com/odpf/dagger/pull/153/files#r890081887. Will fix this
Fixed via commit 91a73a5
PR for #109