
doc: Add documentation+FAQs for Parquet DataSource #153

Conversation

Meghajit (Member)

PR for #109

@Meghajit Meghajit self-assigned this May 23, 2022
@Meghajit Meghajit linked an issue May 23, 2022 that may be closed by this pull request
@Meghajit Meghajit marked this pull request as ready for review May 30, 2022 11:10
@Meghajit (Member Author)

Added the configurations plus some common gotchas for setting them up, along with some FAQs and instructions on how to create the stream config for Parquet. I haven't touched the diagrams yet, as they will require some effort to be redrawn since the original resources are missing. However, I have changed the descriptions in relevant places.


> or data partitioned, such as:
@kevinbheda (Contributor), Jun 6, 2022

Small typo here; I believe we meant "date partitioned".

@Meghajit (Member Author)
Fixed via commit 31f5ec3

> - A Stream defines a logical grouping of a data source and its associated [`protobuf`](https://developers.google.com/protocol-buffers)
> schema. All data produced by a source follows the protobuf schema. The source can be a bounded one such as `KAFKA_SOURCE` or `KAFKA_CONSUMER`
> in which case, a single stream can consume from one or more topics all sharing the same schema. Otherwise, the source
> can be an unbounded one such as `PARQUET_SOURCE` in which case, one or more parquet files as provided are consumed in a single stream.
@Meghajit (Member Author)

Kafka is an unbounded data source and Parquet is a bounded one. Fixing this as well.
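For context, the distinction matters when writing the stream config this PR documents: a bounded source reads a fixed set of files and terminates, while an unbounded source consumes indefinitely. A minimal sketch of what a Parquet-source stream entry might look like is below. The key names, bucket path, and proto class here are illustrative assumptions for this sketch, not authoritative; refer to the merged documentation for the exact configuration keys.

```json
[
  {
    "SOURCE_DETAILS": [
      {
        "SOURCE_TYPE": "BOUNDED",
        "SOURCE_NAME": "PARQUET_SOURCE"
      }
    ],
    "SOURCE_PARQUET_FILE_PATHS": ["gs://example-bucket/bookings/dt=2022-05-01/"],
    "INPUT_SCHEMA_PROTO_CLASS": "com.example.BookingLogMessage",
    "INPUT_SCHEMA_TABLE": "booking"
  }
]
```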

@kevinbheda (Contributor)
good catch

@Meghajit (Member Author), Jun 6, 2022
Fixed via commit 91a73a5

@kevinbheda kevinbheda merged commit d97e37e into raystack:dagger-parquet-file-processing Jun 6, 2022
Successfully merging this pull request may close these issues.

doc: Add documentation+FAQs for Parquet DataSource
2 participants