docs: Port Conduit documentation from Indexer repo. (#9)
winder authored Mar 14, 2023
1 parent dfab75d commit 79b8292
Showing 16 changed files with 838 additions and 0 deletions.
60 changes: 60 additions & 0 deletions README.md
@@ -0,0 +1,60 @@
<div style="text-align:center" align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="./docs/assets/algorand_logo_mark_white.svg">
<source media="(prefers-color-scheme: light)" srcset="./docs/assets/algorand_logo_mark_black.svg">
<img src="./docs/assets/algorand_logo_mark_black.svg" alt="Algorand" width="400">
</picture>

[![CircleCI](https://img.shields.io/circleci/build/github/algorand/indexer/develop?label=develop)](https://circleci.com/gh/algorand/indexer/tree/develop)
[![CircleCI](https://img.shields.io/circleci/build/github/algorand/indexer/master?label=master)](https://circleci.com/gh/algorand/indexer/tree/master)
![Github](https://img.shields.io/github/license/algorand/indexer)
[![Contribute](https://img.shields.io/badge/contributor-guide-blue?logo=github)](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md)
</div>

# Algorand Conduit

Conduit is a framework for ingesting blocks from the Algorand blockchain into external applications. It is designed as a modular plugin system that allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.

# Getting Started

See the [Getting Started](./docs/GettingStarted.md) page.

## Building from source

Development is done using the [Go Programming Language](https://golang.org/); the version is specified in the project's [go.mod](go.mod) file. This document assumes that you have a functioning
environment set up. If you need assistance setting up an environment, please visit
the [official Go documentation website](https://golang.org/doc/).

Run `make` to build Conduit; the binary is located at `cmd/conduit/conduit`.

# Configuration

See the [Configuration](./docs/Configuration.md) page.

# Development

See the [Development](./docs/Development.md) page for building a plugin.

# Plugin System
A Conduit pipeline is composed of three components: [Importers](./conduit/plugins/importers/), [Processors](./conduit/plugins/processors/), and [Exporters](./conduit/plugins/exporters/).
Every pipeline must define exactly one Importer, exactly one Exporter, and zero or more Processors.

# Contributing

Contributions are welcome! Please refer to our [CONTRIBUTING](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md) document for general contribution guidelines, and individual plugin documentation for contributing to new and existing Conduit plugins.

# Common Setups

The most common usage of Conduit is to get validated blocks from a local `algod` Algorand node and add them to a database (such as [PostgreSQL](https://www.postgresql.org/)).
Users can separately (outside of Conduit) serve that data via an API to make a variety of prepared queries available; this is what the Algorand Indexer does.

Conduit works by fetching blocks one at a time via the configured Importer, sending the block data through the configured Processors, and terminating block handling via an Exporter (traditionally a database).
For a step-by-step walkthrough of a basic Conduit setup, see [Writing Blocks To Files](./docs/tutorials/WritingBlocksToFile.md).
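As an illustrative sketch of such a setup, a configuration could look roughly like the following. The plugin names are real, but the node address, token, and connection string are placeholders, and exact option names may differ; see each plugin's documentation:

```yaml
# Hypothetical conduit.yml: algod importer feeding a postgresql exporter.
importer:
  name: algod
  config:
    netaddr: "http://localhost:8080"   # placeholder node address
    token: "<algod api token>"         # placeholder token

# No processors: blocks flow straight from importer to exporter.
processors: []

exporter:
  name: postgresql
  config:
    connection-string: "host=localhost user=conduit dbname=conduit"  # placeholder
```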

# Migrating from Indexer

Indexer was built in a way that strongly coupled it to PostgreSQL and the defined REST API. We've built Conduit in a way that is backward compatible with the preexisting Indexer application. Running the `algorand-indexer` binary will use Conduit to construct a pipeline that replicates the Indexer functionality.

Going forward we will continue to maintain the Indexer application, but our main focus will be enabling and optimizing a multitude of use cases through the Conduit pipeline design rather than the singular Indexer pipeline.

For a more detailed look at the differences between Conduit and Indexer, see [our migration guide](./docs/tutorials/IndexerMigration.md).
54 changes: 54 additions & 0 deletions docs/Configuration.md
@@ -0,0 +1,54 @@
# Configuration

Configuration is stored in a file in the data directory named `conduit.yml`.
Use `./conduit -h` for command options.

## conduit.yml

There are several top-level configurations for controlling the behavior of the conduit process. Most detailed configuration is made on a per-plugin basis, split between the `Importer`, `Processor`, and `Exporter` plugins.

Here is an example configuration which shows the general format:
```yaml
# optional: hide the startup banner.
hide-banner: true|false

# optional: level to use for logging.
log-level: "INFO, WARN, ERROR"

# optional: path to log file.
log-file: "<path>"

# optional: if present, perform runtime profiling and put results in this file.
cpu-profile: "<path to cpu profile file>"

# optional: maintain a pid file for the life of the conduit process.
pid-filepath: "<path to pid file>"

# optional: settings to turn on the Prometheus metrics server.
metrics:
  mode: "ON, OFF"
  addr: ":<server-port>"
  prefix: "prometheus_metric_prefix"

# Define one importer.
importer:
  name:
  config:

# Define one or more processors.
processors:
  - name:
    config:
  - name:
    config:

# Define one exporter.
exporter:
  name:
  config:
```
## Plugin configuration
See [plugin list](plugins/home.md) for details.
Each plugin is identified by a `name`, and provided the `config` during initialization.
80 changes: 80 additions & 0 deletions docs/Development.md
@@ -0,0 +1,80 @@
# Creating A Plugin

There are three different interfaces to implement, depending on what sort of functionality you are adding:
* Importer: for sourcing data into the system.
* Processor: for manipulating data as it goes through the system.
* Exporter: for sending processed data somewhere.

All plugins should be implemented in the respective `importers`, `processors`, or `exporters` package.

# Registering a plugin

## Register the Constructor

The constructor is registered to the system by name in the package's `init` function; this is how the configuration is able to dynamically create pipelines:
```go
func init() {
	exporters.RegisterExporter(noopExporterMetadata.ExpName, exporters.ExporterConstructorFunc(func() exporters.Exporter {
		return &noopExporter{}
	}))
}
```

There are similar interfaces for each plugin type.

## Load the Plugin

Each plugin package contains an `all.go` file. Add your plugin to the import statement; this causes the `init` function to be called and ensures the plugin is registered.
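As a self-contained sketch of this registration pattern (the types and function names below are illustrative, not the real Conduit API):

```go
package main

import "fmt"

// Exporter and the registry below mimic the pattern Conduit uses;
// the real interfaces live in the conduit/plugins packages.
type Exporter interface {
	Receive(round uint64) error
}

// ExporterConstructor builds a fresh plugin instance.
type ExporterConstructor func() Exporter

var registry = map[string]ExporterConstructor{}

// RegisterExporter is called from each plugin's init function.
func RegisterExporter(name string, c ExporterConstructor) {
	registry[name] = c
}

type noopExporter struct{}

func (e *noopExporter) Receive(round uint64) error { return nil }

func init() {
	RegisterExporter("noop", func() Exporter { return &noopExporter{} })
}

func main() {
	// The pipeline builder looks up the plugin named in conduit.yml.
	constructor, ok := registry["noop"]
	fmt.Println(ok)
	exp := constructor()
	fmt.Println(exp.Receive(1))
}
```

Because registration happens in `init`, simply importing the plugin package (as the `all.go` files do) is enough to make it available by name.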

# Implement the interface

Generally speaking, you can follow the code in one of the existing plugins.

# Lifecycle

## Init

Each plugin will have its `Init` function called once as the pipeline is constructed.

The context provided to this function should be saved, and used to terminate any long-running operations if necessary.

## Per-round function

Each plugin type has a function which is called once per round:
* Importer: `GetBlock` is called when a particular round is required. Generally the requested round will increase over time.
* Processor: `Process` is called to process a round.
* Exporter: `Receive` is called to consume a round.

## Close

Called during a graceful shutdown. We make every effort to call this function, but it is not guaranteed.

## Hooks

There are special lifecycle hooks that can be registered on any plugin by implementing additional interfaces.

### Completed

When all processing has completed for a round, the `OnComplete` function is called on any plugin that implements it.

```go
// Completed is called by the conduit pipeline after every exporter has
// finished. It can be used for things like finalizing state.
type Completed interface {
	// OnComplete will be called by the Conduit framework when the pipeline
	// finishes processing a round.
	OnComplete(input data.BlockData) error
}
```

### PluginMetrics

After the pipeline has been initialized, and before it has been started, plugins may provide Prometheus metric handlers. The subsystem is a configurable value that should be passed into the Prometheus metric constructors.
The `ProvideMetrics` function will only be called once.

```go
// PluginMetrics is for defining plugin-specific metrics.
type PluginMetrics interface {
	ProvideMetrics(subsystem string) []prometheus.Collector
}
```
42 changes: 42 additions & 0 deletions docs/GettingStarted.md
@@ -0,0 +1,42 @@
# Getting Started


## Installation

### Install from Source

1. Check out the repo, or download the source: `git clone https://github.com/algorand/indexer.git && cd indexer`
2. Run `make conduit`.
3. The binary is created at `cmd/conduit/conduit`.

### Go Install

`go install` of the indexer repo does not currently work because it uses the `replace` directive to support the
go-algorand submodule.

**In Progress**
There is ongoing work to remove go-algorand entirely as a dependency of indexer/conduit. Once
that work is complete, users should be able to use `go install` to install binaries for this project.

## Getting Started

Conduit requires a configuration file to set up and run a data pipeline. To generate an initial skeleton for a conduit
config file, you can run `./conduit init`. This will set up a sample data directory with a config located at
`data/conduit.yml`.

You will need to manually edit the data in the config file, filling in a valid configuration for conduit to run.
You can find a valid config file in [Configuration.md](Configuration.md) or via the `conduit init` command.

Once you have a valid config file in a directory, `config_directory`, launch conduit with `./conduit -d config_directory`.


# Configuration and Plugins
Conduit comes with an initial set of plugins available for use in pipelines. For more information on the possible
plugins and how to include these plugins in your pipeline's configuration file see [Configuration.md](Configuration.md).

# Tutorials
For more detailed guides, walkthroughs, and step-by-step writeups, take a look at some of our
[Conduit tutorials](./tutorials). Here are a few of the highlights:
* [How to write block data to the filesystem](./tutorials/WritingBlocksToFile.md)
* [A deep dive into the filter processor](./tutorials/FilterDeepDive.md)
* [The differences and migration paths between Indexer & Conduit](./tutorials/IndexerMigration.md)
1 change: 1 addition & 0 deletions docs/assets/algorand_logo_mark_black.svg
1 change: 1 addition & 0 deletions docs/assets/algorand_logo_mark_white.svg
16 changes: 16 additions & 0 deletions docs/plugins/algod.md
@@ -0,0 +1,16 @@
# Algod Importer

Fetch blocks one by one from the [algod REST API](https://developer.algorand.org/docs/rest-apis/algod/v2/). The node must be configured as an archival node in order to
provide old blocks.

Block data from the Algod REST API contains the block header, transactions, and a vote certificate.

# Config
```yaml
importer:
  name: algod
  config:
    netaddr: "algod URL"
    token: "algod REST API token"
```
20 changes: 20 additions & 0 deletions docs/plugins/file_writer.md
@@ -0,0 +1,20 @@
# Filewriter Exporter

Write the block data to a file.

Data is written to one file per block in JSON format.

By default data is written to the filewriter plugin directory inside the indexer data directory.

# Config
```yaml
exporter:
  name: file_writer
  config:
    # override the default block data location.
    block-dir: "override default block data location."
    # override the filename pattern.
    filename-pattern: "%[1]d_block.json"
    # exclude the vote certificate from the file.
    drop-certificate: false
```
66 changes: 66 additions & 0 deletions docs/plugins/filter_processor.md
@@ -0,0 +1,66 @@
# Filter Processor

This is used to filter transactions to include only the ones that you want. This may be useful for some deployments
which only require specific applications or accounts.

## any / all
One or more top-level operations should be provided.
* `any`: transactions are included if they match any of the nested sub-expressions.
* `all`: transactions are included if they match all of the nested sub-expressions.

If `any` and `all` are both provided, the transaction must pass both checks.

## Sub expressions

Parts of an expression:
* `tag`: the transaction field being considered.
* `expression-type`: the type of expression.
* `expression`: the input to the expression.

### tag
The full path to a given field. Uses the MessagePack-encoded names of a canonical transaction. For example:
* `txn.snd` is the sender.
* `txn.amt` is the amount.

For information about the structure of transactions, refer to the [Transaction Structure](https://developer.algorand.org/docs/get-details/transactions/) documentation. For detail about individual fields, refer to the [Transaction Reference](https://developer.algorand.org/docs/get-details/transactions/transactions/) documentation.

**Note**: The "Apply Data" information is also available for filtering. These fields are not well documented. Advanced users can inspect raw transactions returned by algod to see what fields are available.

### expression-type

What type of expression to use for filtering the tag.
* `exact`: exact match for string values.
* `regex`: applies regex rules to the matching.
* `less-than`: applies a numerical less-than expression.
* `less-than-equal`: applies a numerical less-than-or-equal expression.
* `greater-than`: applies a numerical greater-than expression.
* `greater-than-equal`: applies a numerical greater-than-or-equal expression.
* `equal`: applies a numerical equality expression.
* `not-equal`: applies a numerical inequality expression.

### expression

The input to the expression. A number or string depending on the expression type.
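Putting these parts together, a filter that keeps payments over 1 Algo (1,000,000 microAlgos) from a particular sender might look like this sketch, where the sender address is a placeholder:

```yaml
processors:
  - name: filter_processor
    config:
      filters:
        - all:
          - tag: txn.snd
            expression-type: exact
            expression: "SENDER_ADDRESS"
          - tag: txn.amt
            expression-type: greater-than
            expression: 1000000
```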

# Config
```yaml
processors:
  - name: filter_processor
    config:
      filters:
        - any:
          - tag:
            expression-type:
            expression:
          - tag:
            expression-type:
            expression:
        - all:
          - tag:
            expression-type:
            expression:
          - tag:
            expression-type:
            expression:
```
18 changes: 18 additions & 0 deletions docs/plugins/home.md
@@ -0,0 +1,18 @@
# Plugin Configuration

Each plugin is identified by a `name`, and provided the `config` during initialization.

## Importers

* [algod](algod.md)
* [file_reader](file_reader.md)

## Processors
* [filter_processor](filter_processor.md)
* [noop_processor](noop_processor.md)

## Exporters
* [file_writer](file_writer.md)
* [postgresql](postgresql.md)
* [noop_exporter](noop_exporter.md)

11 changes: 11 additions & 0 deletions docs/plugins/noop_exporter.md
@@ -0,0 +1,11 @@
# Noop Exporter

For testing purposes, the noop exporter discards any data it receives.

# Config
```yaml
exporter:
  name: noop
  config:
```
11 changes: 11 additions & 0 deletions docs/plugins/noop_processor.md
@@ -0,0 +1,11 @@
# Noop Processor

For testing purposes, the noop processor simply passes the input to the output.

# Config
```yaml
processors:
  - name: noop
    config:
```