From 79b8292c317335dfa34d664ae14ed9e6783a1e73 Mon Sep 17 00:00:00 2001 From: Will Winder Date: Tue, 14 Mar 2023 14:47:04 -0400 Subject: [PATCH] docs: Port Conduit documentation from Indexer repo. (#9) --- README.md | 60 +++++++ docs/Configuration.md | 54 ++++++ docs/Development.md | 80 +++++++++ docs/GettingStarted.md | 42 +++++ docs/assets/algorand_logo_mark_black.svg | 1 + docs/assets/algorand_logo_mark_white.svg | 1 + docs/plugins/algod.md | 16 ++ docs/plugins/file_writer.md | 20 +++ docs/plugins/filter_processor.md | 66 +++++++ docs/plugins/home.md | 18 ++ docs/plugins/noop_exporter.md | 11 ++ docs/plugins/noop_processor.md | 11 ++ docs/plugins/postgresql.md | 23 +++ docs/tutorials/FilterDeepDive.md | 93 ++++++++++ docs/tutorials/IndexerMigration.md | 128 ++++++++++++++ docs/tutorials/WritingBlocksToFile.md | 214 +++++++++++++++++++++++ 16 files changed, 838 insertions(+) create mode 100644 README.md create mode 100644 docs/Configuration.md create mode 100644 docs/Development.md create mode 100644 docs/GettingStarted.md create mode 100644 docs/assets/algorand_logo_mark_black.svg create mode 100644 docs/assets/algorand_logo_mark_white.svg create mode 100644 docs/plugins/algod.md create mode 100644 docs/plugins/file_writer.md create mode 100644 docs/plugins/filter_processor.md create mode 100644 docs/plugins/home.md create mode 100644 docs/plugins/noop_exporter.md create mode 100644 docs/plugins/noop_processor.md create mode 100644 docs/plugins/postgresql.md create mode 100644 docs/tutorials/FilterDeepDive.md create mode 100644 docs/tutorials/IndexerMigration.md create mode 100644 docs/tutorials/WritingBlocksToFile.md diff --git a/README.md b/README.md new file mode 100644 index 00000000..54449be0 --- /dev/null +++ b/README.md @@ -0,0 +1,60 @@ +
<!-- Algorand logo -->

[![CircleCI](https://img.shields.io/circleci/build/github/algorand/indexer/develop?label=develop)](https://circleci.com/gh/algorand/indexer/tree/develop)
[![CircleCI](https://img.shields.io/circleci/build/github/algorand/indexer/master?label=master)](https://circleci.com/gh/algorand/indexer/tree/master)
![Github](https://img.shields.io/github/license/algorand/indexer)
[![Contribute](https://img.shields.io/badge/contributor-guide-blue?logo=github)](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md)

# Algorand Conduit

Conduit is a framework for ingesting blocks from the Algorand blockchain into external applications. It is designed as a modular plugin system that allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.

# Getting Started

See the [Getting Started](./docs/GettingStarted.md) page.

## Building from source

Development is done using the [Go Programming Language](https://golang.org/); the version is specified in the project's [go.mod](go.mod) file. This document assumes that you have a functioning Go environment set up. If you need assistance setting up an environment, please visit the [official Go documentation website](https://golang.org/doc/).

Run `make` to build Conduit; the binary is located at `cmd/conduit/conduit`.

# Configuration

See the [Configuration](./docs/Configuration.md) page.

# Development

See the [Development](./docs/Development.md) page for building a plugin.

# Plugin System
A Conduit pipeline is composed of 3 components: [Importers](./conduit/plugins/importers/), [Processors](./conduit/plugins/processors/), and [Exporters](./conduit/plugins/exporters/).
Every pipeline must define exactly one Importer, exactly one Exporter, and zero or more Processors.

# Contributing

Contributions are welcome! Please refer to our [CONTRIBUTING](https://github.com/algorand/go-algorand/blob/master/CONTRIBUTING.md) document for general contribution guidelines, and to the individual plugin documentation for contributing to new and existing Conduit plugins.

# Common Setups

The most common usage of Conduit is to get validated blocks from a local `algod` Algorand node and add them to a database (such as [PostgreSQL](https://www.postgresql.org/)).
Users can separately (outside of Conduit) serve that data via an API to make available a variety of prepared queries--this is what the Algorand Indexer does.

Conduit works by fetching blocks one at a time via the configured Importer, sending the block data through the configured Processors, and terminating block handling via an Exporter (traditionally a database).
For a step-by-step walkthrough of a basic Conduit setup, see [Writing Blocks To Files](./docs/tutorials/WritingBlocksToFile.md).

# Migrating from Indexer

Indexer was built in a way that strongly coupled it to PostgreSQL and its REST API. We've built Conduit in a way which is backwards compatible with the preexisting Indexer application. Running the `algorand-indexer` binary will use Conduit to construct a pipeline that replicates the Indexer functionality.

Going forward we will continue to maintain the Indexer application; however, our main focus will be enabling and optimizing a multitude of use cases through the Conduit pipeline design rather than the singular Indexer pipeline.

For a more detailed look at the differences between Conduit and Indexer, see [our migration guide](./docs/tutorials/IndexerMigration.md).
diff --git a/docs/Configuration.md b/docs/Configuration.md
new file mode 100644
index 00000000..fea7abc0
--- /dev/null
+++ b/docs/Configuration.md
@@ -0,0 +1,54 @@
# Configuration

Configuration is stored in a file in the data directory named `conduit.yml`.
Use `./conduit -h` for command options.

## conduit.yml

There are several top-level options for configuring the behavior of the Conduit process; most detailed configuration is made on a per-plugin basis.
These are split between `Importer`, `Processor`, and `Exporter` plugins.

Here is an example configuration which shows the general format:
```yaml
# optional: hide the startup banner.
hide-banner: true|false

# optional: level to use for logging.
log-level: "INFO, WARN, ERROR"

# optional: path to log file
log-file: "<path to log file>"

# optional: if present, perform runtime profiling and put results in this file.
cpu-profile: "path to cpu profile file."

# optional: maintain a pidfile for the life of the conduit process.
pid-filepath: "path to pid file."

# optional: setting to turn on the Prometheus metrics server
metrics:
  mode: "ON, OFF"
  addr: ":<port>"
  prefix: "prometheus_metric_prefix"

# Define one importer.
importer:
  name: "<importer name>"
  config: "<importer config>"

# Define one or more processors.
processors:
  - name: "<processor name>"
    config: "<processor config>"
  - name: "<processor name>"
    config: "<processor config>"

# Define one exporter.
exporter:
  name: "<exporter name>"
  config: "<exporter config>"
```

## Plugin configuration

See the [plugin list](plugins/home.md) for details.
Each plugin is identified by a `name`, and provided the `config` during initialization.
diff --git a/docs/Development.md b/docs/Development.md
new file mode 100644
index 00000000..322fae38
--- /dev/null
+++ b/docs/Development.md
@@ -0,0 +1,80 @@
# Creating A Plugin

There are three different interfaces to implement, depending on what sort of functionality you are adding:
* Importer: for sourcing data into the system.
* Processor: for manipulating data as it goes through the system.
* Exporter: for sending processed data somewhere.

All plugins should be implemented in the respective `importers`, `processors`, or `exporters` package.

# Registering a plugin

## Register the Constructor

The constructor is registered to the system by name in the package's `init` function; this is how the configuration is able to dynamically create pipelines:
```go
func init() {
	exporters.RegisterExporter(noopExporterMetadata.ExpName, exporters.ExporterConstructorFunc(func() exporters.Exporter {
		return &noopExporter{}
	}))
}
```

There are similar interfaces for each plugin type.

## Load the Plugin

Each plugin package contains an `all.go` file. Add your plugin to the import statement; this causes the `init` function to be called and ensures the plugin is registered.
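
For illustration, a hypothetical `all.go` for the exporters package might look like the following sketch. The import paths are assumptions for illustration, not necessarily the repository's actual layout:

```go
package exporters

// Each blank import runs the imported package's init function, which
// registers that plugin's constructor with the plugin system.
import (
	_ "github.com/algorand/indexer/conduit/plugins/exporters/filewriter"
	_ "github.com/algorand/indexer/conduit/plugins/exporters/noop"
	_ "github.com/algorand/indexer/conduit/plugins/exporters/postgresql"
)
```
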
# Implement the interface

Generally speaking, you can follow the code in one of the existing plugins.

# Lifecycle

## Init

Each plugin will have its `Init` function called once as the pipeline is constructed.

The context provided to this function should be saved, and used to terminate any long-running operations if necessary.

## Per-round function

Each plugin type has a function which is called once per round:
* Importer: `GetBlock` is called when a particular round is required. Generally, rounds will be requested in increasing order over time.
* Processor: `Process` is called to process a round.
* Exporter: `Receive` is called to consume a round.

## Close

Called during a graceful shutdown. We make every effort to call this function, but it is not guaranteed.

## Hooks

There are special lifecycle hooks that can be registered on any plugin by implementing additional interfaces.

### Completed

When all processing has completed for a round, the `OnComplete` function is called on any plugin that implements it.

```go
// Completed is called by the conduit pipeline after every exporter has
// finished. It can be used for things like finalizing state.
type Completed interface {
	// OnComplete will be called by the Conduit framework when the pipeline
	// finishes processing a round.
	OnComplete(input data.BlockData) error
}
```

### PluginMetrics

After the pipeline has been initialized, and before it has been started, plugins may provide Prometheus metric handlers. The subsystem is a configurable value that should be passed into the Prometheus metric constructors.
The `ProvideMetrics` function will only be called once.

```go
// PluginMetrics is for defining plugin specific metrics
type PluginMetrics interface {
	ProvideMetrics(subsystem string) []prometheus.Collector
}
```
diff --git a/docs/GettingStarted.md b/docs/GettingStarted.md
new file mode 100644
index 00000000..326bee2b
--- /dev/null
+++ b/docs/GettingStarted.md
@@ -0,0 +1,42 @@
# Getting Started

## Installation

### Install from Source

1. Check out the repo, or download the source: `git clone https://github.com/algorand/indexer.git && cd indexer`
2. Run `make conduit`.
3. The binary is created at `cmd/conduit/conduit`.

### Go Install

Go installs of the indexer repo do not currently work because of its use of the `replace` directive to support the go-algorand submodule.

**In Progress**
There is ongoing work to remove go-algorand entirely as a dependency of indexer/conduit. Once that work is complete, users should be able to use `go install` to install binaries for this project.

## Getting Started

Conduit requires a configuration file to set up and run a data pipeline. To generate an initial skeleton for a conduit config file, you can run `./conduit init`. This will set up a sample data directory with a config located at `data/conduit.yml`.

You will need to manually edit the data in the config file, filling in a valid configuration for conduit to run. You can find a valid config file in [Configuration.md](Configuration.md) or via the `conduit init` command.

Once you have a valid config file in a directory, `config_directory`, launch conduit with `./conduit -d config_directory`.

# Configuration and Plugins
Conduit comes with an initial set of plugins available for use in pipelines. For more information on the possible plugins and how to include these plugins in your pipeline's configuration file, see [Configuration.md](Configuration.md).

# Tutorials
For more detailed guides, walkthroughs, and step-by-step writeups, take a look at some of our [Conduit tutorials](./tutorials).
Here are a few of the highlights:
* [How to write block data to the filesystem](./tutorials/WritingBlocksToFile.md)
* [A deep dive into the filter processor](./tutorials/FilterDeepDive.md)
* [The differences and migration paths between Indexer & Conduit](./tutorials/IndexerMigration.md)
diff --git a/docs/assets/algorand_logo_mark_black.svg b/docs/assets/algorand_logo_mark_black.svg
new file mode 100644
index 00000000..382aae99
--- /dev/null
+++ b/docs/assets/algorand_logo_mark_black.svg
@@ -0,0 +1 @@
ALGO_Logos_190320
\ No newline at end of file
diff --git a/docs/assets/algorand_logo_mark_white.svg b/docs/assets/algorand_logo_mark_white.svg
new file mode 100644
index 00000000..8c7a3667
--- /dev/null
+++ b/docs/assets/algorand_logo_mark_white.svg
@@ -0,0 +1 @@
ALGO_Logos_190320
\ No newline at end of file
diff --git a/docs/plugins/algod.md b/docs/plugins/algod.md
new file mode 100644
index 00000000..bcc238c3
--- /dev/null
+++ b/docs/plugins/algod.md
@@ -0,0 +1,16 @@
# Algod Importer

Fetch blocks one by one from the [algod REST API](https://developer.algorand.org/docs/rest-apis/algod/v2/). The node must be configured as an archival node in order to provide old blocks.

Block data from the Algod REST API contains the block header, transactions, and a vote certificate.

# Config
```yaml
importer:
  name: algod
  config:
    netaddr: "algod URL"
    token: "algod REST API token"
```

diff --git a/docs/plugins/file_writer.md b/docs/plugins/file_writer.md
new file mode 100644
index 00000000..3c93d874
--- /dev/null
+++ b/docs/plugins/file_writer.md
@@ -0,0 +1,20 @@
# Filewriter Exporter

Write the block data to a file.

Data is written to one file per block in JSON format.

By default, data is written to the filewriter plugin directory inside the indexer data directory.

# Config
```yaml
exporter:
  name: file_writer
  config:
    # override the default block data location.
    block-dir: "/path/to/block/files"
    # override the filename pattern.
    filename-pattern: "%[1]d_block.json"
    # exclude the vote certificate from the file.
    drop-certificate: false
```

diff --git a/docs/plugins/filter_processor.md b/docs/plugins/filter_processor.md
new file mode 100644
index 00000000..ba00c0e0
--- /dev/null
+++ b/docs/plugins/filter_processor.md
@@ -0,0 +1,66 @@
# Filter Processor

This is used to filter transactions to include only the ones that you want. This may be useful for deployments which only require specific applications or accounts.

## any / all
One or more top-level operations should be provided.
* `any`: transactions are included if they match *any* of the nested sub-expressions.
* `all`: transactions are included if they match *all* of the nested sub-expressions.

If `any` and `all` are both provided, the transaction must pass both checks.

## Sub-expressions

Parts of an expression:
* `tag`: the transaction field being considered.
* `expression-type`: the type of expression.
* `expression`: input to the expression.

### tag
The full path to a given field. Uses the messagepack-encoded names of a canonical transaction. For example:
* `txn.snd` is the sender.
* `txn.amt` is the amount.

For information about the structure of transactions, refer to the [Transaction Structure](https://developer.algorand.org/docs/get-details/transactions/) documentation. For detail about individual fields, refer to the [Transaction Reference](https://developer.algorand.org/docs/get-details/transactions/transactions/) documentation.
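
For example, a complete sub-expression matching on the sender tag might look like this minimal sketch, where `ADDRESS` is a placeholder for a real account address:

```yaml
- tag: "txn.snd"
  expression-type: "exact"
  expression: "ADDRESS"
```
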
**Note**: The "Apply Data" information is also available for filtering. These fields are not well documented. Advanced users can inspect raw transactions returned by algod to see what fields are available.

### expression-type

What type of expression to use for filtering the tag.
* `exact`: exact match for string values.
* `regex`: applies regex rules to the matching.
* `less-than`: applies numerical less than expression.
* `less-than-equal`: applies numerical less than or equal expression.
* `greater-than`: applies numerical greater than expression.
* `greater-than-equal`: applies numerical greater than or equal expression.
* `equal`: applies numerical equal expression.
* `not-equal`: applies numerical not equal expression.

### expression

The input to the expression. A number or string depending on the expression type.

# Config
```yaml
processors:
  - name: filter_processor
    config:
      filters:
        - any:
          - tag: "<tag>"
            expression-type: "<expression-type>"
            expression: "<expression>"
          - tag: "<tag>"
            expression-type: "<expression-type>"
            expression: "<expression>"
        - all:
          - tag: "<tag>"
            expression-type: "<expression-type>"
            expression: "<expression>"
          - tag: "<tag>"
            expression-type: "<expression-type>"
            expression: "<expression>"
```

diff --git a/docs/plugins/home.md b/docs/plugins/home.md
new file mode 100644
index 00000000..d9722393
--- /dev/null
+++ b/docs/plugins/home.md
@@ -0,0 +1,18 @@
# Plugin Configuration

Each plugin is identified by a `name`, and provided the `config` during initialization.

## Importers

* [algod](algod.md)
* [file_reader](file_reader.md)

## Processors
* [filter_processor](filter_processor.md)
* [noop_processor](noop_processor.md)

## Exporters
* [file_writer](file_writer.md)
* [postgresql](postgresql.md)
* [noop_exporter](noop_exporter.md)

diff --git a/docs/plugins/noop_exporter.md b/docs/plugins/noop_exporter.md
new file mode 100644
index 00000000..f9b074c2
--- /dev/null
+++ b/docs/plugins/noop_exporter.md
@@ -0,0 +1,11 @@
# Noop Exporter

For testing purposes, the noop exporter discards any data it receives.

# Config
```yaml
exporter:
  name: noop
  config:
```

diff --git a/docs/plugins/noop_processor.md b/docs/plugins/noop_processor.md
new file mode 100644
index 00000000..ed696d0a
--- /dev/null
+++ b/docs/plugins/noop_processor.md
@@ -0,0 +1,11 @@
# Noop Processor

For testing purposes, the noop processor simply passes the input to the output.

# Config
```yaml
processors:
  - name: noop
    config:
```

diff --git a/docs/plugins/postgresql.md b/docs/plugins/postgresql.md
new file mode 100644
index 00000000..101515e1
--- /dev/null
+++ b/docs/plugins/postgresql.md
@@ -0,0 +1,23 @@
# PostgreSQL Exporter

Write block data to a PostgreSQL database with the Indexer REST API schema.

## Connection string

We are using the [pgx](https://github.com/jackc/pgconn) database driver, which dictates the connection string format.

For most deployments, you can use the following format:
`host={url} port={port} user={user} password={password} dbname={db_name} sslmode={enable|disable}`

For additional details, refer to the [parsing documentation here](https://pkg.go.dev/github.com/jackc/pgx/v4/pgxpool@v4.11.0#ParseConfig).
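
As an illustrative example, a connection string for a hypothetical local database named `conduit` could look like: `host=localhost port=5432 user=algorand password=algorand dbname=conduit sslmode=disable`.
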
# Config
```yaml
exporter:
  name: postgresql
  config:
    connection-string: "postgres connection string"
    max-conn: "connection pool setting, maximum active queries"
    test: "a boolean, when true a mock database is used"
```

diff --git a/docs/tutorials/FilterDeepDive.md b/docs/tutorials/FilterDeepDive.md
new file mode 100644
index 00000000..ec3249e8
--- /dev/null
+++ b/docs/tutorials/FilterDeepDive.md
@@ -0,0 +1,93 @@
## Filtering Transactions in Conduit

### Intro
Conduit provides individual documentation for each plugin in [docs/conduit/plugins](./plugins). However, the filter processor in particular has a complex set of features which empower users to search for data within transactions. This document will go through some of those features in detail, describe their use cases, and show some examples.

### Logical Operators

The filter processor provides (at this time) two top-level logical operators, `any` and `all`. These are used to match "sub-expressions" specified in the filters. For any set of expressions e1, e2, e3, ..., `any(e1,e2,e3,...eN)` will return `true` if there exists an `eX` for `1 <= X <= N` where `eX` evaluates to `true`, and `all(e1,e2,e3,...eN)` will return `true` if for every `X` from `1..N`, `eX` evaluates to `true`.

In simpler terms, `any` matches the transaction if at least one sub-expression matches, and `all` matches only if every sub-expression matches.

### Sub-Expressions
So, what defines a sub-expression?

The sub-expression consists of 3 components.
#### `tag`
The tag identifies the field to attempt to match. The fields derive their tags according to the [official reference docs](https://developer.algorand.org/docs/get-details/transactions/transactions/). You can also attempt to match against the `ApplyData`, although this is not officially supported or documented. Users interested in this will need to consult the official [go-algorand](https://github.com/algorand/go-algorand/blob/master/data/transactions/transaction.go#L104) repository to match tags.

For now, we programmatically generate these fields into a map located in the [filter package](https://github.com/algorand/indexer/blob/develop/conduit/plugins/processors/filterprocessor/fields/generated_signed_txn_map.go), though this is not guaranteed to be the case.

Example:
```yaml
- tag: 'txn.snd'    # Matches the Transaction Sender
- tag: 'txn.apar.c' # Matches the Clawback address of the asset params
- tag: 'txn.amt'    # Matches the amount of a payment transaction
```

#### `expression-type`
The expression type is a selection of one of the available methods for evaluating the expression. The current list of types is:
* `exact`: exact match for string values.
* `regex`: applies regex rules to the matching.
* `less-than`: applies numerical less than expression.
* `less-than-equal`: applies numerical less than or equal expression.
* `greater-than`: applies numerical greater than expression.
* `greater-than-equal`: applies numerical greater than or equal expression.
* `equal`: applies numerical equal expression.
* `not-equal`: applies numerical not equal expression.

You must use the proper expression type for the field your tag identifies, based on the type of data stored in that field. For example, do not use a numerical expression type on a string field such as an address.

#### `expression`
The expression is the data against which each field will be compared. This must be compatible with the data type of the expected field. For string fields you can also use the `regex` expression type to interpret the input of the expression as a regex.
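
For instance, here is a sketch of a regex sub-expression that matches any sender address starting with a chosen prefix (the prefix here is a made-up example):

```yaml
- filters:
  - any:
    - tag: "txn.snd"
      expression-type: "regex"
      expression: "^AAAA.*"
```
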
### Examples

Find transactions with fee greater than 1000 microalgos:
```yaml
- filters:
  - any:
    - tag: "txn.fee"
      expression-type: "greater-than"
      expression: "1000"
```

Find state proof transactions:
```yaml
- filters:
  - any:
    - tag: "txn.type"
      expression-type: "exact"
      expression: "stpf"
```

Find transactions calling app, "MYAPPID":
```yaml
- filters:
  - all:
    - tag: "txn.type"
      expression-type: "exact"
      expression: "appl"
    - tag: "txn.apid"
      expression-type: "exact"
      expression: "MYAPPID"
```
\ No newline at end of file
diff --git a/docs/tutorials/IndexerMigration.md b/docs/tutorials/IndexerMigration.md
new file mode 100644
index 00000000..b5549ee5
--- /dev/null
+++ b/docs/tutorials/IndexerMigration.md
@@ -0,0 +1,128 @@
## Migrating from Indexer to Conduit

The [Algorand Indexer](https://github.com/algorand/indexer) provides both a block processing pipeline to ingest block data from an Algorand node into a PostgreSQL database, and a REST API which serves that data.

The [Conduit](https://github.com/algorand/indexer/blob/develop/docs/Conduit.md) project provides a modular pipeline system allowing users to construct block processing pipelines for a variety of use cases, as opposed to the single, bespoke Indexer construction.

### Migration
Talking about a migration from Indexer to Conduit is in some ways difficult because they only have partial overlap in their applications. For example, Conduit does _not_ currently include a REST API, either for checking pipeline health or for serving data from the pipeline.

Here is the Indexer architecture diagram at a high level. The raw block data is enriched by the account data retrieved from the local ledger, and everything is written to PostgreSQL, which can then be queried via the API.
```mermaid
graph LR;
    algod["Algod"]
    index["Indexer"]
    ledger["Local Ledger"]
    psql["PostgreSQL"]
    restapi["Rest API"]

    algod-->index;
    subgraph "Data Pipeline"
    index-->ledger;
    ledger-->index;
    index-->psql;
    end
    psql-->restapi;
    restapi-->psql;
```

However, Conduit was built to generalize and modularize a lot of the tasks which Indexer does when ingesting block data into its database. For that reason you can swap out the core data pipeline in Indexer with an equivalent Conduit pipeline--and that's just what we've done!

```mermaid
graph LR;
    algod["Algod"]
    pe["postgresql Exporter"]
    algodimp["algod Importer"]
    restapi["Rest API"]

    algod-->algodimp;
    subgraph "Conduit Pipeline"
    algodimp-->pe;
    end
    pe-->restapi;
```

Using the most recent release of Indexer will create a Conduit pipeline config and launch the pipeline to ingest the data used to serve the REST API. Take a look [here](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359) if you're interested in seeing the exact config used in Indexer.

### Adopting Conduit features in your Indexer pipeline

Since Indexer is now using Conduit for its data pipeline, it will benefit from the continued development of the specific plugins being used. However, we don't plan on exposing the full set of Conduit features through Indexer. In order to start using new features, or new plugins to customize, filter, or further enrich the block data, or even change the type of database used in the backend, you will need to separate Indexer's data pipeline into your own custom Conduit pipeline.
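
For reference, the Indexer-equivalent pipeline amounts to an `algod` importer feeding the `postgresql` exporter, with no processors in between. A minimal sketch, assuming placeholder values for the algod and database connections:

```yaml
importer:
  name: algod
  config:
    netaddr: "your algod address here"
    token: "your algod token here"
processors:
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost port=5432 user=algorand password=algorand dbname=indexer_db"
```
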
A common deployment of the Indexer might look something like this.
```mermaid
graph LR;
    algod["Algod"]
    lb["Load Balancer"]
    index["Indexer"]
    ro1["ReadOnly Indexer"]
    ro2["ReadOnly Indexer"]
    ro3["ReadOnly Indexer"]
    psql["PostgreSQL"]

    algod-->index;
    index-->psql;
    lb---index;
    lb---ro1;
    ro1---psql;
    lb---ro2;
    ro2---psql;
    lb---ro3;
    ro3---psql;
```
Because the database connection can only tolerate a single writer without having race conditions and/or deadlocks, Indexer offers a read-only mode which does not run the data pipeline and has no write access to the database. It's common to use the read-only mode to scale out the REST API--running multiple web servers behind a load balancer, as is shown in the diagram.

Separating the data pipeline from the Indexer when using this setup is simple--take Indexer's Conduit config [shown earlier](https://github.com/algorand/indexer/blob/develop/cmd/algorand-indexer/daemon.go#L359), write it to a file, and launch the Conduit binary. Take a look at the [getting started guide](../GettingStarted.md) for more information on installing and running Conduit.

We still plan on supporting the Indexer API alongside Conduit--that means that any changes made to the PostgreSQL plugin will either be backwards compatible with the Indexer API, and/or have corresponding fixes in Indexer.

Here is our architecture diagram with Conduit as our data pipeline.
```mermaid
graph LR;
    algod["Algod"]
    lb["Load Balancer"]
    ro1["ReadOnly Indexer"]
    ro2["ReadOnly Indexer"]
    ro3["ReadOnly Indexer"]
    psql["PostgreSQL"]
    pe["postgresql Exporter"]
    algodimp["algod Importer"]

    pe-->psql;
    algod-->algodimp;
    subgraph "Conduit Pipeline"
    algodimp-->pe;
    end
    lb---ro1;
    ro1---psql;
    lb---ro2;
    ro2---psql;
    lb---ro3;
    ro3---psql;
```

With this architecture you're free to do things like use filter processors to limit the size of your database--though doing this will affect how some Indexer APIs function.
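
As an illustration, a hypothetical filter that keeps only payment transactions above a chosen amount (the values here are placeholders) would drop everything else before it reaches the database:

```yaml
processors:
  - name: filter_processor
    config:
      filters:
        - all:
          - tag: "txn.type"
            expression-type: "exact"
            expression: "pay"
          - tag: "txn.amt"
            expression-type: "greater-than"
            expression: "1000000"
```
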
diff --git a/docs/tutorials/WritingBlocksToFile.md b/docs/tutorials/WritingBlocksToFile.md
new file mode 100644
index 00000000..830ba90d
--- /dev/null
+++ b/docs/tutorials/WritingBlocksToFile.md
@@ -0,0 +1,214 @@
## Writing Blocks to Files Using Conduit

This guide will take you step by step through a specific application of some Conduit plugins. We will detail each of the steps necessary to solve our problem, and point out documentation and tools useful for both building and debugging Conduit pipelines.

## Our Problem Statement

For this example, our task is to ingest blocks from an Algorand network (we'll use Testnet for this) and write blocks to files.

Additionally, we don't care to see data about transactions which aren't sent to us, so we'll filter out all transactions which are not sending either algos or some other asset to our account.

## Getting Started

First we need to make sure we have Conduit installed. Head over to [GettingStarted.md](../GettingStarted.md) to get more details on how to install Conduit. We'll just build it from source:
```bash
git clone https://github.com/algorand/indexer.git $HOME/indexer
cd $HOME/indexer
make conduit
alias conduit=$HOME/indexer/cmd/conduit/conduit
```

Now that we have Conduit installed, we can take a look at the options for supported plugins with
```
conduit list
```
The current list ends up being:
```
importers:
  algod       - Importer for fetching blocks from an algod REST API.
  file_reader - Importer for fetching blocks from files in a directory created by the 'file_writer' plugin.

processors:
  filter_processor - FilterProcessor Filter Processor
  noop             - noop processor

exporters:
  file_writer - Exporter for writing data to a file.
  noop        - noop exporter
  postgresql  - Exporter for writing data to a postgresql instance.
```

For our Conduit pipeline we're going to use the `algod` importer, a `filter_processor`, and of course the `file_writer` exporter. To get more details about each of these individually, and the configuration variables required and available for them, we can again use the list command. For example,
```
conduit list exporters file_writer
```
returns the following:
```
name: "file_writer"
config:
    # BlocksDir is an optional path to a directory where block data will be
    # stored. The directory is created if it doesn't exist. If not present the
    # plugin data directory is used.
    block-dir: "/path/to/block/files"
    # FilenamePattern is the format used to write block files. It uses go
    # string formatting and should accept one number for the round.
    # If the file has a '.gz' extension, blocks will be gzipped.
    # Default: "%[1]d_block.json"
    filename-pattern: "%[1]d_block.json"
    # DropCertificate is used to remove the vote certificate from the block data before writing files.
    drop-certificate: true
```

## Setting Up Our Pipeline

Let's start assembling a configuration file which describes our Conduit pipeline. For that we'll run
```
conduit init
```
This will create a configuration directory if we don't provide one to it, and write a skeleton config file there which we will use as the starting point for our pipeline. Here is the config file which the `init` subcommand has written for us:
```yaml
# Generated conduit configuration file.
log-level: INFO
# When enabled prometheus metrics are available on '/metrics'
metrics:
    mode: OFF
    addr: ":9999"
    prefix: "conduit"
# The importer is typically an algod archival instance.
importer:
    name: algod
    config:
        netaddr: "your algod address here"
        token: "your algod token here"
# One or more processors may be defined to manipulate what data
# reaches the exporter.
processors:
# An exporter is defined to do something with the data.
# Here the filewriter is defined which writes the raw block
# data to files.
exporter:
    name: file_writer
    config:
        # optionally provide a different directory to store blocks.
        #block-dir: "path where data should be written"
```
## Setting up our Importer
We can see the specific set of plugins defined for our pipeline--an `algod` importer and a `file_writer` exporter. Now we will fill in the proper fields for these. I've got a local instance of algod running at `127.0.0.1:8080`, with an API token of `e36c01fc77e490f23e61899c0c22c6390d0fff1443af2c95d056dc5ce4e61302`. If you need help setting up algod, you can take a look at the [go-algorand docs](https://github.com/algorand/go-algorand#getting-started) or our [developer portal](https://developer.algorand.org/).

Here is the completed importer config:
```yaml
importer:
    name: algod
    config:
        netaddr: "http://127.0.0.1:8080"
        token: "e36c01fc77e490f23e61899c0c22c6390d0fff1443af2c95d056dc5ce4e61302"
```

## Setting up our Processor

The processor section in our generated config is empty, so we'll need to fill in that section with the proper data for the filter processor. We can paste in the output of our list command for that.
```bash
> conduit list processors filter_processor
name: filter_processor
config:
    # Filters is a list of boolean expressions that can search the payset transactions.
    # See README for more info.
    filters:
        - any:
            - tag: txn.rcv
              expression-type: exact
              expression: "ADDRESS"
```
The filter processor uses the tag of a property and allows us to specify an exact value to match or a regex. For our use case we'll grab the address of a wallet I've created on Testnet, `NVCAFYNKJL2NGAIZHWLIKI6HGMTLYXL7BXPBO7NXX4A7GMMWKNFKFKDKP4`.

That should give us exactly what we want: a filter that only allows transactions through for which the receiver is my account. However, there is a lot more you can do with the filter processor. To learn more about the possible uses, take a look at the individual plugin documentation [here](../plugins/filter_processor.md).

## Setting up our Exporter

For the exporter the setup is simple. No configuration is necessary because it defaults to a directory inside the Conduit data directory. In this example I've chosen to override the default and set the directory output of my blocks to a temporary directory, `block-dir: "/tmp/conduit-blocks/"`.

## Running the pipeline
Now we should have a fully valid config, so let's try it out. Here's the full config I ended up with (comments removed):
```yaml
log-level: "INFO"
importer:
    name: algod
    config:
        netaddr: "http://127.0.0.1:8080"
        token: "e36c01fc77e490f23e61899c0c22c6390d0fff1443af2c95d056dc5ce4e61302"
processors:
    - name: filter_processor
      config:
          filters:
              - any:
                  - tag: txn.rcv
                    expression-type: exact
                    expression: "NVCAFYNKJL2NGAIZHWLIKI6HGMTLYXL7BXPBO7NXX4A7GMMWKNFKFKDKP4"
exporter:
    name: file_writer
    config:
        block-dir: "/tmp/conduit-blocks/"
```

There are two things to address before our example becomes useful.

1. We need to get a payment transaction to our account.

   For me, it's easiest to use the Testnet dispenser, so I've done that. You can look at my transaction for yourself, block #26141781 on Testnet.

2. Skip rounds.

   To avoid having to run algod all the way from genesis to the most recent round, you can use catchpoint catchup to fast-forward to a more recent block. Similarly, we want to be able to run Conduit pipelines from whichever round is most relevant and useful for us. To run Conduit from a round other than 0, use the `--next-round-override` or `-r` flag.

Now let's run the command!
```bash
> conduit -d /tmp/conduit-tmp/ --next-round-override 26141781
```

Once we've processed round 26141781, we should see our transaction show up!

```bash
> cat /tmp/conduit-blocks/* | grep payset -A 14
        "payset": [
            {
                "hgi": true,
                "sig": "DI4oMkUT01LAs5XT55qcZ3VCY8Wn2WrAZpntzFu2bTz9xnzaObmp5TOTUF5/PVVFCn14hXKyF3/LTZTUJylaDw==",
                "txn": {
                    "amt": 10000000,
                    "fee": 1000,
                    "fv": 26141780,
                    "lv": 26142780,
                    "rcv": "NVCAFYNKJL2NGAIZHWLIKI6HGMTLYXL7BXPBO7NXX4A7GMMWKNFKFKDKP4",
                    "snd": "GD64YIY3TWGDMCNPP553DZPPR6LDUSFQOIJVFDPPXWEG3FVOJCCDBBHU5A",
                    "type": "pay"
                }
            }
        ]
```

There are many other existing plugins and use cases for Conduit! Take a look through the documentation and don't hesitate to open an issue if you have a question. If you want to get a deep dive into the different types of filters you can construct using the filter processor, take a look at our [filter guide](./FilterDeepDive.md).