From dce2b34c5ce1fab553312b32f19a3a31086d968b Mon Sep 17 00:00:00 2001 From: Will Winder Date: Mon, 18 Jul 2022 15:34:51 -0400 Subject: [PATCH 1/7] Documentation for data directory. --- README.md | 7 +++++-- docs/DataDirectory.md | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 2 deletions(-) create mode 100644 docs/DataDirectory.md diff --git a/README.md b/README.md index 824b8d8c1..96b750cc2 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,7 @@ For a simple deployment the following configuration works well: * Network: Indexer, Algod and PostgreSQL should all be on the same network. * Indexer: 2 CPU and 8 GB of ram. * Database: When hosted on AWS a `db.r5.xlarge` instance works well. +* Storage: 20 GiB A database with replication can be used to scale read volume. Configure multiple Indexer daemons with a single writer and multiple readers. @@ -212,17 +213,19 @@ The command line arguments always take priority over the config file and environ The Indexer data directory is the location where the Indexer can store and/or load data needed for runtime operation and configuration. -**It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir` flag.** +**It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir`/`-i` flag.** **It is HIGHLY recommended placing the data directory in a separate, stateful directory for production usage of the Indexer.** +For more information on the data directory see [Indexer Data Directory](docs/DataDirectory.md). + ### Auto-Loading Configuration The Indexer will scan the data directory at startup and load certain configuration files if they are present. The files are as follows: - `indexer.yml` - Indexer Configuration File - `api_config.yml` - API Parameter Enable/Disable Configuration File -- + **NOTE:** It is not allowed to supply both the command line flag AND have an auto-loading configuration file in the data directory. 
Doing so will result in an error. To see an example of how to use the data directory to load a configuration file check out the [Disabling Parameters Guide](docs/DisablingParametersGuide.md). diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md new file mode 100644 index 000000000..07ced4b4a --- /dev/null +++ b/docs/DataDirectory.md @@ -0,0 +1,36 @@ +# Indexer Data Directory + +The Indexer data directory is the location where the Indexer can store and/or load data needed for runtime operation and configuration. It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir` flag. + +# Storage Requirements + +As of mid-2022, approximately 15GiB for Mainnet. + +# Configuration Files + +The data directory is the first place to check for different configuration files, for example: +- `indexer.yml` - Indexer Configuration File +- `api_config.yml` - API Parameter Enable/Disable Configuration File + +# Account Cache + +Indexer writers keep an account cache in the data directory. This cache is used during block processing to compute things like the new account balances after processing transactions. Prior to this local cache, the database was queried on each round to fetch the initial account states. + +## Read-Only Mode + +The account cache is not required when in read-only mode. While the data directory is still required, it will only be used for configuration. + +# Initialization + +If a new data directory must be created, the following process should be used: +1. Review the Indexer log to find the most recent round that was processed. For example, `22212765` in the following line: + ``` + {"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"} + ``` +2. Lookup the most recent catchpoint **without going over** from the list for your network: + [Mainnet]() + [Testnet]() + [Betanet]() +3. 
Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 6500000#1234567890ABCDEF01234567890ABCDEF0`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable. + +While Indexer starts, you can see progress information printed periodically in the log file. From e8cc035fe62340e20ddb239554dd6124e55d3a1e Mon Sep 17 00:00:00 2001 From: Will Winder Date: Tue, 19 Jul 2022 17:08:05 -0400 Subject: [PATCH 2/7] Update with feedback, include new links --- README.md | 1 + docs/DataDirectory.md | 27 ++++++++++++++++++++------- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 96b750cc2..dd09bd328 100644 --- a/README.md +++ b/README.md @@ -200,6 +200,7 @@ Settings can be provided from the command line, a configuration file, or an envi | max-applications-limit | | max-applications-limit | INDEXER_MAX_APPLICATIONS_LIMIT | | default-applications-limit | | default-applications-limit | INDEXER_DEFAULT_APPLICATIONS_LIMIT | | enable-all-parameters | | enable-all-parameters | INDEXER_ENABLE_ALL_PARAMETERS | +| catchpoint | | catchpoint | INDEXER_CATCHPOINT | ## Command line diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md index 07ced4b4a..c747d506f 100644 --- a/docs/DataDirectory.md +++ b/docs/DataDirectory.md @@ -4,7 +4,7 @@ The Indexer data directory is the location where the Indexer can store and/or lo # Storage Requirements -As of mid-2022, approximately 15GiB for Mainnet. +As of mid-2022, approximately 20 GiB for Mainnet. # Configuration Files @@ -14,7 +14,16 @@ The data directory is the first place to check for different configuration files # Account Cache -Indexer writers keep an account cache in the data directory. This cache is used during block processing to compute things like the new account balances after processing transactions. Prior to this local cache, the database was queried on each round to fetch the initial account states. 
+Indexer writers maintain an account cache in the data directory. This cache is used during block processing to compute things like the new account balances after processing transactions. Prior to this local cache, the database was queried on each round to fetch the initial account states. + +The following files are created: +- ledger.block.sqlite +- ledger.block.sqlite-shm +- ledger.block.sqlite-wal +- ledger.tracker.sqlite +- ledger.tracker.sqlite-shm +- ledger.tracker.sqlite-wal + ## Read-Only Mode @@ -27,10 +36,14 @@ If a new data directory must be created, the following process should be used: ``` {"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"} ``` -2. Lookup the most recent catchpoint **without going over** from the list for your network: - [Mainnet]() - [Testnet]() - [Betanet]() -3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 6500000#1234567890ABCDEF01234567890ABCDEF0`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable. +2. Lookup the most recent catchpoint **without going over** from the list for your network. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`. A full list of available catchpoints for each network can be found at the following links: + [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt) + [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt) + [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt) + [Alphanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/alphanet_catchpoints.txt) + [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt) +3. 
Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable. While Indexer starts, you can see progress information printed periodically in the log file. + +Note: You are not required to unset the catchpoint label after initialization. During startup, if Indexer is ahead of the supplied catchpoint label, it is ignored. From 7afa3c76d7ebaf1c2371b26e2e767d668c0598c6 Mon Sep 17 00:00:00 2001 From: Will Winder Date: Thu, 21 Jul 2022 14:18:28 -0400 Subject: [PATCH 3/7] Update docs/DataDirectory.md Co-authored-by: algobarb <78746954+algobarb@users.noreply.github.com> --- docs/DataDirectory.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md index c747d506f..9825cdcde 100644 --- a/docs/DataDirectory.md +++ b/docs/DataDirectory.md @@ -36,7 +36,9 @@ If a new data directory must be created, the following process should be used: ``` {"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"} ``` -2. Lookup the most recent catchpoint **without going over** from the list for your network. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`. A full list of available catchpoints for each network can be found at the following links: +2. Lookup the most recent catchpoint for your network **without going over the indexer's current round** from the following links. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A` from the Mainnet consolidated catchpoint list. 
+ +Full list of available catchpoints for each network can be found at the following links: [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt) [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt) [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt) From 7fc3bd1e55b966d12d28953f7df28e5d68d0d950 Mon Sep 17 00:00:00 2001 From: Will Winder Date: Thu, 21 Jul 2022 14:19:28 -0400 Subject: [PATCH 4/7] Update docs/DataDirectory.md --- docs/DataDirectory.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md index 9825cdcde..db836e9c5 100644 --- a/docs/DataDirectory.md +++ b/docs/DataDirectory.md @@ -37,7 +37,6 @@ If a new data directory must be created, the following process should be used: {"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"} ``` 2. Lookup the most recent catchpoint for your network **without going over the indexer's current round** from the following links. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A` from the Mainnet consolidated catchpoint list. 
- Full list of available catchpoints for each network can be found at the following links: [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt) [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt) From 32487d8856a0521e23020044c7d3155734d4fefa Mon Sep 17 00:00:00 2001 From: Will Winder Date: Thu, 21 Jul 2022 14:20:18 -0400 Subject: [PATCH 5/7] Update docs/DataDirectory.md --- docs/DataDirectory.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md index db836e9c5..c6f05ca88 100644 --- a/docs/DataDirectory.md +++ b/docs/DataDirectory.md @@ -38,11 +38,11 @@ If a new data directory must be created, the following process should be used: ``` 2. Lookup the most recent catchpoint for your network **without going over the indexer's current round** from the following links. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A` from the Mainnet consolidated catchpoint list. 
Full list of available catchpoints for each network can be found at the following links: - [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt) - [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt) - [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt) - [Alphanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/alphanet_catchpoints.txt) - [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt) + - [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt) + - [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt) + - [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt) + - [Alphanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/alphanet_catchpoints.txt) + - [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt) 3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable. While Indexer starts, you can see progress information printed periodically in the log file. 
From 631f8b2e1bd9eddcb7bef03c40f51d3b4e25e30d Mon Sep 17 00:00:00 2001
From: Will Winder
Date: Fri, 22 Jul 2022 09:56:50 -0400
Subject: [PATCH 6/7] Update docs/DataDirectory.md

---
 docs/DataDirectory.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md
index c6f05ca88..daa6c7a00 100644
--- a/docs/DataDirectory.md
+++ b/docs/DataDirectory.md
@@ -41,7 +41,6 @@ Full list of available catchpoints for each network can be found at the followin
    - [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt)
    - [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt)
    - [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt)
-   - [Alphanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/alphanet_catchpoints.txt)
    - [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt)
 3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable.
 
From 2fc01498d06f77e4fe3f7746092429f7aced7566 Mon Sep 17 00:00:00 2001
From: Will Winder
Date: Fri, 22 Jul 2022 10:26:27 -0400
Subject: [PATCH 7/7] Update docs/DataDirectory.md

Co-authored-by: algobarb <78746954+algobarb@users.noreply.github.com>
---
 docs/DataDirectory.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/DataDirectory.md b/docs/DataDirectory.md
index daa6c7a00..86802d91b 100644
--- a/docs/DataDirectory.md
+++ b/docs/DataDirectory.md
@@ -41,7 +41,6 @@ Full list of available catchpoints for each network can be found at the followin
    - [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt)
    - [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt)
    - [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt)
-   - [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt)
 3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable.
 
 While Indexer starts, you can see progress information printed periodically in the log file.
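
Note on the Initialization steps documented in this series: steps 1 and 2 (find the most recent imported round in the log, then pick the newest catchpoint that does not go over it) could be scripted roughly as below. This is only a sketch under stated assumptions: it assumes the Indexer log is one JSON object per line with the `round r=N` message shown in the docs, and that a consolidated catchpoint list is plain text with one `round#hash` label per line. The helper names `latest_round` and `pick_catchpoint` are hypothetical and not part of Indexer, and the two `PLACEHOLDER` labels are made up for illustration, not real catchpoints.

```python
import json
import re
from typing import Iterable, Optional

# Hypothetical helpers for illustration only; nothing here ships with Indexer.
ROUND_RE = re.compile(r"round r=(\d+)")


def latest_round(log_lines: Iterable[str]) -> Optional[int]:
    """Return the round of the last 'round r=N' message, or None if absent."""
    latest = None
    for line in log_lines:
        try:
            msg = str(json.loads(line).get("msg", ""))
        except (ValueError, AttributeError):
            continue  # not a JSON object; skip the line
        match = ROUND_RE.search(msg)
        if match:
            latest = int(match.group(1))
    return latest


def pick_catchpoint(labels: Iterable[str], current_round: int) -> Optional[str]:
    """Return the label with the highest round that is <= current_round."""
    best_round, best_label = -1, None
    for raw in labels:
        label = raw.strip()
        round_text, sep, _digest = label.partition("#")
        if not sep or not round_text.isdecimal():
            continue  # blank or malformed line
        rnd = int(round_text)
        if best_round < rnd <= current_round:
            best_round, best_label = rnd, label
    return best_label


if __name__ == "__main__":
    log = [
        '{"level":"info","msg":"round r=22212765 (49 txn) imported in '
        '139.782694ms","time":"2022-07-18T19:23:13Z"}'
    ]
    labels = [
        "22200000#PLACEHOLDERA",  # made-up label, not a real catchpoint
        "22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A",
        "22220000#PLACEHOLDERB",  # made-up label, not a real catchpoint
    ]
    current = latest_round(log)
    if current is not None:
        print(pick_catchpoint(labels, current))
```

With the log line and the Mainnet label from the docs, the sketch selects `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, matching the worked example in step 2. The resulting label is what step 3 passes via `--catchpoint`, `catchpoint` in `indexer.yml`, or `INDEXER_CATCHPOINT`.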