
Documentation for data directory. #1125

Merged 7 commits on Jul 22, 2022

`README.md`: 8 changes (6 additions, 2 deletions)
@@ -25,6 +25,7 @@ For a simple deployment the following configuration works well:
* Network: Indexer, Algod and PostgreSQL should all be on the same network.
* Indexer: 2 CPUs and 8 GB of RAM.
* Database: When hosted on AWS a `db.r5.xlarge` instance works well.
* Storage: 20 GiB

A database with replication can be used to scale read volume. Configure multiple Indexer daemons with a single writer and multiple readers.

@@ -199,6 +200,7 @@ Settings can be provided from the command line, a configuration file, or an envi
| max-applications-limit | | max-applications-limit | INDEXER_MAX_APPLICATIONS_LIMIT |
| default-applications-limit | | default-applications-limit | INDEXER_DEFAULT_APPLICATIONS_LIMIT |
| enable-all-parameters | | enable-all-parameters | INDEXER_ENABLE_ALL_PARAMETERS |
| catchpoint | | catchpoint | INDEXER_CATCHPOINT |

## Command line

@@ -212,17 +214,19 @@ The command line arguments always take priority over the config file and environ

The Indexer data directory is the location where the Indexer can store and/or load data needed for runtime operation and configuration.

**It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir`/`-i` flag.**

**It is HIGHLY recommended to place the data directory in a separate, stateful location for production use of the Indexer.**

For more information on the data directory see [Indexer Data Directory](docs/DataDirectory.md).

### Auto-Loading Configuration

The Indexer will scan the data directory at startup and load certain configuration files if they are present. The files are as follows:

- `indexer.yml` - Indexer Configuration File
- `api_config.yml` - API Parameter Enable/Disable Configuration File

**NOTE:** Supplying a configuration file via its command line flag while the corresponding auto-loading file is also present in the data directory is not allowed. Doing so results in an error.

For an example of using the data directory to load a configuration file, see the [Disabling Parameters Guide](docs/DisablingParametersGuide.md).
`docs/DataDirectory.md`: 51 changes (51 additions, 0 deletions)
@@ -0,0 +1,51 @@
# Indexer Data Directory

The Indexer data directory is the location where the Indexer can store and/or load data needed for runtime operation and configuration. It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir`/`-i` flag.

# Storage Requirements

As of mid-2022, the data directory requires approximately 20 GiB for Mainnet.

# Configuration Files

The data directory is the first place the Indexer checks for configuration files, for example:
- `indexer.yml` - Indexer Configuration File
- `api_config.yml` - API Parameter Enable/Disable Configuration File

# Account Cache

Indexer writers maintain an account cache in the data directory. This cache is used during block processing to compute values such as the new account balances after transactions are applied. Before this local cache existed, the database was queried on every round to fetch the initial account states.

The following files are created:
- `ledger.block.sqlite`
- `ledger.block.sqlite-shm`
- `ledger.block.sqlite-wal`
- `ledger.tracker.sqlite`
- `ledger.tracker.sqlite-shm`
- `ledger.tracker.sqlite-wal`


## Read-Only Mode
> **Contributor:** If an indexer is started in writer mode and then is re-started in reader mode, will these files be there or will they be deleted by the indexer?
>
> **Contributor Author:** They wouldn't be deleted, but wouldn't be updated either.

The account cache is not required when in read-only mode. While the data directory is still required, it will only be used for configuration.

# Initialization

If a new data directory must be created, the following process should be used:
1. Review the Indexer log to find the most recent round that was processed. For example, `22212765` in the following line:
```
{"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"}
```
2. Look up the most recent catchpoint for your network **without going over the Indexer's current round**. For example, with `22212765` from step 1, on Mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A` from the Mainnet consolidated catchpoint list.

   The full list of available catchpoints for each network can be found at the following links:
   - [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt)
   - [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt)
   - [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt)
   - [Alphanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/alphanet_catchpoints.txt)
   - [Devnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/devnet_catchpoints.txt)

3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable.
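Steps 1 and 2 can be sketched in Go. This assumes the JSON log format shown in step 1 (a `msg` field containing `round r=<N>`) and that catchpoint labels have the form `<round>#<hash>`, one per line in the consolidated lists. The helper names `lastRound` and `pickCatchpoint` are hypothetical, and every label below except the one quoted in step 2 uses a made-up placeholder hash.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// roundRe matches the "round r=<N>" part of an import log message.
var roundRe = regexp.MustCompile(`round r=(\d+)`)

// lastRound extracts the processed round from one JSON log line (step 1).
func lastRound(logLine string) (uint64, error) {
	var entry struct {
		Msg string `json:"msg"`
	}
	if err := json.Unmarshal([]byte(logLine), &entry); err != nil {
		return 0, err
	}
	m := roundRe.FindStringSubmatch(entry.Msg)
	if m == nil {
		return 0, errors.New("no round found in log message")
	}
	return strconv.ParseUint(m[1], 10, 64)
}

// pickCatchpoint returns the label with the highest round that does not
// exceed current (step 2: "without going over the Indexer's current round").
func pickCatchpoint(labels []string, current uint64) (string, error) {
	best, bestRound, found := "", uint64(0), false
	for _, raw := range labels {
		label := strings.TrimSpace(raw)
		roundStr, _, ok := strings.Cut(label, "#")
		if !ok {
			continue // skip blank or malformed lines
		}
		r, err := strconv.ParseUint(roundStr, 10, 64)
		if err != nil || r > current {
			continue // unparsable, or ahead of the Indexer
		}
		if !found || r > bestRound {
			best, bestRound, found = label, r, true
		}
	}
	if !found {
		return "", fmt.Errorf("no catchpoint at or below round %d", current)
	}
	return best, nil
}

func main() {
	line := `{"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"}`
	current, err := lastRound(line)
	if err != nil {
		panic(err)
	}
	labels := []string{
		"22200000#PLACEHOLDERHASHA",
		"22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A",
		"22220000#PLACEHOLDERHASHB", // ahead of current round: never chosen
	}
	label, err := pickCatchpoint(labels, current)
	if err != nil {
		panic(err)
	}
	fmt.Println(current, label)
}
```

Running it prints `22212765 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, matching the example in step 2.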

While the Indexer starts up, progress information is printed periodically to the log.

Note: You are not required to unset the catchpoint label after initialization. During startup, if the Indexer is already ahead of the supplied catchpoint, the label is ignored.
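The startup check in the note above can be sketched as a single comparison. This is a hypothetical illustration, assuming "ahead" means the round already held by the data directory's account cache is at or past the catchpoint's round; `catchpointRound`, `shouldApplyCatchpoint`, and `cacheRound` are invented names, not Indexer code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// catchpointRound parses the round number from a "<round>#<hash>" label.
func catchpointRound(label string) (uint64, error) {
	roundStr, _, ok := strings.Cut(label, "#")
	if !ok {
		return 0, fmt.Errorf("malformed catchpoint label %q", label)
	}
	return strconv.ParseUint(roundStr, 10, 64)
}

// shouldApplyCatchpoint reports whether the catchpoint is worth applying:
// only when the local account cache is still behind the catchpoint round.
func shouldApplyCatchpoint(cacheRound uint64, label string) (bool, error) {
	if label == "" {
		return false, nil // no catchpoint configured
	}
	cp, err := catchpointRound(label)
	if err != nil {
		return false, err
	}
	return cacheRound < cp, nil
}

func main() {
	label := "22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A"
	fresh, _ := shouldApplyCatchpoint(0, label)        // new, empty data directory
	later, _ := shouldApplyCatchpoint(22212765, label) // already past the catchpoint
	fmt.Println(fresh, later)
}
```

This prints `true false`: the label takes effect for a freshly created data directory and is a no-op on later restarts, which is why leaving it set is harmless.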