conduit: Initial conduit pipeline tool. #1326

Merged
merged 68 commits into from
Nov 29, 2022

Conversation

@winder (Contributor) commented Nov 10, 2022

Summary

Add a new conduit data pipeline command. The data processing component of Indexer has been extracted into a modular data pipeline. Data loading can be configured and extended by enabling different plugins.

Plugins are separated into three categories:

  • importer: data source for block information.
  • processor: manipulate, filter or enrich data.
  • exporter: do something with the data, like send it to a database.

Indexer has been modified to use a pre-configured conduit pipeline internally.
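The three plugin categories above can be pictured as a chain in which each round flows from an importer, through zero or more processors, to an exporter. The following is a minimal sketch under stated assumptions, not the interfaces added by this PR: `BlockData`, `Importer`, `Processor`, `Exporter`, `RunRounds`, and the toy plugins are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// BlockData is a hypothetical stand-in for the per-round data passed between plugins.
type BlockData struct {
	Round uint64
	Txns  []string
}

// Importer, Processor and Exporter are illustrative interfaces only;
// they are not the interfaces defined in this PR.
type Importer interface {
	GetBlock(round uint64) (BlockData, error)
}

type Processor interface {
	Process(in BlockData) (BlockData, error)
}

type Exporter interface {
	Receive(data BlockData) error
}

// RunRounds pushes rounds [start, end) through importer -> processors -> exporter.
func RunRounds(imp Importer, procs []Processor, exp Exporter, start, end uint64) error {
	for r := start; r < end; r++ {
		data, err := imp.GetBlock(r)
		if err != nil {
			return fmt.Errorf("import round %d: %w", r, err)
		}
		for _, p := range procs {
			if data, err = p.Process(data); err != nil {
				return fmt.Errorf("process round %d: %w", r, err)
			}
		}
		if err := exp.Receive(data); err != nil {
			return fmt.Errorf("export round %d: %w", r, err)
		}
	}
	return nil
}

// fakeImporter returns the same three transaction kinds for every round.
type fakeImporter struct{}

func (fakeImporter) GetBlock(round uint64) (BlockData, error) {
	return BlockData{Round: round, Txns: []string{"pay", "axfer", "appl"}}, nil
}

// dropKind is a toy filter processor that removes one transaction kind.
type dropKind struct{ kind string }

func (d dropKind) Process(in BlockData) (BlockData, error) {
	out := BlockData{Round: in.Round}
	for _, t := range in.Txns {
		if t != d.kind {
			out.Txns = append(out.Txns, t)
		}
	}
	return out, nil
}

// memoryExporter collects exported rounds in memory instead of a database.
type memoryExporter struct{ rounds []BlockData }

func (m *memoryExporter) Receive(data BlockData) error {
	m.rounds = append(m.rounds, data)
	return nil
}

func main() {
	exp := &memoryExporter{}
	if err := RunRounds(fakeImporter{}, []Processor{dropKind{"appl"}}, exp, 1, 4); err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	for _, d := range exp.rounds {
		fmt.Println(d.Round, d.Txns)
	}
}
```

Keeping each stage behind a small interface is what would let a pre-configured pipeline (as Indexer uses) and a user-configured one share the same run loop.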

Some notable features that are available through conduit:

  • Transaction filtering - filter out transactions that you do not want.
  • Data pruning - automatically discard old transactions from the postgres database.
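As a rough illustration of the data pruning idea, the sketch below keeps only the rows from the most recent rounds and discards the rest. It works on an in-memory slice, whereas the feature described above operates on the postgres database; `txnRow`, `pruneOlderThan`, and the `keep` parameter are hypothetical names, and the actual retention rule used by conduit may differ.

```go
package main

import "fmt"

// txnRow is a hypothetical row: one transaction and the round it was confirmed in.
type txnRow struct {
	Round uint64
	ID    string
}

// pruneOlderThan keeps rows from the most recent `keep` rounds, counted back
// from `latest`, and reports how many rows were discarded.
func pruneOlderThan(rows []txnRow, latest, keep uint64) ([]txnRow, int) {
	if keep > latest {
		return rows, 0 // nothing is old enough to discard yet
	}
	cutoff := latest - keep // rows with Round <= cutoff are discarded
	kept := make([]txnRow, 0, len(rows))
	for _, r := range rows {
		if r.Round > cutoff {
			kept = append(kept, r)
		}
	}
	return kept, len(rows) - len(kept)
}

func main() {
	var rows []txnRow
	for r := uint64(1); r <= 10; r++ {
		rows = append(rows, txnRow{Round: r, ID: fmt.Sprintf("txn-%d", r)})
	}
	kept, removed := pruneOlderThan(rows, 10, 3)
	fmt.Println("kept:", len(kept), "removed:", removed)
}
```

Computing the cutoff from the latest round rather than from wall-clock time keeps the rule deterministic during catchup, when many rounds are processed in quick succession.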

For more information, see the documentation.

Test Plan

New unit tests.
Production network catchup and data validation tests.

Eric-Warehime and others added 30 commits July 21, 2022 08:18
* Local Ledger (#1011)

* integrate block processor

* Local Ledger Deployment (#1013)

* add simple local ledger migration

* add deleted opts

* fast catchup (#1023)

* add fast catchup

* Localledger merge (#1036)

* return empty lists from fetchApplications and fetchAppLocalStates (#1010)

* Update model to converge with algod (#1005)

* New Feature: Adds Data Directory Support (#1012)

- Updates the go-algorand submodule hash to point to rel/beta
- Moves the cpu profiling file, pid file and indexer configuration file
  to be options of only the daemon sub-command
- Changes os.Exit() to be a panic with a special handler.  This is so
  that defers are handled instead of being ignored.
- Detects auto-loading configuration files in the data directory and
  issues errors if equivalent command line arguments are supplied.
- Updates the README with instructions on how to use the auto-loading
  configuration files and the data directory.
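The os.Exit() change in the list above relies on a general Go property: os.Exit terminates the process immediately without running deferred calls, while a panic unwinds the stack and runs them. A minimal sketch of the pattern follows; the names `exitError`, `exit`, `handleExit`, and `osExit` are hypothetical and are not taken from the Indexer code.

```go
package main

import (
	"fmt"
	"os"
)

// exitError is a hypothetical panic payload carrying the desired exit code.
type exitError struct{ code int }

// osExit is a variable so that tests can intercept the real os.Exit.
var osExit = os.Exit

// exit replaces direct os.Exit calls: panicking unwinds the stack,
// so deferred calls in every active function still run.
func exit(code int) { panic(exitError{code}) }

// handleExit should be the first deferred call in main so that it runs last.
// It turns an exitError panic into a real process exit and re-panics on
// anything else.
func handleExit() {
	if r := recover(); r != nil {
		if e, ok := r.(exitError); ok {
			osExit(e.code)
			return
		}
		panic(r) // not an exit request: let the original panic continue
	}
}

func run() {
	defer fmt.Println("cleanup: closing resources")
	exit(3)
}

func main() {
	defer handleExit()
	run()
}
```

Running this prints the cleanup message before the process exits with status 3; a direct os.Exit(3) inside run would have skipped the deferred cleanup.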

* Update mockery version

Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>

* recovery scenario (#1024)

* handle ledger recovery scenario

* refactor create genesis block (#1026)

* refactor create genesis block

* Adds Local Ledger Readme (#1035)

* Adds Local Ledger Readme

Resolves #4109

Starts Readme docs

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Removed troubleshooting section

Co-authored-by: Will Winder <[email protected]>

* update ledger file path and migration (#1042)

* LocalLedger Refactoring + Catchpoint Service (#1049)

Part 1

    cleanup genesis file access.
    put node catchup into a function that can be swapped out with the catchup service.
    pass the indexer logger into the block processor.
    move open ledger into a util function, and move the initial state util function into a new ledger util file.
    add initial catchupservice implementation.
    move ledger init from daemon.go to constructor.
    Merge multiple read genesis functions.

Part 2

    Merge local_ledger migration package into blockprocessor.
    Rename Migration to Initialize
    Use logger in catchup service catchup

Part 3

    Update submodule and use NewWrappedLogger.
    Make util.CreateInitState private

* build: merge develop into localledger/integration (#1062)

* Ledger init status (#1058)

* Generate an error if the catchpoint is not valid for initialization. (#1075)

* Use main logger in handler and fetcher. (#1077)

* Switch from fullNode catchup to catchpoint catchup service. (#1076)

* Refactor daemon, add more tests (#1039)

Refactors daemon cmd into separate, testable pieces.

* Merge develop into localledger/integration (#1083)

* Misc Local Ledger cleanup (#1086)

* Update processor/blockprocessor/initialize.go

Co-authored-by: Zeph Grunschlag <[email protected]>

* commit

* fix function call args

* RFC-0001: Rfc 0001 impl (#1069)

Adds an Exporter interface and a noop exporter implementation with factory methods for construction
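An exporter interface with a noop implementation and factory-based construction could look roughly like the sketch below. This is an illustration of the general pattern only: the method set, the `Register` and `New` helpers, and the registry map are assumptions and do not reproduce the interface introduced by RFC-0001.

```go
package main

import "fmt"

// Exporter is a hypothetical, reduced exporter interface.
type Exporter interface {
	Receive(round uint64) error // accept data for one round
	Round() uint64              // next round the exporter expects
}

// Constructor is a factory function that builds a fresh Exporter.
type Constructor func() Exporter

// registry maps plugin names to their constructors.
var registry = map[string]Constructor{}

// Register makes an exporter constructor available under a name.
func Register(name string, c Constructor) { registry[name] = c }

// New builds the exporter registered under name, or returns an error.
func New(name string) (Exporter, error) {
	c, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("no exporter named %q", name)
	}
	return c(), nil
}

// noopExporter discards data and only tracks the expected round.
type noopExporter struct{ next uint64 }

func (n *noopExporter) Receive(round uint64) error {
	if round != n.next {
		return fmt.Errorf("expected round %d, got %d", n.next, round)
	}
	n.next++
	return nil
}

func (n *noopExporter) Round() uint64 { return n.next }

func init() {
	Register("noop", func() Exporter { return &noopExporter{} })
}

func main() {
	exp, err := New("noop")
	if err != nil {
		fmt.Println(err)
		return
	}
	for r := uint64(0); r < 3; r++ {
		if err := exp.Receive(r); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("next round:", exp.Round())
}
```

Looking plugins up by name through a registry is what allows a configuration file to select an exporter without the pipeline code importing each implementation directly.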

* Fix test errors

* Add/fix tests

* Add postgresql_exporter tests

* Update config loading

* Change BlockExportData to pointers

* Move and rename ExportData

* Add Empty func to BlockData

* Add comment

Co-authored-by: shiqizng <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>
Co-authored-by: Will Winder <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>
* Updates for changes to our plugin design based on feedback in RFCs
* Merges in submodule update for test failure fixes
* Bump version to 2.13.0-rc1

* Bump version to 2.13.0

* Documentation for data directory. (#1125)

Co-authored-by: algobarb <[email protected]>

* Don't lookup big foreign assets. (#1141)

* Revert "Bump version to 2.13.0"

This reverts commit 0a8af61.

* Bump version to 2.13.0

* Fix import performance test runner. (#1133)

* Start on round 1 since round 0 is now computed from the genesis file.
* Wait for indexer processor to exit.
* Better logging for metric collection errors.
* Proper support for data directory.
* New test script for future release automation.

* Revert "Bump version to 2.13.0"

This reverts commit 7915890.

* Bump version to 2.13.0

* test fixes: Submodule updates (#1144)

* Update go-algorand submodule
* Fix test failure due to duplicate txns
* Add new ledger interface method

* Enhancement: remove import validator utility and obsolete ledger for evaluator (#1146)

Removes a bunch of code and makes the random test pass with the new ledger for evaluator.

* Docs: Readme update (#1149)

* Update README header

* Testing: Use tempdir instead of /tmp for e2elive test (#1152)

* Format misc/*.py with `black` (#1153)

* apply black to e2elive.py as well (#1154)

* Enhancement: More information about S3 keys searched for and Dockerfile that uses submodule instead of channel (#1151)

* Eric's Dockerfile improvements
* Update misc/e2elive.py

Co-authored-by: DevOps Service <[email protected]>
Co-authored-by: Will Winder <[email protected]>
Co-authored-by: algobarb <[email protected]>
Co-authored-by: Barbara Poon <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>
* Update README

* Add initial Conduit.md

* Add example exporter, exporter README.md
* importer interface | algod importer implementation

* added new tests | modified importer interface

* exported mockAlgodServer from test

* updated algodImporter plugin

* Update importers/algod/algod_importer.go

Co-authored-by: Eric Warehime <[email protected]>

Co-authored-by: Eric Warehime <[email protected]>
* Added Processor

Added processor functionality

* pr updates

* Pr

* PR comments

* Reduce proto checks

* Linting

* Pr

* PR comments

* PR comments
* Use config to control noop exporter round start
* Initial files

* Initial Conduit Binary

Resolves #1165

Creates an initial framework for conduit binary production.  Introduces
a data directory flag as well as init and shutdown (basic)
functionality.

* linting

* PR comments

* init via importer

* Passing test

* e2e test passing

* Modify makefile so that algorand-indexer is built

* Pr changes

* Log level changes

* Modify bindflagset for conduit

* Makefile changes

* Makefile

* PR updates
* Add waits/retries to algod importer
* conduit pipeline run loop implementation
* Adds unit tests

Resolves #1192

Adds unit tests to the config directory

* Extra assert

* PR comments

* Wait for round increase

* Added tests

* context fix
* Adds cpu profile and pid flag

Resolves #1172

Adds the options for cpu profiling and pid flags as well as unit tests

* Update conduit/pipeline.go

Co-authored-by: shiqizng <[email protected]>

* linting

* Added pid file function

* rebase

Co-authored-by: shiqizng <[email protected]>
* Convert indexer daemon to conduit
* adding file exporter
* Bump version to 2.13.0-rc1

* Bump version to 2.13.0

* Documentation for data directory. (#1125)

Co-authored-by: algobarb <[email protected]>

* Don't lookup big foreign assets. (#1141)

* Revert "Bump version to 2.13.0"

This reverts commit 0a8af61.

* Bump version to 2.13.0

* Fix import performance test runner. (#1133)

* Start on round 1 since round 0 is now computed from the genesis file.
* Wait for indexer processor to exit.
* Better logging for metric collection errors.
* Proper support for data directory.
* New test script for future release automation.

* Revert "Bump version to 2.13.0"

This reverts commit 7915890.

* Bump version to 2.13.0

* test fixes: Submodule updates (#1144)

* Update go-algorand submodule
* Fix test failure due to duplicate txns
* Add new ledger interface method

* Enhancement: remove import validator utility and obsolete ledger for evaluator (#1146)

Removes a bunch of code and makes the random test pass with the new ledger for evaluator.

* Docs: Readme update (#1149)

* Update README header

* Testing: Use tempdir instead of /tmp for e2elive test (#1152)

* Format misc/*.py with `black` (#1153)

* apply black to e2elive.py as well (#1154)

* Enhancement: More information about S3 keys searched for and Dockerfile that uses submodule instead of channel (#1151)

* Eric's Dockerfile improvements
* Update misc/e2elive.py

* Bug-Fix: Implement BlockHdrCached + miscellany (#1162)

* Enhancement: add max int64 checks (#1166)

* state proofs: Indexer Support for State Proofs (#1002)

Adds API support to the Indexer for State Proof Transactions and header fields.

Co-authored-by: Will Winder <[email protected]>

Co-authored-by: Will Winder <[email protected]>

* Bump version to 2.14.0-rc1

* Stop Panics if no config is supplied (#1180)

Give a default config if not supplied to stop panics.

* Fix spec name collisions. (#1182)

* Update go-algorand submodule to v3.9.1-beta (#1185)

* Bump version to 2.14.0-rc2

* Disable deadlock detection (#1186)

* Add support for new block header: TxnRoot SHA256 (#989)

* Accept yaml and yml configuration files. (#1181)

* Fix bug in reveals lookup (#1198)

* Fix bug in reveals lookup (#1198)

* Bump version to 2.14.0-rc3

* add state proof example with high reveal index - from betanet (#1199)

* Devops: Bump go-algorand submodule to v3.9.2-beta (#1203)

* Bump version to 2.14.0-rc4

* enhancement: Clarify REST query parameters for accounts search (#1201)

* update description for /v2/accounts

* cicd: add darwin arm64 support to release script (#1169)

* Bump version to 2.14.0

* Downgrade mockery to prevent incorrect deprecation warning. (#1211)

* Enhancement: update e2e test policy (#1197)

* update e2e test policy

* Fix release 2.14.0 (#1214)

* Accept yaml and yml configuration files. (#1181)

* Fix bug in reveals lookup (#1198)

* add state proof example with high reveal index - from betanet (#1199)

* enhancement: Clarify REST query parameters for accounts search (#1201)

* update description for /v2/accounts

* cicd: add darwin arm64 support to release script (#1169)

* Downgrade mockery to prevent incorrect deprecation warning. (#1211)

* Enhancement: update e2e test policy (#1197)

* update e2e test policy

* Update test expected value: transaction root sha256

Co-authored-by: AlgoStephenAkiki <[email protected]>
Co-authored-by: Michael Diamant <[email protected]>
Co-authored-by: algoidan <[email protected]>
Co-authored-by: shiqizng <[email protected]>
Co-authored-by: algolucky <[email protected]>
Co-authored-by: Will Winder <[email protected]>

Co-authored-by: DevOps Service <[email protected]>
Co-authored-by: Will Winder <[email protected]>
Co-authored-by: algobarb <[email protected]>
Co-authored-by: Barbara Poon <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>
Co-authored-by: shiqizng <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>
Co-authored-by: John Lee <[email protected]>
Co-authored-by: Or Aharonee <[email protected]>
Co-authored-by: Michael Diamant <[email protected]>
Co-authored-by: algoidan <[email protected]>
Co-authored-by: algolucky <[email protected]>
* Initial Filter Processor

Resolves #1100

Adds dynamic filter processor and initial tests

* Nits

* Update processors/filterprocessor/filter_processor.go

Co-authored-by: Eric Warehime <[email protected]>

* Update processors/filterprocessor/filter_processor_test.go

Co-authored-by: Eric Warehime <[email protected]>

* Update processors/filterprocessor/filter_processor_test.go

Co-authored-by: Eric Warehime <[email protected]>

* PR comments

* Pr comments

* PR updates

* PR updates

* Pr updates

* PR updates

* PR updates

* PR updates

* Changed some to any and const to exact

* Test failure fix

* Added more unit tests

* Config comments

Co-authored-by: Eric Warehime <[email protected]>
* Add metrics reporting to conduit pipeline
* Changed tag format

* Updated generation code

* Added filtering of numerical values

* PR comments

* imports
* Add panic recovers

Resolves #263

Adds recovery functions in main loop of conduit as well as in a
go-routine in the start function of the pipeline
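The recovery approach described in this commit message can be sketched with Go's defer and recover: a panic in one goroutine otherwise terminates the whole process, so the goroutine body is wrapped in a function that converts the panic into an ordinary error. The helper name `safeRun` is hypothetical and is not taken from the conduit code.

```go
package main

import "fmt"

// safeRun calls fn and converts a panic raised inside it into an error.
// The named return value lets the deferred function set the result.
func safeRun(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	results := make(chan error)

	// The recover must happen inside the goroutine that panics;
	// a recover in main cannot catch a panic from another goroutine.
	go func() {
		results <- safeRun(func() { panic("bad block") })
	}()

	if err := <-results; err != nil {
		fmt.Println("pipeline stopped:", err)
	}
}
```

Reporting the recovered panic as an error gives the caller the chance to log it and shut down cleanly instead of crashing mid-round.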

* helper function
Resolves #1245

Explicitly spell out empty data directory as well as condensed error
messages when running conduit
@winder added the Enhancement (New feature or request) label on Nov 14, 2022
@winder changed the title from "Conduit" to "conduit: Initial conduit pipeline command." on Nov 14, 2022
codecov bot commented Nov 14, 2022

Codecov Report

Merging #1326 (9f32987) into develop (39159bd) will decrease coverage by 0.40%.
The diff coverage is 56.04%.

@@             Coverage Diff             @@
##           develop    #1326      +/-   ##
===========================================
- Coverage    61.11%   60.71%   -0.41%     
===========================================
  Files           52       76      +24     
  Lines         8512    10909    +2397     
===========================================
+ Hits          5202     6623    +1421     
- Misses        2848     3699     +851     
- Partials       462      587     +125     
Impacted Files                                         Coverage Δ
api/error_messages.go                                  100.00% <ø> (ø)
api/generated/common/routes.go                         46.51% <ø> (ø)
conduit/errors.go                                      0.00% <0.00%> (ø)
conduit/gen/lookup.go                                  0.00% <0.00%> (ø)
...ocessors/blockprocessor/internal/catchupservice.go  69.66% <ø> (ø)
...plugins/processors/filterprocessor/gen/generate.go  0.00% <0.00%> (ø)
idb/idb.go                                             69.35% <ø> (ø)
idb/postgres/internal/encoding/types.go                66.66% <ø> (ø)
idb/postgres/internal/writer/util.go                   100.00% <ø> (ø)
idb/sig_type.go                                        64.00% <0.00%> (ø)
... and 52 more

@winder marked this pull request as ready for review on November 29, 2022 18:46
@winder changed the title from "conduit: Initial conduit pipeline command." to "conduit: Initial conduit pipeline tool." on Nov 29, 2022
@winder merged commit d672efe into develop on Nov 29, 2022
@winder deleted the conduit branch on November 29, 2022 19:35
Labels
Enhancement (New feature or request), Team Lamprey
8 participants