
conduit: Convert daemon to conduit pipeline #1208

Merged
merged 5 commits into algorand:conduit from 1194-daemon-conduit on Sep 9, 2022

Conversation

Eric-Warehime
Contributor

Summary

Converts algorand-indexer's data pipeline to a conduit pipeline.

Test Plan

Indexer e2e tests.

@Eric-Warehime added the Not-Yet-Enabled (feature is not yet enabled at this time) and Skip-Release-Notes (reserved for PRs which do not need to be included in release notes) labels Aug 29, 2022
@Eric-Warehime marked this pull request as ready for review August 31, 2022 23:36
@codecov

codecov bot commented Aug 31, 2022

Codecov Report

❗ No coverage uploaded for pull request base (conduit@beaf4c6).
The diff coverage is n/a.

@@            Coverage Diff             @@
##             conduit    #1208   +/-   ##
==========================================
  Coverage           ?   61.58%           
==========================================
  Files              ?       64           
  Lines              ?     9126           
  Branches           ?        0           
==========================================
  Hits               ?     5620           
  Misses             ?     3021           
  Partials           ?      485           


if si.fetcher != nil && si.fetcher.Error() != "" {
errors = append(errors, fmt.Sprintf("fetcher error: %s", si.fetcher.Error()))
if si.dataError != nil {
if err := si.dataError(); err != nil {
Contributor

Suggested change
if err := si.dataError(); err != nil {
if si.dataError != nil && si.dataError() != nil {
}

Contributor Author

The reason I've separated it out is that we need to add err to the set of known errors. We could just call si.dataError() again inside the block to reference the error, but because the pipeline will keep retrying it's possible that the first call would show an error but the second call might return nil.

This way we have a single call which either returns an error that is added to the health check or it's nil.
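For illustration, a minimal sketch of that single-call shape in the health check (si.dataError, errors, and fmt come from the diff above; the surrounding context is assumed):

    // Call dataError() exactly once per health check so the error we append
    // is the same error we observed; a second call could race with a retry
    // in the pipeline and return nil.
    if si.dataError != nil {
        if err := si.dataError(); err != nil {
            errors = append(errors, fmt.Sprintf("data error: %v", err))
        }
    }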

logger *log.Logger
ctx context.Context
cf context.CancelFunc
running sync.WaitGroup
Contributor

nit: consider naming this wg so it's clear that it's a go waitgroup.

Contributor Author

I've copied the name from https://github.com/algorand/go-algorand/blob/master/catchup/catchpointService.go#L75 since I liked it.

To me, running as a waitgroup counting the running goroutine (1) makes sense. Open to changing it if we want to stick with wg for waitgroup vars, though.

Contributor

I agree with Shiqi here. Though I've never seen it written down, calling this wg seems to be a go idiom similar to ctx and cf above.

return err
p.logger.Errorf("%v", err)
p.err = err
goto pipelineRun
Contributor

@shiqizng Sep 1, 2022

this will retry forever until err==nil?

Contributor Author

I've tried to mimic the behaviour that we have in fetcher. https://github.com/algorand/indexer/blob/develop/fetcher/fetcher.go#L219

This will run until the context expires, which is probably whenever the cancel func is called. Errors can be raised by individual plugins, at which point we set p.err so that it can be referenced by the p.Error() function, and the pipeline will start running again from the Importer on the same round.
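Roughly, the loop being described (a sketch, not the code from the PR; runOneRound is a hypothetical helper standing in for the importer/processor/exporter calls):

    func (p *pipelineImpl) runPipeline() {
        for {
            select {
            case <-p.ctx.Done():
                return // cancel func was called or the context expired
            default:
            }
            if err := p.runOneRound(); err != nil {
                p.logger.Errorf("%v", err)
                p.err = err // surfaced to the health check via p.Error()
                continue    // retry the same round, starting from the importer
            }
            p.err = nil // success: clear the error and move to the next round
        }
    }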

Contributor

I guess this was done in order to match the current Indexer behavior?

Does Indexer also retry on processor and writer errors?

Thinking out loud here, do you think that the error handling policy should be configurable per plugin?

Contributor Author

I guess this was done in order to match the current Indexer behavior?

Yes, I've tried to match the behaviour of the current Indexer here.

Does Indexer also retry on processor and writer errors?

The Indexer's data pipeline is controlled by fetcher which runs two things concurrently:

  • Fetching and enqueuing blocks
  • Running the block handler on the queued blocks

The fetcher will stop running when:

  • The cancel function is called or the context is Done
  • It receives block bytes from algod which it cannot decode into a Block object
  • The handler returns an error

However, the handler we're using will just continue retrying when it errors. The effect is that fetcher only exits on malformed block data or context canceling.

Thinking out loud here, do you think that the error handling policy should be configurable per plugin?

I think it's a good idea to allow plugins to either return an error that can be retried or one that causes the pipeline to terminate. At the moment though, if we want to be backwards compatible w/ Indexer, all errors in the existing plugins would be non-fatal (except for block decoding).
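Purely as an illustration of what a per-plugin policy could look like (none of these names exist in the PR; it uses only the standard library errors package):

    // Hypothetical: plugins wrap non-retryable failures so the pipeline can
    // decide whether to retry the round or terminate.
    type fatalError struct{ err error }

    func (f fatalError) Error() string { return f.err.Error() }
    func (f fatalError) Unwrap() error { return f.err }

    // Fatal marks an error as non-retryable.
    func Fatal(err error) error { return fatalError{err: err} }

    // shouldRetry reports whether the pipeline should retry the round.
    func shouldRetry(err error) bool {
        var f fatalError
        return !errors.As(err, &f)
    }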

Contributor

Great, thanks for the clarifications. This approach sounds good to me.

}
}
// Increment Round
p.err = nil
Contributor

why does this need to be set to nil?

Contributor Author

If an error has occurred in one of the plugins, p.err will be set to that. The pipeline will continue trying to process this round, and if it is successful we set the error to nil and move onto processing the next round. It's similar to https://github.com/algorand/indexer/blob/develop/fetcher/fetcher.go#L182 for reference.

Contributor

@winder left a comment

Looks really good. The only thing that really should change is how the blocking is done in runConduitCmdWithConfig.

@@ -4,12 +4,12 @@ import (
"context"
"errors"
"fmt"
"github.com/algorand/indexer/conduit"
Contributor

nit: import grouping

return err
}

func runBlockImporter(ctx context.Context, cfg *daemonConfig, wg *sync.WaitGroup, db idb.IndexerDb, dbAvailable chan struct{}, bot fetcher.Fetcher, opts idb.IndexerDbOptions) {
func makeConduitConfig(dCfg *daemonConfig) conduit.PipelineConfig {
Contributor

nice!

// Make sure to call this so we can shutdown if there is an error
defer pipeline.Stop()

for {
Contributor

This is a busy loop. A good way to handle this is to have Start() block. Start it in a goroutine and use a sync.WaitGroup. Something like:

    var wg sync.WaitGroup
    wg.Add(1)

    go func() {
        defer wg.Done()
        pipeline.Start()
    }()

    wg.Wait()

Comment on lines 147 to 149
func (p *pipelineImpl) Error() error {
return p.err
}
Contributor

🤡 : This pattern is a bit of a hack, I wasn't happy when adding it to the daemon. I don't have a good suggestion for fixing it.

// Give some time for the goroutine to start
time.Sleep(1 * time.Second)
err := pImpl.Start()
assert.Nil(t, err)
Contributor

nit: assert.NoError(t, err)

}()

// Give some time for the goroutine to start
time.Sleep(1 * time.Second)
Contributor

Why isn't the delay needed anymore?

Contributor

+1...just curious

Contributor Author

Pipeline.Start synchronously creates the PID file and CPU profile if necessary, launches the goroutine which runs block imports, and then returns. Since we're no longer creating these files in a goroutine, they will always exist by the time Start has returned.
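A sketch of Start as described here (helper names are illustrative; further down in the review this gets split into Init and Start):

    func (p *pipelineImpl) Start() {
        // Synchronous setup: by the time Start returns, these files exist.
        p.writePidFileIfConfigured()    // hypothetical helper
        p.startCPUProfileIfConfigured() // hypothetical helper

        // The block import loop runs in the background.
        p.running.Add(1)
        go func() {
            defer p.running.Done()
            p.runPipeline()
        }()
    }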

}

func (p *pipelineImpl) Stop() error {
func (p *pipelineImpl) Stop() {
p.cf()
Contributor

How is this cancel function set?

Contributor Author

In MakePipeline, which takes a context, we create a cancel context and assign it to the pipelineImpl.

The one thing which I haven't included is an easy way to reset that context. So if you call Start and then Stop things wouldn't really work if you called Start again.
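Sketched from the description above (the real constructor also takes a pipeline config and more, omitted here; only the fields shown in the struct excerpt are set):

    func MakePipeline(ctx context.Context, logger *log.Logger) (Pipeline, error) {
        cctx, cf := context.WithCancel(ctx)
        return &pipelineImpl{
            logger: logger,
            ctx:    cctx,
            cf:     cf, // Stop() calls p.cf() to cancel cctx and unwind the pipeline
        }, nil
    }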

Contributor

I think we just say that once you call stop you are closing conduit.

Comment on lines 231 to 230
return p.RunPipeline()
go p.runPipeline()
return err
Contributor

I missed this in my first pass. I think it's better to let the caller decide when to start a go routine, that way it's more natural to control a graceful shutdown with sync.WaitGroup.

It is nice that the pidFile/profFile tests are deterministic this way.

Maybe it would be the best of both solutions to split Init/Start by renaming:

  • Start -> Initialize
  • runPipeline -> Start

Contributor Author

Moved all of the goroutine-specific code into Start and added a synchronous Init function. Also added a blocking Wait function to the pipeline so that we don't have an empty loop in places where there are no other blocking operations to wait on.

Slightly modified the error variable used in the handlers, but as you've said, I don't think there's an idiomatic way to achieve that other than protecting reads/writes with a mutex.

Let me know what you think... I tried your suggestion above of moving wait group management into the conduit daemon, but it ended up splitting the wait group and context handling/blocking in ways that I didn't like.
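For reference, roughly how the daemon could drive that split lifecycle (MakePipeline's signature and the error handling are assumed; the Init/Start/Stop/Wait names come from the discussion above):

    pipeline, err := conduit.MakePipeline(ctx, logger)
    if err != nil {
        return err
    }
    if err := pipeline.Init(); err != nil { // synchronous: PID file, CPU profile, plugin init
        return err
    }
    pipeline.Start()      // launches the import/process/export goroutine
    defer pipeline.Stop() // cancels the pipeline context on the way out
    pipeline.Wait()       // blocks until the pipeline goroutine finishes
    return nil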

@Eric-Warehime merged commit 88fca5c into algorand:conduit Sep 9, 2022
@Eric-Warehime deleted the 1194-daemon-conduit branch September 9, 2022 18:38