Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add exporter interface RFC #1061

Merged
merged 19 commits into from
Jul 8, 2022
Merged

Conversation

Eric-Warehime
Copy link
Contributor

Summary

Add an RFC for the initial exporter interface definition. Please add general comments to the thread, and specific comments inline.

@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #1061 (f657a2a) into develop (8501907) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #1061      +/-   ##
===========================================
+ Coverage    59.54%   59.72%   +0.17%     
===========================================
  Files           45       48       +3     
  Lines         8353     8390      +37     
===========================================
+ Hits          4974     5011      +37     
  Misses        2920     2920              
  Partials       459      459              
Impacted Files Coverage Δ
config/config.go 0.00% <ø> (ø)
exporters/exporter.go 100.00% <100.00%> (ø)
exporters/exporter_factory.go 100.00% <100.00%> (ø)
exporters/noop/noop_exporter.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8501907...f657a2a. Read the comment docs.

Comment on lines 75 to 76
// Round returns the next round to be processed. Atomically updated when Recv successfully completes.
Round() uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worded a little strongly. It suggests that the exporter could choose to re-process old rounds. That may be a feature we want to support eventually, but I'm not sure it's part of v1.

Suggested change
// Round returns the next round to be processed. Atomically updated when Recv successfully completes.
Round() uint64
// Round returns the next round that this exporter has not yet successfully processed.
Round() uint64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point. We don't necessarily want to commit to adding re-processing of rounds, but I do think it's important to ensure that this is atomically updated... i.e. Round will be guaranteed to stay in sync w/ the exported data.

Comment on lines 81 to 85
* A config file that defines all parameters that can be supplied to the plugin, and which provides the default values
that will be used for each parameter. A toml file stored inside the Indexer data directory stores all of the config data
for a given plugin. The Indexer will use the type specified in the Indexer config to look for a plugin config which
satisfies that class, and then load that as the selected plugin. Supplying multiple plugin configs for the selected type
of exporter will result in undefined behavior--a random config will be chosen.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • TOML: I don't have a strong opinion about these formats, but we already use YML, so I'd opt to stick with it.
  • Undefined behavior: let's generate an error, or support multiple instances of the same plugin.

Comment on lines +99 to +104
The Indexer's config will need to be changed to incorporate plugins. Future RFCs can decide how to string multiple
pipelines and plugins together in interesting ways. For now, we will have a single pipeline per indexer--i.e. running the
indexer will only run a single data pipeline and therefore have a single exporter. In that way the initial changes will
be easy to make--one new field in the config which provides an internal plugin name. Config files for the selected
plugin will then be resolved and parsed at runtime. The default configuration for Indexer will select the Postgresql
plugin in order to maintain backwards compatibility with the existing Indexer pipeline.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea, nice way to phase in plugins. Maybe we can call this setting "exporter override"?

Comment on lines +119 to +126
Though not being defined here, we can imagine that intermediate plugins in our execution framework might perform common
data operations such as aggregation, filtering, annotation, etc. The result of some of these operations will be a subset
of full block data, while others may result in block data which does not conform to the block specification at all.
In order to accommodate customization of the data, we will provide multiple data formats--the standard Block interface
used today, as well as more generic forms that plugins can deserialize based on their needs. A given plugin will need to
specify the data format it expects for both input data and output if it provides any (exporters obviously don't have
output data in this sense). The system will ensure that the constructed pipeline has compatible data formats during
initialization.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to define an initial format, and extend it in the future.

Today, we have "Block" and "StateDelta". We're missing "Certificate", but we could add that in as well. I think they would all be nullable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I will add an example here.

- Telegraf uses stdin/stdout to communicate w/ processes. Kafka is mostly built around APIs, similarly things like
docker/linkerd/containerd use HTTP endpoints to standardize communication (though they mostly use this for config data
instead of streaming data).
- I support using sockets--very standard and well known programming interface,and has useful libraries built
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sockets are a good cross-platform way of communication. You'll need some way of serializing even if it something simple like a type byte, 4 length bytes and the payload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left this in for now. I think the the socket libraries are going to be easy to integrate into a cross-process communication framework, but obviously that needs to be given more detail and thought, and it will evolve a bit as we develop.

We can always add another RFC for this and keep this one for historical purposes, or decide that we want to update old RFCs to reflect the actual state of things in the future.

@Eric-Warehime Eric-Warehime requested a review from winder July 5, 2022 16:58
Comment on lines 60 to 61
// Name is a UID for each Exporter
Name() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some mixed tabs and spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be replaced by a Metadata object containing name, description, deprecation status as we discussed.

// Connect will be called during initialization, before block data starts going through the pipeline.
// Typically used for things like initializating network connections.
// The ExporterConfig passed to Connect will contain the Unmarhsalled config file specific to this plugin.
// Should return an error if it fails--this will result in the Indexer process terminating.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"return an error" is self evident, I'm not sure you need the last line on some of these. What happens when an error is received is a property of the framework so I think we'll have more thoughts on these comments later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. In most cases I've been attempting to be as verbose as possible just for the purposes of discussion/expectations.

@Eric-Warehime Eric-Warehime added the documentation Improvements or additions to documentation label Jul 7, 2022
@Eric-Warehime Eric-Warehime requested a review from winder July 7, 2022 16:37
@Eric-Warehime Eric-Warehime added the Not-Yet-Enabled Feature is not yet enabled at this time label Jul 7, 2022
@Eric-Warehime Eric-Warehime merged commit fddfde6 into algorand:develop Jul 8, 2022
@Eric-Warehime Eric-Warehime deleted the rfc-0001 branch July 8, 2022 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation Not-Yet-Enabled Feature is not yet enabled at this time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants