
go/runtime: Support hot-loading of runtime bundles #5961

Open

wants to merge 27 commits into master from peternose/feature/hot-loading

peternose
Contributor

Issue: #5755

Example status output:

"bundles": [
  {
    "name": "test-runtime",
    "id": "8000000000000000000000000000000000000000000000000000000000000000",
    "components": [
      {
        "kind": "ronl",
        "version": {}
      },
      {
        "kind": "rofl",
        "version": {}
      }
    ]
  }
],
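For illustration, a client could decode this status output with Go types shaped like the JSON above. This is a minimal sketch: `BundleStatus`, `ComponentStatus`, `Version` and `parseStatus` are hypothetical names rather than the node's actual API types, and reading `"version": {}` as a version whose zero-valued fields were omitted is an assumption.

```go
package main

import "encoding/json"

// Version is a hypothetical stand-in for a component version. We assume the
// `{}` in the example output means all fields are zero and were omitted.
type Version struct {
	Major uint16 `json:"major,omitempty"`
	Minor uint16 `json:"minor,omitempty"`
	Patch uint16 `json:"patch,omitempty"`
}

// ComponentStatus mirrors one entry of a bundle's "components" array.
type ComponentStatus struct {
	Kind    string  `json:"kind"`
	Version Version `json:"version"`
}

// BundleStatus mirrors one entry of the "bundles" array.
type BundleStatus struct {
	Name       string            `json:"name"`
	ID         string            `json:"id"`
	Components []ComponentStatus `json:"components"`
}

// parseStatus decodes an object of the form {"bundles": [...]} and returns
// the bundle entries. Fields not declared above are ignored by encoding/json.
func parseStatus(data []byte) ([]BundleStatus, error) {
	var out struct {
		Bundles []BundleStatus `json:"bundles"`
	}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out.Bundles, nil
}
```

Because the snippet above is an excerpt of a larger status document, it has to be wrapped in an outer object (`{"bundles": [...]}`) before being passed to `parseStatus`.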


netlify bot commented Dec 6, 2024

Deploy Preview for oasisprotocol-oasis-core canceled.

🔨 Latest commit: 3716fee
🔍 Latest deploy log: https://app.netlify.com/sites/oasisprotocol-oasis-core/deploys/675fa3c67968340008f1e07e

peternose force-pushed the peternose/feature/hot-loading branch 3 times, most recently from 9e3cb33 to 5da378e on December 7, 2024 02:50

codecov bot commented Dec 7, 2024

Codecov Report

Attention: Patch coverage is 72.08589% with 273 lines in your changes missing coverage. Please review.

Project coverage is 65.33%. Comparing base (ba6735a) to head (35d2d45).
Report is 1 commits behind head on master.

| File | Patch % | Lines |
|---|---|---|
| go/runtime/bundle/discovery.go | 61.45% | 75 Missing and 26 partials ⚠️ |
| go/runtime/registry/registry.go | 76.30% | 24 Missing and 17 partials ⚠️ |
| go/runtime/registry/config.go | 70.33% | 24 Missing and 11 partials ⚠️ |
| go/runtime/bundle/registry.go | 90.06% | 9 Missing and 6 partials ⚠️ |
| go/worker/common/committee/node.go | 54.16% | 9 Missing and 2 partials ⚠️ |
| go/worker/keymanager/worker.go | 65.51% | 9 Missing and 1 partial ⚠️ |
| go/runtime/host/sandbox/sandbox.go | 61.90% | 6 Missing and 2 partials ⚠️ |
| go/runtime/host/tdx/qemu.go | 0.00% | 8 Missing ⚠️ |
| go/runtime/registry/host.go | 75.75% | 4 Missing and 4 partials ⚠️ |
| go/runtime/bundle/bundle.go | 28.57% | 4 Missing and 1 partial ⚠️ |

... and 13 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5961      +/-   ##
==========================================
+ Coverage   64.69%   65.33%   +0.63%     
==========================================
  Files         629      631       +2     
  Lines       64297    64752     +455     
==========================================
+ Hits        41599    42304     +705     
+ Misses      17775    17485     -290     
- Partials     4923     4963      +40     


peternose marked this pull request as ready for review on December 7, 2024 03:18
Member

ptrus left a comment

(self-note: I have only reviewed the discovery part so far)

go/runtime/bundle/discovery.go (outdated, resolved)
go/runtime/bundle/discovery.go (outdated, resolved)
go/runtime/bundle/discovery.go (outdated, resolved)
go/runtime/bundle/discovery.go (resolved)
go/runtime/bundle/registry.go (outdated, resolved)
go/runtime/bundle/registry.go (outdated, resolved)
Comment on lines 74 to 99
// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.
func (d *Discovery) Init() error {
	// Consolidate all bundles in one place, which could be useful
	// if we implement P2P sharing in the future.
	if err := d.copyBundles(); err != nil {
		return err
	}

	// Add copied and cached bundles to the registry.
	if err := d.Discover(); err != nil {
		return err


Init only adds configured bundles to the registry. In addition, some of them may have been exploded (cached) previously? If that is the case, the nested WriteExploded will only verify...

However, it is not guaranteed to add all the "cached" bundles: looking at the Discover method, you only add the *.orc bundles to the registry (i.e. the ones that were configured). This is desired, as some of the exploded bundles may be stale, i.e. no longer part of the config. (Automatic removal will be handled as part of #5737.)

Do I understand your logic correctly? If so, I would suggest removing the reference to cached bundles from the Init top-level docs and from the comment before d.Discover() to avoid confusion.

Thanks for your reply! :)

Contributor Author

All bundles are copied to the bundle folder:

  • Bundles from the config are copied in Init on line 79.
  • Bundles fetched from the web are first verified and then copied to the bundle folder.

So Discovery loads all bundles in the config, all bundles that used to be in the config, and all bundles that were downloaded. The idea is that you always have both the bundle and its extracted folder in the bundle directory. This way, it will be easy to implement P2P sharing, as one can just check which .orc files are in the bundle directory.


I see, thanks.

So I suggest changing the following (super minor, but to avoid confusion):

// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.

and

// Add copied and cached bundles to the registry.

to

// Init sets up bundle discovery using node configuration and adds configured
// bundles (that are guaranteed to be exploded) to the registry.

and

// Add copied bundles (that are guaranteed to be exploded) to the registry.

or something along those lines, just to emphasize that we are not also moving stale cached bundles (see issue #5737).

Hope it makes sense, and feel free to reword as you see fit.


all bundles that used to be in the config

Correct, since at init time the .orc file (if configured) was moved to /data/runtimes/bundles and exploded.
Note that nodes that were running previous software (prior to this PR/current version) may have exploded some bundles and later removed them from the config (and thus also the bundle), so only the exploded part remains. I guess the idea with #5737 was to remove those stale "exploded" bundles as well as the ones whose version is now outdated.

Now, if we push for P2P sharing, I guess we go in the other direction, i.e. we further increase the storage load on the nodes, as they store essentially all the historical bundles and their exploded paths (technically, we could remove the exploded paths once outdated and only store the bundles). On the positive side, we get more resilient data availability for our software components.

Thinking aloud: if we want some kind of P2P sharing, maybe only archive nodes should be responsible for it? For normal nodes, I don't think we want to store outdated bundles and exploded components. In any case, we may want to adapt the #5737 cleanup requirements?

Happy to tackle this myself in the follow-up once we align on the strategy.

cc @kostko

Member

There is no sense in storing or sharing historic bundles, so any pruning should get rid of them IMO.

Contributor Author

OK, this makes sense. RONL + system components in one bundle, and ROFL components in detached bundles.

Also note that ROFL components can only be fetched from detached bundles, not from bundles that contain RONLs.

Our code doesn't distinguish between ROFL and system components. But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.

Contributor Author

peternose, Dec 12, 2024

Should we add ROFLSystemKind Kind = "rofl-system" or SystemKind Kind = "system", and verify this in the code?
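A rough sketch of the first option, assuming the kinds `ronl` and `rofl` seen in the status output plus the proposed `rofl-system` kind. The Go identifiers below are hypothetical, and the placement rule encodes the check suggested earlier in the thread: plain ROFLs only in detached bundles, RONL and system ROFLs only in the runtime bundle.

```go
package main

import "fmt"

// Kind is a hypothetical component kind type.
type Kind string

const (
	KindRONL       Kind = "ronl"        // as shown in the status output
	KindROFL       Kind = "rofl"        // as shown in the status output
	KindROFLSystem Kind = "rofl-system" // the kind proposed in this thread
)

// validateKind checks that a component of the given kind may appear in a
// bundle that is (or is not) detached.
func validateKind(kind Kind, detached bool) error {
	switch kind {
	case KindROFL:
		if !detached {
			return fmt.Errorf("component kind %q must be in a detached bundle", kind)
		}
	case KindRONL, KindROFLSystem:
		if detached {
			return fmt.Errorf("component kind %q must not be in a detached bundle", kind)
		}
	default:
		return fmt.Errorf("unknown component kind %q", kind)
	}
	return nil
}
```

The drawback of this option is that manifest authors must pick the right kind explicitly, which is what the follow-up comments try to avoid.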

Contributor Author

Oops, maybe we don't need this and can just mark the component as a system ROFL once it is read from a non-detached bundle.
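Sketched in Go, this alternative needs no new kind: the system flag is derived from where the component was loaded. `Component` and `markSystemComponents` are hypothetical names used only for illustration.

```go
package main

// Component is a hypothetical, simplified view of a bundle component.
type Component struct {
	Name   string
	Kind   string // "ronl" or "rofl", as in the status output
	System bool   // derived when loading, never read from the manifest
}

// markSystemComponents marks every ROFL component that was read from a
// non-detached (runtime) bundle as a system ROFL. Components from detached
// bundles are never system components.
func markSystemComponents(components []Component, detached bool) {
	for i := range components {
		components[i].System = !detached && components[i].Kind == "rofl"
	}
}
```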

martintomazic, Dec 12, 2024

I thought "system components" could only be ROFL components in the current design? (Update: except for one and only one RONL.)

But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.

If not, is this really "always"? If a system component is not a ROFL and may be used by different bundles, wouldn't you also want to package it separately, i.e. detached?

Member

So "system ROFL component" = non-detached ROFL. There is already the detached flag for this, but maybe it is not obvious? Or there could be an additional component attribute (e.g. system), which would only be allowed to be set if the bundle is non-detached?
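The second option could be enforced with a validation step like the one below. `Manifest`, `Component`, the `System` field and `validateSystemFlags` are hypothetical; only the existence of a detached flag is taken from the comment above.

```go
package main

import "fmt"

// Component is a hypothetical, trimmed-down manifest component carrying the
// optional "system" attribute floated above.
type Component struct {
	Name   string
	Kind   string
	System bool
}

// Manifest is a hypothetical, trimmed-down bundle manifest.
type Manifest struct {
	Detached   bool
	Components []Component
}

// validateSystemFlags rejects detached bundles that mark any component as a
// system component. Non-detached (runtime) bundles may set the flag.
func validateSystemFlags(m Manifest) error {
	if !m.Detached {
		return nil
	}
	for _, c := range m.Components {
		if c.System {
			return fmt.Errorf("component %q: system attribute is only allowed in non-detached bundles", c.Name)
		}
	}
	return nil
}
```

Compared with deriving the flag at load time, an explicit attribute makes the intent visible in the manifest, at the cost of one more rule to validate.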

martintomazic left a comment

I have checked the new discovery and registry parts. Overall, it looks super solid. I will try to review the whole PR/integration tomorrow :)

go/runtime/bundle/registry.go (outdated, resolved)
go/runtime/bundle/registry.go (resolved)
go/runtime/bundle/registry.go (outdated, resolved)
go/runtime/bundle/discovery.go (resolved)
go/runtime/bundle/discovery.go (resolved)
go/oasis-node/cmd/node/node.go (resolved)
go/runtime/config/config.go (resolved)
go/runtime/bundle/discovery.go (resolved)
go/runtime/bundle/registry.go (outdated, resolved)
go/runtime/bundle/registry.go (outdated, resolved)
go/runtime/bundle/discovery.go (resolved)
go/control/api/api.go (outdated, resolved)
go/control/api/api.go (resolved)
go/runtime/host/sandbox/sandbox.go (resolved)
go/runtime/registry/config.go (outdated, resolved)
go/runtime/bundle/manifest.go (outdated, resolved)
go/runtime/bundle/manifest.go (outdated, resolved)
go/runtime/bundle/discovery.go (outdated, resolved)
go/runtime/bundle/registry.go (outdated, resolved)
peternose force-pushed the peternose/feature/hot-loading branch from 49e86ad to 45b2903 on December 12, 2024 11:11
.changelog/5962.cfg.md (outdated, resolved)
peternose force-pushed the peternose/feature/hot-loading branch from 45b2903 to 979c1c2 on December 12, 2024 11:38
Since runtime versions can be configured dynamically, and none of them may
be active at a given moment, this function has no meaningful purpose
in its current form and given how it is currently used.
Extracted the creation of the provisioner, host info and caching
quote service into a dedicated helper function to improve code
readability.
Fixes a bug that occurs when attempting to abort the runtime
immediately after it starts.
The node can now fetch and verify runtime bundles from remote repositories
and automatically update to new versions.
peternose force-pushed the peternose/feature/hot-loading branch from 979c1c2 to 3716fee on December 16, 2024 03:51
Labels
None yet
Projects
None yet

4 participants