[aptos-workspace-server] indexer support and graceful shutdown #15183
Conversation
        .context("failed to start node api")?;
    res_indexer_grpc
        .map_err(anyhow::Error::msg)
        .context("failed to start node api")?;
The error message indicates "failed to start node api", but this line is checking the indexer_grpc result. The message should be updated to "failed to start indexer grpc" to accurately reflect which service failed to start.
Spotted by Graphite Reviewer
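For illustration, the corrected call would presumably read as follows (a sketch assuming the error-handling shape of the hunk above):

    res_indexer_grpc
        .map_err(anyhow::Error::msg)
        .context("failed to start indexer grpc")?;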
- Do we show an error when trying to spin up the server but Docker is not available? Same as what we do for the current localnet.
- With the existing CLI, we use a JSON format with Result/Error. This helps downstream processes (i.e. the SDK) handle errors and logs as expected (for reference). Could we make sure we use the same format in aptos-workspace-server as well?
    {
      "Error": "some error"
    }
    {
      "Result": "success"
    }
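As a rough sketch of what emitting that envelope could look like on the Rust side (the helper name to_json_envelope is hypothetical, not from this PR; assumes the serde_json and anyhow crates):

    use serde_json::json;

    // Sketch only: wrap an outcome in the { "Result": ... } / { "Error": ... }
    // envelope the existing CLI emits, so downstream tools (e.g. the SDK)
    // can parse output uniformly.
    fn to_json_envelope(result: Result<String, anyhow::Error>) -> String {
        match result {
            Ok(msg) => json!({ "Result": msg }).to_string(),
            Err(err) => json!({ "Error": err.to_string() }).to_string(),
        }
    }

    fn main() {
        println!("{}", to_json_envelope(Ok("success".to_string())));
        println!("{}", to_json_envelope(Err(anyhow::anyhow!("some error"))));
    }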
Force-pushed from fa41505 to b00b7cd
These docs are beautiful, super easy to follow :')
    tokio::pin!(fut_faucet_finish);

    // Phase 2: Wait for all services to be up.
    let all_services_up = async move {
        tokio::try_join!(
I suppose by design you can't configure as much as you can with localnet, e.g. using a host postgres / hasura, choosing to not run the faucet / particular processors, etc?
My plan is to make these configurable at a later time. Design-wise it will require a bit more work (storing things in Option<impl Future> or some other collection), but it's otherwise perfectly doable.
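A minimal sketch of that Option<impl Future> idea, assuming services expose anyhow::Result futures as in the hunks above (run_if_enabled is an illustrative name, not from the PR):

    use std::future::Future;

    // Sketch: a disabled service contributes an immediately-ready Ok(()),
    // so the tokio::try_join! in "Phase 2" can keep its current shape.
    async fn run_if_enabled<F>(fut: Option<F>) -> anyhow::Result<()>
    where
        F: Future<Output = anyhow::Result<()>>,
    {
        match fut {
            Some(fut) => fut.await,
            None => Ok(()), // service disabled; nothing to wait for
        }
    }

A caller could then write, e.g., tokio::try_join!(run_if_enabled(maybe_faucet), run_if_enabled(maybe_indexer)).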
> using a host postgres / hasura, choosing to not run the faucet / particular processors, etc?

These are crucial for the localnet unification.
    let (options, config) =
        create_container_options_and_config(instance_id, docker_network_name);
    let (fut_container, fut_container_cleanup) =
Correct me if I'm wrong, but I notice you don't try to clear out any leftover containers before creating the new one, e.g. if the cleanup failed or the user SIGINTed it or whatever. Should you be doing so here? Or I suppose it's not necessary because you'll use different names and networks each time?
> I suppose it's not necessary because you'll use different names and networks each time?

Correct, every aptos-workspace-server instance creates its own network, volumes and containers, so there aren't really leftovers (from self) to clean up.
However, I do think we should add a separate pass to clean up leftovers globally, in case some previous runs aborted abnormally. Still need to figure out a few more details:
- When do we run it? During localnet start-up, or as a global pass when you hit the workspace test command?
- When do we consider a docker resource removable? More than 2 hours since it was created?
- Do we need some sort of global lock?
Hard to say when it's removable safely, yeah; maybe just a workspace prune command might be better.
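For concreteness, here is a hedged sketch of what such a prune pass might look like using the bollard Docker client; the "aptos-workspace" label key and the two-hour threshold are assumptions for illustration, not decisions made in this PR:

    use std::collections::HashMap;
    use std::time::{SystemTime, UNIX_EPOCH};

    use bollard::container::{ListContainersOptions, RemoveContainerOptions};
    use bollard::Docker;

    // Sketch: force-remove workspace-labeled containers older than two hours.
    async fn prune_stale_workspace_containers() -> anyhow::Result<()> {
        let docker = Docker::connect_with_local_defaults()?;

        // Assumed convention: workspace containers carry this label.
        let mut filters = HashMap::new();
        filters.insert("label".to_string(), vec!["aptos-workspace".to_string()]);

        let containers = docker
            .list_containers(Some(ListContainersOptions {
                all: true, // include stopped containers too
                filters,
                ..Default::default()
            }))
            .await?;

        let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs() as i64;
        let cutoff = now - 2 * 60 * 60; // illustrative removability threshold

        for container in containers {
            let stale = container.created.map_or(false, |created| created < cutoff);
            if let (true, Some(id)) = (stale, container.id) {
                docker
                    .remove_container(
                        &id,
                        Some(RemoveContainerOptions { force: true, ..Default::default() }),
                    )
                    .await?;
            }
        }
        Ok(())
    }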
iiuc, we create a docker resource for each test suite, and after the test finishes we technically don't need it anymore. So we can remove the resource in the "after" hook in workspace (which runs after each test suite finishes).
> So we can remove the resource in the "after" hook in workspace (which runs after each test suite finishes)

@0xmaayan Yeah, on the TS side we need to make sure SIGTERM gets sent to the aptos-workspace-server binary once it's no longer in use, so it can clean up the resources. I'll double check this part later today.
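On the Rust side, the listening half of that contract could look roughly like this (a sketch assuming tokio's "signal" feature is enabled; wait_for_shutdown is an illustrative name):

    use tokio::signal::unix::{signal, SignalKind};

    // Sketch: resolve when the managing process (e.g. the TS test runner)
    // sends SIGTERM, or when a user hits Ctrl-C, so the cleanup futures
    // registered earlier get a chance to run.
    async fn wait_for_shutdown() -> anyhow::Result<()> {
        let mut sigterm = signal(SignalKind::terminate())?;
        tokio::select! {
            _ = sigterm.recv() => {},
            _ = tokio::signal::ctrl_c() => {},
        }
        Ok(())
    }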
        }
    }

    /// Returns the URLfor connecting to the indexer grpc service.
Nit:

    -/// Returns the URLfor connecting to the indexer grpc service.
    +/// Returns the URL for connecting to the indexer grpc service.
I'd probably separate the cleanup from the normal path, but it's just a nit. Something like this:

    async move {
        all_services_up.await?;
        select! {
            // wait for any service to finish or fail
        }
        Ok(())
    }

    let (run, cleanup) = run_service();
    select! {
        _ = shutdown => (),
        result = run => if let Err(err) = result { /* surface the error */ },
    }
    cleanup.await;
This is a major enhancement to aptos-workspace-server, implementing the following features: