Clean sync Goerli broken(?) post merge on 1.10.21 and 1.10.23 #25693
Can you provide more information that leads you to this conclusion? It can take a while to start sometimes, just leave it running for a bit. |
We've had a couple of PRs fixing/improving snap sync land on master recently. Maybe you can try master once we merge PR #25694. |
Are 20 hours enough? ;) These are the top non unique log messages by count:
I also re-tried this locally before submitting the issue (the above setup is running on a droplet).
|
Update: It's now been almost 40h and still no sync. Going to stop it now and try with latest master. |
I'm wondering if your node actually manages to find any peers. Is your firewall sufficiently open, so bidirectional communication can occur over the relevant ports? |
Next update: @holiman As I wrote in the initial report 1.10.21 was able to sync (but then got stuck in the heal stage). Also as I wrote yesterday, I can replicate the non-syncing behaviour locally by just running |
@ulope that log says
Goerli is post-merge, it needs the beacon client to tell it what the head is, and then it will sync to that. |
@holiman Unless I'm very much mistaken (and the release notes are wrong) 1.10.21 is just as aware of the goerli merge as later versions. However it does start snap syncing immediately as mentioned. Also with both versions Prysm seems to wait for geth to be synced to some degree because it continually logs:
(I also tried other beacon clients before, but didn't record their output unfortunately. Can check again if that's helpful.) So either I'm missing something else or this looks like a hen/egg problem. |
They are both aware, but not quite "as aware" :) The more recent version will spit out something like
The difference being that this flag is set for goerli:
See #24538 for more info:
|
We are closely following this issue as we are also receiving the same message “Beacon client online, but never received consensus updates. Please ensure your beacon client is operational to follow the chain!”. Our current setup is GETH 1.10.23 and prysm alpine image. We are connecting to network goerli prater. |
Maybe something is broken in Prysm <-> Geth in TTD-passed mode? AFAIK geth needs a signal from the CL to start syncing. Could be that this signal doesn't come? |
What should happen is: geth should print something like
If it doesn't print that, but a CL is attached, it means the CL is not sending FcU requests, probably because it is waiting for geth to sync. |
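To check which of these states a node is in, one option is to count the relevant messages in the geth log. The following is a minimal sketch, not part of geth: the substrings are copied from log lines quoted in this thread, while the labels and the function names `classify_line` and `summarize` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# Substrings copied from log lines quoted in this thread. The labels on the
# right are this sketch's own shorthand, not geth terminology.
KNOWN_MESSAGES = {
    "Beacon client online, but never received consensus updates": "cl_attached_no_updates",
    "Local chain is post-merge, waiting for beacon client sync switch-over": "waiting_for_cl",
    "State heal in progress": "state_heal",
    "Unexpected trienode heal packet": "unexpected_heal_packet",
    "Pivot seemingly stale, moving": "stale_pivot",
    "Snapshot extension registration failed": "snap_peer_dropped",
}


def classify_line(line: str) -> Optional[str]:
    """Return the label of the first known message found in the line, else None."""
    for needle, label in KNOWN_MESSAGES.items():
        if needle in line:
            return label
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count how often each known message appears in an iterable of log lines."""
    counts: Counter = Counter()
    for line in lines:
        label = classify_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Passing an open log file to `summarize` yields a count per label. A log dominated by `cl_attached_no_updates` or `waiting_for_cl` would point at the consensus client not delivering updates rather than at peering.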
Can confirm that geth from master branch (at commit d30e39b) did start syncing after a couple seconds. I ran geth like this:
And lighthouse like this:
|
@Snehapati11 It seems that Prysm doesn't even start syncing the beacon chain in our setup. Is that the case for you too? I've now tried with both lighthouse and nimbus. They both at least start syncing the beacon chain (very, very slowly though, current ETA 6d+). @fjl Hm, I assume that the start-syncing signal will only come once the beacon chain is synced. So this might not be a geth problem after all. I'll investigate further. |
@fjl Was your lighthouse node already synced? |
I used checkpoint sync and it's a bit faster. See this guide for more info: https://lighthouse-book.sigmaprime.io/checkpoint-sync.html#use-infura-as-a-remote-beacon-node-provider |
@fjl But at the point where you started both clients the beacon chain wasn’t finished syncing yet? I’ll try replicating that tomorrow. |
I start both clients in quick succession, and they both go into sync kind of quickly. This is with a completely blank DB. Pretty sure this is an issue with prysm. Maybe it doesn't enable optimistic sync by default? |
I confirm that Geth fresh sync is broken for Goerli on Geth We run the same setup as for Ropsten and Mainnet and they are fine! This issue is repeatable for different nodes! Here is Geth logs: Here is Prysm logs: Geth sync status: We also know that sync status is broken for the latest Geth, because a bunch of our monitoring tools have issues |
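Monitoring tools can be confused here because the standard `eth_syncing` JSON-RPC method returns `false` both for a fully synced node and for one that has not started syncing at all. A minimal sketch of telling the two apart by also looking at `eth_blockNumber` follows. The endpoint URL is an assumption (geth's HTTP-RPC must be enabled and reachable), and the helper names `rpc_call` and `describe_sync` are hypothetical.

```python
import json
import urllib.request


def rpc_call(url, method, params=None):
    """POST a JSON-RPC 2.0 request and return its 'result' field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["result"]


def describe_sync(syncing_result, block_number_hex):
    """Interpret eth_syncing together with eth_blockNumber.

    eth_syncing returns either False or an object with hex-encoded
    currentBlock and highestBlock fields.
    """
    head = int(block_number_hex, 16)
    if isinstance(syncing_result, dict):
        current = int(syncing_result["currentBlock"], 16)
        highest = int(syncing_result["highestBlock"], 16)
        return f"syncing: block {current} of {highest}"
    if head == 0:
        # False plus a genesis head: sync most likely never started.
        return "not started: eth_syncing is false and head is still genesis"
    return f"idle or synced at block {head}"


if __name__ == "__main__":
    url = "http://localhost:8545"  # assumed endpoint, adjust to your setup
    print(describe_sync(rpc_call(url, "eth_syncing"), rpc_call(url, "eth_blockNumber")))
```

This cannot distinguish a synced node from one that stalled partway, so comparing the head against an independent source is still advisable.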
I've tried different versions of Prysm and it didn't help. Geth version 1.10.21 started syncing immediately.
@ulope I just want to say that an infinite "State heal in progress" may be due to a hardware issue. If you run in the cloud, try to spin up a new machine. If you go with bare metal, you probably need better hardware. This is unrelated to the broken sync issue, but it's quite common. I am seeing it in 10-20% of launches on low-spec machines. |
@ulope The beacon node sync is still in progress. Are you facing any issues with the geth Goerli syncing process? It shows the beacon client is online but not passing consensus updates. |
@ulope We have just successfully tested Goerli version 1.10.23 with prysm version 3.0.0 and we are no longer seeing the message "Beacon client online, but never received consensus updates. Please ensure your beacon client is operational to follow the chain!". It would appear prysm version 3.1.0 has an issue, as you suggested. |
Erigon + Prysm v3.1.0 successfully synced from scratch. |
the checkpoint sync was key for me to get lighthouse to poke geth and get it to start syncing https://notes.ethereum.org/@launchpad/checkpoint-sync#EF-DevOps-Endpoints |
Looks like this issue is resolved, will close. Feel free to open a new issue if geth sync is broken for you |
Why did you close a critical issue without a fix? It should either be fixed, or it should be officially announced that the old fresh sync method is deprecated. |
We've successfully synced multiple nodes on goerli and never ran into this issue. |
I've reproduced this issue today for Goerli and Ropsten as well. The probability is not 100%: for Ropsten it happened in 2 out of 4 tries, and for Goerli in 3 out of 4 tries. This issue will probably appear on Mainnet after The Merge, because the sync condition has changed for all post-merge networks. |
The problem here is with the CL clients (Prysm, Lighthouse, etc.). The CL client needs to start syncing the beacon chain optimistically and start delivering |
I have brought this up in chat with CL devs, let's see how they respond. |
Replacing geth with version v1.10.21 always solves the problem. It would be better to open a new issue with more relevant details. |
geth v1.10.21 'works' because it always starts the legacy non-PoS sync. It's not a good fix long term. |
Sorry for the late reply (with the merge looming, time is a bit scarce). With a checkpoint-synced lighthouse and geth 1.10.22+ I was able to sync successfully. So I'd say at least for me this was definitely (in part) user error. However, having said that, I do find that this is a very drastic change in behaviour, especially for a patch release. Syncing has always started on its own in the 7+ year history. IMO this should have been geth 2.0. |
After the merge, Geth requires input from the consensus layer to find the correct chain. There is no way for it to know the sync target without the CL. This is a protocol limitation, and it's why we changed it in the release after the merge on Goerli. We are working on alternatives to the engine API connection, so Geth may potentially be able to sync on its own again in the future. |
I want to run EL without CL! So, I can confirm |
geth v1.10.25 and I still see a similar issue: the beacon node takes ages to sync and doesn't show correct time estimates. And this from geth: Is every new sync, even with a light node, syncing from the start by default? I read that the fix seems to be the checkpoint. As someone who was able to spin up a node quickly before, this seems like a usability downgrade from past geth versions. |
So for Prysm, is there really a checkpoint sync source other than local or testnet nodes or downloading a file? https://notes.ethereum.org/@launchpad/checkpoint-sync |
this checkpoint is a life changer: --checkpoint-sync-url=https://beaconstate.ethstaker.cc --genesis-beacon-api-url=https://beaconstate.ethstaker.cc |
Geth/v1.10.26 + Prysm 3.1.2: still the same problem on goerli-prater from scratch. After 2 days, still no syncing. But with --checkpoint-sync-url=https://goerli.checkpoint-sync.ethpandaops.io it started syncing immediately. Thanks @thomaseth2 for https://notes.ethereum.org/@launchpad/checkpoint-sync. |
Any progress? My execution client can connect to another execution client but doesn't sync. The beacon is complaining |
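For anyone scripting their node startup, the two Prysm flags mentioned in the comments above can be generated from a single provider URL. This is only a sketch: the helper name `checkpoint_sync_args` is hypothetical, and whether `--genesis-beacon-api-url` is needed in addition to `--checkpoint-sync-url` is an assumption based on the two comments above, one of which used both flags and the other only the first.

```python
from typing import List
from urllib.parse import urlparse


def checkpoint_sync_args(provider_url: str) -> List[str]:
    """Build the checkpoint-sync flags quoted in this thread for one provider URL."""
    parsed = urlparse(provider_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {provider_url!r}")
    url = provider_url.rstrip("/")  # avoid a trailing slash in the flag value
    return [
        f"--checkpoint-sync-url={url}",
        f"--genesis-beacon-api-url={url}",
    ]


if __name__ == "__main__":
    # Provider URL taken from a comment in this thread.
    print(" ".join(checkpoint_sync_args("https://goerli.checkpoint-sync.ethpandaops.io")))
```

The resulting list can be appended to whatever beacon node command line is already in use.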
System information
Geth version: v1.10.21 / v1.10.23
OS & Version: Linux
Expected behaviour
A fresh sync from scratch of Goerli to work
Actual behaviour
Does not work.
1.10.21:
INFO [09-05|14:37:35.138] State heal in progress [email protected] [email protected] [email protected] nodes=16,319,[email protected] pending=4369
Unexpected trienode heal packet messages (~70% of all log lines)
WARN [09-02|16:42:58.589] Pivot seemingly stale, moving old=7,516,913 new=7,516,977
INFO [09-05|14:37:34.872] Imported new block headers count=0 elapsed=15.422ms number=7,382,818 hash=aa32c4..48c7cc age=3w4d12h ignored=178
Local chain is post-merge, waiting for beacon client sync switch-over...
level=error msg="Unable to process past deposit contract logs, perhaps your execution client is not fully synced" error="no contract code at given address" prefix=powchain
1.10.23:
No sync progress is reported, apparently every peer is dropped because of:
WARN [09-06|09:25:04.945] Snapshot extension registration failed peer=6de3885d err="peer connected on snap without compatible eth support"
Steps to reproduce the behaviour
Compose file used: