Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network: Update libp2p to 0.54.1 #5996

Closed
nazar-pc opened this issue Oct 9, 2024 · 10 comments · Fixed by #6248
Closed

network: Update libp2p to 0.54.1 #5996

nazar-pc opened this issue Oct 9, 2024 · 10 comments · Fixed by #6248
Assignees

Comments

@nazar-pc
Copy link
Contributor

nazar-pc commented Oct 9, 2024

Substrate currently uses libp2p 0.52.4 with lots of outdated dependencies, libp2p 0.53.0 was released 11 months ago and 0.54.0 was released 2 months ago in time for September's release cut.

This is way too slow, please consider upgrading libp2p to not lag behind for a year or more, this is bad for downstream users due to lots of duplicated and outdated dependencies, can also be problematic from security standpoint.

@bkchr
Copy link
Member

bkchr commented Oct 9, 2024

CC @paritytech/networking

@lexnv lexnv changed the title Update libp2p to 0.54.1 network: Update libp2p to 0.54.1 Oct 10, 2024
@lexnv
Copy link
Contributor

lexnv commented Oct 10, 2024

It's definitely something we want to do!

I would wait a bit to update libp2p to the latest version:

  • the amount of breaking changes from version to version is consuming a lot of resources (I gave up after half a day of hacking at the new API to get it just to compile).
  • we need a better process for catching regressions in networking backend updates. Last time we updated, we discovered a few weeks later a CPU regression of around 20-30%. Here is a good document written by @AndreiEres https://hackmd.io/@AndreiEres/networking-bench (and Benchmark network stack CPU usage  #5220) with a path forward.
  • Ideally this could happen after Litep2p stabilizes. We'd need a safe fallback for unexpected issues, and the version prior to 0.52.4 started to manifest panics for nodes running for 5-6 days 🙈

After we update libp2p to latest version, we could start looking at performance issues: #5221

@lexnv lexnv added this to Networking Oct 10, 2024
@nazar-pc
Copy link
Contributor Author

nazar-pc commented Oct 10, 2024

We jump onto latest version of libp2p at https://github.com/autonomys/subspace fairly quickly and while I agree it is annoying to do all the changes needed to migrate to new version, it typically takes me a couple of hours to do. And I am yet to see show-stopping issues after upgrade itself (though we do not have as robust testing and benchmarking capabilities yet).

I can probably do this upgrade for you (in two steps: first to 0.53.x and then to 0.54.x to make review easier), but it takes you guys a really long time to review with regular merge conflicts in between (I have 4 PRs that I opened that are waiting for review for over a week each, some are waiting for way longer than a week), so while I understand you probably have good reasons for this, it is a big investment.

Let me know if you're willing to commit to timely review and I'll take a look at doing this in the near future myself, it isn't a big deal.

Substrate is a large source of outdated dependencies that triggered RUSTSEC alerts in the past already as well as causing other issues, like pulling old version of libp2p-quic (seems impossible to disable right now) that pulls old version of ring that fails to compile on some platforms, all for dependency we don't even use in the runtime.

@lexnv
Copy link
Contributor

lexnv commented Oct 21, 2024

@nazar-pc We would love to get some help on this if you have some extra bandwidth! 🙏

I'll have a pass soon for the remaining networking PRs, thanks for contributing again!

@nazar-pc
Copy link
Contributor Author

Noted, I'll update once I get to it

@nazar-pc
Copy link
Contributor Author

nazar-pc commented Oct 27, 2024

0.53.2 is done here: #6248

Please assign this issue to me, I'll take care of the rest too.

UPD: 0.54.1 is also there, it was too easy to do to delay it further.

@nazar-pc
Copy link
Contributor Author

Now https://rustsec.org/advisories/RUSTSEC-2024-0421.html affects us due to old libp2p version as well

@lexnv
Copy link
Contributor

lexnv commented Dec 10, 2024

@nazar-pc what is the impact of https://rustsec.org/advisories/RUSTSEC-2024-0421.html issue for your team?

IIUC, https://rustsec.org/advisories/RUSTSEC-2024-0421.html can lead to privilege escalation if and only if host domain name check is part of a privilege check / authorization check.

From the substrate perspective, we use crypto/noise on top of every connection, we verify and validate the remote is in possession of the private keys derived from the PeerID.

In other words, even if the DNS resolver decides that example.org and xn--example-.org are equal, we do rely on the signatures from crypto/noise to establish the encrypted connection with the remote address.

Let me know if my understanding is correct 🙏

@nazar-pc
Copy link
Contributor Author

Probably not very large, but I really don't like to suppress advisories and I had to do that more than once due to old libp2p and litep2p versions in Substrate already.

@lexnv
Copy link
Contributor

lexnv commented Dec 10, 2024

Yep totally understandable, thanks for the swift reply 🙏

We'll test until the EoW the libp2p update which should enable us to easily upgrade to the latest unaffected version 👍

github-merge-queue bot pushed a commit that referenced this issue Dec 16, 2024
# Description

Fixes #5996

https://github.com/libp2p/rust-libp2p/releases/tag/libp2p-v0.53.0
https://github.com/libp2p/rust-libp2p/blob/master/CHANGELOG.md

## Integration

Nothing special is needed, just note that `yamux_window_size` is no
longer applicable to libp2p (litep2p seems to still have it though).

## Review Notes

There are a few simplifications and improvements done in libp2p 0.53
regarding swarm interface, I'll list a few key/applicable here.

libp2p/rust-libp2p#4788 removed
`write_length_prefixed` function, so I inlined its code instead.

libp2p/rust-libp2p#4120 introduced new
`libp2p::SwarmBuilder` instead of now deprecated
`libp2p::swarm::SwarmBuilder`, the transition is straightforward and
quite ergonomic (can be seen in tests).

libp2p/rust-libp2p#4581 is the most annoying
change I have seen that basically makes many enums `#[non_exhaustive]`.
I mapped some, but those that couldn't be mapped I dealt with by
printing log messages once they are hit (the best solution I could come
up with, at least with stable Rust).

libp2p/rust-libp2p#4306 makes connection close
as soon as there are no handler using it, so I had to replace
`KeepAlive::Until` with an explicit future that flips internal boolean
after timeout, achieving the old behavior, though it should ideally be
removed completely at some point.

`yamux_window_size` is no longer used by libp2p thanks to
libp2p/rust-libp2p#4970 and generally Yamux
should have a higher performance now.

I have resolved and cleaned up all deprecations related to libp2p except
`BandwidthSinks`. Libp2p deprecated it (though it is still present in
0.54.1, which is why I didn't handle it just yet). Ideally Substrate
would finally [switch to the official Prometheus
client](paritytech/substrate#12699), in which
case we'd get metrics for free. Otherwise a bit of code will need to be
copy-pasted to maintain current behavior with `BandwidthSinks` gone,
which I left a TODO about.

The biggest change in 0.54.0 is
libp2p/rust-libp2p#4568 that changed transport
APIs and enabled unconditional potential port reuse, which can lead to
very confusing errors if running two Substrate nodes on the same machine
without changing listening port explicitly.

Overall nothing scary here, but testing is always appreciated.

# Checklist

* [x] My PR includes a detailed description as outlined in the
"Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](

https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process
) of this project (at minimum one label for `T` required)
* External contributors: ask maintainers to put the right label on your
PR.

---

Polkadot Address: 1vSxzbyz2cJREAuVWjhXUT1ds8vBzoxn2w4asNpusQKwjJd

---------

Co-authored-by: Dmitry Markin <[email protected]>
@github-project-automation github-project-automation bot moved this to Blocked ⛔️ in Networking Dec 16, 2024
nazar-pc added a commit to autonomys/polkadot-sdk that referenced this issue Dec 20, 2024
# Description

Fixes paritytech#5996

https://github.com/libp2p/rust-libp2p/releases/tag/libp2p-v0.53.0
https://github.com/libp2p/rust-libp2p/blob/master/CHANGELOG.md

## Integration

Nothing special is needed, just note that `yamux_window_size` is no
longer applicable to libp2p (litep2p seems to still have it though).

## Review Notes

There are a few simplifications and improvements done in libp2p 0.53
regarding swarm interface, I'll list a few key/applicable here.

libp2p/rust-libp2p#4788 removed
`write_length_prefixed` function, so I inlined its code instead.

libp2p/rust-libp2p#4120 introduced new
`libp2p::SwarmBuilder` instead of now deprecated
`libp2p::swarm::SwarmBuilder`, the transition is straightforward and
quite ergonomic (can be seen in tests).

libp2p/rust-libp2p#4581 is the most annoying
change I have seen that basically makes many enums `#[non_exhaustive]`.
I mapped some, but those that couldn't be mapped I dealt with by
printing log messages once they are hit (the best solution I could come
up with, at least with stable Rust).

libp2p/rust-libp2p#4306 makes connection close
as soon as there are no handler using it, so I had to replace
`KeepAlive::Until` with an explicit future that flips internal boolean
after timeout, achieving the old behavior, though it should ideally be
removed completely at some point.

`yamux_window_size` is no longer used by libp2p thanks to
libp2p/rust-libp2p#4970 and generally Yamux
should have a higher performance now.

I have resolved and cleaned up all deprecations related to libp2p except
`BandwidthSinks`. Libp2p deprecated it (though it is still present in
0.54.1, which is why I didn't handle it just yet). Ideally Substrate
would finally [switch to the official Prometheus
client](paritytech/substrate#12699), in which
case we'd get metrics for free. Otherwise a bit of code will need to be
copy-pasted to maintain current behavior with `BandwidthSinks` gone,
which I left a TODO about.

The biggest change in 0.54.0 is
libp2p/rust-libp2p#4568 that changed transport
APIs and enabled unconditional potential port reuse, which can lead to
very confusing errors if running two Substrate nodes on the same machine
without changing listening port explicitly.

Overall nothing scary here, but testing is always appreciated.

# Checklist

* [x] My PR includes a detailed description as outlined in the
"Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](

https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process
) of this project (at minimum one label for `T` required)
* External contributors: ask maintainers to put the right label on your
PR.

---

Polkadot Address: 1vSxzbyz2cJREAuVWjhXUT1ds8vBzoxn2w4asNpusQKwjJd

---------

Co-authored-by: Dmitry Markin <[email protected]>

(cherry picked from commit c881288)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Blocked ⛔️
Development

Successfully merging a pull request may close this issue.

3 participants