Revert log level changes #3913

bkchr · 2024-03-30T20:20:30Z

Closes: #3906

alexggh · 2024-04-01T06:20:01Z

Did this cause any problem @bkchr?

If not, why wouldn't we want node operators to understand whether their validators are properly connected?

I guess we kind of disagree on this, and on #3677: the claim that this information is not relevant for nodes.

> Debug output is not to be printed using info logs. If people come to you complaining, you should either tell them which cli flags to change or come up for example with a dedicated RPC endpoint that generates debug information.

Let me try to see if I can change your mind :). At least from my perspective, part of operating a system with minimal downtime is the ability to understand what happened in production after it happened. Metrics are part of the story for that, but we also need the default log level to give us enough information to pinpoint problems, or at least to know for sure that certain things are going correctly.

A lot of our bug reports include just the Import #<block_number> log. It is really hard to figure out what happened from that alone, and for certain categories of bugs you can't tell the reporters to enable the DEBUG level, because the bugs don't reproduce systematically or we don't know how to reproduce them.

Hence, I was thinking we should actually go in the opposite direction and rethink a bit what our INFO level prints, so that it is easier to understand what is going on.

I'm aware that too much logging might cause other problems, but I don't think that was the issue with this line, because it is output only every 10 minutes.

bkchr · 2024-04-01T07:03:53Z

I get what you want to achieve. However, printing all the authorities that you are not connected to is debug information. If you go ahead and just print the connectivity to all authorities as a percentage every 10 minutes, that is fine.

Having bad connectivity is almost never a result of what you are doing locally, unless you have these issues because you regenerate your node key on every restart and authority discovery then fails to tell the others your new key. However, that is a bug and nothing the operator can change. They also could not change anything for these missing authorities, as they are not required to open slots in their firewall for each outgoing connection. Yes, they could have configured their server in this way, but then they already failed there.

All in all, printing all this information versus just printing the connectivity every 10 minutes as a percentage doesn't really make any difference.
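
For illustration only, a minimal sketch of the split described above: the summary percentage goes out at info level, and the full list of unconnected authorities stays at debug level. `Level`, `connectivity_report`, and the message wording are hypothetical names made up for this sketch, not taken from the node's code. A real node would call its logging macros directly instead of returning the lines.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the log level a line would be emitted at.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Info,
    Debug,
}

/// Builds the lines of one periodic connectivity report (hypothetical helper).
///
/// The one-line percentage summary is tagged `Info`; the list of
/// authorities we are not connected to is tagged `Debug`.
fn connectivity_report(
    authorities: &BTreeSet<String>,
    connected: &BTreeSet<String>,
) -> Vec<(Level, String)> {
    let total = authorities.len();
    // Only count connected peers that are actually part of the authority set.
    let connected_count = authorities.intersection(connected).count();
    // Assumption: an empty authority set is reported as fully connected.
    let percent = if total == 0 {
        100.0
    } else {
        100.0 * connected_count as f64 / total as f64
    };

    let mut lines = vec![(
        Level::Info,
        format!("Connectivity: {percent:.0}% ({connected_count}/{total} authorities)"),
    )];

    // Per-authority detail is only useful when debugging, so keep it at debug level.
    let missing: Vec<&str> = authorities
        .difference(connected)
        .map(String::as_str)
        .collect();
    if !missing.is_empty() {
        lines.push((
            Level::Debug,
            format!("Not connected to: {}", missing.join(", ")),
        ));
    }
    lines
}
```

Calling this every 10 minutes and forwarding each line to the matching log level would keep the default output to a single line, while operators who raise the level still see which authorities are missing.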

alexggh · 2024-04-01T07:20:21Z

> All in all, printing all this information versus just printing the connectivity every 10 minutes as a percentage doesn't really make any difference.

It doesn't make a difference for that particular node, but it does for the whole network and for anyone trying to find out which nodes are not properly connected, since those nodes would show up repeatedly in the logs of all the others.

Anyway, this doesn't matter much because we figured out the issue, and the next one will probably be in another part of the system. That's why I think we may need to rethink our rules for what we output as INFO, since currently there isn't much information in there. Alternatively, we could recommend that people run with DEBUG, but that is probably dangerous and costly because it would fill their disks with logs.

sandreim · 2024-04-01T07:31:15Z

@bkchr @alexggh you both raise valid points, but I think this is what we actually need: #816

alexggh · 2024-04-01T07:43:03Z

> ... you both raise valid points, but I think this is what we actually need: #816

Yes, that would be golden :D.

bkchr · 2024-04-01T07:52:37Z

> It doesn't make a difference for that particular node, but it does for the whole network and for anyone trying to find out which nodes are not properly connected, since those nodes would show up repeatedly in the logs of all the others.

And then? You see some IDs and don't know who they are. Even if you resolve them to some validator name, it still doesn't solve any problem. You could just write a dedicated tool that queries the DHT and then tries to connect to all validators.

Revert log level changes

e2e54aa


bkchr added the R0-silent label Mar 30, 2024

bkchr requested a review from alexggh March 30, 2024 20:20

bkchr mentioned this pull request Mar 30, 2024

Always print connectivity report #3677

Merged

ggwpez approved these changes Mar 30, 2024

liamaharon approved these changes Mar 31, 2024

sandreim approved these changes Mar 31, 2024

bkchr added this pull request to the merge queue Mar 31, 2024

Merged via the queue into master with commit 256d5ae Mar 31, 2024
137 of 139 checks passed

bkchr deleted the bkchr-revert-logging branch March 31, 2024 21:08

Ank4n pushed a commit that referenced this pull request Apr 9, 2024

Revert log level changes (#3913)

8df0f0f
