Performance degradation after upgrading from 8.6.1 to 8.16.1 #118623
Comments
I don't see any details on how the nodes are performing worse, only that CPU usage is higher, which is not necessarily a problem in itself. Could you expand on the performance change you are observing? Which API is affected, what changes are you seeing, etc.?
@javanna sorry, my bad! After the upgrade, the node's search latency got worse and never settled back to its original value, while CPU was much higher:
Here's how the flush rate changed for that node:
But even if all the other metrics had stayed the same, I would consider the CPU degradation a problem: nodes doing the same amount of work while consuming significantly more CPU effectively means performance is degraded, and I would have to scale up the cluster after the upgrade.
I also added the ES and JVM config/settings to the issue description; none of them changed between the upgrades.
I wonder if this is related to the improvements introduced in #94607.
Another observation, which may or may not be relevant, is the change in query cache behavior. Judging by hit rate and size, the query cache became more effective, but at the same time the rate of evictions increased, reducing the overall memory footprint. Increased evictions were mentioned here too, and getting the query cache memory overcounting issue fixed was our original intent behind the upgrade.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
What do your queries look like? Did you identify specific ones that are slowing down more than others? Could you run them with profile enabled so that we can see where the time is spent?
I wasn't able to pinpoint a specific query shape. I guess I would need to complete the migration, get rid of the nodes running the old version, and then compare the performance of the queries on the old version vs. the new one.
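For anyone following along, here is a minimal sketch of what "running with profile enabled" could look like. It assumes the standard `"profile": true` flag in the search request body and the usual `profile.shards[].searches[].query[]` layout of the response; the helper names and the sample query are hypothetical and are not taken from the reporter's cluster.

```python
import copy


def with_profile(search_body):
    """Return a copy of a search request body with profiling turned on."""
    body = copy.deepcopy(search_body)
    body["profile"] = True
    return body


def slowest_query_nodes(response, top=5):
    """Flatten the per-shard profile trees and return the slowest entries.

    Each entry is (time_in_nanos, shard_id, query_type, description).
    """
    rows = []

    def walk(shard_id, node):
        rows.append(
            (node["time_in_nanos"], shard_id, node["type"], node["description"])
        )
        for child in node.get("children", []):
            walk(shard_id, child)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                walk(shard["id"], node)

    return sorted(rows, key=lambda row: row[0], reverse=True)[:top]


# Hypothetical query; send `body` to the _search endpoint of an index
# with whatever HTTP client you already use.
body = with_profile({"query": {"match": {"message": "timeout"}}})
```

Running the same profiled query against an 8.6.1 node and an 8.16.1 node and comparing the top entries from `slowest_query_nodes` should show whether a particular query type accounts for the extra time.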
Upon comparing the hot threads API output for 8.6.1 and 8.16.1 nodes, I noticed that this stack shows up a lot in 8.16.1 and is missing from 8.6.1:
Pointing towards LZ4.decompress being slower in Lucene 9.12.0 due to
Update: I restarted one 8.16.1 node after disabling Lucene memory segments with the setting above and its CPU went down! I repeated the experiment by restarting two 8.16.1 nodes: one without any settings changes (control) and another one, again, with memory segments disabled. Again, CPU on the node with the setting disabled went down significantly:
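The hot threads comparison described above can be roughly automated. The sketch below assumes the plain-text output of `GET /_nodes/<node_id>/hot_threads` has been saved for one old and one new node, and that stack frames look like `module@version/package.Class.method(File.java:123)`; the function names are hypothetical, and the frame parsing is a heuristic rather than a documented format.

```python
from collections import Counter


def hot_threads_path(node_id, threads=10, interval="500ms"):
    """Build the request path for the nodes hot threads API."""
    return f"/_nodes/{node_id}/hot_threads?threads={threads}&interval={interval}"


def frame_counts(hot_threads_text):
    """Count how often each Java method appears in a hot threads dump."""
    counts = Counter()
    for line in hot_threads_text.splitlines():
        frame = line.strip()
        # Heuristic: stack frames end with "(File.java:123)" or similar.
        if "(" in frame and frame.endswith(")"):
            method = frame.split("(", 1)[0]
            # Drop the "app/module@version/" prefix so frames from
            # different Elasticsearch versions compare as equal.
            counts[method.rsplit("/", 1)[-1]] += 1
    return counts


def frames_only_in_new(old_text, new_text, min_count=2):
    """Methods seen at least min_count times on the new node, never on the old."""
    old = frame_counts(old_text)
    new = frame_counts(new_text)
    return sorted(m for m, n in new.items() if n >= min_count and m not in old)
```

Hot threads output is a short sample, so it is worth collecting several dumps per node before drawing conclusions from the difference.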
Elasticsearch Version
8.16.1
Installed Plugins
No response
Java Version
bundled
OS Version
6.1.112-124.190.amzn2023.aarch64
Problem Description
After starting an upgrade from 8.6.1 to 8.16.1, we noticed that data nodes running the new version are performing worse.
Their CPU usage was significantly higher than on the data nodes running the old version:
We also noticed that nodes on the new version have a much higher flush rate:
It doesn't matter whether an old node is upgraded to 8.16.1 or a brand new node is spun up with 8.16.1: they all show the same symptoms of higher CPU and a much higher flush rate, degrading the performance of the cluster as a whole.
We added more data nodes with the new 8.16.1 version and now have a cluster with 35 "old" nodes and 36 "new" nodes.
New nodes are consuming twice as much CPU and have a much higher flush rate:
Another notable difference is that new nodes have a larger cache size:
The reason why we were upgrading to 8.16.1 is to mitigate this issue
We are running Elasticsearch on EC2 with ephemeral instance store on c7gd.16xlarge. Elasticsearch version aside, no changes were made to the cluster infrastructure, JVM/Elasticsearch options, etc.
Steps to Reproduce
Upgrade an 8.6.1 data node to 8.16.1
Logs (if relevant)
No response
None of the settings below changed during the upgrade
JVM options
Elasticsearch static node config
Elasticsearch cluster settings