Slow queries in GraphQL after upgrading to v3.4.1 #11291

glesys-andreas · 2022-12-22T10:53:28Z

NetBox version

v3.4.1

Python version

3.9

Steps to Reproduce

import/create >= 800 vlans
[{"vid": "1", "name": "vlan_1", "status": "active"} ..... {"vid": "800", "name": "vlan_800", "status": "active"}]

run query:

query vlan {
	vlan_list {
		prefixes {
			prefix
			description
			tenant {
				name
			}
			vlan {
				group {
					name
				}
				vid
			}
		}
	}
}

Expected Behavior

Response in less than 10 seconds.

Observed Behavior

In version 3.3 this query took less than 10s, after upgrading to 3.4 this takes between 30-40 seconds.
This is reproduced at the demo site: https://demo.netbox.dev/ipam/vlans/

The text was updated successfully, but these errors were encountered:

arthanson · 2022-12-28T16:30:30Z

This is more of a feature request than a bug, The issue is we aren't doing any prefetch_related or select_related on the GraphQL queries. I replicated the above and 2046 queries are done. I'm not certain why it got worse from 3.3 to 3.4 but the root of the issue is the DB calls.

The bigger issue with GraphQL is we don't know what will be included in the query, so you can either optimize for most common queries or dynamically optimize queries. There is a package https://github.com/tfoxy/graphene-django-optimizer to dynamically optimize queries, unfortunately it doesn't work with the latest Graphene.

emil-nasso · 2022-12-31T10:26:16Z

I think this is a standard problem in graphql APIs that is usually solved using batching/data loaders. Basically, it's an n+1 problem/lazy loading problem. There is one query to the database to fetch all prefixes and then when it is time to load the VLAN for a prefix, there is no batching implemented, so it performs lazy-loading/one query to the database per prefix.

The solution is to batch it. When trying to load VLANs, instead of directly loading them, their keys are "cached" and finally, a list of all prefixes that we need to load VLANs for is sent to the batch method and we can load them all at once, using one query to the database.

I have implemented this in several graphql APIs but I'm not "fluent in python or graphene". :)

It seems like graphene has a solution for it though: https://docs.graphene-python.org/en/latest/execution/dataloader

It seems like it could possibly be a solution for this issue to use that.

neuro42 · 2023-01-23T22:15:08Z

After upgrading to 3.4.3 from 3.3.6, we've also noticed our graphql query jump from ~1s to ~7s. Looking at the CPU usage with py-spy, it doesn't seem like the DB queries are being slow, but the django bits around them that are:

2.74s   _fetch_all (django/db/models/query.py:1867)
1.36s   _deepcopy_dict (copy.py:230)
1.38s   deepcopy (copy.py:146)
2.33s   filter_queryset (django_filters/filterset.py:213)
3.25s   __iter__ (django/db/models/query.py)

I haven't had a chance yet to rerun the query against 3.3.6 to see what it looked like there.

v0tti · 2023-02-16T16:17:33Z

This issue is causing some trouble for us because most of our queries are taking so long we run into timeouts.

Example query we are using:

{
      device_list(has_primary_ip: true, status__n: "planned") {
        comments
        custom_fields
        device_role {
          name
        }
        id
        name
        platform {
          slug
        }
        primary_ip4 {
          address
          dns_name
        }
        primary_ip6 {
          address
          dns_name
        }
        site {
          slug
        }
        status
        interfaces {
          bridge {
            name
          }
          ip_addresses {
            address
          }
          lag {
            name
          }
          name
          mac_address
          mgmt_only
          parent {
            name
          }
          tagged_vlans {
            vid
          }
          type
          untagged_vlan {
            vid
          }
        }
      }
}

v0tti · 2023-02-21T14:29:04Z

I have identified commit 99cf1b1 as responsible for the massive slowdown of the GraphQL queries.
For our worst query it slows down the execution time from about 20 seconds to over 2 minutes.

v0tti · 2023-02-22T12:17:24Z

My fork reverts the commit on top of the current NetBox release. GraphQL is now as fast as prior to v3.4. As to be expected, filtering does not work on all levels, but we do not use that at the moment.

florianschendel · 2023-02-28T16:08:21Z

Hi guys,

I have a similar issue. Since the upgrade to 3.4.x the graphql query ends with a timeout

{
  prefix_list {
    description
    id
    last_updated
    site {
      name
    }
    vrf {
      name
    }
    tenant {
      name
    }
    role {
      name
    }
    tags {
      name
    }
    custom_fields
    prefix
    vlan {
      vid
      id
      name
      description
      status
      last_updated
    }
  }
}

Less complex queries still work without any issues.

Best regards
Florian

arthanson · 2023-02-28T16:33:47Z

Am going to take another look at this later this week, see if there are some tweaks that can be made. Thanks for the sample queries, will test with those.

florianschendel · 2023-02-28T16:36:19Z

Am going to take another look at this later this week, see if there are some tweaks that can be made. Thanks for the sample queries, will test with those.

thanks a lot!

florianschendel · 2023-02-28T17:09:32Z

I split the one script into three scripts as workaround

I created one for the region "emea", "apac" and "americas". We created super networks/summary networks with the VLAN ID 4094.

Actually we don´t need it in our graphql report, we need only VLAN/Prefixes that are really configured on our switches, routers ....

Maybe its also a workaround for some user people.

{
  prefix_list(vlan_vid__lt: 4094, region: "emea") {
    description
    id
    last_updated
    site {
      name
    }
    vrf {
      name
    }
    tenant {
      name
    }
    role {
      name
    }
    tags {
      name
    }
    custom_fields
    prefix
    vlan {
      vid
      id
      name
      description
      status
      last_updated
    }
  }
}

arthanson · 2023-03-09T17:39:24Z

Traced down the issue: https://github.com/netbox-community/netbox/blob/develop/netbox/netbox/filtersets.py#L249 This code is causing custom-fields to be queried even if they aren't in GraphQL request. You don't see this in REST API as all fields are returned anyways and there are prefetch_related calls in-place, but in GraphQL only a specific subset of fields are returned and there is currently no query optimization. This basically causes the query count to double from previous versions as filters are applied correctly now.

I made a bunch of VLANs with the query above, here are the query counts:
old code (before filtering at all levels): 985 queries
current code (with filtering at all levels): 1852 queries
with query optimizer: 84 queries

This will need to be fixed separately from a query optimizer, either the code referenced above needs to be changed so the custom fields aren't added to the query (preferred), or the query optimizer would need to detect the filter is a subclass of NetBoxModelFilterSet and auto add custom field prefetches.

v0tti · 2023-03-10T15:08:02Z

@arthanson Thanks for your work. I just tested our use case with your branch and it is (at least) up to speed with previous versions.

arthanson · 2023-03-10T16:37:48Z

Added issue #11949 to track the custom-fields being auto added to queries.

jeremystretch · 2023-03-23T12:52:00Z

PR #11943 has been merged into feature and will be included in the upcoming v3.5 release.

glesys-andreas added the type: bug A confirmed report of unexpected behavior in the application label Dec 22, 2022

arthanson self-assigned this Dec 27, 2022

arthanson assigned arthanson and unassigned arthanson Jan 3, 2023

arthanson added this to the v3.5 milestone Jan 6, 2023

jeremystretch removed this from the v3.5 milestone Feb 16, 2023

arthanson mentioned this issue Mar 9, 2023

11291 optimize GraphQL queries #11943

Merged

arthanson mentioned this issue Mar 10, 2023

NetBoxModelFilterSet causes GraphQL queries on custom-fields even if not referenced #11949

Open

jeremystretch added this to the v3.5 milestone Mar 22, 2023

jeremystretch added status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application and removed type: bug A confirmed report of unexpected behavior in the application labels Mar 22, 2023

jeremystretch closed this as completed Mar 23, 2023

jeremystretch mentioned this issue Apr 27, 2023

Release v3.5.0 #12371

Merged

github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow queries in GraphQL after upgrading to v3.4.1 #11291

Slow queries in GraphQL after upgrading to v3.4.1 #11291

glesys-andreas commented Dec 22, 2022

arthanson commented Dec 28, 2022

emil-nasso commented Dec 31, 2022 •

edited

Loading

neuro42 commented Jan 23, 2023 •

edited

Loading

v0tti commented Feb 16, 2023

v0tti commented Feb 21, 2023

v0tti commented Feb 22, 2023

florianschendel commented Feb 28, 2023

arthanson commented Feb 28, 2023

florianschendel commented Feb 28, 2023

florianschendel commented Feb 28, 2023

arthanson commented Mar 9, 2023

v0tti commented Mar 10, 2023

arthanson commented Mar 10, 2023

jeremystretch commented Mar 23, 2023

Slow queries in GraphQL after upgrading to v3.4.1 #11291

Slow queries in GraphQL after upgrading to v3.4.1 #11291

Comments

glesys-andreas commented Dec 22, 2022

NetBox version

Python version

Steps to Reproduce

Expected Behavior

Observed Behavior

arthanson commented Dec 28, 2022

emil-nasso commented Dec 31, 2022 • edited Loading

neuro42 commented Jan 23, 2023 • edited Loading

v0tti commented Feb 16, 2023

v0tti commented Feb 21, 2023

v0tti commented Feb 22, 2023

florianschendel commented Feb 28, 2023

arthanson commented Feb 28, 2023

florianschendel commented Feb 28, 2023

florianschendel commented Feb 28, 2023

arthanson commented Mar 9, 2023

v0tti commented Mar 10, 2023

arthanson commented Mar 10, 2023

jeremystretch commented Mar 23, 2023

emil-nasso commented Dec 31, 2022 •

edited

Loading

neuro42 commented Jan 23, 2023 •

edited

Loading