Tag subscription counts wrong on stats page #7908
Why 219 subscribers? |
Hi @dongskyler! Sorry, I think we must have made a mistake in our suggested code change! Thanks for looking into this! We SUPER appreciate your help, and please don't feel this was your fault in any way! 😄 Hmm. I wonder... it's a pretty odd bug. The change was to add the
Is it possible we're joining incorrectly? What if we joined against something that has only 219 records, or something like that? |
So it's not that. Could it be that the returned records are not of the type we expect, so the logic in this block isn't working as intended? plots2/app/controllers/stats_controller.rb Lines 2 to 9 in 0e58e62
Let me test what |
Hmm. So the filtering works, but why isn't the logic actually showing the counts? Let me go through the logic step by step in the Rails console. |
OK, so what I think is happening is that the relation is defined here: plots2/app/models/tag_selection.rb Line 4 in 0e58e62
But, both tables have a |
Aha! I found that it's possible to use:
TagSelection.where(following: true)
  .joins("LEFT JOIN community_tags ON community_tags.tid = tag_selections.tid")
  .joins("JOIN term_data ON term_data.tid = tag_selections.tid")
  .group("term_data.name")
  .count
This replaces the whole iterative loop and is probably MUCH more efficient! No iterating over every tag... woohoo! I'm going to replace this whole segment. |
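To make the "replaces the whole loop" point concrete, here's a plain-Ruby sketch (no Rails, hypothetical sample rows) of what `.group("term_data.name").count` hands back: a Hash from tag name to the number of joined rows with that name. The `JoinedRow` struct and sample data are made up for illustration.

```ruby
# Plain Ruby, no Rails: a sketch of what `.group("term_data.name").count` returns.
# JoinedRow and SAMPLE_ROWS are hypothetical stand-ins for the joined result set.
JoinedRow = Struct.new(:tag_name, :user_id)

SAMPLE_ROWS = [
  JoinedRow.new("balloon-mapping", 1),
  JoinedRow.new("balloon-mapping", 2),
  JoinedRow.new("water-quality", 1)
].freeze

# Old style: visit each tag and scan all rows for it (tags x rows work).
def counts_by_loop(rows)
  rows.map(&:tag_name).uniq.each_with_object({}) do |name, counts|
    counts[name] = rows.count { |row| row.tag_name == name }
  end
end

# Grouped style: a single pass, like GROUP BY name with COUNT(*).
# Note that it counts rows, not distinct users.
def counts_by_group(rows)
  rows.map(&:tag_name).tally
end

counts_by_group(SAMPLE_ROWS) # 2 rows for "balloon-mapping", 1 for "water-quality"
```

Because the grouped count tallies rows, any duplicate rows a join produces get counted too.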
Done in #7930! Let's see how that does. |
Merged it! Let's test at https://stable.publiclab.org/stats/subscriptions as soon as it's done building. |
OK, it kind of worked. It's no longer showing 219s... but the new query may suffer from other issues. Re-opening! |
I think I shouldn't have done |
That's right, I think both should be INNER JOINs:
tags2 = TagSelection.where(following: true)
  .joins("INNER JOIN community_tags ON community_tags.tid = tag_selections.tid")
  .joins("INNER JOIN term_data ON term_data.tid = tag_selections.tid")
  .group("term_data.name")
  .count['water-quality']
=> 34087
plots2/app/controllers/stats_controller.rb Lines 3 to 7 in c0de1f0
Here is the query being generated:
SELECT `tag_selections`.* FROM `tag_selections` INNER JOIN term_data ON term_data.tid = tag_selections.tid INNER JOIN community_tags ON community_tags.tid = tag_selections.tid WHERE `tag_selections`.`tid` = 3838 AND `tag_selections`.`following` = TRUE LIMIT 11 |
Ah, OK, so it's cross-matching and just returning a lot of duplicates. I can add a |
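A small in-memory model of the cross-matching: each follower of a tag pairs with every community_tags row (every use of that tag on a node) sharing the same tid, so COUNT(*) reports followers × uses. The row structs and sample data below are hypothetical, not the real plots2 schema.

```ruby
# Hypothetical in-memory rows (not the real plots2 schema) showing how a join
# multiplies rows: each follower pairs with every community_tags row for its tid.
SelectionRow = Struct.new(:tid, :user_id)
CommunityTagRow = Struct.new(:tid, :nid)

FOLLOWERS = [SelectionRow.new(10, 1), SelectionRow.new(10, 2)].freeze # 2 followers of tag 10
TAG_USES = [CommunityTagRow.new(10, 100), CommunityTagRow.new(10, 101),
            CommunityTagRow.new(10, 102)].freeze                      # tag 10 used on 3 nodes

# INNER JOIN community_tags ON community_tags.tid = tag_selections.tid
def inner_join(selections, community_tags)
  selections.flat_map do |sel|
    community_tags.select { |ct| ct.tid == sel.tid }.map { |ct| [sel, ct] }
  end
end

# COUNT(*) over the joined rows: followers x uses, so 2 x 3 = 6 here.
def naive_follower_count(selections, community_tags, tid)
  inner_join(selections, community_tags).count { |sel, _ct| sel.tid == tid }
end

# Counting distinct user_ids after the join removes the duplicates: 2 here.
def distinct_follower_count(selections, community_tags, tid)
  inner_join(selections, community_tags)
    .select { |sel, _ct| sel.tid == tid }
    .map { |sel, _ct| sel.user_id }
    .uniq
    .size
end
```

So the tags that are both widely followed and widely used end up with the most inflated numbers.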
OK! So this works and returns proper counts:
@tags = TagSelection.where(following: true)
  .joins("LEFT JOIN community_tags ON community_tags.tid = tag_selections.tid")
  .joins(:tag)
  .group("term_data.name")
  .count
However, it does not filter out spam. 😱 |
We could TRY something like this:
TagSelection.where(following: true)
  .joins("LEFT JOIN community_tags ON community_tags.tid = tag_selections.tid")
  .joins(:tag)
  .joins("INNER JOIN node ON node.nid = community_tags.nid")
  .where("node.status = 1")
  .group("term_data.name")
  .count
However, it's a pretty slow query: maybe 3 seconds? We might be able to select fewer columns to speed it up, but I'm not sure. We could cache it and refresh every 24 hours? |
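One side effect of the spam filter worth noting: once the query does an INNER JOIN against node on `community_tags.nid` and filters `node.status = 1`, the rows the LEFT JOIN kept with NULLs can never match, so the LEFT JOIN behaves like an INNER JOIN. This plain-Ruby model uses hypothetical data; treating status 0 as spam is an assumption for illustration, and only status 1 is kept, as in the query.

```ruby
# Plain-Ruby model (hypothetical data) of: LEFT JOIN community_tags,
# INNER JOIN node ON node.nid = community_tags.nid, WHERE node.status = 1.
FollowRow = Struct.new(:tid, :user_id)
TagUseRow = Struct.new(:tid, :nid)

# nid => status; only status 1 is kept, as in the query. 0 stands in for spam here (assumed).
NODE_STATUS = { 100 => 1, 101 => 0 }.freeze

FOLLOWS = [FollowRow.new(10, 1), FollowRow.new(20, 2)].freeze  # tag 20 is never used on a node
USES = [TagUseRow.new(10, 100), TagUseRow.new(10, 101)].freeze # tag 10: one published node, one spam node

def left_join(follows, uses)
  follows.flat_map do |follow|
    matches = uses.select { |use| use.tid == follow.tid }
    # LEFT JOIN keeps unmatched follows, with nil standing in for the NULL columns.
    matches.empty? ? [[follow, nil]] : matches.map { |use| [follow, use] }
  end
end

def rows_on_published_nodes(follows, uses, node_status)
  left_join(follows, uses).select do |_follow, use|
    # An INNER JOIN on a NULL nid never matches, so the nil rows kept by the
    # LEFT JOIN are dropped here anyway, along with rows on non-published nodes.
    !use.nil? && node_status[use.nid] == 1
  end
end
```

So with the node filter in place, followed tags that never appear on a published node drop off the list entirely.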
Caching would look like:
Rails.cache.fetch("stats/subscriptions/query", expires_in: 24.hours) do
  TagSelection.where(following: true)
    .joins("LEFT JOIN community_tags ON community_tags.tid = tag_selections.tid")
    .joins(:tag)
    .joins("INNER JOIN node ON node.nid = community_tags.nid")
    .where("node.status = 1")
    .group("term_data.name")
    .count
end |
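To spell out the fetch-with-expiry behaviour the `Rails.cache.fetch` call relies on, here's a tiny in-memory stand-in. `TinyExpiringCache` is hypothetical and is NOT `Rails.cache`; it only mimics "return the stored value if it hasn't expired, otherwise run the block and store the result". The clock is injectable so the expiry can be tested.

```ruby
# Hypothetical stand-in for the fetch-with-expiry pattern; NOT Rails.cache.
class TinyExpiringCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock # returns the current time in seconds
    @entries = {}
  end

  # Returns the cached value for key if it hasn't expired; otherwise runs the
  # block (e.g. the slow stats query), stores its result with an expiry, and returns it.
  def fetch(key, expires_in:)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now < entry[:expires_at]

    value = yield
    @entries[key] = { value: value, expires_at: now + expires_in }
    value
  end
end

cache = TinyExpiringCache.new
one_day = 24 * 60 * 60 # plain-Ruby equivalent of 24.hours
cache.fetch("stats/subscriptions/query", expires_in: one_day) { { "balloon-mapping" => 80 } }
```

The trade-off is that the slow query runs at most about once a day, and the page can show counts up to a day stale.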
That sounds like a rather inefficient query. How big is the MySQL database? Do you have the database schema somewhere? Otherwise, my input might be limited. |
I think adding server-side caching is a great idea. But it's also a good idea to improve the efficiency of the SQL queries. |
Thank you very much for your kind words. I'm just trying to contribute here. |
Sure, some docs here:
I opened a PR with the cached version. We have about 5000 |
So, the PR passed, and I'm tempted to merge this for now, but if you LOVE optimizing SQL, we can leave this issue open and see if there's a more efficient way to do it! I'm not much of a Rails SQL wizard, so I guess this isn't the friendliest place to start contributing 😅, but some folks are great at databases, so don't let me scare you off if you are! If you're looking for something else to get into, there are plenty of other issues I could suggest! |
I'm not sure why the caching caused trouble, but we could cache in the view template instead, which is also easy (see https://guides.rubyonrails.org/caching_with_rails.html). We would just wrap these lines:
|
Just checking in: this came up in our weekly staff call today. We're getting ready to onboard a new staff member, and this view will be very relevant to them, so we're bumping this up in priority! Important counts we're looking at right now:
|
Do we have breakout issues for
or
and
or are these included in this issue? |
#7941 resolved the ordering issue, but I think the remaining issues are still there, looking at https://stable.publiclab.org/stats/subscriptions:
We'll spend some time digging in on this today! |
I think it's possible we need to create better test data for this so we can work the problem locally. |
I believe I've fixed the caching syntax and pushed a new PR. Fingers crossed 🤞 |
Merged, just need to confirm on https://stable.publiclab.org/stats/subscriptions now! |
OK, so it's an improvement, but the counts are still inflated. For example, although https://stable.publiclab.org/tag/balloon-mapping is indeed a top tag, it has 80 followers, not 56700. I believe the counts are not being grouped properly and are being repeated because of incorrect table joins and grouping. Next step is to debug in the console on the live server. |
OK, plan is to:
|
OK so I have another attempt, manually tested on the live database, and I think this is correct: plots2/app/controllers/stats_controller.rb Lines 3 to 9 in 9b469ee
Note the "DISTINCT": it returns 109 for And it mostly works! The stats look much better at https://stable.publiclab.org/stats/subscriptions, but aha, I wonder... we aren't filtering out banned people. Let's look: Lines 182 to 186 in 9b469ee
|
OK, added that too and re-added cache code, because that slowed it down a lot: plots2/app/controllers/stats_controller.rb Lines 3 to 13 in 8b0b3d5
|
I think we're done here! Fully working on stable! Will publish soon. |
This page is looking great! https://publiclab.org/stats/subscriptions |
Strangely this has re-broken, so re-opening: 8b0b3d5 was the last fix, a day before we closed this and had confirmed it on stable. It's no longer working on stable either: https://stable.publiclab.org/stats/subscriptions |
On stable:
counts = TagSelection.select("DISTINCT tag_selections.tid, tag_selections.user_id")
  .where(following: true)
  .joins("INNER JOIN community_tags ON community_tags.tid = tag_selections.tid")
  .joins("INNER JOIN term_data ON term_data.tid = community_tags.tid")
  .group("term_data.name")
  .joins("INNER JOIN rusers ON rusers.id = tag_selections.user_id")
  .where("rusers.status = 1 OR rusers.status = 4")
  .count["balloon-mapping"]
=> 88
which seems correct...????? vs. |
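As a cross-check, here's a plain-Ruby model of what the query above is meant to compute: distinct (tid, user_id) pairs, restricted to tags that appear in community_tags and term_data and to users with status 1 or 4, counted per tag name. The structs and sample data are hypothetical, and this models the intended result rather than the exact SQL Rails generates.

```ruby
# Plain-Ruby model of the intended result of the query above (hypothetical data).
FollowPair = Struct.new(:tid, :user_id)
UseRow = Struct.new(:tid, :nid)

ALLOWED_USER_STATUSES = [1, 4].freeze # mirrors rusers.status = 1 OR rusers.status = 4

def subscription_counts(follows, uses, term_names, user_status)
  used_tids = uses.map(&:tid).uniq
  follows
    .select { |f| used_tids.include?(f.tid) }                              # INNER JOIN community_tags
    .select { |f| term_names.key?(f.tid) }                                 # INNER JOIN term_data
    .select { |f| ALLOWED_USER_STATUSES.include?(user_status[f.user_id]) } # rusers status filter
    .map { |f| [f.tid, f.user_id] }
    .uniq                                                                  # DISTINCT tid, user_id
    .group_by { |tid, _user_id| term_names[tid] }                          # GROUP BY term_data.name
    .transform_values(&:size)                                              # count per tag name
end

SAMPLE_FOLLOWS = [FollowPair.new(10, 1), FollowPair.new(10, 1), FollowPair.new(10, 2),
                  FollowPair.new(10, 3), FollowPair.new(20, 1)].freeze
SAMPLE_USES = [UseRow.new(10, 100), UseRow.new(10, 101)].freeze # tag 20 is never used
SAMPLE_NAMES = { 10 => "balloon-mapping", 20 => "unused-tag" }.freeze
SAMPLE_STATUS = { 1 => 1, 2 => 4, 3 => 0 }.freeze # user 3 has a status outside 1/4

subscription_counts(SAMPLE_FOLLOWS, SAMPLE_USES, SAMPLE_NAMES, SAMPLE_STATUS)
# users 1 and 2 count once each for "balloon-mapping"; user 3 and tag 20 are excluded
```

If the SQL disagrees with a hand count like this on the same rows, the query (rather than the data) is the likely culprit.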
Just came across this myself today on https://publiclab.org/stats/subscriptions. Compare what's listed at that URL with the count on the tag's own page: |
I recall someone suggesting we need much better test data to reproduce this, and I wanted to suggest we develop some code snippets that can generate lots of users, tags, and subscriptions to try to simulate the failure. |
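A rough starting point for such a snippet, in plain Ruby: it generates users, tags, subscriptions, and tag uses from a seeded `Random` so runs are reproducible, and computes the true follower count per tag straight from the data, plus what a naive COUNT(*) over the join would report. All names here are hypothetical; loading the data into the plots2 test database or fixtures is left out, and "a user follows a given tag at most once" is an assumption.

```ruby
# Hypothetical test-data generator (plain Ruby; DB/fixture loading left out).
GeneratedData = Struct.new(:user_ids, :tag_ids, :subscriptions, :tag_uses)

def generate_test_data(users: 500, tags: 50, subscriptions: 2_000, tag_uses: 5_000, seed: 42)
  rng = Random.new(seed) # seeded so the same data comes out every run
  user_ids = (1..users).to_a
  tag_ids = (1..tags).to_a
  # [tid, user_id] pairs; uniq because we assume a user follows a tag at most once.
  subs = Array.new(subscriptions) { [tag_ids.sample(random: rng), user_ids.sample(random: rng)] }.uniq
  # [tid, nid] pairs: many uses per tag, which is what inflates a naive join.
  uses = Array.new(tag_uses) { [tag_ids.sample(random: rng), rng.rand(1..10_000)] }
  GeneratedData.new(user_ids, tag_ids, subs, uses)
end

# Ground truth: distinct followers per tag id.
def expected_follower_counts(data)
  data.subscriptions.group_by(&:first).transform_values { |pairs| pairs.map(&:last).uniq.size }
end

# What COUNT(*) over subscriptions JOIN tag uses would report per tag id.
def naive_join_counts(data)
  uses_per_tag = data.tag_uses.map(&:first).tally
  data.subscriptions.each_with_object(Hash.new(0)) do |(tid, _user_id), counts|
    counts[tid] += uses_per_tag.fetch(tid, 0)
  end
end

data = generate_test_data
expected = expected_follower_counts(data)
naive = naive_join_counts(data)
```

Comparing `expected` with what the stats query returns on the same data should show quickly whether the join is still fanning out.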
Part of #9827 |
We just changed the query for this page and it looks like something went wrong; perhaps the grouping is incorrect? There are so many tags with 219 followers and almost none with 1 follower.
https://publiclab.org/stats/subscriptions
Let's check the last change to that file and figure out what went wrong.