
As a gitcoin admin, I'd like the mailchimp syncing to be refactored, so the load on the database is lowered and the site runs faster #4784

Closed
danlipert opened this issue Jul 12, 2019 · 6 comments


@danlipert
Contributor

User Story

As a gitcoin admin, I'd like the mailchimp syncing to be refactored, so the load on the database is lowered and the site runs faster

Why Is this Needed

Summary: While the sync_mail management command runs, it slows down the site significantly.

Description

Type: Bug

Current Behavior

The mailchimp lists sync bi-directionally every 2 hours.

Expected Behavior

The mailchimp lists should sync at a lower rate, and the sync from mailchimp to gitcoin should be done less often since it puts a heavy load on the database.

Definition of Done

The current sync_mail cronjob and management command are split into two separate jobs: the sync from mailchimp to gitcoin runs only once a day, while the sync from gitcoin to mailchimp runs twice a day. One possible shape for the split is sketched below.
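A rough sketch of the split as two Django management commands (file and command names here are hypothetical; pull_to_db() is the function referenced later in this issue, while push_to_mailchimp() is a stand-in name for the reverse direction, and imports of the sync helpers are omitted since their module path isn't shown here):

    # hypothetical file: marketing/management/commands/pull_mailchimp.py
    # scheduled once a day
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = 'pull subscriber data from mailchimp into the gitcoin db'

        def handle(self, *args, **options):
            pull_to_db()  # the heavy, db-intensive direction

    # hypothetical file: marketing/management/commands/push_mailchimp.py
    # scheduled twice a day
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = 'push gitcoin subscriber data up to mailchimp'

        def handle(self, *args, **options):
            push_to_mailchimp()  # hypothetical name for the reverse direction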

Data Requirements

While the mailchimp to gitcoin sync is running, database queries often take up to 0.25 seconds longer per request, resulting in a major performance hit.

Additional Information

Reported by @owocki and discovered by profiling the running jobs on the cronbox

@owocki
Contributor

owocki commented Jul 12, 2019

specifically i think that the pull_to_db() section is slow. it'll do

- 20k db reads (for profiles)
- 5k db reads for matches
- 20k for the mailchimp list
- 1.5k for tips
- 5k for bounties
- 100 for whitepaper access requests

PLUS all of the above x2 for all the calls to get_or_save_email_subscriber()
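the x2 presumably comes from each call to get_or_save_email_subscriber() doing its own lookup per email; a guess at the shape of that function (not the actual implementation):

    from marketing.models import EmailSubscriber  # assumed model location

    def get_or_save_email_subscriber(email, source):
        # one extra read per call to check for an existing subscriber...
        es = EmailSubscriber.objects.filter(email__iexact=email).first()
        if not es:
            # ...plus a write when the row doesn't exist yet
            es = EmailSubscriber.objects.create(email=email, source=source)
        return es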

@owocki
Contributor

owocki commented Jul 12, 2019

two recommendations for how to make this job more efficient @danlipert (these are both low lifts):

1. consolidate the DB reads down to one per object type. a la, instead of

        from dashboard.models import Subscription
        for sub in Subscription.objects.all():
            email = sub.email
            process_email(email, 'dashboard_subscription')

   do a

        from dashboard.models import Subscription
        for email in Subscription.objects.all().values_list('email', flat=True):
            process_email(email, 'dashboard_subscription')

2. limit the reads in pull_to_db() to only objects created in the last n hours, where n = the number of hours since the job last ran (a minimal sketch follows below)
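for 2, a minimal sketch of the filter, assuming the models have a created_on timestamp field (the window constant is a placeholder):

    from datetime import timedelta

    from django.utils import timezone

    from dashboard.models import Subscription

    # placeholder: this value should come from a single shared setting rather
    # than being hard-coded both here and in the crontab (see discussion below)
    SYNC_WINDOW_HOURS = 12

    # only pull rows created since the job last ran
    cutoff = timezone.now() - timedelta(hours=SYNC_WINDOW_HOURS)
    for email in Subscription.objects.filter(created_on__gt=cutoff).values_list('email', flat=True):
        process_email(email, 'dashboard_subscription')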

@danlipert
Contributor Author

Those look good - I think #1 will give a small improvement by grabbing one column instead of the whole row. #2 is good in general for sure - I think it'd also be good to have finer-grained scheduling on these by splitting out the different steps as described in the issue. I wonder how we can avoid having a "magic number" for the n hours in both the python code and the crontab config - having the magic number led to problems with the mailchimp list getting desynced previously (it was 2 hours in the python script but 6 hours in the crontab).

@owocki
Contributor

owocki commented Jul 12, 2019

what's a "magic number"?

@danlipert
Contributor Author

@owocki https://en.wikipedia.org/wiki/Magic_number_(programming) - in this case there was a value of 2 hours that needed to be set in two different places for the sync to work properly, instead of using an environment variable or something similar that each file would import from. A sketch of that approach is below.
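a sketch of that kind of fix, with one env-driven value as the single source of truth (setting name hypothetical):

    # app/settings.py: single source of truth, overridable via the environment
    import os

    MAILCHIMP_SYNC_HOURS = int(os.environ.get('MAILCHIMP_SYNC_HOURS', 2))

    # inside the management command
    from datetime import timedelta

    from django.conf import settings
    from django.utils import timezone

    cutoff = timezone.now() - timedelta(hours=settings.MAILCHIMP_SYNC_HOURS)

the crontab entry would then be generated from (or read) the same MAILCHIMP_SYNC_HOURS environment variable instead of hard-coding its own interval.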

@owocki
Contributor

owocki commented Jul 15, 2019

PR is in at #4798
