Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement --recent flag on import_votes_cast command #818

Open
symroe opened this issue May 27, 2021 · 8 comments
Open

Implement --recent flag on import_votes_cast command #818

symroe opened this issue May 27, 2021 · 8 comments
Assignees

Comments

@symroe
Copy link
Member

symroe commented May 27, 2021

At the moment import_votes_cast takes --since with a date. This doesn't work well, as the response since a given date gets longer over time, when the data hasn't changed.

This ends up taking a long time to import data we have already.

Rather, we should implement a --recent flag that imports and results added or updated in YNR since the last result entry we have in WCIVF.

This should copy the way the person import's --recent flag works.

It might also require more filters in YNR - see DemocracyClub/yournextrepresentative#1078

@michaeljcollinsuk michaeljcollinsuk self-assigned this Jun 14, 2021
@michaeljcollinsuk
Copy link
Contributor

michaeljcollinsuk commented Jun 14, 2021

@symroe was going to start writing up that issue as discussed on stand up but then saw this - so it is not the import_people command that needs work, more all the others? So this, import_ballots and import_parties

As I see it the changes required are the following:

  • First update the relevant API endpoints in YNR to allow filtering by timestamp
  • Then update WCIVF models to store timestamps
  • Update the WCIVF management commands and import helpers to allow --recent flag option
  • Finally when all is tested and working, update who_deploy crontab files

@symroe
Copy link
Member Author

symroe commented Jun 14, 2021

Yep, that sounds like a reasonable set of actions!

I think before all of that we need to audit the YNR models to see if they are storing the timestamp. e.g, Ballot isn't at the moment.

@michaeljcollinsuk
Copy link
Contributor

Good point. This has also made me think a bit more about how the updated_at timestamp actually works in practice - for example, if a candidacy is added to a ballot, would that update the ballot timestamp? I'm not sure it well, but we probably need it to for the ballot importer to work effectively. So I'll look in to this along with auditing YNR models as the first step.

@michaeljcollinsuk
Copy link
Contributor

@symroe something else I have noticed, some models such as Membership, Person and Post (and others in popolo/models.py file) use a Timestampable model whereas others use the TimeStampedModel from django-extensions. I'm guessing the former is a hangover from the when you were using popolo? So I wonder if its worth migrating over so that everything uses TimeStampableModel to be consistent? Or even just rename the fieldnames on Timestampable so that they match (created and modified)

@symroe
Copy link
Member Author

symroe commented Jun 16, 2021

Good catch!

I'm not sure I mind what one of the two models we keep, but I agree that we should move to only using one!

If it's helpful to think about:

  1. I think django-extensions is more useful that just that model, and we generally have it installed on all our Django projects.
  2. The DIY version actually requires django-model-utils, so it's still a 3rd party package.
  3. If we keep the DIY version, it's the sort of thing we might want to add to our own django utils package, meaning we might end up just recreating django-extensions?

That's not really an argument either way I guess, but it might help when thinking about it!

@michaeljcollinsuk
Copy link
Contributor

I think it's worth changing to use the TimeStampable model from django-extensions - as that is also already being used in WCIVF on some models too.

Another thing that's come up when trying to add it to a new class with existing objects, is that the created field needs a default to use when you run migrate. So I think there are two options here:

  1. Don't worry about the value, and just set it to timezone.now. This will mean that we have ballots with a created date of after their election date, but I'm not sure if we will practically ever use the created date? I'd have thought the election date is what we care about for general queries, and for this work it will be the modified value we use.
  2. I fake it to something a little more realistic, by writing a data migration to add a value such as the election date the object is related to?

What do you think?

@symroe
Copy link
Member Author

symroe commented Jun 16, 2021

I think you're right that we're not going to care about created very much, but at the same time it might be confusing in future to have a load of identical values at a time when they clearly weren't made. Grabbing the date from the ballot ID and removing a couple of months might be a better way of doing it?

Of course the modified date will be whenever the migration was run, meaning that our first --recent import will include everything. I think that's ok as a one off!

@michaeljcollinsuk
Copy link
Contributor

michaeljcollinsuk commented Sep 13, 2021

@symroe Had a thought about the issue of keeping the modified value updated on a parent model when the child relation gets updated (e.g. Membership gets updated on a Person).

An alternative is to handle it through our filtering - so that when an API request is made to filter on last_updated we use a filter method that looks at the timestamps for all the related models modified value. This would be similar to what currently happens on the PersonViewSet.

That would then save us from the issue of multiple calls to save that might hinder performance elsewhere in the site? But it would tie the logic specifically to filtering through the API, so couldn't then be used elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants