Store multiple crawls in a single database #105

Merged 7 commits on Sep 13, 2024

@chosak chosak commented Sep 12, 2024

This PR significantly changes how this package uses a database. Instead of storing each website crawl in a separate SQLite database file, all crawls are now stored in a single Django database, which can be configured to use any database backend supported by Django. Database tables are now managed by Django migrations, and a new Crawl model tracks the status of past crawls, including whether they succeeded or failed.
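
For readers who want a concrete picture of the single-database layout, here is a minimal sketch using only Python's standard-library `sqlite3` module. It is not this package's code: the PR itself uses Django models and migrations, and the table names, columns, status values, and helper functions below are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per crawl, and every page row points at its crawl.
# The real package defines its tables through Django models and migrations.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crawl (
    id INTEGER PRIMARY KEY,
    started TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'started'
        CHECK (status IN ('started', 'finished', 'failed'))
);
CREATE TABLE IF NOT EXISTS page (
    id INTEGER PRIMARY KEY,
    crawl_id INTEGER NOT NULL REFERENCES crawl (id) ON DELETE CASCADE,
    url TEXT NOT NULL
);
"""


def open_database(path=":memory:"):
    """Open (or create) the one database that holds every crawl."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def start_crawl(conn, started):
    """Insert a new crawl row in the 'started' state and return its id."""
    cur = conn.execute("INSERT INTO crawl (started) VALUES (?)", (started,))
    return cur.lastrowid


def record_page(conn, crawl_id, url):
    """Store one crawled page, tagged with the crawl it belongs to."""
    conn.execute("INSERT INTO page (crawl_id, url) VALUES (?, ?)", (crawl_id, url))


def finish_crawl(conn, crawl_id, succeeded):
    """Mark a crawl as finished or failed."""
    status = "finished" if succeeded else "failed"
    conn.execute("UPDATE crawl SET status = ? WHERE id = ?", (status, crawl_id))


def latest_successful_crawl(conn):
    """Return the id of the most recent finished crawl, or None if there is none."""
    row = conn.execute(
        "SELECT id FROM crawl WHERE status = 'finished' "
        "ORDER BY started DESC, id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

Because each page row carries a `crawl_id`, a failed crawl can sit in the same database as a successful one without its pages being confused with the successful crawl's pages.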

This PR also adds Python tests for 100% of testable Python code (excluding only the plugin to the wpull crawler, which is difficult to test without running a real crawl). This package has been migrated to pytest and pytest-cov for simpler testing and coverage checks. Moving forward, PR checks will fail if Python coverage drops below 100%.

(As a TODO, a future PR will need to add a management command to clean up old crawls, so that the database doesn't grow indefinitely.)
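
One possible shape for that cleanup, sketched with the standard-library `sqlite3` module rather than a Django management command: keep the N most recent crawls and delete the rest. The `crawl` table, its `id` and `started` columns, and the `delete_old_crawls` function are hypothetical; the future PR may take a different approach entirely.

```python
import sqlite3


def delete_old_crawls(conn, keep):
    """Delete all but the `keep` most recent crawls; return how many were removed.

    Assumes a hypothetical `crawl` table with `id` and `started` columns.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cur = conn.execute(
        "DELETE FROM crawl WHERE id NOT IN "
        "(SELECT id FROM crawl ORDER BY started DESC, id DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Small demonstration against an in-memory database.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE crawl (id INTEGER PRIMARY KEY, started TEXT NOT NULL)")
    demo.executemany(
        "INSERT INTO crawl (started) VALUES (?)",
        [("2024-09-10",), ("2024-09-11",), ("2024-09-12",)],
    )
    print(delete_old_crawls(demo, keep=2))
```

Any rows that belong to a deleted crawl would also need to be removed, for example through cascading foreign keys.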

@chosak chosak requested a review from willbarton September 12, 2024 19:29

chosak commented Sep 12, 2024

@willbarton in 5bf3d93 I added a workaround to handle missing SVG icons in the Python tests when we haven't run the frontend build. As in cf.gov, the frontend build copies the CFPB Design System SVG icons from node_modules to a location where Django can find them and load them inline during template rendering. We don't want to have to run the frontend install and build steps just to run the Python tests successfully.

To get around this, I've added a simple template loader that ignores missing SVGs. Cf.gov has a bunch of custom code that handles this differently (by inserting a placeholder SVG), which I'd prefer not to copy here. Thanks to @anselmbradford for counsel on the different options.
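
The idea behind a loader that ignores missing SVGs can be sketched without any framework, as below. This is not the loader added in 5bf3d93 and it does not use Django's loader API; the class names, the `load` method, and `load_template` are assumptions for illustration only. The point it shows is ordering: the fallback loader is tried last, so it only supplies an empty string when no earlier loader found the file.

```python
from pathlib import Path


class TemplateNotFound(Exception):
    """Raised when no loader can supply the requested template."""


class DirectoryLoader:
    """Loads template source from files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def load(self, name):
        try:
            return (self.root / name).read_text(encoding="utf-8")
        except FileNotFoundError:
            raise TemplateNotFound(name) from None


class IgnoreMissingSVGLoader:
    """Fallback loader: returns empty source for any .svg name, nothing else."""

    def load(self, name):
        if name.lower().endswith(".svg"):
            return ""
        raise TemplateNotFound(name)


def load_template(name, loaders):
    """Try each loader in order and return the first source found."""
    for loader in loaders:
        try:
            return loader.load(name)
        except TemplateNotFound:
            continue
    raise TemplateNotFound(name)
```

With `[DirectoryLoader(some_dir), IgnoreMissingSVGLoader()]` as the loader list, a missing icon renders as nothing, while a missing non-SVG template still raises an error.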

Currently this project has only a single settings file, which doesn't distinguish between testing, development, and production. As part of future work I'll split it up so that this new loader doesn't run in production, but this seems a reasonable path forward for now.

We don't want to have to run the frontend build to run Python tests.
@chosak chosak force-pushed the feature/multi-crawl branch from 844f245 to 5bf3d93 Compare September 12, 2024 21:14
@chosak chosak merged commit 54d1451 into main Sep 13, 2024
3 checks passed
@chosak chosak deleted the feature/multi-crawl branch September 13, 2024 20:31