Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve content-fetch #4221

Merged
merged 10 commits into from
Jul 25, 2024
Merged

improve content-fetch #4221

merged 10 commits into from
Jul 25, 2024

Conversation

sywhb
Copy link
Contributor

@sywhb sywhb commented Jul 24, 2024

Some improvement on content-fetch:

  1. remove scrapingbee
  2. fix the sql to count saved items in database within a minute using created_at instead of saved_at as saved_at can be manually input
  3. add an hour rate limiter to the API router which only allows 600 authenticated requests and 150 non-authenticated requests per hour on top of the minute rate limiter
  4. hardcode some domains to block while fetching content and blocks a domain for a day if it was failed to fetch for 10 times in a day

@sywhb sywhb requested review from satindar and jacksonh as code owners July 24, 2024 04:17
Copy link

vercel bot commented Jul 24, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
omnivore-demo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 25, 2024 5:13am
omnivore-prod ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 25, 2024 5:13am

Copy link

Squawk Report

✅ 0 violations across 1 file(s)


packages/db/migrations/0184.do.add_user_id_created_at_index_to_library_item.sql

-- Type: DO
-- Name: add_user_id_created_at_index_to_library_item
-- Description: Add an index of columns user_id and created_at to the library_item table for counting query

CREATE INDEX CONCURRENTLY IF NOT EXISTS library_item_user_id_created_at_idx ON omnivore.library_item (user_id, created_at);

✅ Rule Violations (0)

No violations found.


📚 More info on rules

⚡️ Powered by Squawk (1.1.2), a linter for PostgreSQL, focused on migrations

@sywhb
Copy link
Contributor Author

sywhb commented Jul 25, 2024

Squawk Report

✅ 0 violations across 1 file(s)

packages/db/migrations/0184.do.add_user_id_created_at_index_to_library_item.sql

-- Type: DO
-- Name: add_user_id_created_at_index_to_library_item
-- Description: Add an index of columns user_id and created_at to the library_item table for counting query

CREATE INDEX CONCURRENTLY IF NOT EXISTS library_item_user_id_created_at_idx ON omnivore.library_item (user_id, created_at);

✅ Rule Violations (0)

No violations found.

📚 More info on rules

⚡️ Powered by Squawk (1.1.2), a linter for PostgreSQL, focused on migrations

@jacksonh I created an index on columns user_id and created_at on table library_item for the counting query

@@ -25,7 +25,7 @@ export const libraryItemRepository = appDataSource
.andWhere('md5(original_url) = md5(:url)', { url })

if (forUpdate) {
qb.setLock('pessimistic_write')
qb.setLock('pessimistic_read')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced pessimistic_write with pessimistic_read lock strategy so it will prevent other transactions from modifying the data while it is being read but allows other transactions to read the data concurrently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jacksonh jacksonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a pretty big performance improvement too i think as the COUNT calls were adding up.

@jacksonh
Copy link
Contributor

LGTM

@sywhb sywhb merged commit bdae03e into main Jul 25, 2024
6 checks passed
@sywhb sywhb deleted the fix/content-fetch-error branch July 25, 2024 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants