Caching #13
Also, it would be great if the embedding vectors of slides could be cached. That way, we run the (costly) embedding only once.

Do you have any other ideas on that?
This message was generated by git-bob (version: 0.12.0, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

It sounds like we have two main caching ideas to explore: local PDF storage and caching embedding vectors.

For local PDF caching, we could save selected PDFs to a dedicated cache directory. When PDFs are needed, we would check this directory first before downloading new data.

For caching embedding vectors, we could store them in memory or on disk after computation, so they can be reused without recomputation. We might use a dictionary with unique identifiers (like slide IDs) as keys and the embedding vectors as values. Here's a rough proposal for embedding vector caching:

```python
embedding_cache = {}

def get_embedding(slide_id, compute_embedding):
    """Return the embedding for slide_id, computing it only once."""
    if slide_id in embedding_cache:
        return embedding_cache[slide_id]
    embedding = compute_embedding(slide_id)
    embedding_cache[slide_id] = embedding
    return embedding
```

This function takes a slide ID and a function that computes the embedding; it returns the cached vector when available, and otherwise computes, stores, and returns it.
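The on-disk option mentioned above could look like the following minimal sketch, so cached embeddings survive restarts. The cache directory name and function name here are assumptions, and pickle is just one possible serialization choice:

```python
import pickle
from pathlib import Path

# Hypothetical cache directory; adjust to the project's layout.
CACHE_DIR = Path("embedding_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_embedding_cached(slide_id, compute_embedding):
    """Return the embedding for slide_id, loading it from disk if it
    was computed before, otherwise computing and storing it."""
    cache_file = CACHE_DIR / f"{slide_id}.pkl"
    if cache_file.exists():
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    embedding = compute_embedding(slide_id)
    with open(cache_file, "wb") as f:
        pickle.dump(embedding, f)
    return embedding
```

One design consideration: if the embedding model changes, the cache would silently serve stale vectors, so including a model version in the cache key may be worthwhile.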
Brainstorm about Caching:
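The local PDF caching idea above could be sketched like this: check a dedicated cache directory first and download only on a miss. The directory name, helper name, and use of urllib are assumptions, not part of the project:

```python
import urllib.request
from pathlib import Path

# Hypothetical cache directory for downloaded PDFs.
PDF_CACHE_DIR = Path("pdf_cache")
PDF_CACHE_DIR.mkdir(exist_ok=True)

def get_pdf(url, filename):
    """Return the local path of a PDF, downloading it only if it is
    not already present in the cache directory."""
    local_path = PDF_CACHE_DIR / filename
    if not local_path.exists():
        urllib.request.urlretrieve(url, local_path)
    return local_path
```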