
Caching #13

Closed
lea-33 opened this issue Nov 28, 2024 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@lea-33
Collaborator

lea-33 commented Nov 28, 2024

Brainstorm about Caching:

  • saving PDFs locally and only retrieving images from those I have selected (e.g. via the sliceplot viewer)
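One way the local-PDF idea could be sketched: keep downloaded PDFs in a cache directory keyed by a hash of their URL, and check that directory before downloading again. This is only a sketch; the cache location and the `download_pdf` helper are assumptions, not part of the project.

```python
import hashlib
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = Path.home() / ".cache" / "pdf_cache"

def get_pdf(url, download_pdf):
    """Return the local path of a cached PDF, downloading only on a cache miss.

    `download_pdf(url)` is a placeholder for whatever function fetches the
    raw PDF bytes.
    """
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name is stable and filesystem-safe.
    cached = CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".pdf")
    if not cached.exists():
        cached.write_bytes(download_pdf(url))
    return cached
```

On the second request for the same URL, the function returns the stored file without calling `download_pdf` again.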
@lea-33 lea-33 added the help wanted Extra attention is needed label Nov 28, 2024
@haesleinhuepf
Member

Also it would be great if the embedding vectors of slides could be cached. In this way, we only run the (costly) embedding once.

@lea-33
Collaborator Author

lea-33 commented Dec 4, 2024

Do you have any other ideas on that?
git-bob comment


github-actions bot commented Dec 4, 2024

This message was generated by git-bob (version: 0.12.0, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

It sounds like we have two main caching ideas to explore: local PDF storage and caching embedding vectors. For local PDF caching, we could save selected PDFs to a dedicated cache directory. When PDFs are needed, we'd check this directory first before downloading new data.

For caching embedding vectors, we could implement a mechanism to store these in memory or on disk after computation. This way, we can reuse them without recomputation. We might use a dictionary with unique identifiers (like slide IDs) as keys and the embedding vectors as values.

Here's a rough proposal for embedding vector caching:

# In-memory cache mapping slide IDs to their embedding vectors.
embedding_cache = {}

def get_embedding(slide_id, compute_embedding):
    """Return the embedding for slide_id, computing it at most once."""
    if slide_id in embedding_cache:
        return embedding_cache[slide_id]
    # Cache miss: run the (costly) embedding and remember the result.
    embedding = compute_embedding(slide_id)
    embedding_cache[slide_id] = embedding
    return embedding

This function takes a slide_id and a compute_embedding function as parameters, checking if the embedding is cached before computing and storing it.
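An in-memory dictionary is lost when the process restarts. To avoid re-running the embeddings across sessions, the cache could additionally be persisted to disk, e.g. with pickle. A minimal sketch; the file name and location are assumptions:

```python
import pickle
from pathlib import Path

# Hypothetical location for the persisted cache file.
CACHE_FILE = Path("embedding_cache.pkl")

def load_cache():
    """Load a previously saved embedding cache, or start with an empty one."""
    if CACHE_FILE.exists():
        with CACHE_FILE.open("rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache):
    """Write the embedding cache to disk so it survives restarts."""
    with CACHE_FILE.open("wb") as f:
        pickle.dump(cache, f)
```

Calling `save_cache(embedding_cache)` at shutdown and `embedding_cache = load_cache()` at startup would make the cached embeddings reusable between runs.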

@haesleinhuepf haesleinhuepf mentioned this issue Dec 4, 2024
@lea-33 lea-33 closed this as completed Dec 12, 2024