Update README.md #31 (1 commit, merged Dec 11, 2024)

README.md: 15 changes (8 additions, 7 deletions)

## Slide Insight

This repository contains a collection of notebooks for gaining insights into presentation slides through multimodal AI models. Some goals are:

- [Comparing different models](Test_Models.ipynb) and their performance in summarizing the content of presentation slides. This is implemented not through text-to-text models but through image-to-text (multimodal) models. As a first test PDF, I3D:bio's training material ['WhatIsOMERO.pdf'](https://doi.org/10.5281/zenodo.8323588) (Schmidt, C., Bortolomeazzi, M. et al., 2023) is used.
- Establishing a workflow that can [group slides](Text_Embedding.ipynb) from multiple presentations based on their representation as word embeddings. This helps to gather all available information on one specific topic from different presentations. Test data is again I3D:bio's training material, along with presentation slides from the [Bio-image Data Science Lectures](https://zenodo.org/records/12623730) by Robert Haase (licensed under CC-BY 4.0).
- Improving our understanding of [how different types of embeddings represent the same content](Compare_Embeddings.ipynb). For this task, slides from the Bio-image Data Science Lectures are modified in specific ways to test whether text, visual, or mixed-modal embeddings represent a slide's features comparably well under such changes.
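
The grouping goal above can be sketched generically. Note that this is not the repository's implementation: the function names (`cosine_similarity`, `group_slides`), the similarity threshold, and the toy embedding vectors are all illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def group_slides(embeddings, threshold=0.9):
    # Greedily assign each slide to the first group whose representative
    # (first member) is similar enough; otherwise start a new group.
    groups = []
    for slide_id, emb in embeddings.items():
        for group in groups:
            if cosine_similarity(emb, embeddings[group[0]]) >= threshold:
                group.append(slide_id)
                break
        else:
            groups.append([slide_id])
    return groups

# Toy example: two near-identical slides from different decks and one unrelated slide
embeddings = {
    "deck1_slide3": [0.90, 0.10, 0.00],
    "deck2_slide7": [0.88, 0.12, 0.01],
    "deck1_slide9": [0.00, 0.20, 0.95],
}
print(group_slides(embeddings))  # → [['deck1_slide3', 'deck2_slide7'], ['deck1_slide9']]
```

With real data, the vectors would come from an embedding model applied to the slide text or image rather than being hand-written.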

To access the AI models used in this repository, a [free service from GitHub](https://github.com/marketplace/models) is used.

***Be aware that there are certain [rate limits](https://docs.github.com/en/github-models/prototyping-with-ai-models#rate-limits) for each model!***


### Before getting started:
Make sure to generate a developer key / personal access token on GitHub and set it as an environment variable. You can generate the token via the [GitHub website](https://github.com) under your user settings and then set it for your current session like this:


##### Windows command prompt:
```set GITHUB_TOKEN=your-github-token-goes-here```
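
Since the notebooks read the token from the environment, it can also be set for the current Python process only, e.g. directly inside a notebook. This is a hedged alternative to the shell commands above; only the variable name `GITHUB_TOKEN` is taken from this README.

```python
import os

# Set the token for the current Python process only (e.g. inside a notebook);
# replace the placeholder with your real personal access token.
os.environ["GITHUB_TOKEN"] = "your-github-token-goes-here"

print(os.environ["GITHUB_TOKEN"])
```

Note that this does not persist beyond the process, so it must be re-run in every new kernel or session.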
