Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to use local embeddings model #1320

Merged
merged 15 commits into from
Apr 15, 2023
Merged

Add ability to use local embeddings model #1320

merged 15 commits into from
Apr 15, 2023

Conversation

Tymec
Copy link
Contributor

@Tymec Tymec commented Apr 14, 2023

Background

This change adds the ability to use a local embeddings model (sBERT) using the SentenceTransformers library. This change gives the users an alternative to OpenAI's ada embeder which costs money.

Changes

  • Renamed the get_ada_embedding function to get_embedding
  • get_embedding function now uses either ada or sBERT embeder based on memory_embeder config option
  • New config option called memory_embeder that defaults to ada, which can also be configured by setting the MEMORY_EMBEDER environment variable.
  • Added sentence_transformers as a requirement

Documentation

  • In-code comments that explain the logic and functionality of the code.

Test Plan

  • Created a unit test module to test if everything works as intended.

PR Quality Checklist

  • My pull request is atomic and focuses on a single change.
  • I have thoroughly tested my changes with multiple different prompts.
  • I have considered potential risks and mitigations for my changes.
  • I have documented my changes clearly and comprehensively.
  • I have not snuck in any "extra" small tweaks changes

@FlyWieEinAirplane
Copy link

Sounds useful.
Looking at the code, it would be easy to add a third option but it seems to be a bit more complicated to add a third embeddings dimension option. Maybe this could be improved a bit.

.env.template Outdated Show resolved Hide resolved
tests/embeder_test.py Outdated Show resolved Hide resolved
@Tymec
Copy link
Contributor Author

Tymec commented Apr 14, 2023

Thanks for the feedback. I agree that it's not a good approach and only did so as a temporary solution which I overlooked.

To improve it, I created a variable that holds a dictionary with the embeder name as the key and the dimension as the value. This variable is dynamically assigned to the dimension based on the embeder used at runtime. Then, this variable is imported and used in memory backends.

Tymec added 2 commits April 14, 2023 17:58
Changed all occurrences of "embeder" to "embedder".
- Fixed "embeder" typo to "embedder"
- Added newline at the end of test unit
nponeccop
nponeccop previously approved these changes Apr 14, 2023
Copy link
Contributor

@nponeccop nponeccop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a crazy EOF error now too. I'm approving.

@Tymec
Copy link
Contributor Author

Tymec commented Apr 14, 2023

Just noticed and patched some issues with the test unit, as well as resolved issues with the new EMBED_DIM variable in memory.base module.

@nponeccop nponeccop mentioned this pull request Apr 14, 2023
1 task
@nponeccop
Copy link
Contributor

@Tymec There are conflicts now

@nponeccop
Copy link
Contributor

@Tymec There are conflicts now

nponeccop
nponeccop previously approved these changes Apr 15, 2023
richbeales
richbeales previously approved these changes Apr 15, 2023
@richbeales
Copy link
Contributor

Run flake8 autogpt/ tests/ --select E303,W293,W291,W292,E305,E231,E302
autogpt/config/config.py:85:1: W293 blank line contains whitespace
autogpt/memory/base.py:46:1: W293 blank line contains whitespace
autogpt/memory/base.py:48:1: W293 blank line contains whitespace

@Tymec Tymec dismissed stale reviews from richbeales and nponeccop via e0af761 April 15, 2023 15:27
@richbeales richbeales merged commit 17cdeee into Significant-Gravitas:master Apr 15, 2023
@Tymec Tymec mentioned this pull request Apr 19, 2023
5 tasks
sindlinger pushed a commit to Orgsindlinger/Auto-GPT-WebUI that referenced this pull request Sep 25, 2024
Add ability to use local embeddings model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants