GitHub - VectorDBCloud/Open-Source-Embedding-Cookbook

This repository contains a collection of Python scripts demonstrating how to use open-source embeddings with four vector databases: pgvector, Milvus, ChromaDB, and Qdrant. Each cookbook is a practical example of data ingestion and similarity search.

Vector databases are specialized database systems designed to store and query high-dimensional vectors efficiently. They are crucial for machine learning applications, particularly in natural language processing and computer vision.

About Vector Database Cloud

Vector Database Cloud is a platform that provides one-click cloud deployment of popular vector databases, including Qdrant, Milvus, ChromaDB, and pgvector. Our platform offers a secure API, a comprehensive customer dashboard, efficient vector search, and real-time monitoring.

Introduction

Vector Database Cloud is designed to seamlessly integrate with your existing data workflows. Whether you're working with structured data, unstructured data, or high-dimensional vectors, you can leverage popular ETL (Extract, Transform, Load) tools to streamline the process of moving data into and out of Vector Database Cloud.

Supported Vector Databases

pgvector
Milvus
ChromaDB
Qdrant

Prerequisites

Python 3.7+
Access to Vector Database Cloud (VectorDBCloud), with an API URL and API key for each database you plan to use

Installation

Clone this repository:

```
git clone https://github.com/VectorDBCloud/Open-Source-Embedding-Cookbook.git
cd Open-Source-Embedding-Cookbook
```

Install the required dependencies:
```
pip install -r requirements.txt
```

Dependencies

The requirements.txt file includes the following main dependencies:

sentence-transformers
psycopg2-binary
pymilvus
chromadb
qdrant-client

Usage

Each cookbook is a standalone Python script demonstrating how to:

Connect to the respective vector database
Use open-source embeddings (Sentence Transformers with the 'all-MiniLM-L6-v2' model)
Insert sample data with embeddings
Perform similarity searches
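
To make the last step concrete, the sketch below ranks stored vectors against a query by cosine similarity in plain Python. It is an illustration only, not code from the cookbooks: the three-dimensional vectors are made-up stand-ins for the 384-dimensional embeddings that 'all-MiniLM-L6-v2' produces, the names `cosine_similarity` and `top_k` are hypothetical, and a real vector database would typically use an index rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: made-up vectors standing in for real sentence embeddings.
documents = [
    ("cats purr", [0.9, 0.1, 0.0]),
    ("dogs bark", [0.7, 0.3, 0.1]),
    ("stock prices fell", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]  # stand-in for an embedded query

results = top_k(query_vector, documents, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The databases covered here perform the same kind of ranking on the server side, so the scripts only need to send the query vector and read back the closest matches.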

Before running any script, set the appropriate environment variables:

```
export VECTORDBCLOUD_<DATABASE>_API_URL="https://your-vector-db-cloud-url.com"
export VECTORDBCLOUD_<DATABASE>_API_KEY="your-api-key"
```

Replace <DATABASE> with the specific database name (e.g., PGVECTOR, MILVUS, CHROMADB, QDRANT).
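
Inside a script, these variables can be read with the standard library. The sketch below is an illustration rather than code taken from the cookbooks: the helper name `load_credentials` is hypothetical, and it assumes only the variable naming pattern shown above.

```python
import os


def load_credentials(database, env=None):
    """Return (api_url, api_key) for a database name such as "QDRANT".

    Hypothetical helper: it assumes only the
    VECTORDBCLOUD_<DATABASE>_API_URL / _API_KEY naming pattern.
    """
    env = os.environ if env is None else env
    prefix = f"VECTORDBCLOUD_{database.upper()}"
    names = (f"{prefix}_API_URL", f"{prefix}_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail early and name the exact variables that still need exporting.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Calling `load_credentials("pgvector")` at the top of a script surfaces a missing variable immediately instead of as a connection error later on.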

To run a cookbook:

```
python <cookbook_name>.py
```

For example:

```
python pgvector_cookbook.py
```

Cookbooks

pgvector_cookbook.py: Demonstrates usage with pgvector
milvus_cookbook.py: Demonstrates usage with Milvus
chromadb_cookbook.py: Demonstrates usage with ChromaDB
qdrant_cookbook.py: Demonstrates usage with Qdrant

Each cookbook includes examples of:

Connecting to the database
Creating a collection/table
Inserting sample data with embeddings
Performing a similarity search

Customization

To adapt these scripts for your own use case:

Replace the sample data with your own dataset.
Adjust the embedding model if needed (currently using 'all-MiniLM-L6-v2').
Modify the schema or collection structure to fit your data requirements.
Customize the similarity search query and parameters as per your needs.
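
One detail worth checking when swapping the embedding model: a vector column or collection is normally created with a fixed dimension, and 'all-MiniLM-L6-v2' produces 384-dimensional vectors, so a different model may require a different schema. The sketch below shows one way to catch a mismatch before inserting; `EXPECTED_DIM` and `check_dimensions` are illustrative names, not part of the cookbooks.

```python
EXPECTED_DIM = 384  # output size of 'all-MiniLM-L6-v2'; update when changing models


def check_dimensions(vectors, expected_dim=EXPECTED_DIM):
    """Raise ValueError if any vector does not match the schema dimension.

    Returns the number of vectors checked.
    """
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )
    return len(vectors)
```

Running a check like this on freshly computed embeddings, before any insert call, turns a confusing server-side error into a clear local one.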

Best Practices

When working with vector databases and embeddings, consider the following best practices:

Choose the right embedding model: Select an embedding model that's appropriate for your data type and use case.
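Normalize your vectors: When you rank by cosine or dot-product similarity, normalize vectors to unit length. For unit vectors the two metrics give identical scores, which keeps results consistent across databases and index settings.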
Use appropriate index types: Choose the right index type for your specific use case to optimize search performance.
Batch operations: When inserting or querying large amounts of data, use batch operations to improve efficiency.
Monitor performance: Regularly monitor and optimize your database performance, especially as your data grows.
Keep your embeddings up to date: Re-embed your data when it changes. If you switch to a different embedding model, re-embed the whole collection, because vectors produced by different models are not comparable.
Implement error handling: Robust error handling can help prevent data loss and improve the reliability of your applications.
Secure your API keys: Always keep your Vector Database Cloud API keys secure and never expose them in client-side code.
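
Two of these practices, normalization and batching, are easy to sketch in plain Python. The helpers below are hypothetical illustrations (`normalize` and `batched` are not functions from the cookbooks), and the comment inside the loop marks where a real script would call its database client.

```python
import math


def normalize(vector):
    """Scale a vector to unit (L2) length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy vectors; real embeddings would come from the embedding model.
vectors = [normalize(v) for v in ([3.0, 4.0], [0.0, 2.0], [1.0, 1.0])]
for batch in batched(vectors, batch_size=2):
    # A real script would send each batch to the database in one insert call.
    print(len(batch), "vectors in this batch")
```

A suitable batch size depends on the database and on vector size, so it is worth measuring rather than assuming a value.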

Related Resources

Contributing

We welcome contributions to improve and expand our Open-Source Embedding Cookbook! Here's how you can contribute:

Fork the repository: Create your own fork of the code.
Create a new branch: Make your changes in a new git branch.
Make your changes: Enhance existing cookbooks or add new ones.
Follow the style guidelines: Ensure your code follows our coding standards.
Write clear commit messages: Your commit messages should clearly describe the changes you've made.
Submit a pull request: Open a new pull request with your changes.
Respond to feedback: Be open to feedback and make necessary adjustments to your pull request.

For more detailed information on contributing, please refer to our Contribution Guidelines.

We also encourage you to:

Report bugs and issues through our Issue Tracker.
Suggest new features or improvements.
Help improve documentation.
Share your experiences and use cases with the community.

Remember, all contributors are expected to adhere to our Code of Conduct. We appreciate your efforts to make this project better for everyone!

Troubleshooting

If you encounter issues:

Ensure all environment variables are correctly set.
Check your internet connection for API access.
Verify that you have the correct permissions for the Vector Database Cloud services.
Make sure all dependencies are correctly installed.
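
The first and last checks can be automated with the standard library. The sketch below is a hypothetical diagnostic, not a script shipped with this repository: it assumes the import names of the packages listed under Dependencies and the environment variable pattern from the Usage section.

```python
import importlib.util
import os

# Import names of the packages listed under Dependencies.
MODULES = ["sentence_transformers", "psycopg2", "pymilvus", "chromadb", "qdrant_client"]
DATABASES = ["PGVECTOR", "MILVUS", "CHROMADB", "QDRANT"]


def missing_modules(modules=MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def missing_variables(databases=DATABASES, env=None):
    """Return the expected environment variable names that are unset or empty."""
    env = os.environ if env is None else env
    names = [
        f"VECTORDBCLOUD_{db}_{suffix}"
        for db in databases
        for suffix in ("API_URL", "API_KEY")
    ]
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Missing variables:", missing_variables() or "none")
```

Only the variables for the databases you actually use need to be set, so pass a shorter list to `missing_variables` if you run a single cookbook.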

For specific error messages, please refer to the documentation of the respective vector database or create an issue in this repository.

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially

Under the following terms:

Attribution — You must give appropriate credit to Vector Database Cloud, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests Vector Database Cloud endorses you or your use.

Additionally, we require that any use of this guide includes visible attribution to Vector Database Cloud. This attribution should be in the form of "Open Source Embedding curated by Vector Database Cloud" or "Based on Vector Database Cloud Open Source Embedding", along with a link to https://vectordbcloud.com, in any public-facing applications, documentation, or redistributions of this guide.

No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

For the full license text, visit: https://creativecommons.org/licenses/by/4.0/legalcode

Disclaimer

The information and resources provided in this community repository are for general informational purposes only. While we strive to keep the information up-to-date and correct, we make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the information, products, services, or related graphics contained in this repository for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

Vector Database Cloud configurations may vary, and it's essential to consult the official documentation before implementing any solutions or suggestions found in this community repository. Always follow best practices for security and performance when working with databases and cloud services.

The content in this repository may change without notice. Users are responsible for ensuring they are using the most current version of any information or code provided.

This disclaimer applies to Vector Database Cloud, its contributors, and any third parties involved in creating, producing, or delivering the content in this repository.

The use of any information or code in this repository may carry inherent risks, including but not limited to data loss, system failures, or security vulnerabilities. Users should thoroughly test and validate any implementations in a safe environment before deploying to production systems.

For complex implementations or critical systems, we strongly recommend seeking advice from qualified professionals or consulting services.

By using this repository, you acknowledge and agree to this disclaimer. If you do not agree with any part of this disclaimer, please do not use the information or resources provided in this repository.
