Firestore Vector Store Plugin
Firestore Vector Store
The Firestore plugin provides retriever implementations that use Google Cloud Firestore as a vector store.
Installation
    pip3 install genkit-plugin-firebase
Prerequisites
- A Firebase project with Cloud Firestore enabled.
- The genkit package installed.
- The gcloud CLI for managing credentials and Firestore indexes.
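As a quick local sanity check of the tooling prerequisites, a minimal sketch along the following lines may help. The helper name `check_prerequisites` is hypothetical, and the check only looks for an importable `genkit` package and a `gcloud` executable on the `PATH`; it does not verify the Firebase project, credentials, or that Cloud Firestore is enabled.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report whether the local tooling prerequisites appear to be present.

    Hypothetical helper: it only inspects the local environment.
    """
    return {
        # True if a top-level 'genkit' package can be located for import.
        "genkit": importlib.util.find_spec("genkit") is not None,
        # True if a 'gcloud' executable is found on the PATH.
        "gcloud": shutil.which("gcloud") is not None,
    }


status = check_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```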
Configuration
To use this plugin, specify it when you initialize Genkit:

    from genkit.ai import Genkit
    from genkit.plugins.firebase.firestore import FirestoreVectorStore
    from genkit.plugins.google_genai import VertexAI  # Assuming VertexAI provides the embedder
    from google.cloud import firestore

    # Ensure you have authenticated with gcloud and set the project
    firestore_client = firestore.Client()

    ai = Genkit(
        plugins=[
            VertexAI(),  # Ensure the embedder's plugin is loaded
            FirestoreVectorStore(
                name='my_firestore_retriever',
                collection='my_collection',  # Replace with your collection name
                vector_field='embedding',
                content_field='text',
                embedder='vertexai/text-embedding-004',  # Example embedder
                firestore_client=firestore_client,
            ),
        ],
        # Define a default model if needed
        # model='vertexai/gemini-1.5-flash',
    )
Configuration Options
- name (str): A unique name for this retriever instance.
- collection (str): The name of the Firestore collection to query.
- vector_field (str): The name of the field in the Firestore documents that contains the vector embedding.
- content_field (str): The name of the field in the Firestore documents that contains the text content.
- embedder (str): The name of the embedding model to use. Must match a configured embedder in your Genkit project.
- firestore_client: A google.cloud.firestore.Client object that will be used for all queries to the vector store.
Usage

- Create a Firestore Client:

      from google.cloud import firestore

      # Ensure you have authenticated with gcloud and set the project
      firestore_client = firestore.Client()

- Define a Firestore Retriever:

      from genkit.ai import Genkit
      from genkit.plugins.firebase.firestore import FirestoreVectorStore
      from genkit.plugins.google_genai import VertexAI  # Assuming VertexAI provides the embedder
      from google.cloud import firestore

      # Assuming firestore_client is already created
      # firestore_client = firestore.Client()

      ai = Genkit(
          plugins=[
              VertexAI(),  # Ensure the embedder's plugin is loaded
              FirestoreVectorStore(
                  name='my_firestore_retriever',
                  collection='my_collection',  # Replace with your collection name
                  vector_field='embedding',
                  content_field='text',
                  embedder='vertexai/text-embedding-004',  # Example embedder
                  firestore_client=firestore_client,
              ),
          ],
          # Define a default model if needed
          # model='vertexai/gemini-1.5-flash',
      )

- Retrieve Documents:

      from genkit.ai import Document  # Import Document

      # Assuming 'ai' is configured as above
      async def retrieve_documents():
          # Note: ai.retrieve expects a Document object for the query
          query_doc = Document.from_text("What are the main topics?")
          return await ai.retrieve(
              query=query_doc,
              retriever='my_firestore_retriever',  # Matches the 'name' in FirestoreVectorStore config
          )

      # Example of calling the async function
      # import asyncio
      # retrieved_docs = asyncio.run(retrieve_documents())
      # print(retrieved_docs)
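As a rough mental model of what a vector retriever does, the query is embedded and stored documents are ranked by how close their embeddings are to it. The sketch below is a pure-Python illustration only, assuming cosine similarity and toy 2-dimensional vectors. The helpers `cosine_similarity` and `top_k` are hypothetical and are not part of Genkit or Firestore, and the distance measure the plugin actually uses is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], records: list[dict], k: int = 2,
          vector_field: str = "embedding") -> list[dict]:
    """Return the k records whose vectors are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record[vector_field]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]


# Toy records shaped like the Firestore documents described above.
toy_records = [
    {"text": "cats", "embedding": [1.0, 0.0]},
    {"text": "dogs", "embedding": [0.8, 0.6]},
    {"text": "cars", "embedding": [0.0, 1.0]},
]

nearest = top_k([1.0, 0.1], toy_records, k=2)
print([record["text"] for record in nearest])
```

In a real deployment the ranking happens inside Firestore against the vector index, not in application code.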
Populating the Index
Before you can retrieve documents, you need to populate your Firestore collection with data and their corresponding vector embeddings. Here’s how you can do it:

- Prepare your Data: Organize your data into documents. Each document should have at least two fields: a text field containing the content you want to retrieve, and an embedding field that holds the vector embedding of the content. You can add any other metadata as well.
- Generate Embeddings: Use the same embedding model configured in your FirestoreVectorStore to generate vector embeddings for your text content. The ai.embed() method can be used.
- Upload Documents to Firestore: Use the Firestore client to upload the documents with their embeddings to the specified collection.
Here’s an example of how to index data:
    from genkit.ai import Document, Genkit  # Import Genkit and Document
    from genkit.types import TextPart
    from google.cloud import firestore  # Import firestore

    # Assuming 'ai' is configured with VertexAI and FirestoreVectorStore plugins
    # Assuming 'firestore_client' is an initialized firestore.Client() instance

    async def index_documents(documents: list[str], collection_name: str):
        """Indexes the documents in Firestore."""
        genkit_documents = [Document(content=[TextPart(text=doc)]) for doc in documents]
        # Ensure the embedder name matches the one configured in Genkit
        embed_response = await ai.embed(
            embedder='vertexai/text-embedding-004',
            content=genkit_documents,  # Use 'content' parameter
        )
        embeddings = [emb.embedding for emb in embed_response.embeddings]

        for i, document_text in enumerate(documents):
            doc_id = f'doc-{i + 1}'
            embedding = embeddings[i]

            doc_ref = firestore_client.collection(collection_name).document(doc_id)
            # Note: Firestore vector indexes generally operate on Vector values.
            # If queries return nothing, consider storing Vector(embedding)
            # (from google.cloud.firestore_v1.vector import Vector) instead of a plain list.
            doc_ref.set({
                'text': document_text,
                'embedding': embedding,  # Ensure this field name matches 'vector_field' in config
                'metadata': f'metadata for doc {i + 1}',
            })
            print(f"Indexed document {doc_id}")  # Optional: print progress

    # Example Usage
    # documents = [
    #     "This is document one.",
    #     "This is document two.",
    #     "This is document three.",
    # ]
    # import asyncio
    # asyncio.run(index_documents(documents, 'my_collection'))  # Replace 'my_collection' with your actual collection name
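Before uploading, it can be worth checking that each record has the shape the retriever expects. The following is a minimal sketch, assuming the field names `text` and `embedding` from the configuration above and a 768-dimensional embedder. The helper `validate_record` is hypothetical and works on plain dictionaries before anything is sent to Firestore.

```python
def validate_record(record: dict, content_field: str = "text",
                    vector_field: str = "embedding",
                    dimension: int = 768) -> list[str]:
    """Return a list of problems found in a record; an empty list means it looks fine."""
    problems = []

    # The content field should hold the text that will be retrieved.
    text = record.get(content_field)
    if not isinstance(text, str) or not text.strip():
        problems.append(f"'{content_field}' must be a non-empty string")

    # The vector field should hold a list of numbers of the expected length.
    vector = record.get(vector_field)
    if not isinstance(vector, list) or not all(isinstance(x, (int, float)) for x in vector):
        problems.append(f"'{vector_field}' must be a list of numbers")
    elif len(vector) != dimension:
        problems.append(f"'{vector_field}' has {len(vector)} values, expected {dimension}")

    return problems


good_record = {"text": "This is document one.", "embedding": [0.0] * 768}
bad_record = {"text": "", "embedding": [0.1, 0.2]}

print(validate_record(good_record))
print(validate_record(bad_record))
```

Catching a dimension mismatch at this stage is cheaper than discovering it later, since the vector index is created for one fixed dimension.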
Creating a Firestore Index
To enable vector similarity search, you need to configure a vector index in your Firestore database. Use the following command:

    gcloud firestore indexes composite create \
      --project=<YOUR_FIREBASE_PROJECT_ID> \
      --collection-group=<YOUR_COLLECTION_NAME> \
      --query-scope=COLLECTION \
      --field-config=vector-config='{"dimension":<YOUR_DIMENSION_COUNT>,"flat": {}}',field-path=<YOUR_VECTOR_FIELD>

- Replace <YOUR_FIREBASE_PROJECT_ID> with the ID of your Firebase project.
- Replace <YOUR_COLLECTION_NAME> with the name of your Firestore collection (e.g., my_collection).
- Replace <YOUR_DIMENSION_COUNT> with the correct dimension for your embedding model. Common values are:
  - 768 for text-embedding-004 (Vertex AI)
- Replace <YOUR_VECTOR_FIELD> with the name of the field containing vector embeddings (e.g., embedding).
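If the same values are already defined in Python, the command can be assembled from them so the placeholders are filled in consistently. This is only a sketch: `build_vector_index_command` is a hypothetical helper, the project ID is a made-up example, and the code prints the command without running it.

```python
import json
import shlex


def build_vector_index_command(project_id: str, collection: str,
                               vector_field: str, dimension: int) -> list[str]:
    """Build the gcloud argument list for creating a flat vector index."""
    # Compact JSON for the vector-config value, e.g. {"dimension":768,"flat":{}}
    vector_config = json.dumps({"dimension": dimension, "flat": {}},
                               separators=(",", ":"))
    return [
        "gcloud", "firestore", "indexes", "composite", "create",
        f"--project={project_id}",
        f"--collection-group={collection}",
        "--query-scope=COLLECTION",
        f"--field-config=vector-config={vector_config},field-path={vector_field}",
    ]


command = build_vector_index_command(
    project_id="my-project",      # hypothetical project ID
    collection="my_collection",
    vector_field="embedding",
    dimension=768,                # matches text-embedding-004
)

# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the dimension and field name in one place helps the index stay in step with the vector_field and embedder passed to FirestoreVectorStore.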