Mixedbread Cloud: A Unified API for RAG Pipelines

Explore this unified API for file uploading, document parsing, embedding models, vector store, and a retrieval pipeline.

By Abid Ali Awan, KDnuggets Assistant Editor on June 4, 2025 in Language Models

Mixedbread Cloud: A Unified API for RAG Pipelines

Image by Editor (Kanwal Mehreen) | Canva

During a talk with some machine learning engineers, I asked why we need to combine LangChain with multiple APIs and services to set up a retrieval augmented generation (RAG) pipeline. Why can't we have one API that handles everything — like document loading, parsing, embedding, reranking models, and vector storage — all in one place?

It turns out, there is a solution called Mixedbread. This platform is fast, user-friendly, and provides tools for building and serving retrieval pipelines. In this tutorial, we will explore Mixedbread Cloud and learn how to build a fully functional RAG pipeline using Mixedbread’s API and OpenAI’s latest model.

Introducing Mixedbread Cloud

The Mixedbread cloud is all in one solution for building a proper AI application with advanced text understanding capabilities. Designed to simplify the development process, it provides a comprehensive suite of tools to handle everything from document management to intelligent search and retrieval.

Mixedbread cloud provides:

Document Uploading: Upload any type of documents using the user-friendly interface or API
Document Processing: Extract structured information from various document formats, transforming unstructured data into text
Vector Stores: Store and retrieve embeddings with searchable collections of files
Text Embeddings: Convert text into high-quality vector representations that capture semantic meaning
Reranking: Enhance search quality by reordering results based on their relevance to the original query

Building the RAG Application with Mixedbread and OpenAI

In this project, we will learn how to build a RAG application using Mixedbread and the OpenAI API. This step-by-step guide will walk you through setting up the environment, uploading documents, creating a vector store, monitoring file processing, and building a fully functional RAG pipeline.

1. Setting Up

Visit the Mixedbread website and create an account. Once signed up, generate your API key. Similarly, ensure you have an OpenAI API key ready.
Then, save your API keys as environment variables for secure access in your code.
Ensure you have the necessary Python libraries installed:

pip install mixedbread openai

Initialize the the mixed bread client and open ai client using the API keys. Also, set the pat or the PDF folder, name the vector store, and sett the LLM name.

import os
import time
from mixedbread import Mixedbread
from openai import OpenAI

# --- Configuration ---
# 1. Get your Mixedbread API Key
mxbai_api_key = os.getenv("MXBAI_API_KEY")

# 2. Get your OpenAI API Key
openai_api_key = os.getenv("OPENAI_API_KEY")

# 3. Define the path to the FOLDER containing your PDF files
pdf_folder_path = "/work/docs"

# 4. Vector Store Configuration
vector_store_name = "Abid Articles"

# 5. OpenAI Model Configuration
openai_model = "gpt-4.1-nano-2025-04-14"

# --- Initialize Clients ---
mxbai = Mixedbread(api_key=mxbai_api_key)
openai_client = OpenAI(api_key=openai_api_key)

2. Uploading the files

We will locate all the PDF files in the specified folder and then upload them to the Mixedbread cloud using the API.

import glob

pdf_files_to_upload = glob.glob(os.path.join(pdf_folder_path, "*.pdf")) # Find all .pdf files

print(f"Found {len(pdf_files_to_upload)} PDF files to upload:")
for pdf_path in pdf_files_to_upload:
    print(f"  - {os.path.basename(pdf_path)}")

uploaded_file_ids = []
print("\nUploading files...")
for pdf_path in pdf_files_to_upload:
    filename = os.path.basename(pdf_path)
    print(f"  Uploading {filename}...")
    with open(pdf_path, "rb") as f:
        upload_response = mxbai.files.create(file=f)
        file_id = upload_response.id
        uploaded_file_ids.append(file_id)
        print(f"    -> Uploaded successfully. File ID: {file_id}")

print(f"\nSuccessfully uploaded {len(uploaded_file_ids)} files.")

All four PDF files have been successfully uploaded.

Found 4 PDF files to upload:
  - Building Agentic Application using Streamlit and Langchain.pdf
  - Deploying DeepSeek Janus Pro locally.pdf
  - Fine-Tuning GPT-4o.pdf
  - How to Reach $500k on Upwork.pdf

Uploading files...
  Uploading Building Agentic Application using Streamlit and Langchain.pdf...
    -> Uploaded successfully. File ID: 8a538aa9-3bde-4498-90db-dbfcf22b29e9
  Uploading Deploying DeepSeek Janus Pro locally.pdf...
    -> Uploaded successfully. File ID: 52c7dfed-1f9d-492c-9cf8-039cc64834fe
  Uploading Fine-Tuning GPT-4o.pdf...
    -> Uploaded successfully. File ID: 3eaa584f-918d-4671-9b9c-6c91d5ca0595
  Uploading How to Reach $500k on Upwork.pdf...
    -> Uploaded successfully. File ID: 0e47ba93-550a-4d4b-9da1-6880a748402b

Successfully uploaded 4 files.

You can go to your Mixedbread dashboard and click on the “Files” tab to see all the uploaded files.

3. Creating and Populating the Vector Store

We will now create the vector store and add the uploaded files by providing the list of the uploaded file IDs.

vector_store_response = mxbai.vector_stores.create(
    name=vector_store_name,
    file_ids=uploaded_file_ids # Add all uploaded file IDs during creation
)
vector_store_id = vector_store_response.id

4. Monitor File Processing Status

The Mixedbread vector store will convert each page of the files into embeddings and then save them to the vector store. This means you can perform similarity searches for images or text within the PDFs.

We have written custom code to monitor the file processing status.

print("\nMonitoring file processing status (this may take some time)...")
all_files_processed = False
max_wait_time = 600 # Maximum seconds to wait (10 minutes, adjust as needed)
check_interval = 20 # Seconds between checks
start_time = time.time()
final_statuses = {}

while not all_files_processed and (time.time() - start_time) < max_wait_time:
    all_files_processed = True # Assume true for this check cycle
    current_statuses = {}
    files_in_progress = 0
    files_completed = 0
    files_failed = 0
    files_pending = 0
    files_other = 0

    for file_id in uploaded_file_ids:
       
        status_response = mxbai.vector_stores.files.retrieve(
            vector_store_id=vector_store_id,
            file_id=file_id
        )
        current_status = status_response.status
        final_statuses[file_id] = current_status # Store the latest status

        if current_status == "completed":
            files_completed += 1
        elif current_status in ["failed", "cancelled", "error"]:
            files_failed += 1
        elif current_status == "in_progress":
            files_in_progress += 1
            all_files_processed = False # At least one file is still processing
        elif current_status == "pending":
             files_pending += 1
             all_files_processed = False # At least one file hasn't started
        else:
            files_other += 1
            all_files_processed = False # Unknown status, assume not done

    print(f"  Status Check (Elapsed: {int(time.time() - start_time)}s): "
          f"Completed: {files_completed}, Failed: {files_failed}, "
          f"In Progress: {files_in_progress}, Pending: {files_pending}, Other: {files_other} "
          f"/ Total: {len(uploaded_file_ids)}")

    if not all_files_processed:
        time.sleep(check_interval)

# --- Check Final Processing Outcome ---
completed_count = sum(1 for status in final_statuses.values() if status == 'completed')
failed_count = sum(1 for status in final_statuses.values() if status in ['failed', 'cancelled', 'error'])

print("\n--- Processing Summary ---")
print(f"Total files processed: {len(final_statuses)}")
print(f"Successfully completed: {completed_count}")
print(f"Failed or Cancelled: {failed_count}")
for file_id, status in final_statuses.items():
    if status != 'completed':
        print(f"  - File ID {file_id}: {status}")

if completed_count == 0:
     print("\nNo files completed processing successfully. Exiting RAG pipeline.")
     exit()
elif failed_count > 0:
     print("\nWarning: Some files failed processing. RAG will proceed using only the successfully processed files.")
elif not all_files_processed:
     print(f"\nWarning: File processing did not complete for all files within the maximum wait time ({max_wait_time}s). RAG will proceed using only the successfully processed files.")

It took almost 42 seconds for it to process over 100 pages.

Monitoring file processing status (this may take some time)...
  Status Check (Elapsed: 0s): Completed: 0, Failed: 0, In Progress: 4, Pending: 0, Other: 0 / Total: 4
  Status Check (Elapsed: 21s): Completed: 0, Failed: 0, In Progress: 4, Pending: 0, Other: 0 / Total: 4
  Status Check (Elapsed: 42s): Completed: 4, Failed: 0, In Progress: 0, Pending: 0, Other: 0 / Total: 4

--- Processing Summary ---
Total files processed: 4
Successfully completed: 4
Failed or Cancelled: 0

When you click on the "Vector Store" tab on the Mixedbread dashboard, you will see that the vector store has been successfully created and it has 4 files stored.

5. Building RAG Pipeline

A RAG pipeline consists of three main components: retrieval, augmentation, and generation. Below is a step-by-step explanation of how these components work together to create a robust question-answering system.

The first step in the RAG pipeline is retrieval, where the system searches for relevant information based on the user's query. This is achieved by querying a vector store to find the most similar results.

user_query = "How to Deploy Deepseek Janus Pro?"

retrieved_context = ""

search_results = mxbai.vector_stores.search(
    vector_store_ids=[vector_store_id], # Search within our newly created store
    query=user_query,
    top_k=10 # Retrieve top 10 relevant chunks across all documents
)

if search_results.data:
    # Combine the content of the chunks into a single context string
    context_parts = []
    for i, chunk in enumerate(search_results.data):
        context_parts.append(f"Chunk {i+1} from '{chunk.filename}' (Score: {chunk.score:.4f}):\n{chunk.content}\n---")
    retrieved_context = "\n".join(context_parts)
else:
    retrieved_context = "No context was retrieved."

The next step is augmentation, where the retrieved context is combined with the user's query to create a custom prompt. This prompt includes system instructions, the user's question, and the retrieved context.

prompt_template = f"""
You are an assistant answering questions based *only* on the provided context from multiple documents.
Do not use any prior knowledge. If the context does not contain the answer to the question, state that clearly.

Context from the documents:
---
{retrieved_context}
---

Question: {user_query}

Answer:
"""

The final step is generation, where the combined prompt is sent to a language model (OpenAI's GPT-4.1-nano) to generate the answer. This model is chosen for its cost-effectiveness and speed.

response = openai_client.chat.completions.create(
    model=openai_model,
    messages=[
        {"role": "user", "content": prompt_template}
    ],
    temperature=0.2,
    max_tokens=500
)

final_answer = response.choices[0].message.content.strip()

print(final_answer)

The RAG pipeline produces highly accurate and contextually relevant answers.

To deploy DeepSeek Janus Pro locally, follow these steps:

1. Install Docker Desktop from https://www.docker.com/ and set it up with default settings. On Windows, ensure WSL is installed if prompted.

2. Clone the Janus repository by running:
   ```
   git clone https://github.com/kingabzpro/Janus.git
   ```
3. Navigate into the cloned directory:
   ```
   cd Janus
   ```
4. Build the Docker image using the provided Dockerfile:
   ```
   docker build -t janus .
   ```
5. Run the Docker container with the following command, which sets up port forwarding, GPU access, and persistent storage:
   ```
   docker run -it --rm -p 7860:7860 --gpus all --name janus_pro -e TRANSFORMERS_CACHE=/root/.cache/huggingface -v huggingface:/root/.cache/huggingface janus:latest
   ```
6. Wait for the container to download the model and start the Gradio application. Once running, access the app at http://localhost:7860/.

7. The application has two sections: one for image understanding and one for image generation, allowing you to upload images, ask for descriptions or poems, and generate images based on prompts.

This process enables you to deploy DeepSeek Janus Pro locally on your machine.

Conclusion

Building a RAG application using Mixedbread was a straightforward and efficient process. The Mixedbread team highly recommend using their dashboard for tasks such as uploading documents, parsing data, building vector stores, and performing similarity searches through an intuitive user interface. This approach makes it easier for professionals from various fields to create their own text-understanding applications without requiring extensive technical expertise.

In this tutorial, we learned how Mixedbread's unified API simplifies the process of building a RAG pipeline. The implementation requires only a few steps and delivers fast, accurate results. Unlike traditional methods that scrape text from documents, Mixedbread converts entire pages into embeddings, enabling more efficient and precise retrieval of relevant information. This page-level embedding approach ensures that the results are contextually rich and highly relevant.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.