📄 PDF Chatbot

The PDF Chatbot tool turns any PDF document into an interactive chatbot that answers questions based on the document’s content. It uses Retrieval-Augmented Generation (RAG): the passages most relevant to each question are retrieved from the document and passed to the language model, so that answers are grounded in the source text.

How It Works

The PDF Chatbot processes each document in four steps:

  1. Document Processing: Upload your PDF; its text is extracted, cleaned, and split into manageable chunks.
  2. Vector Embedding: Each chunk is converted into a vector representation using Gemini’s models/text-embedding-004 embedding model.
  3. Efficient Indexing: The vectors are indexed with FAISS for fast similarity search.
  4. Intelligent Querying: The chunks most relevant to a question are retrieved and passed to the Gemini 2.0 Flash LLM, which generates the answer.

The system maintains chat history for contextual understanding and supports dynamic index updates.
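
The chunk-then-retrieve flow above can be illustrated with a small, self-contained sketch. It is not the service's implementation: a toy bag-of-words vector stands in for the Gemini embedding model, a plain sorted scan stands in for the FAISS index, and the function names and the max_chars parameter are made up for illustration.

```python
import math
from collections import Counter


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on whitespace into chunks of about max_chars characters.

    A single word longer than max_chars becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (linear scan, not FAISS)."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]
```

In a real pipeline the retrieved chunks, the question, and the chat history would then be sent to the LLM to generate the answer.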

API Endpoints

1. PDF to Chunk

This endpoint processes a PDF from a remote URL, extracts and cleans its text, and prepares it for indexing.

// Request
POST /pdf_to_chunk
{
  "user_id": "user123",
  "file_id": "file1",
  "file_url": "https://bucket.s3.amazonaws.com/file1.pdf"
}

// Response
{
  "total_pages": 8,
  "total_chars": 14500,
  "total_chunks": 5,
  "chunks": [
    {
      "chunk_text": "Extracted and cleaned text content...",
      "chunk_chars": 1024,
      "chunk_page": 2,
      "source": "https://bucket.s3.amazonaws.com/file1.pdf",
      "links": ["https://example.com", "https://another-link.com"],
      "unique_id": 9876543210123456
    },
    // Additional chunks...
  ],
  "faiss_index_path": "path/to/faiss/index"
}
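
As a rough sketch, a Python client could build and send this request with the standard library alone. The base URL below is a placeholder (this page does not state the API host), and the JSON Content-Type header and the helper names are assumptions.

```python
import json
import urllib.request

# Hypothetical host: replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_pdf_to_chunk_request(
    user_id: str, file_id: str, file_url: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /pdf_to_chunk request."""
    payload = {"user_id": user_id, "file_id": file_id, "file_url": file_url}
    return urllib.request.Request(
        url=f"{base_url}/pdf_to_chunk",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling send(build_pdf_to_chunk_request("user123", "file1", "https://bucket.s3.amazonaws.com/file1.pdf")) would then return a dictionary shaped like the response above.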

2. Chunk to Index

This endpoint stores pre-processed text chunks into a user-specific vector index.

// Request
POST /chunk_to_index
{
  "user_id": "user123",
  "chunk_data": [
    {
      "chunk_text": "example text...",
      "unique_id": 9876543210123456,
      "chunk_chars": 1000,
      "chunk_page": 2,
      "source": "https://example.com",
      "links": []
    }
  ]
}

// Response
{
  "message": "FAISS index created successfully",
  "index_path": "path/to/index"
}
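
The chunk objects in this request have the same fields as the chunks returned by /pdf_to_chunk, so one response can be turned into the other request directly. The helper below is a minimal sketch of that mapping; the function name is hypothetical, and it assumes every chunk carries all six fields shown in the examples.

```python
# Fields shown in the /chunk_to_index request example.
CHUNK_FIELDS = ("chunk_text", "unique_id", "chunk_chars", "chunk_page", "source", "links")


def to_index_payload(user_id: str, pdf_to_chunk_response: dict) -> dict:
    """Build a /chunk_to_index request body from a /pdf_to_chunk response."""
    chunk_data = [
        {field: chunk[field] for field in CHUNK_FIELDS}
        for chunk in pdf_to_chunk_response["chunks"]
    ]
    return {"user_id": user_id, "chunk_data": chunk_data}
```

Copying only the listed fields keeps any extra keys in the chunk objects out of the indexing request.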

3. Query

This endpoint allows users to ask questions about uploaded documents.

// Request
POST /query
{
  "user_id": "user123",
  "question": "What are the main topics in the document?",
  "chat_history": [
    ["What is the document about?", "It discusses climate change policies."],
    ["What is the impact of climate change?", "It leads to rising temperatures, extreme weather, and sea level rise."]
  ]
}

// Response
{
  "answer": "The document discusses the effects of CO2 emissions and global warming...",
  "chunk_ids": [9876543210123456, 1576543210153056]
}
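
Because chat_history is sent with each request, a client can carry it from one turn to the next. The sketch below shows one way to do that; the helper names are hypothetical, and the max_turns cap is an assumption (this page does not document a history limit).

```python
def build_query(user_id: str, question: str, chat_history: list[list[str]]) -> dict:
    """Build a /query request body; history is a list of [question, answer] pairs."""
    return {
        "user_id": user_id,
        "question": question,
        "chat_history": [list(pair) for pair in chat_history],
    }


def record_turn(
    chat_history: list[list[str]], question: str, response: dict, max_turns: int = 10
) -> list[list[str]]:
    """Return a new history with the latest turn appended, keeping the last max_turns."""
    updated = chat_history + [[question, response["answer"]]]
    return updated[-max_turns:]
```

After each /query call, pass the returned history into the next build_query call so that follow-up questions are answered in context.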

4. Delete Vectors

This endpoint removes specific vector embeddings from a user’s index.

// Request
POST /delete_vectors
{
  "user_id": "user123",
  "chunk_ids": [9876543210123456, 1576543210153056]
}

// Response
{
  "message": "Vectors deleted successfully",
  "deleted_count": 2
}
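
A client may want to check that every requested vector was actually removed. The sketch below compares the number of IDs sent with deleted_count; it assumes deleted_count counts the IDs that were found and removed, which the field name suggests but this page does not spell out. The helper names are hypothetical.

```python
def build_delete_request(user_id: str, chunk_ids: list[int]) -> dict:
    """Build a /delete_vectors request body, dropping duplicate IDs but keeping order."""
    unique_ids = list(dict.fromkeys(chunk_ids))
    return {"user_id": user_id, "chunk_ids": unique_ids}


def missing_deletions(payload: dict, response: dict) -> int:
    """Return how many requested IDs were not reported as deleted."""
    return len(payload["chunk_ids"]) - response["deleted_count"]
```

A non-zero result from missing_deletions would indicate that some IDs were not present in the user's index.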

Use Cases

  • Knowledge Base: Create interactive FAQ systems from technical documentation
  • Educational Content: Transform textbooks or course materials into interactive learning assistants
  • Legal Documents: Make complex legal documents more accessible through natural language queries
  • Research Papers: Quickly extract insights from academic papers through conversation

This tool is completely free to use! Start building your PDF chatbot today to make your documents more accessible and interactive.