# Knowledge

URL: /docs/framework/knowledge
Source: /app/src/content/docs/framework/knowledge.mdx

import { DocImage } from '@/components/DocImage';

**RAG integration with document processing and vector search**

import { Step, Steps } from 'fumadocs-ui/components/steps';

## Overview

The Knowledge system provides retrieval-augmented generation (RAG) capabilities, allowing agents to access and utilize external documents in their responses. It automatically processes documents, creates vector embeddings, and enables semantic search for relevant information. Agents with knowledge enabled can provide more accurate, contextual responses based on your documents.

## Enabling Knowledge

Enable knowledge for an agent by setting the `knowledge` option to `true`:

```typescript
import { Agent } from '@astreus-ai/astreus';

const agent = await Agent.create({
  name: 'KnowledgeAgent',
  model: 'gpt-4o',
  knowledge: true, // Enable knowledge base access (default: false)
  embeddingModel: 'text-embedding-3-small' // Optional: specify embedding model
});
```

## Adding Documents

### Add Text Content

Add content directly as a string:

```typescript
await agent.addKnowledge(
  'Your important content here',
  'Document Title',
  { category: 'documentation' }
);
```

### Add from File

Add content from supported file types:

```typescript
// Add a PDF file
await agent.addKnowledgeFromFile(
  '/path/to/document.pdf',
  { source: 'manual', version: '1.0' }
);

// Add a text file
await agent.addKnowledgeFromFile('/path/to/notes.txt');
```

### Add from Directory

Process all supported files in a directory:

```typescript
await agent.addKnowledgeFromDirectory(
  '/path/to/docs',
  { project: 'documentation' }
);
```

## Supported File Types

* **Text files**: `.txt`, `.md`, `.json`
* **PDF files**: `.pdf` (with text extraction)

## How It Works

The knowledge system follows a multi-stage processing pipeline:

### Document Processing

Documents are stored and indexed in the knowledge database with their metadata.
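The docs don't spell out the stored record shape; as a rough TypeScript sketch of how a processed document and its metadata might be represented (all field and type names here are illustrative assumptions, not the actual Astreus schema):

```typescript
// Illustrative shapes only; the real Astreus schema may differ.
interface KnowledgeDocument {
  id: number;
  title: string;
  metadata: Record<string, string>; // e.g. { category: 'documentation' }
  createdAt: string;                // ISO timestamp
}

// A stored document as it might appear after indexing.
const doc: KnowledgeDocument = {
  id: 1,
  title: 'Document Title',
  metadata: { category: 'documentation' },
  createdAt: new Date().toISOString()
};

console.log(doc.metadata.category); // prints "documentation"
```

The metadata object is the same one you pass to `addKnowledge` and the file-based methods, which is what makes filtering and auditing documents possible later.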
### Text Chunking

Content is split into chunks (1000 characters with 200-character overlap) for optimal retrieval. The overlap ensures context continuity:

$\text{overlap ratio} = \frac{200}{1000} = 0.2 = 20\%$

This prevents important information from being split across chunk boundaries.

### Vector Embeddings

Each chunk is converted to vector embeddings using OpenAI or Ollama embedding models. Common embedding dimensions:

* `text-embedding-3-small`: 1536 dimensions
* `text-embedding-3-large`: 3072 dimensions
* `text-embedding-ada-002`: 1536 dimensions

The Euclidean distance between vectors can also be used as a distance metric:

$d(\vec{p}, \vec{q}) = \sqrt{\sum_{i=1}^{n}(p_i - q_i)^2}$

### Semantic Search

When agents receive queries, relevant chunks are retrieved using cosine similarity search. The similarity between the query and document vectors is calculated using:

$\text{cosine similarity} = \cos(\theta) = \frac{\vec{A} \cdot \vec{B}}{||\vec{A}|| \cdot ||\vec{B}||} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2}}$

Where:

* $\vec{A}$ is the query embedding vector
* $\vec{B}$ is the document chunk embedding vector
* Higher values (closer to 1) indicate greater similarity

### Context Integration

Retrieved information is automatically added to the agent's context for enhanced responses.

## Example Usage

Here's a complete example of using knowledge with an agent:

```typescript
import { Agent } from '@astreus-ai/astreus';

// Create agent with knowledge enabled
const agent = await Agent.create({
  name: 'DocumentAssistant',
  model: 'gpt-4o',
  knowledge: true,
  embeddingModel: 'text-embedding-3-small', // Optional: specify embedding model
  systemPrompt: 'You are a helpful assistant with access to company documentation.'
});

// Add documentation
await agent.addKnowledgeFromFile('./company-handbook.pdf', {
  type: 'handbook',
  department: 'hr'
});

await agent.addKnowledge(`
Our API uses REST principles with JSON responses.
Authentication is done via Bearer tokens.
Rate limiting is 1000 requests per hour.
`, 'API Documentation', {
  type: 'api-docs',
  version: '2.0'
});

// Query with automatic knowledge retrieval
const response = await agent.ask('What is our API rate limit?');
console.log(response);
// The agent automatically searches the knowledge base and includes relevant context

// Manual knowledge search
const results = await agent.searchKnowledge('API authentication', 5, 0.7);
results.forEach(result => {
  console.log(`Similarity: ${result.similarity}`);
  console.log(`Content: ${result.content}`);
});
```

## Managing Knowledge

### Available Methods

```typescript
// List all documents with metadata
const documents = await agent.getKnowledgeDocuments();
// Returns: Array<{ id: number; title: string; created_at: string }>

// Delete a specific document by ID
const docDeleted = await agent.deleteKnowledgeDocument(documentId);
// Returns: boolean indicating success

// Delete a specific chunk by ID
const chunkDeleted = await agent.deleteKnowledgeChunk(chunkId);
// Returns: boolean indicating success

// Clear all knowledge for this agent
await agent.clearKnowledge();
// Returns: void

// Search with custom parameters
const results = await agent.searchKnowledge(
  'search query',
  10, // limit: max results (default: 5)
  0.8 // threshold: similarity threshold (0-1, default: 0.7)
);
// Returns: Array<{ content: string; metadata: MetadataObject; similarity: number }>

// Get relevant context for a query
const context = await agent.getKnowledgeContext(
  'query text',
  5 // limit: max chunks to include (default: 5)
);
// Returns: string with concatenated relevant content

// Expand context around a specific chunk
const expandedChunks = await agent.expandKnowledgeContext(
  documentId, // document ID
  chunkIndex, // chunk index within the document
  2, // expandBefore: chunks to include before (default: 1)
  2  // expandAfter: chunks to include after (default: 1)
);
// Returns: Array with expanded chunk content
```

## Configuration

### Environment Variables

```bash
# Database (required)
KNOWLEDGE_DB_URL=postgresql://user:password@host:port/database

# API key for embeddings (uses the same provider as the agent's model)
OPENAI_API_KEY=your_openai_key
```

### Embedding Model Configuration

Specify the embedding model directly in the agent configuration:

```typescript
const agent = await Agent.create({
  name: 'KnowledgeAgent',
  model: 'gpt-4o',
  embeddingModel: 'text-embedding-3-small', // Specify embedding model here
  knowledge: true
});
```
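To build intuition for the similarity thresholds used by `searchKnowledge` (default 0.7), here is a minimal, standalone sketch of the cosine-similarity formula from the Semantic Search section. This is illustrative only, not the library's internal implementation:

```typescript
// Cosine similarity between two equal-length vectors,
// matching the formula in the Semantic Search section:
// cos(theta) = (A . B) / (||A|| * ||B||)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same direction score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([3, 4], [6, 8]));  // 1 (same direction)
console.log(cosineSimilarity([3, 4], [-4, 3])); // 0 (orthogonal)
```

Scores near 1 mean a chunk is semantically close to the query, so raising the threshold toward 1 returns fewer, more relevant chunks.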