Media Analysis
Astreus includes powerful AI-powered media analysis capabilities with enhanced provider compatibility that allow your agents to understand and analyze various types of media files including images, documents, and other file formats. The system automatically detects provider capabilities and provides seamless fallbacks for optimal compatibility.
Overview
Media analysis in Astreus provides:
- 🖼️ Image Analysis: Analyze screenshots, diagrams, charts, and photos
- 📄 Document Analysis: Extract information from PDFs, Word documents, and text files
- 🎨 Visual Understanding: Understand UI elements, designs, and visual layouts
- 📊 Chart & Graph Analysis: Interpret data visualizations and trends
- 🔍 Context-Aware Analysis: Provide additional context for more accurate results
- 🔄 Provider Compatibility: Automatic detection of vision capabilities across different LLM providers
- ⚡ Fallback Support: Seamless handling when specific analysis methods aren't available
Image Analysis
Analyze images with custom prompts and varying levels of detail:
// Basic image analysis
const imageAnalysis = await agent.analyzeImage({
imagePath: './screenshot.png',
prompt: 'What UI elements are visible in this screenshot?',
detail: 'high' // 'low', 'high', or 'auto'
});
console.log('Image analysis:', imageAnalysis);
// Chart analysis
const chartAnalysis = await agent.analyzeImage({
imagePath: './sales-chart.png',
prompt: 'What trends are shown in this sales chart?',
detail: 'high'
});
// Design analysis
const designAnalysis = await agent.analyzeImage({
imagePath: './mockup.png',
prompt: 'Analyze this UI design and suggest improvements',
detail: 'auto'
});
Image Detail Levels
low
: Faster analysis, lower cost, good for simple imageshigh
: More detailed analysis, higher cost, better for complex imagesauto
: Automatically chooses the best detail level based on image complexity
Document Analysis
Extract information and insights from various document types:
// PDF analysis
const pdfAnalysis = await agent.analyzeDocument({
filePath: './contract.pdf',
prompt: 'Extract key terms, conditions, and important dates from this contract'
});
// Word document analysis
const docAnalysis = await agent.analyzeDocument({
filePath: './report.docx',
prompt: 'Summarize the main findings and recommendations'
});
// Text file analysis
const textAnalysis = await agent.analyzeDocument({
filePath: './code.js',
prompt: 'Review this code and identify potential issues or improvements'
});
General Media Analysis
Use the general media analysis function for various file types:
// PowerPoint presentation analysis
const presentationAnalysis = await agent.analyzeMedia({
filePath: './presentation.pptx',
analysisType: 'detailed',
prompt: 'Summarize the main points of this presentation'
});
// Excel spreadsheet analysis
const spreadsheetAnalysis = await agent.analyzeMedia({
filePath: './data.xlsx',
analysisType: 'summary',
prompt: 'Analyze the data trends and provide insights'
});
Analysis Types
summary
: Provides a high-level overviewdetailed
: Thorough analysis with specific detailsextraction
: Extracts specific information or data pointscomparison
: Compares elements or identifies differences
Context-Aware Analysis
Provide additional context to improve analysis accuracy:
// Analyze with business context
const contextualAnalysis = await agent.analyzeWithContext({
filePath: './quarterly-report.pdf',
prompt: 'What are the key financial metrics and their implications?',
context: 'This is a Q1 2024 financial report for a SaaS company',
detail: 'high'
});
// Analyze with historical context
const historicalAnalysis = await agent.analyzeWithContext({
filePath: './performance-chart.png',
prompt: 'How does current performance compare to expectations?',
context: 'Company expected 25% growth this quarter, previous quarter was 15%',
detail: 'auto'
});
Batch Analysis
Analyze multiple files in a single operation:
// Analyze multiple images
const batchImages = [
'./image1.png',
'./image2.jpg',
'./image3.png'
];
const batchAnalysis = await Promise.all(
batchImages.map(imagePath =>
agent.analyzeImage({
imagePath,
prompt: 'Describe what you see in this image',
detail: 'auto'
})
)
);
console.log('Batch analysis results:', batchAnalysis);
Integration with Chat
Media analysis results can be seamlessly integrated into chat conversations:
// Analyze image and discuss in chat
const analysis = await agent.analyzeImage({
imagePath: './dashboard.png',
prompt: 'What metrics are shown in this dashboard?'
});
// Continue conversation with analysis results
const response = await agent.chat({
message: `Based on the dashboard analysis: ${analysis}, what actions should we take?`,
sessionId: 'dashboard-review',
metadata: { analysisType: 'dashboard', source: 'dashboard.png' }
});
Error Handling
Implement proper error handling for media analysis with improved provider compatibility:
try {
const analysis = await agent.analyzeImage({
imagePath: './image.png',
prompt: 'Analyze this image',
detail: 'high'
});
console.log('Analysis successful:', analysis);
} catch (error) {
if (error.message.includes('file not found')) {
console.error('Image file not found');
} else if (error.message.includes('unsupported format')) {
console.error('Unsupported image format');
} else if (error.message.includes('file too large')) {
console.error('Image file too large');
} else if (error.message.includes('Media analysis not supported')) {
console.error('Current provider does not support media analysis. Please use a vision-capable model like GPT-4 Vision, Claude-3, or configure a provider with analyzeMedia support.');
} else {
console.error('Analysis failed:', error);
}
}
Provider Compatibility
The media analysis system automatically detects and adapts to different provider capabilities:
// The system checks for:
// 1. Dedicated analyzeMedia method
// 2. Vision capability through complete method
// 3. Model name patterns (vision, 4o, claude-3)
// Supported vision-capable models:
// - GPT-4 Vision (gpt-4-vision-preview, gpt-4o)
// - Claude-3 (claude-3-sonnet, claude-3-opus, claude-3-haiku)
// - Custom providers with analyzeMedia method
// Automatic fallback ensures seamless operation across providers
const analysis = await agent.analyzeImage({
imagePath: './diagram.png',
prompt: 'Explain this technical diagram'
// System automatically chooses best available method
});
Supported File Types
Images
- PNG: High quality, lossless compression
- JPEG/JPG: Compressed images, good for photos
- GIF: Animated images (analyzed as static)
- BMP: Bitmap images
- WEBP: Modern web format
Documents
- PDF: Portable document format
- DOC/DOCX: Microsoft Word documents
- TXT: Plain text files
- RTF: Rich text format
- HTML: Web pages and structured documents
Presentations & Spreadsheets
- PPTX: PowerPoint presentations
- XLSX: Excel spreadsheets
- CSV: Comma-separated values
- JSON: Structured data format
Best Practices
1. Use Appropriate Detail Levels
// ✅ Good: Use high detail for complex images
await agent.analyzeImage({
imagePath: './complex-diagram.png',
prompt: 'Analyze this technical diagram',
detail: 'high'
});
// ✅ Good: Use auto for general analysis
await agent.analyzeImage({
imagePath: './simple-logo.png',
prompt: 'What is this logo?',
detail: 'auto'
});
2. Provide Clear, Specific Prompts
// ✅ Good: Specific prompt
await agent.analyzeImage({
imagePath: './chart.png',
prompt: 'What is the sales trend from January to March 2024?'
});
// ❌ Poor: Vague prompt
await agent.analyzeImage({
imagePath: './chart.png',
prompt: 'What do you see?'
});
3. Use Context for Better Results
// ✅ Good: Provide context
await agent.analyzeWithContext({
filePath: './report.pdf',
prompt: 'What are the key risks mentioned?',
context: 'This is a financial risk assessment for a tech startup'
});
4. Handle Large Files Appropriately
// ✅ Good: Check file size before analysis
const fs = require('fs');
const stats = fs.statSync('./large-image.png');
if (stats.size > 10 * 1024 * 1024) { // 10MB
console.log('File too large, consider resizing');
} else {
const analysis = await agent.analyzeImage({
imagePath: './large-image.png',
prompt: 'Analyze this image',
detail: 'auto'
});
}
Performance Considerations
- File Size: Larger files take longer to analyze and cost more
- Detail Level: Higher detail levels provide better results but increase processing time
- Batch Processing: Use Promise.all for concurrent analysis of multiple files
- Caching: Consider caching analysis results for frequently accessed files
Next Steps
- Learn about Intent Recognition for smart tool selection
- Explore Agents for comprehensive agent capabilities
- Check out Plugins for extending media analysis functionality
- See RAG for document-based knowledge retrieval
How is this guide?