Document Intelligence and Search
Data extraction from documents and intelligent search services
Estimated reading time: 15 minutes
Azure AI Document Intelligence
Azure AI Document Intelligence (formerly Form Recognizer) extracts structured data from documents using AI.
Pre-trained Models:
- Invoices: Extracts vendors, amounts, dates, items
- Receipts: Purchase information, totals, payment methods
- ID Cards: Personal data from IDs, passports, licenses
- Tax Forms: Fields from tax documents (for example, US W-2 forms)
- Custom Documents: Not pre-trained; you train a custom model on your own sample documents
Capabilities:
- Structured Extraction: Converts documents to JSON with named fields
- Advanced OCR: Reads printed and handwritten text with high accuracy
- Layout Analysis: Understands tables, forms, and complex structures
- Confidence and Validation: Provides confidence scores per field
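Per-field confidence scores are commonly used to decide which extracted values can be accepted automatically and which should go to a human reviewer. A minimal sketch of that idea, assuming a simplified result shape (field name mapped to a value and a confidence) and a hypothetical threshold of 0.8; the real service response is richer than this, and the sample values are made up for illustration:

```python
# Hypothetical, simplified stand-in for an analysis result.
# Field names and values are illustrative, not real service output.
SAMPLE_FIELDS = {
    "VendorName": {"value": "Contoso Ltd", "confidence": 0.97},
    "InvoiceTotal": {"value": 1250.00, "confidence": 0.91},
    "InvoiceDate": {"value": "2024-03-01", "confidence": 0.62},
}


def split_by_confidence(fields, threshold=0.8):
    """Return (accepted, needs_review) dicts based on a confidence threshold."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Values at or above the threshold are accepted; the rest are flagged.
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return accepted, needs_review


accepted, needs_review = split_by_confidence(SAMPLE_FIELDS)
print("accepted:", accepted)
print("needs review:", needs_review)
```

The threshold is a business decision rather than a service setting: a stricter value sends more fields to review, a looser one accepts more values unchecked.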
Usage Process:
1. Select Model: Choose a pre-trained model or train a custom one
2. Send Document: Upload a document image or PDF
3. Process: AI analyzes and extracts information
4. Receive Results: Get structured data in JSON format
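The first two steps above can be sketched without calling the service. The example below picks a model ID and builds the URL a document would be POSTed to. The prebuilt model IDs are the ones the service uses for these document types; the URL path and the API version are assumptions based on the REST API at the time of writing and should be checked against the current reference. The endpoint and the custom model name are hypothetical.

```python
from urllib.parse import quote

# Prebuilt model IDs for common document types.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "layout": "prebuilt-layout",
}

# Assumption: verify which API version your resource supports.
API_VERSION = "2024-11-30"


def select_model(document_type, custom_model_id=None):
    """Step 1: use a custom model if one is given, otherwise a prebuilt one."""
    if custom_model_id:
        return custom_model_id
    return PREBUILT_MODELS[document_type]


def build_analyze_url(endpoint, model_id):
    """Step 2: build the URL the document is POSTed to (path is an assumption)."""
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{quote(model_id)}:analyze?api-version={API_VERSION}"
    )


# Hypothetical resource endpoint, for illustration only.
url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.com/",
    select_model("invoice"),
)
print(url)
```

Steps 3 and 4 happen on the service side. The analyze call is asynchronous, so a client normally polls the URL returned in the Operation-Location response header until the JSON result is ready.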
Key Points
- Extracts structured data from unstructured documents
- Pre-trained models for invoices, receipts, and IDs
- Supports training of custom models
- Converts documents to structured JSON format
- High accuracy with OCR and layout analysis
- Provides confidence scores
Azure AI Search (Knowledge Mining)
Azure AI Search is an intelligent search service that combines AI with traditional search capabilities.
Knowledge Mining Process:
1. Ingestion: Import data from multiple sources (Azure Storage, Azure SQL Database, Azure Cosmos DB)
2. AI Enrichment: Apply AI skills (OCR, text analysis, vision)
3. Indexing: Create optimized search indexes
4. Intelligent Search: Enable natural language queries and advanced filtering
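The four stages can be illustrated with a toy pipeline in plain Python. This is a conceptual sketch only, not the Azure AI Search API: the documents, the KNOWN_ORGS lookup that stands in for an entity recognition skill, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# 1. Ingestion: hypothetical documents standing in for rows or blobs.
SOURCE_DOCS = [
    {"id": "1", "content": "Invoice from Contoso for cloud services"},
    {"id": "2", "content": "Receipt for office supplies from Fabrikam"},
]

KNOWN_ORGS = {"contoso", "fabrikam"}  # toy stand-in for an entity model


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def enrich(doc):
    """2. AI enrichment: add an 'organizations' field (toy entity lookup)."""
    orgs = sorted(t for t in set(tokenize(doc["content"])) if t in KNOWN_ORGS)
    return {**doc, "organizations": orgs}


def build_index(docs):
    """3. Indexing: map each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in tokenize(doc["content"]):
            index[token].add(doc["id"])
    return index


def search(index, query):
    """4. Search: return ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


enriched = [enrich(d) for d in SOURCE_DOCS]
index = build_index(enriched)
print(search(index, "contoso invoice"))
```

In the real service, an indexer drives the ingestion, a skillset performs the enrichment, and the index definition controls which fields are searchable or filterable.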
Enrichment Skills:
- Text Extraction: OCR and document processing
- Text Analysis: Language detection, entities, sentiment
- Computer Vision: Image analysis (for example, tags and captions)
- Custom Skills: Domain-specific models or logic that you supply
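Enrichment skills are grouped into a skillset, which is defined as JSON. The sketch below builds such a definition as a Python dictionary with one OCR skill, one language detection skill, and one custom skill that calls an endpoint you host. The property names and `@odata.type` values follow the skillset format as an assumption to verify against the REST reference; the skillset name, target names, and the custom skill URL are hypothetical.

```python
import json

skillset = {
    "name": "demo-skillset",  # hypothetical name
    "skills": [
        {
            # Text extraction: OCR over images found in the document.
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {
            # Text analysis: detect the language of the document content.
            "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "languageCode", "targetName": "language"}],
        },
        {
            # Custom skill: calls a web endpoint you host (hypothetical URL).
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "context": "/document",
            "uri": "https://example.com/api/classify",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "category", "targetName": "category"}],
        },
    ],
}

# Serialize to the JSON text that would be sent as the skillset definition.
print(json.dumps(skillset, indent=2))
```

Each skill reads from a path in the enriched document tree (its inputs) and writes a new node back to it (its outputs), which is how the output of one skill can feed the next.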
Advanced Features:
- Faceted Search: Filters by categories and ranges
- Semantic Search: Context and intent understanding
- Vector Search: Similarity search using embeddings
- Scoring and Ranking: Custom relevance algorithms
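Two of these features are easy to show in miniature. Vector search ranks documents by how close their embeddings are to the query embedding, and faceted search counts documents per category so a user interface can offer filters. A toy sketch in plain Python, assuming made-up 3-dimensional vectors and cosine similarity as the distance metric; this is not the service API, and real embeddings have hundreds or thousands of dimensions:

```python
import math
from collections import Counter

# Hypothetical indexed documents with tiny 3-dimensional embeddings.
DOCS = [
    {"id": "a", "category": "invoice", "vector": [1.0, 0.0, 0.0]},
    {"id": "b", "category": "receipt", "vector": [0.0, 1.0, 0.0]},
    {"id": "c", "category": "invoice", "vector": [0.8, 0.6, 0.0]},
]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def vector_search(query_vector, docs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vector, d["vector"]),
        reverse=True,
    )
    return [d["id"] for d in ranked[:k]]


def facet_counts(docs, field):
    """Count documents per value of a field, as a facet would."""
    return Counter(d[field] for d in docs)


print(vector_search([1.0, 0.0, 0.0], DOCS))
print(facet_counts(DOCS, "category"))
```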
Key Points
- Combines traditional search with AI
- Enriches data with AI skills
- Creates intelligent indexes for search
- Supports natural language and semantic queries
- Vector search by similarity
- Scales and integrates with other applications
Knowledge Store
Knowledge Store is a feature of Azure AI Search that allows saving enriched information (extracted by AI) independently of the search index.
What is it for?
Normally, enriched data lives only inside the search index. Knowledge Store lets you 'project' (export) this data to Azure Storage (Tables or Blobs) for other uses.
Key Exam Use Cases:
- Analytics and Reporting: Connecting Power BI directly to enriched data to visualize trends.
- Data Science: Using clean, structured data to train new Machine Learning models.
- Audit: Saving a permanent copy of extracted information.
Types of Projections:
- Table Projections: Saves structured data as rows in Azure Table Storage for relational analysis.
- Object Projections: Saves complex JSON structures as blobs.
- File Projections: Saves extracted or generated images as blobs.
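The three projection types are declared together in the knowledge store section of a skillset. The sketch below writes that section as a Python dictionary. The property names follow the knowledge store definition format as an assumption to verify against the REST reference; the table name, container names, and source paths are hypothetical, and the connection string is a placeholder.

```python
import json

knowledge_store = {
    # Placeholder; supply your own Azure Storage connection string.
    "storageConnectionString": "<storage-connection-string>",
    "projections": [
        {
            # Table projection: rows in Azure Table Storage.
            "tables": [
                {
                    "tableName": "Documents",
                    "generatedKeyName": "DocumentId",
                    "source": "/document/tableprojection",
                }
            ],
            # Object projection: JSON documents in a blob container.
            "objects": [
                {"storageContainer": "enriched-json", "source": "/document/objectprojection"}
            ],
            # File projection: image files in a blob container.
            "files": [
                {"storageContainer": "extracted-images", "source": "/document/normalized_images/*"}
            ],
        }
    ],
}


def projection_kinds(store):
    """Return the sorted projection types that have at least one entry."""
    kinds = set()
    for group in store["projections"]:
        kinds.update(kind for kind, entries in group.items() if entries)
    return sorted(kinds)


print(projection_kinds(knowledge_store))
print(json.dumps(knowledge_store, indent=2))
```

Tables suit tools such as Power BI that expect relational data, while object and file projections keep the full JSON and the images for data science or audit purposes.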
Key Points
- Saves enriched data outside the search index
- Allows using AI data in Power BI or Machine Learning
- Uses Azure Storage (Tables and Blobs)
- Key concept: Projections
- Essential for reusing extracted knowledge