Computer Vision

AWS services for image and video analysis

Amazon Rekognition

Deep learning-based image and video analysis service that requires no ML experience.

Key Features

Object and Scene Detection

- Identifies thousands of objects (vehicles, pets, furniture)
- Recognizes activities (running, dancing)
- Detects scenes (beach, city, sunset)
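As a rough illustration of how label detection is typically consumed, the sketch below filters a DetectLabels-style response by confidence. The helper names (`top_labels`, `detect_labels_in_s3`) and the sample response are hypothetical; the live call assumes boto3 is installed and AWS credentials are configured.

```python
def top_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_labels_in_s3(bucket, key, min_confidence=80.0):
    """Run label detection on an S3 object (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so top_labels() works without AWS access

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=20,
        MinConfidence=min_confidence,
    )
    return top_labels(response, min_confidence)


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Labels": [
        {"Name": "Beach", "Confidence": 97.1},
        {"Name": "Dog", "Confidence": 99.3},
        {"Name": "Umbrella", "Confidence": 62.4},
    ]
}
print(top_labels(sample_response))
```

Passing `MinConfidence` to the service and filtering again locally is redundant but harmless; the local filter is what lets the threshold be changed without another API call.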

Facial Recognition

- Face detection: Location and attributes
- Face comparison: Similarity between two faces
- Face search: Search for people in collections
- Facial analysis: Estimated age, gender, emotions
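Face comparison returns a similarity score per match, and the caller decides what counts as "the same person". A minimal sketch of that decision follows; `is_same_person`, `compare_s3_faces`, the 90.0 threshold, and the sample responses are assumptions for illustration, not recommended values.

```python
def is_same_person(response, threshold=90.0):
    """Return (matched, best_similarity) from a CompareFaces-style response."""
    similarities = [m["Similarity"] for m in response.get("FaceMatches", [])]
    best = max(similarities, default=None)
    return (best is not None and best >= threshold), best


def compare_s3_faces(bucket, source_key, target_key, threshold=90.0):
    """Compare two S3 images (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so is_same_person() works offline

    client = boto3.client("rekognition")
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,
    )
    return is_same_person(response, threshold)


# Hypothetical responses, trimmed to the fields used above.
match_response = {"FaceMatches": [{"Similarity": 96.5}, {"Similarity": 91.2}]}
no_match_response = {"FaceMatches": [], "UnmatchedFaces": [{}]}

print(is_same_person(match_response))
print(is_same_person(no_match_response))
```

For identity verification, a stricter threshold trades more false rejections for fewer false accepts; the right value depends on the use case.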

Celebrity Recognition

- Identifies celebrities in images and videos
- Returns a match confidence and, where available, links to further information about the person

Content Moderation

- Detects inappropriate or offensive content
- Classifies by categories (violence, drugs, etc.)
- Adjustable confidence levels
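To make "adjustable confidence levels" concrete, the sketch below groups a DetectModerationLabels-style response by top-level category at a chosen threshold. The function name and the sample label names are illustrative assumptions; real category names come from the service's moderation taxonomy.

```python
def group_moderation_labels(response, min_confidence=60.0):
    """Map each top-level category to its sub-labels at or above min_confidence."""
    grouped = {}
    for label in response.get("ModerationLabels", []):
        if label["Confidence"] < min_confidence:
            continue
        parent = label.get("ParentName")
        if parent:
            # Second-level label: file it under its parent category.
            grouped.setdefault(parent, []).append(label["Name"])
        else:
            # Top-level category: make sure it appears even without children.
            grouped.setdefault(label["Name"], [])
    return grouped


# Hypothetical response; label names are placeholders, not the real taxonomy.
sample_response = {
    "ModerationLabels": [
        {"Name": "Violence", "ParentName": "", "Confidence": 91.0},
        {"Name": "Weapons", "ParentName": "Violence", "Confidence": 88.0},
        {"Name": "Drugs", "ParentName": "", "Confidence": 41.0},
    ]
}

print(group_moderation_labels(sample_response))                     # stricter
print(group_moderation_labels(sample_response, min_confidence=40))  # looser
```

Lowering the threshold flags more content for review (fewer misses, more false alarms); raising it does the opposite.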

Text Detection (OCR)

- Extracts text from images
- Supports printed and handwritten text
- Multiple languages
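Text detection responses contain both whole lines and individual words, so a common first step is keeping only the lines. The sketch below assumes a DetectText-style response; `extract_lines` and the sample data are hypothetical.

```python
def extract_lines(response, min_confidence=80.0):
    """Return detected text lines (not single words) above a confidence floor."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "TextDetections": [
        {"DetectedText": "SPEED LIMIT", "Type": "LINE", "Confidence": 99.0},
        {"DetectedText": "SPEED", "Type": "WORD", "Confidence": 99.2},
        {"DetectedText": "LIMIT", "Type": "WORD", "Confidence": 98.8},
        {"DetectedText": "5O", "Type": "LINE", "Confidence": 55.0},
    ]
}
print(extract_lines(sample_response))
```

The low-confidence line is dropped rather than corrected; whether to discard or route such text to human review is a design decision for the pipeline.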

PPE Detection

- Detects personal protective equipment (PPE) worn by people in an image
- Covers head, hand, and face protection (helmets, gloves, masks)
- Useful for safety compliance
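A safety-compliance check usually boils down to "who is missing required equipment?". The sketch below walks a DetectProtectiveEquipment-style response to answer that; the function name, the default required types, and the sample data are assumptions for illustration.

```python
def people_missing_ppe(response, required=("HEAD_COVER",), min_confidence=80.0):
    """Map person Id -> list of required equipment types not confidently worn."""
    missing = {}
    for person in response.get("Persons", []):
        worn = set()
        for part in person.get("BodyParts", []):
            for item in part.get("EquipmentDetections", []):
                covers = item.get("CoversBodyPart", {})
                # Count equipment only if it is detected AND covers the body part.
                if covers.get("Value") and item["Confidence"] >= min_confidence:
                    worn.add(item["Type"])
        lacking = [kind for kind in required if kind not in worn]
        if lacking:
            missing[person["Id"]] = lacking
    return missing


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Persons": [
        {"Id": 0, "BodyParts": [
            {"Name": "HEAD", "EquipmentDetections": [
                {"Type": "HEAD_COVER", "Confidence": 95.0,
                 "CoversBodyPart": {"Value": True, "Confidence": 97.0}},
            ]},
        ]},
        {"Id": 1, "BodyParts": [{"Name": "HEAD", "EquipmentDetections": []}]},
    ]
}
print(people_missing_ppe(sample_response))
```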

Use Cases

- Identity verification
- Security and surveillance
- Social media content moderation
- Media analysis
- Access control
- Visual search

Key Points

✓ Rekognition provides production-ready features without custom models
✓ Validate accuracy across demographic groups to avoid facial recognition bias
✓ Optimize images (resolution, format) to improve OCR and detection
✓ Weigh the cost tradeoffs of video processing versus image analysis
✓ Combine services (Rekognition + Textract) for heterogeneous pipelines

Amazon Textract

Service that automatically extracts text, handwriting, and data from scanned documents.

Capabilities

Text Extraction

- Advanced OCR
- Printed and handwritten text
- Preserves document layout

Forms Analysis

- Identifies key-value pairs
- Extracts data from form fields
- Detects checkboxes and their selection status
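Forms analysis returns a flat list of blocks linked by IDs, so key-value pairs have to be stitched together by following relationships. The sketch below shows one way to do that for an AnalyzeDocument-style response with the FORMS feature; the helper names and sample blocks are hypothetical, and the sample is trimmed to the fields used.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD (or checkbox status) children of a block into one string."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])  # SELECTED / NOT_SELECTED
    return " ".join(parts)


def extract_key_values(response):
    """Return {key text: value text} from KEY_VALUE_SET blocks."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_ids = [
            i for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE" for i in rel["Ids"]
        ]
        value = " ".join(_text_of(blocks_by_id[i], blocks_by_id) for i in value_ids)
        pairs[_text_of(block, blocks_by_id)] = value
    return pairs


# Hypothetical response: one text field and one checkbox.
sample_response = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]},
                       {"Type": "VALUE", "Ids": ["v1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w5"]},
                       {"Type": "VALUE", "Ids": ["v2"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Full"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    {"Id": "w5", "BlockType": "WORD", "Text": "Subscribe"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]}
print(extract_key_values(sample_response))
```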

Tables Analysis

- Extracts tabular data
- Maintains row and column structure
- Processes complex tables
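Row and column structure comes back as CELL blocks carrying `RowIndex` and `ColumnIndex`, which can be reassembled into a grid. The following is a simplified sketch for an AnalyzeDocument-style response with the TABLES feature; it assumes no merged cells, and the helper names and sample blocks are hypothetical.

```python
def table_to_rows(table_block, blocks_by_id):
    """Rebuild one TABLE block as a list of rows (1-based indexes in the source)."""
    cells = {}
    for rel in table_block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cell_id in rel["Ids"]:
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks_by_id[word_id]["Text"]
                for cell_rel in cell.get("Relationships", [])
                if cell_rel["Type"] == "CHILD"
                for word_id in cell_rel["Ids"]
                if blocks_by_id[word_id]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    # Cells with no words (or missing cells) become empty strings.
    return [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]


def tables_from_response(response):
    """Return every table in the response as a list of rows."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    return [table_to_rows(b, blocks_by_id)
            for b in response["Blocks"] if b["BlockType"] == "TABLE"]


# Hypothetical response: a 2x2 table with one empty cell.
sample_response = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Blue"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Widget"},
]}
print(tables_from_response(sample_response))
```

Complex tables with merged cells would also need the cells' row and column spans, which this sketch deliberately ignores.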

Identity Document Analysis

- Passports
- Driver's licenses
- Extracts structured information

Signature Detection

- Identifies signatures in documents
- Useful for validation

Queries

- Searches for specific information in documents
- Natural language questions
- Example: 'What is the total amount?'
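A Queries request pairs each natural language question with an optional alias, and the answers come back as separate blocks linked to the question. The sketch below shows both halves under stated assumptions: `ask_document` and `answers_by_alias` are hypothetical helper names, the sample response is invented, and the live call needs boto3 and AWS credentials.

```python
def answers_by_alias(response):
    """Map each query's alias (or text) to (answer, confidence), or None."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        label = query.get("Alias") or query["Text"]
        results = [
            blocks_by_id[i]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "ANSWER"
            for i in rel["Ids"]
        ]
        if results:
            best = max(results, key=lambda r: r["Confidence"])
            answers[label] = (best["Text"], best["Confidence"])
        else:
            answers[label] = None  # the question was asked but not answered
    return answers


def ask_document(bucket, key, queries):
    """Send {alias: question} pairs to Textract (needs boto3 and credentials)."""
    import boto3  # imported lazily so answers_by_alias() works offline

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig={"Queries": [
            {"Text": text, "Alias": alias} for alias, text in queries.items()
        ]},
    )
    return answers_by_alias(response)


# Hypothetical response, trimmed to the fields used above.
sample_response = {"Blocks": [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": "What is the total amount?", "Alias": "TOTAL"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
    {"Id": "r1", "BlockType": "QUERY_RESULT", "Text": "$1,250.00", "Confidence": 97.0},
    {"Id": "q2", "BlockType": "QUERY", "Query": {"Text": "What is the due date?"}},
]}
print(answers_by_alias(sample_response))
```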

Advantages over Traditional OCR

- Understands document structure
- No manual configuration required
- Handles multiple formats
- High accuracy, with a confidence score returned for each extracted element

Use Cases

- Invoice processing
- Mortgage loan automation
- Medical records extraction
- Legal document digitization
- Insurance claims processing
- Customer onboarding (KYC)

Key Points

✓ Textract is ideal for extracting document structure; always validate complex tables and forms
✓ Preprocessing images (deskew, contrast enhancement) improves OCR
✓ Test with real documents and variants (languages, formats)
✓ Use Queries to extract specific fields at scale
✓ Manage PII and apply redaction when necessary

Other Vision Services

Amazon Lookout for Vision

Purpose: Detect defects in manufactured products

Features:
- Automated visual inspection
- Identifies anomalies and defects
- Trains on as few as 30 images (20 normal, 10 anomalous)
- Integration with production lines

Use cases:
- Manufacturing quality control
- Damaged product detection
- Assembly verification

Amazon Monitron

Purpose: Predictive monitoring of industrial equipment (sensor-based rather than image-based)

Features:
- End-to-end predictive maintenance system
- Sensors for vibration and temperature
- Detects abnormal behavior in machinery
- Built-in ML; no model development required

AWS Panorama

Purpose: Computer vision analysis at the edge

Features:
- Physical device (Panorama Appliance)
- Processes video from IP cameras locally
- Low latency
- Privacy (data doesn't leave the site)

Use cases:
- Real-time PPE verification
- People counting
- Manufacturing quality control
- Retail traffic analysis

Amazon Rekognition Video

Video Analysis:
- Activity detection
- Person tracking
- Real-time facial recognition on streaming video (via Amazon Kinesis Video Streams)
- Inappropriate content detection
- Sports event analysis

Key Points

✓ Lookout for Vision is useful for industrial inspection with few training images
✓ Monitron covers predictive maintenance; Panorama covers computer vision at the edge
✓ Assess use-case sensitivity and latency to choose edge vs cloud
✓ Integrate inference pipelines with alerts and action systems
✓ Measure false positives/negatives in production and tune thresholds
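The last point can be made concrete with a small threshold sweep. Given predictions that have been checked against ground truth, the sketch below counts false positives and false negatives at each candidate threshold. The function names and the labeled sample are hypothetical; in practice the pairs would come from reviewed production results.

```python
def confusion_at(threshold, scored):
    """Count (tp, fp, fn, tn) when 'confidence >= threshold' means positive."""
    tp = fp = fn = tn = 0
    for confidence, is_positive in scored:
        predicted = confidence >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1  # false alarm
        elif not predicted and is_positive:
            fn += 1  # missed detection
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(threshold, scored):
    """Return (precision, recall); an empty denominator counts as 1.0."""
    tp, fp, fn, _ = confusion_at(threshold, scored)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical reviewed sample: (service confidence, ground-truth label).
scored = [(95, True), (85, True), (80, False), (65, True), (55, False), (40, False)]

for threshold in (60, 75, 90):
    tp, fp, fn, tn = confusion_at(threshold, scored)
    precision, recall = precision_recall(threshold, scored)
    print(f"threshold={threshold}: FP={fp} FN={fn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold removes false positives at the cost of more false negatives; which error is more expensive depends on the use case (for example, a missed defect versus an unnecessary manual review).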