✨
Generative AI
Concepts and applications of Generative Artificial Intelligence
⏱️ Estimated reading time: 25 minutes
Introduction to Generative AI
Generative AI creates new content (text, images, audio, code) based on patterns learned from training data.
Foundation Models
Models pre-trained on large amounts of data that can be adapted to multiple tasks:
- LLMs: Large Language Models
- Diffusion Models: For image generation
- Multimodal: Process multiple data types
🎯 Key Points
- ✅ Foundation models enable fast adaptation (fine-tuning / prompt engineering)
- ✅ Understanding limitations (hallucinations, bias) is critical for production use
- ✅ Choose model by modality (text, image, multimodal) and latency requirements
- ✅ RAG helps reduce hallucinations when up-to-date information is required
- ✅ Assess inference and embedding storage costs for scaled solutions
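Prompt engineering is the lightest of these adaptation paths: instead of retraining, you steer a pre-trained model by putting a few labeled examples directly in the prompt. A minimal sketch (the reviews, labels, and task are made up for illustration):

```python
# Few-shot prompting: adapt a foundation model to a task by showing
# it labeled examples in the prompt, with no fine-tuning.
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt from (text, label) pairs."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The unanswered query goes last; the model completes the final line.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Great battery life, totally worth it.", "positive"),
    ("Broke after two days.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Fast shipping and works perfectly.")
```

The same pattern extends to any task where a handful of demonstrations fits in the context window; when it does not, fine-tuning becomes the better trade-off.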
Amazon Bedrock
Fully managed service providing access to AI foundation models through a unified API.
Available Models
- Anthropic Claude: Conversation and reasoning
- Meta Llama: Open-source model
- Amazon Titan: AWS proprietary models
- Stability AI: Image generation
- Cohere: NLP and embeddings
Features
- Customization via fine-tuning
- RAG (Retrieval Augmented Generation)
- AI Agents
- Model evaluation
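The unified API means every model is reached the same way: build a provider-specific JSON body and call `invoke_model` on the `bedrock-runtime` client. A minimal sketch using boto3; the model id and region are assumptions, so check which models are enabled in your account:

```python
import json

# Hypothetical model id for illustration; see the Bedrock console
# for the ids actually available in your region.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_request(prompt, max_tokens=256):
    """Build the JSON body expected by Anthropic Claude models on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

if __name__ == "__main__":
    import boto3  # requires AWS credentials with Bedrock model access granted
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(modelId=MODEL_ID, body=build_claude_request("Hello"))
    print(json.loads(response["body"].read())["content"][0]["text"])
```

Swapping providers (Titan, Llama, Cohere) means changing only the model id and the request/response body format, not the service integration.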
🎯 Key Points
- ✅ Bedrock provides managed access to models without managing infrastructure
- ✅ Compare models by cost, customization and privacy requirements
- ✅ Use RAG or fine-tuning depending on context needs and data control
- ✅ Test models with production-like data before deployment
- ✅ Consider latency and SLAs when selecting provider/model
RAG (Retrieval Augmented Generation)
Architecture that improves LLM responses by retrieving relevant information from external sources.
Components
1. Vectorization: Convert documents to embeddings
2. Vector database: Store embeddings (e.g., OpenSearch, Pinecone)
3. Retrieval: Search for relevant information
4. Generation: LLM generates response with retrieved context
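The four steps above can be sketched end to end. In practice the embeddings come from a model (e.g. Amazon Titan Embeddings) and live in a vector database such as OpenSearch or Pinecone; here toy hand-made vectors and a linear cosine-similarity scan stand in for both, just to show the flow:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, top_k=1):
    """Step 3: return the top_k documents most similar to the query embedding."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, docs):
    """Step 4 (input side): ground the LLM on the retrieved context."""
    context = "\n".join(d["text"] for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Steps 1-2 simulated: documents already vectorized and stored.
store = [
    {"text": "Refunds are processed within 5 business days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Shipping is free over $50.", "embedding": [0.1, 0.9, 0.0]},
]
hits = retrieve([0.8, 0.2, 0.1], store)          # query embedding (made up)
prompt = build_prompt("How long do refunds take?", hits)
```

The resulting prompt would then be sent to the LLM, which answers from the retrieved passage instead of from its (possibly stale) training data.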
Benefits
- Reduces hallucinations
- Up-to-date information
- More accurate responses
- No fine-tuning required
🎯 Key Points
- ✅ Vectorization and vector DBs are at the core of RAG
- ✅ Choose vector DB by latency, cost and scalability
- ✅ Curate knowledge sources and ensure document quality
- ✅ Combine retrieval and generation to balance accuracy and fluency
- ✅ Measure hallucination reduction and contextual accuracy after RAG deployment