TOC
Shared inspiration
The only way to do great work is to do what you love.
- 8 week course, starting July 28th, 2025
- Week 1 - Extended (Take your time with the practical exercises. Use the extra guides in Week 1 for a firmer footing and seek help via the platform, email, or LinkedIn (details in the GitHub repo).)
- Evaluate the 10 leading frontier and open-source models
- Implement open source projects based on what I am learning
- Maintain this blog, as a learning resource
- Add key skills to my resume
- Build real world, LLM powered applications
- Project - Run Ollama locally - Saad's notes
- Project #1 - CareerGPT-Backend (an enterprise-grade platform; evolving it into an e-commerce app would not be an architectural stretch)
- Service-to-service authentication via JWT (per-service keys, DB-backed)
- Trust boundary: all microservices sit behind the API Gateway, with identity-backend as the sole issuer
- Scope: no user-level roles yet; services authenticate each other for internal calls
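To make the per-service JWT idea concrete, here is a minimal, stdlib-only sketch of the HS256 signing scheme that underlies it. This is illustrative only: the service names and `SERVICE_KEYS` dict are hypothetical, and a real implementation would use a maintained library such as PyJWT with keys loaded from the database rather than hand-rolled crypto.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    if payload.get("exp", 0) < time.time():
        raise ValueError("expired")
    return payload

# identity-backend issues a short-lived token for log-service calling
# careergpt-backend; the per-service key would be DB-backed in practice.
SERVICE_KEYS = {"log-service": "s3cret-per-service-key"}
token = sign_jwt(
    {"iss": "identity-backend", "sub": "log-service",
     "aud": "careergpt-backend", "exp": time.time() + 300},
    SERVICE_KEYS["log-service"],
)
claims = verify_jwt(token, SERVICE_KEYS["log-service"])
```

Because identity-backend is the sole issuer, every downstream service only needs to verify the signature and claims, never to mint tokens itself.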
- Project #2 - A logging microservice (log-service), separate from careergpt-backend.
- Exposes a write-only API for client services to send structured logs.
- Can later include:
- Query API (for admin dashboards).
- Filters (by app, user, level).
- Stores logs in SQLite initially (upgradable to Postgres).
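The storage side of the plan above can be sketched with the stdlib `sqlite3` module. Function and column names here are assumptions, not the service's actual schema; the point is the split between a write-only entry point for client services and a later, admin-only query API with filters.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    app TEXT NOT NULL,
    level TEXT NOT NULL,
    message TEXT NOT NULL,
    context TEXT  -- arbitrary structured fields as JSON
)
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def write_log(conn, app: str, level: str, message: str, **context) -> None:
    # Write-only entry point: client services insert, never read.
    conn.execute(
        "INSERT INTO logs (ts, app, level, message, context) VALUES (?, ?, ?, ?, ?)",
        (time.time(), app, level, message, json.dumps(context)),
    )
    conn.commit()

def query_logs(conn, app=None, level=None):
    # Future admin-only query API with simple filters (by app, by level).
    sql, args = "SELECT app, level, message FROM logs WHERE 1=1", []
    if app:
        sql += " AND app = ?"
        args.append(app)
    if level:
        sql += " AND level = ?"
        args.append(level)
    return conn.execute(sql, args).fetchall()

store = open_store()
write_log(store, "careergpt-backend", "INFO", "user signed up", user_id=42)
write_log(store, "identity-backend", "ERROR", "token expired")
```

Swapping SQLite for Postgres later mostly means replacing `open_store` and the DB-API driver; the API surface stays the same.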
- Project #3 - Identity Service
- Identity-Backend is Trust Anchor
- Service-to-Service Auth Model
- Identity-Backend - Why this isn't secure
- RBAC
- FSM: guest → email collected → verified → subscribed
- State transitions trigger role changes and policy updates, for example premium API access for paid subscribers
- FSM persisted in the DB: user states, state transitions tables, etc.
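A minimal sketch of that user-lifecycle FSM, with transitions driving role changes. The state names follow the bullets above, but the role names and table comments are illustrative assumptions, not the actual design.

```python
# Allowed lifecycle transitions; anything else is rejected.
TRANSITIONS = {
    "guest": {"email_collected"},
    "email_collected": {"verified"},
    "verified": {"subscribed"},
    "subscribed": set(),
}

# Role granted on entering each state (this is what drives policy updates,
# e.g. unlocking the premium API for paid subscribers).
ROLE_FOR_STATE = {
    "guest": "anonymous",
    "email_collected": "lead",
    "verified": "member",
    "subscribed": "premium",
}

def advance(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    # In the real service, this change would be persisted to the
    # user_states / state_transitions tables.
    return new_state

state = "guest"
for step in ("email_collected", "verified", "subscribed"):
    state = advance(state, step)
```

Keeping the transition table as data (rather than scattered `if` statements) makes it easy to store and audit in the DB.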
- External SSO and Federation
- OpenID Connect and SAML 2.0, allowing users to authenticate with an external IdP
- Identity-backend exchanges the external token -> issues an internal JWT with the proper claims
- Zero trust alignment
- Trust only identity-backend; validate every request
- Enterprise hardening and zero trust roadmap (see ChatGPT)
- …
- Project #4 - Separate data science lab, using Jupyter, from careergpt-backend
- Project #5 - Integrate Identity, and Logging services with CareerGPT-Backend
- Project - Web Scraper Agent
- Project #1 - AI-powered brochure generator that scrapes and navigates assets, such as a company website
- Project #2 - Support agent for an airline with UI and remote procedure calls
- Project #3 - Create meeting minutes from audio using both open and closed source models
- Project #4 - AI that converts Python code to optimized C++, boosting performance by up to 60,000x
- Project #5 - Build AI knowledge-worker using RAG to become an expert on all company-related matters
- Project #6 - Predict product prices from short descriptions using Frontier models
- Project #7 - Run a fine-tuned open-source model to compete with Frontier models in price prediction
- Project #8 - Build an autonomous multi-agent system in which models collaborate to spot deals and notify you of special bargains.
- Inside the LLM - practical, hands-on take on transformers theory
- RAG
- LoRA
- AI Agents
- AI in Production - deploy LLMs and Agents for scale, resiliency, and security
Monday - July 28th, 2025
- Environment setup completed.
- Process: Ollama - run an LLM locally
- Experience calling the OpenAI API for a frontier model.
- High-level understanding of system vs. user prompts.
- Build your data science lab environment
- End of day notes: Daily - Saad's notes
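The "system vs. user prompts" idea from today boils down to the message structure sent to the API. Here is a hedged sketch: the `build_messages` helper is my own illustrative function, and the commented-out call shows the general shape of an OpenAI SDK request (it assumes the `openai` package and an `OPENAI_API_KEY`, and the model name is just an example).

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    # The system prompt sets the model's behavior and persona;
    # the user prompt carries the actual request.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a concise assistant that answers in one sentence.",
    "Summarize what a system prompt does.",
)

# With the openai package installed and OPENAI_API_KEY set, a call
# would look roughly like this (model name is an example):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
# print(response.choices[0].message.content)
```

Separating the two roles up front makes it easy to reuse one system prompt across many user prompts.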
Tuesday - July 29th, 2025
- How much Python do you need to know? Intermediate; for example, can you say what is wrong with:
- yield from set(book.get("author") for book in books if book.get("author"))
- If you understand this, you are well-prepared. If you know why it might not be optimal, you are advanced.
- If terms like yield, set, or .get are unfamiliar, a dedicated Jupyter notebook in Week 1 provides a Python guide at the required level, taking you through stepping stones until such a line makes sense.
- Three Pillars of LLM Engineering Covered:
- Models: Understanding the spectrum of LLMs (open-source, closed-source, multimodal, capable of generating images or audio), their architectures, and selecting the right one for a task.
- Tools: Practical use of Hugging Face, LangChain (as glue code), Gradio, Weights & Biases, and Modal for deployment.
- Techniques: Applying APIs, Retrieval-Augmented Generation (RAG), fine-tuning, and building full agentic AI solutions.
- The course covers both closed-source (frontier) and open-source models. "Frontier models" are the LLMs pioneering what is possible today: the largest, most capable models. The term usually refers to closed-source, paid models, but sometimes also to the biggest, strongest open-source models, depending on the context.
- Closed-Source Frontier Models (hyperscalers) - These are the largest, highest-scale, paid models from major AI companies.
- GPT (OpenAI): ChatGPT, released in late 2022, caught everyone off guard with its power and brought generative AI to mainstream attention.
- Claude (Anthropic): A primary competitor to OpenAI, often considered neck and neck with GPT in leaderboards (with Claude currently having a slight edge) and usually favored by data scientists.
- Gemini (Google): Google's entrant in the frontier model space; Google also offers Gemma, an open-source variant.
- Command R (Cohere): A model from Cohere, a Canadian AI company.
- Perplexity: A search engine that can use other models but also has its own model.
- Open-Source Models - These are freely available models for use and modification.
- Llama (Meta): The most famous open-source model, as Meta paved the way by open-sourcing the original Llama 1.
- Mixtral (Mistral AI): A "mixture of experts" model from the French company Mistral AI, containing multiple smaller models.
- Qwen (Alibaba Cloud): A powerhouse model, super impressive and very powerful for its size, which will be used from time to time in the course.
- Gemma (Google): Google's smaller, open-source model.
- Phi (Microsoft): Microsoft's smaller, open-source model.
- Three Core Methods for Using LLMs
- Chat Interfaces: for example, ChatGPT
- Cloud APIs: Abstract away the complexity of operationalizing an LLM, allowing you to focus on a unified API
- Local Execution: Using tools like Ollama, which run models through optimized, compiled C++ code (via llama.cpp) for efficient local inference. This allows running models on your own box, but offers less control over internal operations because the code is fully compiled.
- Rationale for Using Ollama
- Advantages:
- Cost-Free: No API charges; it's open-source, free, and runs locally on your machine.
- Data Privacy: All data stays on your local machine, which is critical for confidential data that absolutely must not go to the cloud or leave your network.
- Disadvantage:
- Performance: Local open-source models are generally smaller and less powerful than paid frontier models (which are many times larger and more powerful), so the expected result quality may be lower.
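Calling a locally running Ollama model can be sketched against its REST endpoint (Ollama serves on `localhost:11434` by default). The model name `llama3.2` is an example, not a requirement, and the actual `generate` call only works if Ollama is installed and the model has been pulled; only the request-building helper runs here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a local Ollama install with the model pulled, e.g. `ollama pull llama3.2`:
# print(generate("llama3.2", "In one sentence, why run models locally?"))
```

Because the endpoint is local, nothing in the prompt or response ever leaves the machine, which is exactly the privacy advantage listed above.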
Day 3 - Wednesday - July 30th, 2025
Overview of Frontier LLM Models
• The session explored frontier large language models (LLMs), examining their strengths, weaknesses, and potential business applications, with the goal of providing a true intuition for their differences and commercial applicability.
• Model selection was framed as a strategic decision, considering technical capabilities, contextual nuances, pricing, and domain-specific expertise, with cost and API rate limits increasingly becoming primary differentiators as model performance converges.
Key Models and Their Characteristics
OpenAI Models
GPT (including GPT-4o)
Known for generating structured, well-researched, and nuanced responses, often with introductions and summaries.
Excels in creative tasks such as transforming bullet points into emails, slides, or blog posts, and is remarkably good at writing and debugging complex code, often acting as a "co-pilot."
GPT-4o, specifically, is multimodal (Omni), capable of generating images and handling "trickery" questions with witty and fun responses. It also features Canvas for collaborative code iteration and enhancement.
Sometimes struggles with simple, precise tasks like counting letter occurrences due to underlying tokenization strategies.
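The letter-counting weakness follows directly from tokenization: the model sees token IDs for multi-character chunks, never individual letters. The following is a deliberately toy illustration with a made-up two-entry vocabulary, not a real BPE tokenizer.

```python
# Toy illustration (NOT a real tokenizer): the model receives token IDs
# for chunks like "straw" and "berry", so the individual 'r's are hidden.
toy_vocab = {"straw": 101, "berry": 102}

def toy_tokenize(word: str) -> list[str]:
    tokens, rest = [], word
    while rest:
        for piece in sorted(toy_vocab, key=len, reverse=True):
            if rest.startswith(piece):
                tokens.append(piece)
                rest = rest[len(piece):]
                break
        else:
            # Fall back to single characters for unknown text.
            tokens.append(rest[0])
            rest = rest[1:]
    return tokens

word = "strawberry"
print(toy_tokenize(word))  # ['straw', 'berry']
print(word.count("r"))     # 3 -- trivial in code, hard for an LLM that
                           # only sees the two tokens above
```

This is also why chain-of-reasoning models like o1-preview do better on such tasks: they can spell the word out token by token before counting.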
o1-preview (formerly Strawberry)
This model, originally codenamed Strawberry, is the strongest of OpenAI's models, currently available to paid subscribers but slated for wider release.
Uses a chain-of-reasoning approach, delivering more precise and accurate results in tasks like counting letters or solving analogies.
Shows improved reasoning and accuracy compared to standard GPT outputs, successfully solving problems that GPT-4o might miss.
Anthropic’s Claude Family
Includes Claude Haiku, Sonnet, and Opus, with Claude 3.5 Sonnet identified as the strongest due to recent versions surpassing older, more expensive models.
Effective for coding assistance, producing code in "artifacts" for easy sharing and version tracking. It offers clear explanations and is characterized by a charismatic, succinct, and pithy style.
Claude 3.5 Sonnet, released in October, has recently surpassed PhD-level proficiency in math, physics, and chemistry, leading in most benchmarks.
Strong on socio-ethical considerations and safety alignment, often pushing back on comparative claims against other AI models due to its ethical framework. However, it may err on tasks needing precise numeric or token-based counts, similar to other LLMs.
Google Gemini
The next-generation model, evolved from Bard, is now widely integrated into Google search responses.
Provides thorough, sometimes overly literal answers, often lacking the ability to understand subtle humor or nuance in queries like the "Hawaii to 17" example.
While maintaining factual accuracy, it is less creative or playful in humorous or abstract queries and also struggled with the letter counting task.
Cohere (Command R Plus)
A Canadian AI company, Cohere's model is well-known for its focus on technical rigor and domain expertise, leveraging specialized knowledge bases (RAG).
Produces well-structured and detailed answers, as seen in its response to philosophical questions, which was less expressive than Claude's but had more structure and substance. However, like many other models, it struggles with trivial counting tasks, reflecting limitations related to its training and inference mode.
Meta’s Llama Model
An open-source model, accessible via meta.ai, which supports interactive image generation (e.g., creating images for abstract concepts) and text responses.
Performs adequately on direct questions, but its overall power is noted as less than some other frontier models. It is also limited in nuanced tasks like accurate letter counting.
Perplexity
Functions primarily as an AI-powered search engine rather than a traditional LLM, capable of using other models but also having its own.
Excels at summarizing current events and demonstrates strong research capabilities by providing nuanced, well-crafted responses to real-time information, even when other models have knowledge cutoffs.
Notably, it successfully handles some precise counting tasks, placing it alongside O1 Preview in accuracy for such challenges. It pushes back on self-comparison questions, stating it lacks such capabilities.
Common Strengths and Weaknesses
Strengths
All models are astonishingly powerful, generating structured, well-reasoned responses to detailed technical and creative queries.
Effectively transform brief prompts or bullet points into comprehensive outputs like emails, code, slides, blog posts, and explanations.
Claude 3.5 Sonnet has demonstrated near “PhD-level” proficiency in math, physics, and chemistry, with other models expected to follow.
Weaknesses
Consistent challenges with precise token counting (e.g., counting the letter “A”) due to tokenization and input processing strategies.
Varying ability to understand humor, indirect references, and subtle context, often providing overly literal interpretations, highlighting the need for human oversight in critical uses.
Tend to not be as strong with highly specialized subject matter, particularly in specific business domains, lacking the expert knowledge of a human.
Some models, especially with knowledge cutoffs (e.g., GPT's October last year cutoff), may struggle with recent events or emerging topics, requiring supplemental tools like Perplexity for real-time insights.
Can exhibit "strange blind spots," getting questions wrong but confidently stating incorrect answers, a phenomenon known as hallucination.
Business and Strategic Insights for Senior Management
As LLM performance converges across models, cost and API rate limits are becoming primary differentiators, making efficient pricing and resource management key strategic factors.
Demonstrations of iterative code enhancements (e.g., GPT-4o with Canvas) show significant efficiency gains in problem-solving, debugging, and rapid prototyping, acting as a powerful "co-pilot."
Comparative experiments (e.g., the leadership challenge among models) reveal that while all are powerful, differences in personality, safety, and alignment can significantly impact their suitability for sensitive or client-facing projects.
Selecting the LLM that best fits specific business needs (considering factors like scalability, handling unstructured data, contextual understanding, and domain expertise) remains crucial for strategic implementation.
Insights on Future Directions
Upcoming sessions will cover foundational concepts such as transformer architectures, token mechanisms, context window management, and API cost considerations—all key for technological oversight and strategic planning.
Bridging technical details with commercial deployment strategies will support data-driven decisions in non-traditional industries.