TOC
Shared inspiration
The only way to do great work is to do what you love.
- 8 week course, starting July 28th, 2025
- Week 1 - Extended (Take your time with the practical exercises. Use the extra guides in Week 1 for a firmer footing and seek help via the platform, email, or LinkedIn (details in the GitHub repo).)
- Evaluate the 10 leading frontier and open-source models
- Implement open source projects based on what I am learning
- Maintain this blog, as a learning resource
- Add key skills to my resume
- Build real world, LLM powered applications
- Project - Run Ollama locally - Saad's notes
- Project #1 - CareerGPT-Backend (an enterprise-grade platform; evolving it into an e-commerce app would not be an architectural stretch)
- Service-to-service authentication via JWT (per-service keys, DB-backed)
- Trust boundary: all microservices sit behind the API Gateway, with identity-backend as the sole issuer
- Scope: no user-level roles yet; services authenticate each other for internal calls
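To make the per-service JWT idea concrete, here is a minimal, stdlib-only sketch of the HS256 signing scheme that underlies it. This is illustrative only: the service names and `SERVICE_KEYS` dict are hypothetical, and a real implementation would use a maintained library such as PyJWT with keys loaded from the database rather than hand-rolled crypto.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    if payload.get("exp", 0) < time.time():
        raise ValueError("expired")
    return payload

# identity-backend issues a short-lived token for log-service calling
# careergpt-backend; the per-service key would be DB-backed in practice.
SERVICE_KEYS = {"log-service": "s3cret-per-service-key"}
token = sign_jwt(
    {"iss": "identity-backend", "sub": "log-service",
     "aud": "careergpt-backend", "exp": time.time() + 300},
    SERVICE_KEYS["log-service"],
)
claims = verify_jwt(token, SERVICE_KEYS["log-service"])
```

Because identity-backend is the sole issuer, every downstream service only needs to verify the signature and claims, never to mint tokens itself.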
- Project #2 - A logging microservice (log-service), separate from careergpt-backend.
- Exposes a write-only API for client services to send structured logs.
- Can later include:
- Query API (for admin dashboards).
- Filters (by app, user, level).
- Stores logs in SQLite initially (upgradable to Postgres).
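The storage side of the plan above can be sketched with the stdlib `sqlite3` module. Function and column names here are assumptions, not the service's actual schema; the point is the split between a write-only entry point for client services and a later, admin-only query API with filters.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    app TEXT NOT NULL,
    level TEXT NOT NULL,
    message TEXT NOT NULL,
    context TEXT  -- arbitrary structured fields as JSON
)
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def write_log(conn, app: str, level: str, message: str, **context) -> None:
    # Write-only entry point: client services insert, never read.
    conn.execute(
        "INSERT INTO logs (ts, app, level, message, context) VALUES (?, ?, ?, ?, ?)",
        (time.time(), app, level, message, json.dumps(context)),
    )
    conn.commit()

def query_logs(conn, app=None, level=None):
    # Future admin-only query API with simple filters (by app, by level).
    sql, args = "SELECT app, level, message FROM logs WHERE 1=1", []
    if app:
        sql += " AND app = ?"
        args.append(app)
    if level:
        sql += " AND level = ?"
        args.append(level)
    return conn.execute(sql, args).fetchall()

store = open_store()
write_log(store, "careergpt-backend", "INFO", "user signed up", user_id=42)
write_log(store, "identity-backend", "ERROR", "token expired")
```

Swapping SQLite for Postgres later mostly means replacing `open_store` and the DB-API driver; the API surface stays the same.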
- Project #3 - Identity Service
- Identity-Backend is Trust Anchor
- Service-to-Service Auth Model
- Identity-Backend - Why this isn't secure
- RBAC
- FSM: guest → email collected → verified → subscribed
- State transitions trigger role changes and policy updates, for example premium API access for paid subscribers
- FSM persisted in the DB: user states, state transitions tables, etc.
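A minimal sketch of that user-lifecycle FSM, with transitions driving role changes. The state names follow the bullets above, but the role names and table comments are illustrative assumptions, not the actual design.

```python
# Allowed lifecycle transitions; anything else is rejected.
TRANSITIONS = {
    "guest": {"email_collected"},
    "email_collected": {"verified"},
    "verified": {"subscribed"},
    "subscribed": set(),
}

# Role granted on entering each state (this is what drives policy updates,
# e.g. unlocking the premium API for paid subscribers).
ROLE_FOR_STATE = {
    "guest": "anonymous",
    "email_collected": "lead",
    "verified": "member",
    "subscribed": "premium",
}

def advance(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    # In the real service, this change would be persisted to the
    # user_states / state_transitions tables.
    return new_state

state = "guest"
for step in ("email_collected", "verified", "subscribed"):
    state = advance(state, step)
```

Keeping the transition table as data (rather than scattered `if` statements) makes it easy to store and audit in the DB.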
- External SSO and Federation
- OpenID Connect and SAML 2.0, allowing users to authenticate with an external IdP
- Identity-backend exchanges the external token -> issues an internal JWT with the proper claims
- Zero trust alignment
- Trust only identity-backend; validate every request
- Enterprise hardening and zero trust roadmap (see ChatGPT)
- …
- Project #4 - Separate data science lab, using Jupyter, from careergpt-backend
- Project #5 - Integrate Identity, and Logging services with CareerGPT-Backend
- Project - Web Scraper Agent
- Project #1 - AI-powered brochure generator that scrapes and navigates assets, such as a company website
- Project #2 - Support agent for an airline with UI and remote procedure calls
- Project #3 - Create meeting minutes from audio using both open and closed source models
- Project #4 - AI that converts Python code to optimized C++, boosting performance by up to 60,000x
- Project #5 - Build AI knowledge-worker using RAG to become an expert on all company-related matters
- Project #6 - Predict product prices from short descriptions using Frontier models
- Project #7 - Run a fine-tuned open-source model to compete with Frontier models in price prediction
- Project #8 - Build an autonomous multi-agent system in which models collaborate to spot deals and notify you of special bargains.
- Inside the LLM - practical, hands-on take on transformers theory
- RAG
- LoRA
- AI Agents
- AI in Production - deploy LLMs and Agents for scale, resiliency, and security
Monday - July 28th, 2025
- Environment setup completed.
- Process: Ollama - run an LLM locally
- Experience calling the OpenAI API for a frontier model.
- High-level understanding of system vs. user prompts.
- Build your data science lab environment
- End of day notes: Daily - Saad's notes
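The "system vs. user prompts" idea from today boils down to the message structure sent to the API. Here is a hedged sketch: the `build_messages` helper is my own illustrative function, and the commented-out call shows the general shape of an OpenAI SDK request (it assumes the `openai` package and an `OPENAI_API_KEY`, and the model name is just an example).

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    # The system prompt sets the model's behavior and persona;
    # the user prompt carries the actual request.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a concise assistant that answers in one sentence.",
    "Summarize what a system prompt does.",
)

# With the openai package installed and OPENAI_API_KEY set, a call
# would look roughly like this (model name is an example):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
# print(response.choices[0].message.content)
```

Separating the two roles up front makes it easy to reuse one system prompt across many user prompts.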
Tuesday - July 29th, 2025
- How much Python do you need to know? Intermediate; for example, can you say what is wrong with:
- yield from set(book.get("author") for book in books if book.get("author"))
- If you understand this, you are well-prepared. If you know why it might not be optimal, you are advanced.
- If terms like yield, set, or .get are unfamiliar, a dedicated Jupyter notebook in Week 1 provides a Python guide at the required level, taking you through stepping stones until such a line makes sense.
- Three Pillars of LLM Engineering Covered:
- Models: Understanding the spectrum of LLMs (open-source, closed-source, multimodal, capable of generating images or audio), their architectures, and selecting the right one for a task.
- Tools: Practical use of Hugging Face, LangChain (as glue code), Gradio, Weights & Biases, and Modal for deployment.
- Techniques: Applying APIs, Retrieval-Augmented Generation (RAG), fine-tuning, and building full agentic AI solutions.
- The course covers both closed-source (frontier) and open-source models. "Frontier models" are the LLMs pioneering what is possible today: the largest, most capable models. The term usually refers to closed-source, paid models, but sometimes also to the biggest, strongest open-source models, depending on the context.
- Closed-Source Frontier Models (hyperscalers) - These are the largest, highest-scale, paid models from major AI companies.
- GPT (OpenAI): ChatGPT, released in late 2022, caught everyone off guard with its power and brought generative AI to mainstream attention.
- Claude (Anthropic): A primary competitor to OpenAI, often considered neck and neck with GPT in leaderboards (with Claude currently having a slight edge) and usually favored by data scientists.
- Gemini (Google): Google's entrant in the frontier model space; Google also offers Gemma, an open-source variant.
- Command R (Cohere): A model from Cohere, a Canadian AI company.
- Perplexity: A search engine that can use other models but also has its own model.
- Open-Source Models - These are freely available models for use and modification.
- Llama (Meta): The most famous open-source model, as Meta paved the way by open-sourcing the original Llama 1.
- Mixtral (Mistral AI): A "mixture of experts" model from the French company Mistral AI, containing multiple smaller models.
- Qwen (Alibaba Cloud): A powerhouse model, super impressive and very powerful for its size, which will be used from time to time in the course.
- Gemma (Google): Google's smaller, open-source model.
- Phi (Microsoft): Microsoft's smaller, open-source model.
- Three Core Methods for Using LLMs
- Chat Interfaces: for example, ChatGPT
- Cloud APIs: Abstract away the complexity of operationalizing an LLM, allowing you to focus on a unified API
- Local Execution: Using tools like Ollama, which run models through optimized, compiled C++ code (via llama.cpp) for efficient local inference. This allows running models on your own box, but offers less control over internal operations because the code is fully compiled.
- Rationale for Using Ollama
- Advantages:
- Cost-Free: No API charges; it's open-source, free, and runs locally on your machine.
- Data Privacy: All data stays on your local machine, which is critical for confidential data that absolutely must not go to the cloud or leave your network.
- Disadvantage:
- Performance: Local open-source models are generally smaller and less powerful than paid frontier models (which are many times larger and more powerful), so the expected result quality may be lower.
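Calling a locally running Ollama model can be sketched against its REST endpoint (Ollama serves on `localhost:11434` by default). The model name `llama3.2` is an example, not a requirement, and the actual `generate` call only works if Ollama is installed and the model has been pulled; only the request-building helper runs here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a local Ollama install with the model pulled, e.g. `ollama pull llama3.2`:
# print(generate("llama3.2", "In one sentence, why run models locally?"))
```

Because the endpoint is local, nothing in the prompt or response ever leaves the machine, which is exactly the privacy advantage listed above.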
Day 3 - Wednesday - July 30th, 2025
Overview of Frontier LLM Models
• The session explored frontier large language models (LLMs), examining their strengths, weaknesses, and potential business applications, with the goal of providing a true intuition for their differences and commercial applicability.
• Model selection was framed as a strategic decision, considering technical capabilities, contextual nuances, pricing, and domain-specific expertise, with cost and API rate limits increasingly becoming primary differentiators as model performance converges.
Key Models and Their Characteristics
OpenAI Models
GPT (including GPT-4o)
Known for generating structured, well-researched, and nuanced responses, often with introductions and summaries.
Excels in creative tasks such as transforming bullet points into emails, slides, or blog posts, and is remarkably good at writing and debugging complex code, often acting as a "co-pilot."
GPT-4o, specifically, is multimodal (Omni), capable of generating images and handling "trickery" questions with witty and fun responses. It also features Canvas for collaborative code iteration and enhancement.
Sometimes struggles with simple, precise tasks like counting letter occurrences due to underlying tokenization strategies.
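The letter-counting weakness follows directly from tokenization: the model sees token IDs for multi-character chunks, never individual letters. The following is a deliberately toy illustration with a made-up two-entry vocabulary, not a real BPE tokenizer.

```python
# Toy illustration (NOT a real tokenizer): the model receives token IDs
# for chunks like "straw" and "berry", so the individual 'r's are hidden.
toy_vocab = {"straw": 101, "berry": 102}

def toy_tokenize(word: str) -> list[str]:
    tokens, rest = [], word
    while rest:
        for piece in sorted(toy_vocab, key=len, reverse=True):
            if rest.startswith(piece):
                tokens.append(piece)
                rest = rest[len(piece):]
                break
        else:
            # Fall back to single characters for unknown text.
            tokens.append(rest[0])
            rest = rest[1:]
    return tokens

word = "strawberry"
print(toy_tokenize(word))  # ['straw', 'berry']
print(word.count("r"))     # 3 -- trivial in code, hard for an LLM that
                           # only sees the two tokens above
```

This is also why chain-of-reasoning models like o1-preview do better on such tasks: they can spell the word out token by token before counting.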
o1-preview (formerly Strawberry)
This model, originally codenamed Strawberry, is the strongest of OpenAI's models, currently available to paid subscribers but slated for wider release.
Uses a chain-of-reasoning approach, delivering more precise and accurate results in tasks like counting letters or solving analogies.
Shows improved reasoning and accuracy compared to standard GPT outputs, successfully solving problems that GPT-4o might miss.
Anthropic’s Claude Family
Includes Claude Haiku, Sonnet, and Opus, with Claude 3.5 Sonnet identified as the strongest due to recent versions surpassing older, more expensive models.
Effective for coding assistance, producing code in "artifacts" for easy sharing and version tracking. It offers clear explanations and is characterized by a charismatic, succinct, and pithy style.
Claude 3.5 Sonnet, released in October, has recently surpassed PhD-level proficiency in math, physics, and chemistry, leading in most benchmarks.
Strong on socio-ethical considerations and safety alignment, often pushing back on comparative claims against other AI models due to its ethical framework. However, it may err on tasks needing precise numeric or token-based counts, similar to other LLMs.
Google Gemini
The next-generation model, evolved from Bard, is now widely integrated into Google search responses.
Provides thorough, sometimes overly literal answers, often lacking the ability to understand subtle humor or nuance in queries like the "Hawaii to 17" example.
While maintaining factual accuracy, it is less creative or playful in humorous or abstract queries and also struggled with the letter counting task.
Cohere (Command R Plus)
A Canadian AI company, Cohere's model is well-known for its focus on technical rigor and domain expertise, leveraging specialized knowledge bases (RAG).
Produces well-structured and detailed answers, as seen in its response to philosophical questions, which was less expressive than Claude's but had more structure and substance. However, like many other models, it struggles with trivial counting tasks, reflecting limitations related to its training and inference mode.
Meta’s Llama Model
An open-source model, accessible via meta.ai, which supports interactive image generation (e.g., creating images for abstract concepts) and text responses.
Performs adequately on direct questions, but its overall power is noted as less than some other frontier models. It is also limited in nuanced tasks like accurate letter counting.
Perplexity
Functions primarily as an AI-powered search engine rather than a traditional LLM, capable of using other models but also having its own.
Excels at summarizing current events and demonstrates strong research capabilities by providing nuanced, well-crafted responses to real-time information, even when other models have knowledge cutoffs.
Notably, it successfully handles some precise counting tasks, placing it alongside O1 Preview in accuracy for such challenges. It pushes back on self-comparison questions, stating it lacks such capabilities.
Common Strengths and Weaknesses
Strengths
All models are astonishingly powerful, generating structured, well-reasoned responses to detailed technical and creative queries.
Effectively transform brief prompts or bullet points into comprehensive outputs like emails, code, slides, blog posts, and explanations.
Claude 3.5 Sonnet has demonstrated near “PhD-level” proficiency in math, physics, and chemistry, with other models expected to follow.
Weaknesses
Consistent challenges with precise token counting (e.g., counting the letter “A”) due to tokenization and input processing strategies.
Varying ability to understand humor, indirect references, and subtle context, often providing overly literal interpretations, highlighting the need for human oversight in critical uses.
Tend to not be as strong with highly specialized subject matter, particularly in specific business domains, lacking the expert knowledge of a human.
Some models, especially with knowledge cutoffs (e.g., GPT's October last year cutoff), may struggle with recent events or emerging topics, requiring supplemental tools like Perplexity for real-time insights.
Can exhibit "strange blind spots," getting questions wrong but confidently stating incorrect answers, a phenomenon known as hallucination.
Business and Strategic Insights for Senior Management
As LLM performance converges across models, cost and API rate limits are becoming primary differentiators, making efficient pricing and resource management key strategic factors.
Demonstrations of iterative code enhancements (e.g., GPT-4o with Canvas) show significant efficiency gains in problem-solving, debugging, and rapid prototyping, acting as a powerful "co-pilot."
Comparative experiments (e.g., the leadership challenge among models) reveal that while all are powerful, differences in personality, safety, and alignment can significantly impact their suitability for sensitive or client-facing projects.
Selecting the LLM that best fits specific business needs (considering factors like scalability, handling unstructured data, contextual understanding, and domain expertise) remains crucial for strategic implementation.
Insights on Future Directions
Upcoming sessions will cover foundational concepts such as transformer architectures, token mechanisms, context window management, and API cost considerations—all key for technological oversight and strategic planning.
Bridging technical details with commercial deployment strategies will support data-driven decisions in non-traditional industries.