QA - Run #1 - Results

# QA Report for Tag QA_July2025_A

- **Tag/Commit:** QA_July2025_A (`f32abc1`)
- **Date:** 2025-08-01
- **QA Outcomes:** All tests passed except for file upload (see below).
- **Environment:** Windows 10, Python 3.12, requirements.txt as of tag
- **Notes:** See log for detailed results.
## Issue Found
- **Step:** 
- **Bug:** 
- **Log excerpt:**
- **Screenshot:** 

For this QA run, we begin from a pristine state:

  1. Stop all services.
  2. Delete logs.db and jobs.db.
  3. Delete all files in the doc_store folder.
  4. Start all services.
  5. Verify: the index page should show no data, and a query should return a "no data found" error.
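
Steps 2 and 3 can be scripted so that every run starts from the same state. The following is a minimal sketch, assuming logs.db, jobs.db, and doc_store all sit under one base directory; the function name `reset_state` and that layout are assumptions, not taken from the repo. Services must already be stopped before it runs.

```python
from pathlib import Path


def reset_state(base_dir: Path) -> list[str]:
    """Delete logs.db, jobs.db, and every file in doc_store; return what was removed."""
    removed = []
    for name in ("logs.db", "jobs.db"):
        db_path = base_dir / name
        if db_path.exists():
            db_path.unlink()
            removed.append(name)
    doc_store = base_dir / "doc_store"
    if doc_store.is_dir():
        # Remove files only; the folder itself is kept so the services can reuse it.
        for item in sorted(doc_store.iterdir()):
            if item.is_file():
                item.unlink()
                removed.append(f"doc_store/{item.name}")
    return removed
```

Calling `reset_state(Path("."))` from the project root would cover the manual steps; a second call returns an empty list, which is a quick way to confirm the state is clean.
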

Next, upload new1.txt, and then new2.txt.

Issue the query: "can you summarize all of the todo items you can find, seperating each one on a single line, with due date first, then title, and description with any details you can augment"

Expected Answer:
- August 1, 2025: Respond to Amira and discuss weekend plans
- August 3, 2025: Go to Somesh class and finish taking notes for the week

## Issue Found
2025-08-03 | Issue-Critical: One login page is down
- **Step:** After changing the .env and cPanel secret values to non-defaults, and introducing code that attempts to safeguard by terminating the app, the app was hosed and One Login no longer worked.
- **Bug:** 
- **Log excerpt:**
- **Screenshot:** 
Solution:
Using code base labels:
identity-backend
- QA_July2025_A_Checkpoint1 - read more about Git labels
echo-private
- QA_July2025_A
- QA_July2025_A_Checkpoint1












1. Issue with toggling between the OpenAI API and locally hosted Ollama:

## Issue Found
- **Step:** Upload > Query with model=Ollama
- **Bug:** Ollama misses action item “Amira”
- **Log excerpt:** (paste relevant logs)
- **Screenshot:** (if available)


Details: Link

2. Issue with running worker.py: jobs table missing

- **Step:** Upload > Query with model=Ollama
- **Bug:** Ollama misses action item “Amira”
- **Log excerpt:** (paste relevant logs)
- **Screenshot:** (if available)

https://saadazizai.blogspot.com/2025/08/issue-running-python-workerpy-from.html

The current approach is brittle; a better solution is needed (TBD).
## Bug Fix: Unable to start service after data purge
- **Action:** No fix.
- **Outcome:** Use workaround, start services in order: 
 -- logging_service.py
 -- parser_service.py
 -- api_gateway.py
 -- worker.py 
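
The ordered-start workaround can be captured in a small launcher so the sequence is not typed by hand each time. This is only a sketch: the script names come from the list above, while the function name, the injectable `launch` callable, and the two-second pause are assumptions chosen for illustration.

```python
import subprocess
import sys
import time

# Order taken from the workaround above.
SERVICE_ORDER = ["logging_service.py", "parser_service.py", "api_gateway.py", "worker.py"]


def start_in_order(scripts=SERVICE_ORDER, launch=None, delay_seconds=2.0):
    """Start each script in sequence, pausing between launches."""
    if launch is None:
        # Default: run each script with the current Python interpreter.
        launch = lambda script: subprocess.Popen([sys.executable, script])
    started = []
    for script in scripts:
        started.append(launch(script))
        # Crude fixed wait; polling a health endpoint would be more reliable.
        time.sleep(delay_seconds)
    return started
```

Running `start_in_order()` from the folder that holds the four scripts starts them in the documented order. The fixed delay does not address the underlying dependency on the jobs table; it only makes the workaround repeatable.
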
## cPanel
A full multi-microservice app like Echo (with a different port for each service) will not run natively on most shared cPanel setups. Read more:
https://saadazizai.blogspot.com/2025/08/cpanel-architectural-limitations-single.html

3. Code issues: Log consistency, and verbosity

## Issue Found
- **Step:** 
- **Bug:** 
- **Log excerpt:**
- **Screenshot:** 
1) First, change the query endpoint so that it propagates our changes and routes to the backend properly (similar to query-ui).
2) Second, abstract away redundant code, for example between query and query-ui. Possibly related: why is code repeated here, and why must services start in a specific order or the worker script crashes with a jobs SQL error?
3) Last, make sure everything is logged internally through our global logging service, for consistency and DRY code.

4. Prompt issues - OpenAI wants specificity (todo: date, day, time)

## Issue Found
- **Step:** 
- **Bug:** 
- **Log excerpt:**
- **Query:** list each todo item across all documents, make sure to include date, day, time details
- **Screenshot:** 

Notice, the query here becomes: "list each todo item across all documents"

Read more: Link



5. Prompt issues - OpenAI is unable to remove todo items whose due dates have passed

6. August 3rd, 12:51 AM (Sunday) - End of day notes

identity-backend - core service

High-level review of your identity-backend MVP security as implemented: what’s fine for MVP, and what’s risky or needs improvement for production.

We’ll focus on:

  • OAuth2/OIDC flow basics

  • Token security

  • Credentials and secrets management

  • Surface area for leaks/bypass

  • Logging & auditability

  • Common attack surface

1. What you’ve implemented — High Level (MVP):

A. OAuth2 Authorization Code Flow (for browser-based login)

  • /authorize — accepts client_id, redirect_uri, etc. and renders login page

  • /login — verifies user (static username/password), issues one-time code, redirects with ?code=...

  • /token — exchanges code for JWT id_token (acting as access_token), with single-use code and client credentials check

B. JWT Service-to-Service & User Token

  • Tokens are short-lived (exp), signed, and audience/issuer-checked

  • Client secrets are enforced (basic security for /token endpoint)

  • JWT secret/issuer are loaded from config/env

C. Database

  • Auth codes stored in SQLite (authcodes.db), single-use, deleted after redemption
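
The single-use behaviour described above can be sketched with the standard library's sqlite3 module. This is not the project's actual schema: the table name `auth_codes`, its columns, and the function names are assumptions for illustration. The key idea is that redemption is a single DELETE, so a code can succeed at most once.

```python
import secrets
import sqlite3
import time


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS auth_codes ("
        "code TEXT PRIMARY KEY, client_id TEXT NOT NULL, expires_at REAL NOT NULL)"
    )


def issue_code(conn, client_id, ttl_seconds=60, now=None):
    """Store a random one-time code bound to a client and return it."""
    now = time.time() if now is None else now
    code = secrets.token_urlsafe(32)
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO auth_codes VALUES (?, ?, ?)",
            (code, client_id, now + ttl_seconds),
        )
    return code


def redeem_code(conn, code, client_id, now=None):
    """Return True exactly once for a valid, unexpired code of this client."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM auth_codes WHERE code = ? AND client_id = ? AND expires_at > ?",
            (code, client_id, now),
        )
    # rowcount is 1 only if the code existed, matched the client, and had not expired.
    return cur.rowcount == 1
```

Expired rows are not purged by this sketch; a periodic cleanup would be needed in a longer-running deployment.
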

D. Logging

  • Custom unified logging — both to stderr (cPanel) and external service via JWT-secured POST

E. Config & Secrets

  • Uses .env and config.py for secrets and environment variables (with dev fallbacks)

F. CORS, XSS, CSRF

  • No explicit handling shown (common for MVP, but more later...)


2. MVP Security: What’s Acceptable

For an MVP, you meet many basic best practices:

  • One-time codes for auth (prevents replay attacks).

  • JWTs are short-lived, signed, with standard claims (iss, aud, exp, iat, etc.).

  • Secrets configurable by environment.

  • Logging includes warnings and info for all critical events (auth attempt, code issued, login fail, etc.).

  • Hardcoded credentials are acceptable for quick local testing only.

  • Session/cookies only used to store state between authorize/login (not used for access control in backend).


3. What’s NOT Secure for Production (Needs Attention Soon)

A. Hardcoded credentials

  • USERNAME = "username"; PASSWORD = "password"
    Why: Obvious, but even for MVP, this is hackable in seconds.
    How to Fix: At minimum, move to .env, or better, load from hashed DB table.
    Must fix for any external demo!
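
As an illustration of the "hashed DB table" option, here is a minimal sketch using only the standard library (PBKDF2 via hashlib). The function names and the iteration count are assumptions; a dedicated password-hashing library may be a better fit in practice.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune for your hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest); store both instead of the plaintext password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=ITERATIONS):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

The /login handler would then look up the stored salt and digest for the submitted username and call `verify_password`, rather than comparing against constants in code.
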


B. Secrets in Code

  • app.secret_key = "dev-secret"

  • JWT_SECRET_KEY = os.getenv("JWT_SECRET_KEY", "dev-secret")
    Why: "dev-secret" in code means anyone with access can forge tokens.
    How to Fix: Set all secrets only via environment variables.
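
One way to drop the in-code fallback is to fail loudly when a secret is missing or still set to a known dev value. A minimal sketch; the helper name `require_secret` and the set of rejected defaults are assumptions:

```python
import os

# Values that must never be accepted outside local development.
KNOWN_DEV_DEFAULTS = {"dev-secret"}


def require_secret(name, environ=None):
    """Return the secret from the environment, or raise if unset or a dev default."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value or value in KNOWN_DEV_DEFAULTS:
        raise RuntimeError(f"{name} must be set to a non-default value")
    return value
```

Usage would be `JWT_SECRET_KEY = require_secret("JWT_SECRET_KEY")` at startup. A hard failure like this resembles the safeguard mentioned in the 2025-08-03 issue, so it is worth confirming that the deployed environment (for example the cPanel variables) is set before enabling it.
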


C. No HTTPS Enforcement (in code)

  • No @app.before_request to enforce HTTPS.
    Why: Without HTTPS, tokens and auth codes leak in transit (easy MitM).
    Fix: Always run behind HTTPS proxy.
    MVP OK if running only on localhost/behind trusted firewall.


D. Weak Redirect URI Check

  • Only checks startswith(); can be tricked with URLs like http://localhost:5000.evil.com/callback
    How to Fix: Parse and strictly match hostname/scheme.
    MVP OK if you fully control the client and are not exposing to third parties.
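
A strict check can parse the URI and compare its components against a registered list instead of using startswith(). The sketch below uses urllib.parse from the standard library; the function name and the `REGISTERED` tuple are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Assumption: the single callback registered for the local client.
REGISTERED = ("http://localhost:5000/callback",)


def is_allowed_redirect(uri, registered=REGISTERED):
    """Accept only URIs whose scheme, host:port, and path match a registered entry."""
    try:
        got = urlsplit(uri)
    except ValueError:
        return False
    if got.query or got.fragment:
        return False  # strict: no extra query string or fragment
    for allowed in registered:
        want = urlsplit(allowed)
        if (got.scheme, got.netloc.lower(), got.path) == (
            want.scheme,
            want.netloc.lower(),
            want.path,
        ):
            return True
    return False
```

With this check, http://localhost:5000.evil.com/callback is rejected because its host-and-port component is not exactly localhost:5000, even though the string starts with the registered prefix.
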


E. No CSRF/XSRF Protection

  • /login form posts with no CSRF token.
    Risk: Session fixation, CSRF in browser flows.
    Fix: Flask-WTF CSRF or similar.


F. Session Storage

  • Using Flask session for client_id, etc., with a dev secret (not ideal, but not a major risk for MVP, as it’s not auth session).


G. Audit Logging

  • Logging is great, but watch for accidental logging of secrets, passwords, or tokens.
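
One way to guard against accidental leaks is to pass log fields through a redaction step before they reach the logger. A minimal sketch; the field names and helper names are assumptions and would need to match the fields the services actually log.

```python
import hashlib

# Assumed field names; adjust to what the services actually log.
SECRET_KEYS = {"password", "client_secret"}
TOKEN_KEYS = {"code", "id_token", "access_token"}


def fingerprint(value):
    """Short SHA-256 prefix: enough to correlate log lines, not to recover the token."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact(fields):
    """Return a copy of the log fields that is safe to write out."""
    safe = {}
    for key, value in fields.items():
        if key.lower() in SECRET_KEYS:
            safe[key] = "[REDACTED]"  # low-entropy secrets are never hashed
        elif key.lower() in TOKEN_KEYS:
            safe[key] = fingerprint(str(value))
        else:
            safe[key] = value
    return safe
```

Passwords are replaced outright rather than hashed, because an unsalted hash of a weak password can be guessed offline.
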


H. Missing: Rate limiting, lockout on login failure, brute-force protection

  • No account lockout or rate limiting on /login.
    Risk: Password brute-forcing.
    Fix: Flask-Limiter or similar, especially for public deployments.
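
To show the idea behind lockout, here is a small in-memory sketch using only the standard library. It is not Flask-Limiter and does not share state across processes; the class name and thresholds are assumptions.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Block a key (e.g. username or IP) after too many recent failures."""

    def __init__(self, max_failures=5, window_seconds=300, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = defaultdict(deque)

    def _recent(self, key):
        # Drop failures that have aged out of the sliding window.
        cutoff = self.clock() - self.window_seconds
        attempts = self._failures[key]
        while attempts and attempts[0] <= cutoff:
            attempts.popleft()
        return attempts

    def is_blocked(self, key):
        return len(self._recent(key)) >= self.max_failures

    def record_failure(self, key):
        self._recent(key).append(self.clock())

    def record_success(self, key):
        self._failures.pop(key, None)
```

In /login, the handler would check `is_blocked(...)` before verifying the password, then call `record_failure` or `record_success` depending on the result.
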


I. JWT Key Rotation

  • No support for rotating JWT signing keys or for exposing public keys for verification.

  • MVP: Fine if all services are under your control.


J. Error Handling/Info Leaks

  • Some errors could reveal sensitive info (e.g., “Invalid client_secret for X”).

  • Avoid echoing which fields were wrong in production.


4. What Looks Professional / You Won’t Look Like an Idiot

This MVP is better than most “first time” attempts:

  • You’re actually following the real OAuth2 flow.

  • Codes are single-use and stored in a DB.

  • JWT tokens have proper claims, exp, etc.

  • Logging is robust.

  • Configs are centralized.

You’ll look pro IF you:

  • Immediately fix hardcoded credentials.

  • Never use "dev-secret" in prod.

  • Make clear in your docs (README is decent!) what’s intentionally simplified for MVP and what’s a must-fix before launch.

  • Bonus: Add a comment/TODO in code at each “dangerous” place. (e.g., # TODO: Move credentials to secure DB before prod)


5. Summary Table

| Area | MVP Grade | Production Grade | Must-Fix for External Demo? | Note |
|---|---|---|---|---|
| Hardcoded creds | OK | FAIL | Yes | Move to .env or DB now |
| JWT secret in code | OK | FAIL | Yes | Use env var, never code |
| HTTPS only | OK | FAIL | Yes | Use behind proxy, never expose HTTP |
| Code single-use | PRO | OK | No | Use short TTL in DB (OK for now) |
| Redirect check | MVP | OK/FAIL | Only if 3rd party clients | Use strict parse for prod |
| CSRF on forms | OK | FAIL | Only if browser UI exposed | Flask-WTF for prod |
| Logging | PRO | PRO | No | Good central logging |
| Rate limit/lockout | OK | FAIL | Only if open on web | Add Flask-Limiter |

6. Bottom Line for MVP Demo

  • DO NOT share the public URL with anyone outside your team unless you’ve set real secrets and hidden the creds.

  • List “known shortcuts for MVP” in your docs to show you’re aware (this always looks good to reviewers and engineers).


Client Application review

Great, let's review your client app’s security posture and architecture.
You’ve built a Flask-based API Gateway/UI (plus Worker, Parser, Logging microservices), now enforcing login using your identity-backend. I’ll focus on:

  • Where you nailed the MVP (good practices, clear wins)

  • What is a liability, or will make you look unprepared to a reviewer

  • The flow from login to API use—holes, mitigations, and “easy wins”

  • Actionable advice (“MVP fine” vs “fix ASAP”)


1. High-Level Review: OIDC/OAuth2 Flow in Client

How it works:

  • /login → redirects user to identity-backend /authorize

  • /callback → gets an auth code from OIDC, exchanges for id_token via /token at the identity service (with client_secret and client_id)

  • Stores id_token (JWT) in session

  • All protected routes call require_login() which checks/validates token before rendering anything private (e.g., /, /upload, /query-ui)

  • Logout clears session


2. What’s Strong / Looks Professional (MVP)

  • Uses a real OIDC-like flow—code exchange, client_id, client_secret, redirect_uri all present

  • State parameter is generated and validated to prevent CSRF on OIDC flow

  • Audience, issuer, expiry checks on JWT (even if signature verify is skipped for now)

  • Logs every step—including token hashes (not raw tokens!), source IP, and UA—showing traceability and audit effort

  • Session secret is configurable and not hardcoded (FLASK_SECRET_KEY)

  • Environment-based config for secrets, API keys, etc.

  • Centralized logging utility (nice touch)

  • Logs endpoint is clearly marked as public and temporary, with a warning in code
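
The state-parameter check noted above usually follows the pattern sketched here: generate a random value before redirecting, keep it in the session, and compare it once on callback. The function names and the `oauth_state` session key are assumptions; a plain dict stands in for the Flask session.

```python
import hmac
import secrets


def begin_login(session):
    """Create a fresh state value, remember it in the session, and return it."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    return state


def check_state(session, returned_state):
    """Validate the state from the callback; each stored value is usable once."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned_state:
        return False
    # Constant-time comparison on bytes avoids timing leaks and non-ASCII errors.
    return hmac.compare_digest(expected.encode("utf-8"), returned_state.encode("utf-8"))
```

Popping the stored value means a replayed callback with the same state fails the second time.
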


3. What’s MVP Only / Needs Attention Before Launch

A. Token Signature Verification

  • Current: Skips signature verification (options={"verify_signature": False} in jwt.decode)

    • Why risky: Anyone can forge a token with correct claims

    • How to fix: Always verify signature with the real secret (available in all your services)

    • MVP ok for dev/test, but will look “rookie” if left before review
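
To make the risk concrete, the sketch below builds and checks HS256 tokens with the standard library only. It is an illustration of what signature verification does, not a replacement for a JWT library, and it deliberately omits exp, aud, and iss checks. All function names are assumptions.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str) -> Optional[dict]:
    """Return the claims only if the signature matches; otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        return json.loads(_b64url_decode(payload_b64))
    except ValueError:  # covers bad structure, base64, JSON, and encoding errors
        return None
```

A token signed with any other secret carries perfectly valid-looking claims, which is exactly what a decoder with verification disabled would accept; `verify_hs256` returns None for it.
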

B. Session Security

  • Flask session cookie is signed. HttpOnly is already on by default (SESSION_COOKIE_HTTPONLY = True), but Secure is off by default.

    • Set SESSION_COOKIE_SECURE = True when running behind HTTPS, and keep SESSION_COOKIE_HTTPONLY = True.
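
As a configuration sketch, the cookie settings can be collected in one place. The keys are Flask's own configuration names; the dict name is an assumption, and `"Lax"` is one reasonable SameSite choice rather than a requirement.

```python
# Flask configuration keys; apply with app.config.update(SESSION_COOKIE_SETTINGS).
SESSION_COOKIE_SETTINGS = {
    "SESSION_COOKIE_SECURE": True,     # send the cookie over HTTPS only
    "SESSION_COOKIE_HTTPONLY": True,   # Flask's default, stated explicitly
    "SESSION_COOKIE_SAMESITE": "Lax",  # limit cross-site sends of the cookie
}
```

With SESSION_COOKIE_SECURE enabled, the session will not be sent over plain HTTP, so local development on http://localhost may need it turned off through an environment-specific override.
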

C. Client Secret Handling

  • Secret is in .env (good for MVP), but do not check this into git

  • No brute-force or rate-limiting on code/token exchange; risk is lower in MVP, but consider Flask-Limiter in prod

D. JWT in Session Only

  • This means APIs you expose to browser clients are as secure as your session—fine, but if you add APIs intended for pure API access, you’ll want to accept JWT in the Authorization header and verify it.
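
If pure API access is added later, the first step is pulling the token out of the Authorization header before verifying it. A minimal sketch; the function name is an assumption, and the extracted token must still go through full signature and claim verification.

```python
def extract_bearer_token(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not authorization_header:
        return None
    parts = authorization_header.split()
    # Expect exactly two parts: the scheme (case-insensitive) and the token.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]
```

In a Flask route this would typically be called with `request.headers.get("Authorization")`.
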

E. Logs Endpoint

  • /logs and /logs.json are public and marked as such—do not forget to restrict this in prod.

    • Add a check like if not is_logged_in(): return redirect(...)

    • Leaving logs public is a classic "rookie mistake" in prod.

F. CSRF

  • State param is handled on login/callback (great!)

  • No CSRF tokens for /upload, /query-ui forms—fix if allowing non-API client use

G. Token Scope and Claims

  • No fine-grained authorization (just authentication)

  • No RBAC, but that’s fine for MVP

H. OIDC Redirect URI Security

  • Only one allowed redirect URI (http://localhost:5000/callback), and is checked, but just as string.

    • If user-supplied, parse with urllib and match hostname/protocol, not just startswith

I. Error Handling/Info Leaks

  • Error pages can leak too much info in debug—wrap in custom error handler for prod

J. OpenAI Key Management

  • Reads from env, but never expose this to browser clients (yours is safe as backend only)


4. “Showstopper” Issues to Fix Before Going Public

  • Token signature must be verified in verify_id_token() (high severity if not fixed)

  • Remove/secure the logs endpoint

  • Set session cookie secure/httponly

  • Remove any hardcoded/test secrets before git push/public deploy


5. Nice-to-Have Improvements (for Reviewer/Professional Polish)

  • Add SESSION_COOKIE_SAMESITE = 'Lax' or 'Strict' to prevent cross-site session attacks

  • Use .env for all secrets, never fallback to “dev-” defaults in prod

  • Add rate-limiting on login/token/callback endpoints

  • Implement proper error page templates for 400/401/500s

  • Add X-Content-Type-Options: nosniff, X-Frame-Options: DENY headers for bonus security


6. Your Actual Flow: Any Holes?

  • All protected endpoints go through require_login(): good

  • Session-stored JWT means browser users can't access APIs directly (by design)—safe for web UIs

  • If running only on localhost or dev, current setup is fine.

  • If deploying externally, you must enforce HTTPS and verify JWT signatures everywhere


7. Checklist Table

| Area | MVP Grade | Prod Grade | Must-Fix for Review/Demo | Notes |
|---|---|---|---|---|
| OIDC code flow | PRO | PRO | No | Well-implemented, clear state mgmt |
| JWT signature verify | OK | FAIL | Yes | Remove verify_signature: False |
| Logs endpoint public | OK | FAIL | Yes | Restrict before prod |
| Session cookie security | OK | OK | Yes | Add Secure/HttpOnly/SameSite |
| .env secret mgmt | PRO | PRO | No | Good, just don’t check into git |
| CSRF tokens on POST | OK | FAIL | If browser upload used | Use Flask-WTF if necessary |
| Rate limiting | OK | FAIL | No | Flask-Limiter for brute force in prod |


8. Example: How to Fix JWT Signature Verification

Change this:

claims = jwt.decode(token, options={"verify_signature": False})

To:

claims = jwt.decode(token, os.getenv("JWT_SECRET_KEY"), algorithms=["HS256"])

  • Make sure the secret is set in your env.


9. Final Verdict (for MVP and Professional Impression)

  • This is a solid MVP: you’ve gone much further than most (audience, issuer, and expiry checks, logging, state validation).

  • You WON’T look like an idiot as long as you:

    • Fix signature verification

    • Secure your logs

    • Tighten up session cookie security

    • Remove “dev” secrets


