codedictate is a Flask-based dictation solution using OpenAI Whisper for speech recognition. The project shows solid foundational architecture but has significant security gaps (hardcoded secrets, missing authentication), insufficient test coverage, and several dead code paths.
Some dependencies are outdated with known vulnerabilities.
Think of this report as a guide, not a grade. Every hint includes a concrete next step — often a single AI prompt is enough.
AI assistants pack everything into one big function. Works until you change something — then everything breaks. If a function has >50 lines or >3 nesting levels, ask AI to split it.
Five nested try/except blocks make the control flow nearly impossible to follow.
Errors get caught at the wrong level, leading to silent failures and hard-to-find bugs.
"In server/transcribe.py starting at line 128, flatten the 5 nested try/except blocks by extracting each into a separate function that raises specific exceptions."
try:
    try:
        try:
            result = whisper.transcribe(chunk)
        except APIError:
            try:
                result = whisper.transcribe(chunk, model="base")
            except:
                ...

Flatten nested error handling by extracting functions. Each function handles one concern and its specific errors.
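A flattened version of the handling above might look like the following sketch. `APIError` and the client object are stand-ins for the project's real names, not its actual API:

```python
class APIError(Exception):
    """Stand-in for the real API error type."""

def transcribe_primary(client, chunk):
    # Default model; an APIError propagates up instead of being nested away.
    return client.transcribe(chunk)

def transcribe_fallback(client, chunk):
    # One retry with the smaller "base" model.
    return client.transcribe(chunk, model="base")

def transcribe_chunk(client, chunk):
    # The only place that decides between primary and fallback.
    try:
        return transcribe_primary(client, chunk)
    except APIError:
        return transcribe_fallback(client, chunk)
```

Each helper raises or returns; the coordinating function contains the single, readable fallback decision.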
app.py imports models.py, and models.py imports app.py for the db instance. This is worked around with delayed imports, making the code fragile.
Any restructuring can lead to ImportError. The code is hard to test because import order is critical.
"Create server/db.py exporting the SQLAlchemy db instance. Update server/app.py and server/models.py to import from server/db.py instead of each other."
# server/app.py
from server.models import User, Transcription

# server/models.py
from server.app import db  # circular!

Circular imports indicate poor module boundaries. Extract shared dependencies into a separate module.
Configuration values are spread across config.py, app.py, models.py, transcribe.py, whisper_api.py, and setup.py with sometimes conflicting defaults.
Inconsistent configuration leads to hard-to-reproduce bugs, especially between development and production.
"Consolidate all configuration into server/config.py with Development/Production/Testing classes. Update all other files to import from config."
# config.py: WHISPER_MODEL = "medium"
# transcribe.py: MODEL = os.getenv("MODEL", "small")  # conflicts!
# whisper_api.py: DEFAULT_MODEL = "base"  # another conflict!

Configuration should live in one place. A single source of truth eliminates configuration drift.
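The consolidated shape could look like this sketch. Class names mirror the prompt above; the defaults are illustrative, not the project's actual values:

```python
import os

class Config:
    # Single source of truth; every other module imports from here.
    WHISPER_MODEL = os.getenv("WHISPER_MODEL", "medium")
    DEBUG = False
    TESTING = False

class DevelopmentConfig(Config):
    DEBUG = True

class TestingConfig(Config):
    TESTING = True

class ProductionConfig(Config):
    pass
```

Modules then read `config.Config.WHISPER_MODEL` (or the active subclass) instead of defining their own defaults.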
The chunking algorithm implements silence detection, overlap handling, and dynamic chunk sizing in a 120-line function with 8 local variables.
Hard to debug when audio quality varies. Tiny changes can dramatically worsen transcription results.
"Refactor chunk_audio() in server/audio.py to use pydub.silence.split_on_silence() instead of the custom implementation. Extract constants to config."
def chunk_audio(audio_data, sr=16000):
    threshold = 0.015  # magic number
    min_silence = int(sr * 0.3)  # magic number
    # ... 120 lines of sliding window logic

Prefer well-tested libraries (pydub, librosa) over custom implementations for common audio processing tasks.
client/ui.py contains both GUI rendering and business logic (file selection, API calls, error handling) in a 450-line file.
Changes to layout can break business logic and vice versa. Unit testing is practically impossible.
"Split client/ui.py into three modules: client/ui.py (rendering only), client/api.py (API calls), client/controller.py (business logic coordination)."
class MainWindow:
    def on_record_click(self):
        # 80 lines mixing UI updates, API calls, and error handling
        self.status_label.config(text="Recording...")
        audio = self.recorder.record()
        response = requests.post(API_URL + "/upload", ...)

Separate presentation from logic. MVC or similar patterns make code testable and maintainable.
A single function handles audio decoding, chunking, Whisper API calls, post-processing, punctuation, and database writes — all in 380 lines.
Virtually untestable, extremely error-prone to modify. Any bug fix can have unintended side effects.
"Refactor transcribe_and_process() in server/transcribe.py into 5 smaller functions: decode_audio(), chunk_audio(), call_whisper(), postprocess_text(), save_transcription(). Wire them together in a pipeline function."
def transcribe_and_process(audio_file, user_id, language="de"):
    # ... 380 lines of nested logic
    # audio decoding, chunking, API calls, text cleanup, DB writes

Functions over 50 lines are a smell. Over 100 is dangerous. Over 300 is a maintenance nightmare. Split along responsibility boundaries.
The /transcribe endpoint has 28 different decision paths through nested conditions and error handling.
Each change theoretically requires testing 28 paths. The increased regression risk slows down development.
"Refactor the /transcribe endpoint in server/app.py to use guard clauses for early returns and extract validation, processing, and response formatting into separate functions."
@app.route("/transcribe", methods=["POST"])
def transcribe():
    if request.content_type:
        if "multipart" in request.content_type:
            if "audio" in request.files:
                # ... 15 more levels of nesting

Aim for cyclomatic complexity under 10. Use guard clauses (early returns) to flatten nested conditions.
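With guard clauses, the nested checks above flatten into early returns. The field names come from the snippet; the error messages are illustrative:

```python
def validate_upload_request(content_type, files):
    # Each guard returns an error message immediately; None means valid.
    if not content_type:
        return "missing Content-Type header"
    if "multipart" not in content_type:
        return "expected multipart/form-data"
    if "audio" not in files:
        return "no audio file in request"
    return None
```

The endpoint then becomes a short sequence: validate, process, format the response — each step its own function.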
In vibe coding, testing is the only guarantee. Every new prompt can break old code. Golden rule: write a test proving current state works BEFORE asking AI to change anything.
Not a single API endpoint is tested with HTTP requests — the upload, transcription, and admin routes all lack integration tests.
Routing errors, wrong HTTP status codes, and serialization issues are only discovered in production.
"Create tests/test_api.py using Flask test client. Test all routes: GET /health, POST /upload, POST /transcribe, GET /transcriptions, /admin/* with both valid and invalid inputs."
# No integration tests exist. Example of what should be:
# def test_upload_invalid_file(client):
#     response = client.post("/upload", data={"audio": (BytesIO(b"not audio"), "test.exe")})
#     assert response.status_code == 400

Every API endpoint needs at least a happy-path and an error-case integration test using the framework test client.
Test data is created inline without reusable fixtures. Every new test must write its own setup code.
Duplicated setup code leads to inconsistent test data and makes tests hard to maintain.
"Create tests/conftest.py with fixtures: app (Flask test app), client (test client), db_session (test database), sample_audio (test audio file). Create tests/factories.py for User and Transcription factories."
# Current: no fixtures, each test duplicates setup
def test_something():
    app = create_app()  # duplicated
    db.create_all()  # duplicated
    user = User(email="test@test.com")  # duplicated

Good test infrastructure (fixtures, factories, helpers) pays for itself within weeks by making tests easy to write and maintain.
The only mock for the Whisper API returns a simplified object that doesn't match the actual API response format.
Tests pass even though the code would fail with the real API. False confidence.
"In tests/test_transcribe.py, update the Whisper API mock to return the actual response format including segments, language, and duration fields."
# Mock returns simplified format:
mock_whisper.return_value = {"text": "hello world"}
# Real API returns: {"text": "...", "segments": [...], "language": "de", "duration": 12.5}

Mocks must match the real interface. Record real responses and use them as fixtures to prevent drift.
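A shape-checked fixture is one way to keep the mock honest. The field names below are taken from the finding; the exact payload should be recorded from a real API call:

```python
# Mock payload shaped like the real verbose response (fields assumed
# from the finding above — record a real response to be certain).
WHISPER_MOCK_RESPONSE = {
    "text": "hello world",
    "segments": [{"id": 0, "start": 0.0, "end": 2.1, "text": "hello world"}],
    "language": "de",
    "duration": 12.5,
}

def check_whisper_shape(payload):
    # Guards the fixture itself against drifting back to a simplified shape.
    missing = [k for k in ("text", "segments", "language", "duration") if k not in payload]
    if missing:
        raise AssertionError(f"mock missing fields: {missing}")
```

Running `check_whisper_shape` in the test suite means a stripped-down mock fails loudly instead of passing silently.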
The entire project has only 2 tests in a single test file. Core functionality like upload, transcription, and authentication is untested.
Any change can silently break existing functionality. Refactoring becomes a gamble.
"Set up pytest with pytest-flask. Create test files: tests/test_api.py (endpoint tests), tests/test_transcribe.py (transcription logic), tests/test_models.py (database operations). Target 60% coverage minimum."
# tests/test_transcribe.py — ENTIRE test suite:
def test_whisper_returns_text():
    assert transcribe("hello.wav") != ""

def test_empty_audio():
    assert transcribe("empty.wav") == ""

Test coverage below 40% means you are flying blind. Prioritize testing critical paths: auth, payment, data persistence.
Existing tests connect to the same database as production because no test configuration exists.
Tests can modify or delete production data. An accidental test run can destroy real user data.
"Create tests/conftest.py with a test database fixture using SQLite in-memory. Update tests to use the fixture instead of importing from server.config directly."
# tests/test_transcribe.py
from server.config import DATABASE_URL  # same as production!
from server.models import db

def test_save_transcription():
    db.session.add(...)  # writes to production DB!

Tests must never touch production databases. Use separate test databases, fixtures, and cleanup.
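An in-memory database is the simplest isolation. The sketch below uses stdlib sqlite3 to show the idea; the project itself would wrap the same `sqlite:///:memory:` URL in an SQLAlchemy fixture in conftest.py:

```python
import sqlite3

def make_test_db():
    # In-memory database: it vanishes when the connection closes,
    # so a test run can never touch production data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transcriptions (id INTEGER PRIMARY KEY, title TEXT)")
    return conn
```

Every call returns a fresh, empty database, which also removes ordering dependencies between tests.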
There is no GitHub Actions, GitLab CI, or any other CI/CD configuration. Tests must be run manually.
Without automated test execution, tests are forgotten and code quality erodes silently.
"Create .github/workflows/test.yml that runs pytest on every push and PR. Include ruff linting. Add a badge to README.md."
# No CI/CD configuration files found:
# No .github/workflows/*.yml
# No .gitlab-ci.yml
# No Jenkinsfile

CI/CD is non-negotiable for any team project. Automate linting and tests from day one.
Dead code accumulates from AI sessions — old approaches left behind. It confuses both you and future AI prompts. Keep code clean: what's not needed gets deleted.
server/whisper_api.py is not imported by any other module. The Whisper integration is done directly in transcribe.py.
Dead module confuses new developers and gets accidentally modified during refactoring.
"Delete server/whisper_api.py — it is not imported anywhere. Run grep -r "whisper_api" to confirm no references exist."
# server/whisper_api.py — 180 lines, imported by nobody
class WhisperClient:
    def __init__(self, api_key, model="medium"):
        ...
    def transcribe(self, audio_path, language="de"):
        ...

Dead code is not free. It costs attention, creates confusion, and can introduce bugs when accidentally modified.
14 imported modules or functions are never used: json, sys, re in app.py, hashlib and hmac in auth.py, among others.
Unused imports slow down startup, increase memory footprint, and obscure real dependencies.
"Run ruff check --select F401 --fix server/ to auto-remove all unused imports. Then run ruff check --select I --fix server/ to sort remaining imports."
import json  # unused
import sys  # unused
import re  # unused
from flask import Flask, request, jsonify, redirect  # redirect unused

Use an auto-formatter (ruff, autoflake) to catch unused imports. Configure as a pre-commit hook to prevent accumulation.
65 lines of commented-out WebSocket code for real-time streaming block readability without benefit.
Commented-out code is never re-enabled but confuses developers and complicates code reviews.
"Delete the commented-out WebSocket streaming code at server/app.py lines 245-310. Create a GitHub issue "Implement real-time WebSocket streaming" if the feature is still planned."
# TODO: WebSocket streaming (v2)
# @socketio.on("audio_stream")
# def handle_stream(data):
#     chunk = data["chunk"]
#     # ... 60 more commented lines

Delete commented-out code. Git preserves history. Dead comments are noise, not documentation.
After a return statement at line 288, there are 12 lines of code that can never execute.
Developers might assume this code executes and introduce bugs based on false assumptions.
"Delete the unreachable code at server/transcribe.py lines 290-302 (after the return at line 288). Verify the return at 288 is intentional."
return result  # line 288
# Dead code below — never executes:
logger.info("Post-processing complete")
stats.record_transcription(len(result))
cache.set(cache_key, result)

Unreachable code after return statements is a common mistake. Use a linter rule to catch it automatically.
A function migrate_v1_to_v2() in utils.py was written for a one-time data migration and is never called again.
Minimal risk, but increases cognitive load when reading utils.py.
"Delete the migrate_v1_to_v2() function at server/utils.py lines 78-120. It was a one-time migration and is no longer called."
def migrate_v1_to_v2():
    """One-time migration from v1 schema to v2. Run once, then delete."""
    # ... 40 lines of migration logic
    # Last run: 2024-08-15

One-time scripts should live in a separate directory (e.g., migrations/) or be deleted after execution.
AI repeats itself between prompts: inconsistent naming, duplicate functions. Each issue is small, but together the code becomes unmaintainable. Check regularly.
A bare except: without a specific exception catches everything, including SystemExit, KeyboardInterrupt, and MemoryError.
Process cannot be cleanly terminated. Severe system errors are swallowed and go unnoticed.
"In server/transcribe.py, replace all bare except: clauses with except Exception as e: and add logging.exception("...") calls."
try:
    result = process_audio(chunk)
except:  # catches EVERYTHING
    result = ""  # silently returns empty string

Never use bare except:. Always catch specific exceptions or at minimum except Exception.
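A corrected version of the pattern above, sketched with a generic `process` callable standing in for the project's real function:

```python
import logging

logger = logging.getLogger(__name__)

def process_chunk_safely(process, chunk):
    # Exception does NOT cover SystemExit/KeyboardInterrupt, so the process
    # stays killable; the failure is logged instead of silently swallowed.
    try:
        return process(chunk)
    except Exception:
        logger.exception("audio chunk processing failed")
        return ""
```

`logging.exception` records the full traceback, so a failing chunk leaves evidence instead of an inexplicably empty transcript.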
Multiple mutable global variables (active_jobs, cache_dict, stats_counter) are shared between requests without synchronization.
Race conditions under load can lead to data loss, corrupt counters, or cache inconsistencies.
"Replace global mutable state in server/transcribe.py with thread-safe alternatives: use threading.Lock for active_jobs, use Flask-Caching for cache_dict, use Redis or atomic operations for stats_counter."
active_jobs = {}  # mutable global, shared across threads
cache_dict = {}  # mutable global, no TTL, no size limit
stats_counter = {"total": 0, "errors": 0}  # race condition!

Global mutable state is the enemy of concurrent code. Use thread-safe structures or external state stores.
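For the in-process case, a lock-guarded wrapper is a minimal fix. This sketch replaces the stats_counter dict; active_jobs and cache_dict would get the same treatment (or move to Redis/Flask-Caching as the prompt suggests):

```python
import threading

class StatsCounter:
    # Lock-guarded replacement for a bare dict shared across request threads.
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {"total": 0, "errors": 0}

    def increment(self, key):
        with self._lock:
            self._counts[key] += 1

    def snapshot(self):
        with self._lock:
            return dict(self._counts)
```

Reads go through `snapshot()` so callers never see a half-updated dict.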
Numeric values like 16000, 0.015, 0.3, 512 are used directly in code without explanatory constants.
Hard to understand what the numbers mean. Changes require find-and-replace across multiple files.
"Extract magic numbers in server/audio.py to named constants: SAMPLE_RATE = 16000, SILENCE_THRESHOLD = 0.015, MIN_SILENCE_SECONDS = 0.3, FFT_SIZE = 512."
audio = audio.set_frame_rate(16000)  # what is 16000?
if amplitude < 0.015:  # what threshold?
if silence_frames > int(16000 * 0.3):  # ??

Name every magic number. SAMPLE_RATE = 16000 is self-documenting; 16000 alone is not.
Different endpoints return errors in different formats: sometimes {"error": "..."}, sometimes {"message": "..."}, sometimes plain text.
Clients must handle different error formats, leading to unreliable error display.
"Create an error handler in server/app.py using @app.errorhandler that returns {"error": {"code": status_code, "message": str(error)}} for all error responses."
# Endpoint A:
return jsonify({"error": "File too large"}), 400
# Endpoint B:
return jsonify({"message": "Invalid format"}), 422
# Endpoint C:
return "Server error", 500

Standardize error responses across your API. Clients should parse errors the same way everywhere.
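The envelope from the prompt above can be built by one helper that every handler (and the @app.errorhandler registration) shares — a sketch:

```python
def error_body(status_code, message):
    # The single error envelope every endpoint returns; register it once
    # via @app.errorhandler instead of hand-building responses per route.
    return {"error": {"code": status_code, "message": str(message)}}
```

Endpoints A, B, and C then all return `jsonify(error_body(code, msg)), code`, and clients parse one format.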
Text fields like title and notes accept arbitrarily long strings without length restriction at the application level.
Extremely long inputs can stress the database and break UI elements.
"Add length constraints to server/models.py: title = Column(String(200), nullable=False), notes = Column(String(5000)). Add validation in the route handlers."
class Transcription(db.Model):
    title = db.Column(db.String)  # no length limit
    notes = db.Column(db.Text)  # no length limit

Always define maximum lengths for user-provided text fields. Unbounded input is a security and stability risk.
API calls to the Whisper service have no timeout, no retry, and no specific error handling. A 500 or timeout crashes the entire request handler.
Transient API errors cause complete transcription failure. Users lose their recording without an error message.
"Wrap Whisper API calls in server/transcribe.py with tenacity retry decorator: @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10)). Add timeout=30 to requests."
response = openai.audio.transcriptions.create(
    model=WHISPER_MODEL,
    file=audio_chunk,
    language=language,
)  # no timeout, no retry, no error handling

All external API calls need timeout, retry, and error handling. Assume the network will fail.
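If adding the tenacity dependency is not wanted, the same backoff policy (3 attempts, exponential wait between 1 and 10 seconds) fits in a few stdlib lines — a sketch with an injectable callable:

```python
import time

def call_with_retries(call, attempts=3, base_delay=1.0, max_delay=10.0):
    # Exponential backoff: base_delay, 2x, 4x... capped at max_delay,
    # mirroring the tenacity parameters suggested above.
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(min(base_delay * 2 ** attempt, max_delay))
```

The Whisper call would then be wrapped as `call_with_retries(lambda: client.transcribe(chunk))`, with `timeout=30` still passed to the underlying HTTP request.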
The entire project uses print() for output instead of the Python logging module. 47 print() calls spread across 6 files.
No log levels, no structured output, no ability to filter logs in production or send to log aggregators.
"Replace all print() calls with proper logging: import logging; logger = logging.getLogger(__name__); replace print("Error:...") with logger.error("..."), print("Processing...") with logger.info("...")."
print(f"Processing file: {filename}")  # should be logger.info
print(f"ERROR: {str(e)}")  # should be logger.error
print(f"Transcription complete in {elapsed}s")  # should be logger.info

Use the logging module from day one. Structured logs with levels are essential for production debugging.
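The replacement is a one-time setup plus mechanical substitutions; the logger name and format string below are illustrative choices, not project conventions:

```python
import logging

# One-time setup, e.g. in app startup:
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("server.transcribe")  # assumed module name

# print(f"ERROR: {e}")  becomes:
# logger.error("transcription failed: %s", e)
```

Using `%s` placeholders instead of f-strings defers formatting until the record is actually emitted, which also plays well with log aggregators.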
Documentation bridges you and your future self (and the AI). Missing docs = most common reason AI projects get abandoned. AI can write them — just ask.
The README.md contains only "# codedictate" and a one-line description. Installation, usage, API documentation, and architecture are missing.
New developers cannot set up or understand the project without reading the code.
"Expand README.md with sections: ## Features, ## Requirements (Python 3.10+, FFmpeg, OpenAI API key), ## Installation, ## Quick Start, ## API Endpoints, ## Configuration, ## Testing."
# codedictate
Voice-to-text dictation tool.

A good README is the most cost-effective documentation. It saves hours of onboarding time per developer.
None of the 8 API endpoints are documented. Neither OpenAPI/Swagger nor inline docstrings describe request/response formats.
Frontend developers must read the backend code to understand the API. Integration becomes guesswork.
"Add docstrings to all route handlers in server/app.py with format: Args (JSON body fields), Returns (response format), Raises (error cases). Consider adding flask-openapi3."
@app.route("/transcribe", methods=["POST"])
def transcribe():
    # No docstring, no type hints, no documentation
    file = request.files["audio"]
    ...

Document your API at least with docstrings. For public APIs, use OpenAPI/Swagger for interactive documentation.
82% of functions in the project have no docstring. Only 5 of 28 functions describe what they do, what they expect, and what they return.
Developers must read the implementation to understand the interface. Significantly increases onboarding time.
"Add Google-style docstrings to all public functions in server/transcribe.py, server/audio.py, and server/models.py. Include Args, Returns, and Raises sections."
def chunk_audio(audio_data, sr=16000):
    # No docstring — what does it return? What format is audio_data?
    ...

def postprocess(text, language):
    # No docstring — what postprocessing? What languages supported?
    ...

Docstrings are the contract between functions. They should answer: what does it do, what does it need, what does it return.
There is neither a .env.example nor documentation of which environment variables are required.
New developers or deployments fail because required variables are not set.
"Create .env.example with all environment variables used in server/config.py: OPENAI_API_KEY, DATABASE_URL, FLASK_SECRET_KEY, FLASK_DEBUG, WHISPER_MODEL. Add descriptions as comments."
# config.py uses these but nobody documents them:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # required but undocumented
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///local.db")
SECRET_KEY = os.getenv("FLASK_SECRET_KEY", "dev-key-change-me")

A .env.example file is cheap insurance. It documents requirements and prevents deployment failures.
No documentation about why certain technology decisions were made (Whisper model choice, SQLite vs. PostgreSQL, client architecture).
Future developers repeat evaluations or change decisions without knowing the context.
"Create docs/adr/001-whisper-model-selection.md and docs/adr/002-sqlite-for-development.md using the lightweight ADR template (Context, Decision, Consequences)."
# No architecture documentation found.
# Questions that remain unanswered:
# - Why Whisper medium model instead of large?
# - Why SQLite instead of PostgreSQL?
# - Why desktop client instead of web-only?

Architecture Decision Records (ADRs) prevent repeating past discussions. Even one-paragraph ADRs save future time.
AI knows patterns but doesn't always follow them. It mixes async/await with .then(), ignores conventions. Tell the AI which patterns your project uses — it will follow them.
The .gitignore file has no entries for .env, *.pem, *.key, or upload directories. Sensitive files could be accidentally committed.
An accidental git add . commits API keys, certificates, and user data to the repository.
"Update .gitignore to include: .env, .env.*, *.pem, *.key, uploads/, *.sqlite, *.db, __pycache__/, .pytest_cache/, htmlcov/."
# .gitignore (incomplete):
__pycache__/
*.pyc
# Missing: .env, *.pem, *.key, uploads/, *.sqlite

Start every project with a comprehensive .gitignore. Use gitignore.io or GitHub templates as a baseline.
No documentation or configuration for virtual environments. Developers might install dependencies globally.
Global installations lead to version conflicts between projects and make builds non-reproducible.
"Create a Makefile with: setup target (creates venv, installs deps), run target (starts server), test target (runs tests). Add .venv/ to .gitignore."
# No Makefile, no setup script, no venv documentation
# Developers are expected to... guess?
# pip install -r requirements.txt  # global? venv? who knows!

Always use virtual environments. Document the setup process. Make it one command.
No /health or /readiness endpoint for monitoring, load balancers, or container orchestration.
Load balancers and monitoring tools cannot check the application state.
"Add a /health endpoint to server/app.py that checks: database connectivity (SELECT 1), returns {"status": "healthy", "db": "ok"} or 503 with details."
# No health check endpoint exists.
# Docker and monitoring have no way to check if the app is running correctly.

Health check endpoints are essential for production. They enable automated recovery and monitoring.
The Flask SECRET_KEY has a hardcoded fallback "dev-key-change-me" that gets used in production when the environment variable is not set.
With a known secret key, attackers can sign session cookies and gain admin access.
"In server/config.py, change SECRET_KEY to raise an error if not set: SECRET_KEY = os.environ["FLASK_SECRET_KEY"] # no fallback, must be set."
SECRET_KEY = os.getenv("FLASK_SECRET_KEY", "dev-key-change-me")  # predictable!

Never provide default values for security-critical configuration. Fail loudly instead of running insecurely.
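A small helper makes the fail-loudly pattern reusable for every required secret — a sketch:

```python
import os

def required_env(name):
    # Fails at startup instead of silently falling back to a known key.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value

# In config.py:  SECRET_KEY = required_env("FLASK_SECRET_KEY")
```

A missing variable now crashes the deployment immediately with a clear message, rather than running with a predictable secret.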
No code formatter (Black, Ruff) configured. The code has inconsistent indentation, line lengths, and string quoting.
Inconsistent style creates unnecessary diff noise in code reviews and slows down reading.
"Add ruff configuration to pyproject.toml: [tool.ruff] line-length = 100. Create .pre-commit-config.yaml with ruff check and ruff format hooks. Run ruff format . on the entire project."
# Inconsistent style throughout:
some_var="no spaces"  # line 12
other_var = "with spaces"  # line 13
very_long_function_call(argument1, argument2, argument3, argument4, argument5)  # 120+ chars

Pick a formatter (ruff, black), configure it once, never argue about style again.
AI loves adding packages — sometimes unnecessarily. Every dependency is a risk. Always ask if a native solution exists. Fewer deps = fewer attack vectors.
The 8 direct dependencies are pinned, but their transitive dependencies are not, so pip install can resolve different versions on different machines.
Builds are not reproducible. "Works on my machine" becomes the standard problem.
"Install pip-tools (pip install pip-tools). Create requirements.in with direct deps. Run pip-compile requirements.in to generate a fully pinned requirements.txt with hashes."
# requirements.txt — only direct deps pinned:
Flask==2.2.3
openai==1.6.1
SQLAlchemy==2.0.25
# But what version of Jinja2? Markupsafe? Click? Unknown!

Always pin all dependencies including transitive ones. Use pip-compile, Poetry, or PDM for reproducible builds.
No check whether the licenses of the 8 dependencies are compatible with the planned licensing model.
If the project is distributed commercially, GPL-licensed dependencies could cause legal issues.
"Run pip-licenses --format=table to audit all dependency licenses. Create LICENSES.md documenting the findings."
# Unknown licenses in dependency tree:
# Flask — BSD-3-Clause (OK)
# openai — Apache-2.0 (OK)
# But what about transitive deps?

Audit dependency licenses early, especially if you plan to distribute commercially.
Flask 2.2.3 is vulnerable to session cookie manipulation (CVE-2023-30861). The current version is 3.1.x.
Attackers can manipulate session cookies and impersonate other users.
"In requirements.txt, update Flask from 2.2.3 to 3.1.0. Review the Flask 3.0 migration guide for breaking changes. Run tests after update."
# requirements.txt
Flask==2.2.3  # CVE-2023-30861: session cookie vulnerability
Werkzeug==2.2.3  # also outdated, update together

Run pip-audit or safety check regularly. Pinned versions require active maintenance to stay secure.
The project uses setup.py instead of pyproject.toml. setup.py is being superseded by pyproject.toml per PEP 517/518.
Minimal risk short-term, but future tooling will assume pyproject.toml.
"Convert setup.py to pyproject.toml following PEP 621. Use [build-system] requires = ["setuptools>=68.0"]. Move all metadata to [project] table."
# setup.py (deprecated pattern):
from setuptools import setup

setup(
    name="codedictate",
    version="0.1.0",
    install_requires=[...],
)

Use pyproject.toml for new Python projects. It is the modern standard per PEP 517/518/621.
AI-generated code often contains security holes — hardcoded keys, missing validation, SQL concatenation. These are invisible: the code "works" but is like an unlocked front door. For all AI-generated code, check input validation and whether secrets ended up in the source.
The audio upload endpoint accepts any file without checking file type, size, or content.
Attackers can upload executable code or overwhelm the server with oversized files.
"Add file validation to the /upload endpoint in server/app.py: check file extension against ALLOWED_EXTENSIONS, enforce MAX_CONTENT_LENGTH, and verify magic bytes."
@app.route("/upload", methods=["POST"])
def upload_audio():
    file = request.files["audio"]
    file.save(os.path.join(UPLOAD_DIR, file.filename))

Always validate file uploads: check type, size, and content. Never trust the client-provided filename.
The filename is taken directly from the user without using secure_filename() or similar sanitization.
Attackers can write files to arbitrary directories (e.g., ../../etc/crontab).
"In server/app.py line 71, replace file.filename with secure_filename(file.filename) from werkzeug.utils, or better yet, generate a UUID filename."
file.save(os.path.join(UPLOAD_DIR, file.filename))

Never use user-supplied filenames directly. Use secure_filename() or generate unique names server-side.
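The UUID variant suggested in the prompt can be sketched with the stdlib alone; `ALLOWED_EXTENSIONS` is an assumed project constant:

```python
import os
import uuid

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}  # assumed allow-list

def safe_upload_path(upload_dir, original_name):
    # Only the (allow-listed) extension survives from the client name;
    # the basename is generated server-side, so path traversal is impossible.
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"disallowed file type: {ext!r}")
    return os.path.join(upload_dir, f"{uuid.uuid4().hex}{ext}")
```

The handler then calls `file.save(safe_upload_path(UPLOAD_DIR, file.filename))`, and a name like `../../etc/crontab` can no longer escape the upload directory.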
CORS is configured with origins="*", allowing requests from any domain.
Third-party websites can make API requests on behalf of authenticated users.
"In server/app.py line 18, replace CORS(app, origins="*") with CORS(app, origins=os.environ.get("ALLOWED_ORIGINS", "http://localhost:3000").split(","))."
CORS(app, origins="*")

Restrict CORS to the minimum necessary origins. Wildcard origins bypass the same-origin policy entirely.
The OpenAI API key is hardcoded directly in the source code and gets committed to the repository with every push.
Attackers can extract the API key from git history and make API calls at your expense.
"Replace the hardcoded OPENAI_API_KEY in server/config.py with os.environ.get("OPENAI_API_KEY") and add a .env.example file."
OPENAI_API_KEY = "sk-proj-abc123def456ghi789"

Never commit API keys or secrets to version control. Use environment variables or a secrets manager.
User input is directly interpolated into a SQL query without parameterization or escaping.
Attackers can execute arbitrary SQL commands, steal data, or drop the entire database.
"In server/models.py line 87, replace the f-string SQL query with a parameterized SQLAlchemy query using bindparams or ORM methods."
db.execute(f"SELECT * FROM transcriptions WHERE user_id = '{user_id}' AND title LIKE '%{search}%'")

Always use parameterized queries. Never interpolate user input into SQL strings.
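The parameterized form of the query above, sketched with stdlib sqlite3 — SQLAlchemy's bindparams and ORM filters bind values the same way:

```python
import sqlite3

def search_transcriptions(conn, user_id, search):
    # Placeholders let the driver bind values safely; user input is never
    # part of the SQL text, so it cannot change the query's structure.
    return conn.execute(
        "SELECT id, title FROM transcriptions WHERE user_id = ? AND title LIKE ?",
        (user_id, f"%{search}%"),
    ).fetchall()
```

An injection payload in `search` is matched as a literal substring instead of being executed as SQL.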
Admin routes (/admin/users, /admin/stats) are accessible without any authentication whatsoever.
Anyone can view user data, delete accounts, and modify system settings.
"Add a @require_admin decorator to all /admin/* routes in server/app.py. Implement JWT-based authentication in server/auth.py."
@app.route("/admin/users")
def admin_users():
    users = User.query.all()
    return jsonify([u.to_dict() for u in users])

Every admin endpoint must require authentication and authorization. Defense in depth means checking at every layer.
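A framework-agnostic sketch of the @require_admin decorator from the prompt; `get_current_user` and the `is_admin` flag are assumed names that the real auth layer would provide:

```python
import functools

def require_admin(get_current_user):
    # Decorator factory: injecting the user lookup keeps the decorator
    # testable without a running Flask app.
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            user = get_current_user()
            if user is None or not getattr(user, "is_admin", False):
                return {"error": {"code": 403, "message": "admin required"}}, 403
            return view(*args, **kwargs)
        return wrapper
    return decorator
```

Applied as `@require_admin(current_user_loader)` above each `/admin/*` route, every admin view gets the same check instead of relying on each handler to remember it.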
The application starts with debug=True, enabling the interactive debugger and code reload in production.
The Werkzeug debugger allows remote code execution. Attackers can run arbitrary Python code on your server.
"In server/app.py line 312, replace app.run(debug=True) with app.run(debug=os.environ.get("FLASK_DEBUG", "0") == "1")."
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

Never enable debug mode in production. It exposes an interactive debugger that allows arbitrary code execution.
None of the API endpoints have rate limiting implemented, neither for authentication nor for resource-intensive operations.
Attackers can perform brute-force attacks or overwhelm the server with mass requests.
"Add Flask-Limiter to server/app.py. Apply @limiter.limit("5/minute") to auth endpoints and @limiter.limit("10/hour") to /upload."
# No rate limiting configuration found anywhere in the project

Rate limiting is essential for all public-facing APIs, especially auth and file upload endpoints.
47 findings in this project.