How to Build Software That Actually Works: Fixing AI Chaos with Spec-Kit & TDD | BitAI

Build software that actually works by separating intent from implementation using a Spec-Kit + Claude Code workflow.
Stop relying on prompts: Use a living document (the spec) to anchor your AI agent before you write a single line of code.
TDD is mandatory: Implementing Test-Driven Development (TDD) into your specs prevents the "hallucination" of architecture decisions.
System Design first: Always annotate your architectural decisions in your specs to ensure API formats and data models remain consistent.

🎯 Introduction

I used to start coding features the moment they popped into my head. No planning. No documentation. Just pure vibes. The result? Half-finished features, forgotten API formats, and endless debugging. If you are struggling to build software that actually works in 2024, you are likely making one fatal mistake: you are starting with code, not intent.

In my experience, most developers treat AI tools like a magic autocomplete, which actually accelerates technical debt. The real solution isn't a better model; it's a better workflow. By combining Claude Code with Spec-Kit, and integrating Kent Beck’s TDD methodology, we can build software systems that are maintainable, scalable, and technically sound. This guide walks through the architecture of a workflow that turns chaotic outputs into automated reliability.

🧠 Core Explanation

When you ask an AI to "build a user authentication system," it generates code. But that code is often missing the "system design" context. It might call a POST endpoint /login but forget to document the exact body schema required.

Building software that actually works isn't about writing better prompts. It’s about following a strict workflow-engineering process.

The Flow: Specification → Technical Decision → AI Coding → Automated Verification (TDD).
The Problem: Context drift. The AI forgets constraints it read 15 minutes ago.
The Fix: Make the constraints executable artifacts (Tests).
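To make "constraints as executable artifacts" concrete, here is a minimal sketch in Python. The field names `email` and `password` and the helper `validate_login_body` are assumptions chosen for illustration; the article only says that the body schema for /login should be pinned down.

```python
# Hypothetical contract for the POST /login body; field names are illustrative.
REQUIRED_LOGIN_FIELDS = {"email": str, "password": str}


def validate_login_body(body: dict) -> list[str]:
    """Return a list of contract violations (an empty list means the body is valid)."""
    errors = []
    for field, expected_type in REQUIRED_LOGIN_FIELDS.items():
        if field not in body:
            errors.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            errors.append(f"wrong type for {field}")
    # Reject fields the contract does not know about, so schema drift is caught.
    for field in body:
        if field not in REQUIRED_LOGIN_FIELDS:
            errors.append(f"unexpected field: {field}")
    return errors


def test_valid_body_passes():
    assert validate_login_body({"email": "a@example.com", "password": "x"}) == []


def test_renamed_field_is_rejected():
    # If the agent silently renames "email" to "username", this test fails.
    errors = validate_login_body({"username": "a", "password": "x"})
    assert "missing field: email" in errors
    assert "unexpected field: username" in errors
```

Once the schema lives in a test like this, a forgotten constraint shows up as a red test run rather than as a surprise in production.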

We aren't just asking the AI to write code; we are constraining it. We use Spec-Kit to create a "living contract." The code is just the response to that contract, and if the contract demands a specific behavior, the code must obey.

🔥 Contrarian Insight

Most engineering blogs will tell you that "AI-assisted development" equals "faster shipping." That is dangerous advice.

The truth is, AI-assisted development creates toxic short-term efficiency. If you can generate a feature in 5 minutes that breaks your architecture next week, you have actually slowed your cycle time. To build software that actually works, you must slow down the coding phase and speed up the verification phase.

Do not ship code. Ship tests that describe the behavior you want.

The code is just the bridge.

🔍 Deep Dive: The Spec-Kit + Claude Code Pipeline

Here is the breakdown of the architecture that makes this work in production.

1. The Living Specification (Spec-Kit)

Don't write a README. Write a machine-readable spec. Spec-Kit allows you to define the API contract and the expected behaviors.

2. The Architectural Anchor

This is the step many miss. Before writing the spec, you must define the "Why."

Bad: "Create a login route."
Good: "Create a login route returning JWTs with a 2-hour expiry to support active session monitoring." (Includes the API format and the business decision).
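As a rough sketch of how the "Good" version turns into something checkable, the 2-hour expiry can become a named constant that a test asserts against. The function names, the HS256 choice, and the claim layout below are assumptions for illustration, built only on the Python standard library; a real service would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

JWT_TTL_SECONDS = 2 * 60 * 60  # the "2-hour expiry" decision from the spec


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_login_token(user_id: str, secret: bytes, now: Optional[int] = None) -> str:
    """Build an HS256-signed JWT whose exp is exactly JWT_TTL_SECONDS after iat."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued_at, "exp": issued_at + JWT_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def read_claims(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature (test helper only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def test_token_expires_after_two_hours():
    claims = read_claims(issue_login_token("user-1", b"test-secret", now=1_000))
    assert claims["exp"] - claims["iat"] == 7200
```

The business decision now has exactly one home in the code, and changing it without updating the spec and the test becomes visible.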

3. The AI Execution (Claude Code)

Claude (or any LLM) takes the spec and generates the implementation.

4. The Integration (TDD via PR #1172)

Correction from the author: I initially found that Claude forgets decisions (e.g., changing an API schema). I fixed this by integrating TDD.

Feature: feat: Integrate Kent Beck TDD methodology into spec-kit

In the Spec, we add the requirement: "This module must implement TDD. Only code that passes strict unit tests linked in the spec may be merged."

This enforces discipline. The AI has far less room to hallucinate: it must generate the tests first, then an implementation that passes them.

🏗️ System Design / Workflow Architecture

How does this scale from a single feature to a whole platform?

Definition Layer: The spec-kit file contains the domain models (e.g., User interface, AuthStrategy enum).
State Layer: Hard-coded decisions are stored in the spec, preventing "Drift."
Execution Layer: Claude/Agent reads the spec context.
Verification Layer: CI/CD pipeline runs based on the "TDD Constraint."
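The Definition Layer can be mirrored in code so that drift shows up as a failure rather than passing silently. The sketch below is a Python analogue of the `User` interface and `AuthStrategy` enum mentioned above; the specific fields and enum members are assumptions, since the article does not list them.

```python
from dataclasses import dataclass
from enum import Enum


class AuthStrategy(Enum):
    # Illustrative members; the real list would come from the spec.
    PASSWORD = "password"
    OAUTH = "oauth"


@dataclass(frozen=True)
class User:
    # Illustrative fields; frozen so instances cannot be mutated after creation.
    id: str
    email: str
    auth_strategy: AuthStrategy


def parse_user(raw: dict) -> User:
    """Build a User from untyped data, failing loudly on anything off-spec."""
    return User(
        id=str(raw["id"]),
        email=str(raw["email"]),
        # Enum lookup raises ValueError for a strategy the spec does not define.
        auth_strategy=AuthStrategy(raw["auth_strategy"]),
    )
```

If the agent later invents a new strategy string, `parse_user` raises instead of quietly accepting it, which is the kind of signal the Verification Layer can pick up.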

Cache Strategy: There is no separate cache to manage. Because the spec is the single source of truth, restoring context costs one file read, and everything the agent needs sits in one linear document rather than scattered across chat history and memory.

API Structure: The workflow ensures that your API endpoints are only created when the specs are finalized.

🧑‍💻 Practical Value: Implementation Steps

You don't need to rewrite your whole repo overnight. Here is the workflow shift:

Step 1: Document the Decision, Not the Code

Don't jump into src/controllers.js. Open your Spec-Kit file. Write the requirements, the constraints, and the TDD rules.

# Feature: Email Verification Service
## Decisions
- Use AES-256 for token encryption.
- Tokens expire in 24 hours.
## TDD Rules
- Unit tests must hit the mock endpoint before production execution.
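For example, the "Tokens expire in 24 hours" decision could be written test-first, roughly like this. The function name `is_token_expired` and the choice to treat a token as expired at exactly 24 hours are assumptions; the spec above does not say which way the boundary falls.

```python
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)  # mirrors "Tokens expire in 24 hours" in the spec


def is_token_expired(issued_at: datetime, now: datetime) -> bool:
    """Assumed rule: a token is expired once it is at least TOKEN_TTL old."""
    return now - issued_at >= TOKEN_TTL


def test_token_valid_just_before_24_hours():
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert not is_token_expired(issued, issued + timedelta(hours=23, minutes=59))


def test_token_expired_at_24_hours():
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_token_expired(issued, issued + timedelta(hours=24))
```

Writing the two boundary tests first forces the ambiguity in the spec into the open before any implementation exists.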

Step 2: Convince the AI

Use Claude Code. It excels at understanding multi-file contexts.

Command: "Read the Spec-Kit for the Email Service. Generate the TDD unit tests and the implementation based strictly on the constraints in the TDD Rules section."

Step 3: The Approval Gate

Review the Tests first. If the tests cover the architecture decisions (like encryption and expiry), only then allow the code to merge.
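Part of this review can be automated. The sketch below assumes a hypothetical convention in which each test's docstring quotes the decision it covers verbatim; neither that convention nor these helper names come from Spec-Kit.

```python
def extract_decisions(spec_text: str) -> list[str]:
    """Collect the bullet items listed under the '## Decisions' heading."""
    decisions, in_section = [], False
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Entering a new second-level section; track whether it is Decisions.
            in_section = stripped == "## Decisions"
        elif in_section and stripped.startswith("- "):
            decisions.append(stripped[2:])
    return decisions


def uncovered_decisions(decisions: list[str], test_docstrings: list[str]) -> list[str]:
    """Return decisions that no test docstring quotes verbatim."""
    return [d for d in decisions if not any(d in doc for doc in test_docstrings)]


SPEC = """\
# Feature: Email Verification Service
## Decisions
- Use AES-256 for token encryption.
- Tokens expire in 24 hours.
## TDD Rules
- Unit tests must hit the mock endpoint before production execution.
"""

# Docstrings gathered from the test suite; here only the expiry rule is covered.
docstrings = ["Covers: Tokens expire in 24 hours."]
missing = uncovered_decisions(extract_decisions(SPEC), docstrings)
```

Here `missing` contains only the AES-256 decision, which tells the reviewer which architectural choice still lacks a test. A string match like this is a crude proxy for coverage, so it supplements the human review rather than replacing it.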

The Correction (PR #1172)

If you try this and find the AI ignoring your structure (a common hallucination), you need the logic enforced by the repo itself. I submitted a PR that enforces this: PR #1172. This change means the workflow literally cannot proceed without TDD logic being present.
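To illustrate what repo-level enforcement can look like in general, here is a small pre-merge check. It is not the contents of PR #1172; the `src/` and `tests/test_<name>.py` layout and both function names are assumptions made for this sketch.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map src/foo.py to tests/test_foo.py (an assumed naming convention)."""
    name = PurePosixPath(source_path).name
    return str(PurePosixPath("tests") / f"test_{name}")


def sources_missing_tests(changed_files: list[str]) -> list[str]:
    """Return changed source files whose test file is not part of the same change."""
    changed = set(changed_files)
    return [
        path
        for path in changed_files
        if path.startswith("src/")
        and path.endswith(".py")
        and expected_test_path(path) not in changed
    ]
```

A CI job could call `sources_missing_tests` with the list of files changed in a pull request and fail the build whenever the result is non-empty.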

⚔️ Comparison: "Speed" vs "Quality"

Feature	Traditional "Fast" Development	Spec-Kit + Claude Workflow
Start Point	Idea -> Code	Idea -> Spec -> Test
Architecture	Drifts as you go	Anchored in spec
Maintenance	High (Technical Debt)	Low (Test-Driven)
Onboarding	Slow (No docs)	Fast (Spec is the doc)
Final Output	Working prototype	Production-ready System

⚡ Key Takeaways

Context is finite: AI tools have a context limit. Reduce the cognitive load by moving context into a Spec file.
TDD prevents hallucination: Making tests executable requirements forces the AI to respect your architecture.
Hard-wire discipline: Use tools like Spec-Kit and automated PR checks to force the AI into doing the boring work.
Intent before Implementation: Never write code without first defining what it achieves.

🔮 Future Scope

As models get better, they will handle more of the implementation. The bottleneck will shift to Specification. We will likely see tools that generate the Spec-Kit Markdown files semi-automatically, creating a feedback loop where the System Design dictates the tools, and the tools refine the System Design.

❓ FAQ

Q: Does integrating TDD with AI slow me down? A: In the short term? Yes. In the long term? No. It prevents the "Zombie Code" phase where features seem to work but break as soon as requirements change. The friction is your friend.

Q: Can I use Spec-Kit without Claude Code? A: Yes. The spec acts as a living internal document for your human engineers. However, using Claude Code to implement the spec gives you the best return on investment.

Q: Is the PR #1172 change mandatory? A: Not mandatory for small scripts, but essential for system design. If you are building user-facing features, enabling the TDD logic in Spec-Kit is the only way to prevent the AI from forgetting your API formats later.

Q: What happens if the AI fails the TDD tests? A: The PR logic would (or should) block the change from being merged. In a manual setup, the tests simply fail, and you either explain to the AI why the code didn't match the spec or fix the test if the test itself was wrong.

Q: Is this just for Node.js? A: No. The workflow logic is language-agnostic. You can write specs for Python microservices, Go routers, or Rust command-line tools. The "Language" of the Spec is always Markdown.

🎯 Conclusion

If you are tired of building software that works at first only to watch it rot from missing documentation and drifting architecture, change the inputs instead of hoping for better outputs. By moving from "Prompting" to "Specifying," you transform coding from a magical spark into a systematic engineering process. Start with the spec, enforce TDD, and let the AI handle the syntax.

Want to see the implementation in action? Check out the PR that fixed Spec-Kit’s memory loss problem: PR #1172. Hit the "Review" button if you agree that TDD is the only way to keep AI honest.