Gabriel Caiana

What I have learned from building a product with Spec-Driven Development


Table of contents
  1. The problem that no one talks about
  2. What is Spec-Driven Development
  3. Why the “What doesn’t go in” section matters so much
  4. How this changes working with AI
  5. The complete flow
  6. What still doesn’t work well
  7. What I’ve learned so far

A few weeks ago I started building a project from scratch. Microservices, AI pipeline, authentication, asynchronous messaging — the kind of project where architectural decisions accumulate quickly and context is lost even faster.

From the beginning I decided to use AI as a development partner. And the first thing I learned is that AI, without structure, accelerates in the wrong direction as efficiently as it accelerates in the right direction.

The problem that no one talks about

When you ask an LLM to implement an endpoint, it implements it. Often well. The problem appears when the model doesn’t know what you decided last week.

It doesn’t know that a particular field needs to be optional because the consumer will fetch the data via HTTP. It doesn’t know that you chose Fixed Window for rate limiting because Sliding Window would be unnecessary complexity for the MVP. It doesn’t know that you deliberately don’t want any ORM — neither Prisma nor TypeORM.

Without this explicit information, AI fills in the holes with generic patterns. And generic patterns in a project full of specific decisions generate technical debt disguised as productivity.

Concrete things that happen without structure:

  • An endpoint implemented with Prisma in a project that uses pure pg
  • An SQS consumer written from scratch when a shared SqsConsumer already exists in the project
  • An SNS error silenced when the contract said SNS publishing is a primary operation and must rethrow the exception

All of this happens when the context is not encoded anywhere that the AI can read before writing code.
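
The third failure above — silencing an error that the contract says must propagate — is the kind of decision a spec pins down. A minimal sketch of that contract (the `Publisher` interface and function names are illustrative, standing in for whatever SNS wrapper the project actually uses):

```typescript
// Stand-in for the project's SNS client; only the contract matters here.
type Publisher = { publish(message: string): Promise<void> };

// Contract from the spec: SNS publishing is a primary operation.
// A failure must propagate so the caller (or the queue) can retry it.
async function publishJobEvent(sns: Publisher, payload: object): Promise<void> {
  try {
    await sns.publish(JSON.stringify(payload));
  } catch (err) {
    console.error('SNS publish failed', err);
    throw err; // rethrow: do NOT silence a primary-operation failure
  }
}
```

Without the spec line, `catch`-and-log is the natural default — and exactly the bug described above.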

What is Spec-Driven Development

The idea is simple: before any code, there is an approved spec.

The spec is a Markdown file with a fixed structure:

  • Context — why does this feature exist, what problem does it solve
  • Scope — what goes in and, just as importantly, what explicitly doesn’t go in
  • API Contract — request, response, expected errors with real examples
  • Schema and queries — the SQL queries that will be used
  • Published events — if the feature publishes messages to other services, with the typed payload
  • Acceptance criteria — binary checklist of what “ready” means
  • Implementation notes — technical decisions, discarded alternatives, pitfalls

This document goes through three states: draft → review → approved. Only after approved does implementation begin.
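
That lifecycle gate can even be made mechanical. A trivial sketch (the status values come from the template above; the function name is my own):

```typescript
// Spec lifecycle from the template: draft → review → approved.
type SpecStatus = 'draft' | 'review' | 'approved';

// Implementation — by a human or an AI — only starts once the spec is approved.
function canImplement(status: SpecStatus): boolean {
  return status === 'approved';
}
```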

Why the “What doesn’t go in” section matters so much

This is the part I most underestimated at first.

In a real project, scope creep doesn’t happen because someone decided to do more. It happens because no one defined the limits precisely enough. When you write “What doesn’t go in” in the spec, you are making an explicit decision — not leaving it open for whoever implements it to resolve on the spot.

In one of the project specs, the negative scope section included:

Out of scope:
- IP-based rate limiting (API Gateway / WAF responsibility)
- Sliding Window (Fixed Window is enough for the MVP)
- Persisting counters in the database (Redis is the fast path; database is audit-only)

This is not documentation of the obvious. It’s the elimination of three decisions that someone — human or AI — could make differently without this limit.
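
To make the Fixed Window decision concrete, here is a minimal sketch of the algorithm the spec locks in. The `RedisLike` interface is a stand-in for the two Redis commands the algorithm needs (INCR and EXPIRE); any client works:

```typescript
// Stand-in for a Redis client; only INCR and EXPIRE are needed.
interface RedisLike {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

// Fixed Window: one counter per user per time window.
// Simpler than Sliding Window — which is exactly why the spec chose it for the MVP.
async function allowRequest(
  redis: RedisLike,
  userId: string,
  limit: number,
  windowSeconds: number,
  now: number = Date.now(),
): Promise<boolean> {
  const window = Math.floor(now / 1000 / windowSeconds);
  const key = `rl:${userId}:${window}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: let the key expire with the window.
    await redis.expire(key, windowSeconds);
  }
  return count <= limit;
}
```

Note that the counters live only in Redis — consistent with the third exclusion: the database is audit-only, not the fast path.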

This practice connects directly to what I discuss in Building a Robust Architecture: architectural decisions need to be explicit and traceable — not implicit in the code.

How this changes working with AI

When AI has access to an accurate contract, behavior changes.

An example from the project: the worker that extracts data from resumes using a language model (in this case, via Amazon Bedrock). In addition to the processing flow, the spec defined the validation schema for the model’s output:

import { z } from 'zod';

// ResumeSkillSchema and ResumeExperienceSchema are defined elsewhere in the spec.
const ResumeExtractionSchema = z.object({
  headline: z.string().nullable().optional(),
  seniority_level: z.enum(['junior', 'mid', 'senior', 'staff', 'principal']).nullable().optional(),
  years_experience: z.number().nullable().optional(),
  skills: z.array(ResumeSkillSchema).default([]),
  experiences: z.array(ResumeExperienceSchema).default([]),
});

The AI did not invent this schema. It implemented it. The acceptance criterion “Bedrock returns valid JSON → Zod validates → job marked as completed” is binary: either it passes, or it doesn’t.
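
The binary nature of that criterion is what makes it checkable. A dependency-free sketch of the gate — a hand-rolled check stands in here for the `ResumeExtractionSchema.safeParse` call the real worker would use, and the function name is my own:

```typescript
type JobStatus = 'completed' | 'failed';

// The acceptance criterion is binary: the model output either conforms or it doesn't.
function markJob(modelOutput: string): JobStatus {
  try {
    const parsed = JSON.parse(modelOutput);
    const valid =
      typeof parsed === 'object' && parsed !== null &&
      Array.isArray(parsed.skills) && Array.isArray(parsed.experiences);
    return valid ? 'completed' : 'failed'; // no partial credit
  } catch {
    return 'failed'; // invalid JSON fails the same binary gate
  }
}
```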

The spec also defined throttling behavior — what to do when the model API returns an error due to overload:

If ThrottlingException occurs: increase the queue message visibility timeout
and rethrow the error (do not delete the message)

Without this line, the likely implementation would be to treat ThrottlingException like any other error — deleting the message from the queue and losing processing. With the line in the spec, this specific behavior is in the contract.
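
That one spec line translates into a few lines of error handling. A sketch, assuming a `QueueClient` interface that stands in for the SQS calls the worker actually makes (the names and the 120-second value are illustrative):

```typescript
// Stand-in for the SQS client; only visibility-timeout changes are needed here.
interface QueueClient {
  changeVisibility(receiptHandle: string, seconds: number): Promise<void>;
}

async function handleWorkerError(
  queue: QueueClient,
  receiptHandle: string,
  err: Error,
): Promise<never> {
  if (err.name === 'ThrottlingException') {
    // Push the message's next delivery further out so the model API can recover...
    await queue.changeVisibility(receiptHandle, 120);
  }
  // ...and rethrow, so the message is NOT deleted and will be redelivered.
  throw err;
}
```

The key property is the last line: the message survives the failure, which is exactly what treating throttling “like any other error” would destroy.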

The complete flow

In practice, it works like this:

  1. Issue created in task manager with sequential number and a high-level description
  2. Spec written following the template, linked to the issue
  3. Review of the spec before the code exists — architectural questions are resolved here
  4. Spec approved — status approved in document
  5. Implementation — AI reads the project spec and context and implements within those limits
  6. Commit linked to issue — full traceability: spec → code → PR → issue

Step 3 is the most important. The spec review is where the tough decisions happen. The code is a consequence.

The spec of a job analysis endpoint defined that profileId in the event payload should be optional, with the note:

The AI Worker will be responsible for fetching the profile internally via HTTP —
the correct pattern for the pipeline.

This decision was made during the review, not during implementation. The code that came later did not need to deliberate on this.
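
In code, that review decision is a single optional field. A sketch of the payload shape it produces — the field names other than `profileId` are illustrative:

```typescript
interface JobAnalysisRequestedEvent {
  jobId: string;
  // Optional by contract: when absent, the AI Worker fetches the
  // profile itself via HTTP instead of expecting it inlined in the event.
  profileId?: string;
}

function needsProfileFetch(event: JobAnalysisRequestedEvent): boolean {
  return event.profileId === undefined;
}
```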

What still doesn’t work well

The spec resolves alignment, not code quality. A precise contract can produce mediocre code if the implementation is not reviewed.

Writing good specs has a real cost. Fulfilling the acceptance criteria accurately — making them binary, without ambiguity — requires thinking about the feature before touching it. That is exactly the value. But it’s work.

Specs become outdated. When the implementation reveals something that the spec did not predict, the decision made in the code needs to be fed back into the document. This doesn’t happen automatically.

What I’ve learned so far

Spec-Driven Development with AI is not about using AI to write specs. It’s about making sure the AI — and anyone on the team — has the right contract before writing code.

The speed of AI is real. The risk too: without structure, it produces correct code for the wrong problem, or for a version of the problem that the team has already discarded. Spec is the mechanism that aligns speed with intent.

The flow is not new. It is the basic discipline of software engineering applied to a context where the developer writes at the speed of a language model. The tools have changed. The need to think before building has not.