Stage 5 · Hardened · Tier 2: Staging

The Surface Resists. The Core Stays Tough.

Tier 2 is where the spec governs deployment, not just code. Every pre-production build is verified against NFRs, smoke tests, and the full test pyramid. The harness checks not only what the AI generated but also that it can survive production load.

Tier 1 gave you a working system verified against behavioral specs. Tier 2 hardens it for production load: the spec grows to cover non-functional requirements (NFRs), real services run locally, and the harness gains load, stress, security, concurrency, and latency tests. Then GS writes the CI/CD pipeline that gates every one of them.

The T2 Execution Cycle

What forces verification: the Tier 2 analog of the green test

At Tier 1, the forcing function is the green test plus the pre-commit hook: nothing commits that breaks the spec. Tier 2 needs its own forcing function. Without one, the AI will write a CI/CD file, deploy, and call it done without ever proving that the use cases actually run. That forcing function is the CD gate.

1. Build passes Tier 1: unit and integration tests are green.
2. Deploy to local/staging against real services (Docker; see Step 02).
3. Use-case walk: the AI drives the running app through every UC/F-NNN via the Playwright MCP, with a real browser, real network, and real database.
4. QA verification: at each postcondition, an assertion, a screenshot, and a vision check (multimodal AI-as-QA) against the expected state.
5. NFR gates: load, latency, and security thresholds (Step 04).
6. Gate decision: if every use case passes AND every NFR gate is green, promote. On any failure, roll back and do not promote.
7. Record: results are logged; each failure becomes a spec or remediation item.

What forces it: the pipeline invokes the use-case walk skill (qa-walk) and blocks promotion on its exit code. The AI cannot deploy and forget, because promotion is gated on a green walk. This is the Tier 2 analog of Tier 1's green test.
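The gate decision in the cycle above can be sketched as a pure function. This is a minimal illustration, not the actual skill: the `WalkResult`, `NfrResult`, and `GateDecision` shapes and the `decideGate` name are assumptions made for this example, as is the choice to treat an empty walk as a failure.

```typescript
// Hypothetical result shapes; the real skill's output format is not specified in this guide.
interface WalkResult {
  id: string; // use case or feature id, e.g. "UC-001"
  passed: boolean;
}

interface NfrResult {
  id: string; // NFR id, e.g. "P-001"
  passed: boolean;
}

interface GateDecision {
  promote: boolean;
  exitCode: number; // 0 = promote, 1 = block and roll back
  failures: string[]; // ids that become spec or remediation items
}

// Promote only when every use case passed AND every NFR gate is green.
function decideGate(walk: WalkResult[], nfrs: NfrResult[]): GateDecision {
  const failures = [...walk, ...nfrs].filter((r) => !r.passed).map((r) => r.id);
  // Assumption: a walk that exercised no use cases proves nothing, so it blocks promotion.
  const promote = walk.length > 0 && failures.length === 0;
  return { promote, exitCode: promote ? 0 : 1, failures };
}
```

In a CI script, the caller would typically end with `process.exit(decision.exitCode)` so the pipeline's promotion step can key off the exit code.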

What the gate proves: Executable + Verifiable

Tier 1 proves the code is correct in isolation: its tests pass. But "passes its tests" is not "works in production." The Harden gate forces a demonstration of two of the seven properties at production fidelity:

Executable: the spec's use cases actually run end-to-end against real services. The generative execution (the qa-walk driving the deployed app through every use case) is the proof: not "the tests pass" but "the system does the thing."
Verifiable: every NFR threshold and every use-case postcondition is checked against the spec by the harness, not by a human. The NFR battery and the verification-loop coherence checks are the proof.

NFR battery + generative execution = Executable and Verifiable, demonstrated. The gate requires both to be green before promotion. You cannot promote an artifact that has not shown that it both runs and is verified against the spec under production conditions.

Declared at Mold, built here. The derivation premise still holds: the gate is promised in the specification from day one. The NFR thresholds and the rule "every use case must pass the walk" live in SPEC.md and the harness policy from the moment you mold. But the gate's implementation (the actual Hurl files, the Playwright walk, the wiring to real endpoints) can only be generated once the code it verifies exists. So you declare the gate at Mold and derive its body at Harden, the same way a feature is declared in the spec and its test is generated when the feature is built. Promised once; made executable when the system itself becomes executable.

How it's enforced in practice. With the PragmaWorks toolchain, this gate is the gs-verify-deploy skill, governed by a ForgeCraft Tier 2 policy. There are two forcing levels. First, a post-deploy hook auto-fires the skill on any deploy-shaped action (git push, vercel, railway up…) so the AI cannot quietly skip it. Second, in CI the skill exits nonzero on any failure, and the pipeline's promotion step is gated on that exit code. Pure-GS variant: a post-deploy hook running the walk by hand. Either way, the gate is mechanical, not a reminder.

The Verification Loop: Multimodal AI-as-QA

A use case is verified by triangulating evidence, not by one tool

No single tool proves a use case works. The AI verifies it the way a careful human QA would: by gathering evidence from every layer the use case touches and checking that the layers agree. The canonical loop, for a typical create/update use case:

1. Read prior state. SQL (or a DB MCP): capture the database state before the action.
2. Drive the behavior. Playwright MCP: simulate the user performing the use case in the real UI.
3. Inspect the business layer. Read the logs/traces: confirm the business layer received and processed the right command and values.
4. Read the new state. SQL again: capture the delta. Did exactly the right rows change, and nothing else?
5. Confirm the return. UI refresh (Playwright screenshot + vision): the returned information shows on screen as the postcondition specifies.
6. Check coherence. The UI, the logs, and the DB must tell the same story. If the screen says "saved" but the DB delta is empty, or the logs show a value the UI never displayed, you have a bug, even if each isolated check passes. Coherence across layers is what separates real verification from test theater.
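The coherence check at the end of the loop can be sketched as a comparison over the evidence gathered in the earlier steps. This is a simplified illustration that assumes a use case which should change exactly one row; the `Evidence` shape and the `checkCoherence` name are hypothetical, not part of any tool named here.

```typescript
// Hypothetical evidence bundle collected by the earlier steps of the loop.
interface Evidence {
  uiValue: string | null; // what the screen shows at the postcondition
  loggedValue: string | null; // the value the business layer logged as processed
  dbChangedRows: { id: string; value: string }[]; // delta between prior and new state
}

// Returns a list of disagreements between layers; an empty list means they cohere.
function checkCoherence(expectedRowId: string, ev: Evidence): string[] {
  const problems: string[] = [];
  const expected = ev.dbChangedRows.find((r) => r.id === expectedRowId);
  const unexpected = ev.dbChangedRows.filter((r) => r.id !== expectedRowId);

  if (!expected) problems.push("DB delta is missing the expected row");
  if (unexpected.length > 0) problems.push("DB delta contains rows that should not have changed");
  if (expected && ev.uiValue !== expected.value) problems.push("UI and DB disagree");
  if (ev.loggedValue !== ev.uiValue) problems.push("logs and UI disagree");
  return problems;
}
```

The point of returning every disagreement rather than a single boolean is that each one names the pair of layers to investigate.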

Evidence layers and the tool for each:

Evidence layer	Tool	Question it answers
State (before / after / delta)	SQL · DB MCP	Did the right data change, and only that?
User behavior + visual	Playwright MCP + vision	Does the flow work and look right to a user?
API / contract	Hurl	Does the boundary honor its contract?
Business internals	log / trace reading	Did the business layer do the right thing inside?
Statistical / balance	parameterized simulation	Over many runs, are outcomes and balance correct?

The same loop, swapped per platform:

Headless: the same loop without the UI steps. Hurl drives the boundary; SQL and logs verify. For API services and batch jobs.
Mobile: the same loop, with a mobile-automation MCP replacing Playwright for the behavior and visual step.
Simulation / optimization / game balance: the behavior step becomes a parameterized run (e.g. a Monte Carlo over a blackjack advisor: 100k hands, checking the expected value of the recommended action). Verification is statistical (distributions, balance, convergence), not a single postcondition.
ETL / data pipeline: trace a datum through the pipeline: state at the source, evidence at each transform stage, aggregated state at the sink. Verify lineage and coherence: the datum arrived transformed and aggregated correctly, with no loss and no silent corruption between stages. The "behavior" is the pipeline run; the evidence layers are the stage outputs.
Event-driven / async: message published → consumer received → side effect happened. Verify across the broker, the consumer logs, and the resulting state; account for eventual consistency, idempotency, and retries.
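For the simulation variant, statistical verification can be sketched as a seeded Monte Carlo run compared against an expected value with a tolerance. The example below uses a toy game (win 1 or lose 1) as a stand-in for a real system such as the blackjack advisor; `playOnce`, `verifyExpectedValue`, the tolerance, and the seed are all assumptions made for illustration.

```typescript
// Small seeded PRNG (mulberry32) so the simulation is reproducible between runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Toy stand-in for the system under test: pays +1 with probability winProb, else -1.
function playOnce(rng: () => number, winProb: number): number {
  return rng() < winProb ? 1 : -1;
}

interface SimulationCheck {
  sampleMean: number;
  withinTolerance: boolean;
}

// Runs `runs` simulated plays and compares the sample mean to the expected value from the spec.
function verifyExpectedValue(
  runs: number,
  winProb: number,
  expectedValue: number,
  tolerance: number,
  seed = 1,
): SimulationCheck {
  const rng = mulberry32(seed);
  let total = 0;
  for (let i = 0; i < runs; i++) total += playOnce(rng, winProb);
  const sampleMean = total / runs;
  return { sampleMean, withinTolerance: Math.abs(sampleMean - expectedValue) <= tolerance };
}
```

The tolerance should be chosen from the expected sampling error at the given run count, so that a correct system passes reliably while a real deviation still fails.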

The catalog is open. CRUD, headless, mobile, simulation, ETL, event-driven: these are not separate mechanisms. They are one principle (gather evidence per layer, check coherence) instantiated for the shape of the use case. When a new shape appears, you don't need a new method; you map its evidence layers to tools and check that they cohere.

The principle: pick the evidence layers the use case actually touches, drive each with the right tool, and verify they cohere. GS is tool-agnostic: Playwright, Hurl, SQL, log readers, and simulation harnesses are the common defaults, swapped per platform. The qa-walk skill runs this loop; the CD gate makes it mandatory.

Step 01: Expand the Spec for Tier 2

The NFR section grows teeth

Your SPEC.md already has a non-functional requirements section. Now it gets specific and measurable: every threshold becomes a number a test can check.

Paste this: expand NFRs

Read docs/spec/SPEC.md. Expand the Non-Functional Requirements section so every requirement is a measurable threshold a test can verify. Cover all five classes:

PERFORMANCE / LATENCY
- P-NNN: [endpoint] responds within [p50 / p95 / p99 ms] at [N] concurrent users

LOAD
- L-NNN: system sustains [N] requests/second for [duration] with error rate < [X]%

STRESS
- ST-NNN: identify the breaking point. At what concurrency does the system degrade, and does it fail gracefully (no data loss, clean 503s)?

SECURITY
- S-NNN: authentication required on all routes except [list]; input validated at the trust boundary; secrets only in env; [authz rules per role]

CONCURRENCY
- C-NNN: [shared resource] is safe under concurrent access: no race conditions, correct locking/transactions, idempotent where required

For each, write the threshold AND the failure behavior (what the system must do when the limit is hit). Update SPEC.md and commit: docs(spec): expand NFRs for Tier 2.
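One way to check that an expanded NFR entry is actually testable is to model it as data and validate it. The sketch below is an assumption-laden illustration: the `Nfr` shape, the three-digit reading of `NNN`, and the rule that a leftover `[placeholder]` means "not yet filled in" are choices made for this example, not something the spec format mandates.

```typescript
type NfrClass = "PERFORMANCE" | "LOAD" | "STRESS" | "SECURITY" | "CONCURRENCY";

interface Nfr {
  id: string; // e.g. "P-001", "L-001", "ST-001", "S-001", "C-001"
  nfrClass: NfrClass;
  threshold: string; // e.g. "p95 < 200 ms at 100 concurrent users"
  failureBehavior: string; // what the system must do when the limit is hit
}

// Id prefix used by each class in the prompt above.
const ID_PREFIX: Record<NfrClass, string> = {
  PERFORMANCE: "P",
  LOAD: "L",
  STRESS: "ST",
  SECURITY: "S",
  CONCURRENCY: "C",
};

// Returns the reasons an entry is not yet testable; an empty list means it is well-formed.
function validateNfr(nfr: Nfr): string[] {
  const problems: string[] = [];
  // Assumption: NNN means exactly three digits.
  const idPattern = new RegExp(`^${ID_PREFIX[nfr.nfrClass]}-\\d{3}$`);
  if (!idPattern.test(nfr.id)) problems.push(`id ${nfr.id} does not match its class prefix`);
  if (nfr.threshold.trim() === "") problems.push("threshold is missing");
  // A leftover [placeholder] from the template means the threshold was never made concrete.
  if (/\[[^\]]*\]/.test(nfr.threshold)) problems.push("threshold still contains a template placeholder");
  if (nfr.failureBehavior.trim() === "") problems.push("failure behavior is missing");
  return problems;
}
```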

Step 02: Local Services with Docker / Rancher

Real infrastructure, not mocks

The staging harness must run against real services (a real database, cache, and queue), not mocks. Mocks can pass while production fails. Stand the dependencies up locally with containers so every test runs against the real thing.

Paste this: local services

Read docs/spec/SPEC.md Tech Stack. Generate a docker-compose.yml (or Rancher / Podman config if I specify) that stands up every external dependency this system needs locally:
- Database [from spec] with a seeded test schema
- Cache / queue / object store if the spec uses them
- Any third-party service that has a containerized emulator

Include a Makefile or npm script: "services:up", "services:down", "services:reset". The harness will run against these. No mocks for integration, E2E, load, or concurrency tests.

Commit: chore(infra): local service stack for the staging harness.

Step 03: MCP Tooling for Tier 2

Give the AI the instruments to drive the harness

Model Context Protocol (MCP) servers let the AI drive real tools directly: a browser, a load generator, the database. At Tier 2 these turn the AI from a code generator into a system operator that can run the full battery and read the results.

Tool	Use
Playwright MCP	Drive the running app in a real browser for E2E and visual verification.
Load-testing MCP (k6 / Artillery)	Generate and run load and stress scenarios from the spec NFRs; read the latency histogram back.
Database / SQL MCP	Inspect state before and after a test; verify no concurrency corruption.
HTTP / API MCP	Hit endpoints directly for contract and security tests, independent of the UI.

Wire these in CONSTITUTION.md so every session knows which tools are available and when to use them. The AI should reach for the load MCP when it sees an L-NNN requirement, the same way it reaches for the test runner on an F-NNN.
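The "which tool for which requirement" rule can be written down as a lookup keyed on the requirement id prefix. Only the L-NNN and F-NNN rows come directly from the text; the other rows are assumptions inferred from the tool table, and `TOOL_BY_PREFIX` and `toolFor` are hypothetical names.

```typescript
// Hypothetical routing table: requirement id prefix -> the instrument the AI reaches for first.
const TOOL_BY_PREFIX: Record<string, string> = {
  F: "test runner", // from the text
  L: "load-testing MCP", // from the text
  UC: "Playwright MCP", // assumption: use-case walks are browser-driven
  P: "load-testing MCP", // assumption: latency is read from the load tool's histogram
  ST: "load-testing MCP", // assumption: stress is a ramped load scenario
  S: "HTTP / API MCP", // assumption: security tests hit endpoints directly
  C: "database / SQL MCP", // assumption: concurrency corruption is checked in the DB
};

// Resolves a requirement id such as "L-001" to the tool mapped for its prefix.
function toolFor(requirementId: string): string {
  const prefix = requirementId.split("-")[0] ?? "";
  const tool = TOOL_BY_PREFIX[prefix];
  if (tool === undefined) throw new Error(`No tool mapped for ${requirementId}`);
  return tool;
}
```

In CONSTITUTION.md the same mapping would be stated in prose; the table form simply makes gaps visible when a new requirement class is added.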

Step 04: The Test Battery

Load · stress · security · concurrency · latency

One suite per class, each derived from the spec NFRs. The AI writes the test, runs it against the local service stack, and reports against the threshold.

Paste this: generate the Tier 2 test battery

Read docs/spec/SPEC.md NFR section and CONSTITUTION.md. Using the local service stack and the available MCP tools, generate and run the Tier 2 test battery. One suite per class, each tied to its NFR id:

1. LOAD (L-NNN): sustained throughput test. Assert error rate and latency hold for the full duration.
2. STRESS (ST-NNN): ramp concurrency until the breaking point. Assert graceful degradation: no data loss, clean error codes.
3. SECURITY (S-NNN): unauthenticated access rejected, input fuzzing at the trust boundary, no secrets in responses or logs, authz enforced per role.
4. CONCURRENCY (C-NNN): hammer the shared resource with parallel writers. Assert no race condition, correct final state, idempotency where required.
5. LATENCY (P-NNN): measure p50/p95/p99 under the specified concurrent load. Assert each percentile is within threshold.

Place suites in tests/tier2/. Report a table: NFR id | threshold | measured | pass/fail. For any failure, do not patch silently; show me the gap.
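The latency suite's core check can be sketched as a percentile computation over collected samples, compared against the thresholds from the spec. This sketch uses the nearest-rank percentile definition, which is one of several in common use; load tools such as k6 or Artillery report their own percentiles, so treat `percentile`, `LatencyThreshold`, and `reportLatency` as illustrative names rather than any tool's API.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical threshold record for one P-NNN requirement, all values in milliseconds.
interface LatencyThreshold {
  nfrId: string;
  p50: number;
  p95: number;
  p99: number;
}

// Produces one report row per percentile: NFR id | threshold | measured | pass/fail.
function reportLatency(t: LatencyThreshold, samplesMs: number[]): string[] {
  const checks = [
    { label: "p50", limit: t.p50, p: 50 },
    { label: "p95", limit: t.p95, p: 95 },
    { label: "p99", limit: t.p99, p: 99 },
  ];
  return checks.map(({ label, limit, p }) => {
    const measured = percentile(samplesMs, p);
    const status = measured <= limit ? "pass" : "fail";
    return `${t.nfrId} | ${label} <= ${limit} ms | ${measured} ms | ${status}`;
  });
}
```

Whether the boundary value itself passes (`<=`) or fails (`<`) should follow the wording of the NFR in SPEC.md; the sketch assumes an inclusive limit.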

Step 04b: The Use-Case Walk (MCP QA Simulation)

A skill that simulates human QA over staging

Codify the use-case walk as a project skill so it runs the same way every time and the pipeline can invoke it. This is the human-QA simulation: the AI becomes the tester, driving the real app and judging the result against the spec with an assertion plus a vision check.

Paste this: generate the qa-walk skill

Generate a project skill that simulates human QA over staging using MCP tools. It must be invokable by the CD pipeline and exit nonzero on any failure.

Create .claude/skills/qa-walk/ (or the slash-command equivalent for this assistant) that, when run:

1. Reads docs/spec/SPEC.md and lists every use case (UC-NNN) and functional feature (F-NNN) in scope.
2. Ensures the app is running against the local/staging service stack (docker compose up if needed).
3. For each use case, using the Playwright MCP server:
   - Drive the full flow in a real browser: actor → precondition → steps → postcondition.
   - At the postcondition, capture a screenshot.
   - Verify two ways: (a) DOM/assertion check, (b) vision check: compare the screenshot to the use case's expected visual state with a vision model.
   - Run the error-path for each acceptance criterion that can fail.
4. Aggregate into a table: UC/F id | assertion | vision | error-path | pass/fail.
5. Write reports/qa-walk-[date].md and EXIT NONZERO if any use case fails.

Then wire it: the CD pipeline must call this skill after deploy-to-staging and BLOCK promotion on its exit code. Nothing promotes without a green walk.

This is where skills + MCP replace the manual QA pass. The skill makes the walk repeatable; the MCP tools give the AI hands; the pipeline makes it mandatory.
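The aggregation and exit-code steps of the walk can be sketched as a small report renderer. The `UseCaseResult` shape and `renderWalkReport` name are assumptions for illustration; the column layout follows the table described in the prompt, and treating an empty result set as a failure is a design choice of this sketch.

```typescript
// Hypothetical per-use-case outcome collected during the walk.
interface UseCaseResult {
  id: string; // UC-NNN or F-NNN
  assertion: boolean; // DOM/assertion check
  vision: boolean; // vision check on the postcondition screenshot
  errorPath: boolean; // error-path runs for the acceptance criteria
}

// Renders the report table and derives the exit code the pipeline gates on.
function renderWalkReport(results: UseCaseResult[]): { markdown: string; exitCode: number } {
  const mark = (ok: boolean): string => (ok ? "pass" : "fail");
  const header = "| UC/F id | assertion | vision | error-path | pass/fail |\n|---|---|---|---|---|";
  let failed = 0;
  const rows = results.map((r) => {
    const ok = r.assertion && r.vision && r.errorPath;
    if (!ok) failed += 1;
    return `| ${r.id} | ${mark(r.assertion)} | ${mark(r.vision)} | ${mark(r.errorPath)} | ${mark(ok)} |`;
  });
  // Assumption: a walk with no results is treated as a failure, not as a vacuous pass.
  const green = results.length > 0 && failed === 0;
  return { markdown: [header, ...rows].join("\n"), exitCode: green ? 0 : 1 };
}
```

The skill would write `markdown` to the report file and end the process with `exitCode`.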

Step 05: GS Writes the CI/CD Pipeline

The pipeline is a derived artifact of the spec

The deployment pipeline is not a hand-written script; it derives from SPEC.md. Every NFR gate, deploy trigger, smoke test, and rollback rule is named in the spec first, and then the AI generates the pipeline that enforces them.

Paste this: generate the CD pipeline

Read docs/spec/SPEC.md (Tech Stack, NFRs, Quality Gates) and CONSTITUTION.md. Generate the full CI/CD pipeline (.github/workflows/deploy.yml or the platform in the spec) that:

1. Runs the full test suite: unit, integration, E2E, plus the Tier 2 battery (load, stress, security, concurrency, latency) against the local/staging service stack.
2. Makes every NFR a gate: if L-001 or P-001 fails, the build fails. Not a warning.
3. Deploys to the staging environment named in the spec only after all gates pass.
4. Runs smoke tests post-deploy against every F-NNN feature.
5. Rolls back automatically if any smoke test fails.
6. Never deploys to production at this tier: staging only.

Name each pipeline step after the SPEC.md artifact it verifies. Commit: ci(tier2): spec-derived staging pipeline with NFR gates.

The pipeline's promotion gate calls the qa-walk skill from Step 04b. Promotion is blocked unless the use-case walk exits green.
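The ordering the pipeline enforces (gates before deploy, smoke tests and the walk after deploy, rollback on any post-deploy failure) can be modeled in a few lines. This is a conceptual sketch of the control flow only, not a workflow file; `Stage`, `runPipeline`, and the outcome names are hypothetical.

```typescript
type Outcome = "promoted" | "blocked" | "rolled-back";

interface Stage {
  name: string; // named after the SPEC.md artifact it verifies, e.g. "nfr-P-001"
  phase: "pre-deploy" | "post-deploy";
  run: () => boolean; // true = green
}

// Runs pre-deploy gates, deploys to staging, then runs post-deploy checks.
// A red pre-deploy gate blocks the deploy; a red post-deploy check triggers rollback.
function runPipeline(
  stages: Stage[],
  deploy: () => void,
  rollback: () => void,
): { outcome: Outcome; failedStage?: string } {
  for (const stage of stages.filter((s) => s.phase === "pre-deploy")) {
    if (!stage.run()) return { outcome: "blocked", failedStage: stage.name };
  }
  deploy();
  for (const stage of stages.filter((s) => s.phase === "post-deploy")) {
    if (!stage.run()) {
      rollback();
      return { outcome: "rolled-back", failedStage: stage.name };
    }
  }
  return { outcome: "promoted" };
}
```

In a real workflow file the same shape usually appears as jobs with dependencies, where the promotion job depends on the job that runs the walk.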

Step 06: NFR Gates in Practice

Each NFR becomes a test that fails if unmet

Every Non-Functional Requirement from SPEC.md becomes a gate in the staging pipeline. Not a guideline: a gate. If P-001 says the endpoint responds in under 200ms at 100 concurrent users, the pipeline fails if it doesn't. Spec-first means NFRs are defined before deployment, not measured after.

Paste this: NFR gate verification

Read SPEC.md section Non-Functional Requirements. For each NFR:

1. Write a staging test that verifies it at the boundary condition (exactly the specified threshold).
2. Name the test: nfr-[id]-[short-description].test.ts (e.g., nfr-p001-response-latency.test.ts)
3. The test must FAIL if the NFR is not met (not warn, not log).
4. Add each test to the CI workflow under a 'nfr' job that runs on staging only.

After writing all NFR tests, print a coverage table:
| NFR ID | Threshold | Test file | Status |
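The naming convention in the prompt can be made mechanical with a small helper, so generated test files are always traceable to their NFR id. `nfrTestFileName` is a hypothetical helper written for this illustration; the slug rules (lowercase, hyphen-separated) are an assumption inferred from the example file name.

```typescript
// Builds "nfr-[id]-[short-description].test.ts" from an NFR id and a free-text description.
function nfrTestFileName(nfrId: string, description: string): string {
  // "P-001" -> "p001": lowercase and drop the hyphen, as in the example file name.
  const id = nfrId.toLowerCase().replace(/[^a-z0-9]/g, "");
  // "Response latency" -> "response-latency": collapse non-alphanumerics into single hyphens.
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `nfr-${id}-${slug}.test.ts`;
}
```

For example, `nfrTestFileName("P-001", "Response latency")` yields `nfr-p001-response-latency.test.ts`, matching the example in the prompt.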

Step 07: Test Pyramid Completion

E2E tests that drive the running application

By the end of Tier 1, you have unit and integration tests from the harness. Tier 2 completes the pyramid with E2E tests that drive the running application through every use case in SPEC.md: not only the happy path, but also error paths, boundary conditions, and concurrent access patterns.

Paste this: E2E test completion

Read SPEC.md Functional Features. For each F-NNN feature:
1. Write an E2E test that drives the full user flow (actor → precondition → steps → postcondition).
2. Write an error-path test for each acceptance criterion that can fail.
3. Use Playwright (or the project's E2E framework): browser-driven, real network, real database.
4. Each test file: e2e/[feature-id]-[feature-name].spec.ts
5. Add a screenshot assertion at the postcondition step.

After all E2E tests exist, run the full suite and report:
- Unit / Integration / E2E counts
- Coverage percentage
- Any acceptance criterion with no corresponding test (flag as gap)
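The gap report in the last bullet can be computed mechanically. The sketch below is one possible approach in TypeScript, under two assumptions that are not in the source: each E2E test title carries a tag such as `[F-001/AC-2]` naming the acceptance criterion it covers, and feature IDs are lowercased with the hyphen removed in file names, mirroring the nfr-p001 example. All function and type names are hypothetical.

```typescript
// Hypothetical gap finder: which acceptance criteria have no E2E test?

interface AcceptanceCriterion {
  featureId: string;   // e.g. "F-001"
  criterionId: string; // e.g. "F-001/AC-2" (assumed ID format)
}

/** Builds the file path from the rule e2e/[feature-id]-[feature-name].spec.ts. */
function e2eFileName(featureId: string, featureName: string): string {
  const id = featureId.toLowerCase().replace(/[^a-z0-9]/g, "");
  const slug = featureName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `e2e/${id}-${slug}.spec.ts`;
}

/** Collects criterion IDs from test titles tagged like "[F-001/AC-2]". */
function coveredCriteria(testTitles: string[]): Set<string> {
  const covered = new Set<string>();
  for (const title of testTitles) {
    const tag = /\[(F-\d+\/AC-\d+)\]/.exec(title);
    if (tag && tag[1]) covered.add(tag[1]);
  }
  return covered;
}

/** Returns every criterion that no test title claims to cover. */
function findGaps(
  criteria: AcceptanceCriterion[],
  testTitles: string[],
): AcceptanceCriterion[] {
  const covered = coveredCriteria(testTitles);
  return criteria.filter((c) => !covered.has(c.criterionId));
}
```

Tagging titles keeps the mapping visible in the test report itself; a project that already tracks criteria another way (annotations, a manifest file) would swap out `coveredCriteria` and keep `findGaps` unchanged.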

Large Codebases: The Strangler Fig

For large brownfield systems, don't apply Tier 2 to the whole codebase at once. Identify the seam: the boundary where new GS-disciplined code will grow. Apply NFR gates and the CD pipeline to the new module only. Route a percentage of traffic to it. The old code handles a shrinking share of traffic as the new module grows, and can be retired piece by piece. This is the Strangler Fig pattern applied to GS adoption: the new system strangles the old one without a big-bang rewrite.
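"Route a percentage of traffic" can be done in several ways (load balancer weights, feature flags, a routing layer). As one hedged illustration, the TypeScript sketch below does it at the application seam with a stable hash, so a given user or session always lands on the same side. The names `stableHash`, `routeRequest`, and `Target` are hypothetical, and the choice of an FNV-1a-style hash is an assumption made for the example.

```typescript
// Hypothetical percentage router for a Strangler Fig seam.

type Target = "legacy" | "new-module";

/**
 * FNV-1a-style 32-bit hash over UTF-16 code units.
 * Deterministic across processes, so routing is sticky per key.
 */
function stableHash(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

/**
 * Sends roughly `newModulePercent` percent of keys to the new module.
 * `stickyKey` is whatever identifies the caller (user ID, session ID).
 */
function routeRequest(stickyKey: string, newModulePercent: number): Target {
  if (newModulePercent < 0 || newModulePercent > 100) {
    throw new Error("newModulePercent must be between 0 and 100");
  }
  const bucket = stableHash(stickyKey) % 100; // 0..99
  return bucket < newModulePercent ? "new-module" : "legacy";
}
```

Raising `newModulePercent` only moves keys from legacy to the new module, never back, because each key's bucket is fixed. That keeps the rollout monotonic while the NFR gates run against the new module's share of traffic.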

Don't Over-Harness

The harness is Bounded too

The harness is subject to the same Bounded property it enforces. Too many hooks, an overgrown CONSTITUTION.md, or too many CNT (context navigation tree) levels make the AI spend its context budget reading the harness instead of doing the work, re-creating the exact context degradation GS exists to prevent. The cure becomes the disease. Keep the harness minimal-sufficient: every hook, rule, and CNT level must earn its place by preventing a real, observed failure.

How many CNT levels? The math, made simple

A CNT level is a map the AI reads to find things. A few well-organized maps are fast. With too many tiny maps, the AI burns its budget reading maps instead of working. Three rules keep it healthy:

The CNT health rules

RULE 1 (Branching): each map holds 3 to 10 entries.
  Fewer than 3 → the level isn't earning its place. Flatten it.
  More than ~10 → split it into sub-maps.

RULE 2 (Depth): depth ≈ log10(file count), rounded up. Each level multiplies capacity ~10×.
  ~30 files → 2 levels
  ~300 files → 3 levels
  ~3,000 files → 4 levels

RULE 3 (Co-change coherence): files that change together live in the same map.
  If two files almost always change together but sit in different branches, the partition is wrong. Fix the partition before adding depth.

Healthy CNT = depth near log10(files), 3–10 per node, partitioned along what actually changes together.
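Rules 1 and 2 are plain arithmetic, so they are easy to encode. The TypeScript sketch below is illustrative only: the function names are hypothetical, and clamping the target to at least one level for very small projects is an assumption the rules do not state.

```typescript
// Hypothetical encoding of CNT health rules 1 and 2.

type NodeVerdict = "flatten" | "split" | "ok";
type DepthVerdict = "too deep" | "too shallow" | "on target";

/** Rule 1: a map should hold 3 to 10 entries. */
function branchingVerdict(entryCount: number): NodeVerdict {
  if (entryCount < 3) return "flatten";
  if (entryCount > 10) return "split";
  return "ok";
}

/**
 * Rule 2: target depth = ceil(log10(file count)).
 * Assumption: never report fewer than 1 level, even for a handful of files.
 */
function targetDepth(fileCount: number): number {
  if (fileCount < 1) throw new Error("fileCount must be at least 1");
  return Math.max(1, Math.ceil(Math.log10(fileCount)));
}

/** Compares the current number of CNT levels against the rule 2 target. */
function depthVerdict(currentDepth: number, fileCount: number): DepthVerdict {
  const target = targetDepth(fileCount);
  if (currentDepth > target) return "too deep";
  if (currentDepth < target) return "too shallow";
  return "on target";
}
```

With these definitions, `targetDepth(30)`, `targetDepth(300)`, and `targetDepth(3000)` give 2, 3, and 4 levels, matching the table in rule 2.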

The third rule is the one that matters most, and it is computable. Files that change together should be navigated together. Your git history already knows which those are.

Paste this: CNT health check

Read CONSTITUTION.md and the current CNT structure (the document and code navigation maps). Then read the last 200 commits.
1. For each CNT node, count its entries.
   - Flag any node with fewer than 3 entries (candidate to flatten).
   - Flag any node with more than 10 entries (candidate to split).
2. Count the project's files. Target depth = ceil(log10(file count)). Report whether the current tree is deeper or shallower than target.
3. Build the co-change matrix from the 200 commits. List any file pairs that change together in more than half their commits but live in different CNT branches. These are mis-partitioned.
4. Report only (change nothing):
   - Nodes to flatten / nodes to split
   - Too deep or too shallow vs. target
   - Mis-partitioned files (co-change across branches)
   - The minimal CNT that keeps co-changing files together at 3–10 per node.
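Step 3 of the prompt, the co-change matrix, can be sketched as a small pure function. This TypeScript version is a sketch under stated assumptions: commits arrive as arrays of file paths, `branchOf` maps each file to its top-level CNT branch, and "more than half their commits" is read as co-changes divided by the commits touching either file. The source does not pin down that ratio, and all names are hypothetical.

```typescript
// Hypothetical co-change check for CNT rule 3.

interface MisPartition {
  a: string;
  b: string;
  together: number; // commits touching both files
  either: number;   // commits touching at least one of them
}

function findMisPartitioned(
  commits: string[][],
  branchOf: Map<string, string>,
  minRatio: number = 0.5,
): MisPartition[] {
  const touched = new Map<string, number>();
  const pairs = new Map<string, { a: string; b: string; together: number }>();

  for (const files of commits) {
    // De-duplicate and sort so each pair has one canonical key.
    const unique = Array.from(new Set(files)).sort();
    unique.forEach((a, i) => {
      touched.set(a, (touched.get(a) ?? 0) + 1);
      for (const b of unique.slice(i + 1)) {
        const key = `${a}\n${b}`;
        const pair = pairs.get(key) ?? { a, b, together: 0 };
        pair.together += 1;
        pairs.set(key, pair);
      }
    });
  }

  const result: MisPartition[] = [];
  pairs.forEach(({ a, b, together }) => {
    const branchA = branchOf.get(a);
    const branchB = branchOf.get(b);
    // Assumption: files missing from the CNT are skipped, not flagged.
    if (branchA === undefined || branchB === undefined) return;
    if (branchA === branchB) return;
    const either = (touched.get(a) ?? 0) + (touched.get(b) ?? 0) - together;
    if (together / either > minRatio) result.push({ a, b, together, either });
  });
  return result;
}
```

The `commits` input could be produced by parsing the output of `git log -n 200 --name-only`; that parsing is left out here. Pair counting is quadratic in the number of files per commit, so very large commits (mass renames, formatting sweeps) may be worth filtering out before the call.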

Run this check whenever the AI's output starts getting slower or vaguer. That usually means the harness has outgrown its job: the navigation cost now exceeds the navigation benefit. A CNT that follows what changes together stays cheap to read and precise to navigate.
