commit · quire · 0c68bdc0

Document per-run CI container architecture

Pivot the planned CI execution model from "run-fn returns a (container
{...}) spec" to "per-run container, sh tunnels via docker exec." The
language-level reason for using Fennel — dynamic orchestration, branching
on real command output — only pays off when sh is the chokepoint inside a
running container, not when the run-fn emits a static spec.

Adds the design doc capturing the decision, updates CI.md and CI-FENNEL.md
to remove the (container ...) primitive and introduce (ci.image ...) at
the pipeline level.
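
The shape of that tunnel can be sketched in Rust, since the runner is a tokio task inside `quire serve`. This is a hypothetical sketch, not quire's code. `ShCmd`, `ShOpts`, `run_container_argv`, and `exec_argv` are illustrative names, and treating `:cwd` as relative to `/work` is an assumption. The `docker run` flags are the ones the updated CI.md lists; `-e` and `-w` are standard `docker exec` options. The sketch only builds argv vectors; spawning them (e.g. with `std::process::Command`) is left out.

```rust
use std::collections::BTreeMap;

/// A `(sh ...)` command as it might arrive from Lua (hypothetical type).
enum ShCmd {
    /// String form: run under `sh -c` inside the container.
    Shell(String),
    /// Sequence form: exec'd directly as argv, no shell. Must be non-empty.
    Argv(Vec<String>),
}

/// `opts` for `(sh cmd opts)`: `:env` overrides and `:cwd`.
#[derive(Default)]
struct ShOpts {
    env: BTreeMap<String, String>, // BTreeMap keeps the argv order deterministic
    cwd: Option<String>,           // assumed to be relative to /work
}

/// argv that starts the per-run container; it idles until torn down.
fn run_container_argv(workspace: &str, image: &str) -> Vec<String> {
    let mount = format!("type=bind,src={workspace},dst=/work");
    [
        "docker", "run", "-d", "--rm", "--mount", mount.as_str(),
        "-w", "/work", image, "sleep", "infinity",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// argv for one `(sh ...)` call. No `-t`, so stdout and stderr stay separate.
fn exec_argv(container_id: &str, cmd: &ShCmd, opts: &ShOpts) -> Result<Vec<String>, String> {
    let mut argv = vec!["docker".to_string(), "exec".to_string()];
    for (key, value) in &opts.env {
        argv.push("-e".to_string());
        argv.push(format!("{key}={value}"));
    }
    let workdir = match &opts.cwd {
        Some(rel) => format!("/work/{}", rel.trim_start_matches('/')),
        None => "/work".to_string(),
    };
    argv.push("-w".to_string());
    argv.push(workdir);
    argv.push(container_id.to_string());
    match cmd {
        ShCmd::Shell(script) => {
            argv.push("sh".to_string());
            argv.push("-c".to_string());
            argv.push(script.clone());
        }
        ShCmd::Argv(words) if words.is_empty() => {
            return Err("sh: argv form must be non-empty".to_string());
        }
        ShCmd::Argv(words) => argv.extend(words.iter().cloned()),
    }
    Ok(argv)
}
```

A string command goes through `sh -c`, while a sequence is handed to `docker exec` as argv. As a result, `(sh ["git" "checkout" push.sha])` never involves shell quoting.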

Assisted-by: Claude Opus 4.7 (1M context) via Claude Code

change zyyoxuuwltwrmtllmwtrwxkuywwnuswq

commit 0c68bdc0598e0e49af07c8cd24b932ed160b7232

author Alpha Chen <alpha@kejadlen.dev>

date 1mo ago

parent svmprrqm
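
The CI-FENNEL.md change below introduces `(ci.image ...)` rules: declared at most once, required once any job is registered, and duplicates reported like other duplicate registrations. Something like the following could enforce them. It is a hypothetical Rust sketch. `Pipeline`, `duplicate`, and the error wording are invented for illustration, and the other structural validations (acyclicity, reachability, no `/` in ids) are omitted.

```rust
use std::collections::BTreeMap;

/// One error shape for every duplicate registration (hypothetical wording).
fn duplicate(kind: &str, name: &str) -> String {
    format!("duplicate {kind} registration: {name:?}")
}

/// What registration eval of `ci.fnl` collects (hypothetical type).
#[derive(Default)]
struct Pipeline {
    image: Option<String>,
    /// job id -> input names; the run-fn itself is omitted from the sketch.
    jobs: BTreeMap<String, Vec<String>>,
}

impl Pipeline {
    /// `(ci.image name)`: a second call is a duplicate registration.
    fn image(&mut self, name: &str) -> Result<(), String> {
        if self.image.is_some() {
            return Err(duplicate("ci.image", name));
        }
        self.image = Some(name.to_string());
        Ok(())
    }

    /// `(ci.job id inputs run)`: job ids must be unique.
    fn job(&mut self, id: &str, inputs: &[&str]) -> Result<(), String> {
        if self.jobs.contains_key(id) {
            return Err(duplicate("ci.job", id));
        }
        let inputs: Vec<String> = inputs.iter().map(|s| s.to_string()).collect();
        self.jobs.insert(id.to_string(), inputs);
        Ok(())
    }

    /// Validation after registration eval: any registered job requires an image.
    fn validate(&self) -> Result<Option<&str>, String> {
        match (&self.image, self.jobs.keys().next()) {
            (Some(img), _) => Ok(Some(img.as_str())),
            (None, Some(first)) => Err(format!(
                "job {first:?} is registered but no (ci.image ...) was declared"
            )),
            (None, None) => Ok(None),
        }
    }
}
```

The sketch does not enforce the "before any `(ci.job ...)`" ordering. It checks only the end state, which matches the doc's "errors at validation, not at runtime" wording.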

diff --git a/docs/CI-FENNEL.md b/docs/CI-FENNEL.md
index 5ea0036..fab64e8 100644
--- a/docs/CI-FENNEL.md
+++ b/docs/CI-FENNEL.md
@@ -17,6 +17,16 @@ The mental model: **jobs are functions from inputs to outputs; sources are reser
 
 This is closer to Concourse's resources-and-jobs model than to GitHub Actions' triggers-and-jobs model. More elegant, less familiar. Worth being deliberate about.
 
+## Pipeline-level container image
+
+```
+(ci.image "rust:1.76")
+```
+
+Top-level form, called once before any `(ci.job ...)`. Declares the image used to start the run's container; every `(sh ...)` call from every job in the run is `docker exec`'d into this container. A per-job override is deferred until a pipeline actually needs heterogeneous images; for now, one image per pipeline keeps the model simple.
+
+A pipeline that registers a job but never declares an image errors at validation, not at runtime. Calling `ci.image` more than once errors with the same shape as other duplicate-registration errors.
+
 ## The `job` primitive
 
 ```
@@ -29,9 +39,9 @@ Three positional arguments:
 * **`inputs`** — list of names. Each is a job id or a source ref. Must be non-empty. v1: strings/keywords only (see "Future: input args" below).
 * **`run`** — function from inputs to outputs (a table) or `nil` (skipped).
 
-That's the entire surface. Image selection, conditional firing, output extraction — all done inside `run` using runtime primitives. Fennel-as-code means there's no need for config-language conveniences when a function will do.
+That's the entire surface. Conditional firing, output extraction, follow-up commands — all done inside `run` using runtime primitives. Image lives at the pipeline level (see above), not on individual jobs. Fennel-as-code means there's no need for config-language conveniences when a function will do.
 
-If a fourth concept ever genuinely needs to be expressible at the job level (per-job timeout, retry policy, secret scoping), that's the moment to introduce a map-form variant — `(job id {:inputs ... :run ... :timeout ...})`. Migration would be mechanical. Until then, the positional form is shorter and reads better for the actual surface.
+If a fourth concept ever genuinely needs to be expressible at the job level (per-job image override, timeout, retry policy, secret scoping), that's the moment to introduce a map-form variant — `(job id {:inputs ... :run ... :image ...})`. Migration would be mechanical. Until then, the positional form is shorter and reads better for the actual surface.
 
 ## Inputs
 
@@ -130,51 +140,50 @@ A bad `ci.fnl` push gets a CI run that fails immediately with the parse error, s
 
 ## `run` — the only primitive
 
-`run` is a host-side Fennel function (the container can't run Fennel) called when the job is about to execute. It receives the runtime handle and returns either:
+`run` is a host-side Fennel function called when the job is about to execute. It receives the runtime handle and returns either:
 
 * **A table** — the job's outputs. Available to dependent jobs through `(jobs <this-job>)`.
 * **`nil`** — the job is skipped. Dependents see `(jobs <this-job>)` return `nil`.
 
 That's the whole contract. No sugar layer, no introspection, no defaulting. The runner records what was returned.
 
-Inside `run`, the function uses **runtime primitives** bound on the handle. The most important is `(container {...})`, which runs a container and returns a result table:
+Inside `run`, the function uses **runtime primitives** bound on the handle. The most important is `(sh cmd opts?)`, which `docker exec`s a command into the run's container and returns a result table:
 
 ```
 (job :test [:quire/push]
-  (fn [{: container : jobs}]
+  (fn [{: sh : jobs}]
     (let [push (jobs :quire/push)]
-      (container {:image "rust:1.75"
-                  :cmd (.. "git checkout " push.sha " && cargo test")}))))
+      (sh ["git" "checkout" push.sha])
+      (sh "cargo test"))))
 ```
 
-`(container ...)` returns `{:exit :stdout :stderr :duration}`. That's what `run` returns. The runner records it as the outputs.
-
-For more complex jobs, the function does its own orchestration: multiple containers, host-side work between them, computed outputs derived from intermediate results:
+`(sh ...)` returns `{:exit :stdout :stderr :cmd}`. The run-fn can branch on that — checking exit, parsing stdout, deciding whether to issue follow-up commands. That dynamism is the whole reason ci.fnl is Fennel and not YAML:
 
 ```
 (job :test-and-package [:quire/push]
-  (fn [{: container : jobs}]
-    (let [push (jobs :quire/push)
-          test (container {:image "rust:1.75"
-                           :cmd ["git checkout" push.sha "&&" "cargo test"]})]
-      (when (= 0 test.exit)
-        (let [pkg (container {:image "alpine"
-                              :cmd "tar czf out.tar.gz target/release"})]
-          {:exit pkg.exit
-           :artifacts ["out.tar.gz"]
-           :test-stdout test.stdout})))))
+  (fn [{: sh : jobs}]
+    (let [push (jobs :quire/push)]
+      (sh ["git" "checkout" push.sha])
+      (let [test (sh "cargo test")]
+        (when (= 0 test.exit)
+          (let [pkg (sh "tar czf out.tar.gz target/release")]
+            {:exit pkg.exit
+             :artifacts ["out.tar.gz"]
+             :test-stdout test.stdout}))))))
 ```
 
 If the test fails, the outer `(when ...)` returns nil → job skipped. If it passes, the package step runs and the function returns a custom output table. One mechanism, scales from "run a command" to "orchestrate a multi-step pipeline."
 
+`sh` is the only host-effect primitive. There is no `(container ...)` form — the run's container is started by the runner before the run-fn is invoked, and every `sh` call tunnels into it via `docker exec`. Making `sh` the chokepoint is what lets the in-process VM sandbox (`io`/`os`/`debug` removed from the execute VM) actually mean something — the script can't quietly bypass logging or persistence by reaching for `os.execute`.
+
 ### Why `run` is "just a function"
 
 Earlier drafts of this design had three return shapes (string, list of strings, table) plus an `:outputs` field for declarative output extension plus a `:when` field for conditional firing plus an `:image` field for the default container image. All gone. They were paying for conveniences that aren't conveniences in a code-first config:
 
-* **String sugar.** `:run "cargo test"` saves about ten characters over `(fn [_] (container {:image "rust:1.75" :cmd "cargo test"}))`. Not worth a second mental model.
-* **`:outputs` declarative extension.** "Read coverage.json after the container exits" is a Fennel one-liner inside `run`: `(let [r (container {...})] {:exit r.exit :coverage (read-json "coverage.json")})`. Helpers compose to clean up repetition.
+* **String sugar.** `:run "cargo test"` is fourteen characters shorter than `(fn [_] (sh "cargo test"))`. Not worth a second mental model.
+* **`:outputs` declarative extension.** "Read coverage.json after the command exits" is a Fennel one-liner inside `run`: `(let [r (sh "...")] {:exit r.exit :coverage (read-json "coverage.json")})`. Helpers compose to clean up repetition.
 * **`:when`.** Returning `nil` from `run` already means "skip." Filtering and work end up in the same expression, which makes the intent more visible, not less.
-* **`:image`.** Image lives on the `(container ...)` call where it's actually used. Lets a single job legitimately use multiple images.
+* **`:image`.** Image is declared once at the pipeline level via `(ci.image ...)`. Per-job override can be added as a map-form opts arg if a pipeline ever needs heterogeneity.
 
 The residual things that *aren't* "just functions" — the inputs list and the id — are the ones that genuinely need to be language-level. They define the graph and the identity. Everything else is user-space.
 
@@ -183,75 +192,67 @@ The residual things that *aren't* "just functions" — the inputs list and the i
 Bound on the runtime handle passed into each `run` function. Destructure what you need: `(fn [{: sh : secret : jobs}] ...)`.
 
 * `(jobs name)` — return outputs for `name` (a transitive ancestor of the calling job, or a source ref). Errors if `name` is not in the calling job's transitive inputs.
-* `(container {opts})` — run a container, return `{:exit :stdout :stderr :duration}`. Opts: `:image`, `:cmd` (string or list), `:env`, `:cwd`, `:cache` (cache dir mount, defaults to job's image-keyed cache).
-* `(sh cmd)` — run a command on the host, no container. For cheap utility work. Returns the same shape as `container`.
+* `(sh cmd opts?)` — `docker exec` a command into the run's container, return `{:exit :stdout :stderr :cmd}`. `cmd` is either a string (run under `sh -c` inside the container) or a non-empty sequence of strings (argv, no shell). `opts` accepts `:env` (table of overrides) and `:cwd` (path inside `/work`).
 * `(secret name)` — resolve a named secret from the operator's config. Errors if the name isn't declared.
 * `(read-file path)`, `(read-json path)`, `(write-file path content)` — workspace I/O. Paths relative to the workspace.
 * `(log msg)` — append to the job's log file. Visible in the web UI.
 * `(env name)` — read an environment variable from the runner's environment.
 
-Each of these blocks the Fennel function until it returns. Multi-container parallelism inside one job is a v2 want; the v1 model is "the function runs sequentially, calling primitives that block."
+Each of these blocks the Fennel function until it returns. Running several `sh` calls in parallel inside one job is a v2 want; the v1 model is "the function runs sequentially, calling primitives that block."
+
+`sh` is the only host-effect channel. There is no `(container ...)` primitive — the run's container is started by the runner before any run-fn executes (with the image declared via `(ci.image ...)` at the pipeline level), and every `sh` call execs into it via `docker exec`. Stdout and stderr stay separate (no TTY); their relative ordering is approximate, but each chunk carries its own timestamp in the JSONL log.
 
-> **v0 status:** `sh`, `secret`, and `jobs` are bound today. `container`, `read-file`/`read-json`/`write-file`, `log`, and `env` are planned and tracked separately.
+> **v0 status:** `sh`, `secret`, and `jobs` are bound today. `sh` currently shells out on the host; the per-run container + `docker exec` tunneling is planned (see backlog `lpmoszxo`, `knmkqkvx`). `read-file`/`read-json`/`write-file`, `log`, and `env` are planned and tracked separately.
 
-Eval is unsandboxed by default (see CI.md). A `run` function that loops forever or allocates without bound will hang or OOM `quire serve`. The mitigation is the same as for any Fennel hang: write `ci.fnl` thoughtfully. The bwrap opt-in (also see CI.md) covers eval and primitive calls together when it lands.
+The execute VM is planned to be sandboxed (no `io`/`os`/`debug`; backlog `lsqluktu`), which makes `sh` the documented chokepoint for any host effect: `os.execute` and `io.open` are not available as alternates. See CI.md for the full sandbox shape and the bwrap opt-in for the untrusted-code threat model.
 
 ## A worked example
 
 ```
-;; Helper: a parameterized test job
-(fn rust-test [version]
-  (job (.. "test-" version) [:quire/push]
-    (fn [{: container : jobs}]
-      (let [push (jobs :quire/push)]
-        (when (= "main" push.branch)
-          (container {:image (.. "rust:" version)
-                      :cmd [(.. "git checkout " push.sha)
-                            "cargo test --all-features"]}))))))
-
-;; Matrix testing on every push to main
-(each [_ v (ipairs [:1.75 :1.76 :stable])]
-  (rust-test v))
-
-;; Build only if all tests passed
-(job :build [:test-1.75 :test-1.76 :test-stable :quire/push]
-  (fn [{: container : jobs}]
+(local ci (require :quire.ci))
+
+(ci.image "rust:1.76")  ; one image for the whole pipeline
+
+;; Test on every push to main
+(ci.job :test [:quire/push]
+  (fn [{: sh : jobs}]
+    (let [push (jobs :quire/push)]
+      (when (= "main" push.branch)
+        (sh ["git" "checkout" push.sha])
+        (sh "cargo test --all-features")))))
+
+;; Build only if test passed
+(ci.job :build [:test :quire/push]
+  (fn [{: sh : jobs}]
     (let [push (jobs :quire/push)
-          t-175 (jobs :test-1.75)
-          t-176 (jobs :test-1.76)
-          t-stb (jobs :test-stable)]
-      (when (and t-175 t-176 t-stb
-                 (= 0 t-175.exit)
-                 (= 0 t-176.exit)
-                 (= 0 t-stb.exit))
-        (let [r (container {:image "rust:1.75"
-                            :cmd [(.. "git checkout " push.sha)
-                                  "cargo build --release"]})]
+          test (jobs :test)]
+      (when (and test (= 0 test.exit))
+        (sh ["git" "checkout" push.sha])
+        (let [r (sh "cargo build --release")]
           {:exit r.exit
            :artifacts ["target/release/quire"]})))))
 
 ;; Deploy on push to main only
-(job :deploy [:build]
-  (fn [{: container : jobs}]
+(ci.job :deploy [:build]
+  (fn [{: sh : jobs}]
     (when (jobs :build)
-      (container {:image "alpine"
-                  :cmd "scp target/release/quire host:/usr/local/bin/"}))))
+      (sh "scp target/release/quire host:/usr/local/bin/"))))
 
 ;; Tagged release: publish to a registry
-(job :publish [:quire/push]
-  (fn [{: container : jobs}]
+(ci.job :publish [:quire/push]
+  (fn [{: sh : jobs}]
     (let [push (jobs :quire/push)]
       (when (and push.tag (string.match push.tag "^v"))
-        (container {:image "rust:1.75"
-                    :cmd [(.. "git checkout " push.tag)
-                          "cargo publish"]})))))
+        (sh ["git" "checkout" push.tag])
+        (sh "cargo publish")))))
 ```
 
 What this expresses:
 
-* Every push fires a run. Test jobs check `push.branch` and return nil for non-main pushes; build/deploy chain skips with them (their inputs are nil, their `(when ...)` checks see nil).
+* Every push fires a run. The test job checks `push.branch` and returns nil for non-main pushes; the build/deploy chain skips with it (their inputs are nil, their `(when ...)` checks see nil).
 * Tagged pushes additionally fire `:publish`, which has its own predicate.
-* The "all tests passed" check in `:build` is now visible in code rather than implicit. More verbose than a `:when` field, but the verbosity is honest about what's happening — and a helper (`(all-passed test-1.75 test-1.76 test-stable)`) would clean it up if the pattern repeats.
+* The "test passed" check in `:build` is visible in code rather than implicit. More verbose than a `:when` field, but the verbosity is honest about what's happening.
+* All jobs run inside the same per-run container started from `rust:1.76`. `cargo`, `git`, and `scp` are expected to be present in the image (or installed by an earlier `sh` in the run); pipelines that need different toolchains today should pick an image that has all of them, or wait for per-job image override.
 
 ## Evaluation timing
 
@@ -268,9 +269,9 @@ The three-context model means **`ci.fnl` is re-evaluated more than you might exp
 ## Open questions
 
 * **Source events with no matching jobs.** If `ci.fnl` has no jobs whose transitive inputs include `:quire/push`, do pushes still create empty runs? Probably no — skip silently. But worth being explicit.
-* **What's the exact set of runtime primitives?** `container`, `sh`, `read-file` are obvious. Less obvious: do we expose `tcp-connect`, `http-get`? They'd enable real "jobs as observers" patterns, but they're a long road into "Fennel is a real programming environment." Probably no, defer.
-* **Artifacts as inputs.** Job B with `[:build]` as inputs — does B's workspace start with build's artifacts already in place? Probably yes; otherwise the `:artifacts` output is data-only and you can't use them in subsequent containers. Implementation: artifacts unpacked into B's workspace before B's container starts.
-* **Image pre-pull discoverability.** Without a top-level `:image` field, the runner can't statically know what images a job uses — it has to actually run the function (or analyze it, which is fragile). Probably acceptable for v1: pull-on-demand from `(container ...)` calls works fine, just with a one-time latency per new image. A `quire ci pull <image>` command lets users warm explicitly.
+* **What's the exact set of runtime primitives?** `sh` and `read-file` are obvious. Less obvious: do we expose `tcp-connect`, `http-get`? They'd enable real "jobs as observers" patterns, but they're a long road into "Fennel is a real programming environment." Probably no, defer.
+* **Artifacts as inputs.** Job B with `[:build]` as inputs: does B's workspace start with build's artifacts already in place? With a per-run container, yes: `/work` is shared across jobs, so anything `:build` writes is visible to B by default. The open question is whether artifact paths a job declares in its outputs should be pinned by the runner for retention beyond the run.
+* **Image pre-pull.** With a single pipeline-level `(ci.image ...)` declaration, the runner knows the image up front and can pull before starting the run container. Pull-on-demand at `docker run` time works too. A `quire ci pull <image>` command lets users warm explicitly if they want to avoid first-push latency.
 * **Error semantics inside `run`.** What if it throws? Job marked failed, exception text into the log. What if it returns a malformed value (not nil, not a table)? Mark failed, log a schema warning.
 * **Push payload size.** `:quire/push.files-changed` could be huge for a large merge. Do we cap it? Stream it differently? Defer to first time it bites.
 * **Composition across files.** A `quire/stdlib.fnl` of common helpers, or per-repo Fennel modules. Real want eventually; not v1.
@@ -286,11 +287,12 @@ The three-context model means **`ci.fnl` is re-evaluated more than you might exp
 * **Builtins live under `quire/`**; user job ids cannot contain `/`.
 * **For v1, the only source is `:quire/push`.** Cron, webhook, manual deferred.
 * **Filtering happens inside `run`** by returning `nil`. Every push starts a run; jobs that return nil from `run` are skipped.
-* **Runtime handle as the run-fn argument.** The function receives a single table `{: sh : secret : jobs : container ...}` and destructures the primitives it uses. Slash-containing source names are read via the `jobs` accessor — `(jobs :quire/push)` — never via dot access.
+* **Runtime handle as the run-fn argument.** The function receives a single table `{: sh : secret : jobs ...}` and destructures the primitives it uses. Slash-containing source names are read via the `jobs` accessor — `(jobs :quire/push)` — never via dot access.
 * **`(jobs name)` is the only accessor for upstream outputs**, covering both source refs and job outputs. Transitive ancestors are visible; non-ancestors and unknown names raise a Lua error.
 * **Dependency graph derived from the inputs list**, not declared separately. No `:needs`.
 * **Four structural validations**: acyclic (registration eval), non-empty inputs (registration eval), reachability from a source (registration eval), no `/` in user job ids (parse time). All fail-closed with named-target error messages.
 * **`run` is a function** `(fn [{: jobs ...}] ...)`. Returns a table (the outputs) or `nil` (skipped). No sugar.
-* **`(container {opts})` is the primary primitive** for running containers. Opts include `:image`, so a single job can use multiple images by making multiple container calls.
+* **`(sh cmd opts?)` is the only host-effect primitive.** `docker exec`s into the run's container; returns `{:exit :stdout :stderr :cmd}`. There is no `(container ...)` form. The execute VM is sandboxed (no `io`/`os`/`debug`) so `sh` is the documented chokepoint.
+* **`(ci.image <name>)` declares the image** at the pipeline level. One image per pipeline. Per-job override deferred until pipelines actually need heterogeneity; would arrive as a map-form `(ci.job ...)` opts arg.
 * **Three eval contexts** — registration, run start, per job — all in-process inside `quire serve`. Sandboxing model and threat model are described in CI.md.
 * **Source registration sourced from the default branch only** (relevant once registration becomes meaningful — for v1 it's a no-op since `:quire/push` needs no registration).
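
The CI.md change below moves per-job logs to append-only `log.jsonl` and records non-UTF-8 output as base64 rather than replacing it with U+FFFD. Below is a minimal Rust sketch of encoding one output chunk. It uses only the standard library, with hand-rolled RFC 4648 base64 and JSON string escaping. The field names `type`, `ts_ms`, and `data` are assumptions; only `encoding: "base64"` comes from the doc.

```rust
use std::io::{self, Write};

// Standard base64 alphabet (RFC 4648) with '=' padding.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn base64(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        // A chunk of k bytes yields k + 1 data characters; pad the rest.
        for i in 0..4usize {
            if i <= chunk.len() {
                out.push(B64[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Quote a string as a JSON string literal.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Encode one stdout/stderr chunk as a JSON object (no trailing newline).
fn chunk_event(stream: &str, ts_ms: u128, bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(text) => format!(
            "{{\"type\":{},\"ts_ms\":{},\"data\":{}}}",
            json_str(stream), ts_ms, json_str(text)
        ),
        Err(_) => format!(
            "{{\"type\":{},\"ts_ms\":{},\"data\":{},\"encoding\":\"base64\"}}",
            json_str(stream), ts_ms, json_str(&base64(bytes))
        ),
    }
}

/// Append one event as a single JSONL line.
fn append_line<W: Write>(w: &mut W, line: &str) -> io::Result<()> {
    let mut buf = Vec::with_capacity(line.len() + 1);
    buf.extend_from_slice(line.as_bytes());
    buf.push(b'\n');
    w.write_all(&buf)
}
```

Because the UTF-8 check runs per chunk, a multi-byte character split across two reads makes both chunks invalid, and both get base64-encoded. Appending line-plus-newline per event means a crash can at worst truncate the last line, which a reader can skip.
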
diff --git a/docs/CI.md b/docs/CI.md
index f83dac8..0a74a2d 100644
--- a/docs/CI.md
+++ b/docs/CI.md
@@ -4,23 +4,27 @@ How CI works in quire. Slots alongside PLAN.md; will likely fold in once the ope
 
 ## Shape
 
-The runner lives **in-process with `quire serve`**, as a long-lived tokio task in the same binary. It owns a queue of pending runs (in-memory, reconstructed from disk on startup), watches it for new entries, materializes a workspace per run, evaluates `.quire/ci.fnl`, and shells out to execute each job. Jobs are **ephemeral** — fresh sandbox per job, torn down on exit.
+The runner lives **in-process with `quire serve`**, as a long-lived tokio task in the same binary. It owns a queue of pending runs (in-memory, reconstructed from disk on startup), watches it for new entries, materializes a workspace per run, evaluates `.quire/ci.fnl` in the host process, starts a per-run container with the pipeline's declared image, and tunnels each `(sh ...)` call from each job into the run's container via `docker exec`. The container is **per-run**: started at run start, torn down at run end.
 
-The runner itself is not a container. It's a tokio task. The thing the runner *spawns* is the sandbox.
+The runner itself is not a container. It's a tokio task. The thing the runner *spawns* is the run's sandbox container.
 
 ```
 quire (one process)
   ├── HTTP server (quire serve)
   ├── ci-runner task
-  │     ├── run #1: <sandbox> rust:1.75 ...    (ephemeral)
-  │     ├── run #2: <sandbox> python:3.12 ...  (ephemeral)
+  │     ├── run #1: <sandbox> rust:1.75 ...    (per-run)
+  │     ├── run #2: <sandbox> python:3.12 ...  (per-run)
   │     └── ...
   └── (shared state: run queue, log broadcasts)
 ```
 
-Not the long-lived-per-image runner pool that GitHub Actions and GitLab use. That model amortizes startup at the cost of hermeticity — job N+1 inherits whatever job N left behind in the filesystem, which becomes a permanent class of "fails after the previous job" debugging. The speedup mostly comes from cache reuse, which is achievable with bind-mounted cache directories without taking on the statefulness debt. Personal forge doing dozens of runs/week, not thousands/day; container-per-job is strictly better here.
+Not the long-lived-per-image runner pool that GitHub Actions and GitLab use. That model amortizes startup at the cost of hermeticity — run N+1 inherits whatever run N left behind in the filesystem, which becomes a permanent class of "fails after the previous run" debugging. The speedup mostly comes from cache reuse, which is achievable with bind-mounted cache directories without taking on the statefulness debt. Personal forge doing dozens of runs/week, not thousands/day.
 
-The runner doesn't get its own process because **it doesn't execute user code in its address space**. With container-per-job, the runner reads files, builds a `docker run` argv, spawns it, copies stdout to a log file, reads exit code. None of those steps run user code in-process. A bug in `cargo test` can't crash the runner because it's running in a different container with its own kernel namespace. Process isolation between web and runner would buy nothing here — the docker boundary is doing that work. Don't pay twice for it.
+Per-run (vs per-job) is the simplest granularity for v1: one container start per run, jobs share workspace and toolchain caches naturally, and parallel job execution (when it lands) becomes concurrent `docker exec` calls into the same container. Per-job container differentiation can be added later if pipelines actually need it.
+
+The runner doesn't get its own process because **the commands jobs run never execute in its address space**. The runner reads files, builds a `docker run` argv to start the per-run container, then issues a `docker exec` call for each `(sh ...)` from each job, streams stdout/stderr from each exec into per-job log files, captures exit codes, and records the container ID for cancellation. The only user code that runs in-process is the `ci.fnl` Fennel itself, which is addressed separately below. A bug in `cargo test` can't crash the runner because it's running in a different container with its own kernel namespaces. Process isolation between web and runner would buy nothing here — the docker boundary is doing that work. Don't pay twice for it.
+
+Within the host process, `(sh ...)` is the only sanctioned host-effect primitive in the Lua VM. See "Eval runs in-process; the execute VM is sandboxed" below: the compile-then-execute split removes `io`/`os`/`debug` from the execute VM so a buggy or hostile `ci.fnl` can't bypass the chokepoint.
 
 ## Communication: filesystem as state of record, channels as optimization
 
@@ -67,7 +71,7 @@ Within a run, **jobs form a DAG** (see next section), but the executor schedules
 When a new push arrives for a ref that already has work in flight or queued for the same `(repo, ref)`:
 
 * **Queued, not yet started:** new push replaces the queued one. Old run marked `superseded`. If you pushed twice in 30 seconds, you almost certainly only care about the second result.
-* **Currently running:** kill the in-flight sandbox (`docker kill <id>`), mark the run `superseded`, enqueue the new one.
+* **Currently running:** kill the in-flight run container (`docker kill <id>`), mark the run `superseded`, enqueue the new one.
 * **Different ref of same repo:** unaffected. Pushing to `feature-branch` should not kill a running build of `main`.
 
 Cheap to get right *if* the run record stores the ref it's building from the start, and queue lookups are "any pending or active runs for `<repo>:<ref>`?" Both are one-line conditions.
@@ -124,11 +128,13 @@ Code, not data, means matrix builds, helpers, and conditionals fall out for free
     :run "cargo test"})}
 ```
 
-### Eval runs in-process, unsandboxed by default
+### Eval runs in-process; the execute VM is sandboxed
+
+Eval happens inside `quire serve`, in the same Lua/Fennel host that loads `config.fnl`. No subprocess, no wallclock cap, no memory limit. Every `ci.fnl` is code the operator wrote; the untrusted-code threat model that would justify external isolation doesn't exist.
 
-Eval happens inside `quire serve`, in the same Fennel host that loads `config.fnl`. No subprocess, no wallclock cap, no memory limit. Every `ci.fnl` is code the operator wrote; the threat model that would justify a sandbox doesn't exist.
+A separate concern is in-process VM hardening: keeping a buggy or careless ci.fnl from bypassing the `(sh ...)` chokepoint by reaching for `os.execute` or `io.open` directly. The plan is a compile-then-execute VM split — the compile VM runs Lua 5.4 with full `debug` (Fennel's macroexpand and traceback need it); the execute VM is `Lua::new()` with `io`/`os`/`debug` removed and only `{sh, secret, jobs, string, table, math}` exposed. This makes `sh` the documented chokepoint and the JSONL persistence path unbypassable. See backlog `lsqluktu`. A subsequent task (`rzsonvsx`) layers Luau on the execute VM for bytecode-load validation and a tighter `debug` API as defense in depth.
 
-The cost: a buggy `ci.fnl` (infinite loop, runaway allocation, `string.rep "x" 2^30`) can hang or OOM the server. Mitigation is "don't write that"; for the personal-forge case this is acceptable. If a `ci.fnl` does hang the server, the operator notices because they wrote the bad `ci.fnl` and pushed it themselves.
+The cost of in-process eval remains: a `ci.fnl` with an infinite loop or runaway allocation (`string.rep "x" 2^30`) can hang or OOM the server. Mitigation is "don't write that"; for the personal-forge case this is acceptable.
 
 ### Sandboxed eval — opt-in, future
 
@@ -143,28 +149,31 @@ The reason this is the chosen path rather than "subprocess + rlimit, no bwrap" 
 1. **`post-receive` hook** sends a push event (one JSON line: `{type, repo, pushed_at, refs: [{ref, old_sha, new_sha}, ...]}`) over `/var/quire/server.sock` and exits. The listener task in `quire serve` parses the event, allocates a run-id per ref, writes `runs/<repo>/<run-id>/{meta.json, state.json}`, and signals the runner via mpsc. No CI work runs in the hook itself.
 2. **Runner picks up** the entry from the queue. Atomic rename `pending/<id>` → `active/<id>` for state-machine clarity.
 3. **Materialize workspace.** `git --git-dir=repos/foo.git archive <sha> | tar -x -C workspace/`. No worktree, no checkout state on the bare repo. Workspace is throwaway; deleted at end of run.
-4. **Evaluate `.quire/ci.fnl`** in-process (see above). Result is the job DAG.
-5. **Per ready job:** spawn the sandbox with workspace + caches mounted, stream stdout/stderr to `jobs/<job-id>/log` (and broadcast for live web tailing), capture exit code, record container ID for cancellation.
-6. **Aggregate.** Write final status to the run directory. Move `active/<id>` → `complete/<id>` (or `failed/<id>`).
+4. **Evaluate `.quire/ci.fnl`** in the host process (see above). The pipeline image is read from the `(ci.image ...)` registration; jobs are registered via `(ci.job ...)`; the run-fns are not yet invoked.
+5. **Start the run container.** `docker run -d --rm --mount type=bind,src=<run-dir>,dst=/work -w /work <image> sleep infinity`. Container ID stowed on the runtime. The run's container hosts every `(sh ...)` call from every job in the run.
+6. **Per ready job:** invoke its run-fn in topological order. Each `(sh ...)` call inside the run-fn issues a `docker exec` (no TTY) into the run container, streams stdout/stderr into `jobs/<job-id>/log.jsonl` as JSONL events (an `sh-start` event, one `stdout` or `stderr` event per output chunk, and an `sh-exit` event), and returns `{exit, stdout, stderr, cmd}` to Lua. Container-level events (`container-start`, `container-died`, `container-end`) go into the run's own `<run-dir>/log.jsonl`.
+7. **Tear down the run container.** `docker stop`; the `--rm` from step 5 then removes it, so no separate `docker rm` is needed. This happens even on error paths, so no containers are orphaned if a run-fn errors.
+8. **Aggregate.** Write final status to the run directory. Move `active/<id>` → `complete/<id>` (or `failed/<id>`).
 
 ## Run record schema
 
 ```
 runs/<repo>/<run-id>/
   meta.json        # immutable: sha, ref, pusher, pushed_at
-  state.json       # mutable: status, started_at, finished_at, runner_pid, sandbox_id
+  state.json       # mutable: status, started_at, finished_at, runner_pid, container_id
+  log.jsonl        # per-run events: container-start, container-died, container-end
   jobs/
     <job-id>/
-      spec.json    # immutable: image, cmds, env, needs (extracted from ci.fnl)
-      state.json   # mutable: status, started_at, finished_at, exit_code, sandbox_id
-      log          # append-only stdout+stderr
+      spec.json    # immutable: inputs, registration source location
+      state.json   # mutable: status, started_at, finished_at, outputs
+      log.jsonl    # per-job events: sh-start, stdout, stderr, sh-exit
   cancel           # touch-file; runner checks before each job
 ```
 
 Two principles fall out:
 
 * **Immutable vs. mutable files are separate.** `meta.json` is written once and never touched. Readers (the web UI) can cache `meta.json` indefinitely and only re-read `state.json`.
-* **Append-only logs.** Web UI tails the log file; runner appends; no coordination needed. Live tailing also goes through a `tokio::sync::broadcast` channel for sub-second latency, but the file is the source of truth.
+* **Append-only JSONL.** Each `log.jsonl` is one structured event per line, written as bytes arrive. The web UI tails the file directly — no extra protocol needed for streaming. Crash-safe: if `quire serve` dies mid-run, the file is valid JSONL up to the last complete line. Non-UTF-8 stdout/stderr bytes are recorded with `encoding: "base64"` rather than silently substituted with U+FFFD. Live tailing can still go through a `tokio::sync::broadcast` channel for sub-second latency, but the file is the source of truth.
 
 ## Sandbox backend — the real fork in the road
 
@@ -172,7 +181,7 @@ Polyglot toolchains rule out "just bind-mount host `/`" — that path requires e
 
 ### Path A: Docker (DooD)
 
-`docker run --rm -v <ws>:/workspace -w /workspace --cpus=N --memory=M <image> sh -c '<cmds>'` per job. Shared image cache, well-trodden, every CI system on earth has done this.
+`docker run -d --rm --mount type=bind,src=<ws>,dst=/work -w /work --cpus=N --memory=M <image> sleep infinity` per run, then `docker exec` (no TTY) for each `(sh ...)` call from every job in the run. Shared image cache, well-trodden, every CI system on earth has done this.
 
 Quire stays containerized. The container talks to the host's docker daemon via bind-mounted `/var/run/docker.sock`. Anyone with that socket effectively has root on the host — fine here since quire and the operator account already share the box.
 
@@ -182,7 +191,7 @@ Cost: socket mount, the path-pinning rule, daemon-talking-to-daemon, quire stays
 
 ### Path B: OCI + bubblewrap
 
-`skopeo copy docker://rust:1.75-slim oci:images/rust-1.75:latest`, then `umoci unpack`, then bwrap binds the rootfs and runs the job:
+`skopeo copy docker://rust:1.75-slim oci:images/rust-1.75:latest`, then `umoci unpack`, then bwrap binds the rootfs and hosts the run's sandbox in place of the run container. `docker exec`'s role is filled by spawning each `(sh ...)` into the persistent bwrap namespace, or by relaunching bwrap per `(sh ...)` if persistent processes prove painful (measure before choosing):
 
 ```
 bwrap --bind rootfs/rust-1.75 / \
@@ -238,7 +247,9 @@ Punt on cache invalidation until it actually annoys. "Delete the cache dir" is a
 
 * **Runner is in-process** with `quire serve` as a tokio task; not a separate process. Filesystem is the state of record; channels are the wakeup optimization.
 * **No SQLite in v1.** If it enters later, it's a secondary index over the filesystem, never primary. `rm quire.db && quire reindex` must always recover.
-* **Container-per-job**, not long-lived runners.
+* **Per-run container**, not per-job and not long-lived runners. One `docker run` at run start, `docker exec` per `(sh ...)` call from each job, `docker stop` at run end. Per-job container differentiation is a deferred extension.
+* **`(sh ...)` is the only host-effect primitive in the Lua VM.** No `(container ...)` primitive. The execute VM is hardened (no `io`/`os`/`debug`) so `sh` becomes the documented chokepoint — every effect is auditable, persistable, redactable in one place.
+* **Pipeline-level image declaration via `(ci.image ...)`.** Single image per pipeline; per-job override deferred until pipelines actually need heterogeneity.
 * **DooD for v1**; OCI+bwrap as planned migration path.
 * **Workspace materialized via `git archive`**, not worktree.
 * **Max concurrency 1** across the whole forge. Escape valve is `max_concurrent_runs` config + per-repo cache file lock; not building it now.
@@ -246,6 +257,6 @@ Punt on cache invalidation until it actually annoys. "Delete the cache dir" is a
 * **`:allow-failure`** flag exists from v1.
 * **Supersede on same `(repo, ref)`**: replace queued, kill running.
 * **`.quire/ci.fnl` is executed**, returns the DAG.
-* **Eval runs in-process, unsandboxed by default.** Trusted code; the operator wrote it. Sandboxed eval (bwrap, with filesystem/network/wallclock/memory limits) is an opt-in for repos that run `ci.fnl` from someone other than the operator. Not built; not v1.
+* **Eval runs in-process; the execute VM is sandboxed.** Compile VM keeps full Lua 5.4 (Fennel macroexpand/traceback need `debug`); execute VM removes `io`/`os`/`debug` and exposes only `{sh, secret, jobs, string, table, math}`. Trusted-code threat model — no external isolation. Bwrap-based eval sandbox stays available as an opt-in for the day quire runs `ci.fnl` from someone other than the operator. Not built; not v1.
 * **Hook is a transport, not a writer.** `post-receive` sends a push event over `/var/quire/server.sock`; `quire serve` writes the run record. Hook never touches `runs/`. Tradeoff: zero-loss-on-server-down is dropped in v1 (push lands but no run is created). Fallback to direct disk write is a deferred follow-up.
 * **Caches** are bind-mounted directories under `/var/quire/cache/<repo>/`.
diff --git a/docs/plans/2026-05-01-ci-execution-architecture-design.md b/docs/plans/2026-05-01-ci-execution-architecture-design.md
new file mode 100644
index 0000000..3334b88
--- /dev/null
+++ b/docs/plans/2026-05-01-ci-execution-architecture-design.md
@@ -0,0 +1,87 @@
+# CI execution architecture
+
+Captures the pivot from "run-fn returns a `(container {...})` spec" to "per-run container, `sh` tunnels via `docker exec`," and the surrounding decisions that fall out of it.
+
+## Context
+
+Today, `ci.fnl` evaluates in-process inside `quire serve`, and `(sh ...)` shells out on the host. There is no container; `sh` runs commands as the quire user. A buggy or hostile `ci.fnl` can `os.execute("rm -rf ~")` and bypass everything, because the Lua VM has full standard libraries.
+
+The next iceboxed CI story (`uutoospp`, since archived) framed containerization as: the run-fn returns a `(container {:image ... :cmd ...})` table; the runner spawns a one-shot container per job with that spec. Container is a fire-and-forget primitive. The run-fn is a planner.
+
+This session reconsidered that model.
+
+## Three architectures
+
+A. **VM-in-container** — Lua/Fennel runs inside the per-run container. Heaviest image (every job image has to carry Lua, Fennel, and quire glue), and a double-evaluation problem: graph extraction has to happen outside the container, per-job execution inside.
+
+B. **VM-on-host, `(container {...})` spec** — what `uutoospp` described. Fennel reduces to a configuration DSL; the run-fn emits a static spec. The container is fire-and-forget and the run-fn cannot react to mid-command output. Most of Fennel-the-language's value (branching, data manipulation, reuse) is wasted because the only thing crossing the container boundary is a static spec table.
+
+C. **VM-on-host, `sh` tunnels via `docker exec`** — the run-fn executes inside the host process; each `(sh ...)` call execs into the run's container. Fennel becomes the orchestrator: branching on real `sh` output, parsing intermediate results, conditional follow-up commands, helper functions. The container is the sandbox boundary for individual commands.
+
+C wins because the entire reason for using Fennel rather than YAML or JSON-with-templates is dynamic orchestration. Under B, you lose that. Under C, you get it.
+
+## Granularity: per-run, not per-job
+
+One container per run, shared across all jobs in the run, instead of one container per job.
+
+Per-run is simpler: one container start per run, workspace and toolchain caches shared across jobs naturally, multi-job (when it lands) becomes concurrent `docker exec` into the same container. Per-run gives up per-job image differentiation (mitigation: pipeline-level image suffices for v1; per-job override can be added later if needed) and hard isolation between jobs (not a concern at personal-forge scale).
+
+## API changes
+
+`(container ...)` is removed as a primitive. `(sh cmd opts?)` becomes the only host-effect channel. That chokepoint is what makes the in-process Lua VM sandbox meaningful: every effect goes through one auditable Rust function, with no `os.execute`, `io.open`, or similar quietly providing alternate routes.
+
+`(ci.image <name>)` is added as a top-level pipeline registration form. Single image per pipeline. A per-job override could later become a trailing opts argument to `ci.job` if pipelines need heterogeneity. YAGNI for now.
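
The registration rules for `ci.image` (a pipeline with jobs but no image fails validation; a second `ci.image` fails like any other duplicate registration) can be modeled as a small registry. This is a hypothetical Rust sketch: `Pipeline`, `RegistrationError`, and the method names are illustrative, only job ids are tracked, and ordering between `ci.image` and `ci.job` is not enforced here.

```rust
/// Hypothetical registration error. Duplicate image and duplicate job share
/// one variant, so both report "the same shape" of error.
#[derive(Debug, PartialEq)]
enum RegistrationError {
    Duplicate { what: &'static str, name: String },
    MissingImage,
}

/// Hypothetical registry filled in while `ci.fnl` evaluates.
#[derive(Default)]
struct Pipeline {
    image: Option<String>,
    jobs: Vec<String>,
}

impl Pipeline {
    /// Backs `(ci.image name)`: the first call wins, later calls error.
    fn set_image(&mut self, name: &str) -> Result<(), RegistrationError> {
        if self.image.is_some() {
            return Err(RegistrationError::Duplicate { what: "image", name: name.to_string() });
        }
        self.image = Some(name.to_string());
        Ok(())
    }

    /// Backs `(ci.job id ...)`; only the id matters for this sketch.
    fn add_job(&mut self, id: &str) -> Result<(), RegistrationError> {
        if self.jobs.iter().any(|j| j == id) {
            return Err(RegistrationError::Duplicate { what: "job", name: id.to_string() });
        }
        self.jobs.push(id.to_string());
        Ok(())
    }

    /// Runs after evaluation and before any container starts, so a missing
    /// image is reported at validation rather than at runtime.
    fn validate(&self) -> Result<(), RegistrationError> {
        if !self.jobs.is_empty() && self.image.is_none() {
            return Err(RegistrationError::MissingImage);
        }
        Ok(())
    }
}
```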
+
+The run-fn signature stays `(fn [{: sh : secret : jobs}] ...)`. Returning `nil` still skips the job; returning anything else marks it complete and records the value as outputs.
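
To make the contract concrete, the Rust sketch below models it outside Lua: `sh` returns the `{exit, stdout, stderr, cmd}` shape, a `None` return maps to "skipped", and anything else becomes the job's outputs. The run-fn itself is written in Rust only for illustration (in quire it is Fennel); it shows architecture C, branching on real `sh` output before deciding what to run next. `ShOutput` mirrors a name from this doc; `JobOutcome`, `outcome`, and `run_fn` are hypothetical.

```rust
/// Mirrors the `{exit, stdout, stderr, cmd}` table that `sh` hands back to Lua.
#[derive(Debug, Clone)]
struct ShOutput {
    exit: i32,
    stdout: String,
    stderr: String,
    cmd: String,
}

#[derive(Debug, PartialEq)]
enum JobOutcome<T> {
    Skipped,
    Complete(T),
}

/// Returning `nil` skips the job; any other value completes it and is
/// recorded as the job's outputs.
fn outcome<T>(ret: Option<T>) -> JobOutcome<T> {
    match ret {
        None => JobOutcome::Skipped,
        Some(outputs) => JobOutcome::Complete(outputs),
    }
}

/// Hypothetical run-fn: it reads the output of one command and only then
/// decides whether to run the next one, which a static spec cannot do.
fn run_fn(sh: &mut dyn FnMut(&str) -> ShOutput) -> Option<String> {
    let listing = sh("ls");
    if !listing.stdout.lines().any(|name| name == "Cargo.toml") {
        return None; // not a Cargo project: skip the job
    }
    let test = sh("cargo test");
    Some(format!("{} exited {}", test.cmd, test.exit))
}
```

In tests, `sh` can be any closure, which is also how a fake executor could stand in for `docker exec`.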
+
+## Persistence: streaming JSONL
+
+Replaces today's buffered `output()`-then-`write_all_logs` flow.
+
+Per-job log: `<run-dir>/jobs/<id>/log.jsonl`, one JSON object per line:
+
+- `{ts, kind: "sh-start", n, cmd}`
+- `{ts, kind: "stdout"|"stderr", n, data, encoding?}` — `encoding: "base64"` marker for non-UTF-8 bytes; default UTF-8
+- `{ts, kind: "sh-exit", n, exit, signal?, duration_ms}`
+
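A `stdout`/`stderr` event line with the base64 fallback can be produced with nothing beyond `std`. This is a minimal sketch, assuming `ts` is already a formatted string (the schema does not fix its format) and using hand-rolled JSON escaping and RFC 4648 base64 in place of `serde_json` and a base64 crate; all function names are hypothetical.

```rust
/// Standard base64 alphabet (RFC 4648), with `=` padding.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn base64(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Minimal JSON string literal: escapes quotes, backslashes, control chars.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// One `stdout`/`stderr` event. Valid UTF-8 is stored as-is; anything else is
/// base64-encoded and marked, instead of being replaced with U+FFFD.
fn output_event(ts: &str, stream: &str, n: u32, bytes: &[u8]) -> String {
    let (data, enc) = match std::str::from_utf8(bytes) {
        Ok(s) => (json_str(s), ""),
        Err(_) => (json_str(&base64(bytes)), r#","encoding":"base64""#),
    };
    format!(
        r#"{{"ts":{},"kind":{},"n":{},"data":{}{}}}"#,
        json_str(ts), json_str(stream), n, data, enc
    )
}
```

One design wrinkle: a multi-byte character split across two reads makes both chunks invalid UTF-8, so both would be stored as base64. Holding back an incomplete trailing sequence until the next read is one way to avoid that.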
+Per-run log: `<run-dir>/log.jsonl`:
+
+- `{ts, kind: "container-start", image, container_id}`
+- `{ts, kind: "container-died", reason}` — distinct from sh-exit-non-zero (OOMKill, image-pull failure, daemon kill)
+- `{ts, kind: "container-end", status}`
+
+JSONL is append-only and tail-able; the future web view streams the file with no extra protocol. Crash-safe (truncate at the last complete line). The Lua-side `ShOutput` table return shape doesn't change — Rust accumulates while writing.
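
The crash-safety property means a reader only ever consumes bytes up to the last `\n`. The std-only Rust sketch below shows a tailer that does this, keeping a byte offset between polls. The function names are hypothetical, and a real web view would likely pair this with the broadcast channel rather than polling.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Splits freshly read bytes into complete lines and reports how many bytes
/// they used. A trailing fragment without `\n` (a write cut off by a crash,
/// or one still in flight) is left for the next read.
fn complete_lines(buf: &[u8]) -> (Vec<&[u8]>, usize) {
    match buf.iter().rposition(|&b| b == b'\n') {
        None => (Vec::new(), 0),
        Some(last) => {
            let lines = buf[..last]
                .split(|&b| b == b'\n')
                .filter(|line| !line.is_empty())
                .collect();
            (lines, last + 1)
        }
    }
}

/// Tails one `log.jsonl`: reads what was appended since `offset` and
/// advances `offset` only past whole lines.
fn read_new_lines(path: &Path, offset: &mut u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(*offset))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    let (lines, used) = complete_lines(&buf);
    *offset += used as u64;
    Ok(lines
        .into_iter()
        .map(|line| String::from_utf8_lossy(line).into_owned())
        .collect())
}
```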
+
+## stdout/stderr separation
+
+`docker exec` without `-t` keeps stdout and stderr as distinct streams. Docker multiplexes them in its frame protocol: each frame starts with an 8-byte header (one stream-type byte, three zero bytes, then the payload length as a big-endian `u32`), and the payload follows. The Docker CLI and `bollard` both demux for the caller. Always invoke without TTY allocation. Cross-stream byte ordering is approximate; per-event timestamps preserve temporal ordering for replay.
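
Since `bollard` and the CLI already demux, quire should not need to parse frames itself; the Rust sketch below only illustrates what the header means. It is an incremental demuxer that accepts bytes in arbitrary chunks (headers may be split across reads) and returns whole frames tagged with their stream. `Stream` and `Demux` are hypothetical names.

```rust
/// Stream types in Docker's multiplexed (non-TTY) stream format.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Stream {
    Stdin,
    Stdout,
    Stderr,
}

/// Incremental demuxer: feed raw bytes as they arrive, get back complete frames.
#[derive(Default)]
struct Demux {
    buf: Vec<u8>,
}

impl Demux {
    fn push(&mut self, bytes: &[u8]) -> Result<Vec<(Stream, Vec<u8>)>, String> {
        self.buf.extend_from_slice(bytes);
        let mut frames = Vec::new();
        // Each iteration consumes one header (8 bytes) plus its payload.
        while self.buf.len() >= 8 {
            let stream = match self.buf[0] {
                0 => Stream::Stdin,
                1 => Stream::Stdout,
                2 => Stream::Stderr,
                other => return Err(format!("unknown stream type {other}")),
            };
            // Bytes 1..4 are zero; bytes 4..8 are the big-endian payload length.
            let len = u32::from_be_bytes([self.buf[4], self.buf[5], self.buf[6], self.buf[7]]) as usize;
            if self.buf.len() < 8 + len {
                break; // payload not fully arrived yet
            }
            let payload = self.buf[8..8 + len].to_vec();
            self.buf.drain(..8 + len);
            frames.push((stream, payload));
        }
        Ok(frames)
    }
}
```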
+
+## In-process VM sandbox
+
+Two layers, additive:
+
+1. **Compile-then-execute split** (`lsqluktu`). Keep a Lua 5.4 VM with full `debug` for Fennel macroexpansion and traceback; execute compiled output in a separate `Lua::new()` VM with `io`/`os`/`debug` removed and only `{sh, secret, jobs, string, table, math}` exposed. Cheap; doesn't touch Fennel internals.
+
+2. **Luau as defense in depth** (new icebox `rzsonvsx`). Swap mlua's execute-VM backend from Lua 5.4 to Luau. Adds bytecode-load validation and a tighter `debug` API that closes the runtime-introspection escapes that pure-Lua sandboxes leak through (`debug.getupvalue`, metatable manipulation). Depends on Fennel's *compiled* output being Luau-compatible at runtime, which needs verification before adopting. The previous Luau investigation (`nlvwpspv`) flagged Fennel's *compile-time* use of debug; runtime is a different question.
+
+Both layer cleanly because the sandbox lives on the *execute* VM only; the compile VM stays Lua 5.4 throughout.
+
+## What this design does not address
+
+- **Multi-job DAG** (`sxllwuxk`) under per-run container. Parallel jobs become concurrent `docker exec` calls into the same container. Read-only parallel jobs (lint + test) compose cleanly; parallel jobs that mutate `/work` will collide. Solved later when multi-job lands.
+- **Per-repo cache** (`zopyouwu`). Bind-mounted into the run container instead of per-job. Same principle, different mount point.
+- **Mirror push job**. Under per-run, it runs in the same container as user jobs, so the image needs `git`. Most workload images have it.
+- **Preflight gating** (`zvvkmrlx`). Less valuable under per-run (the container is already up, so skipping a job only saves the run-fn invocation and any `sh` calls). Kept as low-priority icebox.
+
+## Backlog references
+
+- `vowkxpuz` — Pipeline-level container image declaration
+- `lpmoszxo` — Per-run container lifecycle
+- `knmkqkvx` — Route sh through docker exec into the run container
+- `xrupozur` — Streaming JSONL log persistence per job
+- `zmtuqwly` — Detect container-died as a distinct failure mode
+- `lsqluktu` — Sandbox CI execution with compile-then-run separation
+- `rzsonvsx` — Adopt Luau for the execute VM as defense in depth
+- `zvvkmrlx` — Preflight gating to skip jobs via :when predicate
+- Archived: `uutoospp` (B-shaped, superseded)
+- Prior investigation: `nlvwpspv` (Luau)