Make ci.fnl eval in-process by default
Every ci.fnl is operator code; subprocess + rlimit isn't worth the
cost. Bwrap is the future opt-in for untrusted authors and covers
filesystem and network too. Markdown formatting normalized alongside.
Assisted-by: Claude Opus 4.7 (1M context) via Claude Code
diff --git a/docs/CI-FENNEL.md b/docs/CI-FENNEL.md
index a8554c0..d22271b 100644
--- a/docs/CI-FENNEL.md
+++ b/docs/CI-FENNEL.md
@@ -8,8 +8,8 @@ A CI config is a **dataflow graph of jobs**. Each job is a function from inputs
Inputs come in two flavors, used uniformly:
-- **Job references.** `[:build]` — depend on another job's outputs.
-- **Source references.** `[:quire/push]` — depend on an external event. The runner provides the outputs. Builtins live under the `quire/` namespace; user job ids cannot contain `/`.
+* **Job references.** `[:build]` — depend on another job's outputs.
+* **Source references.** `[:quire/push]` — depend on an external event. The runner provides the outputs. Builtins live under the `quire/` namespace; user job ids cannot contain `/`.
There's no structural distinction between "trigger jobs" and "regular jobs." Sources are just things you list as inputs, in the same place as job references.
@@ -19,15 +19,15 @@ This is closer to Concourse's resources-and-jobs model than to GitHub Actions' t
## The `job` primitive
-```fennel
+```
(job id inputs run)
```
Three positional arguments:
-- **`id`** — keyword. The job's identity. Cannot contain `/`.
-- **`inputs`** — list of names. Each is a job id or a source ref. Must be non-empty. v1: strings/keywords only (see "Future: input args" below).
-- **`run`** — function from inputs to outputs (a table) or `nil` (skipped).
+* **`id`** — keyword. The job's identity. Cannot contain `/`.
+* **`inputs`** — list of names. Each is a job id or a source ref. Must be non-empty. v1: strings/keywords only (see "Future: input args" below).
+* **`run`** — function from inputs to outputs (a table) or `nil` (skipped).
That's the entire surface. Image selection, conditional firing, output extraction — all done inside `run` using runtime primitives. Fennel-as-code means there's no need for config-language conveniences when a function will do.
@@ -35,7 +35,7 @@ If a fourth concept ever genuinely needs to be expressible at the job level (per
## Inputs
-```fennel
+```
[:quire/push :compute-version]
```
@@ -49,21 +49,21 @@ The dependency graph is *derived* from the inputs list. No separate `:needs` fie
The function receives an outer table with an `:inputs` key. Standard pattern is to destructure:
-```fennel
+```
(fn [{: inputs}]
(.. "checkout " inputs.build.sha))
```
For source inputs whose names contain `/`, the dot-access syntax is awkward. **Destructure at the function arg** — both cleaner and less error-prone:
-```fennel
+```
(fn [{:inputs {:quire/push push}}]
(.. "checkout " push.sha))
```
The `push` local rebinding is the recommended idiom for any source input. Use the same pattern when destructuring multiple inputs:
-```fennel
+```
(fn [{:inputs {:quire/push push : build : compute-version}}]
(.. "deploying " compute-version.version " from " push.sha))
```
@@ -85,7 +85,7 @@ For v1, the only source is `:quire/push`. Outputs:
Every push to any ref fires a run that includes every job whose transitive inputs include `:quire/push`. Filtering "which pushes do I care about" happens inside `run` — return `nil` to skip:
-```fennel
+```
(job :test-main [:quire/push]
(fn [{:inputs {:quire/push push}}]
(when (= "main" push.branch)
@@ -107,7 +107,7 @@ This means **every push starts a run**, even if no job's predicate matches. Skip
Source types that need configuration — cron schedules, webhook paths — can't be expressed as bare keywords. The planned shape is a constructor call returning a value the runner recognizes:
-```fennel
+```
(job :nightly-audit [(cron :daily)]
(fn [{:inputs {: cron}}] ...))
@@ -137,14 +137,14 @@ A bad `ci.fnl` push gets a CI run that fails immediately with the parse error, s
`run` is a host-side Fennel function (the container can't run Fennel) called when the job is about to execute, with all upstream inputs resolved. It returns either:
-- **A table** — the job's outputs. Whatever keys are in it become available to dependent jobs as `inputs.<this-job>.<key>`.
-- **`nil`** — the job is skipped. Dependents see `inputs.<this-job>` as `nil`.
+* **A table** — the job's outputs. Whatever keys are in it become available to dependent jobs as `inputs.<this-job>.<key>`.
+* **`nil`** — the job is skipped. Dependents see `inputs.<this-job>` as `nil`.
That's the whole contract. No sugar layer, no introspection, no defaulting. The runner records what was returned.
Inside `run`, the function uses **runtime primitives** to do work. The most important is `(container {...})`, which runs a container and returns a result table:
-```fennel
+```
(job :test [:quire/push]
(fn [{:inputs {:quire/push push}}]
(container {:image "rust:1.75"
@@ -155,7 +155,7 @@ Inside `run`, the function uses **runtime primitives** to do work. The most impo
For more complex jobs, the function does its own orchestration: multiple containers, host-side work between them, computed outputs derived from intermediate results:
-```fennel
+```
(job :test-and-package [:quire/push]
(fn [{:inputs {:quire/push push}}]
(let [test (container {:image "rust:1.75"
@@ -174,10 +174,10 @@ If the test fails, the outer `(when ...)` returns nil → job skipped. If it pas
Earlier drafts of this design had three return shapes (string, list of strings, table) plus an `:outputs` field for declarative output extension plus a `:when` field for conditional firing plus an `:image` field for the default container image. All gone. They were paying for conveniences that aren't conveniences in a code-first config:
-- **String sugar.** `:run "cargo test"` saves about ten characters over `(fn [_] (container {:image "rust:1.75" :cmd "cargo test"}))`. Not worth a second mental model.
-- **`:outputs` declarative extension.** "Read coverage.json after the container exits" is a Fennel one-liner inside `run`: `(let [r (container {...})] {:exit r.exit :coverage (read-json "coverage.json")})`. Helpers compose to clean up repetition.
-- **`:when`.** Returning `nil` from `run` already means "skip." Filtering and work end up in the same expression, which makes the intent more visible, not less.
-- **`:image`.** Image lives on the `(container ...)` call where it's actually used. Lets a single job legitimately use multiple images.
+* **String sugar.** `:run "cargo test"` saves about ten characters over `(fn [_] (container {:image "rust:1.75" :cmd "cargo test"}))`. Not worth a second mental model.
+* **`:outputs` declarative extension.** "Read coverage.json after the container exits" is a Fennel one-liner inside `run`: `(let [r (container {...})] {:exit r.exit :coverage (read-json "coverage.json")})`. Helpers compose to clean up repetition.
+* **`:when`.** Returning `nil` from `run` already means "skip." Filtering and work end up in the same expression, which makes the intent more visible, not less.
+* **`:image`.** Image lives on the `(container ...)` call where it's actually used. Lets a single job legitimately use multiple images.
The residual things that *aren't* "just functions" — the inputs list and the id — are the ones that genuinely need to be language-level. They define the graph and the identity. Everything else is user-space.
@@ -185,19 +185,19 @@ The residual things that *aren't* "just functions" — the inputs list and the i
Functions in scope inside `run`:
-- `(container {opts})` — run a container, return `{:exit :stdout :stderr :duration}`. Opts: `:image`, `:cmd` (string or list), `:env`, `:cwd`, `:cache` (cache dir mount, defaults to job's image-keyed cache).
-- `(sh cmd)` — run a command on the host, no container. For cheap utility work. Returns the same shape as `container`.
-- `(read-file path)`, `(read-json path)`, `(write-file path content)` — workspace I/O. Paths relative to the workspace.
-- `(log msg)` — append to the job's log file. Visible in the web UI.
-- `(env name)` — read an environment variable from the runner's environment (typically secrets).
+* `(container {opts})` — run a container, return `{:exit :stdout :stderr :duration}`. Opts: `:image`, `:cmd` (string or list), `:env`, `:cwd`, `:cache` (cache dir mount, defaults to job's image-keyed cache).
+* `(sh cmd)` — run a command on the host, no container. For cheap utility work. Returns the same shape as `container`.
+* `(read-file path)`, `(read-json path)`, `(write-file path content)` — workspace I/O. Paths relative to the workspace.
+* `(log msg)` — append to the job's log file. Visible in the web UI.
+* `(env name)` — read an environment variable from the runner's environment (typically secrets).
Each of these blocks the Fennel function until it returns. Multi-container parallelism inside one job is a v2 want; the v1 model is "the function runs sequentially, calling primitives that block."
-The wallclock and memory limits on Fennel eval (10s, 512 MB by default — see CI.md) **don't apply to time spent inside primitives**, because the function blocks on real work. The runner accounts for container time separately. The eval budget is for Fennel-side computation between primitive calls.
+Eval is unsandboxed by default (see CI.md). A `run` function that loops forever or allocates without bound will hang or OOM `quire serve`. The mitigation is the same as for any Fennel hang: write `ci.fnl` thoughtfully. The bwrap opt-in (also see CI.md) covers eval and primitive calls together when it lands.
## A worked example
-```fennel
+```
;; Helper: a parameterized test job
(fn rust-test [version]
(job (.. "test-" version) [:quire/push]
@@ -241,47 +241,48 @@ The wallclock and memory limits on Fennel eval (10s, 512 MB by default — see C
```
What this expresses:
-- Every push fires a run. Test jobs check `push.branch` and return nil for non-main pushes; build/deploy chain skips with them (their inputs are nil, their `(when ...)` checks see nil).
-- Tagged pushes additionally fire `:publish`, which has its own predicate.
-- The "all tests passed" check in `:build` is now visible in code rather than implicit. More verbose than a `:when` field, but the verbosity is honest about what's happening — and a helper (`(all-passed test-1.75 test-1.76 test-stable)`) would clean it up if the pattern repeats.
+
+* Every push fires a run. Test jobs check `push.branch` and return nil for non-main pushes; build/deploy chain skips with them (their inputs are nil, their `(when ...)` checks see nil).
+* Tagged pushes additionally fire `:publish`, which has its own predicate.
+* The "all tests passed" check in `:build` is now visible in code rather than implicit. More verbose than a `:when` field, but the verbosity is honest about what's happening — and a helper (`(all-passed test-1.75 test-1.76 test-stable)`) would clean it up if the pattern repeats.
## Evaluation timing
-> **v0 status:** the three-context model below is the eventual target. Initial implementation collapses to a single in-process eval per run — registration and per-job execution happen together at run start. The model expands back out to three contexts when cross-job inputs (job B consuming job A's outputs) and the subprocess sandbox land.
+> **v0 status:** the three-context model below is the eventual target. Initial implementation collapses to a single in-process eval per run — registration and per-job execution happen together at run start. The model expands back out to three contexts when cross-job inputs (job B consuming job A's outputs) make per-job re-eval necessary.
-`ci.fnl` is evaluated in **three contexts**, all using the subprocess machinery from CI.md (10s wallclock, 512 MB memory cap):
+`ci.fnl` is evaluated in **three contexts**, all in-process inside `quire serve` (see CI.md for the threat model and the bwrap opt-in for untrusted code):
1. **Registration eval.** When `ci.fnl` changes on the default branch. The runner walks the resulting job set, runs structural validation (cycles, non-empty inputs, reachability, namespace rule). For v1, nothing else needs to happen here — `:quire/push` is implicit, requires no registration. When source types that need registration arrive (cron schedules, webhook routes), they'll be discovered here via the constructor form in inputs.
2. **Run eval.** When a push arrives and a run starts. The runner evaluates `ci.fnl` to get the current job set, computes which jobs are reachable from `:quire/push`, schedules them.
-3. **Per-job eval.** When a job is about to execute, its `run` function is invoked with concrete input values. Same subprocess, same limits, but per job.
+3. **Per-job eval.** When a job is about to execute, its `run` function is invoked with concrete input values.
-The three-context model means **`ci.fnl` is re-evaluated more than you might expect.** Pure functions, no caching across runs. This is fine — eval is fast and bounded — but worth knowing if a future helper does expensive work at the top level (parsing a large file, hitting a network endpoint). Top-level work runs three times per change, plus once per job. Move expensive work into `run` where it runs once per job execution.
+The three-context model means **`ci.fnl` is re-evaluated more than you might expect.** Pure functions, no caching across runs. This is fine — eval is fast — but worth knowing if a future helper does expensive work at the top level (parsing a large file, hitting a network endpoint). Top-level work runs three times per change, plus once per job. Move expensive work into `run` where it runs once per job execution.
## Open questions
-- **Source events with no matching jobs.** If `ci.fnl` has no jobs whose transitive inputs include `:quire/push`, do pushes still create empty runs? Probably no — skip silently. But worth being explicit.
-- **What's the exact set of runtime primitives?** `container`, `sh`, `read-file` are obvious. Less obvious: do we expose `tcp-connect`, `http-get`? They'd enable real "jobs as observers" patterns, but they're a long road into "Fennel is a real programming environment." Probably no, defer.
-- **Artifacts as inputs.** Job B with `[:build]` as inputs — does B's workspace start with build's artifacts already in place? Probably yes; otherwise the `:artifacts` output is data-only and you can't use them in subsequent containers. Implementation: artifacts unpacked into B's workspace before B's container starts.
-- **Image pre-pull discoverability.** Without a top-level `:image` field, the runner can't statically know what images a job uses — it has to actually run the function (or analyze it, which is fragile). Probably acceptable for v1: pull-on-demand from `(container ...)` calls works fine, just with a one-time latency per new image. A `quire ci pull <image>` command lets users warm explicitly.
-- **Error semantics inside `run`.** What if it throws? Job marked failed, exception text into the log. What if it returns a malformed value (not nil, not a table)? Mark failed, log a schema warning.
-- **Push payload size.** `:quire/push.files-changed` could be huge for a large merge. Do we cap it? Stream it differently? Defer to first time it bites.
-- **Composition across files.** A `quire/stdlib.fnl` of common helpers, or per-repo Fennel modules. Real want eventually; not v1.
-- **Pre-execution skip hook.** "Every push starts a run" is fine for personal scale. If it ever isn't, a hook that runs *before* workspace materialization to skip the whole run is the escape valve. Currently you can return nil from any `run` to skip that job, but the run still happens.
-- **Map-form variant trigger.** What's the threshold for switching from `(job id inputs run)` positional to `(job id {:inputs ... :run ... :extra ...})` map-form? First option that genuinely needs to exist at the job level — likely candidates would be per-job timeout or retry policy. None planned for v1.
+* **Source events with no matching jobs.** If `ci.fnl` has no jobs whose transitive inputs include `:quire/push`, do pushes still create empty runs? Probably no — skip silently. But worth being explicit.
+* **What's the exact set of runtime primitives?** `container`, `sh`, `read-file` are obvious. Less obvious: do we expose `tcp-connect`, `http-get`? They'd enable real "jobs as observers" patterns, but they're a long road into "Fennel is a real programming environment." Probably no, defer.
+* **Artifacts as inputs.** Job B with `[:build]` as inputs — does B's workspace start with build's artifacts already in place? Probably yes; otherwise the `:artifacts` output is data-only and you can't use them in subsequent containers. Implementation: artifacts unpacked into B's workspace before B's container starts.
+* **Image pre-pull discoverability.** Without a top-level `:image` field, the runner can't statically know what images a job uses — it has to actually run the function (or analyze it, which is fragile). Probably acceptable for v1: pull-on-demand from `(container ...)` calls works fine, just with a one-time latency per new image. A `quire ci pull <image>` command lets users warm explicitly.
+* **Error semantics inside `run`.** What if it throws? Job marked failed, exception text into the log. What if it returns a malformed value (not nil, not a table)? Mark failed, log a schema warning.
+* **Push payload size.** `:quire/push.files-changed` could be huge for a large merge. Do we cap it? Stream it differently? Defer to first time it bites.
+* **Composition across files.** A `quire/stdlib.fnl` of common helpers, or per-repo Fennel modules. Real want eventually; not v1.
+* **Pre-execution skip hook.** "Every push starts a run" is fine for personal scale. If it ever isn't, a hook that runs *before* workspace materialization to skip the whole run is the escape valve. Currently you can return nil from any `run` to skip that job, but the run still happens.
+* **Map-form variant trigger.** What's the threshold for switching from `(job id inputs run)` positional to `(job id {:inputs ... :run ... :extra ...})` map-form? First option that genuinely needs to exist at the job level — likely candidates would be per-job timeout or retry policy. None planned for v1.
## Locked-in decisions
-- **`(job id inputs run)`** — three positional arguments. No options map; if a fourth option ever needs to exist, that's the moment to introduce a map-form variant.
-- **`id`** is a keyword; cannot contain `/`. Validation rule, parse-time error.
-- **`inputs`** is a non-empty list of names. Each is either a job id or a source ref (reserved name in the `quire/` namespace).
-- **v1 supports only strings/keywords in `inputs`.** Constructor calls (for cron, webhook, output cherry-picks) are the planned extension; shape settled, implementation deferred.
-- **Builtins live under `quire/`**; user job ids cannot contain `/`.
-- **For v1, the only source is `:quire/push`.** Cron, webhook, manual deferred.
-- **Filtering happens inside `run`** by returning `nil`. Every push starts a run; jobs that return nil from `run` are skipped.
-- **Destructure source inputs at the function arg** — `(fn [{:inputs {:quire/push push}}] ...)` — to avoid awkward dot-access on `/`-containing keys.
-- **Dependency graph derived from the inputs list**, not declared separately. No `:needs`.
-- **Four structural validations**: acyclic (registration eval), non-empty inputs (registration eval), reachability from a source (registration eval), no `/` in user job ids (parse time). All fail-closed with named-target error messages.
-- **`run` is a function** `(fn [{: inputs}] ...)`. Returns a table (the outputs) or `nil` (skipped). No sugar.
-- **`(container {opts})` is the primary primitive** for running containers. Opts include `:image`, so a single job can use multiple images by making multiple container calls.
-- **Three eval contexts** — registration, run start, per job — all using the same subprocess machinery and limits.
-- **Source registration sourced from the default branch only** (relevant once registration becomes meaningful — for v1 it's a no-op since `:quire/push` needs no registration).
+* **`(job id inputs run)`** — three positional arguments. No options map; if a fourth option ever needs to exist, that's the moment to introduce a map-form variant.
+* **`id`** is a keyword; cannot contain `/`. Validation rule, parse-time error.
+* **`inputs`** is a non-empty list of names. Each is either a job id or a source ref (reserved name in the `quire/` namespace).
+* **v1 supports only strings/keywords in `inputs`.** Constructor calls (for cron, webhook, output cherry-picks) are the planned extension; shape settled, implementation deferred.
+* **Builtins live under `quire/`**; user job ids cannot contain `/`.
+* **For v1, the only source is `:quire/push`.** Cron, webhook, manual deferred.
+* **Filtering happens inside `run`** by returning `nil`. Every push starts a run; jobs that return nil from `run` are skipped.
+* **Destructure source inputs at the function arg** — `(fn [{:inputs {:quire/push push}}] ...)` — to avoid awkward dot-access on `/`-containing keys.
+* **Dependency graph derived from the inputs list**, not declared separately. No `:needs`.
+* **Four structural validations**: acyclic (registration eval), non-empty inputs (registration eval), reachability from a source (registration eval), no `/` in user job ids (parse time). All fail-closed with named-target error messages.
+* **`run` is a function** `(fn [{: inputs}] ...)`. Returns a table (the outputs) or `nil` (skipped). No sugar.
+* **`(container {opts})` is the primary primitive** for running containers. Opts include `:image`, so a single job can use multiple images by making multiple container calls.
+* **Three eval contexts** — registration, run start, per job — all in-process inside `quire serve`. Sandboxing model and threat model are described in CI.md.
+* **Source registration sourced from the default branch only** (relevant once registration becomes meaningful — for v1 it's a no-op since `:quire/push` needs no registration).
diff --git a/docs/CI.md b/docs/CI.md
index 239b5e3..f83dac8 100644
--- a/docs/CI.md
+++ b/docs/CI.md
@@ -27,7 +27,7 @@ The runner doesn't get its own process because **it doesn't execute user code in
Run records on disk are the **durable truth** once written. The hook is a thin transport: it sends a push event over a Unix socket to `quire serve`, which is the sole writer of run records on disk.
| Component | Reads from disk | Writes to disk | In-memory comms |
-|---|---|---|---|
+| --- | --- | --- | --- |
| Hook (`post-receive`) | — | — | push event → `quire serve` socket listener |
| Runner (in-process with `quire serve`) | run records on startup | `meta.json`, `state.json`, `jobs/*/`, logs | wakeup from listener (mpsc); broadcast logs → web |
| Web (`quire serve`) | run records on demand | — | subscribe to log broadcasts |
@@ -53,9 +53,10 @@ This is the principle that prevents drift. The temptation to migrate state into
**One run executes at a time across the entire forge.** Job 2 of repo A waits for job 1 of repo B to finish.
Implications:
-- Cache contention disappears entirely — no two jobs ever touch the same cache dir simultaneously.
-- Resource limits are trivial: the box is dedicated to whatever's running. No `--cpus`/`--memory` math, no oversubscription.
-- Queueing is FIFO from `runs/pending/`. No fairness story needed.
+
+* Cache contention disappears entirely — no two jobs ever touch the same cache dir simultaneously.
+* Resource limits are trivial: the box is dedicated to whatever's running. No `--cpus`/`--memory` math, no oversubscription.
+* Queueing is FIFO from `runs/pending/`. No fairness story needed.
The cost is latency under load: push to repo A while a 5-minute build of repo B is running, and you wait. For personal scale this is almost never the experience. The escape valve is documented and small: add a `max_concurrent_runs` config knob and a per-repo cache file lock; spawn N runner tasks instead of 1. The queue, supersede logic, and on-disk schema don't change.
@@ -65,9 +66,9 @@ Within a run, **jobs form a DAG** (see next section), but the executor schedules
When a new push arrives for a ref that already has work in flight or queued for the same `(repo, ref)`:
-- **Queued, not yet started:** new push replaces the queued one. Old run marked `superseded`. If you pushed twice in 30 seconds, you almost certainly only care about the second result.
-- **Currently running:** kill the in-flight sandbox (`docker kill <id>`), mark the run `superseded`, enqueue the new one.
-- **Different ref of same repo:** unaffected. Pushing to `feature-branch` should not kill a running build of `main`.
+* **Queued, not yet started:** new push replaces the queued one. Old run marked `superseded`. If you pushed twice in 30 seconds, you almost certainly only care about the second result.
+* **Currently running:** kill the in-flight sandbox (`docker kill <id>`), mark the run `superseded`, enqueue the new one.
+* **Different ref of same repo:** unaffected. Pushing to `feature-branch` should not kill a running build of `main`.
Cheap to get right *if* the run record stores the ref it's building from the start, and queue lookups are "any pending or active runs for `<repo>:<ref>`?" Both are one-line conditions.
@@ -75,7 +76,7 @@ Cheap to get right *if* the run record stores the ref it's building from the sta
Jobs declare dependencies via `:needs`. Missing `:needs` means no dependencies — ready immediately. Failure of a job marks all transitive dependents as `skipped`, unless the failing job has `:allow-failure true` (in which case dependents proceed normally).
-```fennel
+```
{:jobs
[{:id "setup"
:image "rust:1.75"
@@ -101,9 +102,10 @@ Jobs declare dependencies via `:needs`. Missing `:needs` means no dependencies
With max-concurrency 1, executor topo-sorts and picks one ready job at a time (FIFO among ready jobs = spec order). `lint` and `test` are both ready after `setup`; lint runs first, then test, then deploy. If `setup` fails, all three skip.
Schema decisions baked in:
-- `:needs` is `needs-all` (job runs only when *all* listed jobs succeed). `needs-any` is a real but rare want; the schema can grow `:needs-any` later without breaking existing specs.
-- Job ids are arbitrary non-empty strings. Cycle detection at parse time via Kahn's algorithm — fails closed, error message names the cycle.
-- `:allow-failure` exists from v1. Without it, the only way to express "lint can fail and we still want to deploy" is to remove the dependency, which loses the ordering signal.
+
+* `:needs` is `needs-all` (job runs only when *all* listed jobs succeed). `needs-any` is a real but rare want; the schema can grow `:needs-any` later without breaking existing specs.
+* Job ids are arbitrary non-empty strings. Cycle detection at parse time via Kahn's algorithm — fails closed, error message names the cycle.
+* `:allow-failure` exists from v1. Without it, the only way to express "lint can fail and we still want to deploy" is to remove the dependency, which loses the ordering signal.
## Fennel evaluation
@@ -111,7 +113,7 @@ Schema decisions baked in:
Code, not data, means matrix builds, helpers, and conditionals fall out for free without dedicated schema features:
-```fennel
+```
(local rust-versions [:1.75 :1.76 :stable])
{:jobs
@@ -122,49 +124,26 @@ Code, not data, means matrix builds, helpers, and conditionals fall out for free
:run "cargo test"})}
```
-### Eval is sandboxed by subprocess + timeouts
-
-> **v0 status:** initial implementation evaluates `ci.fnl` in-process inside `quire serve`, with no wallclock or memory cap. A buggy or hostile `ci.fnl` can hang or OOM the server. Subprocess + the limits described below are the eventual target; they land when ci.fnl runaway becomes a real liability.
+### Eval runs in-process, unsandboxed by default
-Eval runs as a subprocess: `quire eval-ci-config <workspace>`. The child reads the file, runs the Fennel evaluator, serializes the result table to JSON on stdout, exits. Runner reads stdout, kills the child after the deadline:
+Eval happens inside `quire serve`, in the same Fennel host that loads `config.fnl`. No subprocess, no wallclock cap, no memory limit. Every `ci.fnl` is code the operator wrote; the threat model that would justify a sandbox doesn't exist.
-```rust
-let mut cmd = Command::new(env::current_exe()?);
-cmd.args(["eval-ci-config", "--workspace"]).arg(workspace)
- .stdout(Stdio::piped()).stderr(Stdio::piped());
+The cost: a buggy `ci.fnl` (infinite loop, runaway allocation, `(string.rep "x" (^ 2 30))`) can hang or OOM the server. Mitigation is "don't write that"; for the personal-forge case this is acceptable. If a `ci.fnl` does hang the server, the operator notices because they wrote the bad `ci.fnl` and pushed it themselves.
-unsafe {
- cmd.pre_exec(|| {
- let lim = libc::rlimit {
- rlim_cur: 512 * 1024 * 1024,
- rlim_max: 512 * 1024 * 1024,
- };
- libc::setrlimit(libc::RLIMIT_AS, &lim);
- Ok(())
- });
-}
+### Sandboxed eval — opt-in, future
-let child = cmd.spawn()?;
-let output = timeout(Duration::from_secs(10), child.wait_with_output()).await
- .map_err(|_| anyhow!("ci.fnl evaluation exceeded 10s deadline"))??;
-```
+The day `quire` runs `ci.fnl` written by someone other than the operator (a guest contributor, an automated pipeline pulling third-party templates, etc.), the in-process model stops being safe. The opt-in path is **bubblewrap**: same eval, same Fennel host, but invoked inside a bwrap sandbox that constrains filesystem access (workspace + the Fennel stdlib only), denies network, dies with the parent, and runs under a wallclock + memory cap.
-Defaults, both global config in `config.fnl`:
-- **10 second wallclock.** If `ci.fnl` needs longer than 10s to *decide what jobs to run*, something's wrong — that's design-time work, not runtime work.
-- **512 MB memory.** Same logic; eval shouldn't be doing heavy computation.
+Not built. Not designed in detail. The commitment is just: when sandboxing becomes necessary, it's a per-repo opt-in flag (`{:ci {:sandbox :bwrap}}` or similar), not a global default change. The default stays "in-process, unsandboxed."
-Per-repo overrides intentionally not supported: the only reason a repo would need more is that the repo is doing something it shouldn't.
-
-This isn't bwrap because it doesn't need to be. The threat model is "did I accidentally write a `ci.fnl` that infinite-loops or eats the disk," not "is someone attacking me." Wallclock kill works regardless of what the eval is doing in C; OOM kills the child, not the runner; crash isolation is free. Filesystem and network isolation that bwrap would add buy nothing — eval is reading files in a workspace it has legitimate access to anyway.
-
-The in-process Lua `debug.sethook` approach is clever but has a blind spot inside C functions (`string.rep("x", 2^30)` returns instantly from Lua's perspective, the hook never fires during it, the runner OOMs). Subprocess + kill-on-timeout is boring and correct.
+Bwrap is the chosen path rather than "subprocess + rlimit, no bwrap" (which also gets crash isolation and resource caps) because the opt-in case *is* the untrusted-code case, and untrusted code wants filesystem and network isolation too. Bwrap covers all four (wallclock, memory, filesystem, network); subprocess + rlimit covers only the first two. The bwrap primitive is in the codebase already (the README commits to it), so reaching for the same primitive when it's needed is the simpler story.
## Run lifecycle
1. **`post-receive` hook** sends a push event (one JSON line: `{type, repo, pushed_at, refs: [{ref, old_sha, new_sha}, ...]}`) over `/var/quire/server.sock` and exits. The listener task in `quire serve` parses the event, allocates a run-id per ref, writes `runs/<repo>/<run-id>/{meta.json, state.json}`, and signals the runner via mpsc. No CI work runs in the hook itself.
2. **Runner picks up** the entry from the queue. Atomic rename `pending/<id>` → `active/<id>` for state-machine clarity.
3. **Materialize workspace.** `git --git-dir=repos/foo.git archive <sha> | tar -x -C workspace/`. No worktree, no checkout state on the bare repo. Workspace is throwaway; deleted at end of run.
-4. **Evaluate `.quire/ci.fnl`** via subprocess (see above). Result is the job DAG.
+4. **Evaluate `.quire/ci.fnl`** in-process (see above). Result is the job DAG.
5. **Per ready job:** spawn the sandbox with workspace + caches mounted, stream stdout/stderr to `jobs/<job-id>/log` (and broadcast for live web tailing), capture exit code, record container ID for cancellation.
6. **Aggregate.** Write final status to the run directory. Move `active/<id>` → `complete/<id>` (or `failed/<id>`).
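
The `pending/` → `active/` → `complete/` (or `failed/`) moves in steps 2 and 6 can be sketched as a small state machine over directory renames. This is a minimal illustration, not the runner's actual code: the `RunState` enum, the `transition` function, and the idea of a single queue root holding the four directories are assumptions made for the example. It relies on `std::fs::rename` being atomic within one filesystem, so a reader sees a run in exactly one state.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Queue states from the lifecycle above; each maps to a directory
/// under a (hypothetical) queue root.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RunState {
    Pending,
    Active,
    Complete,
    Failed,
}

impl RunState {
    fn dir(self) -> &'static str {
        match self {
            RunState::Pending => "pending",
            RunState::Active => "active",
            RunState::Complete => "complete",
            RunState::Failed => "failed",
        }
    }
}

/// Move run `id` from one state directory to another with a single rename.
/// Only the transitions named in the lifecycle are accepted.
pub fn transition(root: &Path, id: &str, from: RunState, to: RunState) -> io::Result<PathBuf> {
    let allowed = matches!(
        (from, to),
        (RunState::Pending, RunState::Active)
            | (RunState::Active, RunState::Complete)
            | (RunState::Active, RunState::Failed)
    );
    if !allowed {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("illegal transition {:?} -> {:?}", from, to),
        ));
    }
    let dest_dir = root.join(to.dir());
    fs::create_dir_all(&dest_dir)?;
    let dest = dest_dir.join(id);
    // The rename is the state change; there is no separate "status" write to get out of sync.
    fs::rename(root.join(from.dir()).join(id), &dest)?;
    Ok(dest)
}
```

Because the rename is the only write, a crash between steps leaves the entry in its previous directory, which is what makes the on-disk layout usable as the state of record.
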
@@ -183,8 +162,9 @@ runs/<repo>/<run-id>/
```
Two principles fall out:
-- **Immutable vs. mutable files are separate.** `meta.json` is written once and never touched. Readers (the web UI) can cache `meta.json` indefinitely and only re-read `state.json`.
-- **Append-only logs.** Web UI tails the log file; runner appends; no coordination needed. Live tailing also goes through a `tokio::sync::broadcast` channel for sub-second latency, but the file is the source of truth.
+
+* **Immutable vs. mutable files are separate.** `meta.json` is written once and never touched. Readers (the web UI) can cache `meta.json` indefinitely and only re-read `state.json`.
+* **Append-only logs.** Web UI tails the log file; runner appends; no coordination needed. Live tailing also goes through a `tokio::sync::broadcast` channel for sub-second latency, but the file is the source of truth.
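
The two principles above can be sketched with standard-library file operations. The function names are hypothetical, and replacing `state.json` through a temp file plus rename is an assumption of this sketch rather than something the design states; the point is only that write-once, replace-whole, and append-only each map onto a distinct way of opening the file.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Write `meta.json` exactly once; `create_new` fails if the file already exists.
pub fn write_meta_once(run_dir: &Path, json: &str) -> io::Result<()> {
    let mut f = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(run_dir.join("meta.json"))?;
    f.write_all(json.as_bytes())
}

/// Replace `state.json` by writing a temp file and renaming it over the old one,
/// so a reader never observes a half-written state. (Assumed approach.)
pub fn write_state(run_dir: &Path, json: &str) -> io::Result<()> {
    let tmp = run_dir.join("state.json.tmp");
    fs::write(&tmp, json)?;
    fs::rename(&tmp, run_dir.join("state.json"))
}

/// Append one chunk to a job log; the file is only ever opened in append mode.
pub fn append_log(log_path: &Path, chunk: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(log_path)?;
    f.write_all(chunk)
}
```

A second call to `write_meta_once` on the same run directory returns an `AlreadyExists` error, which is the property that lets readers cache `meta.json` indefinitely.
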
## Sandbox backend — the real fork in the road
@@ -217,20 +197,23 @@ bwrap --bind rootfs/rust-1.75 / \
Full Docker Hub image catalog. No daemon, no socket, no privilege, no DinD/DooD question. The cascade: quire becomes a systemd unit on the host; one process tree; the `/var/quire` path-pinning rule becomes irrelevant because nothing crosses a container boundary.
Costs that need real work:
-- **Writable rootfs.** Most images expect to write outside the workspace (apt, scripts dropping files in `/etc`). Bwrap's `--overlay-src` gives a writable union with a throwaway upper layer. ~30 lines, but mandatory by the second image you try.
-- **Image refresh.** No auto-pull on tag updates. Either explicit `quire ci pull` or digest-check before each run.
-- **Resource limits.** No `--cpus`/`--memory`. Wrap with `systemd-run --user --scope -p MemoryMax=2G -p CPUQuota=200% bwrap ...` or write the bwrap PID into a cgroup directly.
-- **OCI config.** Images carry `entrypoint`/`cmd`/`USER` in their config; bwrap doesn't read it. Parse the JSON yourself if you want to honor it. For CI it barely matters since you're overriding the command anyway.
+
+* **Writable rootfs.** Most images expect to write outside the workspace (apt, scripts dropping files in `/etc`). Bwrap's `--overlay-src` gives a writable union with a throwaway upper layer. ~30 lines, but mandatory by the second image you try.
+* **Image refresh.** No auto-pull on tag updates. Either explicit `quire ci pull` or digest-check before each run.
+* **Resource limits.** No `--cpus`/`--memory`. Wrap with `systemd-run --user --scope -p MemoryMax=2G -p CPUQuota=200% bwrap ...` or write the bwrap PID into a cgroup directly.
+* **OCI config.** Images carry `entrypoint`/`cmd`/`USER` in their config; bwrap doesn't read it. Parse the JSON yourself if you want to honor it. For CI it barely matters since you're overriding the command anyway.
Roughly 200-400 lines of Rust beyond the bind-host case, mostly shelling to `skopeo`/`umoci` and assembling the bwrap argv.
+The bwrap primitive used here (running a job in a sandbox) is the same one as the opt-in eval sandbox. Building Path B for jobs and the eval opt-in for `ci.fnl` would share most of their plumbing.
+
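
As a rough picture of the "assembling the bwrap argv" part, the sketch below builds the command line for one job, with the `systemd-run` scope from the "Resource limits" item wrapped around bwrap. The `SandboxSpec` struct, the `sandbox_argv` function, and the `/workspace` mount point are assumptions for illustration. It uses plain `--bind` for the rootfs and leaves out the writable-overlay handling described above.

```rust
/// Hypothetical description of one sandboxed job invocation.
pub struct SandboxSpec<'a> {
    pub rootfs: &'a str,
    pub workspace: &'a str,
    pub memory_max: &'a str, // e.g. "2G"
    pub cpu_quota: &'a str,  // e.g. "200%"
    pub allow_network: bool,
    pub command: &'a [&'a str],
}

/// Build the full argv: a systemd scope for resource caps, then bwrap for
/// filesystem and network isolation, then the job's own command.
pub fn sandbox_argv(spec: &SandboxSpec) -> Vec<String> {
    let mut argv: Vec<String> = Vec::new();
    // Resource caps come from the transient systemd scope, not from bwrap.
    argv.extend(["systemd-run", "--user", "--scope"].map(String::from));
    argv.extend(["-p".to_string(), format!("MemoryMax={}", spec.memory_max)]);
    argv.extend(["-p".to_string(), format!("CPUQuota={}", spec.cpu_quota)]);
    argv.push("bwrap".to_string());
    // Image rootfs becomes /, the run's workspace is mounted at an assumed /workspace.
    argv.extend(["--bind", spec.rootfs, "/"].map(String::from));
    argv.extend(["--bind", spec.workspace, "/workspace"].map(String::from));
    argv.extend(["--proc", "/proc", "--dev", "/dev"].map(String::from));
    argv.extend(["--chdir", "/workspace", "--die-with-parent"].map(String::from));
    if !spec.allow_network {
        // Give the job its own empty network namespace.
        argv.push("--unshare-net".to_string());
    }
    argv.extend(spec.command.iter().map(|s| s.to_string()));
    argv
}
```

Keeping argv construction as a pure function makes it easy to test without spawning anything, and the same builder could serve both the job sandbox and the opt-in eval sandbox if they end up sharing plumbing.
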
### Recommendation
**DooD for v1, OCI+bwrap as a known migration path.**
-- DooD gets CI working in a week. Polyglot is free.
-- The runner is one tokio task in one binary. Swapping its backend is a contained change. The Fennel job spec doesn't care which backend ran it.
-- Once the system has been used enough to know what's actually needed from it, the OCI+bwrap migration removes the last reason for quire to be containerized at all — which is the more on-brand endpoint given the rest of the design.
+* DooD gets CI working in a week. Polyglot is free.
+* The runner is one tokio task in one binary. Swapping its backend is a contained change. The Fennel job spec doesn't care which backend ran it.
+* Once the system has been used enough to know what's actually needed from it, the OCI+bwrap migration removes the last reason for quire to be containerized at all — which is the more on-brand endpoint given the rest of the design.
If the impulse is to skip straight to OCI+bwrap on aesthetic grounds: defensible, but you're paying ~2 weeks of sandbox plumbing before any CI runs at all. The intermediate state of "DooD works, here's what I actually want from it" is worth a lot.
@@ -242,26 +225,27 @@ Punt on cache invalidation until it actually annoys. "Delete the cache dir" is a
## Open questions
-- **Fennel stdlib surface.** What does `eval-ci-config` expose? At minimum: env access (`(env :GITHUB_TOKEN)`, scoped to repo secrets), table-building for jobs, maybe a `matrix` helper. Bigger question: does eval get to read files from the workspace (`(read-file "Cargo.toml")` to decide what jobs to register)? "Yes" is the thin end of the dynamic-jobs wedge; "no" keeps the model strict.
-- **Image pre-warming.** First run of any image pulls hundreds of MB. Want both implicit pull-on-demand and an explicit `quire ci pull <image>` to warm before pushing.
-- **Log streaming UX.** SSE tailing the log file works for the web UI, but the broadcast-channel-vs-file-tail interaction has subtleties around "client connects mid-job, wants backlog + live."
-- **Image GC.** Host accumulates layers. Weekly `docker image prune` via host cron is the dumb correct answer for DooD; OCI+bwrap needs a `quire ci gc` that walks `images/` and `rootfs/` against last-used timestamps.
-- **Services / sidecars.** Some jobs want postgres or redis alongside. The shape is "bring up sidecar, run job against it, tear down." Adds a small orchestration layer. Not v1.
-- **Secrets.** CI jobs that need API tokens. Probably env-injected from `config.fnl`, scoped per-repo. Worth designing the surface area before the first job needs one.
-- **Cycle detection error UX.** Where do parse errors surface — does the push fail (post-receive returns nonzero) or does the run start and immediately error? Probably the latter, since hooks should be fast and CI errors belong in CI history.
+* **Fennel stdlib surface.** What does the Fennel eval expose? At minimum: env access (`(env :GITHUB_TOKEN)`, scoped to repo secrets), table-building for jobs, maybe a `matrix` helper. Bigger question: does eval get to read files from the workspace (`(read-file "Cargo.toml")` to decide what jobs to register)? "Yes" is the thin end of the dynamic-jobs wedge; "no" keeps the model strict.
+* **Image pre-warming.** First run of any image pulls hundreds of MB. Want both implicit pull-on-demand and an explicit `quire ci pull <image>` to warm before pushing.
+* **Log streaming UX.** SSE tailing the log file works for the web UI, but the broadcast-channel-vs-file-tail interaction has subtleties around "client connects mid-job, wants backlog + live."
+* **Image GC.** Host accumulates layers. Weekly `docker image prune` via host cron is the dumb correct answer for DooD; OCI+bwrap needs a `quire ci gc` that walks `images/` and `rootfs/` against last-used timestamps.
+* **Services / sidecars.** Some jobs want postgres or redis alongside. The shape is "bring up sidecar, run job against it, tear down." Adds a small orchestration layer. Not v1.
+* **Secrets.** CI jobs that need API tokens. Probably env-injected from `config.fnl`, scoped per-repo. Worth designing the surface area before the first job needs one.
+* **Cycle detection error UX.** Where do config errors such as parse failures and dependency cycles surface: does the push fail (post-receive returns nonzero) or does the run start and immediately error? Probably the latter, since hooks should be fast and CI errors belong in CI history.
+* **Sandbox opt-in surface.** When the bwrap eval/job sandbox lands as an opt-in, what's the per-repo flag's exact shape? Probably a single setting covering both eval and jobs (you don't want one without the other if you don't trust the source), but the exact key in per-repo config wants designing alongside the rest of the per-repo schema.
## Locked-in decisions
-- **Runner is in-process** with `quire serve` as a tokio task; not a separate process. Filesystem is the state of record; channels are the wakeup optimization.
-- **No SQLite in v1.** If it enters later, it's a secondary index over the filesystem, never primary. `rm quire.db && quire reindex` must always recover.
-- **Container-per-job**, not long-lived runners.
-- **DooD for v1**; OCI+bwrap as planned migration path.
-- **Workspace materialized via `git archive`**, not worktree.
-- **Max concurrency 1** across the whole forge. Escape valve is `max_concurrent_runs` config + per-repo cache file lock; not building it now.
-- **Jobs are a DAG** with `:needs` (needs-all). Executor schedules serially in topological order under max-concurrency 1; lifting that constraint changes the executor, not the spec.
-- **`:allow-failure`** flag exists from v1.
-- **Supersede on same `(repo, ref)`**: replace queued, kill running.
-- **`.quire/ci.fnl` is executed**, returns the DAG.
-- **Eval runs as a subprocess** with 10s wallclock and 512 MB memory cap. Not bwrap. **v0 deviation:** initial implementation evaluates in-process; subprocess sandbox lands when ci.fnl runaway becomes a real liability.
-- **Hook is a transport, not a writer.** `post-receive` sends a push event over `/var/quire/server.sock`; `quire serve` writes the run record. Hook never touches `runs/`. Tradeoff: zero-loss-on-server-down is dropped in v1 (push lands but no run is created). Fallback to direct disk write is a deferred follow-up.
-- **Caches** are bind-mounted directories under `/var/quire/cache/<repo>/`.
+* **Runner is in-process** with `quire serve` as a tokio task; not a separate process. Filesystem is the state of record; channels are the wakeup optimization.
+* **No SQLite in v1.** If it enters later, it's a secondary index over the filesystem, never primary. `rm quire.db && quire reindex` must always recover.
+* **Container-per-job**, not long-lived runners.
+* **DooD for v1**; OCI+bwrap as planned migration path.
+* **Workspace materialized via `git archive`**, not worktree.
+* **Max concurrency 1** across the whole forge. Escape valve is `max_concurrent_runs` config + per-repo cache file lock; not building it now.
+* **Jobs are a DAG** with `:needs` (needs-all). Executor schedules serially in topological order under max-concurrency 1; lifting that constraint changes the executor, not the spec.
+* **`:allow-failure`** flag exists from v1.
+* **Supersede on same `(repo, ref)`**: replace queued, kill running.
+* **`.quire/ci.fnl` is executed**, returns the DAG.
+* **Eval runs in-process, unsandboxed by default.** Trusted code; the operator wrote it. Sandboxed eval (bwrap, with filesystem/network/wallclock/memory limits) is an opt-in for repos that run `ci.fnl` from someone other than the operator. Not built; not v1.
+* **Hook is a transport, not a writer.** `post-receive` sends a push event over `/var/quire/server.sock`; `quire serve` writes the run record. Hook never touches `runs/`. Tradeoff: zero-loss-on-server-down is dropped in v1 (push lands but no run is created). Fallback to direct disk write is a deferred follow-up.
+* **Caches** are bind-mounted directories under `/var/quire/cache/<repo>/`.
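
The "jobs are a DAG, scheduled serially in topological order" decision, together with the cycle-detection question above, can be sketched as follows. This is an illustrative ordering routine, not the executor: the `topo_order` name and the map-of-dependencies input are assumptions. As a simplification, it treats any name that is not a registered job (for example a source ref such as `quire/push`) as already satisfied.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Order jobs so every job comes after the jobs it depends on.
/// `deps` maps each job id to the names it lists as inputs.
/// Returns Err with the ids that could not be scheduled (those in or
/// behind a cycle) if no complete order exists.
pub fn topo_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // Remaining unmet job dependencies per job; non-job names are dropped here.
    let mut unmet: BTreeMap<&str, BTreeSet<&str>> = deps
        .iter()
        .map(|(job, needs)| {
            let jobs_only = needs.iter().copied().filter(|n| deps.contains_key(*n)).collect();
            (*job, jobs_only)
        })
        .collect();
    let mut order = Vec::new();
    loop {
        // Pick the first job (in id order) with nothing left to wait for.
        let ready = unmet
            .iter()
            .find(|(_, needs)| needs.is_empty())
            .map(|(job, _)| *job);
        let Some(job) = ready else { break };
        unmet.remove(job);
        for needs in unmet.values_mut() {
            needs.remove(job);
        }
        order.push(job.to_string());
    }
    if unmet.is_empty() {
        Ok(order)
    } else {
        Err(unmet.keys().map(|j| j.to_string()).collect())
    }
}
```

With max concurrency 1 the returned order is the schedule. Lifting that limit would change how ready jobs are picked, but not the dependency data, which matches the claim that the executor changes and the spec does not.
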