# `Linx.Seccomp`
[🔗](https://github.com/oshlabs/linx/blob/v0.2.0/lib/linx/seccomp.ex#L1)

Linux seccomp ("SECure COMPuting") primitives — per-thread cBPF
syscall-filter facilities exposed as Elixir verbs.

## What seccomp is

A seccomp filter is a small cBPF program the kernel runs on every
syscall entry. Its return value tells the kernel whether to allow
the syscall, return an errno, kill the calling process or thread,
raise SIGSYS, or log and proceed. Filters install per-thread; they
never come off once on; and stacking further filters can only make
the policy tighter, never looser. Together those properties let a workload
drop its syscall envelope to a small documented set before
`execve`, so a 0-day in the kernel's capability check still can't
reach the relevant code path if the syscall is gated.

See `seccomp(2)` and the kernel's
`Documentation/userspace-api/seccomp_filter.rst` for the canonical
reference.

## What Linx exposes — and what it doesn't

This module is a *primitive*. It exposes:

  * **Detection.** `supported?/0` (whether the kernel has the
    facility at all) and `arch/0` (which architecture we're
    building filters for).

  * **Filter construction.** Two layers:

    - **Sugar:** `allow_list/2` ("only these syscalls"),
      `deny_list/2` ("not these"), and the fluent
      `Linx.Seccomp.Builder` DSL.

    - **Data:** `from_rules/1` for consumers that translate
      external policies — Docker
      `seccomp.json`, custom DSLs, runtime policy — into a
      plain `[{action, syscall_atom}, ...]` Elixir list and hand
      it to Linx. `to_rules/1` is the inverse for filters Linx
      itself built.

  * **Install.** `install/2` is checkpoint-bound, the same shape
    as `Linx.Capabilities.drop_bounding/2` — the same commit
    pattern, because the kernel forbids cross-thread seccomp
    installation. The child agent in `linx_process.c` does the
    actual `seccomp(2)` call at the parked checkpoint.

Higher-level concerns — parsing JSON profiles, looking up which
syscalls nginx 1.24 needs, tracking workload-to-filter mappings —
are policy and orchestration. Those live in consumers that
build on Linx.

## Motivating composition

    {:ok, c} = Linx.Process.spawn(argv: ["/usr/sbin/nginx"],
                                  no_new_privs: true)
    receive do {:linx_process, :ready, _} -> :ok end

    {:ok, filter} = Linx.Seccomp.allow_list(
      ~w(read write openat close fstat brk mmap munmap mprotect
         accept4 bind listen socket connect setsockopt
         rt_sigaction rt_sigprocmask rt_sigreturn exit_group)a,
      default: :kill_process
    )

    :ok = Linx.Seccomp.install(c, filter)
    :ok = Linx.Process.proceed(c)

After `proceed/1`, nginx runs with exactly that syscall envelope.
If compromised code attempts `execve(2)` (not on the list), the
kernel kills the process without ever entering `do_execve`.

## Forward compatibility

`Linx.Seccomp.Syscalls.from_number/2` returns `:unknown` for a syscall
number outside Linx's per-arch table rather than crashing, so decoding
a filter that references a newer syscall degrades gracefully.
Construction is strict the other way: an unknown syscall *atom* is
rejected at build time, since a typo must never silently widen a filter.
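
The asymmetry can be sketched with a toy lookup table. `SyscallTableSketch`, its four-entry map, and the `(arch, value)` argument order are assumptions made for illustration; only the `:unknown` fallback and the `{:unknown_syscall, atom}` error shape come from this page.

```elixir
defmodule SyscallTableSketch do
  # Tiny excerpt of the x86_64 syscall table, for illustration only.
  @x86_64 %{read: 0, write: 1, open: 2, close: 3}
  @by_number Map.new(@x86_64, fn {name, nr} -> {nr, name} end)

  # Decoding is lenient: a number we don't know maps to :unknown.
  def from_number(:x86_64, nr), do: Map.get(@by_number, nr, :unknown)

  # Construction is strict: a name we don't know is an error.
  def to_number(:x86_64, name) do
    case Map.fetch(@x86_64, name) do
      {:ok, nr} -> {:ok, nr}
      :error -> {:error, {:unknown_syscall, name}}
    end
  end
end
```

A misspelled `:raed` is rejected when a filter is built, while a number from a newer kernel decodes to `:unknown` instead of raising.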

Per-argument matching (`allow_if/3`), multi-arch routing, and
`SECCOMP_USER_NOTIF` are deferred to future work.

# `arch`

```elixir
@type arch() :: :x86_64 | :aarch64 | :unsupported
```

An architecture atom. Linx v1 supports `:x86_64` and `:aarch64`;
any other host arch yields `:unsupported` and the filter-build
verbs reject it.

# `allow_list`

```elixir
@spec allow_list(
  Enumerable.t(),
  keyword()
) :: {:ok, Linx.Seccomp.Filter.t()} | {:error, term()}
```

Build an allow-list filter: every listed syscall gets `:allow`,
every other syscall gets the default action.

Options:

  * `:default` — the action for non-listed syscalls. Defaults to
    `:kill_process` — allow-lists are
    contracts ("I have enumerated what's safe"); a syscall outside
    is a bug or attack and should fail loudly.
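
As a rough mental model, an allow-list is a `{rules, default}` pair in which every listed syscall carries `:allow`. The sketch below is not Linx's implementation; `AllowListSketch`, `expand/2`, and `verdict/2` are hypothetical names, and `verdict/2` models the kernel's decision in plain Elixir rather than in cBPF.

```elixir
defmodule AllowListSketch do
  # Expand the sugar into the {rules, default} data shape.
  def expand(syscalls, opts \\ []) do
    default = Keyword.get(opts, :default, :kill_process)
    {Enum.map(syscalls, &{:allow, &1}), default}
  end

  # Model of the decision: the first rule naming the syscall wins,
  # otherwise the default action applies.
  def verdict({rules, default}, syscall) do
    Enum.find_value(rules, default, fn
      {action, ^syscall} -> action
      _ -> nil
    end)
  end
end
```

With `expand([:read])`, `:read` is allowed and anything else, such as `:execve`, falls through to `:kill_process`.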

## Errors

Same shape as `from_rules/1`. See its docs for the full list.

## Examples

    {:ok, filter} = Linx.Seccomp.allow_list(
      ~w(read write openat close exit_group)a,
      default: :kill_process
    )

    # Looser default — useful when the goal is to log unlisted
    # syscalls for profiling rather than killing the workload.
    {:ok, filter} = Linx.Seccomp.allow_list([:read, :write],
                                            default: :log)

# `arch`

```elixir
@spec arch() :: arch()
```

The current host architecture as an atom — `:x86_64`, `:aarch64`,
or `:unsupported`.

Resolved on first call from
`:erlang.system_info(:system_architecture)` and cached in
`:persistent_term` for the rest of the VM's life (the host arch
can't change). Cheap on every subsequent call.

## Examples

    iex> Linx.Seccomp.arch() in [:x86_64, :aarch64, :unsupported]
    true
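
The resolve-once-then-cache pattern described above might look roughly like this. `ArchSketch`, the cache key, and the prefix matching in `classify/1` are assumptions for illustration, not Linx's actual code.

```elixir
defmodule ArchSketch do
  @key {__MODULE__, :arch}

  # Return the cached atom, resolving it on the first call.
  def arch do
    case :persistent_term.get(@key, nil) do
      nil ->
        arch = detect()
        :persistent_term.put(@key, arch)
        arch

      cached ->
        cached
    end
  end

  # Map an architecture triple such as "x86_64-pc-linux-gnu" to an atom.
  def classify(triple) when is_binary(triple) do
    cond do
      String.starts_with?(triple, "x86_64") -> :x86_64
      String.starts_with?(triple, "aarch64") -> :aarch64
      true -> :unsupported
    end
  end

  defp detect do
    # system_info/1 returns a charlist, so convert before matching.
    :erlang.system_info(:system_architecture)
    |> List.to_string()
    |> classify()
  end
end
```

`:persistent_term` suits this case because the value is written once and read many times; reads do not copy the term or go through a process.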

# `builder`

```elixir
@spec builder() :: Linx.Seccomp.Builder.t()
```

Convenience for `Linx.Seccomp.Builder.new/0` — start an empty
builder pipeline.

## Example

    Linx.Seccomp.builder()
    |> Linx.Seccomp.Builder.allow(:read)
    |> Linx.Seccomp.Builder.deny(:ptrace)
    |> Linx.Seccomp.Builder.build(default: :kill_process)

# `deny_list`

```elixir
@spec deny_list(
  Enumerable.t(),
  keyword()
) :: {:ok, Linx.Seccomp.Filter.t()} | {:error, term()}
```

Build a deny-list filter: every listed syscall gets the deny
action, every other syscall gets the default action.

Options:

  * `:default` — the action for non-listed syscalls. Defaults to
    `:allow` — deny-lists are
    graceful-degradation shapes (Docker's default profile).

  * `:deny_action` — the action for listed syscalls. Defaults to
    `{:errno, :eperm}`.
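
The two options combine as follows in the `{rules, default}` data shape. This is a minimal sketch of the expansion under the defaults documented above; `DenyListSketch` and `expand/2` are hypothetical names.

```elixir
defmodule DenyListSketch do
  # Every listed syscall gets the deny action; everything else
  # falls through to the default.
  def expand(syscalls, opts \\ []) do
    default = Keyword.get(opts, :default, :allow)
    deny = Keyword.get(opts, :deny_action, {:errno, :eperm})
    {Enum.map(syscalls, &{deny, &1}), default}
  end
end
```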

## Errors

Same shape as `from_rules/1`.

## Examples

    # Docker-style: deny the dangerous syscalls, allow the rest.
    {:ok, filter} = Linx.Seccomp.deny_list(
      ~w(kexec_load init_module delete_module ptrace mount)a
    )

    # Same denies but with a sharper edge — kill instead of EPERM.
    {:ok, filter} = Linx.Seccomp.deny_list(
      [:kexec_load, :init_module],
      deny_action: :kill_process
    )

# `from_rules`

```elixir
@spec from_rules({[Linx.Seccomp.Filter.rule()], Linx.Seccomp.Filter.action()}) ::
  {:ok, Linx.Seccomp.Filter.t()} | {:error, term()}
```

Build a filter from a normalised rules list — the data-layer API.

Accepts `{rules, default_action}` where `rules` is a list of
`{action, syscall_atom}` tuples and `default_action` is the
fallthrough verdict. This is the seam that external consumers (a
`seccomp.json` adapter, custom DSLs, runtime policy) use to hand
fully-resolved policy to Linx: the consumer's job is "translate
JSON to this list shape"; Linx's job starts here.

The filter targets the current host architecture (see `arch/0`).
Filters built for one arch don't install on another; multi-arch
filters are deferred.

## Returns

  * `{:ok, %Linx.Seccomp.Filter{}}` on success — the filter's
    `:rules` field carries the normalised `{rules, default}` so
    `to_rules/1` can introspect it later.

  * `{:error, {:unsupported_arch, arch}}` — the host arch isn't
    in Linx's supported list (`:x86_64`, `:aarch64`).
  * `{:error, {:bad_action, term}}` — the default or one of the
    per-rule actions isn't a recognised verdict.
  * `{:error, {:unknown_syscall, atom}}` — a rule names a syscall
    atom that isn't in the per-arch table. See the "Extending this
    table" section of `Linx.Seccomp.Syscalls` for how to add one.
  * `{:error, {:duplicate_rule, atom}}` — the same syscall
    appears in more than one rule.
  * `{:error, {:bad_rule, term}}` — an element of the rules list
    isn't a `{action, syscall_atom}` tuple.
  * `{:error, %Linx.Seccomp.Error{operation: :build, errno: :e2big}}`
    — the filter would need a jump > 255 instructions
    (jump-trampoline support is deferred; the current
    ~150-syscall table fits comfortably under this limit).
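
A consumer that wants to catch shape problems before calling `from_rules/1` could run a pre-flight check along these lines. This is a sketch, not Linx's validator: `RulesPreflight` is a hypothetical module, the order of checks is a guess, and the action set is limited to the terms that appear on this page (the real set may be larger).

```elixir
defmodule RulesPreflight do
  # Only action terms that appear on this page; an assumption.
  @known_actions [:allow, :log, :kill_process]

  def check({rules, default}) when is_list(rules) do
    with :ok <- check_action(default) do
      check_rules(rules, MapSet.new())
    end
  end

  defp check_rules([], _seen), do: :ok

  defp check_rules([{action, syscall} | rest], seen) when is_atom(syscall) do
    cond do
      check_action(action) != :ok -> {:error, {:bad_action, action}}
      MapSet.member?(seen, syscall) -> {:error, {:duplicate_rule, syscall}}
      true -> check_rules(rest, MapSet.put(seen, syscall))
    end
  end

  # Anything that is not an {action, syscall_atom} tuple.
  defp check_rules([other | _], _seen), do: {:error, {:bad_rule, other}}

  defp check_action(action) when action in @known_actions, do: :ok
  defp check_action({:errno, name}) when is_atom(name), do: :ok
  defp check_action(other), do: {:error, {:bad_action, other}}
end
```

The sketch does not cover `:unknown_syscall` or the jump-limit error, since both depend on Linx's per-arch table and compiler.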

## Examples

    rules = [
      {:allow, :read},
      {:allow, :write},
      {{:errno, :eperm}, :ptrace},
      {:kill_process, :kexec_load}
    ]
    {:ok, filter} = Linx.Seccomp.from_rules({rules, :allow})

    # Errors are caller-actionable atoms:
    Linx.Seccomp.from_rules({[{:allow, :not_a_real_syscall}], :allow})
    # => {:error, {:unknown_syscall, :not_a_real_syscall}}
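
The consumer-side translation mentioned above might look like the following for an already-decoded Docker-style profile. It is a simplified sketch: `DockerProfileSketch` is hypothetical, JSON decoding is assumed to have happened elsewhere, only four libseccomp action names are mapped, and fields such as `args`, `errnoRet`, `includes`, and `excludes` are ignored.

```elixir
defmodule DockerProfileSketch do
  # Partial mapping to the action terms that appear on this page.
  # SCMP_ACT_ERRNO is mapped to EPERM here, which is an assumption.
  @actions %{
    "SCMP_ACT_ALLOW" => :allow,
    "SCMP_ACT_LOG" => :log,
    "SCMP_ACT_KILL_PROCESS" => :kill_process,
    "SCMP_ACT_ERRNO" => {:errno, :eperm}
  }

  def to_rules(%{"defaultAction" => default, "syscalls" => groups}) do
    with {:ok, default_action} <- action(default),
         {:ok, rules} <- expand(groups, []) do
      {:ok, {rules, default_action}}
    end
  end

  defp expand([], acc), do: {:ok, Enum.reverse(acc)}

  defp expand([%{"names" => names, "action" => name} | rest], acc) do
    with {:ok, act} <- action(name) do
      # String.to_atom/1 is tolerable for a trusted, bounded profile;
      # untrusted input should use String.to_existing_atom/1.
      new = Enum.map(names, &{act, String.to_atom(&1)})
      expand(rest, Enum.reverse(new, acc))
    end
  end

  defp action(name) do
    case Map.fetch(@actions, name) do
      {:ok, act} -> {:ok, act}
      :error -> {:error, {:bad_action, name}}
    end
  end
end
```

The resulting `{rules, default}` tuple has the shape `from_rules/1` accepts, so the adapter stays free of any BPF knowledge.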

# `install`

```elixir
@spec install(Linx.Process.t(), Linx.Seccomp.Filter.t()) ::
  :ok | {:error, :not_ready | :running | :no_process}
```

Install a compiled filter on a parked `Linx.Process` session.

Checkpoint-bound — the same shape as
`Linx.Capabilities.drop_bounding/2`. The kernel forbids
cross-thread `seccomp(2)`, so the child agent in `linx_process.c`
does the actual install at the checkpoint window before
`execve`.

If `PR_SET_NO_NEW_PRIVS` isn't already on (typically because the
caller didn't pass `no_new_privs: true` to `Linx.Process.spawn/1`),
the agent sets it automatically before the `seccomp(2)` call. This
is the "be helpful" path: the kernel refuses a filter from a thread
that has neither NNP nor `CAP_SYS_ADMIN`. Callers who want the
principled posture should still pass the spawn opt; the auto-set is
just a fallback so an unprivileged caller who forgot doesn't get a
confusing `EACCES`.

## Errors

  * `{:error, :not_ready}` — session hasn't reached the checkpoint
    yet. Wait for `{:linx_process, :ready, _}` first.
  * `{:error, :running}` — past `proceed/1`, the child has
    `execve`'d; installing now is too late.
  * `{:error, :no_process}` — the session emitted its
    terminal event.

Kernel-level install failures arrive asynchronously as
`{:linx_process, :error, errno, :seccomp_install}` or
`{:linx_process, :error, errno, :seccomp_no_new_privs}` on the
session's owner mailbox, the same shape as other pre-`execve`
failures.
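
Because `:ok` from `install/2` only means the request was accepted, a caller may want to watch the mailbox for the asynchronous failure shapes. The helper below is a sketch: `InstallWatch`, `await_failure/1`, and the timeout-based approach are assumptions, and only the message tuples come from this page.

```elixir
defmodule InstallWatch do
  # Wait briefly for a seccomp-related failure message in the
  # calling process's mailbox. Other messages are left untouched.
  def await_failure(timeout \\ 100) do
    receive do
      {:linx_process, :error, errno, stage}
      when stage in [:seccomp_install, :seccomp_no_new_privs] ->
        {:error, {stage, errno}}
    after
      timeout -> :no_failure_seen
    end
  end
end
```

A timeout here does not prove the install succeeded; it only shows that no failure arrived within the window.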

## Examples

    {:ok, c} = Linx.Process.spawn(argv: ["/usr/sbin/nginx"],
                                  no_new_privs: true)
    receive do {:linx_process, :ready, _} -> :ok end

    {:ok, filter} = Linx.Seccomp.allow_list(~w(read write …)a)
    :ok = Linx.Seccomp.install(c, filter)
    :ok = Linx.Process.proceed(c)

# `supported?`

```elixir
@spec supported?() :: boolean()
```

Returns `true` iff the running kernel exposes seccomp filtering —
i.e. `/proc/self/status` contains a `Seccomp:` line.

True on every Linux ≥ 3.8, where the `Seccomp:` status field first
appears (filter mode itself dates to 3.5); in practice that is every
kernel Linx targets.
Useful as a precondition guard in setup checks; this module's
build verbs don't gate on it themselves (a missing line would
manifest as an install-time `ENOSYS` from the agent).
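
The check described above amounts to scanning the status file for a line that starts with `Seccomp:`. A minimal sketch, assuming hypothetical names `SeccompProbe`, `supported?/1`, and `has_seccomp_line?/1` (the path parameter exists only to make the function testable):

```elixir
defmodule SeccompProbe do
  # An unreadable file is treated as "not supported".
  def supported?(path \\ "/proc/self/status") do
    case File.read(path) do
      {:ok, body} -> has_seccomp_line?(body)
      {:error, _reason} -> false
    end
  end

  # Match the field name including the colon, so the separate
  # "Seccomp_filters:" field on newer kernels is not mistaken for it.
  def has_seccomp_line?(body) do
    body
    |> String.split("\n")
    |> Enum.any?(&String.starts_with?(&1, "Seccomp:"))
  end
end
```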

# `to_rules`

```elixir
@spec to_rules(Linx.Seccomp.Filter.t()) ::
  {:ok, {[Linx.Seccomp.Filter.rule()], Linx.Seccomp.Filter.action()}}
  | {:error, :no_rules}
```

Inverse of `from_rules/1` — extract the rules list from a filter
Linx itself built.

Filters whose `:rules` field is `nil` (which would arise from a
consumer path that loads externally-supplied raw BPF blobs)
return `{:error, :no_rules}`. The current build verbs always
populate `:rules`, so this is reliable for any filter Linx
itself produced.

## Examples

    iex> {:ok, f} = Linx.Seccomp.allow_list([:read, :write])
    iex> {:ok, {rules, default}} = Linx.Seccomp.to_rules(f)
    iex> rules
    [{:allow, :read}, {:allow, :write}]
    iex> default
    :kill_process

