# `Linx.Netfilter`
[🔗](https://github.com/oshlabs/linx/blob/v0.2.0/lib/linx/netfilter.ex#L1)

Linux netfilter primitives — modern firewall (nf_tables) via the
`NETLINK_NETFILTER` netlink protocol family, plus live ruleset
monitoring and packet-event capture (NFLOG).

## Why a separate subsystem

Netfilter is a coherent kernel concept (firewall + connection
tracking + packet event streams) with its own netlink protocol
family (`NETLINK_NETFILTER` = 12) and a sprawling but consistent
surface. Wrapping it as its own concept module — peer to
`Linx.Process`, `Linx.Cgroup`, `Linx.Mount`, `Linx.User`,
`Linx.Capabilities`, `Linx.Seccomp`, `Linx.Sysctl` — keeps the
firewall mental model explicit. The underlying transport,
`Linx.Netlink.Nfnl`, mirrors `Linx.Netlink.Rtnl`'s shape.

## Value, not handle

`%Linx.Netfilter.Ruleset{}` is plain data: tables containing chains
containing ordered rules, plus sets/maps/vmaps and named objects.
Pure Elixir values, freely composable and inspectable. Four verbs:

  * `build` — construct via pipeline DSL or `~NFT` sigil.
  * `push/2` — write to the kernel atomically (`:replace` rebuilds,
    `:reconcile` computes the minimal diff).
  * `pull/1..3` — read kernel state into a ruleset value.
  * `diff/2` — compute the patch between two rulesets.

Kernel state lives in the kernel; the Elixir value is the Elixir
value. Mirrors `%Linx.Seccomp.Filter{}` scaled to a larger surface.

## Transactions are mandatory

Every mutation goes through a `NFNL_MSG_BATCH_BEGIN` /
`NFNL_MSG_BATCH_END` envelope; the kernel applies the whole batch
atomically or rejects it whole. `push/2` is the only mutator,
batch-shaped from the outside in.
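To make the envelope concrete, here is a minimal pure-Elixir sketch. It models inner messages as plain terms rather than nfnetlink frames, and the module and function names (`BatchSketch`, `envelope/2`, `offending/2`) are hypothetical, not part of Linx. It wraps the inner messages between the begin/end markers and numbers every message, so a rejection reported at a given sequence position can be mapped back to the inner message that caused it.

```elixir
defmodule BatchSketch do
  # Hypothetical model: inner messages are opaque terms; a real encoder
  # would emit NFNL_MSG_BATCH_BEGIN / NFNL_MSG_BATCH_END frames instead
  # of the :batch_begin / :batch_end atoms used here.
  def envelope(inner, first_seq \\ 1) do
    ([:batch_begin] ++ inner ++ [:batch_end])
    |> Enum.with_index(first_seq)
    |> Enum.map(fn {msg, seq} -> {seq, msg} end)
  end

  # Map a rejected sequence number back to the message at that position.
  def offending(batch, seq), do: List.keyfind(batch, seq, 0)
end
```

Because the kernel either commits every inner message or none of them, the only useful error detail is *which* message was rejected, which is what the sequence numbering preserves.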

Modes:

  * `:replace` (default) — tear down and rebuild the named tables.
    Simple, at the cost of a brief disruption.
  * `:reconcile` — compute the minimal patch between current kernel
    state and the desired Ruleset, emit as one batch.
    LiveView-of-firewalls; no service interruption when only
    adding/removing rules at the margins.

## Optimistic concurrency via `NFTA_BATCH_GENID`

`:reconcile` mode threads the kernel's generation counter through
the batch: "I computed this against generation N; reject if N has
moved". The kernel returns `ERESTART` on mismatch; `push/2`
retries a bounded number of times, surfacing
`{:error, %Error{errno: :erestart, ruleset_gen: gen}}` once the
attempts are exhausted. This lets Linx cooperate cleanly with the
`nft` CLI, firewalld, or any other writer in the same netns.
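The bounded retry can be sketched as follows. This is a hypothetical model, not Linx's implementation: `read_gen` and `commit` are closures standing in for the `GETGEN` query and for "re-diff against this generation and send the batch stamped with it", the default of three attempts is an assumption, and a plain map stands in for `%Linx.Netfilter.Error{}`.

```elixir
defmodule GenidRetrySketch do
  # `read_gen` returns the kernel's current generation. `commit` is
  # expected to recompute the patch against that generation, send it,
  # and return :ok or {:error, :erestart} when the generation moved.
  def push_with_retry(read_gen, commit, max_attempts \\ 3) do
    attempt(read_gen, commit, max_attempts, nil)
  end

  # Attempts exhausted: report the last generation we computed against.
  defp attempt(_read_gen, _commit, 0, last_gen) do
    {:error, %{errno: :erestart, ruleset_gen: last_gen}}
  end

  defp attempt(read_gen, commit, attempts_left, _last_gen) do
    gen = read_gen.()

    case commit.(gen) do
      :ok -> :ok
      # Another writer committed first: re-read and try again.
      {:error, :erestart} -> attempt(read_gen, commit, attempts_left - 1, gen)
      # Any other rejection is not a concurrency conflict; surface it.
      {:error, _} = other -> other
    end
  end
end
```

Re-reading the generation on every attempt matters: retrying with the stale generation would fail the same compare-and-swap check again.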

## Owner flag is the default

`create_table/2` sets `NFT_TABLE_F_OWNER` by default: the table is
destroyed when the creating netlink socket closes. The supervisor
that opens the Nfnl socket owns the firewall; if it dies, rules
vanish. **No other firewall management tool exposes this naturally.**

Opt out with `persist: true` (uses `NFT_TABLE_F_PERSIST`, 6.9+) for
policies that should survive the BEAM. On older kernels this falls
back to setting no flags, so the table survives socket close until
it is explicitly deleted.

## Per-namespace isolation

Each netns has fully independent nftables state — own tables, own
generation counter, own commit mutex, own multicast group.
`Linx.Netlink.Nfnl.open({:pid, child_pid})` opens the socket inside
that netns for its whole life; reads/writes through that socket
land in the child's nftables instance. Same value type, same
verbs.

## Authoring surfaces: peers, not layers

Two authoring surfaces produce the same `%Ruleset{}`:

  * **Pipeline DSL** —
    `Ruleset.new() |> Ruleset.add_table(...) |> Table.add_chain(...) |> Chain.add_rule(...)` —
    for runtime-shaped rulesets (interfaces discovered at boot,
    IPs from config).
  * **`~NFT` sigil** — `~NFT"table inet myapp { chain ... }"` —
    for compile-time-authored rulesets with safe Elixir
    interpolation and lossless round-trip to `nftables.conf`
    files. Modelled on Phoenix LiveView's HEEx.

Both call the same validator-setter functions; both produce the
same value.

The setters use `add_*` (`add_table` / `add_chain` / `add_rule`), not
the `create_*` of `Linx.Cgroup` or `Linx.Netlink.Rtnl`, deliberately:
`add_*` inserts into a *value*, while `create` materialises a kernel
object — different acts, different verbs.

## Composition with `Linx.Process`

Same shape as every other Linx subsystem: configure the child's
network and firewall at the checkpoint between `:ready` and
`proceed/1`, then release the workload with everything in force:

    {:ok, c} = Linx.Process.spawn(argv: [...], namespaces: [:net])
    receive do {:linx_process, :ready, _} -> :ok end
    {:ok, host_pid} = Linx.Process.host_pid(c)

    {:ok, ct_nfnl} = Linx.Netlink.Nfnl.open({:pid, host_pid})
    :ok = Linx.Netfilter.push(ct_nfnl, container_ruleset())

    :ok = Linx.Process.proceed(c)

`Linx.Process` has zero awareness of netfilter; the checkpoint is
the only coupling, exactly the way `Linx.Sysctl` / `Linx.Mount` /
every other subsystem composes.

See `docs/netfilter/DESIGN.md` for design work intentionally deferred.

## References

  * [`include/uapi/linux/netfilter/nf_tables.h`](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/netfilter/nf_tables.h)
  * [`include/uapi/linux/netfilter/nfnetlink.h`](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/netfilter/nfnetlink.h)
  * [wiki.nftables.org](https://wiki.nftables.org/wiki-nftables/index.php/Main_Page)
  * [`Documentation/networking/netlink_spec/nftables`](https://docs.kernel.org/networking/netlink_spec/nftables.html)

# `create_table`

```elixir
@spec create_table(Linx.Netlink.Socket.t(), String.t(), keyword()) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}
```

Creates a new table in the kernel's nftables instance.

## Options

  * `:family` — `:ip` | `:ip6` | `:inet` | `:arp` | `:bridge` |
    `:netdev`. Default: `:inet` (the firewall sweet spot — one
    table covers both IPv4 and IPv6).
  * `:persist` — `true` to disable the owner flag, leaving the
    table behind when the socket closes. Default `false` (table
    auto-destroys with the socket; see *Owner flag is the default*
    in the moduledoc).

Returns `{:ok, %Ruleset{}}` — the ruleset has just this one
table, ready for chains / rules to be added with the
`Linx.Netfilter.Ruleset` pipeline DSL and then pushed back with
`push/2`.

Wire-level failures come back as `{:error, %Linx.Netfilter.Error{}}`
with the operation set to `:create_table` and the kernel's
errno / extended-ack message attached. `EEXIST` means the table
was already present (call `pull/2` first if you want a
"create-or-fetch" pattern).
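A create-or-fetch helper might look like the sketch below. It is hypothetical: `pull_table` and `create_table` are closures you would bind to `Linx.Netfilter.pull(sock, {family, name})` and `Linx.Netfilter.create_table(sock, name, family: family)`, and errors are matched structurally on an `errno` key (which also matches a `%Linx.Netfilter.Error{}` struct, assuming it carries that field as the docs describe).

```elixir
defmodule FetchOrCreateSketch do
  # Try the scoped pull first; create only when the table is missing.
  def fetch_or_create(pull_table, create_table) do
    case pull_table.() do
      {:ok, ruleset} ->
        {:ok, ruleset}

      {:error, %{errno: :enoent}} ->
        case create_table.() do
          {:ok, ruleset} -> {:ok, ruleset}
          # Another writer created it between our pull and our create.
          {:error, %{errno: :eexist}} -> pull_table.()
          other -> other
        end

      other ->
        other
    end
  end
end
```

Handling `EEXIST` after the create closes the race where another writer creates the same table between the two calls.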

# `diff`

```elixir
@spec diff(Linx.Netfilter.Ruleset.t(), Linx.Netfilter.Ruleset.t()) ::
  Linx.Netfilter.Patch.t()
```

Computes the minimum-mutation `%Linx.Netfilter.Patch{}` between
two Rulesets — the operations that turn `from` into `to`.

Identity rules:

  * Tables / chains / sets / maps — `name` (within the relevant
    scope: tables within family, the rest within their table).
  * Rules within a chain — `:tag` when set, positional index
    otherwise. Mixed-tag chains fall back to a full rebuild.
  * Set elements — the element value itself.
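The rule-identity policy for a single chain can be modelled in a few lines. In this hypothetical sketch a rule is just a map with an optional `:tag` key; it returns keyed rules (by tag when every rule is tagged, by position when none are) or `:rebuild` for a mixed chain, matching the rules above.

```elixir
defmodule RuleIdentitySketch do
  # Hypothetical model: rules are maps; :tag is optional.
  def identities(rules) do
    tagged = Enum.count(rules, &(Map.get(&1, :tag) != nil))

    cond do
      # Every rule tagged: identity is the tag, so reordering is harmless.
      tagged == length(rules) ->
        {:keyed, Enum.map(rules, &{{:tag, &1.tag}, &1})}

      # No rule tagged: identity is the position within the chain.
      tagged == 0 ->
        {:keyed, rules |> Enum.with_index() |> Enum.map(fn {r, i} -> {{:index, i}, r} end)}

      # Mixed: no consistent key, so the chain is rebuilt wholesale.
      true ->
        :rebuild
    end
  end
end
```

Tags are the more robust key: with positional identity, inserting one rule near the top of a chain shifts every later index and makes each following rule look changed.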

Rule attribute changes use `NLM_F_REPLACE` over the
kernel-assigned handle carried by `from`'s rule. You must therefore
diff against a Ruleset *pulled* from the kernel, not against a
freshly built one, whose rules have `nil` handles.

Patches are topologically sorted: deletes before creates of
their dependencies (see `Linx.Netfilter.Patch`).
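As a rough illustration of dependency-aware ordering, the sketch below sorts a simplified patch by object kind. It is a hypothetical model, not `Linx.Netfilter.Patch`: ops are `{:delete | :create, kind, name}` tuples, the depth table (tables, then sets and chains, then rules) is an assumption, and cross-chain jumps and in-place updates are ignored.

```elixir
defmodule PatchOrderSketch do
  # Assumed hierarchy: rules depend on chains and sets, which depend on
  # their table. Larger depth means "depends on more".
  @depth %{table: 0, set: 1, chain: 1, rule: 2}

  def order(ops) do
    {deletes, creates} = Enum.split_with(ops, &match?({:delete, _, _}, &1))

    # Deletes first and innermost first, so nothing is removed while
    # something still references it; then creates outermost first, so
    # each object's dependencies already exist when it is created.
    Enum.sort_by(deletes, fn {_, kind, _} -> Map.fetch!(@depth, kind) end, :desc) ++
      Enum.sort_by(creates, fn {_, kind, _} -> Map.fetch!(@depth, kind) end)
  end
end
```

Ordering matters even inside an atomic batch, because the kernel validates each inner message as it is processed: creating a rule in a chain that has not been created yet is rejected.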

See `Linx.Netfilter.Diff` for the underlying implementation.

# `dry_run`

```elixir
@spec dry_run(Linx.Netfilter.Ruleset.t(), Linx.Netfilter.Ruleset.t()) ::
  Linx.Netfilter.Patch.t()
```

Alias for `diff/2` — return the patch without sending it. The
name reads better at call sites where the intent is "show me
what would change".

# `log_listen`

```elixir
@spec log_listen(
  pid(),
  keyword()
) :: {:ok, pid()} | {:error, term()}
```

Opens an NFLOG listener bound to `:group`. The owner (the pid
passed as the first argument) receives
`{:linx_netfilter, :log, %Linx.Netfilter.Log.Event{}}` per
logged packet.

Required option:

  * `:group` — NFLOG group (1..65535) the rule's
    `Linx.Netfilter.Expr.log/1` directs packets to. Linx
    convention: use `5000` if you don't care which group.

Optional:

  * `:netns` — namespace; default `:host`.
  * `:copy_mode` — `:none` | `:meta` | `:packet` |
    `{:packet, snaplen}`. Default `:meta` (header info only,
    no payload).
  * `:qthresh` — kernel-side queue threshold; default `1`.
  * `:timeout_ms` — kernel-side batching timeout; default `0`
    (no time-based batching).
  * `:flags` — any subset of `[:seq, :seq_global, :conntrack]`.
  * `:families` — protocol families to bind; default
    `[:ipv4, :ipv6]`.
  * `:rcvbuf` — `SO_RCVBUF` bytes; default 4 MiB.

Returns `{:ok, listener_pid}`. Close with `unlog_listen/1`.

See `Linx.Netfilter.Log` for the GenServer's full surface and
`Linx.Netfilter.Log.Event` for the packet-event shape.
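The documented options and defaults can be normalised as in the following sketch. It is hypothetical (`LogOptsSketch` is not part of Linx), the empty `:flags` default and the `0xFFFF` snaplen used for a bare `:packet` are assumptions, and the copy-mode codes are the `NFULNL_COPY_*` constants from `linux/netfilter/nfnetlink_log.h`. `Keyword.validate!/2` requires Elixir 1.13 or later.

```elixir
defmodule LogOptsSketch do
  # NFULNL_COPY_* values from linux/netfilter/nfnetlink_log.h.
  @copy_none 0x00
  @copy_meta 0x01
  @copy_packet 0x02

  def normalize(opts) do
    # Rejects unknown keys and fills in the documented defaults.
    opts =
      Keyword.validate!(opts, [
        :group,
        netns: :host,
        copy_mode: :meta,
        qthresh: 1,
        timeout_ms: 0,
        flags: [],
        families: [:ipv4, :ipv6],
        rcvbuf: 4 * 1024 * 1024
      ])

    group = Keyword.get(opts, :group)

    unless is_integer(group) and group in 1..65_535 do
      raise ArgumentError, ":group must be an integer in 1..65535, got: #{inspect(group)}"
    end

    opts
    |> Map.new()
    |> Map.put(:copy_mode, copy_mode(Keyword.fetch!(opts, :copy_mode)))
  end

  # Returns {kernel copy-mode code, snaplen}.
  defp copy_mode(:none), do: {@copy_none, 0}
  defp copy_mode(:meta), do: {@copy_meta, 0}
  defp copy_mode(:packet), do: {@copy_packet, 0xFFFF}

  defp copy_mode({:packet, snaplen}) when is_integer(snaplen) and snaplen > 0,
    do: {@copy_packet, snaplen}
end
```

For example, `LogOptsSketch.normalize(group: 5000)` yields the `:meta` copy mode, both IPv4 and IPv6 families, and a 4 MiB receive buffer.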

# `pull`

```elixir
@spec pull(Linx.Netlink.Socket.t(), keyword() | {atom(), String.t()}) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}
```

Pulls the kernel's nftables state into a Ruleset value.

The unscoped form (`pull/1`, or `pull/2` with a keyword list)
dumps the entire netns: every table, every chain, every rule the
caller can see. Pass a `{family, name}` tuple to scope the dump to
one table (or use `pull/3` to pass options as well).

Options (unscoped form):

  * `:subscribe_first` — pid of a `Linx.Netfilter.Monitor` to
    handshake against. Captures the current gen via `GETGEN`
    and tells the monitor to drop events at or below it.
    Subsequent multicast events with `gen_id > captured` are
    guaranteed not to be in the returned snapshot (snapshot+tail
    pattern).

Implementation: three sequential dumps (`GETTABLE`, `GETCHAIN`,
`GETRULE`) plus per-set `GETSETELEM`, then `Decoder.from_msgs/5`
assembles them. Dumps are not atomic across types — for full
consistency under churn, combine with `:subscribe_first` and the
Monitor.
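The snapshot+tail handshake reduces to a generation filter. In this hypothetical sketch, `captured_gen` is the generation read via `GETGEN` before dumping, events are maps carrying a `:gen_id`, and `apply_event` is a caller-supplied function that folds one event into the snapshot.

```elixir
defmodule SnapshotTailSketch do
  # Events at or below the captured generation are treated as already
  # reflected in the snapshot; only newer ones form the tail.
  def tail(events, captured_gen) do
    Enum.filter(events, fn %{gen_id: gen} -> gen > captured_gen end)
  end

  # Replay the tail on top of the snapshot, oldest event first.
  def replay(snapshot, events, captured_gen, apply_event) do
    events
    |> tail(captured_gen)
    |> Enum.reduce(snapshot, &apply_event.(&2, &1))
  end
end
```

The monitor performs the filtering itself once told the captured generation; the point of the sketch is only that the boundary is strict: `gen_id > captured` is replayed, everything else is dropped.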

# `pull`

```elixir
@spec pull(Linx.Netlink.Socket.t(), {atom(), String.t()}, keyword()) ::
  {:ok, Linx.Netfilter.Ruleset.t()}
  | {:error, Linx.Netfilter.Error.t() | term()}
```

Scoped pull — fetches one table by `(family, name)` plus its
chains, rules, and sets.

Accepts the same options as the unscoped `pull/2` (currently
`:subscribe_first`).

Returns `{:ok, %Ruleset{}}` containing just that table, or
`{:error, %Linx.Netfilter.Error{errno: :enoent}}` if the table
doesn't exist.

# `push`

```elixir
@spec push(Linx.Netlink.Socket.t(), Linx.Netfilter.Ruleset.t(), keyword()) ::
  :ok | {:error, Linx.Netfilter.Error.t() | term()}
```

Pushes a Ruleset to the kernel atomically as one batched
transaction.

Modes:

  * `:replace` (default) — for each table in `ruleset`, the
    kernel sees `DESTROYTABLE` (silent-if-missing, 6.3+) then
    `NEWTABLE` plus all its chains and rules. Other tables in
    the netns are untouched.
  * `:reconcile` — minimal-diff push with `NFTA_BATCH_GENID`
    CAS for cooperative concurrency.
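The message sequence that `:replace` mode produces can be sketched as follows. This is a hypothetical model: tables are plain maps with `:family`, `:name` and `:chains` (each chain a map with `:name` and `:rules`), and messages are tuples named after the `NFT_MSG_*` commands rather than encoded netlink frames.

```elixir
defmodule ReplaceBatchSketch do
  # For each table: destroy (silent if missing), recreate, then add its
  # chains and their rules in order. Tables not listed are never touched.
  def messages(tables) do
    inner =
      Enum.flat_map(tables, fn %{family: fam, name: name, chains: chains} ->
        [{:destroytable, fam, name}, {:newtable, fam, name}] ++
          Enum.flat_map(chains, fn chain ->
            [{:newchain, name, chain.name}] ++
              Enum.map(chain.rules, &{:newrule, name, chain.name, &1})
          end)
      end)

    # One envelope around everything: the kernel commits all or nothing.
    [:batch_begin] ++ inner ++ [:batch_end]
  end
end
```

Because `DESTROYTABLE` does not fail when the table is absent, the same batch works for a first push and for every later one.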

Returns `:ok` on success, or `{:error, %Linx.Netfilter.Error{}}`
carrying the first inner-message rejection (with `:batch_seq`
pointing at the offending message position).

# `subscribe`

```elixir
@spec subscribe(
  pid(),
  keyword()
) :: {:ok, pid()} | {:error, term()}
```

Subscribes `owner_pid` to multicast nfnetlink events for ruleset
changes in the current netns.

Returns `{:ok, monitor_pid}`. The owner then receives:

  * `{:linx_netfilter, :event, %Linx.Netfilter.Event{}}` per
    committed change (one `:new_gen` followed by one event per
    mutated entity).
  * `{:linx_netfilter, :resync_needed}` when the monitor socket
    overflows (`ENOBUFS`) — the owner should re-pull state.

Options:

  * `:netns` — namespace to monitor. Defaults to `:host`.
  * `:since_gen` — initial floor; events at or below this gen
    are dropped. Use in tandem with `pull/2..3`'s
    `:subscribe_first` for snapshot+tail.
  * `:rcvbuf` — multicast socket receive buffer size in bytes;
    default 4 MiB.

See `Linx.Netfilter.Monitor` for the GenServer's full surface.
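On the owner side, the two message shapes above map naturally onto a small state reducer. The sketch is hypothetical (`MonitorOwnerSketch` is not part of Linx) and treats each event as an opaque term. On overflow it discards buffered events and flags a re-pull, since the stream can no longer be trusted to be complete.

```elixir
defmodule MonitorOwnerSketch do
  def new, do: %{pending: [], resync?: false}

  # Buffer committed-change events (newest first).
  def handle(state, {:linx_netfilter, :event, event}),
    do: %{state | pending: [event | state.pending]}

  # ENOBUFS on the monitor socket: buffered events are incomplete.
  def handle(state, {:linx_netfilter, :resync_needed}),
    do: %{state | pending: [], resync?: true}

  # Ignore anything else in the mailbox.
  def handle(state, _other), do: state
end
```

In a GenServer owner, `handle_info/2` would delegate to `handle/2` and, when `resync?` is set, re-pull state (ideally with `:subscribe_first`) before applying further events.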

# `supported?`

```elixir
@spec supported?() :: boolean()
```

Returns `true` iff the kernel supports nfnetlink (i.e., a
`NETLINK_NETFILTER` socket can be opened in the current netns).

Opening the socket verifies that the kernel was built with
`CONFIG_NETFILTER_NETLINK` (built in or as a module; universal in
modern Linux). Every real operation against it (`GETGEN`,
mutations) requires `CAP_NET_ADMIN`, but opening the socket itself
is unprivileged. So this probe answers "would Linx.Netfilter work
*if* I had the right capabilities", not "do I have the right
capabilities"; the latter surfaces as an `:eperm` error from the
actual verb call when the time comes.

Returns `false` if the kernel module is missing or the BEAM
process can't allocate a socket. Doesn't distinguish between
those.

# `unlog_listen`

```elixir
@spec unlog_listen(pid()) :: :ok
```

Stops a Log listener returned by `log_listen/2`. The kernel-side
group binding is dropped before the socket is closed.

# `unsubscribe`

```elixir
@spec unsubscribe(pid()) :: :ok
```

Unsubscribes by stopping the Monitor returned from `subscribe/2`.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
