[spec] SKEIN Mesh — Design Overview

Provenance
content hash: sha256::f417ee1e407c181e4d3ed92680d8e25d98560342591eedb4faf871da2dcff83c
signature: UNSIGNED — Sigstore signing not yet wired into publish

What SKEIN is

SKEIN is a system for managing knowledge and work as small, self-contained units of content called folios. A folio might be a note, a finding, a task, a handoff document, or a design spec. Folios are connected to one another by typed relationships called threads — "this supersedes that," "this references that," "this belongs to that collection." Together, folios and threads form a graph of content that can be edited, attributed, and shared.

The mesh is SKEIN's federation layer: a way for independently operated hosts — called stations — to peer with one another and exchange content while preserving a single, consistent principle: content is sovereign and portable. A folio is a self-contained, verifiable unit that does not depend on where it is hosted. Stations are hosts, not owners. Every layer has an exit: you can fork content, leave with your work, run your own station, or defederate.

This document describes the design of that mesh: how folios are identified and addressed, how stations relate, how authorship and authority are recorded, how stations replicate one another through an event log, how privacy works, and how the whole system can be deployed incrementally without disrupting an existing single-host installation.

Two ideas are worth separating up front, because they are easy to conflate:

A folio is identified by its content — its identity is a hash of its bytes (sha256::<digest>), not a human-assigned name. This makes folios portable and verifiable: anyone can confirm that a folio is what it claims to be, regardless of which station served it.
Authorship is identified by a human or organization. Who wrote a folio, who is accountable for it, and who is authorized to change it all key off a human (or org) identity, rooted in a recognized signing identity. Human identity is retained throughout; content addressing changes only how a folio is named and located, not who is responsible for it.

The three primitives

Folio — a unit of content. Identified canonically by its content hash (sha256::<digest>). When federated, a folio is signed by its author or authors; multiple authors are supported. Each folio carries a type field. Folios that live only on a local station are unsigned.

Thread — a first-class, signed object representing a typed relationship between two resources. A thread carries an id (its own content hash), a type, a from_id and to_id, a weaver (who created it), a signature when federated, a timestamp, and optional content. Threads can target other threads. A thread cannot be superseded; it is retracted by issuing a status thread against it.

Thread types, grouped by their consequence:

Structural: mention, reference, tag, within
Chain: supersedes, forked_from, merged_from
Authorization: assignment, trust, steward, revoke_steward
Workflow: propose, status

Station — the substrate that hosts folios and threads, and the unit of operator authority. A station peers with other stations to form the mesh. Every station keeps an append-only event log (the op log) that is signed when federated.

Station types

Filesystem station — files in folders. Carries an alias, a path, a transport (local, ssh, and others), and a host. A purely local working directory is simply a filesystem station with transport=local.

Networked station — an HTTP service. Carries an alias, a url, and an operator_identity.

Trust stance is orthogonal to type. A single unified registry holds local and remote stations side by side. Promoting a local station to a networked one creates a new station entry with forked_from lineage back to the original.

Local use versus federation

SKEIN is fully usable on a single machine with no account, login, signing, or encryption. The design draws a clear boundary between local use and federation, and crosses it only on a deliberate action.

Local stations require no signing, identity provider, or encryption. The threat model is ordinary filesystem access control. The op log still hash-chains for integrity, but its entries carry no signatures, and folios are unsigned.

The federation boundary is where signing begins. Publishing content from a local station to a networked one triggers an identity login and produces signatures.

Local-to-federated content is handled by republication, not retroactive signing. Publishing a local folio to a networked station creates a new, signed folio with a forked_from thread back to the local original. Signatures attach at publish time, not creation time.

First-run experience: a new user has a fully working local SKEIN immediately, with no account or signing setup required.

Lineage and editing

A lineage is a sequence of folios connected by supersedes threads — the edit history of a piece of content. It is identified by the content hash of its root folio, and most of its properties are derived rather than stored:

lineage_id is the root folio's content hash.
home_station is declared on the root at creation time; it is the station responsible for serializing the supersedes chain.
head is found by following supersedes threads to the leaf.
members are the root plus all of its supersedes-descendants.
co-authors are derived from assignment threads and from who authored each folio in the chain.
merge_policy is derived from the folio's type, not set per lineage.
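The derived properties above can be computed from threads alone. The following is a minimal Python sketch, not the SKEIN implementation. The Thread class and derive_lineage function are hypothetical names. It assumes the home station has already enforced a single head, so each folio has at most one accepted successor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    """Minimal subset of a thread: only the fields this sketch needs."""
    type: str
    from_id: str  # for supersedes: the successor folio
    to_id: str    # for supersedes: the folio being superseded


def derive_lineage(root: str, threads: list[Thread]) -> tuple[str, list[str]]:
    """Return (head, members) for the lineage whose root hash is `root`.

    Assumes one head per lineage, so each folio has at most one accepted
    successor and the supersedes chain is linear.
    """
    successor = {}
    for t in threads:
        if t.type == "supersedes":
            successor[t.to_id] = t.from_id
    members = [root]
    seen = {root}
    head = root
    # Follow supersedes threads to the leaf; that leaf is the head.
    while head in successor:
        head = successor[head]
        if head in seen:
            raise ValueError("supersedes cycle detected")
        seen.add(head)
        members.append(head)
    return head, members
```

For example, supersedes threads from b to a and from c to b give head c and members [a, b, c]. Threads of other types, such as mention, are ignored.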

A folio belongs to exactly one lineage. Relating content across lineages is done with a merged_from thread, which records attribution only.

The edit semantic is built entirely from threads and supersedes: an edit is a successor folio carrying a supersedes thread to the previous head. The home station maintains exactly one head per lineage. A publish that attempts to supersede a folio that is no longer the head is rejected, and the current head is returned; the author rebases onto it and retries. The initial merge policy across all types is linear-head-only (reject a stale base); future types may declare alternative policies.
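A minimal sketch of the home station's stale-base check follows. It assumes an in-memory map from lineage_id to head. HomeStation, StaleBase, and publish_successor are hypothetical names, and the lock stands in for the single sequencer.

```python
import threading


class StaleBase(Exception):
    """Publish rejected: the base is no longer the lineage head."""

    def __init__(self, current_head: str):
        super().__init__(f"stale base; current head is {current_head}")
        self.current_head = current_head


class HomeStation:
    """Hypothetical home-station sequencer for the linear-head-only policy."""

    def __init__(self) -> None:
        self._heads: dict[str, str] = {}  # lineage_id -> current head hash
        self._lock = threading.Lock()     # one publish is evaluated at a time

    def create_lineage(self, root: str) -> None:
        with self._lock:
            if root in self._heads:
                raise ValueError("lineage already exists")
            self._heads[root] = root  # lineage_id is the root folio's hash

    def head(self, lineage_id: str) -> str:
        with self._lock:
            return self._heads[lineage_id]

    def publish_successor(self, lineage_id: str, base: str, successor: str) -> None:
        with self._lock:
            current = self._heads[lineage_id]
            if base != current:
                # Reject the stale base and hand back the head to rebase onto.
                raise StaleBase(current)
            self._heads[lineage_id] = successor
```

A client that catches StaleBase rebases onto current_head and retries. The rebased successor is a new folio with different bytes, so it carries a new hash rather than reusing the rejected one.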

Authority model

The design records two distinct facts about every change: authorship (who signed it) and authority (whose op log says it counts). Signatures are canonical for authorship; acceptance into a station's op log is the canonical record of authority.

The core rule: the home station of a lineage is authoritative for every lineage-affecting thread, regardless of who wove it.

Lineage-affecting operations (which require home-station acceptance) include supersedes, assignment and revoke_steward, status on an assignment, propose, merged_from, and tombstone.

Non-lineage-affecting operations (which only require the weaver's own authority) include mention, reference, tag, within, trust, and station-info.
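The two lists can be read as a dispatch table. This sketch encodes only the operation names given above. required_acceptor is a hypothetical helper. It deliberately raises on anything the lists do not classify, such as steward or status on a target other than an assignment, rather than guessing.

```python
# Operation types exactly as listed in this overview.
LINEAGE_AFFECTING = frozenset(
    {"supersedes", "assignment", "revoke_steward", "propose", "merged_from", "tombstone"}
)
NON_LINEAGE_AFFECTING = frozenset(
    {"mention", "reference", "tag", "within", "trust", "station-info"}
)


def required_acceptor(op_type: str, targets_assignment: bool = False) -> str:
    """Say whose authority an operation needs: 'home_station' or 'weaver'."""
    if op_type == "status":
        # Only status on an assignment is listed as lineage-affecting.
        if targets_assignment:
            return "home_station"
        raise ValueError("status on other targets is not classified here")
    if op_type in LINEAGE_AFFECTING:
        return "home_station"
    if op_type in NON_LINEAGE_AFFECTING:
        return "weaver"
    raise ValueError(f"unclassified operation type: {op_type!r}")
```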

The authority cascade, from broadest to narrowest:

Operator — full override on their own station.
Author — may edit and retract their own content implicitly, even on stations they do not operate, subject to home-station policy.
Editor — granted per-lineage editing rights via an assignment thread.
Anyone — may only propose, gated by station policy.

Author-implicit authority is what makes public-station use work without a separate assignment thread for every folio: authors retain edit and retract rights over the lineages they originated. The op log records which of these applied to each event via an authorized_by field (author, editor with an assignment reference, operator-override, proposer, or policy).
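One way to pick the authorized_by value for an edit attempt is sketched below. It is illustrative only. Identities are plain strings here rather than { issuer, subject } objects. The precedence (author, then editor, then operator-override, then proposer) is an assumption that the overview does not fix. Neither the policy value nor home-station limits on author rights are modeled. LineageAuth, Decision, and authorize_edit are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LineageAuth:
    originators: Set[str]                                  # identities that originated the lineage
    editors: Dict[str, str] = field(default_factory=dict)  # identity -> assignment thread hash


@dataclass(frozen=True)
class Decision:
    authorized_by: str                    # value recorded in the op log entry
    assignment_ref: Optional[str] = None  # set only for editor decisions


def authorize_edit(actor: str, operator: str, auth: LineageAuth,
                   proposals_allowed: bool) -> Optional[Decision]:
    """Pick the authorized_by basis for an edit attempt, or None if refused."""
    if actor in auth.originators:
        return Decision("author")  # author-implicit: no assignment thread needed
    if actor in auth.editors:
        return Decision("editor", auth.editors[actor])
    if actor == operator:
        return Decision("operator-override")
    if proposals_allowed:
        # Anyone else may only propose; this becomes a proposal, not an edit.
        return Decision("proposer")
    return None
```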

Resolved edge cases

Revocation races. The home station is a single sequencer with a monotonic op log; authorization is evaluated at the moment an operation is accepted into the log. There is no cross-client clock comparison and therefore no possibility of a tie — the outcome is deterministic by construction.
Tombstoning a folio in the middle of a chain. Tombstones are per-folio. A later folio still validly supersedes a tombstoned earlier one; the chain structure stays intact. Tombstoning a whole range is a client convenience, not a protocol-level cascade.
Author retraction surviving loss of editor status. An author-signed tombstone of their own folio is always valid. Losing editor status stops further edits but does not revoke the author's ability to retract content they authored. The retraction right does not extend to forks of that content.
Operator override. An operator-signed tombstone can remove anything on their station, but an operator cannot forge authorship, because the operator holds no author key — "tombstone yes, forge no."
Multi-author successors. The authority to publish a successor is ordinary edit authority (author-implicit or assignment); there is no need to re-gather every prior co-signer. In practice, multi-author content is produced by successive signing down a supersedes chain rather than simultaneous co-signing.
Loss of a home station. There is no special recovery mechanism in the initial design; durability comes from replication. If replicas exist, fork from one; if none exist, the content is gone. More elaborate recovery schemes are deferred.

Contribution paths

There are three ways to contribute, and only one of them is gated on operator permission:

Create — start a new lineage on a station. Station policy decides who may create (open, curated, or personal). This is not a proposal.
Edit — change an existing lineage. Authors and editors edit directly; anyone else must propose, and an operator or editor promotes the proposal. This is the only path gated on operator permission.
Comment or reference — publish on your own station with a mention, reference, or tag pointing at the original. No round-trip and no gate.

Addressing

Addresses are text-canonical. (An emoji encoding, specified separately, is a reversible layer on top of the text form, not a separate scheme.)

Grammar. The delimiter is ::. The initial type words are alias (look the station up in a registry), web (fetch over HTTP), and hash (content lookup within a station). Reserved for future use: ipfs, ssh, oidc, peer.

Examples:

alias::myproject::sha256::<digest>
web::mesh.alice.example::sha256::<digest>

Bare forms. A full folio hash on its own cascades across local stations. <alias>::<folio> is shorthand for the alias:: form. Short or truncated hashes never cascade.

Reserved alias names. Station initialization rejects, as alias names, all type words and all hash-algorithm identifiers (sha256, sha512, blake3, and so on), so that dispatch stays unambiguous.

Tokenizer. Parsing proceeds in a fixed order:

Fragment first. Split off an optional verifier fragment at the first #, before any :: tokenizing. The fragment's grammar is exactly sha256::<64-lowercase-hex>. Percent-decoding of the address is forbidden — addresses are not URLs.
:: tokenization with bracket opacity. Split the remainder on ::, treating any bracketed [...] region as opaque. This is necessary because IPv6 zero-compression itself uses ::; for example, web::[2001:db8::1]:8080::sha256::<digest> must tokenize to [web, [2001:db8::1]:8080, sha256, <digest>]. Brackets are legal only in the authority segment of authority-bearing types; a nested [, an unmatched ], or end-of-input inside a bracket all cause rejection.
Empty-segment and trailing-delimiter reject. Empty segments and a trailing :: are invalid.
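The three tokenizer steps can be sketched in Python as follows. tokenize and AddressError are hypothetical names, and treating only web as an authority-bearing type is an assumption. The sketch checks only the lexical rules listed above. It does not validate type words or hash algorithms.

```python
import re
from typing import List, Optional, Tuple

FRAGMENT_RE = re.compile(r"sha256::[0-9a-f]{64}")
AUTHORITY_TYPES = {"web"}  # assumption: the authority-bearing types


class AddressError(ValueError):
    """Raised for any address the tokenizer must reject."""


def tokenize(address: str) -> Tuple[List[str], Optional[str]]:
    # 1. Fragment first: split at the first '#' before any '::' handling.
    #    No percent-decoding is ever applied; addresses are not URLs.
    body, sep, fragment = address.partition("#")
    if sep and not FRAGMENT_RE.fullmatch(fragment):
        raise AddressError("verifier fragment must be sha256::<64 lowercase hex>")
    # 2. Split on '::', treating a bracketed [...] region as opaque.
    segments, current, in_bracket, i = [], [], False, 0
    while i < len(body):
        ch = body[i]
        if ch == "[":
            if in_bracket:
                raise AddressError("nested '['")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                raise AddressError("unmatched ']'")
            in_bracket = False
        elif not in_bracket and body.startswith("::", i):
            segments.append("".join(current))
            current = []
            i += 2
            continue
        current.append(ch)
        i += 1
    if in_bracket:
        raise AddressError("end of input inside '['")
    segments.append("".join(current))
    # 3. Empty segments, including one left by a trailing '::', are invalid.
    if any(seg == "" for seg in segments):
        raise AddressError("empty segment or trailing '::'")
    # Brackets may appear only in the authority segment of an authority-bearing type.
    for index, seg in enumerate(segments):
        if ("[" in seg or "]" in seg) and (index != 1 or segments[0] not in AUTHORITY_TYPES):
            raise AddressError("brackets outside an authority segment")
    return segments, (fragment if sep else None)
```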

Authority canonicalization. A web authority that must round-trip through the emoji alphabet has to be a canonical ASCII DNS name: UTS-46 / IDNA punycode, lowercased, with explicit port syntax (bracketed for IPv6). No internationalized hostnames, percent-encoded host bytes, or uppercase are permitted in the raw authority. Text addressing may accept a non-canonical authority and canonicalize it before encoding.
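A validator for the canonical raw authority might look like the following sketch. It rests on several assumptions:

It accepts lowercase LDH labels, so punycode xn-- labels pass without being decoded.
It requires IPv6 literals to be bracketed and written in the RFC 5952 form that Python's ipaddress module prints.
It rejects ports with leading zeros.

It only validates. A canonicalizer for text addresses would need a UTS-46 implementation, because Python's built-in idna codec implements the older IDNA 2003. is_canonical_authority is a hypothetical name.

```python
import ipaddress
import re

LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")  # lowercase LDH label
PORT_RE = re.compile(r"[1-9][0-9]{0,4}")                         # no leading zeros


def is_canonical_authority(authority: str) -> bool:
    """True if `authority` is in the canonical ASCII form described above."""
    port = None
    if authority.startswith("["):
        end = authority.find("]")
        if end == -1:
            return False
        literal, rest = authority[1:end], authority[end + 1:]
        try:
            # Assumption: require the RFC 5952 form that ipaddress prints.
            if str(ipaddress.IPv6Address(literal)) != literal:
                return False
        except ValueError:
            return False
        if rest:
            if not rest.startswith(":"):
                return False
            port = rest[1:]
    else:
        host, sep, port_text = authority.partition(":")
        if sep:
            port = port_text  # an unbracketed IPv6 literal fails the port check
        if len(host) > 253 or not all(LABEL_RE.fullmatch(label) for label in host.split(".")):
            return False
    if port is not None:
        if not PORT_RE.fullmatch(port) or int(port) > 65535:
            return False
    return True
```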

Folio token. There are two grammar productions: a full hash, sha256::<64-lowercase-hex>, and a short hash, sha256::<n-lowercase-hex> with 8 ≤ n < 64, which is valid only when an explicit station context is present. Hex is always lowercase.

Folio identity. A folio's identity is its content hash — canonical, with no human ID and no display handle. Uniqueness rests on a microsecond-precision created_at value being part of the canonical bytes that are hashed, so two independently created folios differ in their timestamp and therefore in their hash. The reasoning for omitting a human handle is that the three things a handle might do are each better served another way: scanning by eye is served by titles and the emoji encoding, lookup is served by the address, and uniqueness is served by the hash — leaving a handle redundant. The mental model is Git's: folios are like commits (a hash plus a message), and sites are like branches (named pointers).
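The uniqueness argument can be checked with a small sketch. The overview does not define the canonical byte form here, so the sketch relies on three assumptions:

It uses sorted-key compact JSON as a stand-in canonical form. A real scheme would need a stricter canonicalization that also pins number formatting.
It treats created_at as an integer count of microseconds.
It excludes the non-canonical privacy field from the hash.

folio_id is a hypothetical name.

```python
import hashlib
import json

# The privacy field is non-canonical (see Privacy), so it is left out of the hash.
NON_CANONICAL_FIELDS = frozenset({"privacy"})


def folio_id(folio: dict) -> str:
    """Content-hash identity of a folio, under an assumed JSON canonical form."""
    canonical = {k: v for k, v in folio.items() if k not in NON_CANONICAL_FIELDS}
    data = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256::" + hashlib.sha256(data).hexdigest()  # lowercase hex


# Same body, created one microsecond apart: two distinct folios.
first = {"type": "note", "body": "hello", "created_at": 1748376556123456}
second = dict(first, created_at=1748376556123457)
```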

Short-hash resolution. A short hash is resolved only within an explicit station context, never by cascade. On an ambiguous prefix the resolver must error and return the colliding full hashes (or the minimum distinguishing length); it must never silently pick one. Growing a prefix to disambiguate is a rendering-tool affordance — only a tool that already holds the target full hash may decide how much of it to display — not something a resolver does.
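A sketch of the resolution rule follows, with both token productions written as regular expressions. The resolver sees only one station's digests, which serve as the explicit context. On a collision it raises with the colliding full hashes and the minimum distinguishing length instead of choosing one. resolve and AmbiguousPrefix are hypothetical names.

```python
import re
from typing import Iterable, List

FULL_RE = re.compile(r"sha256::([0-9a-f]{64})")
SHORT_RE = re.compile(r"sha256::([0-9a-f]{8,63})")


class AmbiguousPrefix(LookupError):
    """A short hash matched several folios; never pick one silently."""

    def __init__(self, candidates: List[str], min_length: int):
        super().__init__(f"ambiguous prefix: {len(candidates)} matches")
        self.candidates = candidates  # the colliding full hashes
        self.min_length = min_length  # shortest hex length that separates them


def _shared_prefix(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def resolve(token: str, station_digests: Iterable[str]) -> str:
    """Resolve a folio token within one explicit station (no cascade)."""
    digests = set(station_digests)  # hex digests the station holds
    full = FULL_RE.fullmatch(token)
    if full:
        if full.group(1) not in digests:
            raise KeyError(token)
        return token
    short = SHORT_RE.fullmatch(token)
    if not short:
        raise ValueError("not a valid folio token")
    matches = sorted(d for d in digests if d.startswith(short.group(1)))
    if not matches:
        raise KeyError(token)
    if len(matches) > 1:
        # In sorted order, the longest shared prefix occurs between neighbours.
        longest = max(_shared_prefix(a, b) for a, b in zip(matches, matches[1:]))
        raise AmbiguousPrefix(["sha256::" + d for d in matches], longest + 1)
    return "sha256::" + matches[0]
```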

Chain-hash versus address-hash spelling. A single colon, sha256:<hex>, denotes an internal chain-link hash (such as the op log's prev_hash field) and is not an address. A double colon, sha256::<hex>, denotes a resolvable content address. Reserving the two spellings for the two namespaces ensures a parser never mistakes a chain hash for an address.
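The two spellings differ by one character, so a parser can classify a hash by its exact form. A minimal sketch follows, in which hash_kind is a hypothetical name.

```python
import re

ADDRESS_RE = re.compile(r"sha256::[0-9a-f]{64}")    # resolvable content address
CHAIN_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}")  # internal chain link, e.g. prev_hash


def hash_kind(text: str) -> str:
    """Classify a hash string by its spelling: 'address' or 'chain'."""
    if ADDRESS_RE.fullmatch(text):
        return "address"
    if CHAIN_HASH_RE.fullmatch(text):
        return "chain"
    raise ValueError(f"not a recognized sha256 spelling: {text!r}")
```

A resolver that dispatches only on "address" never attempts to fetch a prev_hash value.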

Emoji encoding

Addresses also have an optional, reversible emoji encoding — a compact, pasteable form that resolves to the same content. It encodes the station in full and the folio as a short-hash prefix, then resolves by decoding the station, asking it to expand the prefix, and recovering the full digest from the resolved folio (never from the emoji itself). It is specified separately in the Emoji Address Encoding document.

Privacy

The initial federation model uses station-level access control without encryption. The threat model it defends against is unauthorized retrieval across the network: a station refuses to serve the bytes of a private folio to anyone outside its recipient list, and an authenticated fetcher is checked against that list before any bytes are returned. It does not defend against the operator of the station where the content lives — at this level, private content is stored in a form the operator can read, and the model is explicit about that boundary. Encryption that protects content from the operator is an opt-in capability planned for a later version, intended for users who do not wish to extend that trust to their operator. The default model targets the common case, where a usable system without per-user key management matters more than defending against one's own host.

The mechanics:

Folios carry a non-canonical privacy field: public or private.
A private folio's recipient list is kept as a station-internal table, not as threads or folio metadata.
The station refuses bytes to non-recipients; a fetcher authenticates, and the station checks the access list.
Recipient lists are inherited: a new folio inherits its predecessor's recipients, and edit access implies read access.
The verbs are share and unshare; the op log records recipient_granted and recipient_revoked.
Private content does not federate — it is single-homed and is lost if its home station is lost.
One leak surface is accepted by design: the public op log entries for a private lineage reveal the lineage's existence, its chain motion, its authorship, and its timing — but never its bytes.
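The mechanics above can be sketched as a small station-side table. This is a hypothetical illustration, and it makes these assumptions:

The class and method names are invented.
Recipient lists are keyed per folio.
Authentication has already happened before may_serve is called.
The events list stands in for op log entries. In the design, those entries reference operator-signed artifacts.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class AccessTable:
    """Hypothetical station-internal privacy table (not threads, not folio metadata)."""
    privacy: Dict[str, str] = field(default_factory=dict)             # folio -> "public" | "private"
    recipients: Dict[str, Set[str]] = field(default_factory=dict)     # folio -> identities
    events: List[Tuple[str, str, str]] = field(default_factory=list)  # op log stand-in

    def add_folio(self, folio: str, privacy: str, predecessor: Optional[str] = None) -> None:
        self.privacy[folio] = privacy
        # A successor starts with a copy of its predecessor's recipient list.
        self.recipients[folio] = set(self.recipients.get(predecessor, set()))

    def share(self, folio: str, identity: str) -> None:
        self.recipients[folio].add(identity)
        self.events.append(("recipient_granted", folio, identity))

    def unshare(self, folio: str, identity: str) -> None:
        self.recipients[folio].discard(identity)
        self.events.append(("recipient_revoked", folio, identity))

    def may_serve(self, folio: str, fetcher: str, editors: FrozenSet[str] = frozenset()) -> bool:
        """Check an already-authenticated fetcher before returning any bytes."""
        if self.privacy[folio] == "public":
            return True
        # Edit access implies read access.
        return fetcher in self.recipients[folio] or fetcher in editors
```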

The later trajectory adds opt-in encryption: per-folio data keys wrapped in a per-recipient envelope, sign-then-encrypt ordering, and medium-term encryption public keys carried in identity records.

The op log

Each station keeps an append-only event log of per-event entries, hash-chained together.

A key property: there is no per-entry operator signature. Each entry references an artifact (a folio or thread) that is already signed, and that artifact's signature carries the authenticity. Operator-originated events — tombstone, policy change, recipient changes, assignment changes — reference an operator-signed artifact. Chain integrity comes from each entry's prev_hash. External tamper-evidence comes from periodically anchoring the chain head to a public transparency log (Sigstore's Rekor); the frequency is a station policy, on by default and blockable.

On rate limiting: anchoring and signing raise the cost of bulk spam and make abuse traceable, but they are not a hard per-folio throttle — a single short-lived signing certificate can sign many artifacts, and the work parallelizes across identities. Genuine rate limiting still requires explicit station policy.

Event types: folio_received, lineage_head_advanced, thread_accepted, proposal_received, lineage_created, tombstone, assignment_granted and assignment_revoked, policy_changed, recipient_granted and recipient_revoked, and station_initialized (the genesis event).

Common envelope:

{
  "schema_version": "1.0",
  "sequence": 1247,
  "prev_hash": "sha256:<64-hex>",
  "timestamp_us": 1748376556123456,
  "station_id": "web::mesh.alice.example",
  "event_type": "lineage_head_advanced",
  "event_payload": { ... },
  "authorized_by": "author"
}

Here prev_hash is the single-colon chain hash and station_id is a typed address. Payload fields carry sha256::<digest> hash addresses for folio, thread, artifact, and lineage references; identities are { issuer, subject } objects. A folio_received event carries a single folio_hash. The log is versioned with schema_version: unknown event types are preserved opaquely and unknown payload fields are accepted, so older readers tolerate newer logs.
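The hash chain can be sketched as follows. The overview does not define the bytes that prev_hash covers, so the sketch assumes three things:

prev_hash covers the sorted-key compact JSON of the whole previous entry.
sequence is 0-based.
The genesis entry has a null prev_hash.

append and verify are hypothetical names, and anchoring to Rekor is not shown. verify never inspects event_type, so unknown event types pass through opaquely. The newest entry has no successor to commit to its hash, which is the gap that external anchoring of the head covers.

```python
import hashlib
import json
from typing import Dict, List


def entry_hash(entry: Dict) -> str:
    # Single-colon spelling: an internal chain link, not a content address.
    data = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


def append(log: List[Dict], station_id: str, event_type: str, payload: Dict,
           authorized_by: str, timestamp_us: int) -> Dict:
    entry = {
        "schema_version": "1.0",
        "sequence": len(log),  # assumption: 0-based
        "prev_hash": entry_hash(log[-1]) if log else None,  # assumption: genesis has none
        "timestamp_us": timestamp_us,
        "station_id": station_id,
        "event_type": event_type,
        "event_payload": payload,
        "authorized_by": authorized_by,
    }
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute every link; any edit to an earlier entry breaks the next one."""
    for i, entry in enumerate(log):
        expected = entry_hash(log[i - 1]) if i else None
        if entry.get("sequence") != i or entry.get("prev_hash") != expected:
            return False
    return True
```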

Sites

A site is a folio — of type=site, with content-hash identity like any other folio. It is not a separate primitive.

Membership is a thread. A folio belongs to a site through a within thread pointing from the folio to the site folio. Membership is many-to-many: a folio can be within several sites at once. The within thread is structural and non-lineage-affecting. "The members of site S" is just "the folios with a within thread to S."

Sites do not appear in addresses. Folios are always addressed by hash directly; a site is an organizing overlay, not a routing element.

Human naming is handled by a station-internal slug table mapping a readable slug to a site folio's hash. This is a local lookup convenience, not federated and not part of any address; two different stations may each define their own slug for their own site.

Sites are themselves lineages. A site folio can be edited (renamed, re-described) through supersedes, and the slug table points at the current head.
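Membership and slug lookup reduce to simple queries. In the minimal sketch below, all names are hypothetical. site_members matches within threads against one exact site folio hash, because the overview does not say how membership carries across site edits. repoint_slugs models the station-local table moving a slug to a site's new head.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Thread:
    type: str
    from_id: str
    to_id: str


def site_members(site: str, threads: List[Thread]) -> Set[str]:
    """The members of site S are the folios with a within thread to S."""
    return {t.from_id for t in threads if t.type == "within" and t.to_id == site}


def repoint_slugs(slugs: Dict[str, str], old_head: str, new_head: str) -> None:
    """Station-local: after a site edit, move any slug on the old head to the new one."""
    for slug, target in list(slugs.items()):
        if target == old_head:
            slugs[slug] = new_head
```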

Cross-station notification

A user's identity can be made reachable through an inbox station — a station whose op log accepts proposals from anyone. The user's identity record advertises "my inbox is at station J," and senders deliver to that user by appending proposals to station J's op log.

The user's local configuration lives in a single file holding pointers to their identity and signing key, plus local state — cursors, subscription filters, a trust cache, the inbox pointer, and preferences. Losing this file loses configuration, not identity or signing material. It is canonicalized JSON, signed, schema-versioned, and encrypted by default. In the initial version it is client-managed and invisible to the user; a user-managed version is a later opt-in.

Sovereignty

The design is anchored in a set of rights and limits.

Rights: to fork content; to leave with your work; to operate your own station; to be free of faked authorship; to remove your content (though not the recorded fact that you made it); to work local-first; to defederate; and to audit.

Limits: there is no right to be heard, no right to be unforked or to have authorship erased, no platform-level enforcement of these rights, and no protection from social deplatforming.

Initial-version compromises, on the path toward fuller sovereignty: a dependency on an external identity provider, a client-managed configuration file, weaker pseudonymity, out-of-band discovery, and a default-on (but blockable) public-transparency anchor.

Deploying without disruption: the parallel-station model

A federated SKEIN is not introduced by rewriting an existing single-host installation in place. Instead, the new federated system is stood up as a separate, parallel station — its own command-line entry point, its own network port, its own data directory — while the existing installation keeps running untouched. Content is migrated to the new station per-collection, reversibly, and only after the new tool has proven itself on real data.

The property that makes this cheap is that there is no single global database: each working store is independent, just files on disk. You never migrate "SKEIN" as a monolith; you migrate independent stores one at a time, in an order you choose, each one individually reversible from a snapshot.

The strategy in outline:

Freeze the legacy. The existing installation is pinned and immutable — the safety net. The new code is developed additively alongside it and never touches a live store.
The new station is a separate install — a distinct command name during the parallel period, its own port, its own data directory. It never opens a legacy store.
Develop against snapshots. Because the data is just files, it can be copied; the import is built and tested against copies.
Cut over later, per-collection. Lowest-stakes collections move first and daily-driver collections last, after the new tool has run clean on real data. Each cutover is atomic and reversible.
One tool per store, ever. A given store is only ever written by a single binary, so the on-disk formats never tangle.

This migration is itself the first real exercise of the federation model's own promised exit path — stand up a new station, migrate content into it, and defederate from the old one. If that path is rough, that is a finding about the design rather than merely an operational inconvenience.

The hardest design question — moving from human-assigned folio identifiers to content-hash identity — is resolved by going parallel. The new station is content-hash-native from the first day, with no live data to convert in place. The legacy installation stays human-ID-native and untouched. The import bridges the two: it reads legacy folios, mints content-hash identities, and keeps a permanent alias table mapping each old identifier to its new hash, so that the free-text references woven through years of existing prose still resolve. The unrewriteable-prose problem becomes a one-time import mapping, which is exactly what an alias table is for. The same enabling fact that underpins content-hash identity in general — distinct, microsecond-precision creation timestamps — has been confirmed to hold for the legacy data, so the uniqueness basis carries over.

Because the daily workflow never depends on the new station, there is no schedule pressure on the remaining design or build. Shipping the migration becomes "build a parallel station and migrate to it when ready."

Open questions and future work

The following are acknowledged as open or deliberately deferred:

Migration tooling to carry existing stores, sites, and identifiers into the new content-hash-native station. The design preconditions are in place; the tooling is the remaining work.
Discovery beyond out-of-band exchange — directory services, search relays, and an operator-rooted address layer.
A hosted offering that preserves the sovereignty exits.
Concurrent editing by three or more co-authors. The initial model is linear-head-only; richer per-type merge policies come later.
Identity evolution — multiple identity providers, pure-key identity, and a user-managed configuration file.
The emoji encoding's final form — curating the 1024-emoji alphabet, locking the encoding specification, and the rendering fallback. The encoding's constraints are already pinned.
Recovery for a lost home station — erasure coding, designated backups, or social recovery.

Also explicitly deferred to later versions: an inbox folio-type convention; acceptance-gating as a default policy; diff compression; an IPFS station type; merge as a first-class verb; symmetric collaboration without a single home station; log compaction; standalone content-routed addressing; encryption at rest; and configuration-file synchronization.