canonical · conf 0.71

Alignment as an epistemic problem

Most debate about AI alignment treats the question as ethical: how do we make the system "want" what we want? I argue that an epistemic framing comes first.

The argument

Before the system can "want" the right things, it must not lie about what it knows. "Lie" is meant here in the formal sense, not the moral one: the system must keep its reported confidence calibrated to its actual knowledge state.

This is the property today's systems break most easily. An LLM delivers an answer with the same confident fluency whether that answer is backed by evidence or hallucinated. Downstream ethical alignment, meaning what to do with the answer, presupposes that we know where the answer came from.
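
One way to make "calibration between reported confidence and actual knowledge state" concrete is the expected calibration error: bin answers by reported confidence, then measure how far each bin's mean confidence sits from its observed accuracy. The sketch below is only an illustration under stated assumptions. The function name, the equal-width binning, and the use of "was the answer correct" as a proxy for knowledge state are choices made here, not something the argument itself specifies.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between reported confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins: [0, 0.1), [0.1, 0.2), ...
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its confidence/accuracy gap, weighted by its share of answers.
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# A system that says "0.9" and is right 9 times out of 10 is calibrated.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])

# A system that says "0.9" and is right 5 times out of 10 overstates what it knows.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

In this toy case `calibrated` is close to zero and `overconfident` is roughly 0.4. On this reading, "lying about what it knows" corresponds to a persistent gap of the second kind.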


This is the property Gnostikon implements bluntly: no answer without an origin, no confidence without a traceable origin, no agent without a pinned ethos.
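
The three rules can be sketched as constructors that refuse to build an invalid object. This is a hypothetical illustration, not Gnostikon's actual code: the names `Origin`, `Answer`, and `Agent`, and all of their fields, are assumptions introduced here. The checks also run at object creation time, which is weaker than a guarantee enforced by the architecture itself.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Origin:
    """A traceable source for a claim (hypothetical shape)."""
    source_id: str  # e.g. an identifier for a document or note
    excerpt: str    # the passage the answer relies on


@dataclass(frozen=True)
class Answer:
    text: str
    confidence: float
    origins: Tuple[Origin, ...]

    def __post_init__(self) -> None:
        # No answer, and therefore no confidence value, exists without an origin.
        if not self.origins:
            raise ValueError("no answer without origin")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


@dataclass(frozen=True)
class Agent:
    name: str
    ethos: str  # pinned at construction; the frozen dataclass blocks reassignment

    def __post_init__(self) -> None:
        if not self.ethos.strip():
            raise ValueError("no agent without a pinned ethos")

    def answer(self, text: str, confidence: float, origins: Iterable[Origin]) -> Answer:
        # The only way to produce an Answer is through its validating constructor.
        return Answer(text, confidence, tuple(origins))


agent = Agent(name="demo", ethos="report only what can be traced")
sourced = agent.answer("Example claim", 0.7, [Origin("note-1", "supporting passage")])
```

Calling `agent.answer("Example claim", 0.7, [])` raises `ValueError`, so an unsourced answer never reaches the caller. A stronger version of the same idea would push the constraint into the type system or the retrieval pipeline, so that no code path can bypass it.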

Research program

  1. Formalize "lying about what it knows" in probabilistic calibration terms.
  2. Map architectures that make this impossible by construction (not by training).
  3. Test whether systems with traceable origins are preferred by humans long-term, even when less fluent.