
Discovery Pipeline

reptimeline's discovery pipeline extracts structured knowledge from trained models without pre-defined labels or ontologies. It answers: "What did the model learn, and does it match what we expected?"

Pipeline Overview

ConceptSnapshot --> BitDiscovery --> AutoLabeler --> Reconciler
                        |                |              |
                   DiscoveryReport   BitLabels    ReconciliationReport

Step 1: BitDiscovery

Discovers what each bit means, which bits are opposites, and how bits depend on each other.

from reptimeline.discovery import BitDiscovery

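# snapshots come from TriadicExtractor, timeline from TimelineTracker (see Full Pipeline Example)
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)  # analyze the last snapshot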
discovery.print_report(report)

What It Finds

Bit Semantics -- For each active bit, identifies:

  • Activation rate (how often the bit fires)
  • Top concepts (concepts most associated with this bit)
  • Anti-concepts (concepts that never activate this bit)
  • Auto-generated label
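As a rough illustration of these per-bit statistics, the sketch below computes an activation rate, top concepts, and anti-concepts for a single bit. It is not reptimeline's implementation: the input format (a dict of per-concept activation scores in [0, 1]), the function name `bit_semantics`, and the 0.5 firing threshold are all assumptions made for this example.

```python
from typing import Dict, List


def bit_semantics(
    scores: Dict[str, List[float]], bit: int, top_k: int = 2, threshold: float = 0.5
) -> dict:
    """Summarize one bit from hypothetical per-concept activation scores in [0, 1]."""
    # Rank concepts from strongest to weakest activation of this bit.
    ranked = sorted(scores, key=lambda c: scores[c][bit], reverse=True)
    active = [c for c in ranked if scores[c][bit] > threshold]
    # Anti-concepts: weakest activations first, among concepts where the bit does not fire.
    inactive = [c for c in reversed(ranked) if scores[c][bit] <= threshold]
    return {
        "bit": bit,
        "activation_rate": len(active) / len(ranked),
        "top_concepts": active[:top_k],
        "anti_concepts": inactive[:top_k],
    }


# Toy data (invented): bit 0 behaves like "hot", bit 1 like "cold".
scores = {
    "fire": [0.9, 0.1],
    "sun": [0.8, 0.2],
    "ice": [0.1, 0.9],
    "snow": [0.0, 0.7],
}
summary = bit_semantics(scores, bit=0)
print(summary)
```

Here bit 0 fires for half of the concepts, with fire and sun as its strongest activators and snow and ice as its anti-concepts.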

Dual Pairs -- Anti-correlated bit pairs (semantic opposites):

  • Detects pairs where high activation of bit A predicts low activation of bit B
  • Example: bit 12 ("life") and bit 47 ("death") are duals
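One simple way to detect such duals, sketched here under the assumption that each concept has a binary code (a list of 0/1 bits), is to correlate every pair of bit columns across concepts and keep the strongly negative pairs. The `find_duals` helper and the `min_anticorr=0.8` cutoff are illustrative choices, not reptimeline's API.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def phi(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Pearson correlation of two 0/1 columns (the phi coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant bit carries no correlation signal
    return cov / sqrt(vx * vy)


def find_duals(
    codes: Dict[str, List[int]], min_anticorr: float = 0.8
) -> List[Tuple[int, int, float]]:
    """Return (bit_a, bit_b, correlation) for strongly anti-correlated bit pairs."""
    n_bits = len(next(iter(codes.values())))
    # One column per bit: its value for every concept, in a fixed concept order.
    columns = [[code[b] for code in codes.values()] for b in range(n_bits)]
    duals = []
    for a, b in combinations(range(n_bits), 2):
        r = phi(columns[a], columns[b])
        if r <= -min_anticorr:
            duals.append((a, b, round(r, 3)))
    return duals


# Toy codes (invented): bit 0 ~ "life", bit 1 ~ "death", bit 2 is unrelated.
codes = {
    "birth": [1, 0, 1],
    "growth": [1, 0, 0],
    "decay": [0, 1, 1],
    "funeral": [0, 1, 0],
}
print(find_duals(codes))
```

Only bits 0 and 1 are reported: they are perfectly anti-correlated, while bit 2 is uncorrelated with both.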

Dependencies -- Parent-child relationships between bits:

  • Bit B depends on bit A if B is active only when A is also active (A is the parent, B the child)
  • Builds a dependency hierarchy
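The dependency rule above amounts to a subset test: B depends on A when the set of concepts activating B is contained in the set activating A, i.e. P(A | B) = 1. The sketch below applies that rule literally to hypothetical binary codes; the function name and data format are assumptions, and a practical tool would likely tolerate some noise (for example, P(A | B) above a threshold), which is not modeled here.

```python
from typing import Dict, List, Set


def find_dependencies(codes: Dict[str, List[int]]) -> Dict[int, Set[int]]:
    """Map each parent bit A to the child bits B that fire only when A fires."""
    n_bits = len(next(iter(codes.values())))
    # For each bit, the set of concepts on which it is active.
    active = [{c for c, code in codes.items() if code[b]} for b in range(n_bits)]
    hierarchy: Dict[int, Set[int]] = {}
    for a in range(n_bits):
        for b in range(n_bits):
            if a == b or not active[b]:
                continue  # skip self-pairs and bits that never fire
            # B depends on A when B's concepts are a strict subset of A's:
            # P(A | B) = 1 and A is strictly broader, so identical bits are not
            # reported as each other's parent.
            if active[b] < active[a]:
                hierarchy.setdefault(a, set()).add(b)
    return hierarchy


# Toy codes (invented): bit 0 ~ "living", bit 1 ~ "animal", bit 2 ~ "plant".
codes = {
    "dog": [1, 1, 0],
    "cat": [1, 1, 0],
    "oak": [1, 0, 1],
    "rock": [0, 0, 0],
}
print(find_dependencies(codes))
```

This returns every ancestor of a bit, not only its immediate parent; turning the result into a strict tree would additionally require a transitive reduction.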

Triadic Dependencies -- 3-way AND-gate interactions (novel):

P(r|i,j) > threshold    (high when both present)
P(r|i)   < threshold    (low when only i)
P(r|j)   < threshold    (low when only j)

Bit r activates only when bits i AND j are both active, but NOT when either is active alone. No equivalent exists in current tools: SAE (sparse autoencoder) tools analyze features individually or in pairs.
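A brute-force version of the triadic test can be written directly from the three conditions. This sketch assumes binary codes and reads "only i" as i active with j inactive (and vice versa). The helper names and the `high=0.9` / `low=0.1` thresholds are hypothetical. It checks every pair (i, j) against every remaining bit r, so it costs roughly O(n_bits^3 * n_concepts), which is acceptable for small codes.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def cond_prob(
    codes: Dict[str, List[int]], r: int, given: Sequence[Tuple[int, int]]
) -> Optional[float]:
    """P(bit r = 1 | every (bit, value) in `given`), or None if no concept matches."""
    rows = [code for code in codes.values() if all(code[b] == v for b, v in given)]
    if not rows:
        return None
    return sum(code[r] for code in rows) / len(rows)


def find_triadic(
    codes: Dict[str, List[int]], high: float = 0.9, low: float = 0.1
) -> List[Tuple[int, int, int]]:
    """Return (i, j, r) triples where bit r behaves like i AND j."""
    n_bits = len(next(iter(codes.values())))
    found = []
    for i, j in combinations(range(n_bits), 2):
        for r in range(n_bits):
            if r in (i, j):
                continue
            both = cond_prob(codes, r, [(i, 1), (j, 1)])
            only_i = cond_prob(codes, r, [(i, 1), (j, 0)])
            only_j = cond_prob(codes, r, [(i, 0), (j, 1)])
            if both is None or only_i is None or only_j is None:
                continue  # some condition has no supporting concepts
            if both > high and only_i < low and only_j < low:
                found.append((i, j, r))
    return found


# Toy codes (invented): bit 0 ~ "flies", bit 1 ~ "mammal", bit 2 ~ "bat-like".
codes = {
    "bat": [1, 1, 1],
    "flying fox": [1, 1, 1],
    "sparrow": [1, 0, 0],
    "dog": [0, 1, 0],
    "stone": [0, 0, 0],
}
print(find_triadic(codes))
```

Bit 2 fires for flying mammals but not for the sparrow (flies only) or the dog (mammal only), so (0, 1, 2) is the single AND-gate reported.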

Analogy to genetics

Triadic dependencies are analogous to epistasis in genetics: the phenotype (bit r) depends on the combination of genes (bits i and j) in a way that neither predicts alone.

Step 2: AutoLabeler

Translates discovered bits to human-readable names using three strategies:

from reptimeline.autolabel import AutoLabeler

labeler = AutoLabeler()

# Strategy 1: Embedding centroid
labels_emb = labeler.label_by_embedding(report)

# Strategy 2: Contrastive (top vs anti concepts)
labels_con = labeler.label_by_contrastive(report)

# Strategy 3: LLM (send bit descriptions to an LLM)
# my_api_call is a placeholder for your own function that calls an LLM API
labels_llm = labeler.label_by_llm(report, llm_fn=my_api_call)

labeler.print_labels(labels_llm)
| Strategy | Needs | Best for |
|---|---|---|
| Embedding | Pre-trained encoder | Quick exploration |
| Contrastive | Concept list | Discriminative labels |
| LLM | API access | Rich, human-quality labels |
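To make the contrastive idea concrete: given a bit's top concepts and anti-concepts, a label can be chosen as the descriptor most over-represented among the top concepts relative to the anti-concepts. The per-concept tag sets and the `contrastive_label` function below are hypothetical; AutoLabeler's actual scoring may differ.

```python
from collections import Counter
from typing import Dict, List, Set


def contrastive_label(top: List[str], anti: List[str], tags: Dict[str, Set[str]]) -> str:
    """Pick the tag most over-represented in top concepts relative to anti-concepts."""
    top_freq = Counter(t for c in top for t in tags.get(c, ()))
    anti_freq = Counter(t for c in anti for t in tags.get(c, ()))
    if not top_freq:
        return "unlabeled"  # nothing available to describe the bit

    def score(tag: str) -> float:
        # Share of top concepts with the tag minus share of anti-concepts with it.
        return top_freq[tag] / len(top) - anti_freq[tag] / max(len(anti), 1)

    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(top_freq), key=score)


# Invented tag sets for a bit whose top concepts are fire/sun and anti-concepts ice/moon.
tags = {
    "fire": {"hot", "bright"},
    "sun": {"hot", "bright", "sky"},
    "ice": {"cold", "object"},
    "moon": {"bright", "sky"},
}
print(contrastive_label(["fire", "sun"], ["ice", "moon"], tags))
```

"bright" is common among the top concepts but also appears in an anti-concept, so the contrastive score prefers "hot", which only the top concepts share.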

Step 3: Reconciler

Compares the discovered ontology against a theoretical expectation and suggests corrections in both directions: for the theory and for the model.

from reptimeline.reconcile import Reconciler
from reptimeline.overlays.primitive_overlay import PrimitiveOverlay

overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)
reconciler.print_report(recon)

What It Reports

  • Bit mismatches: Bits where discovered semantics disagree with manual assignment
  • Dual mismatches: Discovered duals that don't match theoretical opposites
  • Dependency mismatches: Discovered hierarchy vs expected layer structure
  • Agreement score: Overall alignment between discovery and theory
  • Suggested corrections: For both the anchor set and the theoretical framework
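As an example of how an agreement score and two-way suggestions could be derived, the sketch below compares discovered dual pairs with theoretically expected opposites using Jaccard overlap. The function name, the restriction to duals, and the mapping of "unexpected" pairs to theory revisions and "missing" pairs to retraining targets are assumptions for illustration; this is not the Reconciler's documented scoring.

```python
from typing import Dict, Iterable, Tuple


def compare_duals(
    discovered: Iterable[Tuple[int, int]], expected: Iterable[Tuple[int, int]]
) -> Dict[str, object]:
    """Jaccard agreement between discovered and expected dual pairs, plus mismatches."""
    # frozenset makes (12, 47) and (47, 12) the same pair.
    found = {frozenset(p) for p in discovered}
    theory = {frozenset(p) for p in expected}
    union = found | theory
    return {
        "agreement": len(found & theory) / len(union) if union else 1.0,
        # Discovered but not predicted: candidates for revising the theory.
        "unexpected": sorted(tuple(sorted(p)) for p in found - theory),
        # Predicted but not discovered: candidates for retraining the model.
        "missing": sorted(tuple(sorted(p)) for p in theory - found),
    }


# Toy pairs (invented): (12, 47) matches the theory; (3, 9) and (5, 6) do not.
result = compare_duals(discovered=[(12, 47), (3, 9)], expected=[(47, 12), (5, 6)])
print(result)
```

One of three distinct pairs is shared, so agreement is 1/3; the same pattern extends to dependency edges by comparing (parent, child) tuples instead of unordered pairs.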

Full Pipeline Example

from reptimeline import TimelineTracker, TriadicExtractor, PrimitiveOverlay
from reptimeline.discovery import BitDiscovery
from reptimeline.autolabel import AutoLabeler
from reptimeline.reconcile import Reconciler

# Extract and track
# concepts: the concepts you want to probe at each checkpoint (defined by you)
extractor = TriadicExtractor()
snapshots = extractor.extract_sequence("checkpoints/", concepts)
tracker = TimelineTracker(extractor)
timeline = tracker.analyze(snapshots)

# Discover (no labels needed)
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)

# Label (optional)
labeler = AutoLabeler()
labels = labeler.label_by_contrastive(report)

# Reconcile against theory
overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)

# What used to take months of manual analysis:
reconciler.print_report(recon)

Before vs After

| Task | Before reptimeline | With reptimeline |
|---|---|---|
| Identify what bits mean | Manual correlation, weeks | BitDiscovery -- seconds |
| Find opposite pairs | Manual inspection | Automatic dual detection |
| Discover dependencies | Not attempted | Automatic hierarchy |
| Find 3-way interactions | Not possible | DiscoveredTriadicDep |
| Compare with theory | Months of analysis | Reconciler -- one line |
| Suggest retraining targets | Intuition | Data-driven suggestions |