Discovery Pipeline

reptimeline's discovery pipeline extracts structured knowledge from trained models without pre-defined labels or ontologies. It answers: "What did the model learn, and does it match what we expected?"

Pipeline Overview

ConceptSnapshot --> BitDiscovery --> AutoLabeler --> Reconciler
                        |                |              |
                   DiscoveryReport   BitLabels    ReconciliationReport

Step 1: BitDiscovery

Discovers what each bit means, which bits are opposites, and how bits depend on each other.

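from reptimeline.discovery import BitDiscovery

# `snapshots` and `timeline` come from the extraction and tracking steps
# (see Full Pipeline Example); snapshots[-1] is the last snapshot in the sequence.
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)
discovery.print_report(report)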

What It Finds

Bit Semantics -- For each active bit, identifies:

Activation rate (how often the bit fires)
Top concepts (concepts most associated with this bit)
Anti-concepts (concepts that never activate this bit)
Auto-generated label
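
The per-bit summary above can be illustrated with a small, self-contained sketch. It does not use reptimeline; the function name `bit_semantics`, the `{concept: binary code}` layout, and the `top_k` parameter are assumptions made for illustration. With binary codes every active concept is equally associated with the bit, so the sketch lists them in input order, whereas the library may rank them differently.

```python
from typing import Dict, List


def bit_semantics(codes: Dict[str, List[int]], bit: int, top_k: int = 3) -> dict:
    """Summarize one bit over a hypothetical {concept: binary code} mapping."""
    # Concepts whose code has this bit set, and those where it is clear.
    active = [c for c, code in codes.items() if code[bit] == 1]
    inactive = [c for c, code in codes.items() if code[bit] == 0]
    return {
        "activation_rate": len(active) / len(codes),  # how often the bit fires
        "top_concepts": active[:top_k],               # concepts that activate it
        "anti_concepts": inactive[:top_k],            # concepts that never do
    }


# Toy data: four concepts, three bits each.
codes = {
    "alive": [1, 0, 1],
    "birth": [1, 0, 0],
    "grave": [0, 1, 0],
    "stone": [0, 1, 1],
}
summary = bit_semantics(codes, bit=0)
print(summary)
```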

Dual Pairs -- Anti-correlated bit pairs (semantic opposites):

Detects pairs where high activation of bit A predicts low activation of bit B
Example: bit 12 ("life") and bit 47 ("death") are duals
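
One plausible way to detect such pairs is to correlate the activation columns of every pair of bits and keep the strongly negative ones. The sketch below is standalone and is not reptimeline's implementation; `find_duals` and the `max_corr` cutoff of -0.8 are hypothetical names and values.

```python
from itertools import combinations
from math import sqrt
from typing import List, Tuple


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant bit has no defined correlation; treat as none
    return cov / sqrt(vx * vy)


def find_duals(rows: List[List[int]], max_corr: float = -0.8) -> List[Tuple[int, int, float]]:
    """Return (bit_a, bit_b, correlation) for strongly anti-correlated bit pairs."""
    n_bits = len(rows[0])
    # Transpose: one activation column per bit.
    columns = [[row[b] for row in rows] for b in range(n_bits)]
    duals = []
    for a, b in combinations(range(n_bits), 2):
        r = pearson(columns[a], columns[b])
        if r <= max_corr:  # hypothetical cutoff for "opposite"
            duals.append((a, b, r))
    return duals


# Toy data: one row per concept, one column per bit.
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
]
duals = find_duals(rows)
print(duals)
```

In this toy data bits 0 and 1 never fire together and never rest together, so they are the only pair reported.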

Dependencies -- Parent-child relationships between bits:

Bit B depends on bit A if B is only active when A is active
Builds a dependency hierarchy
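
The rule "B is only active when A is active" amounts to checking that P(A | B) is 1, or close to it. A minimal sketch under that reading follows; `find_dependencies`, `min_support`, and `min_confidence` are illustrative names, not reptimeline's API.

```python
from typing import List, Tuple


def find_dependencies(
    rows: List[List[int]],
    min_support: int = 2,         # hypothetical: ignore bits that rarely fire
    min_confidence: float = 1.0,  # hypothetical: required P(parent | child)
) -> List[Tuple[int, int]]:
    """Return (parent, child) pairs where the child fires only alongside the parent."""
    n_bits = len(rows[0])
    deps = []
    for child in range(n_bits):
        child_rows = [row for row in rows if row[child] == 1]
        if len(child_rows) < min_support:
            continue  # too few activations to say anything
        for parent in range(n_bits):
            if parent == child:
                continue
            # Fraction of the child's activations where the parent is also on.
            p_parent_given_child = sum(row[parent] for row in child_rows) / len(child_rows)
            if p_parent_given_child >= min_confidence:
                deps.append((parent, child))
    return deps


# Toy data: bit 1 fires only in rows where bit 0 fires, but not vice versa.
rows = [
    [1, 1],
    [1, 0],
    [1, 1],
    [0, 0],
]
deps = find_dependencies(rows)
print(deps)
```

A bit that is always on would appear as a parent of every other bit under this rule, so a practical version would likely filter out near-constant bits first.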

Triadic Dependencies -- 3-way AND-gate interactions (novel):

P(r|i,j) > threshold    (high when both present)
P(r|i)   < threshold    (low when only i)
P(r|j)   < threshold    (low when only j)

Bit r activates only when bits i AND j are both active, but NOT when either is active alone. No equivalent exists in current tools (SAE tools analyze features individually or in pairs).
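
The three conditions can be checked by estimating each conditional probability from counts. The sketch below follows the inequalities literally and is independent of reptimeline. The names `cond_prob` and `find_triadic` are hypothetical, and the separate `high` and `low` cutoffs are assumptions, since the text only says "threshold".

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def cond_prob(rows: List[List[int]], target: int, given: Sequence[int]) -> Optional[float]:
    """Estimate P(target = 1 | all bits in `given` = 1); None if never observed."""
    matching = [row for row in rows if all(row[g] == 1 for g in given)]
    if not matching:
        return None
    return sum(row[target] for row in matching) / len(matching)


def find_triadic(rows: List[List[int]], high: float = 0.9, low: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (i, j, r) triples where r fires with i AND j, but rarely with either alone."""
    n_bits = len(rows[0])
    found = []
    for i, j in combinations(range(n_bits), 2):
        for r in range(n_bits):
            if r in (i, j):
                continue
            p_ij = cond_prob(rows, r, (i, j))
            p_i = cond_prob(rows, r, (i,))
            p_j = cond_prob(rows, r, (j,))
            if p_ij is None or p_i is None or p_j is None:
                continue  # a condition that never occurs gives no evidence
            if p_ij > high and p_i < low and p_j < low:
                found.append((i, j, r))
    return found


# Toy data, columns are bits (i, j, r): r fires only in the row where i and j both fire.
rows = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
found = find_triadic(rows)
print(found)
```

Note that P(r|i) as written also counts rows where j is active. A stricter variant would condition on j being absent, which matches the "only i" wording more closely; which reading the library uses is not stated here.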

Analogy to genetics

Triadic dependencies are analogous to epistasis in genetics: the phenotype (bit r) depends on the combination of genotypes (bits i and j) in ways neither predicts alone.

Step 2: AutoLabeler

Translates discovered bits into human-readable names using one of three strategies:

from reptimeline.autolabel import AutoLabeler

labeler = AutoLabeler()

# Strategy 1: Embedding centroid
labels_emb = labeler.label_by_embedding(report)

# Strategy 2: Contrastive (top vs anti concepts)
labels_con = labeler.label_by_contrastive(report)

# Strategy 3: LLM (sends bit descriptions to an LLM through `llm_fn`;
# `my_api_call` stands for a callable you supply)
labels_llm = labeler.label_by_llm(report, llm_fn=my_api_call)

labeler.print_labels(labels_llm)

Strategy	Needs	Best For
Embedding	Pre-trained encoder	Quick exploration
Contrastive	Concept list	Discriminative labels
LLM	API access	Rich, human-quality labels
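
To make the embedding strategy concrete, here is a rough sketch of centroid labeling: average the embeddings of a bit's top concepts, then pick the candidate word closest to that centroid. It is an illustration under assumptions, not the behavior of `label_by_embedding`; the toy 2-D vectors, the `centroid_label` function, and the candidate vocabulary are all invented for the example.

```python
from math import sqrt
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid_label(
    top_concepts: List[str],
    embeddings: Dict[str, List[float]],
    vocabulary: List[str],
) -> str:
    """Label a bit with the vocabulary word nearest the centroid of its top concepts."""
    vectors = [embeddings[c] for c in top_concepts]
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return max(vocabulary, key=lambda w: cosine(embeddings[w], centroid))


# Toy embeddings standing in for a pre-trained encoder.
embeddings = {
    "alive": [0.9, 0.1],
    "birth": [0.8, 0.3],
    "life": [1.0, 0.2],
    "death": [-1.0, 0.1],
}
label = centroid_label(["alive", "birth"], embeddings, vocabulary=["life", "death"])
print(label)
```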

Step 3: Reconciler

Compares the discovered ontology against a theoretical expectation and suggests corrections in both directions: to the theory and to the model.

from reptimeline.reconcile import Reconciler
from reptimeline.overlays.primitive_overlay import PrimitiveOverlay

overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)
reconciler.print_report(recon)

What It Reports

Bit mismatches: Bits where discovered semantics disagree with manual assignment
Dual mismatches: Discovered duals that don't match theoretical opposites
Dependency mismatches: Discovered hierarchy vs expected layer structure
Agreement score: Overall alignment between discovery and theory
Suggested corrections: For both the anchor set and the theoretical framework
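
How the agreement score is computed is not specified here. As one possible illustration, the sketch below compares discovered dual pairs with the pairs a theory expects, using set overlap (Jaccard). The function `dual_agreement` and its output keys are hypothetical and unrelated to the `Reconciler` API.

```python
from typing import Iterable, Set, Tuple


def _normalize(pairs: Iterable[Tuple[int, int]]) -> Set[frozenset]:
    """Treat (a, b) and (b, a) as the same dual pair."""
    return {frozenset(p) for p in pairs}


def dual_agreement(
    discovered: Iterable[Tuple[int, int]],
    expected: Iterable[Tuple[int, int]],
) -> dict:
    """Jaccard overlap between discovered and expected dual pairs, plus the mismatches."""
    d, e = _normalize(discovered), _normalize(expected)
    union = d | e
    return {
        "agreement": len(d & e) / len(union) if union else 1.0,
        # Found in the model but absent from the theory.
        "only_discovered": sorted(tuple(sorted(p)) for p in d - e),
        # Expected by the theory but not found in the model.
        "only_expected": sorted(tuple(sorted(p)) for p in e - d),
    }


result = dual_agreement(
    discovered=[(12, 47), (3, 9)],
    expected=[(47, 12), (5, 8)],
)
print(result)
```

The two mismatch lists correspond to the two directions of correction: pairs only in `only_discovered` suggest revising the theory, while pairs only in `only_expected` suggest targets for retraining.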

Full Pipeline Example

from reptimeline import TimelineTracker, TriadicExtractor, PrimitiveOverlay
from reptimeline.discovery import BitDiscovery
from reptimeline.autolabel import AutoLabeler
from reptimeline.reconcile import Reconciler

# Extract and track
# (`concepts` is the collection of concepts to probe; define it before this point)
extractor = TriadicExtractor()
snapshots = extractor.extract_sequence("checkpoints/", concepts)
tracker = TimelineTracker(extractor)
timeline = tracker.analyze(snapshots)

# Discover (no labels needed)
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)

# Label (optional)
labeler = AutoLabeler()
labels = labeler.label_by_contrastive(report)

# Reconcile against theory
overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)

# What used to take months of manual analysis:
reconciler.print_report(recon)

Before vs After

Task	Before reptimeline	With reptimeline
Identify what bits mean	Manual correlation, weeks	`BitDiscovery` -- seconds
Find opposite pairs	Manual inspection	Automatic dual detection
Discover dependencies	Not attempted	Automatic hierarchy
Find 3-way interactions	Not possible	`DiscoveredTriadicDep`
Compare with theory	Months of analysis	`Reconciler` -- one line
Suggest retraining targets	Intuition	Data-driven suggestions