Discovery Pipeline
reptimeline's discovery pipeline extracts structured knowledge from trained models without pre-defined labels or ontologies. It answers: "What did the model learn, and does it match what we expected?"
Pipeline Overview
ConceptSnapshot --> BitDiscovery --> AutoLabeler --> Reconciler
                         |               |               |
                  DiscoveryReport    BitLabels   ReconciliationReport
Step 1: BitDiscovery
Discovers what each bit means, which bits are opposites, and how bits depend on each other.
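from reptimeline.discovery import BitDiscovery
# `snapshots` and `timeline` come from the extractor and tracker
# (see the Full Pipeline Example); the last snapshot is analyzed here.
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)
discovery.print_report(report)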
What It Finds
Bit Semantics -- For each active bit, identifies:
- Activation rate (how often the bit fires)
- Top concepts (concepts most associated with this bit)
- Anti-concepts (concepts that never activate this bit)
- Auto-generated label
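The per-bit summary above can be sketched in a few lines of plain Python. This is an illustrative sketch, not reptimeline's implementation: it assumes binary codes stored as a hypothetical `{concept: [0/1, ...]}` mapping, and the helper name `bit_stats` is invented for this example.

```python
def bit_stats(codes, bit, top_k=3):
    """Summarize one bit over a {concept: [0/1, ...]} mapping (hypothetical layout)."""
    # Concepts that activate the bit, and concepts that never do.
    active = sorted(c for c, v in codes.items() if v[bit] == 1)
    inactive = sorted(c for c, v in codes.items() if v[bit] == 0)
    return {
        "activation_rate": len(active) / len(codes),
        "top_concepts": active[:top_k],
        "anti_concepts": inactive[:top_k],
    }


# Toy data: four concepts, three bits each.
codes = {
    "king":  [1, 0, 1],
    "queen": [1, 0, 0],
    "rock":  [0, 1, 0],
    "river": [0, 1, 1],
}
stats = bit_stats(codes, bit=0)
print(stats)
```

With real-valued activations, "top concepts" would more plausibly be ranked by activation strength rather than listed alphabetically as done here.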
Dual Pairs -- Anti-correlated bit pairs (semantic opposites):
- Detects pairs where high activation of bit A predicts low activation of bit B
- Example: bit 12 ("life") and bit 47 ("death") are duals
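One way to detect anti-correlated pairs is to compute the Pearson correlation between bit columns and keep pairs below a negative threshold. The sketch below assumes this approach; the function names and the `-0.8` threshold are illustrative choices, not values taken from reptimeline.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find_duals(rows, threshold=-0.8):
    """Return bit pairs whose correlation across concepts is at or below threshold."""
    n_bits = len(rows[0])
    # Transpose: one column of activations per bit.
    cols = [[row[b] for row in rows] for b in range(n_bits)]
    return [
        (a, b)
        for a, b in combinations(range(n_bits), 2)
        if pearson(cols[a], cols[b]) <= threshold
    ]


# Toy data: one row per concept. Bits 0 and 1 are exact opposites.
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
duals = find_duals(rows)
print(duals)
```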
Dependencies -- Parent-child relationships between bits:
- Bit B depends on bit A if B is only active when A is active
- Builds a dependency hierarchy
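The dependency rule above ("B is only active when A is active") amounts to a subset test on activation sets. A minimal sketch, assuming binary codes and a hypothetical `min_support` guard so that rarely active bits do not produce spurious parents:

```python
def find_dependencies(rows, min_support=2):
    """Return (parent, child) pairs where the child bit fires only when the parent does."""
    n_bits = len(rows[0])
    deps = []
    for child in range(n_bits):
        # Rows (concepts) in which the candidate child bit is active.
        child_rows = [r for r in rows if r[child] == 1]
        if len(child_rows) < min_support:
            continue  # too few activations to draw a conclusion
        for parent in range(n_bits):
            if parent == child:
                continue
            if all(r[parent] == 1 for r in child_rows):
                deps.append((parent, child))
    return deps


# Toy data: bit 1 fires only when bit 0 fires; bit 2 is too rare to judge.
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
deps = find_dependencies(rows)
print(deps)
```

On noisy codes, a strict `all(...)` test is probably too brittle; a threshold on P(parent | child) would be a natural relaxation.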
Triadic Dependencies -- 3-way AND-gate interactions (novel):
P(r|i,j) > threshold (high when both present)
P(r|i) < threshold (low when only i)
P(r|j) < threshold (low when only j)
Bit r activates only when bits i AND j are both active, but NOT when either is active alone. No equivalent exists in current tools (SAE tools analyze features individually or in pairs).
Analogy to genetics
Triadic dependencies are analogous to epistasis in genetics: the phenotype (bit r) depends on the combination of genotypes (bits i and j) in ways neither predicts alone.
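The three AND-gate conditions can be checked by brute force over bit triples. The sketch below is one reading of them, not reptimeline's code: it interprets "only i" as "i active and j inactive", and the `high` and `low` thresholds are illustrative values.

```python
from itertools import combinations


def cond_rate(rows, target, condition):
    """Fraction of rows satisfying `condition` in which `target` is active (None if no rows match)."""
    matching = [r for r in rows if condition(r)]
    if not matching:
        return None
    return sum(r[target] for r in matching) / len(matching)


def find_triadic(rows, high=0.9, low=0.1):
    """Return (i, j, r) triples where bit r behaves like an AND gate over bits i and j."""
    n_bits = len(rows[0])
    found = []
    for i, j in combinations(range(n_bits), 2):
        for r in range(n_bits):
            if r in (i, j):
                continue
            both = cond_rate(rows, r, lambda x: x[i] == 1 and x[j] == 1)
            only_i = cond_rate(rows, r, lambda x: x[i] == 1 and x[j] == 0)
            only_j = cond_rate(rows, r, lambda x: x[j] == 1 and x[i] == 0)
            if both is None or only_i is None or only_j is None:
                continue  # not enough evidence for one of the three cases
            if both > high and only_i < low and only_j < low:
                found.append((i, j, r))
    return found


# Toy data: bit 2 fires only when bits 0 and 1 are both active.
rows = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
triads = find_triadic(rows)
print(triads)
```

The search is cubic in the number of bits, so a practical implementation would likely restrict candidates to frequently active bits first.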
Step 2: AutoLabeler
Translates discovered bits to human-readable names using three strategies:
from reptimeline.autolabel import AutoLabeler
labeler = AutoLabeler()
# Strategy 1: Embedding centroid
labels_emb = labeler.label_by_embedding(report)
# Strategy 2: Contrastive (top vs anti concepts)
labels_con = labeler.label_by_contrastive(report)
# Strategy 3: LLM (send descriptions to an LLM)
# my_api_call is a placeholder for your own callable that queries an LLM
labels_llm = labeler.label_by_llm(report, llm_fn=my_api_call)
labeler.print_labels(labels_llm)
| Strategy | Needs | Best For |
|---|---|---|
| Embedding | Pre-trained encoder | Quick exploration |
| Contrastive | Concept list | Discriminative labels |
| LLM | API access | Rich, human-quality labels |
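To give a feel for the embedding strategy, the sketch below picks the candidate word whose embedding lies closest to the centroid of a bit's top concepts. It is a generic illustration of centroid labeling, not how `label_by_embedding` is implemented; the toy two-dimensional embeddings and the `centroid_label` helper are assumptions made for this example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid_label(top_concepts, embeddings, vocabulary):
    """Pick the vocabulary word closest to the mean embedding of the top concepts."""
    vecs = [embeddings[c] for c in top_concepts]
    dim = len(vecs[0])
    centroid = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return max(vocabulary, key=lambda w: cosine(embeddings[w], centroid))


# Toy 2-D embeddings; a real encoder would supply these vectors.
embeddings = {
    "king": [0.9, 0.1],
    "queen": [0.8, 0.2],
    "royalty": [0.85, 0.15],
    "river": [0.1, 0.9],
}
label = centroid_label(["king", "queen"], embeddings, ["royalty", "river"])
print(label)
```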
Step 3: Reconciler
Compares the discovered ontology against a theoretical expectation and suggests corrections in both directions: for the theory and for the model.
from reptimeline.reconcile import Reconciler
from reptimeline.overlays.primitive_overlay import PrimitiveOverlay
overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)
reconciler.print_report(recon)
What It Reports
- Bit mismatches: Bits where discovered semantics disagree with manual assignment
- Dual mismatches: Discovered duals that don't match theoretical opposites
- Dependency mismatches: Discovered hierarchy vs expected layer structure
- Agreement score: Overall alignment between discovery and theory
- Suggested corrections: For both the anchor set and the theoretical framework
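As a rough illustration of how dual mismatches and an agreement score could be derived, the sketch below compares discovered dual pairs with expected ones using Jaccard overlap. The scoring rule and the `dual_agreement` helper are assumptions for illustration; the Reconciler's actual metric is not described here.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {frozenset(p) for p in pairs}


def dual_agreement(discovered, expected):
    """Compare discovered and expected dual pairs; score is their Jaccard overlap."""
    d, e = normalize(discovered), normalize(expected)
    union = d | e
    score = len(d & e) / len(union) if union else 1.0
    return {
        "score": score,
        # Expected by the theory but not found in the model.
        "missing": sorted(tuple(sorted(p)) for p in e - d),
        # Found in the model but absent from the theory.
        "unexpected": sorted(tuple(sorted(p)) for p in d - e),
    }


result = dual_agreement(
    discovered=[(12, 47), (3, 9)],
    expected=[(47, 12), (5, 8)],
)
print(result)
```

The "missing" entries point at possible retraining targets, while the "unexpected" ones are candidates for revising the theory.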
Full Pipeline Example
from reptimeline import TimelineTracker, TriadicExtractor, PrimitiveOverlay
from reptimeline.discovery import BitDiscovery
from reptimeline.autolabel import AutoLabeler
from reptimeline.reconcile import Reconciler
# Extract and track (`concepts` is your own set of concepts to probe; it is not defined here)
extractor = TriadicExtractor()
snapshots = extractor.extract_sequence("checkpoints/", concepts)
tracker = TimelineTracker(extractor)
timeline = tracker.analyze(snapshots)
# Discover (no labels needed)
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)
# Label (optional)
labeler = AutoLabeler()
labels = labeler.label_by_contrastive(report)
# Reconcile against theory
overlay = PrimitiveOverlay()
reconciler = Reconciler(overlay)
recon = reconciler.reconcile(report, snapshots[-1].codes)
# What used to take months of manual analysis:
reconciler.print_report(recon)
Before vs After
| Task | Before reptimeline | With reptimeline |
|---|---|---|
| Identify what bits mean | Manual correlation, weeks | BitDiscovery -- seconds |
| Find opposite pairs | Manual inspection | Automatic dual detection |
| Discover dependencies | Not attempted | Automatic hierarchy |
| Find 3-way interactions | Not possible | DiscoveredTriadicDep |
| Compare with theory | Months of analysis | Reconciler -- one line |
| Suggest retraining targets | Intuition | Data-driven suggestions |