Customer intelligence modules—churn, health, expansion, intent, territory—are not independent models. They are sections of a sheaf on a site whose Grothendieck topology encodes segment-specific coverage. Descent is Bayesian updating; cohomological obstructions detect structural anomalies. The modules compose into a closed lifecycle via six domain theory transformations, including a recursive expansion loop $F_{\text{sales}} \circ F_{\text{rec}} \circ F_{\text{exp}} \circ F_{\text{evo}} : C \to C$. The Fisher information metric determines weights canonically—the manifold is nearly one-dimensional in the churn direction. Built solo at Series E scale: 10 modules, 60% above commercial alternatives.
Standard customer intelligence builds separate models for churn, expansion, health, and intent from the same underlying data. Each model carries its own assumptions, its own feature space, its own notion of distance. The results don’t compose. This article outlines an alternative architecture—grounded in category theory and information geometry—where the modules are not independent models but sections of a single geometric structure. Deployed at enterprise scale (Series E, >3,500 accounts), the architecture produced 10 production modules as a sole analytics function, beating commercially available alternatives by 60%.
A typical customer intelligence stack looks like five independent projects: a churn model, a health score, an expansion predictor, an intent classifier, a territory optimizer. Each is trained separately, evaluated separately, and maintained separately. They share a data warehouse but nothing else—no common notion of distance, no compositional structure, no way to propagate a learned relationship from one module to another.
The cost is not just engineering overhead. It is mathematical: separate models with separate feature spaces cannot detect when their outputs contradict each other. A customer scored as “healthy” by one module and “likely to churn” by another is not a bug report—it is information about the structure of the data. But the standard architecture discards it.
The alternative is to treat the entire CI system as a single mathematical object. The natural language for this is category theory.
Let $\mathcal{C}$ be a category whose objects are data domains—usage telemetry, billing events, support interactions, CRM state, product engagement. A morphism $f : U \to V$ is a data pipeline: a deterministic transformation that carries information from one domain to another. Composition of morphisms corresponds to chaining pipelines. The identity morphism on each object is the trivial pass-through.
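To make the compositional structure concrete, here is a minimal sketch in Python: morphisms as pipeline functions, composition as function chaining. The pipelines and field names (`events`, `wau`, `churn_signal`) are illustrative assumptions, not the production system.

```python
from typing import Callable

# A morphism is a data pipeline; composition is pipeline chaining.
Pipeline = Callable[[dict], dict]

def compose(g: Pipeline, f: Pipeline) -> Pipeline:
    """g after f: associative, with the identity pipeline as unit."""
    return lambda record: g(f(record))

identity: Pipeline = lambda record: record

# Illustrative pipelines between data domains:
usage_to_features: Pipeline = lambda r: {**r, "wau": len(r.get("events", []))}
features_to_churn: Pipeline = lambda r: {**r, "churn_signal": r["wau"] < 3}

usage_to_churn = compose(features_to_churn, usage_to_features)
print(usage_to_churn({"events": ["login", "export"]}))
# {'events': ['login', 'export'], 'wau': 2, 'churn_signal': True}
```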
Not every collection of local observations should be trusted to represent global customer state. A customer’s billing data alone does not determine their health; a product-engagement signal alone does not determine intent. The question is: which collections of local views are sufficient?
This is precisely what a Grothendieck topology encodes. A topology $J$ on $\mathcal{C}$ specifies, for each object $U$, which families of morphisms into $U$ constitute covering families—sufficient local observations. In the CI system, the covering condition is segment-specific: the set of signals required to cover “enterprise health” differs from the set required to cover “SMB churn.” The pair $(\mathcal{C}, J)$ forms a site.
Site (C, J)
├── objects: usage, billing, support, CRM, product
├── morphisms: data pipelines (ETL, joins, aggregations)
└── topology: J = segment-specific covering families
Covering family for enterprise churn:
{usage → churn, billing → churn, CRM → churn}
Covering family for SMB churn:
{usage → churn, product → churn}
(billing signal insufficient at SMB scale)
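A minimal sketch of the covering-family check, using the two families from the diagram above. The `Site` class and its method names are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass, field

# Data domains: the objects of the category C.
DOMAINS = {"usage", "billing", "support", "crm", "product"}

@dataclass
class Site:
    """A toy site (C, J): objects plus segment-specific covering families."""
    objects: set = field(default_factory=lambda: set(DOMAINS))
    # J maps (segment, target) -> the set of source domains whose
    # morphisms into the target constitute a covering family.
    covers: dict = field(default_factory=dict)

    def is_covering(self, segment: str, target: str, observed: set) -> bool:
        """Do the observed local signals cover `target` for this segment?"""
        return self.covers[(segment, target)] <= observed

site = Site(covers={
    ("enterprise", "churn"): {"usage", "billing", "crm"},
    ("smb", "churn"): {"usage", "product"},  # billing insufficient at SMB scale
})

# Enterprise churn is NOT covered when billing is missing:
assert not site.is_covering("enterprise", "churn", {"usage", "crm"})
# SMB churn IS covered by usage + product alone:
assert site.is_covering("smb", "churn", {"usage", "product"})
```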
Each CI module is a sheaf on the site $(\mathcal{C}, J)$. Formally, a sheaf is a functor $F : \mathcal{C}^{\text{op}} \to \textbf{Man}$ that satisfies the gluing axiom with respect to $J$—where $\textbf{Man}$ is the category of statistical manifolds.
The sheaf condition says: if you have compatible local sections (a churn estimate from usage data, a churn estimate from billing data, a churn estimate from CRM data), and these local sections agree on their overlaps, then there exists a unique global section that restricts to each local estimate. The “agreement on overlaps” is where the real content lives.
The gluing procedure—reconciling local estimates into a global state—is descent. In the CI system, descent takes a specific computational form: Bayesian updating with Beta-conjugate priors.
The prior is tier-stratified: $\text{Beta}(\alpha_{\text{tier}}, \beta_{\text{tier}})$, with enterprise accounts carrying high inertia (slow update) and growth/SMB accounts responding faster. Each observation updates the posterior via conjugacy:

$$\alpha_{t+1} = \alpha_t + \lambda_{\text{tier}}\, e_t, \qquad \beta_{t+1} = \beta_t + \lambda_{\text{tier}}\,(1 - e_t),$$

where $e_t \in [0,1]$ is the evidence at time $t$ and $\lambda_{\text{tier}}$ is the tier-specific learning rate (small for enterprise, larger for SMB). The posterior mean $\frac{\alpha}{\alpha + \beta}$ gives the point churn probability; a full credible interval comes from the posterior via MCMC.
This is not a metaphor. The Beta-conjugate update rule satisfies exactly the cocycle condition required for descent data on the site. When three local estimates (from usage, billing, and CRM) meet at a triple overlap, the order in which you update does not matter—the cocycle condition guarantees coherence.
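A sketch of the descent computation under the update rule above. The priors, learning rates, and signal values are illustrative; the closing assertion demonstrates the order-independence that the cocycle condition guarantees.

```python
from dataclasses import dataclass

@dataclass
class BetaState:
    alpha: float
    beta: float

    def update(self, evidence: float, lam: float) -> "BetaState":
        # Conjugate update: evidence e in [0,1], tier learning rate lam.
        return BetaState(self.alpha + lam * evidence,
                         self.beta + lam * (1.0 - evidence))

    @property
    def churn_probability(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# Tier-stratified priors: enterprise = strong prior, small learning rate.
TIERS = {"enterprise": (BetaState(2.0, 18.0), 0.2),
         "smb":        (BetaState(1.0, 4.0),  1.0)}

prior, lam = TIERS["enterprise"]
signals = [0.9, 0.7, 0.8]  # local evidence from usage, billing, crm

# The update is additive in (alpha, beta), so any permutation of the
# signals yields the same posterior -- the computational face of the
# cocycle condition on triple overlaps.
a = prior
for e in signals:
    a = a.update(e, lam)
b = prior
for e in reversed(signals):
    b = b.update(e, lam)
assert abs(a.churn_probability - b.churn_probability) < 1e-12
```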
The most interesting output of this architecture is not the predictions. It is the obstructions.
When local sections fail to glue—when the churn signal from usage data and the churn signal from billing data cannot be reconciled—there is a nonzero class in $H^1(\mathcal{C}, F)$. This is a cohomological obstruction to global gluing. In practice, it means: the customer is in a state that the system’s covering families do not adequately describe. The topology needs refinement, or the customer is genuinely in transition between segments.
Standard CI treats this as noise. The geometric architecture treats it as the signal. A nonzero $H^1$ class is a detected structural anomaly—exactly the kind of event that precedes churn, expansion, or segment migration.
F : C → Man
(sheaf of modules)
│
┌────────┬───────┼───────┬────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
F(U₁) F(U₂) F(U₃) F(U₄) F(U₅)
churn health expan. intent terr.
Local sections glue?
├── Yes → global customer state (H⁰)
└── No → structural anomaly (H¹)
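In code, the gluing check reduces to pairwise compatibility of local sections. This is a Čech-style sketch: the tolerance `tol` and the averaging step are simplifying assumptions standing in for the full sheaf machinery.

```python
from itertools import combinations

def glue(local_sections: dict[str, float], tol: float = 0.15):
    """Attempt to glue local churn estimates into a global section.

    If any pair of local sections disagrees beyond `tol` on their overlap,
    gluing fails and the disagreements are reported as a proxy for
    nonzero H^1 classes.
    """
    obstructions = [
        (u, v, abs(local_sections[u] - local_sections[v]))
        for u, v in combinations(local_sections, 2)
        if abs(local_sections[u] - local_sections[v]) > tol
    ]
    if obstructions:
        return None, obstructions          # structural anomaly (H^1 != 0)
    # Compatible sections glue to a unique global estimate (H^0).
    mean = sum(local_sections.values()) / len(local_sections)
    return mean, []

# Usage and billing disagree sharply: the account is flagged, not averaged.
state, anomalies = glue({"usage": 0.82, "billing": 0.31, "crm": 0.74})
print(state, anomalies)
```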
The five module families are not a flat list. They compose into a closed lifecycle through six domain theory transformations, so the full lifecycle, from intent to projection, is a single composed morphism in $\mathcal{C}$.
But the architecture's real power is the expansion recursion loop, a closed endomorphism on the customer category:

$$F_{\text{sales}} \circ F_{\text{rec}} \circ F_{\text{exp}} \circ F_{\text{evo}} : \mathcal{C} \to \mathcal{C}$$
Expansion generates new opportunities, which re-enter the pipeline, which produce new customers (or upgrade existing ones), which evolve and expand again. The system doesn’t just predict expansion—it models the recursion.
Expansion detection and sales conversion form an adjoint pair, $F_{\text{exp}} \dashv F_{\text{sales}}$:

$$\mathrm{Hom}\big(F_{\text{exp}}(C),\, O\big) \;\cong\; \mathrm{Hom}\big(C,\, F_{\text{sales}}(O)\big)$$
This means there is a natural bijection between morphisms $F_{\text{exp}}(C) \to O$ and morphisms $C \to F_{\text{sales}}(O)$—detecting expansion potential in a customer is adjoint to converting pipeline into that customer’s account. The unit and counit of this adjunction encode the expansion/conversion cycle.
When independent modules produce outputs that disagree, a hierarchical composition algebra resolves the conflict:
Axiom 1 (Risk Dominance). If $P(\text{churn}) > \tau_{\text{crit}}$, expansion is multiplicatively dampened:

$$P(\text{expand}) \;\mapsto\; \big(1 - P(\text{churn})\big)\, P(\text{expand})$$
A customer likely to leave cannot meaningfully expand.
Axiom 2 (Intent Validation). If $P(\text{expand}) > 0.70$ but $Y_{\text{intent}} < \tau_{\text{low}}$: state = INCUBATE. Structural potential without behavioral intent is premature.
Axiom 3 (High-Velocity Growth). If $P(\text{expand}) > 0.70$ and $Y_{\text{intent}} > \tau_{\text{high}}$ and $P(\text{churn}) < \tau_{\text{safe}}$: state = EXPAND. All signals converging = immediate action.
The synthesis vector for each account lives in a three-dimensional composition space:

$$\big(P(\text{expand}),\; P(\text{churn}),\; Y_{\text{intent}}\big) \in [0,1]^3,$$
yielding four output states: {PROTECT, EXPAND, INCUBATE, IGNORE}.
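The three axioms translate directly into a resolution function. A minimal sketch, assuming illustrative threshold values (only the 0.70 expansion cut appears in the axioms above) and mapping high-churn accounts to PROTECT:

```python
def synthesize(p_expand: float, p_churn: float, y_intent: float,
               tau_crit: float = 0.6, tau_safe: float = 0.2,
               tau_low: float = 0.3, tau_high: float = 0.7) -> tuple[str, float]:
    """Resolve the three module outputs into one of four account states."""
    # Axiom 1 (Risk Dominance): churn risk multiplicatively dampens expansion.
    if p_churn > tau_crit:
        p_expand *= (1.0 - p_churn)
        return "PROTECT", p_expand
    # Axiom 3 (High-Velocity Growth): all signals converge -> act now.
    if p_expand > 0.70 and y_intent > tau_high and p_churn < tau_safe:
        return "EXPAND", p_expand
    # Axiom 2 (Intent Validation): structural potential without intent.
    if p_expand > 0.70 and y_intent < tau_low:
        return "INCUBATE", p_expand
    return "IGNORE", p_expand

print(synthesize(0.85, 0.10, 0.9))   # ('EXPAND', 0.85)
print(synthesize(0.85, 0.75, 0.9))   # ('PROTECT', ~0.21)
print(synthesize(0.85, 0.10, 0.1))   # ('INCUBATE', 0.85)
```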
The target category of the sheaf functor is $\textbf{Man}$—the category of statistical manifolds. Each module maps data domains not to numbers but to points on a Riemannian manifold equipped with the Fisher information metric.
For a parametric family of distributions $p(x \mid \theta)$, the Fisher information matrix is

$$g_{ij}(\theta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\!\left[\frac{\partial \log p(x \mid \theta)}{\partial \theta^i}\,\frac{\partial \log p(x \mid \theta)}{\partial \theta^j}\right].$$
This defines a Riemannian metric on the parameter space. By Čencov's theorem (1982), it is, up to a constant scale factor, the unique Riemannian metric on statistical manifolds that is invariant under sufficient statistics. The metric is not chosen; it is canonical.
In the CI system, each module’s output lives on a statistical manifold whose geometry is determined by the data, not by the modeler. Distance between two customer states has a precise meaning: the number of distinguishable statistical observations separating them. This is why the modules compose—they all speak the same geometric language.
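To ground "distance as distinguishability": for the one-parameter Bernoulli family the Fisher metric is $g(\theta) = 1/(\theta(1-\theta))$ and the geodesic (Fisher–Rao) distance has a closed form. A worked sketch; the Bernoulli family is chosen purely for illustration, since the production manifolds are higher-dimensional.

```python
import math

def fisher_info_bernoulli(theta: float) -> float:
    """Fisher information of Bernoulli(theta): g(theta) = 1/(theta(1-theta))."""
    return 1.0 / (theta * (1.0 - theta))

def fisher_rao_distance(t1: float, t2: float) -> float:
    """Closed-form geodesic distance on the Bernoulli statistical manifold."""
    return 2.0 * abs(math.asin(math.sqrt(t2)) - math.asin(math.sqrt(t1)))

# The same 0.05 gap in churn probability is "longer" near the boundary,
# where observations are more statistically distinguishable:
print(fisher_rao_distance(0.50, 0.55))  # ~0.100
print(fisher_rao_distance(0.90, 0.95))  # ~0.193
```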
Each evidence signal $S_i$ is weighted by its mutual information with the outcome $Y$:

$$w_i = \frac{I(S_i;\, Y)}{\sum_j I(S_j;\, Y)}.$$
This analysis revealed a striking asymmetry: one behavioral signal carries 1–2 orders of magnitude more mutual information with the churn outcome than all other signals combined. This signal functions as an approximate minimal sufficient statistic.
This is not a feature engineering result. It is a geometric fact. The Fisher information matrix of the churn model is nearly rank-1—the statistical manifold is effectively one-dimensional in the churn direction. Most of the “features” that a standard model would include contribute negligible Fisher information. They add parameters without adding distinguishability.
Mutual information with churn (bits):
signal_1 ████████████████████████████████ 0.847
signal_2 ███ 0.071
signal_3 ██ 0.043
signal_4 █ 0.029
signal_5 █ 0.018
...
↑
rank(g) ≈ 1 in churn direction
The system weights evidence through mutual information computed per segment. When the relationship between a signal and an outcome weakens—because customer behavior shifts, or because a product change alters the data-generating process—the mutual information drops, and the signal’s contribution contracts automatically.
This is not regularization. Regularization penalizes model complexity as an engineering choice. Here, the geometry itself contracts. The Fisher metric on the manifold changes shape as the data distribution shifts. Directions that carried information flatten out; the manifold loses a dimension. The model follows because it is built on the manifold, not bolted on top of it.
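A sketch of the per-segment weighting: a plug-in MI estimator over discretized signals, with weights normalized as $w_i = I(S_i; Y)/\sum_j I(S_j; Y)$. The estimator and toy data are illustrative; the production system's binning and segmentation are not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys) -> float:
    """Plug-in estimate of I(X; Y) in bits for paired discrete observations."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mi_weights(signals: dict, outcome) -> dict:
    """Normalized evidence weights w_i = I(S_i; Y) / sum_j I(S_j; Y)."""
    mi = {name: mutual_information(vals, outcome) for name, vals in signals.items()}
    total = sum(mi.values()) or 1.0
    return {name: v / total for name, v in mi.items()}

# Toy segment: signal_1 tracks churn closely, signal_2 much less so.
churn    = [0, 0, 1, 1, 0, 1, 0, 1]
signal_1 = [0, 0, 1, 1, 0, 1, 0, 0]   # high MI with churn
signal_2 = [1, 0, 1, 0, 1, 0, 1, 0]   # lower MI with churn
print(mi_weights({"signal_1": signal_1, "signal_2": signal_2}, churn))
```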
The architecture satisfies three conservation properties that mirror physical conservation laws:
Value conservation (attribution). Every dollar of revenue is attributed exactly once:

$$\sum_i \mathrm{attr}_i(R) = R \quad \text{for every revenue event } R.$$

Probability conservation (Bayesian updates). The posterior remains a valid distribution at every timestep:

$$\int p(\theta \mid e_{1:t})\, d\theta = 1 \quad \text{for all } t.$$

Information monotonicity (learning). Mutual information with future outcomes is non-decreasing:

$$I(X_{t+1};\, Y) \;\geq\; I(X_t;\, Y),$$

where $X_t$ is the state representation after $t$ interactions and $Y$ is the future outcome.
The system gets smarter over time—provably. Each interaction enriches the state representation. This is not a tuning claim; it is a consequence of the Bayesian structure.
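Value conservation, for instance, can be enforced as a code-level invariant rather than a reporting convention. A minimal sketch; the touchpoint names and scores are illustrative.

```python
def attribute(revenue: float, raw_scores: dict) -> dict:
    """Allocate one revenue event across touchpoints; allocations sum to revenue."""
    total = sum(raw_scores.values())
    alloc = {k: revenue * v / total for k, v in raw_scores.items()}
    assert abs(sum(alloc.values()) - revenue) < 1e-9  # value conservation
    return alloc

print(attribute(1200.0, {"intent": 0.5, "health": 0.3, "expansion": 0.2}))
# {'intent': 600.0, 'health': 360.0, 'expansion': 240.0}
```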
Three independent models, health scoring (retention-focused), expansion detection (growth-focused), and intent scoring (conversion-focused), converged on the same critical threshold:

$$\mathcal{I} = 0.362 \text{ bits of content activity}.$$

A customer below this threshold is effectively churning, regardless of which measurement system is applied. This convergence, three independent cohomology theories agreeing on a fundamental invariant, is the empirical evidence for the Grothendieck isomorphism. Different functors, same fixed point.
Deployed at a Series E company (>3,500 accounts) over 10 months, the architecture produced five core module families, organized under a single hub (see the module tree below).
All 10 modules were built and maintained as a sole analytics function, outperforming commercially available tools by 60% on revenue-relevant metrics. The architecture requires no model selection—the geometry determines the model. When the underlying data distribution shifted, the modules re-calibrated without retraining, because the Fisher metric updated and the mutual information weights followed.
CI Hub
├── CustomerHealth
│ ├── Signal → behavioral insight, segmentation
│ ├── Pulse → temporal lifecycle predictor (AUC 0.971)
│ └── Sentinel → live monitor, automated playbook triggers
├── Forecasting
│ └── Prism → win-rate prediction, pipeline coverage
├── Prospecting
│ └── Magnet → lead scoring, contact prioritization
└── Planning
└── Tactix → capacity planning, territory optimization
There is a deeper structure beneath the modules. In algebraic geometry, the theory of motives (Grothendieck, 1969) seeks a universal cohomology—a single object from which all cohomological invariants of a variety can be derived. The CI architecture has an analogous structure: the Fisher information manifold of the full joint distribution is the motive from which each module’s manifold is obtained by projection.
This remains conjectural. The modules were built by constructing their individual manifolds and verifying that they glue via the sheaf condition. Whether a single Fisher manifold on the joint distribution recovers all five as canonical projections—whether the motive exists—is an open question. The evidence is suggestive: the self-correction behavior, the MI asymmetry, the convergence on $\mathcal{I} = 0.362$, and the domain-portability of the $(P, R, V)$ framework all point toward a low-dimensional joint structure from which the modules inherit their geometry.
Methodology described in Information Geometry in Practice. Full module specifications on cv.nati.sh.