Canary LP · Technical Library

Architecture

End-to-end walkthrough of what Canary LP is, how it ingests events, how it detects loss, and how it's deployed.

System context

Canary LP is a loss-prevention analytics platform for Square-POS merchants. The user-facing product is an AI-assisted dashboard that watches every transaction, refund, void, cash-drawer event, discount, and timecard change in real time — and surfaces plain-English alerts when a pattern looks wrong.

The platform is built on a canonical data model (the CRDM) that descends from three generations of enterprise retail LP systems. Square is the first POS integration; the canonical layer is source-agnostic by design, so additional POS integrations are a parser and a field-mapping rather than a rewrite.

It is not a payment processor. It is not a POS replacement. It is not a rule-authoring engine that replaces human judgement. It is a layer on top of the POS that reads everything Square emits, applies decades of LP pattern knowledge, and writes back nothing. Canary is a lens.

Component map

At the highest level, Canary is a Python 3.12 Flask app with 29 blueprints, 12 MCP servers, a four-stage stream pipeline, and a PostgreSQL+Valkey data plane. It runs as a Docker Compose stack locally and behind a Cloudflare tunnel in hosted dev.

Canary service mesh — the 12 MCP servers and the Flask app
Fig. O-01 · Service Mesh

Traffic flows in three broad directions:

Data architecture

One Postgres database named canary with three schemas:

The split is enforced through search_path at session open; foreign keys cross schemas freely. The Field Registry catalogs every column in every table across all three schemas.

Database schema architecture — 3 schemas, cross-schema FKs, 98 tables
Fig. I-02 · Database Schema Architecture

Three tiers of immutability keep the evidence chain honest:

  1. Financial ledger (append-only). sales.transactions, sales.refund_links, sales.cash_drawer_events. No UPDATE, no DELETE. Enforced by the seed-time trigger prevent_mutation().
  2. Evidentiary (insert-only). app.fox_evidence, app.fox_evidence_access_log, sales.evidence_records, sales.event_inscriptions. If the platform accuses someone, the evidence chain must be unbroken.
  3. Audit trail (hash-chained). Every write to app.audit_log and app.fox_evidence gets a SHA-256 chain hash linking it to the previous row. Tamper with one entry, the chain breaks downstream.

Row-Level Security is enabled on every merchant-scoped table. Every session sets canary.current_merchant_id; every query is scoped to that merchant by Postgres, not by application code.
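As a rough sketch of how this scoping might be wired, the snippet below pairs an illustrative policy with a helper that produces the session-opening statement. The policy name, the merchant_id column and its text type, and the `%s` placeholder style (psycopg-like) are assumptions; only the canary.current_merchant_id setting and the sales.transactions table come from the text above.

```python
# Illustrative DDL only: the policy name and the merchant_id column are assumed.
RLS_DDL = """
ALTER TABLE sales.transactions ENABLE ROW LEVEL SECURITY;
CREATE POLICY merchant_isolation ON sales.transactions
    USING (merchant_id = current_setting('canary.current_merchant_id'));
"""


def session_scope_statement(merchant_id: str) -> tuple[str, tuple[str, ...]]:
    """Return a parameterized statement that binds the session to one merchant."""
    if not merchant_id:
        raise ValueError("merchant_id is required before any query runs")
    # set_config(name, value, is_local): is_local = false keeps the value
    # for the whole session rather than just the current transaction.
    return (
        "SELECT set_config('canary.current_merchant_id', %s, false)",
        (merchant_id,),
    )
```

A connection wrapper would execute the returned statement once at session open, before any application query. Note that in Postgres a table's owner bypasses its policies unless FORCE ROW LEVEL SECURITY is set, so the application role matters as much as the policy text.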

The detection pipeline — TSP

TSP stands for Triple Subscriber Pipeline (the name predates the fourth subscriber). Webhooks arrive at POST /webhooks/square; HMAC is validated; the raw payload is sealed into Postgres as an append-only evidence record and published to the canary:events Valkey stream. Four consumer groups read from that stream and do non-overlapping work:

TSP orchestration — 4 subscribers, Valkey streams, canonical tables
Fig. P-00 · TSP Orchestration Overview
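The HMAC check at the webhook entry point could look roughly like the sketch below. It assumes Square's documented scheme (HMAC-SHA256 over the notification URL concatenated with the raw request body, base64-encoded, delivered in the x-square-hmacsha256-signature header); confirm this against Square's current webhook documentation. The function name and parameters are hypothetical, not Canary's actual handler.

```python
import base64
import hashlib
import hmac


def is_valid_square_signature(
    raw_body: bytes,
    signature_header: str,
    signature_key: str,
    notification_url: str,
) -> bool:
    """Recompute the signature over URL + raw body and compare in constant time."""
    digest = hmac.new(
        signature_key.encode("utf-8"),
        notification_url.encode("utf-8") + raw_body,
        hashlib.sha256,
    ).digest()
    expected = base64.b64encode(digest)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

The check has to run on the raw request bytes, before any JSON parsing, since re-serializing the payload can change the byte sequence and invalidate the signature.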

Detection runs in three execution tiers. Tier 1 (stateless) fires from a single webhook payload with zero lookups — microsecond evaluation, the fast path. Tier 2 (stateful) needs shift-level or session-level aggregation, such as employee refund rate. Tier 3 (composite) correlates multiple primary-rule hits into higher-order patterns. All three tiers share the same detection_rules catalog and merchant-specific thresholds in merchant_rule_config.
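To make the three tiers concrete, here is a small sketch under invented assumptions: the event shape (type, amount_cents, employee_id), the thresholds, and the rule identifiers are all hypothetical and are not drawn from the detection_rules catalog.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def tier1_large_refund(event: dict, threshold_cents: int) -> bool:
    """Tier 1 (stateless): decided from the single payload, no lookups."""
    return event["type"] == "refund" and event["amount_cents"] >= threshold_cents


@dataclass
class ShiftRefundRate:
    """Tier 2 (stateful): keeps per-employee counts across one shift."""
    max_rate: float
    min_sales: int = 5
    sales: dict = field(default_factory=lambda: defaultdict(int))
    refunds: dict = field(default_factory=lambda: defaultdict(int))

    def observe(self, event: dict) -> bool:
        """Record the event; return True if the employee's refund rate is too high."""
        emp = event["employee_id"]
        if event["type"] == "sale":
            self.sales[emp] += 1
        elif event["type"] == "refund":
            self.refunds[emp] += 1
        if self.sales[emp] < self.min_sales:
            return False  # too few sales for the rate to be meaningful
        return self.refunds[emp] / self.sales[emp] > self.max_rate


def tier3_composite(hits: set[str], required: frozenset[str]) -> bool:
    """Tier 3 (composite): fires when every required primary rule has hit."""
    return required <= hits
```

In this shape the per-merchant thresholds (threshold_cents, max_rate) are plain arguments, which mirrors the idea that the rule logic is shared while merchant_rule_config supplies the numbers.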

Investigation lifecycle

An alert is the beginning of a narrative, not the end. A critical-severity rule (C-104 After-Hours Drawer, C-204 Untendered Order, C-301 Off-Clock Transaction, C-502 Post-Void) auto-opens a Fox case. Non-critical alerts sit in the Alert Queue until a merchant reviews them.

A Fox case has a case number (CASE-2026-NNNNN), a status lifecycle (open → investigating → escalated → closed | dismissed), an assigned reviewer, an evidence locker, and a hash-chained timeline. Evidence types include screenshots, register tapes, CCTV links, employee statements, and the triggering alerts themselves.
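The status lifecycle can be read as a small state machine. The sketch below takes the arrow notation literally (strictly linear, with closed and dismissed as terminal states), which is an assumption: the real transition table may allow, for example, dismissing a case directly from open. The zero-padded five-digit sequence in the case number is likewise inferred from the CASE-2026-NNNNN pattern.

```python
from enum import Enum


class CaseStatus(str, Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    ESCALATED = "escalated"
    CLOSED = "closed"
    DISMISSED = "dismissed"


# Assumed transition table: a literal reading of
# open -> investigating -> escalated -> closed | dismissed.
ALLOWED = {
    CaseStatus.OPEN: {CaseStatus.INVESTIGATING},
    CaseStatus.INVESTIGATING: {CaseStatus.ESCALATED},
    CaseStatus.ESCALATED: {CaseStatus.CLOSED, CaseStatus.DISMISSED},
    CaseStatus.CLOSED: set(),
    CaseStatus.DISMISSED: set(),
}


def advance(current: CaseStatus, target: CaseStatus) -> CaseStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move case from {current.value} to {target.value}")
    return target


def case_number(year: int, seq: int) -> str:
    """Format a case number in the CASE-YYYY-NNNNN shape."""
    return f"CASE-{year}-{seq:05d}"
```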

The lifecycle diagram is Atlas figure L-03. Field-level detail for every Fox table is in the Field Registry under the App domain.

Search & metrics

Owl is the AI assistant layer. It accepts natural-language questions ("Which employees have the highest refund rate this week?") and translates them into structured queries over the CRDM. Owl uses the Field Registry as a typed schema, applies tenant-scoped RLS, and returns an answer with citations back to the underlying rows.

The Risk Dictionary is a curated set of 21 predefined z-score outlier queries — the drill-down entry points merchants see on the dashboard. Each entry maps a question to a drill order (by_employee / by_location / by_day) and a set of canonical filters. No pre-built SQL; the drill engine handles query construction at runtime.
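For readers unfamiliar with z-score outlier queries, the following sketch shows the underlying arithmetic for a by_employee drill. The function name, the 2.0 threshold, and the use of a population standard deviation are assumptions; the source does not say which variant the drill engine uses.

```python
from statistics import mean, pstdev


def zscore_outliers(values: dict[str, float], threshold: float = 2.0) -> dict[str, float]:
    """Return the z-score of every key whose value lies at least
    `threshold` standard deviations from the group mean."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return {}  # all values identical, so nothing can be an outlier
    scores = {key: (value - mu) / sigma for key, value in values.items()}
    return {key: z for key, z in scores.items() if abs(z) >= threshold}
```

Given refund rates per employee, the result is the subset worth drilling into; the same function applies unchanged when the keys are locations or days.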

Metrics are computed in the metrics schema via daily and period aggregation:

The MCP surface

Every bounded service domain exposes an MCP server at its own URL prefix — /owl, /chirp, /fox, /alert, /analytics, /identity, /tsp, /raas, /bff, /condor, /atlas, plus ALX for institutional memory. Each server follows the same contract:

Auth is JWT on tool invocation; manifest/tools/health are public. The shared base kit in canary/mcp/ stamps these endpoints via create_mcp_blueprint(), so adding a new domain is a model + a handler file, not a blueprint scaffold.

This is the same contract the Agent SDK uses. Owl, ALX, and the Ops Console's QA Agent all talk to these servers as tool consumers — no internal-only API, no backdoors.

Deployment shape

Docker stack topology — Flask, four TSP subscribers, Postgres, Valkey, Ollama, PgAdmin, MailHog
Fig. I-01 · Docker Stack Topology

All environments run the same Docker Compose stack:

In dev, dev.growdirect.app routes to the Flask container through a Cloudflare tunnel. The production target is canary.growdirect.app on a Mac Mini behind a second tunnel (same compose, no nginx; Cloudflare handles TLS). No AWS. No Kubernetes. The architecture is deliberately sized to what one founder can operate.

Where to go next

You now have the shape. For depth: