# Understanding the Architecture

This guide gives you a deep mental model of how FinWatch works internally. Understanding the architecture will help you make informed decisions about configuration, rule design, and production deployment.

## High-Level Architecture

FinWatch is a single binary that embeds all of its dependencies. There is no external database to manage, no message queue to configure, and no separate rule engine to deploy. Everything runs in one process.

```mermaid theme={null}
graph LR
    subgraph FinWatch["FinWatch Engine"]
        API["HTTP API Server<br/>(Port 8081)"]
        Ingest["Transaction Ingestion<br/>(Decode + Store)"]
        DuckDB["DuckDB<br/>(Embedded Analytical<br/>Database)"]
        RuleEngine["Rule Engine<br/>(Lexer → Parser → AST<br/>→ Interpreter)"]
        RiskConsolidator["Risk Consolidator<br/>(Score Aggregation +<br/>Final Verdict)"]
    end

    YourApp["Your App<br/>(or Blnk Webhook)"]
    BlnkDB["Blnk PostgreSQL"]
    GitRepo["Git Repo<br/>(or local directory)"]
    AnomalyReporter["Anomaly Reporter<br/>(WebSocket Tunnel to<br/>Blnk Cloud Dashboard)"]

    YourApp -->|POST /inject<br/>POST /blnkwebhook| API
    API --> Ingest
    Ingest --> DuckDB
    DuckDB --> RuleEngine
    RuleEngine --> RiskConsolidator
    RiskConsolidator --> AnomalyReporter
    BlnkDB -->|Watermark Sync| DuckDB
    GitRepo -->|.ws files<br/>GitOps sync| RuleEngine
```

### Why DuckDB?

FinWatch chose [DuckDB](https://duckdb.org/) as its embedded analytical database for several compelling reasons:

1. **Columnar Storage:** DuckDB stores data in a columnar format, which is significantly more efficient for analytical queries (aggregations, filtering, scanning) than row-based databases like SQLite or PostgreSQL. When a rule asks "sum all amounts where source equals X in the last 24 hours," DuckDB only reads the `amount`, `source`, and `timestamp` columns — not the entire row.
2. **Vectorized Execution:** DuckDB processes data in batches (vectors) rather than row-by-row. This means aggregate functions like `COUNT`, `SUM`, and `AVG` execute at near-native speed, leveraging modern CPU architectures (SIMD instructions, cache-friendly access patterns).
3. **Zero External Dependencies:** DuckDB is an in-process database. There is no separate server to install, configure, or manage. It compiles into the FinWatch binary and runs inside the same process. This dramatically simplifies deployment — especially for an embeddable product that runs on the customer's server.
4. **SQL Compatibility:** DuckDB supports a rich SQL dialect, which means FinWatch can translate aggregate functions from the DSL into standard SQL queries. This makes the interpreter straightforward to implement and debug.
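To make the last point concrete, a rule clause like "sum all amounts where source equals X in the last 24 hours" could compile to a query of roughly the following shape. This is an illustrative sketch: the table and column names are assumptions, and the exact mapping is covered in the [Aggregate Functions Guide](aggregate-functions-guide.md).

```sql theme={null}
-- Illustrative only: the rough SQL shape an aggregate DSL clause
-- could compile to. Table and column names are assumed.
SELECT SUM(amount)
FROM transactions
WHERE source = ?
  AND created_at >= NOW() - INTERVAL '24 hours';
```

Because DuckDB is columnar, this query touches only the three referenced columns, no matter how wide the `transactions` table is.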

### Trade-offs

Every architectural choice has trade-offs. DuckDB's are:

* **Single-Writer Concurrency:** DuckDB allows multiple concurrent reads but only a single writer at a time. FinWatch handles this with a mutex lock (`dbMutex`) to serialize write operations (see the sketch after this list). In practice, this is not a bottleneck because transaction ingestion is I/O-bound, not CPU-bound.
* **Local Storage:** Data lives on the local filesystem as `.db` files in the `finwatch_agent/` directory. This means the data is tied to the server. If the server is lost, the local data is lost. However, the source of truth for historical data is the Blnk PostgreSQL database (synced via the watermark pattern), and the source of truth for rules is the Git repository. FinWatch can be fully reconstructed from these external sources.
* **Memory Usage:** DuckDB's performance comes from keeping data in memory. As your transaction volume grows, so does DuckDB's memory footprint. FinWatch provides a configurable `memory_limit` (default: `2GiB`) and an auto-scaling feature to manage this. See the [Production Deployment Guide](production-deployment.md) for details.
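The single-writer pattern from the first trade-off is simple to picture. The sketch below is illustrative rather than FinWatch's actual code: only the `dbMutex` name comes from the engine, while the schema and method names are assumptions.

```go theme={null}
// Illustrative sketch of the single-writer pattern. Only the dbMutex
// name comes from FinWatch; the schema and method names are assumed.
package store

import (
	"database/sql"
	"sync"
)

type Store struct {
	db      *sql.DB
	dbMutex sync.Mutex // serializes writes: DuckDB permits one writer at a time
}

// InsertTransaction takes the write lock so concurrent ingests never
// violate DuckDB's single-writer constraint.
func (s *Store) InsertTransaction(id string, amount float64) error {
	s.dbMutex.Lock()
	defer s.dbMutex.Unlock()
	_, err := s.db.Exec(
		`INSERT INTO transactions (transaction_id, amount) VALUES (?, ?)`,
		id, amount,
	)
	return err
}

// SumBySource is a read, so it runs concurrently without the lock.
func (s *Store) SumBySource(source string) (float64, error) {
	var total float64
	err := s.db.QueryRow(
		`SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE source = ?`,
		source,
	).Scan(&total)
	return total, err
}
```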

### Database Files

FinWatch creates two DuckDB databases:

| File              | Location                         | Purpose                                                                                  |
| ----------------- | -------------------------------- | ---------------------------------------------------------------------------------------- |
| `finwatch.db`     | `finwatch_agent/finwatch.db`     | Stores the `transactions` table — all ingested transaction data.                         |
| `instructions.db` | `finwatch_agent/instructions.db` | Stores compiled rules as "instructions" — the JSON representation of parsed `.ws` files. |

A temporary directory (`finwatch_agent/duckdb_temp`) is also created for DuckDB's spill-to-disk operations when queries exceed the configured memory limit.

### Connection Configuration

DuckDB is initialized with the following pragmas:

```sql theme={null}
SET access_mode = 'READ_WRITE';
SET threads = 1;
SET memory_limit = '2GiB';
SET checkpoint_threshold = '64MiB';
```

* **`threads = 1`**: Limits DuckDB to a single thread. This simplifies concurrency management and is sufficient for the single-writer model.
* **`memory_limit = '2GiB'`**: The default upper bound on memory. Configurable via the `FINWATCH_MEMORY_LIMIT` environment variable.
* **`checkpoint_threshold = '64MiB'`**: Controls how much write-ahead log (WAL) DuckDB accumulates before checkpointing it into the database file. A lower value means more frequent checkpoints (a smaller WAL and faster crash recovery); a higher value means less frequent checkpoints (less write overhead, but more WAL to replay after a crash).
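The Go-style identifiers in this guide (`dbMutex`, `astToRule()`) suggest a Go implementation, so here is one hypothetical way these settings could be applied using the open-source `github.com/marcboeker/go-duckdb` driver, which accepts DuckDB configuration options as DSN query parameters. The helper itself is an assumption for illustration, not FinWatch's startup code.

```go theme={null}
// Illustrative sketch: opening DuckDB with the documented settings.
// The helper and DSN wiring are assumed, not FinWatch's actual code.
package store

import (
	"database/sql"
	"fmt"
	"os"

	_ "github.com/marcboeker/go-duckdb" // registers the "duckdb" driver
)

func openDuckDB(path string) (*sql.DB, error) {
	// FINWATCH_MEMORY_LIMIT overrides the documented 2GiB default.
	memLimit := os.Getenv("FINWATCH_MEMORY_LIMIT")
	if memLimit == "" {
		memLimit = "2GiB"
	}
	dsn := fmt.Sprintf(
		"%s?access_mode=READ_WRITE&threads=1&memory_limit=%s&checkpoint_threshold=64MiB",
		path, memLimit,
	)
	return sql.Open("duckdb", dsn)
}
```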

## Transaction Lifecycle

<Card title="In One Line" icon="note">
  Ingest → Store → Evaluate → Decide → Alert
</Card>

**Ingestion:** A transaction enters FinWatch through the HTTP API, either directly (`POST /inject`) or via a Blnk webhook (`POST /blnkwebhook`).

**Storage:** The transaction is stored and made available for analysis.

**Evaluation Trigger:** The system asynchronously picks up the transaction and prepares everything needed to assess it:

* Loads active risk rules
* Prepares any required historical or aggregated context

**Rule Execution:** Each rule is evaluated against the transaction:

* Checks transaction attributes (e.g., amount, source)
* Uses historical patterns (e.g., frequency, past behaviour)
* Applies logic and time-based conditions

If a rule matches, it produces a risk signal.

**Risk Decision:** All risk signals are combined into a single outcome:

* A risk score is computed
* A verdict is assigned (e.g., allow, alert, review, block)
* A risk level is determined (very low → high)
* A reason is generated

**Alerting:** If the transaction is risky:

* An anomaly is sent in real time to the monitoring system
* Includes key details (transaction info, risk score, reason, verdict)

***

## Rule Compilation Pipeline

When you create or modify a `.ws` file, FinWatch detects the change and compiles the rule through a multi-stage pipeline:

```mermaid theme={null}
graph LR
    A[".ws File<br/>(text)"] -->|Lexer| B["Tokens<br/>(stream)"]
    B -->|Parser| C["AST<br/>(tree)"]
    C -->|astToRule| D["JSON Rule<br/>(storable)"]
    D -->|Interpretation| E["Evaluation<br/>Result"]
```

**Stage 1: Lexing.** The `Lexer` reads the raw `.ws` text character by character and produces a stream of `Token` objects. Each token represents a fundamental language element: a keyword (`rule`, `when`, `then`), an operator (`==`, `>`), a literal (`10000`, `"USD"`), or a delimiter (`{`, `}`).
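For intuition, here is a minimal rule in roughly the shape the stages below refer to. The exact grammar is covered in [Writing Your First Rule](writing-your-first-rule.md); the action name here is made up for illustration.

```text theme={null}
rule "high_value" {
    when amount > 10000
    then flag_transaction
}
```

The lexer reduces the `when` line to a token stream such as: keyword `when`, identifier `amount`, operator `>`, number literal `10000`.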

**Stage 2: Parsing.** The `Parser` consumes the token stream and builds an Abstract Syntax Tree (AST). The AST is a hierarchical representation of the rule's structure. At the top is a `RuleStatement` containing a name, description, a `when` expression (which can be a nested tree of logical and comparison expressions), and a `then` action expression.

**Stage 3: AST to JSON.** The `astToRule()` function converts the AST into a `Rule` struct — a flat, JSON-serializable representation that the interpreter can evaluate efficiently. Logical expressions are flattened into a list of conditions. The JSON rule is stored in the instructions database.
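The stored JSON for the rule above might look roughly like this. The field names are assumptions for illustration, not the exact schema:

```json theme={null}
{
  "name": "high_value",
  "conditions": [
    { "field": "amount", "operator": ">", "value": 10000 }
  ],
  "action": "flag_transaction"
}
```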

**Stage 4: Interpretation.** At evaluation time, the interpreter reads the JSON rule and evaluates each condition against the transaction data. This separation of parsing (compile-time) and evaluation (runtime) means that rules are only parsed once, even if they are evaluated millions of times.
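Conceptually, evaluating one flattened condition is a small, cheap operation, which is why the parse-once design pays off at high volume. The sketch below illustrates the idea; the types and names are assumed, not FinWatch's actual interpreter.

```go theme={null}
// Illustrative sketch of evaluating one flattened condition against a
// transaction. Types and names are assumed, not FinWatch's interpreter.
package rules

type Condition struct {
	Field    string  `json:"field"`
	Operator string  `json:"operator"`
	Value    float64 `json:"value"`
}

func evaluate(c Condition, txn map[string]float64) bool {
	v, ok := txn[c.Field]
	if !ok {
		return false // a missing field never matches
	}
	switch c.Operator {
	case ">":
		return v > c.Value
	case "<":
		return v < c.Value
	case "==":
		return v == c.Value
	default:
		return false
	}
}
```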

**Why this pipeline exists:** The pipeline separates concerns. The DSL provides a human-friendly authoring experience. The JSON intermediate format provides a machine-friendly evaluation target. This means you can write rules in the expressive `.ws` syntax, while the engine evaluates them in a format optimized for speed.

***

## Data Synchronization

FinWatch can synchronize data from your Blnk PostgreSQL database into its local DuckDB using the **watermark sync** pattern. This is essential for aggregate functions — if a rule needs to count "transactions from this account in the last 24 hours," the local DuckDB must contain that historical data.

### How It Works

1. FinWatch connects to the Blnk PostgreSQL database using the `BLNK_DSN` connection string.
2. It maintains a `sync_watermark` table in DuckDB that tracks the last synchronized position (a combination of `last_sync_timestamp` and `last_record_id`).
3. On each sync cycle, it queries PostgreSQL for records created **after** the watermark.
4. New records are inserted into the local DuckDB tables.
5. The watermark is updated.
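Concretely, step 3 amounts to keyset pagination over the watermark. The query below shows the shape of one sync cycle, with table and column names assumed:

```sql theme={null}
-- Illustrative shape of one sync cycle's PostgreSQL query.
-- Table and column names are assumed. The row comparison treats
-- (timestamp, id) as a strict cursor, so re-runs cannot re-fetch
-- already-synced records.
SELECT *
FROM transactions
WHERE (created_at, transaction_id) > ($1, $2)
ORDER BY created_at, transaction_id
LIMIT 1000;
```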

This approach ensures:

* **No duplicates:** Records are only synced once.
* **No gaps:** All records after the watermark are eventually synced.
* **Efficient incremental updates:** Only new records are transferred, not the entire dataset.

The sync handles four entity types: **transactions**, **identities**, **balances**, and **ledgers**.

> For the full technical specification, see the [Watermark Sync Documentation](../WATERMARK_SYNC.md).

***

## Anomaly Reporting

FinWatch communicates with the Blnk Cloud dashboard through a **WebSocket tunnel**. This is a persistent, bidirectional connection that enables real-time anomaly reporting.

When a transaction triggers one or more rules and the risk consolidator determines that the result warrants attention, an `AnomalyMessage` is sent through the tunnel. The message contains all the context a fraud analyst needs: the transaction ID, the risk score, the verdict, the reason, and the transaction's metadata.
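On the wire, such a message might look roughly like the following. The field names are illustrative, not the exact schema:

```json theme={null}
{
  "type": "anomaly",
  "transaction_id": "txn_abc123",
  "risk_score": 82,
  "risk_level": "high",
  "verdict": "review",
  "reason": "Matched rule: high_value",
  "metadata": { "currency": "USD", "source": "acct_42" }
}
```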

The WebSocket tunnel is initialized at startup and automatically reconnects if the connection is dropped. If the tunnel is unavailable, anomaly messages are logged locally but not sent — FinWatch does not block transaction processing due to a reporting failure.

***

## Next Steps

Now that you understand the architecture:

* [**Writing Your First Rule**](writing-your-first-rule.md) — Apply this knowledge to build your first rule step by step.
* [**Aggregate Functions Guide**](aggregate-functions-guide.md) — Understand how aggregate functions translate to DuckDB SQL queries.
* [**Production Deployment**](production-deployment.md) — Configure memory limits, monitoring, and backups for a production environment.
* [**Integration Guide**](integration-guide.md) — Connect FinWatch to your application via the API or Blnk webhooks.
