Reference Architecture · Whitepaper

Govern AI data after it leaves storage.

This whitepaper explains how PumaMesh helps teams understand sensitive AI data, track where it moves, keep policy attached, and produce evidence after it reaches models, tools, and people.

It is a deeper technical resource, but the buyer promise stays simple: one platform keeps movement, lineage, and audit connected across AI workflows.

Problem

Static inventories cannot explain what AI actually used.

Traditional posture tools help find sensitive data at rest. AI creates a harder problem: data moves into indexes, fine-tunes, prompts, tools, and outputs faster than periodic scans can explain.

Scan Cadence

Scheduled scans miss fast AI handoffs

AI pipelines emit embeddings, fine-tune sets, responses, and intermediate artifacts between scan windows, so a scheduled scan reports a state that no longer exists.

Platform Boundaries

Platform guardrails stop at platform boundaries

Cloud and AI platform controls help locally, but they do not show the full upstream and downstream data path.

Lineage Gap

Lineage disappears between systems

Warehouse lineage ends at export, and AI-platform lineage begins at the prompt or the training input. The gap between them is where teams lose the story.

Regulatory Pressure

AI reviews now ask for evidence

Regulators, auditors, model-risk teams, and insurers increasingly ask which data was used, where it moved, and what controls applied.

Control Surfaces

Six places where AI lineage can disappear.

A credible DSPM for AI story covers source posture, movement, training, retrieval, tool-calls, and evidence. Miss one and the chain breaks.

1. Source Data Posture

Classification and sensitivity of records before they enter AI pipelines

Traditional DSPM territory — extended so classifications are machine-readable downstream, not just visible in a dashboard.

2. Transfer Lineage

What moved, from where, to where, under which policy

The gap between warehouse and AI platform. Transfers must carry classification as first-class metadata, not opaque payloads.
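
To make the idea concrete, here is a minimal sketch of a transfer manifest that carries classification as first-class metadata alongside a payload reference. All field and class names are illustrative assumptions, not PumaMesh's actual API.

```python
from dataclasses import dataclass

# Hypothetical transfer manifest: classification travels with the move
# as structured metadata, so downstream systems never see an opaque blob.
@dataclass(frozen=True)
class TransferManifest:
    source_uri: str        # where the records came from
    destination_uri: str   # where they are going
    policy_id: str         # the policy that authorized the move
    classification: str    # e.g. "restricted", "internal", "public"
    jurisdiction: str      # e.g. "EU", "US"
    record_ids: tuple      # row-level provenance for lineage

manifest = TransferManifest(
    source_uri="warehouse://sales/customers",
    destination_uri="vector-index://support-rag",
    policy_id="POL-042",
    classification="restricted",
    jurisdiction="EU",
    record_ids=("r-1001", "r-1002"),
)

# Downstream consumers read sensitivity without inspecting the payload.
assert manifest.classification == "restricted"
```

The point of the sketch: the receiving AI platform can enforce or log against `classification` and `jurisdiction` without re-scanning the data.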

3. Training and Fine-Tune Provenance

Which sensitive records entered which model artifact

Fine-tune sets, embedding indexes, and LoRA adapters inherit the sensitivity of their source data. Provenance has to follow.

4. Retrieval and Prompt Lineage

Which records were retrieved, embedded in context, or returned in a response

RAG systems pull thousands of rows per prompt. Lineage has to tie each retrieval back to source-row sensitivity — not just a vector ID.
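
A minimal sketch of what that looks like in practice: resolving each retrieved vector ID back to its source row and sensitivity, so the lineage event records more than the vector ID. The lookup tables and field names are stand-ins for a real lineage store, not a documented schema.

```python
# Hypothetical lineage store: source rows with sensitivity, plus the
# mapping from vector IDs back to the rows they were embedded from.
source_rows = {
    "r-1001": {"table": "sales.customers", "sensitivity": "restricted"},
    "r-2040": {"table": "docs.kb",         "sensitivity": "public"},
}
vector_to_row = {"v-77": "r-1001", "v-78": "r-2040"}

def retrieval_event(prompt_id, vector_ids):
    """Tie each retrieved vector back to its source row and sensitivity."""
    return [
        {
            "prompt_id": prompt_id,
            "vector_id": v,
            "source_row": vector_to_row[v],
            "sensitivity": source_rows[vector_to_row[v]]["sensitivity"],
        }
        for v in vector_ids
    ]

events = retrieval_event("p-9", ["v-77", "v-78"])
```

With events shaped like this, a reviewer can answer "did restricted rows reach this prompt?" without reconstructing the index.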

5. Agent Tool-Call Policy

Which tools agents are allowed to invoke with which data

Agent frameworks hand out tool access broadly. DSPM for AI has to enforce ABAC on tool-calls the same way it enforces it on files.
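
The tool-call gate can be sketched as a single attribute check. This is an illustrative ABAC decision function under assumed attribute names (`allowed_tools`, `clearance`, `classification`), not a product interface.

```python
# Hypothetical ABAC gate on agent tool-calls: the decision combines
# attributes of the agent, the requested tool, and the target resource,
# mirroring how the same rule would gate a file transfer.
def allow_tool_call(agent, tool, resource):
    """Return True only when every attribute condition holds."""
    if tool not in agent["allowed_tools"]:
        return False
    if resource["classification"] == "restricted" and agent["clearance"] != "high":
        return False
    return True

agent = {"allowed_tools": {"sql_query"}, "clearance": "low"}
assert not allow_tool_call(agent, "sql_query", {"classification": "restricted"})
assert allow_tool_call(agent, "sql_query", {"classification": "internal"})
```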

6. Evidence and Audit

Exportable artifacts aligned to regulatory frameworks

EU AI Act Article 12, NIST AI RMF Measure/Manage, ISO/IEC 42001, and internal model-risk review all require artifacts the platforms don't produce on their own.

Reference Architecture

One lineage view across AI platforms.

The architecture keeps data where it already lives, enforces policy at movement and AI boundaries, and produces one lineage view across the platforms involved.

Data Plane

Keep data on the platforms where it belongs

Warehouses, feature stores, object storage, AI platforms, and agent runtimes remain in place. The architecture federates evidence instead of centralizing data.

Control Plane

Enforce policy at the boundaries that matter

Movement, retrieval, and tool-call boundaries use data attributes so governance can follow the record across workflows.

Federated Analytics Plane

Produce one lineage and evidence view

Posture, transfer, training, retrieval, and tool-call events feed a neutral view of what data was used and which policy applied.

Non-Goals

Federation, not consolidation

No forced data centralization, no replacement for platform-native guardrails, and no requirement to standardize on one AI platform.

Control Plane Detail

Write policy against data meaning, not platform names.

The control plane writes policy against data attributes such as classification, marking, owner, sensitivity, and jurisdiction. That keeps rules portable across file transfer, retrieval, tool-call, and training boundaries.
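
A minimal sketch of what "policy against data meaning" looks like: one attribute rule evaluated the same way at every boundary. The rule shape and attribute names are illustrative assumptions, not PumaMesh's policy language.

```python
# Hypothetical attribute-based rule: it names data attributes
# (classification, jurisdiction), never platforms or file paths.
RULE = {
    "classification": {"internal", "public"},  # restricted records blocked
    "jurisdiction": {"EU"},                    # only EU-resident data
}

def permitted(record, rule=RULE):
    """A record passes only if every attribute falls in its allowed set."""
    return all(record.get(attr) in allowed for attr, allowed in rule.items())

# The same check can run at transfer, retrieval, tool-call, and training time.
assert permitted({"classification": "internal", "jurisdiction": "EU"})
assert not permitted({"classification": "restricted", "jurisdiction": "EU"})
```

Because the rule never mentions a platform, moving a workload from one AI platform to another does not require rewriting it.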

Transfer Boundary

Restricted records stay inside approved jurisdictions

The movement path checks record attributes, including jurisdiction, before a transfer begins.

Retrieval Boundary

Sensitive records are filtered before model context

Retrieval workflows can filter on record attributes before the model context is assembled.
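
A minimal sketch of that filter, assuming sensitivity labels were attached at ingest time. The three-level ordering and field names are illustrative, not a defined product schema.

```python
# Hypothetical pre-context filter: drop chunks above a sensitivity
# ceiling after retrieval but before the model context is assembled.
def assemble_context(chunks, max_sensitivity="internal"):
    order = {"public": 0, "internal": 1, "restricted": 2}
    limit = order[max_sensitivity]
    return [c["text"] for c in chunks if order[c["sensitivity"]] <= limit]

chunks = [
    {"text": "pricing tiers",       "sensitivity": "public"},
    {"text": "customer SSN record", "sensitivity": "restricted"},
]

# Only the public chunk survives; the restricted one never reaches the model.
context = assemble_context(chunks)
```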

Tool-Call Boundary

AI agents inherit data-aware limits

Tool-access decisions can account for the agent's role and the attributes of the target resource.

Training Boundary

Training sets respect source sensitivity

Fine-tune inputs can be checked against record attributes before they are added to a training artifact.
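
The training-boundary check can be sketched as a screen over candidate rows that also keeps the rejects for the audit trail. Row shape and label values are illustrative assumptions.

```python
# Hypothetical training-set screen: candidate fine-tune rows are checked
# against record attributes before entering a training artifact, and the
# rejected rows are retained so the exclusion is auditable.
def screen_training_rows(rows, blocked=frozenset({"restricted"})):
    accepted, rejected = [], []
    for row in rows:
        (rejected if row["classification"] in blocked else accepted).append(row)
    return accepted, rejected

rows = [
    {"id": "r-1", "classification": "internal"},
    {"id": "r-2", "classification": "restricted"},
]
accepted, rejected = screen_training_rows(rows)
```

Keeping the rejected list matters as much as the accepted one: it is the evidence that sensitive records were considered and excluded, not silently dropped.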

Evidence and Audit

Produce artifacts reviewers now ask for.

The federated analytics plane can export evidence for auditors, cyber insurers, and model-risk teams from events already created by movement and policy workflows.
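
To make that concrete, here is a minimal sketch of folding raw movement and policy events into one exportable artifact. The event shape, framework label, and summary fields are illustrative assumptions; real frameworks define their own required content.

```python
import json

# Hypothetical audit events already emitted by movement and policy workflows.
events = [
    {"ts": "2025-01-10T12:00:00Z", "kind": "transfer",  "policy": "POL-042", "result": "allow"},
    {"ts": "2025-01-10T12:01:00Z", "kind": "retrieval", "policy": "POL-042", "result": "deny"},
]

def evidence_pack(events, framework="EU-AI-Act-Art12"):
    """Summarize raw audit events into an exportable evidence artifact."""
    return {
        "framework": framework,
        "event_count": len(events),
        "denials": sum(e["result"] == "deny" for e in events),
        "events": events,
    }

pack = evidence_pack(events)
artifact = json.dumps(pack, indent=2)  # hand this to the reviewer
```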

EU AI Act · Article 12

Automatic activity logs over the lifetime of each high-risk AI system — inputs, events, and outputs, all traceable.

NIST AI RMF · Measure & Manage

Artifacts for Measure (MS-1 to MS-4) and Manage (MG-1 to MG-4) functions, with mapped control evidence per model.

ISO/IEC 42001

AI management system evidence — risk assessment inputs, control logs, and continuous-monitoring output.

Internal Model Risk Review

Model-card inputs covering training data provenance, sensitive-data exposure, and retrieval-path inventory.

Cyber Insurance

AI data surface inventory, sensitive-data flow map, and incident-response artifacts underwriters now want at renewal.

CMMC v1, v2, v3 & FedRAMP-Aligned Controls

The product meets all 110 CMMC controls for data sharing and is FedRAMP-aligned (80+ NIST SP 800-53 Rev 5 controls). Federal and defense deployments can pull AI workloads inside the accreditation boundary with inherited evidence.

The Capability Model

How the reference architecture maps to Protect, Understand, Move, and Accelerate.

DSPM for AI is not a new product category — it is the Understand pillar extended into AI pipelines, with Protect, Move, and Accelerate running alongside.

P

Protect AI traffic

Training sets, weights, and inference traffic encrypted at rest and in flight. 100% post-quantum. Forwarding nodes never decrypt.

U

Understand the data

Content inspection, compliance framework matching, customer ontology matching, ABAC access gating — across transfers, RAG, fine-tunes, tool-calls.

M

Move AI workloads

Line-rate model delivery (70B in <60s), federated learning across sovereignty zones, Windows + Linux native.

A

Accelerate evidence

EU AI Act Article 12, NIST AI RMF, ISO 42001, CMMC v1/v2/v3 — evidence packs built from the audit stream continuously.

Implementation with PumaMesh

How Pulse and the fabric make the architecture operational.

PumaMesh is the reference architecture in product form. The fabric is the control plane — transfer, classification, and policy all sit in-path. Pulse is the federated analytics plane — posture, lineage, and evidence across every node and every AI platform. Pulse is the Understand pillar, delivered.

Control Plane

Fabric + Shield + Transit

ABAC checked at every transfer. Classification attached to records inline. Quantum-safe crypto posture held across sovereignty zones.

Federated Analytics

Pulse

Eleven views cover posture, discovery, UEBA, legal hold, audit, and AI Insights. Federated queries reach every node — no central collector.

AI Platform Coverage

Bedrock, Foundry, Vertex, Databricks, Snowflake Cortex

Gateway proxies for each platform capture retrieval, prompt, and tool-call events and feed them into the Pulse lineage graph.

Evidence Packs

Audit stream → framework-aligned artifacts

EU AI Act, NIST AI RMF, ISO/IEC 42001, CMMC v1/v2/v3 (all 110 controls for data sharing), and FedRAMP-aligned control packs — all built from the audit stream the fabric emits continuously.