Store source metadata such as title, provider, canonical URL, author label, publication date, retrieval timestamp, and source type.
Metadata storage must not silently copy protected full-text content.
The Vault should preserve intelligence, not hoard noise.
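As an illustration of the source metadata record described above, here is a minimal Python sketch. The class name `SourceMetadata`, the function `from_provider_result`, the field names, and the `FORBIDDEN_KEYS` set are all hypothetical and are not taken from TheoB itself. The sketch only builds an in-memory object and persists nothing.

```python
from dataclasses import dataclass, fields
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical record shape; fields mirror the metadata listed above."""
    title: str
    provider: str
    canonical_url: str
    source_type: str
    retrieved_at: datetime
    author_label: Optional[str] = None
    published_on: Optional[date] = None


# Assumption: key names that would indicate copied full-text content.
FORBIDDEN_KEYS = {"body", "full_text", "content", "html"}


def from_provider_result(raw: dict) -> SourceMetadata:
    """Keep only allow-listed metadata fields and reject full-text payloads."""
    leaked = raw.keys() & FORBIDDEN_KEYS
    if leaked:
        # Fail loudly so full text is never copied silently.
        raise ValueError(f"refusing to copy full-text fields: {sorted(leaked)}")
    allowed = {f.name for f in fields(SourceMetadata)}
    return SourceMetadata(**{k: v for k, v in raw.items() if k in allowed})
```

Raising an error rather than dropping the field is one possible design choice here; it makes an attempted full-text copy visible instead of silent.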
Discovery Vault Ingestion Readiness is a read-only layer that defines source trails, redaction, rights, retention, attribution, scoring, deduplication, conflict, and capsule-candidate boundaries before any Vault writes are enabled.
Store the trail. Preserve the truth.
Discovery Vault Ingestion Readiness prepares TheoB to store source-linked, redacted, rights-aware discovery records without copying protected content, hiding conflict, erasing provenance, or compressing intelligence too early.
Vault ingestion readiness is active as a non-destructive policy layer. TheoB can define Vault-safe record types, ingestion pipelines, redaction boundaries, rights checks, and future receipts, but it cannot write records, store external content, or compress capsules yet.
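The status above can be pictured as a fixed capability set: definition actions are allowed, while write, storage, and compression actions are blocked. The following is a minimal sketch under that reading. The names `ReadinessCapabilities`, `ReadOnlyLayerError`, and `require` are illustrative assumptions, not TheoB identifiers.

```python
from dataclasses import dataclass


class ReadOnlyLayerError(RuntimeError):
    """Raised when a blocked action is attempted while the layer is read-only."""


@dataclass(frozen=True)
class ReadinessCapabilities:
    # Allowed now: definitions only.
    define_record_types: bool = True
    define_pipelines: bool = True
    define_redaction_boundaries: bool = True
    define_rights_checks: bool = True
    define_future_receipts: bool = True
    # Blocked until Vault writes are enabled.
    write_records: bool = False
    store_external_content: bool = False
    compress_capsules: bool = False


def require(caps: ReadinessCapabilities, action: str) -> None:
    """Raise unless the named action is explicitly allowed (unknown actions are denied)."""
    if not getattr(caps, action, False):
        raise ReadOnlyLayerError(f"'{action}' is not permitted by the readiness layer")
```

Because the dataclass is frozen, code holding a `ReadinessCapabilities` instance cannot flip a blocked flag by ordinary attribute assignment; enabling writes would require constructing a new instance deliberately.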
Prepare structured source-grounded reference cards for future Vault retrieval and agent evidence handoff.
Reference cards must remain linked to source trails.

Store extracted claims, entities, dates, numbers, and context after scoring and conflict checks.
Claims must preserve uncertainty and not become unsupported facts.

Store future source scoring results and dimension explanations.
Scores are evidence maps, not final truth verdicts.

Store future duplicate cluster maps so repetition is separated from independent corroboration.
Deduplication must compress noise, not erase provenance.

Store future conflict maps, disagreement severity, resolution status, and review requirements.
Conflict must be preserved, not buried.

Prepare future records for image, diagram, CAD, schematic, map, dataset, and visual observation interpretation.
Visual meaning must preserve uncertainty zones and file provenance.

Prepare future handoff records for TheoB Intelligence Capsule Engine compression.
A capsule must preserve enough truth to be reawakened faithfully.

A future approved provider returns source metadata or source references.
Output: raw provider result pointer

Normalize URL, provider ID, source type, publication date, retrieval date, and author/publisher labels.
Output: normalized source metadata

Attach future scoring dimensions and confidence bands.
Output: source score attachment

Attach duplicate cluster IDs and independence signals.
Output: dedupe attachment

Attach conflict severity and resolution status when relevant.
Output: conflict attachment

Verify redaction, license, terms, caching, attribution, and retention boundaries.
Output: ingestion clearance

Create the Vault-safe record only after governance gates are approved.
Output: vault record

Prepare future capsule candidate without compressing it yet.
Output: capsule candidate

TheoB should ingest source metadata and source-linked summaries before any full external content storage.
Do not store copyrighted or licensed bodies unless rights allow it.

Every Vault-ingested discovery object must preserve provider, source URL or safe source ID, retrieval timestamp, and transformation trail.
No orphaned intelligence objects.

Records must be redacted before persistence where user data, secrets, credentials, private payloads, or sensitive fields could appear.
The Vault must not become a leak archive.

Provider terms, licensing, caching, attribution, and retention rules must be approved before storing provider-derived records.
Discovery cannot be built on rights violations.

Future Vault records should carry source scoring and confidence bands where available.
Vault storage does not equal truth.

Vault ingestion should preserve duplicate cluster context before any capsule compression.
Do not capsule raw repeated noise.

If evidence conflicts, the Vault record must preserve conflict status and review requirements.
Do not store resolved-looking records when conflicts remain unresolved.

This layer defines ingestion readiness only and does not write records.
No database mutation, no external content storage, no hidden persistence.

Medical, legal, financial, safety, founder-authorized, production, or rights-sensitive records require human review before ingestion.
High-impact uncertainty cannot be automated away.

Images, PDFs, CAD, schematics, architecture plans, maps, and datasets require file validation, rights awareness, redaction, and interpretation boundaries.
Files are not just text blobs.

Vault ingestion may prepare capsule candidates, but cannot compress intelligence capsules yet.
Compression must reduce size, not truth.
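
The pipeline stages and governance rules in this section can be sketched as a dry run that plans each stage and checks the gates without writing anything. This is a hedged illustration only: `Candidate`, `gate_failures`, `dry_run`, the field names, and the failure strings are hypothetical, and `HIGH_IMPACT` covers only a subset of the review categories named above.

```python
from dataclasses import dataclass

# Stage names and outputs follow the pipeline described in this section.
STAGES = [
    ("provider result", "raw provider result pointer"),
    ("normalize", "normalized source metadata"),
    ("score", "source score attachment"),
    ("dedupe", "dedupe attachment"),
    ("conflict", "conflict attachment"),
    ("rights and redaction check", "ingestion clearance"),
    ("vault record", "vault record"),
    ("capsule candidate", "capsule candidate"),
]

# Assumption: a simplified subset of the categories that require human review.
HIGH_IMPACT = {"medical", "legal", "financial", "safety"}


@dataclass
class Candidate:
    source_url: str
    provider: str
    retrieved_at: str
    redacted: bool = False
    rights_approved: bool = False
    conflict_unresolved: bool = False
    domain: str = "general"
    human_reviewed: bool = False


def gate_failures(c: Candidate) -> list:
    """Return the governance gates this candidate has not yet passed."""
    failures = []
    if not (c.source_url and c.provider and c.retrieved_at):
        failures.append("missing source trail")
    if not c.redacted:
        failures.append("redaction not verified")
    if not c.rights_approved:
        failures.append("rights not approved")
    if c.domain in HIGH_IMPACT and not c.human_reviewed:
        failures.append("human review required")
    return failures


def dry_run(c: Candidate) -> dict:
    """Plan the pipeline without persisting anything."""
    failures = gate_failures(c)
    cleared = not failures
    plan = []
    for stage, output in STAGES:
        # Record creation and capsule preparation stay blocked until gates pass.
        blocked = stage in ("vault record", "capsule candidate") and not cleared
        plan.append({"stage": stage, "output": output,
                     "status": "blocked" if blocked else "planned"})
    return {
        "cleared": cleared,
        "failures": failures,
        # Conflict status is carried forward rather than hidden.
        "conflict_status": "unresolved" if c.conflict_unresolved else "none",
        "plan": plan,
        "writes_performed": 0,  # this sketch never persists data
    }
```

For example, `dry_run(Candidate("https://example.org/a", "example-provider", "2024-01-01T00:00:00Z"))` reports the candidate as not cleared, because redaction and rights have not been verified, and marks the last two stages as blocked.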