Provenance Types
This guide covers Scallop’s provenance tracking system - a powerful semiring-based framework for reasoning with different semantics.
Overview
Provenance determines how facts are tagged and how tags combine during reasoning. Scallop’s unified execution engine can switch between discrete logic, probabilistic reasoning, and differentiable computation while maintaining full traceability of derivations.
What is Provenance?
In Scallop, every fact has an associated tag that tracks metadata:
#![allow(unused)]
fn main() {
// Without provenance (standard DataLog)
edge(0, 1)
// With probabilistic provenance
0.8::edge(0, 1) // 80% confidence
// With counting provenance
5::edge(0, 1) // Appears 5 times
}
Key insight: The same Scallop program can execute with different provenance types to answer different questions:
- Unit - “Does this fact hold?” (true/false)
- Natural - “How many derivations exist?” (count)
- MinMaxProb - “What’s the confidence?” (probability)
- TopKProofs - “What are the top-K explanations?” (proofs + probability)
The Three-Stage Tag Flow
Input Facts Runtime Execution Output Results
↓ ↓ ↓
InputTag ──tagging_fn()→ Tag ──operations→ Tag ──recover_fn()→ OutputTag
Example:
#![allow(unused)]
fn main() {
// Input: User provides probability
InputTag = 0.8
// Internal: Converted to provenance tag
Tag = 0.8 // For MinMaxProbProvenance
// Output: Result displayed to user
OutputTag = 0.56 // After combining: 0.8 * 0.7 = 0.56
}
The Provenance Trait
Trait Definition
#![allow(unused)]
fn main() {
pub trait Provenance: Clone + 'static {
/// The input tag space (what users provide)
type InputTag: Clone + Debug + StaticInputTag;
/// The internal tag space (used during execution)
type Tag: Tag;
/// The output tag space (what users see in results)
type OutputTag: Clone + Debug + Display;
/// Name of the provenance
fn name(&self) -> String;
/// Convert input tag to internal tag
fn tagging_fn(&self, ext_tag: Self::InputTag) -> Self::Tag;
/// Convert optional input tag (None → one())
fn tagging_optional_fn(&self, ext_tag: Option<Self::InputTag>) -> Self::Tag {
match ext_tag {
Some(et) => self.tagging_fn(et),
None => self.one(),
}
}
/// Convert internal tag to output tag
fn recover_fn(&self, t: &Self::Tag) -> Self::OutputTag;
/// Check if a fact should be discarded
fn discard(&self, t: &Self::Tag) -> bool;
/// Zero element (disjunction identity)
fn zero(&self) -> Self::Tag;
/// One element (conjunction identity)
fn one(&self) -> Self::Tag;
/// Add operation (disjunction, OR)
fn add(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag;
/// Multiply operation (conjunction, AND)
fn mult(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag;
/// Negate operation (NOT)
fn negate(&self, t: &Self::Tag) -> Option<Self::Tag> {
None // Default: negation not supported
}
/// Check if tag has saturated (convergence)
fn saturated(&self, t_old: &Self::Tag, t_new: &Self::Tag) -> bool;
/// Get weight of a tag (for ranking)
fn weight(&self, tag: &Self::Tag) -> f64 {
1.0 // Default: all tags equally weighted
}
}
}
Associated Types
InputTag - What users provide when adding facts:
#![allow(unused)]
fn main() {
ctx.add_facts("edge", vec![
(Some(0.8.into()), (0, 1).into()), // InputTag = f64 for MinMaxProb
], false)?;
}
Tag - Internal representation during execution:
#![allow(unused)]
fn main() {
// MinMaxProbProvenance uses f64 as Tag
type Tag = f64;
// TopKProofsProvenance uses DNFFormula
type Tag = Rc<DNFFormula>;
}
OutputTag - What users see in results:
#![allow(unused)]
fn main() {
for elem in results.iter() {
println!("Tag: {}, Tuple: {:?}", elem.tag, elem.tuple);
// elem.tag is OutputTag type
}
}
Available Provenance Types
Discrete Provenances
UnitProvenance - Standard DataLog
No tracking - Classic logical reasoning.
#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::discrete::unit::UnitProvenance;
impl Provenance for UnitProvenance {
type InputTag = ();
type Tag = Unit;
type OutputTag = Unit;
fn add(&self, _t1: &Unit, _t2: &Unit) -> Unit { Unit } // OR
fn mult(&self, _t1: &Unit, _t2: &Unit) -> Unit { Unit } // AND
}
}
Use case: Traditional DataLog queries without metadata.
#![allow(unused)]
fn main() {
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
ctx.add_facts("edge", vec![
(None, (0, 1).into()), // No tag
(None, (1, 2).into()),
], false)?;
}
BooleanProvenance - Boolean Algebra
Boolean tags - Negation-as-failure support.
#![allow(unused)]
fn main() {
type InputTag = bool;
type Tag = bool;
type OutputTag = bool;
// Semiring operations:
fn add(&self, t1: &bool, t2: &bool) -> bool { *t1 || *t2 } // OR
fn mult(&self, t1: &bool, t2: &bool) -> bool { *t1 && *t2 } // AND
fn negate(&self, t: &bool) -> Option<bool> { Some(!*t) } // NOT
}
Use case: Programs with negation.
NaturalProvenance - Counting
Count multiplicity - Track number of derivations.
#![allow(unused)]
fn main() {
type InputTag = usize;
type Tag = usize;
type OutputTag = usize;
// Semiring operations:
fn add(&self, t1: &usize, t2: &usize) -> usize { t1 + t2 } // Sum
fn mult(&self, t1: &usize, t2: &usize) -> usize { t1 * t2 } // Product
}
Use case: Cardinality queries, bag semantics.
#![allow(unused)]
fn main() {
let prov = NaturalProvenance::default();
// Fact appears 3 times
ctx.add_facts("edge", vec![(Some(3), (0, 1).into())], false)?;
}
ProofsProvenance - Derivation Tracking
Proof trees - Track all derivation paths (no probabilities).
#![allow(unused)]
fn main() {
type InputTag = Exclusion;
type Tag = Rc<Proofs>;
type OutputTag = Rc<Proofs>;
}
Use case: Debugging, explainability without probabilities.
Probabilistic Provenances
MinMaxProbProvenance - Probabilistic (Min-Max Semiring)
Probability tracking with min-max semantics.
#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::probabilistic::min_max_prob::MinMaxProbProvenance;
impl Provenance for MinMaxProbProvenance {
type InputTag = f64;
type Tag = f64;
type OutputTag = f64;
fn add(&self, t1: &f64, t2: &f64) -> f64 {
t1.max(*t2) // Best alternative (OR)
}
fn mult(&self, t1: &f64, t2: &f64) -> f64 {
t1.min(*t2) // Weakest link (AND)
}
fn negate(&self, p: &f64) -> Option<f64> {
Some(1.0 - p) // Complement probability
}
}
}
Semiring intuition:
add(OR) = take maximum probability (best alternative)mult(AND) = take minimum probability (weakest link)
Use case: Fuzzy logic, confidence propagation.
#![allow(unused)]
fn main() {
let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
ctx.add_facts("edge", vec![
(Some(0.8.into()), (0, 1).into()),
(Some(0.7.into()), (1, 2).into()),
], false)?;
// path(0, 2) derives with prob = min(0.8, 0.7) = 0.7
}
AddMultProbProvenance - Probabilistic (Add-Mult Semiring)
Independent events probability.
#![allow(unused)]
fn main() {
fn add(&self, t1: &f64, t2: &f64) -> f64 {
t1 + t2 - (t1 * t2) // Inclusion-exclusion (OR)
}
fn mult(&self, t1: &f64, t2: &f64) -> f64 {
t1 * t2 // Independent events (AND)
}
}
Semiring intuition:
add(OR) = inclusion-exclusion principlemult(AND) = independent probability multiplication
Use case: Statistical reasoning, independent events.
TopKProofsProvenance - Top-K Most Probable Proofs
Track top-K derivation proofs with probabilities.
#![allow(unused)]
fn main() {
type InputTag = InputExclusiveProb;
type Tag = Rc<DNFFormula>;
type OutputTag = f64;
}
Internally tracks:
- DNF formula representing proof combinations
- Computes probability via Weighted Model Counting (WMC)
- Returns top-K most probable derivations
Use case: Explanation generation, ranking derivations.
#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::probabilistic::top_k_proofs::TopKProofsProvenance;
let prov = TopKProofsProvenance::<RcFamily>::new(3); // Top-3 proofs
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
// Results include probability computed from top proofs
}
ProbProofsProvenance - Exact Probability with All Proofs
Complete proof tracking with exact probabilities.
#![allow(unused)]
fn main() {
type InputTag = ProbProofs<RcFamily>;
type Tag = Rc<ProbProofs<RcFamily>>;
type OutputTag = f64;
}
Tracks all derivation paths and computes exact probability via SDD (Sentential Decision Diagram).
Use case: Exact probabilistic reasoning, complete explanations.
Differentiable Provenances
These provenances support gradient computation for integration with machine learning frameworks like PyTorch.
DiffTopKProofsProvenance<T> - Differentiable Top-K
Backpropagation support for neural-symbolic integration.
#![allow(unused)]
fn main() {
type InputTag = InputExclusiveDiffProb<T>;
type Tag = Rc<DNFFormula>;
type OutputTag = (f64, Vec<T>); // (probability, gradients)
}
External tag T: Typically a PyTorch tensor for gradient tracking.
Use case: Neural-symbolic learning, gradient-based optimization.
DiffTopKProofsDebugProvenance<T> - Differentiable with Proofs
Debug variant that returns proofs alongside gradients.
#![allow(unused)]
fn main() {
type InputTag = InputExclusiveDiffProbWithID<T>;
type OutputTag = (f64, Vec<T>, Vec<Proofs>); // (prob, gradients, proofs)
}
Unique feature: Supports user-provided stable IDs for facts.
Use case: Debugging neural-symbolic systems, stable fact identification.
Using Different Provenances
Creating a Context with Provenance
#![allow(unused)]
fn main() {
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::*;
use scallop_core::runtime::env::RcFamily;
// Standard DataLog
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
// Probabilistic (min-max)
let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
// Top-3 proofs
let prov = TopKProofsProvenance::<RcFamily>::new(3);
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
}
Adding Facts with Tags
#![allow(unused)]
fn main() {
use scallop_core::common::tuple::Tuple;
// Unit provenance (no tag)
ctx.add_facts("edge", vec![
(None, Tuple::from((0i32, 1i32))),
], false)?;
// Probabilistic provenance
ctx.add_facts("edge", vec![
(Some(0.8.into()), Tuple::from((0i32, 1i32))),
(Some(0.9.into()), Tuple::from((1i32, 2i32))),
], false)?;
// Natural provenance (count)
ctx.add_facts("edge", vec![
(Some(5usize), Tuple::from((0i32, 1i32))), // Appears 5 times
], false)?;
}
Interpreting Output Tags
#![allow(unused)]
fn main() {
ctx.run()?;
let results = ctx.computed_relation_ref("path")?;
for elem in results.iter() {
match provenance_type {
"unit" => {
println!("Tuple: {:?}", elem.tuple);
// elem.tag is Unit (no info)
}
"minmaxprob" => {
println!("Probability: {}, Tuple: {:?}", elem.tag, elem.tuple);
// elem.tag is f64
}
"natural" => {
println!("Count: {}, Tuple: {:?}", elem.tag, elem.tuple);
// elem.tag is usize
}
_ => {}
}
}
}
Semiring Operations
Provenance forms a semiring with two operations:
Addition (Disjunction, OR)
Combines alternative derivations of the same fact.
rel path(0, 2) :- edge(0, 1), edge(1, 2) // Derivation 1
rel path(0, 2) :- edge(0, 2) // Derivation 2
How provenances combine alternatives:
| Provenance | add(t1, t2) | Example |
|---|---|---|
| Unit | Unit | No change |
| Boolean | t1 ∨ t2 | true ∨ false = true |
| Natural | t1 + t2 | 3 + 5 = 8 |
| MinMaxProb | max(t1, t2) | max(0.8, 0.6) = 0.8 |
| AddMultProb | t1 + t2 - t1*t2 | 0.8 + 0.6 - 0.48 = 0.92 |
Multiplication (Conjunction, AND)
Combines dependent facts in a rule body.
rel path(a, c) :- edge(a, b), edge(b, c) // Both facts needed
How provenances combine conjunctions:
| Provenance | mult(t1, t2) | Example |
|---|---|---|
| Unit | Unit | No change |
| Boolean | t1 ∧ t2 | true ∧ false = false |
| Natural | t1 * t2 | 3 * 5 = 15 |
| MinMaxProb | min(t1, t2) | min(0.8, 0.9) = 0.8 |
| AddMultProb | t1 * t2 | 0.8 * 0.9 = 0.72 |
Identity Elements
Every semiring has zero (additive identity) and one (multiplicative identity):
| Provenance | Zero | One |
|---|---|---|
| Unit | Unit | Unit |
| Boolean | false | true |
| Natural | 0 | 1 |
| MinMaxProb | 0.0 | 1.0 |
Properties:
add(t, zero) = t
mult(t, one) = t
Example: MinMaxProb Semiring
rel 0.8::edge(0, 1)
rel 0.9::edge(1, 2)
rel 0.7::edge(0, 2)
rel path(a, b) = edge(a, b)
rel path(a, c) = edge(a, b), edge(b, c)
query path(0, 2)
Derivation:
- Direct path:
edge(0, 2)→ probability =0.7 - Indirect path:
edge(0, 1) ∧ edge(1, 2)→ probability =min(0.8, 0.9) = 0.8 - Combine alternatives:
add(0.7, 0.8) = max(0.7, 0.8) = 0.8
Result: path(0, 2) has probability 0.8
Complete Example: Probabilistic Path Finding
Here’s a full program demonstrating probabilistic reasoning.
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::probabilistic::min_max_prob::MinMaxProbProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::tuple::Tuple;
use scallop_core::common::value::Value;
fn main() -> Result<(), IntegrateError> {
// Create context with min-max probability provenance
let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
// Define schema
ctx.add_relation("edge(i32, i32)")?;
// Add probabilistic edges
ctx.add_facts("edge", vec![
(Some(0.8.into()), Tuple::from((0i32, 1i32))), // 80% confidence
(Some(0.9.into()), Tuple::from((1i32, 2i32))), // 90% confidence
(Some(0.7.into()), Tuple::from((2i32, 3i32))), // 70% confidence
(Some(0.6.into()), Tuple::from((0i32, 2i32))), // 60% confidence (shortcut)
], false)?;
// Define transitive closure
ctx.add_rule("path(a, b) = edge(a, b)")?;
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")?;
// Execute
ctx.run()?;
// Query paths with probabilities
let path = ctx.computed_relation_ref("path")?;
println!("Probabilistic Paths:");
for elem in path.iter() {
if let (Some(Value::I32(from)), Some(Value::I32(to))) =
(elem.tuple.get(0), elem.tuple.get(1))
{
println!(" path({}, {}) with confidence: {:.2}", from, to, elem.tag);
}
}
Ok(())
}
Expected Output:
Probabilistic Paths:
path(0, 1) with confidence: 0.80
path(1, 2) with confidence: 0.90
path(2, 3) with confidence: 0.70
path(0, 2) with confidence: 0.80 // max(0.6 direct, 0.8 via 1)
path(1, 3) with confidence: 0.70 // min(0.9, 0.7)
path(0, 3) with confidence: 0.70 // multiple paths, best is 0.70
Explanation:
-
path(0, 2)has two derivations:- Direct edge with prob 0.6
- Via node 1: min(0.8, 0.9) = 0.8
- Combined: max(0.6, 0.8) = 0.8
-
path(0, 3)has multiple paths:- Via 1, 2: min(0.8, 0.9, 0.7) = 0.7
- Via 2: min(0.8, 0.7) = 0.7 (using shortcut)
- Combined: max(0.7, 0.7) = 0.7
Comparing Probabilistic Provenances
MinMaxProb vs AddMultProb vs TopKProofs
#![allow(unused)]
fn main() {
// Same input facts
let facts = vec![
(Some(0.8.into()), (0, 1).into()),
(Some(0.9.into()), (1, 2).into()),
];
// Program: path(a, c) :- edge(a, b), edge(b, c)
}
Results for path(0, 2):
| Provenance | Probability | Interpretation |
|---|---|---|
| MinMaxProb | 0.8 | Weakest link: min(0.8, 0.9) |
| AddMultProb | 0.72 | Independent: 0.8 × 0.9 |
| TopKProofs | 0.72 | Via WMC (equivalent to AddMultProb for single proof) |
When to use each:
- MinMaxProb - Fuzzy logic, confidence propagation, when conjunction means “limited by weakest”
- AddMultProb - Statistical independence, Bayesian reasoning
- TopKProofs - When you need explanations and multiple derivation paths
- ProbProofs - When you need exact probabilities with all proof trees
Advanced: Weighted Model Counting (WMC)
For proof-based provenances (TopKProofs, ProbProofs), probabilities are computed via Weighted Model Counting over Boolean formulas.
How It Works
-
Proofs to Formula:
Proofs: {{fact_0, fact_1}, {fact_2}} Formula: (f₀ ∧ f₁) ∨ f₂ -
Build SDD (Sentential Decision Diagram) for efficient computation
-
Evaluate with probability semiring:
f₀ = 0.8, f₁ = 0.9, f₂ = 0.5 WMC = (0.8 × 0.9) + 0.5 - (0.8 × 0.9 × 0.5) = 0.72 + 0.5 - 0.36 = 0.86
Key insight: Proof-based provenances use inclusion-exclusion to compute exact probabilities from potentially overlapping proofs.
Provenance Selection Guide
Quick Reference
| Use Case | Recommended Provenance | Why |
|---|---|---|
| Standard DataLog | UnitProvenance | No overhead, classic semantics |
| Counting derivations | NaturalProvenance | Track multiplicity |
| Fuzzy logic / confidence | MinMaxProbProvenance | Simple, efficient |
| Statistical reasoning | AddMultProbProvenance | Models independence |
| Need explanations | TopKProofsProvenance | Provides proof trees |
| Exact probabilities | ProbProofsProvenance | Complete computation |
| ML integration | DiffTopKProofsProvenance | Gradient support |
| Debugging proofs | DiffTopKProofsDebugProvenance | Full observability |
Performance Considerations
Computational cost (low to high):
- UnitProvenance - No overhead
- NaturalProvenance, MinMaxProbProvenance - Simple arithmetic
- AddMultProbProvenance - Inclusion-exclusion
- TopKProofsProvenance - WMC with top-K pruning
- ProbProofsProvenance - Full WMC (most expensive)
- DiffTopKProofsProvenance - WMC + gradient computation
Memory usage:
- Unit, Boolean, Natural - Minimal (single value)
- Probabilistic (f64) - 8 bytes per fact
- Proof-based - Stores DNF formulas (can be large)
Next Steps
- Foreign Functions - Extend Scallop with custom computations
- Foreign Predicates - Tag facts with probabilities
- Rust Examples - See provenance in action
- Getting Started - Basic examples with different provenances
Resources
- Trait Definition:
scallop-core/src/runtime/provenance/provenance.rs - Implementations:
scallop-core/src/runtime/provenance/{discrete,probabilistic,differentiable}/ - Research Paper: Scallop: A Language for Neurosymbolic Programming
- WMC Background: Weighted Model Counting