Configuring Provenance
Provenance in Scallopy determines how Scallop tracks and computes probabilities during execution. This guide shows you how to configure provenance in Python and work with probabilistic facts using the ScallopContext API.
Setting Provenance in Python
When creating a ScallopContext, you specify the provenance type as a string parameter.
Basic Provenance Configuration
import scallopy
# Standard DataLog (no probability tracking)
ctx = scallopy.ScallopContext(provenance="unit")
# Min-max probability bounds (fast, conservative)
ctx = scallopy.ScallopContext(provenance="minmaxprob")
# Add-mult probability (fast, approximate)
ctx = scallopy.ScallopContext(provenance="addmultprob")
# Top-K proofs with exact probability
ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)
Available Provenance Types
The provenance parameter accepts the following values:
Discrete Provenances (no probability):
"unit"- Standard DataLog, no provenance tracking"proofs"- Collect all derivation proofs"tropical"- Min-add semiring (positive integers + infinity)"boolean"- Boolean logic tracking"natural"- Natural number semiring
Probabilistic Provenances:
"minmaxprob"- Min-max probability bounds (fast, conservative)"addmultprob"- Add-mult probability (fast, approximate)"topkproofs"- Top-K most probable proofs with exact probability via WMC"probproofs"- All proofs with exact probability"topbottomkclauses"- Top/bottom-K clauses for negation and aggregation
Differentiable Provenances (for PyTorch integration):
"difftopkproofs"- Differentiable top-K proofs"difftopkproofsdebug"- With stable fact IDs for debugging"diffminmaxprob"- Differentiable min-max probability"diffaddmultprob"- Differentiable add-mult probability"diffsamplekproofs"- Differentiable unbiased sampling of K proofs"difftopbottomkclauses"- Differentiable top/bottom-K with full negation support
The K Parameter
Provenances like topkproofs and difftopkproofs require a k parameter specifying how many proofs to track:
# Keep top 5 most probable proofs for each derived fact
ctx = scallopy.ScallopContext(provenance="topkproofs", k=5)
Tradeoff:
- Larger K = More memory, more accurate probabilities, slower execution
- Smaller K = Less memory, approximate probabilities, faster execution
Typical values: k=3 for development, k=5-10 for production
Train vs. Test K
For machine learning applications, you can use different K values during training and testing:
ctx = scallopy.ScallopContext(
provenance="difftopkproofs",
train_k=3, # Smaller K during training (faster)
test_k=10 # Larger K during inference (more accurate)
)
Adding Probabilistic Facts
Once you’ve configured provenance, you add facts with probabilities using the add_facts() method.
Basic Probabilistic Facts
The most common format is a list of (probability, tuple) pairs:
import scallopy
ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))
# Add facts with probabilities
ctx.add_facts("edge", [
(0.1, (0, 1)), # 10% probability
(0.2, (1, 2)), # 20% probability
(0.3, (2, 3)), # 30% probability
])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")
ctx.run()
# Inspect results with probabilities
for (prob, (start, end)) in ctx.relation("path"):
print(f"Path {start}→{end}: probability {prob:.3f}")
Output:
Path 0→1: probability 0.100
Path 1→2: probability 0.200
Path 2→3: probability 0.300
Path 0→2: probability 0.200
Path 1→3: probability 0.300
Path 0→3: probability 0.300
Facts Without Probabilities
If provenance supports it, you can omit probabilities (defaults to 1.0):
ctx.add_facts("edge", [
(0, 1), # Implicitly probability 1.0
(1, 2),
])
Mutual Exclusion (Disjunctions)
When facts are mutually exclusive (only one can be true), specify disjunctions to get correct probabilities:
ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)
ctx.add_relation("color", (int, str))
# Object 0 can be blue OR green (mutually exclusive)
ctx.add_facts("color", [
(0.7, (0, "blue")),
(0.3, (0, "green")),
], disjunctions=[
[0, 1] # Indices 0 and 1 are mutually exclusive
])
# Object 1 can be red OR yellow
ctx.add_facts("color", [
(0.6, (1, "red")),
(0.4, (1, "yellow")),
], disjunctions=[
[2, 3] # Indices 2 and 3 are mutually exclusive
])
Important: The disjunctions parameter uses indices into the facts list being added. Each disjunction is a list of indices that are mutually exclusive.
Multiple disjunction groups:
ctx.add_facts("relation", [
(0.5, (0,)), # Index 0
(0.5, (1,)), # Index 1
(0.3, (2,)), # Index 2
(0.7, (3,)), # Index 3
], disjunctions=[
[0, 1], # Facts 0 and 1 are mutually exclusive
[2, 3], # Facts 2 and 3 are mutually exclusive
])
Loading Facts from CSV
For large datasets, load facts directly from CSV files:
ctx = scallopy.ScallopContext(provenance="minmaxprob")
# Load CSV with implicit probability 1.0
ctx.add_relation("edge", (int, int), load_csv="edges.csv")
CSV format with probabilities:
0.9,0,1
0.8,1,2
0.7,2,3
# First column is probability
ctx.add_relation("edge", (int, int), load_csv="edges_prob.csv")
For advanced CSV options, see ScallopContext documentation.
Differentiable Provenance
Differentiable provenances integrate with PyTorch, enabling gradient-based learning over symbolic reasoning.
Basic Setup with PyTorch
import torch
import scallopy
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("edge", (int, int))
# Add facts with torch tensors
ctx.add_facts("edge", [
(torch.tensor(0.9), (0, 1)),
(torch.tensor(0.8), (1, 2)),
(torch.tensor(0.7), (2, 3)),
])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")
ctx.run()
# Results are tensors with gradients
for (prob_tensor, (start, end)) in ctx.relation("path"):
print(f"Path {start}→{end}: {prob_tensor}")
Forward Functions for Neural Networks
The most common pattern is using forward_function() to create differentiable modules:
import torch
import scallopy
# Create context and define program
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("digit_1", int, range(10))
ctx.add_relation("digit_2", int, range(10))
ctx.add_rule("sum_2(a + b) = digit_1(a) and digit_2(b)")
# Create forward function
forward = ctx.forward_function("sum_2", list(range(19)))
# Use in training loop
digit_1 = torch.softmax(torch.randn((16, 10), requires_grad=True), dim=1)
digit_2 = torch.softmax(torch.randn((16, 10), requires_grad=True), dim=1)
# Forward pass through Scallop
sum_2 = forward(digit_1=digit_1, digit_2=digit_2)
# Backward pass computes gradients
loss = torch.nn.BCELoss()(sum_2, ground_truth)
loss.backward()
# Gradients flow back to digit_1 and digit_2
Disjunctions with PyTorch
Mutual exclusion works with differentiable provenances too:
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("obj_color", (int, str))
# Object colors with mutual exclusion
ctx.add_facts("obj_color", [
(torch.tensor(0.99), (0, "blue")),
(torch.tensor(0.01), (0, "green")),
], disjunctions=[[0, 1]])
ctx.add_facts("obj_color", [
(torch.tensor(0.86), (1, "blue")),
(torch.tensor(0.14), (1, "green")),
], disjunctions=[[0, 1]])
Stable Fact IDs for Debugging
The difftopkproofsdebug provenance supports user-provided stable fact IDs for tracking and debugging:
import torch
import scallopy
ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("edge", (int, int))
# Add facts with stable IDs
ctx.add_facts("edge", [
((torch.tensor(0.9), 1), (0, 1)), # Fact ID = 1
((torch.tensor(0.8), 2), (1, 2)), # Fact ID = 2
((torch.tensor(0.7), 3), (2, 3)), # Fact ID = 3
])
Fact ID requirements:
- Start from 1 (not 0)
- Contiguous (no gaps)
- Unique across all facts
Use cases:
- Debugging: Trace which facts contributed to conclusions
- HNLE MCP: Retract facts by stable ID
- Provenance auditing: Track data lineage
For detailed usage, see Debugging Probabilistic Programs.
Custom Provenance
For advanced use cases, you can implement custom provenance semirings in Python.
Built-in Python Provenances
Scallopy includes Python-implemented provenances:
ctx = scallopy.ScallopContext(provenance="diffaddmultprob2")
# Equivalent to diffaddmultprob but implemented in Python
Available Python provenances:
"diffaddmultprob2"- Add-mult probability"diffnandmultprob2"- NAND-mult probability"diffmaxmultprob2"- Max-mult probability
Custom Provenance Objects
You can create custom provenance semirings by subclassing ScallopProvenance:
from scallopy import ScallopProvenance
class MyCustomProvenance(ScallopProvenance):
def __init__(self):
super().__init__()
def base(self, tag):
# Define base tagging
return tag
def add(self, tag1, tag2):
# Define disjunction (OR)
return tag1 + tag2
def mult(self, tag1, tag2):
# Define conjunction (AND)
return tag1 * tag2
# Use custom provenance
ctx = scallopy.ScallopContext(custom_provenance=MyCustomProvenance())
Provenance semiring operations:
base(tag)- Tag a base factadd(t1, t2)- Combine alternative derivations (disjunction)mult(t1, t2)- Combine rule body atoms (conjunction)negate(tag)- Negate a tag (optional, for negation support)
Common Patterns
Pattern 1: Choosing the Right Provenance
Decision tree:
-
Do you need probabilities?
- No → Use
"unit"(fastest) - Yes → Continue
- No → Use
-
Do you need gradients for ML?
- No → Continue to #3
- Yes → Continue to #4
-
Non-differentiable probabilistic:
- Fast bounds okay? →
"minmaxprob" - Need exact probability? →
"topkproofs"with k=5-10 - Need all proofs? →
"probproofs"(expensive)
- Fast bounds okay? →
-
Differentiable for ML:
- Standard case →
"difftopkproofs"with k=3-5 - Need negation/aggregation →
"difftopbottomkclauses" - Debugging needed →
"difftopkproofsdebug"
- Standard case →
Pattern 2: Incremental Fact Addition
Add facts incrementally during execution:
ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))
# Add initial facts
ctx.add_facts("edge", [(0.9, (0, 1))])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.run()
# Add more facts later
ctx.add_facts("edge", [(0.8, (1, 2))])
ctx.run() # Re-run with new facts
Pattern 3: Batched Facts with Input Mappings
For ML applications, use input mappings to define the domain:
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
# Define relation with input mapping (domain)
ctx.add_relation("digit", int, input_mapping=range(10))
# Add batched facts (probability distribution over domain)
digit_probs = torch.softmax(torch.randn(10), dim=0)
ctx.add_facts("digit", digit_probs)
Input mapping defines the expected domain, and facts can be provided as:
- Tensor of shape
(domain_size,)- probability distribution - List of tuples - sparse facts
Pattern 4: Provenance for Different Reasoning Tasks
Knowledge graph reasoning:
ctx = scallopy.ScallopContext(provenance="topkproofs", k=10)
# Need exact probabilities for multi-hop reasoning
Neurosymbolic learning:
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
# Gradients for training, K=3 for efficiency
Approximate inference:
ctx = scallopy.ScallopContext(provenance="minmaxprob")
# Fast bounds for large-scale applications
Debugging and explanation:
ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=5)
# Stable fact IDs for tracing derivations
Pattern 5: WMC Optimization
Enable WMC with disjunctions for better probability computation:
ctx = scallopy.ScallopContext(
provenance="topkproofs",
k=5,
wmc_with_disjunctions=True # Better handling of disjunctions
)
This improves probability computation when you have many mutually exclusive facts.
Summary
- Set provenance when creating
ScallopContext(provenance="...") - 18 provenance types available: discrete, probabilistic, differentiable
- Add probabilistic facts with
add_facts(relation, [(prob, tuple), ...]) - Mutual exclusion specified via
disjunctionsparameter - Differentiable provenances integrate with PyTorch for gradient-based learning
- Stable fact IDs available in
difftopkproofsdebugfor debugging - Custom provenances possible by subclassing
ScallopProvenance
For more details:
- Provenance Concepts - Theory and semiring framework
- Provenance Library - All 18 provenance types explained
- Debugging Proofs - Using difftopkproofsdebug
- ScallopContext API - Complete API reference
- Creating Modules - PyTorch integration for ML