Crash Course
Welcome to Scallop! This crash course will get you started with Scallop in about 15 minutes. You’ll learn the basics of logic programming, probabilistic reasoning, and Python integration.
What is Scallop?
Scallop is a Datalog-based language that combines three paradigms:
- Logic Programming: Write declarative rules to derive new facts from existing ones
- Probabilistic Reasoning: Attach probabilities to facts and track uncertainty through computations
- Differentiable Computing: Integrate with machine learning frameworks like PyTorch for neurosymbolic AI
Scallop is built on a Provenance Semiring framework that tracks how conclusions are derived. This means you can not only compute answers, but also understand why those answers exist and how probable they are.
Common use cases:
- Knowledge graph reasoning
- Probabilistic databases
- Neurosymbolic AI (combining neural networks with symbolic logic)
- Program analysis
- Question answering with uncertainty
Let’s dive in!
Installation
Before you begin, make sure you have Scallop installed:
For Command-Line Programs
Install the Scallop CLI tools (scli, sclrepl):
From binary releases:
# Download from https://github.com/scallop-lang/scallop/releases
# Or build from source:
git clone https://github.com/scallop-lang/scallop.git
cd scallop
cargo build --release
# Binaries in target/release/
Verify installation:
scli --version
# Output: scli 0.2.5
For Python Integration
Install scallopy for Python:
pip install scallopy
Verify installation:
import scallopy
print(scallopy.__version__)
For complete installation instructions, see Scallop CLI and Getting Started with Scallopy.
Your First Scallop Program
The best way to learn is by example. Let’s start with a classic problem: computing the transitive closure of a graph.
The Problem
Suppose we have a graph with edges connecting nodes:
- Node 0 connects to node 1
- Node 1 connects to node 2
- Node 2 connects to node 3
We want to find every pair of nodes connected by a path in this graph (not just by a direct edge).
The Scallop Solution
Create a file called edge_path.scl:
rel edge = {(0, 1), (1, 2), (2, 3)}
rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)
query path
Let’s break this down line by line:
Line 1: We declare facts about edges using set notation. This defines three edges in our graph.
Line 2: The first rule says “there’s a path from a to b if there’s an edge from a to b”. This handles direct connections.
Line 3: The second rule says “there’s a path from a to c if there’s a path from a to b AND an edge from b to c”. This is the recursive case that builds longer paths.
Line 4: We query all paths to see the results.
Running the Program
Save the file and run it with the Scallop interpreter:
scli edge_path.scl
You’ll see the output:
path: {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
Scallop found all six paths in the graph! Notice how it computed the transitive closure automatically using the recursive rules.
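To get a feel for what the recursive rules are doing, here is a minimal Python sketch of naive bottom-up evaluation: apply the rules repeatedly until no new facts appear. This is only an illustration of the idea, not how Scallop is implemented, and the function name transitive_closure is made up for this example.

```python
# Naive bottom-up evaluation of the two path rules (illustrative only).
edges = {(0, 1), (1, 2), (2, 3)}

def transitive_closure(edges):
    # Rule 1: path(a, b) = edge(a, b)
    paths = set(edges)
    while True:
        # Rule 2: path(a, c) = path(a, b) and edge(b, c)
        derived = {(a, c) for (a, b) in paths for (b2, c) in edges if b == b2}
        if derived <= paths:  # nothing new was derived: fixpoint reached
            return paths
        paths |= derived

paths = transitive_closure(edges)
print(sorted(paths))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The loop stops once a round derives nothing new, which is why recursion over a finite graph always terminates.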
Key Concepts
- Relations: Like edge and path - they hold sets of tuples
- Facts: Individual data points like (0, 1)
- Rules: Logical statements with = that derive new facts
- Queries: Ask Scallop to show you the results
Adding Probabilities
Now let’s make things more interesting by adding probabilities. We’ll model rolling two dice and computing the maximum value.
Probabilistic Facts
Create a file called double_dice.scl:
rel first_dice = {
  0.166::1;
  0.166::2;
  0.166::3;
  0.166::4;
  0.166::5;
  0.166::6;
}
rel second_dice = {
  0.166::1;
  0.166::2;
  0.166::3;
  0.166::4;
  0.166::5;
  0.166::6;
}
rel result(x) = first_dice(x), x > 3
rel result(y > x ? y : x) = first_dice(x), x <= 3, second_dice(y)
query result
Understanding the Syntax
Probabilistic facts use the :: operator: probability::value
Semicolons (;) indicate mutual exclusion - the die can only show one number at a time. This is called an annotated disjunction.
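Why does mutual exclusion matter? A small arithmetic sketch in plain Python (no Scallop involved) compares the probability of "1 or 2" when the two outcomes are exclusive faces of one die with the case where they are treated as two independent facts. The variable names are illustrative only.

```python
from fractions import Fraction

p = Fraction(1, 6)  # probability of each face

# Mutually exclusive faces of one die: probabilities simply add.
p_exclusive = p + p

# Two independent facts with the same probabilities: both could hold at once,
# so we compute 1 - P(neither holds).
p_independent = 1 - (1 - p) * (1 - p)

print(p_exclusive, p_independent)  # 1/3 11/36
```

The two numbers differ, so whether facts are declared exclusive or independent changes the result of exact probabilistic reasoning.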
The logic:
- If the first die shows > 3, that’s our result (we don’t need the second die)
- If the first die shows ≤ 3, we take the maximum of both dice
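As a sanity check on the logic above, the exact distribution can be enumerated by brute force in plain Python, assuming two fair, independent dice. This is a reference calculation only; the tags Scallop reports depend on the provenance you choose and need not match these exact values.

```python
from collections import defaultdict
from fractions import Fraction

faces = range(1, 7)
p_face = Fraction(1, 6)  # fair die

exact = defaultdict(Fraction)  # missing keys start at 0
for x in faces:
    if x > 3:
        # First rule: the first die alone decides the result.
        exact[x] += p_face
    else:
        # Second rule: take the maximum of both dice.
        for y in faces:
            exact[max(x, y)] += p_face * p_face

for value in sorted(exact):
    print(value, exact[value])
```

Under these assumptions the six results sum to exactly 1, and 4, 5, and 6 each come out at 1/4.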
Running with Probabilities
scli --provenance minmaxprob double_dice.scl
The --provenance minmaxprob flag tells Scallop to track probabilities using the min-max provenance (a conservative probability bound).
You’ll see results like:
result: {0.166::(4), 0.166::(5), 0.416::(6), 0.083::(3), ...}
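Each result is tagged with a probability! In the listing above, for example, 6 carries a tag of about 0.416. Keep in mind that minmaxprob tags are conservative bounds rather than exact probabilities.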
Key Probabilistic Concepts
- Tagged facts: probability::fact attaches probabilities to data
- Annotated disjunctions: ; separator for mutually exclusive alternatives
- Provenance: The tracking method (we’ll explore more types later)
Python Integration
Scallop really shines when integrated with Python for machine learning applications. Let’s see how to use the Python API.
Setting Up
If you haven’t already, install scallopy:
pip install scallopy
Your First Python Program
Create edge_path_prob.py:
from scallopy import ScallopContext
# Create a context with probabilistic reasoning
ctx = ScallopContext(provenance="minmaxprob")
# Define the relation schema
ctx.add_relation("edge", (int, int))
# Add probabilistic facts
ctx.add_facts("edge", [
    (0.1, (0, 1)),  # 10% chance of edge 0→1
    (0.2, (1, 2)),  # 20% chance of edge 1→2
    (0.3, (2, 3)),  # 30% chance of edge 2→3
])
# Add rules
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")
# Run the program
ctx.run()
# Inspect results
for (probability, (start, end)) in ctx.relation("path"):
    print(f"Path {start}→{end}: probability {probability:.3f}")
Run it:
python edge_path_prob.py
Output:
Path 0→1: probability 0.100
Path 1→2: probability 0.200
Path 2→3: probability 0.300
Path 0→2: probability 0.100
Path 1→3: probability 0.200
Path 0→3: probability 0.100
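To see where path tags come from under min-max provenance, here is a hand-rolled Python sketch that does not use scallopy at all. It assumes that minmaxprob combines the facts in a conjunction with min and combines alternative derivations of the same fact with max. The function minmax_paths is hypothetical and exists only for this illustration.

```python
edge_prob = {(0, 1): 0.1, (1, 2): 0.2, (2, 3): 0.3}

def minmax_paths(edge_prob):
    # path(a, c) = edge(a, c): a path starts with the tag of its edge.
    path_prob = dict(edge_prob)
    changed = True
    while changed:
        changed = False
        # path(a, c) = edge(a, b), path(b, c)
        for (a, b), p_edge in edge_prob.items():
            for (b2, c), p_path in list(path_prob.items()):
                if b != b2:
                    continue
                candidate = min(p_edge, p_path)  # conjunction: min
                if candidate > path_prob.get((a, c), 0.0):  # alternatives: max
                    path_prob[(a, c)] = candidate
                    changed = True
    return path_prob

tags = minmax_paths(edge_prob)
for (start, end), p in sorted(tags.items()):
    print(f"Path {start}→{end}: probability {p:.3f}")
```

Under that assumption, a multi-edge path is only as likely as its weakest edge, so the 0→3 path gets min(0.1, 0.2, 0.3) = 0.1.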
Understanding the API
ScallopContext is the main interface for Scallop in Python:
- ScallopContext(provenance="...") - Create a context with specified provenance
- ctx.add_relation(name, types) - Declare a relation’s schema
- ctx.add_facts(relation, [(prob, tuple), ...]) - Add probabilistic facts
- ctx.add_rule(rule_string) - Add logical rules
- ctx.run() - Execute the program
- ctx.relation(name) - Get results as a list of (probability, tuple) pairs
PyTorch Integration
Scallop can integrate directly with PyTorch for differentiable reasoning! Here’s a taste:
import torch
import scallopy
# Create a differentiable module
sum_2 = scallopy.Module(
    provenance="difftopkproofs",  # Differentiable provenance
    program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",
    input_mappings={"digit_a": range(10), "digit_b": range(10)},
    output_mapping=("sum_2", range(19))
)
# Use it in a neural network
class MNISTAdder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.digit_classifier = torch.nn.Linear(784, 10)  # Neural digit classifier
        self.scallop_reasoner = sum_2  # Symbolic addition

    def forward(self, img1, img2):
        digit1_probs = torch.softmax(self.digit_classifier(img1), dim=-1)
        digit2_probs = torch.softmax(self.digit_classifier(img2), dim=-1)
        sum_probs = self.scallop_reasoner(digit_a=digit1_probs, digit_b=digit2_probs)
        return sum_probs
The neural network learns to classify digits, and Scallop handles the logical reasoning (addition) - all with gradient flow for end-to-end training!
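It can help to see what the sum_2 rule means probabilistically. Assuming the two digit distributions are independent, the exact probability of each sum is the total probability of all digit pairs that add up to it. The plain-Python sketch below computes that by hand. It is only a reference calculation: a provenance such as difftopkproofs may keep just a limited number of proofs, so its output need not match these values exactly. The function name sum_distribution is made up for this example.

```python
def sum_distribution(digit_a_probs, digit_b_probs):
    """Exact distribution of a + b for two independent digit distributions."""
    out = [0.0] * 19  # possible sums: 0 through 18
    for a, pa in enumerate(digit_a_probs):
        for b, pb in enumerate(digit_b_probs):
            # Each (a, b) pair is one "proof" of sum_2(a + b).
            out[a + b] += pa * pb
    return out

uniform = [0.1] * 10  # a classifier that is completely unsure
sum_probs = sum_distribution(uniform, uniform)
print(len(sum_probs), round(sum(sum_probs), 6))
```

Every operation here is a product or a sum, which is what makes this kind of reasoning differentiable when the inputs are tensors.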
Next Steps
Congratulations! You’ve learned the basics of Scallop. Here’s where to go next:
Learn More Language Features
- Language Reference - Comprehensive guide to Scallop’s syntax
- Relations and Facts - Data modeling
- Writing Rules - Logic programming patterns
- Recursive Rules - Powerful recursive reasoning
- Aggregations - count, sum, min, max, and more
- Algebraic Data Types - Structured data and pattern matching
Dive Into Probabilistic Programming
- Probabilistic Programming - Master uncertainty reasoning
- Provenance - How probability tracking works
- Proofs - Understanding derivations
- Provenance Library - All 18 provenance types explained
- Logic and Probability - Combining symbolic and probabilistic reasoning
Python and Machine Learning
- Scallopy - Python API and PyTorch integration
- Getting Started - Setup and basics
- Scallop Context - The core API
- Creating Modules - PyTorch integration
- Configuring Provenance - Probability tracking in Python
- Debugging Proofs - Understanding derivations
Tools and CLI
- Scallop CLI - Command-line interpreter
- Scallop REPL - Interactive exploration
Example Programs
Check out the examples directory for more:
- /examples/datalog/ - Classic logic programming examples
- /etc/scallopy/examples/ - Python integration examples
Getting Help
- Documentation: You’re reading it!
- GitHub: github.com/scallop-lang/scallop
- Paper: PLDI 2023 paper on Scallop’s foundations
Quick Reference
Basic Syntax
// Facts
rel edge(0, 1)
rel edge = {(0, 1), (1, 2), (2, 3)}
// Probabilistic facts
rel 0.8::reliable_edge(0, 1)
rel color = {0.7::"red"; 0.3::"blue"} // Mutually exclusive
// Rules
rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)
// Queries
query path
query path(0, x) // Specific query
CLI Commands
scli program.scl # Run program
scli --provenance minmaxprob program.scl # With provenance
scli --provenance topkproofs --k 5 prog.scl # Top-5 proofs
sclrepl # Start REPL
Python API
import scallopy
# Context API
ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(0.8, (0, 1))])
ctx.add_rule("path(a, b) = edge(a, b)")
ctx.run()
results = ctx.relation("path")
# Module API (for PyTorch)
module = scallopy.Module(
    provenance="difftopkproofs",
    program="...",
    input_mappings={...},
    output_mapping=(...)
)
output = module(input1=tensor1, input2=tensor2)
Happy programming with Scallop!