
Scallop, a Language for Neurosymbolic Programming

Scallop is a language based on Datalog that supports differentiable logical and relational reasoning. Scallop programs can be easily integrated into Python, and even with a PyTorch learning module. You can also use Scallop as a standalone Datalog solver. This book aims to give both a high-level overview of how the language is used and low-level documentation on each language feature.

The following example shows how knowledge base facts, rules, and probabilistic facts recognized from images can operate together.

// Knowledge base facts
rel is_a("giraffe", "mammal")
rel is_a("tiger", "mammal")
rel is_a("mammal", "animal")

// Knowledge base rules
rel name(a, b) :- name(a, c), is_a(c, b)

// Recognized from an image, maybe probabilistic
rel name = {
  0.3::(1, "giraffe"),
  0.7::(1, "tiger"),
  0.9::(2, "giraffe"),
  0.1::(2, "tiger"),
}

// Count the animals
rel num_animals(n) :- n = count(o: name(o, "animal"))
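Ignoring the probabilities, the reasoning above can be mirrored with a small Python fixpoint loop. This is an illustrative analogue, not Scallop's actual semantics; we assume each object keeps only its most likely label (tiger for object 1, giraffe for object 2):

```python
# Plain-Python analogue of the knowledge-base example (probabilities dropped).
# Assumption: each object keeps only its most likely label.
is_a = {("giraffe", "mammal"), ("tiger", "mammal"), ("mammal", "animal")}
name = {(1, "tiger"), (2, "giraffe")}

# Fixpoint for: rel name(a, b) :- name(a, c), is_a(c, b)
changed = True
while changed:
    derived = {(a, b) for (a, c) in name for (c2, b) in is_a if c == c2}
    changed = not derived <= name
    name |= derived

# rel num_animals(n) :- n = count(o: name(o, "animal"))
num_animals = len({o for (o, n) in name if n == "animal"})
print(num_animals)  # both objects are eventually classified as animals
```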

Table of Contents

Please refer to the sidebar for a detailed table of contents. At a high level, we organize this book into the following 5 sections:

Installation and Crash Course

Installation gives instructions on how to install Scallop on your machine. Crash Course gives a quick introduction to what the language is and how it is used. Both sections are designed so that you can get started quickly with Scallop.

Scallop and Logic Programming

Scallop and Logic Programming aims to give you a detailed introduction to the language. It introduces language features such as relational programming, negation and aggregation, queries, foreign constructs, etc. After reading through all of these, you should be well-versed in Scallop’s core functionality and able to use Scallop as a Datalog engine.

type fib(bound x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x > 1
query fib(10, y)
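The fib program above is a recurrence; a rough Python analogue (bottom-up memoization, whereas the Scallop version is computed on demand thanks to the bound annotation) might look like:

```python
# Bottom-up analogue of:
#   rel fib = {(0, 1), (1, 1)}
#   rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x > 1
fib = {0: 1, 1: 1}
for x in range(2, 11):
    fib[x] = fib[x - 1] + fib[x - 2]
print(fib[10])  # the answer to query fib(10, y)
```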

Scallop and Probabilistic Programming

Scallop and Probabilistic Programming introduces the probabilistic side of Scallop. You will learn to tag facts with probabilities, understand the underlying algorithms and frameworks, and use additional programming constructs for probabilistic semantics. By the end of this section, you will be familiar with using Scallop as a probabilistic programming language.

rel attr = { 0.99::(OBJECT_A, "blue"), 0.01::(OBJECT_B, "red"), ... }
rel relate = { 0.01::(OBJECT_A, "holds", OBJECT_B), ... }

Scallopy and Neurosymbolic Programming

Scallopy and Neurosymbolic Programming goes into the heart of Scallop: applying it to write neurosymbolic applications. Neurosymbolic methods are methods that have both neural and logical components. For this, we use the Python binding of Scallop, scallopy, to integrate with machine learning libraries such as PyTorch. This section describes the API of scallopy.

sum_2 = scallopy.Module(
  program="""type digit_1(i32), digit_2(i32)
             rel sum_2(a + b) = digit_1(a) and digit_2(b)""",
  input_mappings={"digit_1": range(10), "digit_2": range(10)},
  output_mapping=("sum_2", range(19)))

For Developers

For Developers discusses how developers and researchers who are interested in extending Scallop can step into the source code of Scallop and program extensions.

Installation

There are many ways in which you can use Scallop, and together they form a complete toolchain. Here we describe how to install the toolchain from source. The following instructions assume you have access to the Scallop source code and have the basic prerequisites installed.

Requirements

  • Rust - nightly 2023-03-07 (please visit here to learn more about Rust nightly and how to install them)
  • Python 3.7+ (for connecting Scallop with Python and PyTorch)

Scallop Interpreter

The interpreter of Scallop is named scli. To install it, please do

$ make install-scli

From here, you can use scli to test and run simple programs

$ scli examples/datalog/edge_path.scl

Scallop Interactive Shell

Scallop Python Interface

Crash Course

Welcome to Scallop! This crash course will get you started with Scallop in about 15 minutes. You’ll learn the basics of logic programming, probabilistic reasoning, and Python integration.

What is Scallop?

Scallop is a Datalog-based language that combines three powerful paradigms:

  1. Logic Programming: Write declarative rules to derive new facts from existing ones
  2. Probabilistic Reasoning: Attach probabilities to facts and track uncertainty through computations
  3. Differentiable Computing: Integrate with machine learning frameworks like PyTorch for neurosymbolic AI

Scallop is built on a Provenance Semiring framework that tracks how conclusions are derived. This means you can not only compute answers, but also understand why those answers exist and how probable they are.

Common use cases:

  • Knowledge graph reasoning
  • Probabilistic databases
  • Neurosymbolic AI (combining neural networks with symbolic logic)
  • Program analysis
  • Question answering with uncertainty

Let’s dive in!


Installation

Before you begin, make sure you have Scallop installed:

For Command-Line Programs

Install the Scallop CLI tools (scli, sclrepl):

From binary releases:

# Download from https://github.com/scallop-lang/scallop/releases
# Or build from source:
git clone https://github.com/scallop-lang/scallop.git
cd scallop
cargo build --release
# Binaries in target/release/

Verify installation:

scli --version
# Output: scli 0.2.5

For Python Integration

Install scallopy for Python:

pip install scallopy

Verify installation:

import scallopy
print(scallopy.__version__)

For complete installation instructions, see Scallop CLI and Getting Started with Scallopy.


Your First Scallop Program

The best way to learn is by example. Let’s start with a classic problem: computing the transitive closure of a graph.

The Problem

Suppose we have a graph with edges connecting nodes:

  • Node 0 connects to node 1
  • Node 1 connects to node 2
  • Node 2 connects to node 3

We want to find all paths in this graph (not just direct edges).

The Scallop Solution

Create a file called edge_path.scl:

rel edge = {(0, 1), (1, 2), (2, 3)}

rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)

query path

Let’s break this down line by line:

Line 1: We declare facts about edges using set notation. This defines three edges in our graph.

Line 3: The first rule says “there’s a path from a to b if there’s an edge from a to b”. This handles direct connections.

Line 4: The second rule says “there’s a path from a to c if there’s a path from a to b AND an edge from b to c”. This is the recursive case that builds longer paths.

Line 6: We query all paths to see the results.

Running the Program

Save the file and run it with the Scallop interpreter:

scli edge_path.scl

You’ll see the output:

path: {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}

Scallop found all six paths in the graph! Notice how it computed the transitive closure automatically using the recursive rules.
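To build intuition for what the interpreter just did, here is a minimal Python sketch of the same fixpoint computation (an illustration of the idea, not Scallop's actual evaluation strategy):

```python
# Naive fixpoint evaluation of the edge/path program.
edge = {(0, 1), (1, 2), (2, 3)}

path = set(edge)  # rel path(a, b) = edge(a, b)
while True:
    # rel path(a, c) = path(a, b) and edge(b, c)
    new = {(a, c) for (a, b) in path for (b2, c) in edge if b == b2}
    if new <= path:  # fixpoint: nothing new can be derived
        break
    path |= new

print(sorted(path))  # all six paths
```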

Key Concepts

  • Relations: Like edge and path - they hold sets of tuples
  • Facts: Individual data points like (0, 1)
  • Rules: Logical statements with = that derive new facts
  • Queries: Ask Scallop to show you the results

Adding Probabilities

Now let’s make things more interesting by adding probabilities. We’ll model rolling two dice and computing the maximum value.

Probabilistic Facts

Create a file called double_dice.scl:

rel first_dice = {
  0.166::1;
  0.166::2;
  0.166::3;
  0.166::4;
  0.166::5;
  0.166::6;
}

rel second_dice = {
  0.166::1;
  0.166::2;
  0.166::3;
  0.166::4;
  0.166::5;
  0.166::6;
}

rel result(x) = first_dice(x), x > 3
rel result(y > x ? y : x) = first_dice(x), x <= 3, second_dice(y)

query result

Understanding the Syntax

Probabilistic facts use the :: operator: probability::value

Semicolons (;) indicate mutual exclusion - the die can only show one number at a time. This is called an annotated disjunction.

The logic:

  • If the first die shows > 3, that’s our result (we don’t need the second die)
  • If the first die shows ≤ 3, we take the maximum of both dice
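For reference, the exact distribution of result can be computed by brute-force enumeration. The sketch below uses Python's fractions module; note that these are the true probabilities, while the minmaxprob provenance reports conservative bounds, so its numbers will differ:

```python
from fractions import Fraction

# Enumerate both dice, mirroring the two rules of double_dice.scl.
dist = {}
for x in range(1, 7):                      # first_dice
    if x > 3:                              # rule 1: result(x) when x > 3
        dist[x] = dist.get(x, Fraction(0)) + Fraction(1, 6)
    else:                                  # rule 2: result(max(x, y)) when x <= 3
        for y in range(1, 7):              # second_dice
            r = max(x, y)
            dist[r] = dist.get(r, Fraction(0)) + Fraction(1, 36)

assert sum(dist.values()) == 1             # a proper distribution
print({k: float(v) for k, v in sorted(dist.items())})
```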

Running with Probabilities

scli --provenance minmaxprob double_dice.scl

The --provenance minmaxprob flag tells Scallop to track probabilities using the min-max provenance (a conservative probability bound).

You’ll see results like:

result: {0.166::(4), 0.166::(5), 0.416::(6), 0.083::(3), ...}

Each result has a probability! For example, getting a 6 has probability ~0.416 (41.6%).

Key Probabilistic Concepts

  • Tagged facts: probability::fact attaches probabilities to data
  • Annotated disjunctions: ; separator for mutually exclusive alternatives
  • Provenance: The tracking method (we’ll explore more types later)

Python Integration

Scallop really shines when integrated with Python for machine learning applications. Let’s see how to use the Python API.

Setting Up

First, install scallopy:

pip install scallopy

Your First Python Program

Create edge_path_prob.py:

from scallopy import ScallopContext

# Create a context with probabilistic reasoning
ctx = ScallopContext(provenance="minmaxprob")

# Define the relation schema
ctx.add_relation("edge", (int, int))

# Add probabilistic facts
ctx.add_facts("edge", [
  (0.1, (0, 1)),  # 10% chance of edge 0→1
  (0.2, (1, 2)),  # 20% chance of edge 1→2
  (0.3, (2, 3)),  # 30% chance of edge 2→3
])

# Add rules
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")

# Run the program
ctx.run()

# Inspect results
for (probability, (start, end)) in ctx.relation("path"):
  print(f"Path {start}→{end}: probability {probability:.3f}")

Run it:

python edge_path_prob.py

Output:

Path 0→1: probability 0.100
Path 1→2: probability 0.200
Path 2→3: probability 0.300
Path 0→2: probability 0.200
Path 1→3: probability 0.300
Path 0→3: probability 0.300

Understanding the API

ScallopContext is the main interface for Scallop in Python:

  • ScallopContext(provenance="...") - Create a context with specified provenance
  • ctx.add_relation(name, types) - Declare a relation’s schema
  • ctx.add_facts(relation, [(prob, tuple), ...]) - Add probabilistic facts
  • ctx.add_rule(rule_string) - Add logical rules
  • ctx.run() - Execute the program
  • ctx.relation(name) - Get results as a list of (probability, tuple) pairs

PyTorch Integration

Scallop can integrate directly with PyTorch for differentiable reasoning! Here’s a taste:

import torch
import scallopy

# Create a differentiable module
sum_2 = scallopy.Module(
  provenance="difftopkproofs",  # Differentiable provenance
  program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

# Use it in a neural network
class MNISTAdder(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.digit_classifier = torch.nn.Linear(784, 10)  # Neural digit classifier
    self.scallop_reasoner = sum_2  # Symbolic addition

  def forward(self, img1, img2):
    digit1_probs = torch.softmax(self.digit_classifier(img1), dim=-1)
    digit2_probs = torch.softmax(self.digit_classifier(img2), dim=-1)
    sum_probs = self.scallop_reasoner(digit_a=digit1_probs, digit_b=digit2_probs)
    return sum_probs

The neural network learns to classify digits, and Scallop handles the logical reasoning (addition) - all with gradient flow for end-to-end training!


Next Steps

Congratulations! You’ve learned the basics of Scallop. Here’s where to go next:

Learn More Language Features

Dive Into Probabilistic Programming

Python and Machine Learning

Tools and CLI

Example Programs

Check out the examples directory for more:

  • /examples/datalog/ - Classic logic programming examples
  • /etc/scallopy/examples/ - Python integration examples

Getting Help


Quick Reference

Basic Syntax

// Facts
rel edge(0, 1)
rel edge = {(0, 1), (1, 2), (2, 3)}

// Probabilistic facts
rel 0.8::reliable_edge(0, 1)
rel color = {0.7::red; 0.3::blue}  // Mutually exclusive

// Rules
rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)

// Queries
query path
query path(0, x)  // Specific query

CLI Commands

scli program.scl                           # Run program
scli --provenance minmaxprob program.scl   # With provenance
scli --provenance topkproofs --k 5 prog.scl  # Top-5 proofs
sclrepl                                    # Start REPL

Python API

import scallopy

# Context API
ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(0.8, (0, 1))])
ctx.add_rule("path(a, b) = edge(a, b)")
ctx.run()
results = ctx.relation("path")

# Module API (for PyTorch)
module = scallopy.Module(
  provenance="difftopkproofs",
  program="...",
  input_mappings={...},
  output_mapping=(...)
)
output = module(input1=tensor1, input2=tensor2)

Happy programming with Scallop!

Scallop Language Reference

Scallop is a Datalog-based logic programming language extended with powerful features for modern applications. This section covers the core language constructs and advanced features.

Overview

Scallop extends traditional Datalog with:

  • Probabilistic reasoning - Attach probabilities to facts and track uncertainty
  • Algebraic data types - Define structured data with sum and product types
  • Aggregations - Compute count, sum, max, min, and custom aggregations
  • Negation - Express what is not true
  • Disjunctive heads - Represent choices and alternatives
  • Foreign functions - Integrate Python and external computation
  • Magic set transformation - Optimize query evaluation

Core Language Features

Relations and Facts

Relations are the fundamental data structure in Scallop. Learn about declaring relations, adding facts, and data types:

  • Relations - Declaring and using relations
  • Value Types - Scallop’s type system (integers, floats, strings, etc.)
  • Constants - Named constants for readability

Rules and Logic

Rules define how to derive new facts from existing facts using logical inference:

  • Rules - Basic rule syntax and patterns
  • Recursion - Recursive rules for transitive closure, paths, etc.
  • Negation - Expressing negative conditions
  • Queries - Extracting results

Advanced Data Types

Scallop supports sophisticated type systems for structuring data:

Aggregations and Computation

Compute derived values from collections of facts:

Probability and Provenance

Track uncertainty and trace how conclusions are derived:

Advanced Features

Push the boundaries of logic programming:


Language Philosophy

Declarative Programming

Scallop programs describe what to compute, not how to compute it:

// What: Define transitive closure
rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b), edge(b, c)

// Scallop figures out how to compute it efficiently

Set Semantics

Relations are sets of tuples - order doesn’t matter, duplicates are eliminated:

rel numbers = {1, 2, 3}
rel numbers = {3, 2, 1}  // Same as above
rel numbers = {1, 1, 2}  // Duplicate 1 is ignored

Monotonic Reasoning

Facts can only be added, never removed (except with negation). This enables efficient incremental computation.


Syntax Quick Reference

Relation Declaration

// Declare relation with types
type edge(from: i32, to: i32)

// Declare and add facts
rel edge = {(0, 1), (1, 2), (2, 3)}

Rules

// Basic rule
rel path(a, b) = edge(a, b)

// Rule with multiple conditions
rel path(a, c) = path(a, b), edge(b, c)

// Rule with constraint
rel adult(name) = person(name, age), age >= 18

Aggregation

// Count elements
rel total(n) = n = count(x: numbers(x))

// Sum values
rel sum_ages(s) = s = sum(age: person(_, age))

// Max value
rel oldest(max_age) = max_age = max(age: person(_, age))

Disjunctive Head

// Express choices
rel { heads(); tails() } = coin_flip()

Pattern Matching

// Match on ADT variants
rel is_leaf(t) = case t is Leaf(_)
rel left_child(t, l) = case t is Node(l, _)

Example Programs

Transitive Closure

rel edge = {(0, 1), (1, 2), (2, 3)}

rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b), edge(b, c)

query path
// Result: {(0,1), (0,2), (0,3), (1,2), (1,3), (2,3)}

Probabilistic Graph

rel edge = {0.8::(0, 1), 0.9::(1, 2)}

rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b), edge(b, c)

query path
// Result: {0.8::(0,1), 0.9::(1,2), 0.72::(0,2)}
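To see where the 0.72 comes from: under a proof-based provenance, a derived fact with a single proof is scored by multiplying the probabilities of the facts used in that proof. A toy Python sketch (valid only because each path in this graph has exactly one proof; real provenances must also combine multiple proofs):

```python
# Hypothetical sketch: score each derived path by multiplying edge
# probabilities along its (unique) proof.
edge = {(0, 1): 0.8, (1, 2): 0.9}

path = dict(edge)  # direct edges are paths
for (a, b), p in list(path.items()):
    for (b2, c), q in edge.items():
        if b == b2:
            path[(a, c)] = p * q  # extend the proof with one more edge

print(path[(0, 2)])  # ≈ 0.72 (that is, 0.8 * 0.9)
```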

Family Relations

rel parent = {("alice", "bob"), ("alice", "charlie"), ("bob", "diana")}

rel ancestor(a, d) = parent(a, d)
rel ancestor(a, d) = ancestor(a, p), parent(p, d)

rel sibling(a, b) = parent(p, a), parent(p, b), a != b

query sibling
// Result: {("bob", "charlie"), ("charlie", "bob")}

Language Tools

  • scli - Run .scl programs from the command line (CLI Guide)
  • sclrepl - Interactive REPL for experimentation (REPL Guide)
  • scallopy - Python integration for ML applications (Python Guide)

Further Reading

Relations and Facts

Scallop is a relational and logical programming language. As described on Wikipedia:

Logic programming is a programming paradigm which is largely based on formal logic. Any program written in a logic programming language is a set of sentences in logical form, expressing facts and rules about some problem domain.

In Scallop, relations are the most fundamental building blocks of a program. In the following example, we declare the type of a relation called edge, using the type keyword:

type edge(a: i32, b: i32)

We say that the name edge is a predicate or a relation. Inside the parentheses, we have two arguments, a: i32 and b: i32. Therefore, edge is an arity-2 relation, since it has 2 arguments. For the argument a: i32, we give the name of the field (a) and the type of that argument (i32). Here, both arguments are of the i32 type, a signed 32-bit integer. For more information on value types, refer to the Value Types section.

The above line only declares the type of the relation, not its content. The actual pieces of information stored in a relation are called facts. Here we define a single fact under the relation edge:

rel edge(0, 1)

Assuming 0 and 1 each denote the ID of a node, this fact declares that there is an edge going from node 0 to node 1. There are two arguments in this fact, matching the arity of the relation. Setting aside the predicate edge, one can also simply consider (0, 1) as a tuple, more specifically, a 2-tuple.

To declare multiple facts, one can simply write multiple single-fact declarations using the rel keyword, like

rel edge(0, 1)
rel edge(1, 2)

One can also use the set syntax to declare multiple facts of a relation. The following line reads: “the relation edge contains a set of tuples, including (0, 1) and (1, 2)”:

rel edge = {(0, 1), (1, 2)}

Note that it is possible to declare multiple fact sets for the same relation.

rel edge = {(0, 1), (1, 2)}
rel edge = {(2, 3)}

With the above two lines the edge relation now contains 3 facts, (0, 1), (1, 2), and (2, 3).

Examples of Relations

Boolean and 0-arity Relation

Many things can be represented as relations. We start with the most basic programming construct: the boolean. While Scallop allows values of the boolean type, relations themselves can also encode boolean values. The following example contains an arity-0 relation named is_target:

type is_target()

There is only one possible tuple that could form a fact in this relation: the empty tuple (). If we treat the relation is_target as a set, then an empty set encodes boolean “false”. If the set contains exactly one tuple (note: it can contain at most one), it encodes boolean “true”. Declaring only the type of is_target, as above, leaves the relation empty. To declare the fact, we can do:

rel is_target()
// or
rel is_target = {()}

Unary Relations

Unary relations are relations of arity 1. We can define unary relations to serve as “variables” as seen in other programming languages. The following example declares a relation named greeting containing the single string "hello world!".

rel greeting("hello world!")
// or
rel greeting = {("hello world!",)}

Note that with the second way of expressing the fact, we may omit the parentheses to make it cleaner:

rel greeting = {"hello world!"}

In light of this, we may write the following fact set declaration:

rel possible_digit = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Integer Arithmetic as Relations

Integer arithmetic can be represented as relations as well. Consider simple summation in algebra: a + b = c encodes the sum relationship between two operands (a and b) and their sum (c). Encoded in Scallop, this forms an arity-3 relation:

type add(op1: i32, op2: i32, result: i32)

Note that, in Scallop, relations are not polymorphic. That is, every relation, whether declared or inferred, has exactly one type annotation.

We are working on an update in the future to relax this restriction.

To declare facts of this add relation, such as 3 + 4 = 7, we write

rel add(3, 4, 7) // 3 + 4 = 7

However, you might notice that the add relation is theoretically infinite. That is, there is an infinite number of facts that satisfy the add relation, and there is no way we can possibly enumerate or declare all of them. In such cases, we resort to declaring rules using foreign functions or predicates, which we will discuss later. For now, let’s use add as an example relation that encodes integer arithmetic.

Terminologies

We have the following terminologies for describing relations.

  • Boolean Relation: arity-0 relation
  • Unary Relation: arity-1 relation
  • Binary Relation: arity-2 relation
  • Ternary Relation: arity-3 relation

Type Inference

Scallop supports type inference: one does not need to fully annotate every relation with types, as they are inferred during the compilation process.

For example, given the following code,

rel edge = {(0, 1), (1, 2)}

we can infer that the relation edge is a binary relation where both arguments are integers. Note that when integer types are left unspecified, they default to i32.

Type inference fails if conflicts are detected. In the following snippet, the second argument appears both as the integer 1 and as the string "1".

rel edge = {(0, 1), (0, "1")}

This code raises the following compile error, indicating that the types cannot be unified. Note that the following response is generated in the sclrepl command-line interface.

[Error] cannot unify types `numeric` and `string`, where the first is declared here
  REPL:0 | rel edge = {(0, 1), (0, "1")}
         |                 ^
and the second is declared here
  REPL:0 | rel edge = {(0, 1), (0, "1")}
         |                         ^^^

For more information on values and types, please refer to the next section.

Rules

Rules are fundamental to computation in Scallop. Each rule defines values and data flowing from some relations to another relation. In the following program, we define a few facts for the edge relation. On the second line, we define that, for each edge (a, b), there is also a path (a, b). Note that here, a and b are variables, instead of the constants we used when defining facts. During computation, the two facts in edge populate the path relation. In this way, we have defined a rule for path, which is executed during computation.

rel edge = {(0, 1), (1, 2)}
rel path(a, b) = edge(a, b) // (0, 1), (1, 2)

In this section, we talk about how we write rules in Scallop and how intricate computation can be done through it.

Syntax

In general, the basic rules in Scallop are of the form

RULE    ::= rel ATOM = FORMULA
FORMULA ::= ATOM
          | not ATOM
          | CONSTRAINT
          | AGGREGATION
          | FORMULA and FORMULA
          | FORMULA or FORMULA
          | ( FORMULA )

For each rule, we call the atom on the left the head of the rule, and the formula on the right the body. We read it from right to left: when the body formula holds, the head also holds. The formula might contain atoms, negated atoms, aggregations, conjunctions, disjunctions, and a few more constructs. In this section, we focus on simple (positive) atoms, constraints, and their conjunctions and disjunctions. We leave the discussion of negation and aggregation to the next sections.

Atom

Simple atoms are of the form RELATION(ARG_1, ARG_2, ...). Similar to facts, we have the relation name followed by a tuple of arguments. In rules, however, the arguments can take richer forms, involving variables, constants, expressions, function calls, and more.

Consider the most basic example from above:

rel path(a, b) = edge(a, b)

We have two variables a and b grounded by the edge relation. This means we are treating the variables a and b as sources of information, which can be propagated to the head. In this example, the head also contains two variables, both grounded by the body. Therefore, the whole rule is well formed.

In case the head variables are not grounded by the body, such as the following,

rel path(a, c) = edge(a, b)

we would get an error that looks like the following:

[Error] Argument of the head of a rule is ungrounded
  REPL:1 | rel path(a, c) = edge(a, b)
         |             ^

The error message points us to the variable c, which has not been grounded in the body.

Basic atoms, such as the ones the user has defined, can directly ground variables that appear as their arguments. These variables can in turn be used to ground other variables or expressions. In the following example, although the rule itself might not make much sense, the variable a is used to ground the expression a + 1. Therefore, the rule is completely valid.

rel output_relation(a, a + 1) = input_relation(a)

In certain cases, expressions can be used to ground variables as well!

rel output_relation(a, b) = input_relation(a, b + 1)

In the above example, the expression b + 1 can be used to derive b, thus making the variable b grounded. However, this is not true for arbitrary expressions:

rel output_relation(b, c) = input_relation(b + c) // FAILURE

The input_relation can ground the expression b + c directly; however, the two arguments b and c cannot be derived from their sum, as there is a (theoretically) infinite number of combinations. In this case, we get a compilation failure.

There can be constraints present in atoms as well. For example, consider the following rule:

rel self_edge(a) = edge(a, a)

The atom edge(a, a) in the body grounds only one variable, a. The pattern matches any edge that goes from a back to a itself. Therefore, instead of grounding two values representing the “from” and “to” of an edge, we are additionally posing a constraint on the kind of edge we are matching. Conceptually, we can view the above rule as the following equivalent rule:

rel self_edge(a) = edge(a, b) and a == b

where there is an additional constraint posed on the equality of a and b. We are going to touch on conjunction (and) and constraints in the upcoming sections.

Disjunction (Or)

The body formula can contain logical connectives such as and, or, not, and implies, used to connect basic formulas such as atoms. In the following example, we define that if a is b’s father or mother, then a is b’s parent:

rel parent(a, b) = father(a, b)
rel parent(a, b) = mother(a, b)

In this program, we have divided the derivation of parent into two separate rules, one processing the father relationship and the other processing the mother relationship. This naturally forms a disjunction (or), as the derivation of parent can come from two disjunctive sources. Note that in Scallop (or Datalog in general), the ordering of the two rules does not matter.

Therefore, given that

rel father = {("Bob", "Alice")}
rel mother = {("Christine", "Alice")}

we can derive that the parent relation holds two tuples, ("Bob", "Alice") and ("Christine", "Alice").

The above program can be rewritten into a more compact form that looks like the following:

rel parent(a, b) = father(a, b) or mother(a, b)
// or
rel parent(a, b) = father(a, b) \/ mother(a, b)

We have used an explicit or keyword to connect the two atoms, father(a, b) and mother(a, b). The \/ symbol, commonly seen in formal logic as vee (\(\vee\)), is also supported. Notice that, written this way, each branch of the disjunction needs to fully ground the variables/expressions in the head.

Conjunction (And)

To demonstrate the use of and, let’s look at the following example computing the relation of grandmother based on father and mother:

rel grandmother(a, c) = mother(a, b) and father(b, c)
// or
rel grandmother(a, c) = mother(a, b) /\ father(b, c)

Notice that the symbol /\ is a replacement for the and operator, resembling the wedge (\(\wedge\)) symbol seen in formal logic.

As can be seen from the rule, the body grounds three variables: a, b, and c. The variables a and b come from mother, and the variables b and c come from father. Notice that there is one variable, b, in common. In this case, we are joining the relations mother and father on the variable b.
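The join performed by the grandmother rule can be sketched in plain Python. The facts below are hypothetical, purely for illustration:

```python
# Hypothetical facts (not from the text):
mother = {("Emma", "Bob")}    # Emma is Bob's mother
father = {("Bob", "Alice")}   # Bob is Alice's father

# rel grandmother(a, c) = mother(a, b) and father(b, c)
# -- a join of the two relations on the shared variable b
grandmother = {(a, c)
               for (a, b) in mother
               for (b2, c) in father
               if b == b2}

print(grandmother)
```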

Constraints

A rule body can contain boolean constraints. For example, the conjunctive rule above can be rewritten as

rel grandmother(a, c) = mother(a, b) and father(bp, c) and b == bp

Here, we are posing an equality (==) constraint on b and bp. Typically, constraints are binary expressions involving predicates such as

  • equality and inequality (== and !=)
  • numerical comparisons (<, >, <=, and >=)

Other constructs

There are other constructs available for defining rules, which we continue to discuss in detail in other sections:

Traditional Datalog Syntax

If you are familiar with traditional Datalog, you can use its syntax by swapping = with :- and and with ,. For example, the rule defining grandmother can be rewritten as

rel grandmother(a, c) :- mother(a, b), father(b, c)

Values and Types

Scallop has a built-in set of basic value types, following Rust’s naming convention. On top of these, we have types such as Symbol, DateTime, Entity, and Tensor, which are special to Scallop.

Type      Description
i8        Signed integer, 8-bit
i16       Signed integer, 16-bit
i32       Signed integer, 32-bit
i64       Signed integer, 64-bit
i128      Signed integer, 128-bit
isize     Signed size; its width depends on the system
u8        Unsigned integer, 8-bit
u16       Unsigned integer, 16-bit
u32       Unsigned integer, 32-bit
u64       Unsigned integer, 64-bit
u128      Unsigned integer, 128-bit
usize     Unsigned size; its width depends on the system
f32       Floating-point number, 32-bit
f64       Floating-point number, 64-bit
bool      Boolean
char      Character
String    Variable-length string
Symbol    Symbol
DateTime  Date and time
Duration  Duration
Entity    Entity
Tensor    Tensor

Integers

Integers are the most basic data type in Scallop. If not specified, the default integer type the system picks is i32 (signed 32-bit integer):

rel edge = {(0, 1), (1, 2)} // (i32, i32)

If an unsigned integer type is specified but a negative number is used in the declared facts, a type inference error will be raised. We demonstrate this in the sclrepl environment:

scl> type my_edge(usize, usize)
scl> rel my_edge = {(-1, -5), (0, 3)}
[Error] cannot unify types `usize` and `signed integer`, where the first is declared here
  REPL:0 | type my_edge(usize, usize)
         |              ^^^^^
and the second is declared here
  REPL:1 | rel my_edge = {(-1, -5), (0, 3)}
         |                 ^^

Primitive operations that can be used along with integers are

  • Comparators:
    • == (equality)
    • != (inequality)
    • > (greater-than)
    • >= (greater-than-or-equal-to)
    • < (less-than)
    • <= (less-than-or-equal-to)
  • Arithmetic operators:
    • + (plus)
    • - (minus/negate)
    • * (mult)
    • / (div)
    • % (mod)

All of the above operations need to operate on two integers of the same type. For instance, you cannot compare an i32 value with a usize value.
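As a small sketch of working around this restriction (assuming the as casts described later in the Type Conversions section are also allowed inside rule bodies), one can cast one side before comparing:

type small_nums(i32)
type big_nums(u64)
rel small_nums = {1, 2, 3}
rel big_nums = {2, 4}
// cast the i32 value to u64 so both sides of < have the same type
rel less(a, b) = small_nums(a) and big_nums(b) and (a as u64) < b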

Floating Point Numbers

Floating point numbers are supported in Scallop as well. The following example defines students and their class grades:

type student_grade(name: String, class: String, grade: f32)

rel student_grade = {
  ("alice", "cse 100", 95.2),
  ("bob", "cse 100", 90.8),
}

It is possible to derive special floating point values such as inf and -inf, though such values cannot be declared directly. For a floating point value that is nan (not-a-number), we omit the whole fact from the database to maintain sanity. Specifically, the derivation of nan is treated as a failure of a foreign function, which we explain in detail here.

All of the basic operations that work on integers work on floating point numbers as well.
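For instance, using the student_grade relation defined above, we can apply arithmetic to grades and compare students within the same class (a minimal sketch; the curved and outperforms relations are hypothetical):

rel curved(name, g * 1.02) = student_grade(name, _, g)
rel outperforms(a, b) = student_grade(a, c, ga) and student_grade(b, c, gb) and ga > gb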

Boolean

Scallop allows the use of boolean values (true and false).

type variable_assign(String, bool)
rel variable_assign = {("a", true), ("b", false)}

We support the following boolean operations:

  • Comparisons
    • == (equality)
    • != (inequality)
  • Logical operations
    • ! (unary negate)
    • && (binary and)
    • || (binary or)
    • ^ (binary xor)

For example, we can have the following code

rel result(a ^ b) = variable_assign("a", a) and variable_assign("b", b) // true

Character

Scallop allows definition of characters such as 'a', '*'. They are single-quoted, and can contain escaped characters such as '\n' (new-line) and '\t' (tab).

rel my_chars = {(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'l'), (4, 'o')}

The comparison operations == and != are available for characters.
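As a small example building on the my_chars facts above, we can find the positions holding the character 'l' (the l_positions relation is hypothetical):

rel l_positions(i) = my_chars(i, c) and c == 'l'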

String

Scallop supports variable-length strings of the type String. Strings are declared using double quotes (") and can contain escaped characters such as \n and \t.

rel greeting = {"Hello World"}

Strings can be compared using == and !=. The main way to interact with strings is through foreign functions such as $string_length, $substring, and $string_concat. Please refer to the foreign functions section for more information.
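A minimal sketch using one of the foreign functions named above (assuming $string_length returns a usize and can be used in a rule head; greeting_length is a hypothetical relation):

rel greeting = {"Hello World"}
rel greeting_length($string_length(s)) = greeting(s)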

Symbols

Symbols are internally registered strings. They are most commonly created by loading from external files, but they can also be specified directly using the s-quoted-string notation:

rel symbols = {s"NAME", s"AGE", s"GENDER"}

DateTime and Duration

DateTime and Duration are natively supported data structures by Scallop. We commonly specify DateTime and Duration using their string form. In the following example, we specify the DateTime values using the t-quoted-string notation (t represents time):

rel event_dates = {("enroll", t"2020-01-01"), ("finish", t"2020-03-01")}

The dates are all transformed into the UTC time-zone. When the date is specified but the time is not, we fill in the time 00:00:00 UTC. When the time is specified but the date is not, we use the current date at the time the program is invoked. Any reasonable date-time format is acceptable; common ones include

  • t"2019-11-29 08:08:05-08"
  • t"4/8/2014 22:05"
  • t"September 17, 2012 10:09am"
  • t"2014/04/2 03:00:51"
  • t"2014年04月08日"

Durations can be specified using the d-quoted-string notation (d represents duration):

rel event_durations = {("e1", d"12 days"), ("e2", d"15 days 20 seconds")}

The string can contain numbers followed by their units. When specifying durations, the following units are accepted:

  • nanoseconds (n)
  • microseconds (usecs)
  • milliseconds (msecs)
  • seconds (secs)
  • minutes (m)
  • hours (h)
  • days (d)
  • weeks (w)
  • months (M)
  • years (y)

We can operate between Duration and DateTime using simple operations such as + and -:

  • DateTime + Duration ==> DateTime
  • Duration + Duration ==> Duration
  • DateTime - DateTime ==> Duration
  • DateTime - Duration ==> DateTime
  • Duration - Duration ==> Duration
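For example, with the event_dates and event_durations relations defined above, we can compute derived times using these operations (a sketch; the enroll_period and deadline relations are hypothetical):

// DateTime - DateTime ==> Duration: how long between enrollment and finish
rel enroll_period(t2 - t1) = event_dates("enroll", t1) and event_dates("finish", t2)

// DateTime + Duration ==> DateTime: push the enrollment date by a duration
rel deadline(t + d) = event_dates("enroll", t) and event_durations("e1", d)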

Entity

Entity values are 64-bit unsigned integers created through hashing. They serve as pointers to created entities. They cannot be created directly; rather, they are managed by Scallop through the creation of entities. For example,

type List = Nil() | Cons(i32, List)
const MY_LIST = Cons(1, Cons(2, Nil()))
rel input_list(MY_LIST)
query input_list

The result is then

input_list: {(entity(0x4cd0d9e6652cdfc7))}

Please refer to this section for more information on algebraic data types and entities.

Type Conversions

In Scallop, types can be converted using the as operator. For example, we can have

rel numbers = {1, 2, 3, 4, 5}
rel num_str(n as String) = numbers(n)

to derive the numbers as {"1", "2", "3", "4", "5"}. In general, all numeric types are castable to each other, and every type is castable to String. Converting a String to another type undergoes a parsing process; when parsing fails, no result is produced.
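The parsing behavior can be seen in a small sketch (hypothetical relations): unparsable strings are silently dropped rather than raising an error.

rel raw = {"10", "20", "abc"}
rel parsed(s as i32) = raw(s) // "abc" fails to parse, so only 10 and 20 are derived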

Writing Queries

Consider the following example of classes, students, and enrollments, where we want to compute the number of students who have enrolled in at least one CS class.

// There are three classes
rel classes = {0, 1, 2}

// Each student is enrolled in a course (Math or CS)
rel enroll = {
  ("tom", "CS"), ("jenny", "Math"), // Class 0
  ("alice", "CS"), ("bob", "CS"), // Class 1
  ("jerry", "Math"), ("john", "Math"), // Class 2
}

// Count how many students enroll in a CS course
rel num_enroll_cs(n) = n := count(s: enroll(s, "CS"))

Normally, executing a program results in scli outputting every single relation.

classes: {(0), (1), (2)}
num_enroll_cs: {(3)}
enroll: {("alice", "CS"), ("bob", "CS"), ("jenny", "Math"), ...}

However, we might only be interested in the relation named num_enroll_cs. In this case, we write a query using the query keyword:

query num_enroll_cs

In this case, only the relation num_enroll_cs will be output:

num_enroll_cs: {(3)}

Atomic Query

One can also write an atomic query if we just want part of a relation. For instance, consider the fibonacci example:

type fib(x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10
query fib(8, y) // fib(8, y): {(8, 34)}

In this case, we are just looking at the 8-th fibonacci number, which is 34.

Recursive Rules

One very powerful programming construct in Scallop is declaratively defined recursion. Inside a rule, if a relational predicate appearing in the head also appears in the body, the predicate is recursive. For example, the definition of the fibonacci number is recursive:

\[ \text{fib}(x) = \left\{ \begin{array}{ll} \text{fib}(x - 1) + \text{fib}(x - 2), & \text{if}~ x > 1 \\ 1, & \text{otherwise} \end{array} \right. \]

Written in Scallop, we encode the function fib as a binary relation between the integer input and output:

type fib(x: i32, y: i32)

We can define the base cases for \(\text{fib}(0)\) and \(\text{fib}(1)\):

rel fib = {(0, 1), (1, 1)}

Now it comes to the definition of recursive cases, which peeks into \(\text{fib}(x - 1)\) and \(\text{fib}(x - 2)\) and sums them.

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) // infinite-loop

However, when actually executing this rule, the program would not terminate, as we are attempting to compute all fibonacci numbers, and there are infinitely many of them. In order to stop it, we can temporarily add a constraint limiting the value of x, so that we only compute fibonacci numbers up to 10:

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10

At the end, the fib relation contains the following facts:

fib: {(0, 1), (1, 1), (2, 2), (3, 3), (4, 5), (5, 8), (6, 13), (7, 21), (8, 34), (9, 55), (10, 89)}

As suggested by the result, the 10-th fibonacci number is 89.

Case Study: Graphs and Transitive Closure

The following is one of the most widely known Datalog programs: computing the paths inside of a graph. By definition, an edge or a sequence of connected edges constitutes a path. This is reflected by the following two rules:

type edge(i32, i32)

rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)

The first line states that an edge can form a path. The second line states that a path, connected to a new edge, forms a new path. As can be seen from the second line, the relation path appears in both the body and the head, making it a recursive relation.

In this example, suppose we have

rel edge = {(0, 1), (1, 2)}

we would get the set of paths to be

path: {(0, 1), (0, 2), (1, 2)}

Notice that the path (0, 2) is a compound path obtained from joining the two edges (0, 1) and (1, 2).

Relation Dependency

Given a rule with head and body, we say that the predicate appearing in the head depends on the predicates of the atoms appearing in the body. This forms a dependency graph. The above edge-path example would have the following dependency graph:

edge <--- path <---+
            |      |
            +------+

The relation edge depends on nothing, while path depends on edge and also path itself. This forms a loop in the dependency graph. In general, if a program has a dependency graph with a loop, then the program requires recursion. Any relation that is involved in a loop would be a recursive relation.

Notice that we are mostly talking about positive dependency here, as the atoms in the body of the rule are positive atoms (i.e., without annotation of negation or aggregation). In more complex scenarios, there will be negation or aggregation in a rule, which we explain in detail in future sections.

Fixed-point Iteration

Recursion in Scallop happens via fixed-point iteration. In plain terms, the recursion continues until no new fact is derived in an iteration. In essence, the whole Scallop program is executed in a loop, and within one iteration, all of the rules in the program are executed. Let us digest the actual execution of the above edge-path program:

rel edge = {(0, 1), (1, 2), (2, 3)}
rel path(a, b) = edge(a, b)                 // rule 1
rel path(a, c) = path(a, b) and edge(b, c)  // rule 2

Before the first iteration, the edge relation has already been filled with 3 facts, namely (0, 1), (1, 2), and (2, 3), while path is empty. Let’s now go through all the iterations:

Iter 0: path = {}
Iter 1: path = {(0, 1), (1, 2), (2, 3)}
       Δpath = {(0, 1), (1, 2), (2, 3)} // through applying rule 1
Iter 2: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)}
       Δpath = {(0, 2), (1, 3)}         // through applying rule 2
Iter 3: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)}
       Δpath = {(0, 3)}                 // through applying rule 2
Iter 4: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)}
       Δpath = {}

In the above trace, we also include Δpath, which contains the new paths derived during the current iteration. As can be seen, during iteration 1, paths of length 1 are derived; during iteration 2, paths of length 2 are derived. During iteration 4, there are no more paths to be derived, so Δpath is empty. This tells us that no new facts were derived, and the whole fixed-point iteration stops, giving us the final result.

Infinite Relations

As we have described in the fixed-point iteration, the recursion continues until no more facts are derived. However, it is possible to write rules that define infinite relations. As shown in the first example:

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2)

gives an infinite relation, as there can always be a new x to derive. In this case, the fixed-point iteration never stops.

The root cause of this is Scallop’s support for value creation, i.e., the creation of new values. Typically, database systems work under the closed-world assumption: all the items being reasoned about are already present, and no computation is done on arbitrarily created values. But in the above example, we derive x from the grounded expression x - 1, hence creating a new value.

Typically, the way to resolve this is to place bounds on the created values. For example, the rule

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10

restricts x to be no greater than 10. This makes the fixed-point iteration stop after around 10 iterations.

Another way of getting around this involves Magic-Set Transformations, whose Scallop equivalent we describe in a later section.

Negations

Scallop supports attaching negation to atoms. In the following example, we are trying to obtain the set of people with no children:

rel person = {"bob", "alice", "christine"} // There are three persons of interest
rel father = {("bob", "alice")}            // Bob is Alice's father
rel mother = {("alice", "christine")}      // Alice is Christine's mother

rel has_no_child(n) = person(n) and not father(n, _) and not mother(n, _)

The last rule basically says that if there is a person n who is neither anyone’s father nor anyone’s mother then the person n has no child. This is indeed what we are going to obtain:

has_no_child: {("christine")}

Negations are clearly very helpful in writing such rules. However, there are many restrictions on negations, which we explain in detail below.

Negation and Variable Grounding

If we look closely at the rule for has_no_child above, we find that the atom person(n) is used in the body. Why can’t we remove it and just say “if one is neither a father nor a mother, then one has no child”?

rel has_no_child(n) = not father(n, _) and not mother(n, _) // Error: variable `n` is not grounded

The problem is with variable grounding. For the variable n to appear in the head, there must be a positive atom that grounds it. All we are saying is what n is not, but not what n is. With only “what it is not”, n could be literally anything else in the world.

Therefore, we need to ground it with a positive atom such as person(n). With this atom in place, n ranges only over the finite set of known persons, and the negated atoms then filter that set.

Stratified Negation

Expanding upon our definition of dependency graph, if a predicate occurs in a negative atom in a body, we say that the predicate of the rule head negatively depends on this predicate. For example, the above has_no_child example has the following dependency graph. Notice that we have marked the positive (pos) and negative (neg) on each edge:

person <--pos-- has_no_child --neg--> father
                      |
                      +-----neg-----> mother

Scallop supports stratified negation, which states that there is never a loop in the dependency graph which involves a negative dependency edge. In other words, if there exists such a loop, the program will be rejected by the Scallop compiler. Consider the following example:

rel is_true() = not is_true() // Rejected

The relation is_true negatively depends on itself, forming a loop containing a negative dependency edge. The error message would show that this program “cannot be stratified”. If we draw the dependency graph of this program, it looks like the following:

is_true <---+
   |        |
   +--neg---+

Since there is a loop (is_true -> is_true) and the loop contains a negative edge, this program cannot be stratified.

Stratified negation is named this way because, if no loop contains a negative dependency edge, the whole dependency graph can be decomposed into strongly connected components (SCCs), where inside each SCC there is no negative dependency. In other words, the negation has been stratified: negative edges can only occur between SCCs. We call each SCC a stratum, and the collection of them the strata. Any non-recursive program has a dependency graph forming a Directed Acyclic Graph (DAG), and is therefore always stratifiable.

The following program, although containing both negation and recursion, can be stratified:

rel path(a, b) = edge(a, b) and not sanitized(b)
rel path(a, c) = path(a, b) and edge(b, c) and not sanitized(b)

For it, the following dependency graph can be drawn:

sanitized <--neg-- path <----+
                   |  |      |
     edge <--pos---+  +--pos-+

In this program, we have three SCCs (or strata):

  • Stratum 1: {edge}
  • Stratum 2: {sanitized}
  • Stratum 3: {path}

Negative dependency only occurs between stratum 2 and 3. Therefore, the program can be accepted.

Aggregations

Aggregations in Scallop can be viewed as operations that aggregate over multiple facts. Such operations include counting, summation and product, finding min and max, and logical quantifiers such as exists and forall. Aggregations appear in the body of a rule and can be nested.

As a concrete example, we look at a program which counts over a set of people:

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p)) // n = 3

In general, we use the following syntax for aggregation formulas.

R1, R2, ... := AGGREGATOR(V1, V2, ...: FORMULA (where U1, U2, ...: FORMULA)?)

We name R1, ... the aggregation result variables, V1, ... the binding variables, and the formula inside of the aggregation the body. When the where keyword is used, the aggregation is associated with an explicit group-by clause. Here, we call the variables U1, ... the group-by variables, and the formula under the where clause the group-by body. The binding variables need to be fully grounded by the body formula, and the group-by variables (if present) need to be fully grounded by the group-by body. For different types of aggregation, the AGGREGATOR may change and be annotated with different information. The number of result variables, the number of binding variables, and their types differ for each aggregation.

Here is a high-level overview of each supported aggregator and their configurations. In the table, ... is used to denote an arbitrary amount of variables.

Aggregator  Binding Variables  Result Variables
count       Any...             usize
sum         Number             the same as the binding variable
prod        Number             the same as the binding variable
min         Any                the same as the binding variables
max         Any                the same as the binding variables
exists      Any...             bool
forall      Any...             bool

Below, we elaborate on each aggregator and describe its usage.

Count

To count the number of facts, we can use the count aggregator. Just repeating the examples shown in the beginning:

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p)) // n = 3

We are counting the number of persons appearing in the person relation. To be more concrete, let’s read out the aggregation formula:

We count the number of p such that p is a person, and assign the result to the variable n.

For count, there can be an arbitrary (> 0) number of binding variables, typed arbitrarily. There is a single result variable, typed usize. For example, you may count the number of edges:

rel num_edges(n) = n := count(a, b: edge(a, b))

Here, we have two binding variables a and b, meaning that we are counting the number of distinct pairs of a and b.

Note that we can use the syntax sugar for aggregation to omit the repeated n:

rel num_edges = count(a, b: edge(a, b))

Implicit Group-By

With group-by, we may count the number of facts within a pre-defined group. Consider an example scene with differently colored objects:

rel obj_color = {(0, "red"), (1, "red"), (2, "blue"), (3, "red")}
rel num_obj_per_color(col, num) = num := count(obj: obj_color(obj, col))

As suggested by the facts inside obj_color, there are 4 objects indexed 0, 1, 2, 3, each associated with a color. Objects #0, #1, and #3 are red and object #2 is blue. Therefore, we get 3 red objects and 1 blue object, as computed in the result of num_obj_per_color:

num_obj_per_color: {("blue", 1), ("red", 3)}

Let’s analyze the rule in detail. We are counting over obj such that the object obj has a certain color col. But col is also a variable occurring in the head of the rule. This is an implicit group-by, in that the variable col is used as an implicit group-by variable. That is, we condition the counting procedure on each group defined by the col variable. Since two colors appear in the obj_color relation, we perform the count for each of the two groups.

In general, if a variable is positively grounded in the body and appears in the head of the parent rule, we call it an implicit group-by variable.
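There can also be more than one implicit group-by variable. In the following sketch (the obj and count_per relations are hypothetical), both col and shp act as implicit group-by variables, so we count objects per color-shape combination:

rel obj = {(0, "red", "cube"), (1, "red", "sphere"), (2, "blue", "cube")}
rel count_per(col, shp, n) = n := count(o: obj(o, col, shp))
// count_per: {("blue", "cube", 1), ("red", "cube", 1), ("red", "sphere", 1)}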

Explicit Group-By

In the above example, there is no green colored object. However, how do we know that the number of green objects is 0? The result does not seem to address this.

The missing piece is a domain of possible groups. Without the domain being set explicitly, Scallop can only search the database for possible groups. However, we can explicitly tell Scallop what the groups are. Consider the following rewrite of the above program:

rel colors = {"red", "green", "blue"}
rel obj_color = {(0, "red"), (1, "red"), (2, "blue"), (3, "red")}
rel num_obj_per_color(col, num) = num := count(obj: obj_color(obj, col) where col: colors(col))

With the where clause, we have explicitly declared that col is a group-by variable grounded by the colors relation. Looking into the colors relation, we find three possible colors that we care about: red, green, and blue. In this case, we consider "green" as the third group and try to count the number of green objects, of which there are 0:

num_obj_per_color: {("blue", 1), ("green", 0), ("red", 3)}

Sum and Product

We can use the sum and prod aggregators to aggregate multiple numerical values. Consider the following example of sales:

rel sales = {("alice", 1000.0), ("bob", 1200.0), ("christine", 1000.0)}

We can compute the sum of all the sales:

rel total_sales(s) = s := sum[p](sp: sales(p, sp)) // 3200.0
// or
rel total_sales = sum[p](sp: sales(p, sp)) // 3200.0

Notice that the result type of s is the same as the type of the binding variable sp, which is f32 as indicated by the decimals in the definition of sales. Here, the argument variable p is necessary, since it is the key indexing each sale number. The above rule body is equivalent to the following math formula:

\[ s = \sum_p \text{sale}_p \]

If we do not use the argument variable, we get the following:

rel total_sales_wrong(s) = s := sum(sp: sales(p, sp)) // 2200.0, since the two 1000.0 will be deduplicated without its key

The product aggregator prod can be used in a similar manner as sum.
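For instance, a small sketch of prod, following the same argument-variable convention as sum (the growth and total_growth relations are hypothetical):

rel growth = {(2020, 1.05), (2021, 1.10), (2022, 0.95)}
rel total_growth(g) = g := prod[y](f: growth(y, f)) // product of the three yearly factors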

Min, Max, Argmin, and Argmax

Scallop can compute the minimum or maximum among a set of values. In the following example, we find the maximum grade of an exam:

rel exam_grades = {("a", 95.2), ("b", 87.3), ("c", 99.9)}
rel max_score(m) = m := max(s: exam_grades(_, s)) // 99.9
// or, succinctly
rel max_score = max(s: exam_grades(_, s)) // 99.9

The number (and types) of binding variables can be arbitrary, but the result variables must match the binding variables. In the above case, since s is of type f32, m will be of type f32 as well.
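As a sketch with two binding variables (the points and min_point relations are hypothetical, and we assume tuples are compared element-wise in lexicographic order):

rel points = {(3, 4), (1, 7), (1, 2)}
rel min_point(x, y) = (x, y) := min(a, b: points(a, b))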

It is also possible to get argmax/argmin. Suppose we want to get the person (along with their grade) who scored the best, we write:

rel best_student(n, s) = (n, s) := max[n](s: exam_grades(n, s))
// or, succinctly
rel best_student = max[n](s: exam_grades(n, s))

Here, we are still finding the maximum score s, but along with max we have specified the “arg” ([n]) which associates with the maximum score. We call n an arg variable for min/max aggregator. The arg variable is grounded by the aggregation body, and can be directly used in the head of the rule.

If we do not care about the grade and just want to know who has the best grade, we can use the wildcard _ to ignore the result variable:

rel best_student(n) = (n, _) := max[n](s: exam_grades(n, s))

Alternatively, we can also use argmax:

rel best_student(n) = n := argmax[n](s: exam_grades(n, s))
// or, succinctly
rel best_student = argmax[n](s: exam_grades(n, s))

Exists and Forall

Logical quantifiers such as exists and forall can also be encoded as aggregations. They return a boolean value as the aggregation result.

Existential Quantifier

Let us start with the easier of the two, exists. Technically, all variables in the body of a Scallop rule are existentially quantified. The exists aggregation makes this explicit. For example, we can check if there exists an object that is blue:

rel obj_color = {(0, "red"), (1, "green")}
rel has_blue(b) = b := exists(o: obj_color(o, "blue"))

Specifically, we are checking “whether there exists an object o such that its color is blue”. The result is assigned to the variable b. Since there is no blue object, we get the result has_blue(false).

In case we only care about the result being true, we can omit the result variable. For example, we can rewrite the recursive case of the edge-path transitive closure as

rel path(a, c) = exists(b: path(a, b) and edge(b, c))

We note that this is just a syntax sugar equivalent to the following:

rel path(a, c) = r := exists(b: path(a, b) and edge(b, c)) and r == true

When we want to express the non-existence of something, we can write

rel no_red() = not exists(o: obj_color(o, "red"))

Note that there can be an arbitrary number of binding variables.
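For instance, with two binding variables we can check whether a graph has any edge at all (the graph_nonempty relation is hypothetical):

rel edge = {(0, 1), (1, 2)}
rel graph_nonempty() = exists(a, b: edge(a, b))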

Universal Quantifier

We can also use the universal quantifier forall. There is a special requirement for universal quantification: the body formula must be an implies (=>) formula. This restriction is enforced so that all the binding variables have their bounds specified on the left-hand side of the implies formula. In the following example, we check if all the objects are spherical:

type Shape = CUBE | SPHERE | CONE | CYLINDER
rel object = {0, 1, 2}
rel obj_shape = {(0, CUBE), (1, SPHERE), (2, SPHERE)}
rel target(b) = b := forall(o: object(o) implies obj_shape(o, SPHERE))

Notice that we have a relation which defines the domain of object, suggesting that there are just 3 objects for us to work with. In the aggregation, we are checking “for all o such that o is an object, is the object a sphere?” The result is stored in the variable b and propagated to the target relation.

The reason we need an implies formula is that the left-hand side of the implies gives bounds to the universally quantified variables. Scallop cannot reason about open-domain variables.

Note that, similar to exists, we can also remove the result variable. The following program derives a boolean (arity-0) relation target denoting whether all the red objects are cubes:

type Shape = CUBE | SPHERE | CONE | CYLINDER
type Color = RED | GREEN | BLUE
rel obj_shape = {(0, CUBE), (1, SPHERE), (2, SPHERE)}
rel obj_color = {(0, RED),  (1, GREEN),  (2, GREEN)}
rel target() = forall(o: obj_color(o, RED) implies obj_shape(o, CUBE)) // {()}

Here, we directly use obj_color as the left-hand side of the implies. One empty tuple is derived, indicating that the statement is true.

String Join

If you have multiple facts containing strings and you want to join them together, you can use the string_join aggregator:

rel R = {"hello", "world"}
rel P1(n) = n := string_join(s: R(s)) // P1("helloworld")
rel P2(n) = n := string_join<" ">(s: R(s)) // P2("hello world")

In the above example, we can either join directly, producing the string “helloworld”, or join with the separator " ", producing the string “hello world”. Note that the order of the strings in the joined string is determined by sorting the strings themselves. Here, "hello" starts with "h", which is smaller than the "w" in "world", so it occurs before "world". If you want to specify an explicit order, use the argument variable:

rel R = {(2, "hello"), (1, "world")}
rel P(n) = n := string_join<" ">[i](s: R(i, s)) // P("world hello")

Since we have specified the variable i as the argument of string_join, it serves to order the tuples. Here, we have (1, "world") and (2, "hello"), so the joined string is "world hello" instead of "hello world".

Declaring Constants

We can declare constants and give them names. The general syntax is the following:

const NAME (: TYPE)? = CONSTANT

For example, we can define the value of PI:

const PI = 3.1415926

Notice that here we have not specified the type of PI. By default, its type is inferred from the places where the constant is used. If we want to specify a non-default type, we can write

const PI: f64 = 3.1415926

We can also declare multiple constants at a time:

const LEFT = 0, UP = 1, RIGHT = 2, DOWN = 3

Enum Types

We sometimes want to define enum types which contain constant variables. Common examples include RED, GREEN, and BLUE under the Color type, and LEFT, RIGHT, UP under the Action type. These can be achieved by defining enum types:

type Color = RED | GREEN | BLUE
type Action = LEFT | UP | RIGHT | DOWN

Internally, values such as RED and UP are unsigned integer constants. If not specified, the values start from 0 and go up by 1 at a time.

For example, given the type definitions above, RED = 0, GREEN = 1, and BLUE = 2. For Action, LEFT = 0, UP = 1, etc. Notice that even though Color and Action are different types, their values can overlap.

One can specify the values of these enum variants by attaching actual numbers to them. In the following example, we have explicitly assigned three values to the colors.

type Color = RED = 3 | GREEN = 5 | BLUE = 7

We can also just set a few of those:

type Color = RED | GREEN = 10 | BLUE

In this case, RED = 0, GREEN = 10, and BLUE = 11. Notice how BLUE’s value is incremented from GREEN’s.

Displaying Constants

Constants are just values, and many of them are integers; they are not explicitly associated with any symbols. If you want to display them meaningfully, we advise creating auxiliary relations storing the mapping from each constant to its string form. For example, we can have

rel color_to_string = {(RED, "red"), (GREEN, "green"), (BLUE, "blue")}

In this case, joining results with the color_to_string relation will display their desired meanings properly.
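For example, assuming a hypothetical obj_color relation over the Color constants, joining against color_to_string yields human-readable output:

rel obj_color = {(0, RED), (1, BLUE)}
rel obj_color_str(o, s) = obj_color(o, c) and color_to_string(c, s)
query obj_color_str // {(0, "red"), (1, "blue")}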

Algebraic Data Type and Entities

Algebraic data types are powerful programming constructs that allow users to define custom data structures and variants. Consider a traditional functional definition of a list:

type IntList = Nil()
             | Cons(i32, IntList)

We are saying that an IntList can be one of two variants, Nil and Cons:

  • Nil denotes the end of a list;
  • Cons contains the current i32 integer and a continuation of the list.

In this representation, we can represent a list like [1, 2, 3] as Cons(1, Cons(2, Cons(3, Nil()))). This is indeed what we can write in Scallop; we can declare such a list as a constant:

const MY_LIST = Cons(1, Cons(2, Cons(3, Nil())))

In general, we call the type definitions of such data structures Algebraic Data Type (ADT) definitions. The name Entity is used to refer to objects of such data types. In the example above, the constant MY_LIST is an entity of the ADT IntList.

In this section, we describe in detail the definition and use of ADT and Entities. We also touch on the internals.

Defining Algebraic Data Types (ADT)

We use the following syntax to define ADTs:

type TYPE_NAME = VARIANT_NAME(ARG_TYPE_1, ARG_TYPE_2, ...) | ...

An ADT named TYPE_NAME is defined to have multiple (at least 2) named variants VARIANT_NAME. Each variant holds a tuple of values typed ARG_TYPE_1, ARG_TYPE_2, etc. We call variants that take no arguments terminal variants; parentheses are still needed for those variants.

Please note that there cannot be duplicate variant names, whether within the same ADT or across different ADTs. For example, the following code results in a compilation failure:

type IntList  = Cons(i32, IntList)   | Nil()
type BoolList = Cons(bool, BoolList) | Nil() // Failure: Cons and Nil are already defined

Currently, ADTs do not support generics. In the above case, IntList and BoolList need to be defined separately with differently named variants.

Using ADT to represent arithmetic expressions

Structured expressions are a common kind of data expressed through ADTs. The following definition describes the abstract syntax tree (AST) of simple arithmetic expressions:

type Expr = Int(i32)        // An expression could be a simple integer,
          | Add(Expr, Expr) // a summation of two expressions
          | Sub(Expr, Expr) // a subtraction of two expressions

The following code encodes a simple expression:

// The expression (1 + 3) - 5
const MY_EXPR = Sub(Add(Int(1), Int(3)), Int(5))

Using ADT to represent data structures

Data structures such as binary trees can also be represented:

type Tree = Node(i32, Tree, Tree) | Nil()

Here, Node(i32, Tree, Tree) represents a node in a tree holding three things: an integer (i32), a left sub-tree, and a right sub-tree. The other variant, Nil, represents an empty sub-tree. In this encoding, Node(5, Nil(), Nil()) represents a leaf node holding the number 5.

The following code encodes a balanced binary search tree:

//         3
//      /     \
//    1         5
//  /   \     /   \
// 0     2   4     6
const MY_TREE =
  Node(3,
    Node(1,
      Node(0, Nil(), Nil()),
      Node(2, Nil(), Nil()),
    ),
    Node(5,
      Node(4, Nil(), Nil()),
      Node(6, Nil(), Nil()),
    )
  )

Working with Entities

Entities are most commonly created as constants using the const keyword. Let us revisit the List example and see how we can use the defined constant in our analysis.

type List = Cons(i32, List) | Nil()

const MY_LIST = Cons(1, Cons(2, Cons(3, Nil()))) // [1, 2, 3]

Using Entities in Relations

We can include the constant entities as part of a fact:

rel target(MY_LIST)
query target

As a result of the above program, we are going to get the value of the entity MY_LIST:

target: {(entity(0xff08d5d60a201f17))}

The value is a 64-bit integer encoded in hex, serving as a unique identifier for the created entity.

Note that identical entities have the same identifier. In the following example, MY_LIST_1 and MY_LIST_2 are identical, and therefore their hex identifiers are the same.

const MY_LIST_1 = Cons(1, Nil()),
      MY_LIST_2 = Cons(1, Nil()),
      MY_LIST_3 = Cons(2, Nil())

rel lists = {
  (1, MY_LIST_1),
  (2, MY_LIST_2),
  (3, MY_LIST_3),
}

query lists
// lists: {
//   (1, entity(0x678defa0a65c83ab)), // Notice that entities 1 and 2 are the same
//   (2, entity(0x678defa0a65c83ab)),
//   (3, entity(0x3734567c3d9f8d3f)), // This one is different than above
// }

Decomposing Entities in Rules

To peek into the contents of an entity, we can destruct it using the case-is operator. Let us look at an example of computing the length of a list:

type length(list: List, len: i32)
rel length(list, 0)     = case list is Nil()
rel length(list, l + 1) = case list is Cons(_, tl) and length(tl, l)

We define a recursive relation length to compute the length of a list. There are two cases. When the list is Nil(), the list has ended, so its length is 0. In the second case, the list is Cons(_, tl), and the length of the list is the length of tl plus 1.

We can then compute the length of a list by querying the length relation on a constant list.

query length(MY_LIST, l) // l = 3
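The same destructing pattern works for other list computations. As an illustrative sketch (the sum relation below is not part of the original text), we can sum the elements of a list analogously:

```
type sum(list: List, s: i32)
rel sum(list, 0)      = case list is Nil()
rel sum(list, s + hd) = case list is Cons(hd, tl) and sum(tl, s)

query sum(MY_LIST, s) // s = 6
```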

Case Study: Decomposing Entities for Pretty-Printing

Let us look at more examples using the case-is operator. The following set of rules pretty-prints expressions:

type Expr = Int(i32) | Add(Expr, Expr) | Sub(Expr, Expr)

type to_string(expr: Expr, str: String)
rel to_string(e, $format("{}", i))           = case e is Int(i)
rel to_string(e, $format("({} + {})", a, b)) = case e is Add(e1, e2) and to_string(e1, a) and to_string(e2, b)
rel to_string(e, $format("({} - {})", a, b)) = case e is Sub(e1, e2) and to_string(e1, a) and to_string(e2, b)

As shown in the example, we write three to_string rules for pretty-printing the Expr data structure, each corresponding to exactly one of the variants. For the inductive cases Add and Sub, the to_string rule is defined recursively so that the sub-expressions are also converted to strings. For formatting, we use the $format foreign function.

At the end, running the following snippet

const MY_EXPR = Sub(Add(Int(3), Int(5)), Int(1))
query to_string(MY_EXPR, s)

would give the following result, showing that the pretty-printed expression is ((3 + 5) - 1):

to_string(MY_EXPR, s): {(entity(0xa97605c2703c6249), "((3 + 5) - 1)")}

Case Study: Checking Regular Expressions

With ADTs, we can easily specify the language of regular expressions (regex). Let's consider a very simple regex language with union (|) and star (*), where phrases can be grouped together. For example, the regex "a*b" expresses that the character a can be repeated an arbitrary number of times (including zero), followed by a single b. This regex matches strings like "aaaab" and "b", but not "ba".

Let’s try to define this regex language in Scallop!

type Regex = Char(char)           // a single character
           | Star(Regex)          // the star of a regex
           | Union(Regex, Regex)  // a union of two regexes
           | Concat(Regex, Regex) // concatenation of two regexes

As can be seen, we have defined 4 variants of this regex language. With this, our regex "a*b" can be expressed as follows:

// a*b
const A_STAR_B = Concat(Star(Char('a')), Char('b'))

Now, let's define the actual semantics of this regex language by writing a relation matches to check whether the regex matches a given string. We first set up the types of the relevant relations.

  • input_regex is a unary relation holding the regex to be checked;
  • input_string is a unary relation holding the input string;
  • matches_substr checks whether a sub-regex r matches the input string between the begin and end indices, where end is exclusive;
  • matches is a nullary relation telling whether the input regex matches the input string.

type input_regex(r: Regex)
type input_string(s: String)
type matches_substr(r: Regex, begin: usize, end: usize)
type matches()

The bulk of the code is dedicated to defining the matches_substr relation. At a high level, we case-split on each kind of regex and match against sub-strings. The first rule handles the Char variant:

rel matches_substr(r, i, i + 1) = case r is Char(c) and input_string(s) and string_chars(s, i, c)

The rule says that if the regex r is a single character c, then we go into the input string s and find every index i whose corresponding character is c. The matched sub-string starts at index i and ends at index i + 1. Note that string_chars is a foreign predicate that decomposes a string into its characters.
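To build intuition for string_chars, here is a small standalone sketch (the chars relation is illustrative):

```
// string_chars enumerates each (index, character) pair of a string
rel chars(i, c) = string_chars("ab", i, c)

query chars // {(0, 'a'), (1, 'b')}
```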

Similarly, we can write the rules for other variants:

// For star; it matches empty sub-strings [i, i) and recursively on sub-regex
rel matches_substr(r, i, i) = case r is Star(_) and input_string(s) and string_chars(s, i, _)
rel matches_substr(r, b, e) = case r is Star(r1) and matches_substr(r, b, c) and matches_substr(r1, c, e)

// For union; any string that matches left or right sub-regex would match the union
rel matches_substr(r, b, e) = case r is Union(r1, r2) and matches_substr(r1, b, e)
rel matches_substr(r, b, e) = case r is Union(r1, r2) and matches_substr(r2, b, e)

// For concat; we need the sub-strings to match consecutively
rel matches_substr(r, b, e) = case r is Concat(r1, r2) and matches_substr(r1, b, c) and matches_substr(r2, c, e)

Lastly, we add the rule deriving the final matches relation. It simply checks whether the regex matches the input string from start to end:

rel matches() = input_regex(r) and input_string(s) and matches_substr(r, 0, $string_length(s))

Let us test the result!

rel input_regex(A_STAR_B)
rel input_string("aaaab")
query matches // {()}

Dynamically Creating Entities

There are cases where we want to create new entities during the deductive process. This is done through the new keyword followed by the entity to create. Suppose we have the definition of List and some pretty-printing code for it:

type List = Cons(i32, List) | Nil()

rel to_string_2(l, "]")                      = case l is Nil()
rel to_string_2(l, $format("{}]", i))        = case l is Cons(i, Nil())
rel to_string_2(l, $format("{}, {}", i, ts)) = case l is Cons(i, tl) and case tl is Cons(_, _) and to_string_2(tl, ts)
rel to_string(l, $format("[{}", tl))         = to_string_2(l, tl)

The following example shows that, given an input list l, we generate a result list Cons(1, l).

type input_list(List)
rel result_list(new Cons(1, l)) = input_list(l)

Given an actual list defined as a constant, we will be able to specify that the constant is the input list:

const MY_INPUT_LIST = Cons(2, Cons(3, Nil()))
rel input_list(MY_INPUT_LIST)

Now, let’s visualize the results!

rel input_list_str(s) = to_string(MY_INPUT_LIST, s)
rel result_list_str(s) = result_list(l) and to_string(l, s)

query input_list_str  // [2, 3]
query result_list_str // [1, 2, 3]

As can be seen, through the new operator we have created a new list containing the additional element 1. Note that the rule for result_list is not recursive. In general, extra care needs to be taken to ensure that the program does not go into an infinite loop.
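The new expressions can also be nested to create multiple entities at once. As an illustrative sketch building on the relations above:

```
// Prepend 0 and 1 in front of the input list
rel extended_list(new Cons(0, new Cons(1, l))) = input_list(l)
rel extended_list_str(s) = extended_list(l) and to_string(l, s)

query extended_list_str // [0, 1, 2, 3]
```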

Case Study: Creating Entities for Equality Saturation

In this case study we look at the problem of equality saturation. Given a symbolic expression, there might be ways to simplify it, defined through rewrite rules; after simplification, the program should be equivalent to the input. The problem is challenging because there may be multiple ways to apply the rewrite rules. How do we systematically derive the simplest equivalent program?

A simple example is a symbolic arithmetic expression language with constants, variables, and addition:

type Expr = Const(i32) | Var(String) | Add(Expr, Expr)

One example expression that we can express in this language would be

const MY_EXPR = Add(Add(Const(-3), Var("a")), Const(3)) // (-3 + a) + 3

For visualization, we write a to_string relation:

rel to_string(p, i as String) = case p is Const(i)
rel to_string(p, v)           = case p is Var(v)
rel to_string(p, $format("({} + {})", s1, s2)) =
  case p is Add(p1, p2) and to_string(p1, s1) and to_string(p2, s2)

If we query on to_string for MY_EXPR, we would get

query to_string(MY_EXPR, s) // s = "((-3 + a) + 3)"

Now let us deal with the actual simplification. The expression (-3 + a) + 3 can be simplified to just a, as the -3 and 3 cancel out. To perform the simplification, we write two things:

  1. rewrite rules in the form of equivalence relation;
  2. the weight function giving each expression a weight to tell which expression is simpler.

For this, the following set of relations needs to be defined.

type input_expr(expr: Expr)
type equivalent(expr_1: Expr, expr_2: Expr)
type weight(expr: Expr, w: i32)
type best_program(expr: Expr)

Note that we impose a prior on equivalent: expr_1 is always at least as complex as expr_2. This prevents the simplification from going in arbitrary directions and looping forever; as a consequence, equivalent is not commutative. Let us start by giving equivalent its basic properties of identity and transitivity:

// Identity
rel equivalent(e, e) = case e is Const(_) or case e is Var(_) or case e is Add(_, _)

// Transitivity
rel equivalent(e1, e3) = equivalent(e1, e2) and equivalent(e2, e3)

Now we can write the rewrite rules. The first states that if e1 and e1p are equivalent and e2 and e2p are equivalent, then their sums (Add(e1, e2) and Add(e1p, e2p)) are equivalent too.

// e1 == e1p, e2 == e2p ==> (e1 + e2) == (e1p + e2p)
rel equivalent(e, new Add(e1p, e2p)) = case e is Add(e1, e2) and equivalent(e1, e1p) and equivalent(e2, e2p)

The next rule states that addition is commutative, so Add(a, b) is equivalent to Add(b, a):

// (a + b) == (b + a)
rel equivalent(e, new Add(b, a)) = case e is Add(a, b)

We also have a rule for associativity:

// (a + (b + c)) == ((a + b) + c)
rel equivalent(e, new Add(new Add(a, b), c)) = case e is Add(a, Add(b, c))

A rule for eliminating the additive identity 0:

// a + 0 = a
rel equivalent(e, a) = case e is Add(a, Const(0))

A rule for folding the addition of two constants:

rel equivalent(e, new Const(a + b)) = case e is Add(Const(a), Const(b))

Now that we have 5 rewrite rules in place, let us define how to compute the weight of each expression. The leaf nodes (Var and Const) have a weight of 1, and an addition has the weights of its left and right sub-expressions added together, plus 1.

rel weight(e, 1) = case e is Var(_) or case e is Const(_)
rel weight(e, l + r + 1) = case e is Add(a, b) and weight(a, l) and weight(b, r)

Lastly, we use aggregation to find the equivalent program with the minimum weight, which is our definition of the "simplest" program. Note that we use an argmin aggregation, denoted by argmin[p], here:

rel best_program(p) = p := argmin[p](w: input_expr(e) and equivalent(e, p) and weight(p, w))

If we query for the best program and turn it into a string, we get our expected output, the single variable "a"!

rel best_program_str(s) = best_program(p) and to_string(p, s)
query best_program_str // {("a")}

Parsing Entities from String

Scallop provides foreign functions and predicates for dynamically parsing entities from string input. Consider the following example:

type Expr = Const(f32) | Add(Expr, Expr)

rel expr_str = {"Add(Const(1), Const(2.5))"}

Say we want to parse an expression from expr_str; we can do the following:

rel expr($parse_entity(s)) = expr_str(s)

Here, we use the foreign function $parse_entity. We would get the following result:

query expr
// expr: {(entity(0xadea13a2621dd155))}
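Once parsed, the entity behaves like any other entity. For instance, we could evaluate the parsed expression with a recursive rule; the eval and result relations below are an illustrative sketch, not part of the $parse_entity API:

```
type eval(e: Expr, v: f32)
rel eval(e, v)       = case e is Const(v)
rel eval(e, v1 + v2) = case e is Add(e1, e2) and eval(e1, v1) and eval(e2, v2)

rel result(v) = expr(e) and eval(e, v)
query result // {(3.5)}
```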

On-Demand Relations

There are often relations that you know will not need to be fully computed, including infinite relations. We want to define such relations without worrying about their infiniteness, while still being able to supply the information needed for computation. Such relations are called On-Demand Relations.

We show one on-demand relation here: the Fibonacci relation.

type fib(bound x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x > 1
query fib(10, y)

Normally, the Fibonacci relation would contain only the second and third lines, which respectively define the base cases and the recursive case. However, there are infinitely many Fibonacci numbers, and it would not be wise to compute the relation fully. Usually, we want to infer some fact inside the infinite relation based on some inputs; in this case, as the last line shows, we want the 10th Fibonacci number.

Notice that when we want to compute a Fibonacci number, we usually supply the x value (here, 10) in order to get the value y. This is exactly what we tell the compiler in the first line: inside the type declaration, we provide an adornment for each variable.

  • x is adorned with bound, denoting that it is treated as an input (bounded) variable of the relation;
  • y is not adorned, so it is a free variable that will be computed by the rules of the relation.

Getting x from y is out of scope for this tutorial.

By providing the adornments (with at least one bound), we tell Scallop that the relation should be computed on demand. Scallop will then find every place where the relation is demanded and restrict computation of the relation to just those demands.

In our case, there is a single place where the fib relation is demanded (where x is 10), so Scallop computes only the facts necessary to derive the final solution.

Adornments

There are only two kinds of adornments:

  • bound
  • free

These annotate whether a variable is treated as a bounded or a free variable.

If no adornment is provided for a variable, it is a free variable by default. In this sense, all normal relations without any adornment are treated as non-on-demand relations.

When at least one bound adornment is annotated on a relation type declaration, we know that the relation needs to be computed on-demand.

More Examples

On-Demand Path

Let's go back to our edge-and-path example. Suppose there is a huge graph, but we only want to find paths ending at a specific node:

rel path(a, b) = edge(a, b) or (edge(a, c) and path(c, b))
query path(a, 1024)

In this case, enumerating all paths would be strictly more expensive than just exploring backward from the end point. Therefore, we add adornments to the path relation as follows:

type path(free i32, bound i32)

We say the second argument is bound and the first argument is free, matching what we expect from the query.
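Putting the pieces together, a complete sketch might look like the following, where the small hypothetical edge relation stands in for the huge graph:

```
type path(free i32, bound i32)

rel edge = {(1, 2), (2, 1024), (3, 4)}
rel path(a, b) = edge(a, b) or (edge(a, c) and path(c, b))

query path(a, 1024) // {(1, 1024), (2, 1024)}
```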

On-Demand To-String

Let's consider a simple arithmetic expression language and a to_string predicate for it:

type Expr = Const(i32) | Var(String) | Add(Expr, Expr) | Sub(Expr, Expr)

rel to_string(e, $format("{}", i))             = case e is Const(i)
rel to_string(e, $format("{}", v))             = case e is Var(v)
rel to_string(e, $format("({} + {})", s1, s2)) = case e is Add(e1, e2) and to_string(e1, s1) and to_string(e2, s2)
rel to_string(e, $format("({} - {})", s1, s2)) = case e is Sub(e1, e2) and to_string(e1, s1) and to_string(e2, s2)

Now let's say there are many expressions declared as constants:

const EXPR_1 = Add(Const(1), Add(Const(5), Const(3)))
const EXPR_2 = Add(Const(1), Var("x"))
const EXPR_3 = Const(13)

By default, Scallop would generate strings for all of these expressions.

However, let’s say we are only interested in one of the expressions:

query to_string(EXPR_3, s)

Then most of the computations for to_string would be redundant.

In this case, we would also declare to_string as an on-demand predicate, like this:

type to_string(bound Expr, String)

Then only the queried expression will be converted to a string.

Internals

Internally, when relations are annotated with adornments, the whole Scallop program undergoes a program transformation, traditionally called Magic-Set Transformation or Demand Transformation, on which there is a rich literature.

Loading from CSV

Scallop can be used with existing datasets loaded from CSV files. This is achieved by annotating specific relations. For example, assuming we have a file edge.csv,

0,1
1,2

we can load the content of it into a relation edge in Scallop using the following syntax

@file("edge.csv")
type edge(from: usize, to: usize)

rel path(a, c) = edge(a, c) or path(a, b) and edge(b, c)

query path

In particular, we attach the @file(...) attribute to the relation type declaration type edge(...), with the file name written inside the attribute. A relation must be declared with types for it to be loaded from CSV; depending on the type declaration, the file content will be parsed into values of the corresponding types.

From here, the edge relation will be loaded with the content (0, 1) and (1, 2). After executing the Scallop program above, we would obtain the result path being (0, 1), (0, 2), and (1, 2).

Certainly, there are many ways to load CSV. In this section, we introduce the various ways to configure the CSV loading.

Headers

Some CSV files have headers. Suppose we have the following CSV file:

from,to
0,1
1,2

To load this file, we would need to add an additional argument header=true to the @file attribute:

@file("edge.csv", header=true)
type edge(from: usize, to: usize)

Note that by default we assume that CSV files don’t have headers.

Deliminators

By default, we assume the values inside a CSV file are delimited by commas (','). When a CSV file uses a different delimiter, such as tabs ('\t'), we specify it in the @file attribute:

@file("edge.csv", deliminator="\t")
type edge(from: usize, to: usize)

Note that the deliminator must be a single character.

Parsing Field-Value Pairs

Many CSV tables have a large number of columns. One way to load such a table is to specify all the fields and their types:

type table(field1: type1, field2: type2, ..., fieldn: typen)

However, this can be tedious to encode. Therefore, we provide another way of parsing CSV files into relations, using primary keys and field-value pairs. Assume we have the following CSV file:

student_id,name,year,gender
0001,alice,2020,female
0002,bob,2021,male

We see that student_id can serve as the primary key of this table. With this, it can be loaded into the following relation

@file("student.csv", keys="student_id")
type table(student_id: usize, field: String, value: String)

By specifying keys="student_id", we tell Scallop that student_id should be viewed as the unique primary key. The remaining two arguments are field and value, both of which must be typed String. As a result, the following 6 facts are produced in the table relation:

(1, "name", "alice"), (1, "year", "2020"), (1, "gender", "female"),
(2, "name", "bob"),   (2, "year", "2021"), (2, "gender", "male")

Note that there can be more than one key. Consider the following table:

student_id,course_id,enroll_time,grade
0001,cse100,fa2020,a
0001,cse101,sp2021,a
0002,cse120,sp2021,b

We see that the combination of student_id and course_id forms the unique primary key. In this case, the table can be loaded using the following syntax:

@file("enrollment.csv", keys=["student_id", "course_id"])
type enrollment(student_id: usize, course_id: String, field: String, value: String)

By setting keys to the list ["student_id", "course_id"], the student_id field is the first primary key and course_id is the second. There are still two additional arguments (field and value) for the enrollment relation. In general, the arity of the relation is the number of primary keys plus 2.

Specifying Fields to Load

In case not all fields are desired when loading, one can use the fields argument to specify what to load. Consider the same enrollment table encoded in CSV:

student_id,course_id,enroll_time,grade
0001,cse100,fa2020,a
0001,cse101,sp2021,a
0002,cse120,sp2021,b

If we want everything except the enroll_time column, we can do

@file("enrollment.csv", fields=["student_id", "course_id", "grade"])
type enrollment(student_id: usize, course_id: String, grade: String)

This also works in conjunction with the keys argument, in which case the fields list does not need to include the primary keys.

@file("enrollment.csv", keys=["student_id", "course_id"], fields=["grade"])
type enrollment(student_id: usize, course_id: String, field: String, value: String)
// The following facts will be obtained
//   enrollment(1, "cse100", "grade", "a")
//   enrollment(1, "cse101", "grade", "a")
//   enrollment(2, "cse120", "grade", "b")

Foreign Functions

Foreign functions allow for complex value manipulation in Scallop. Conceptually, they are pure, partial functions that operate on one or more values and return a single value. Stateful functions, such as random number generators, are not allowed as foreign functions.

Function Types

In Scallop, foreign functions are generically typed, with optional and variadic arguments. All function names carry a dollar-sign ($) prefix. We use the following syntax to denote a function signature:

$FUNC_NAME<ARG(: FAMILY)?, ...>(
  POS_ARG: POS_ARG_TYPE, ...,
  OPT_ARG: OPT_ARG_TYPE?, ...,
  VAR_ARG_TYPE...
) -> RETURN_TYPE

The generic type arguments are specified in the <...> after the function name and may be annotated with an optional type family. Among the function's arguments, optional arguments must appear after all positional arguments, and the variadic argument type must appear after all positional and optional arguments. Every function must have a return type.

For example, the function $string_char_at(s: String, i: usize) -> char takes in a string s and an index i, and returns the character at that location. The two arguments s and i are both positional arguments.

In the function $substring(s: String, b: usize, e: usize?), we have 2 positional arguments (s and b) and 1 optional argument (e). This means that this substring function can be invoked with 2 or 3 arguments. Invoking $substring("hello", 3) would give us "lo", and invoking $substring("hello", 1, 3) would give us "el".

For a function like $abs<T: Number>(T) -> T, the absolute value function takes a value of any number type (integers and floating points) and returns a value of the same type as its input.

A function like $format(f: String, Any...) looks at the format string and fills each {} placeholder with the subsequent arguments. Notice that there can be an arbitrary number of arguments (variadic) of Any type. For example, we can have $format("{} + {}", 3, "a") ==> "3 + a" and $format("{}", true) ==> "true".
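As a quick sketch of how foreign functions compose inside rules (the greeting and cleaned relations here are illustrative):

```
rel greeting = {"  Hello World  "}

// Trim the whitespace, then lower-case the result
rel cleaned($string_lower($string_trim(s))) = greeting(s)

query cleaned // {("hello world")}
```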

Function Failures

Foreign functions may fail with errors such as divide-by-zero or index-out-of-bounds. When an error happens, the value is not propagated along the computation and is dropped silently.

For example, the following program uses the foreign function $string_char_at. It walks through the indices 1, 3, and 5, and gets the character at each of those indices of the string "hello".

rel indices = {1, 3, 5}
rel output(i, $string_char_at("hello", i)) = indices(i)

However, there are only 5 characters in the string (indices 0 through 4), so getting the character at index 5 results in an index-out-of-bounds error. Scallop drops this invocation silently, so only two facts are derived:

output: {(1, 'e'), (3, 'l')}

The same happens when nan is produced by floating-point operations, or when a foreign function otherwise fails.

Library of Foreign Functions

We hereby list all the available foreign functions, their signatures, descriptions, and an example of how to invoke them. The functions here are ordered alphabetically. For some of the functions that are slightly more complicated (e.g. $format), please refer to the section below for more information.

| Function | Description | Example |
|----------|-------------|---------|
| $abs<T: Number>(x: T) -> T | Absolute value function \(\lvert x \rvert\) | $abs(-1) => 1 |
| $acos<T: Float>(x: T) -> T | Arc cosine function \(\text{acos}(x)\) | $acos(0.0) => 1.5708 |
| $atan<T: Float>(x: T) -> T | Arc tangent function \(\text{atan}(x)\) | $atan(0.0) => 0.0 |
| $atan2<T: Float>(y: T, x: T) -> T | 2-argument arc tangent function \(\text{atan2}(y, x)\) | $atan2(0.0, 1.0) => 0.0 |
| $ceil<T: Number>(x: T) -> T | Round up to the closest integer \(\lceil x \rceil\) | $ceil(0.5) => 1.0 |
| $cos<T: Float>(x: T) -> T | Cosine function \(\text{cos}(x)\) | $cos(0.0) => 1.0 |
| $datetime_day(d: DateTime) -> u32 | Get the day component of a DateTime, starting from 1 | $datetime_day(t"2023-01-01") => 1 |
| $datetime_month(d: DateTime) -> u32 | Get the month component of a DateTime, starting from 1 | $datetime_month(t"2023-01-01") => 1 |
| $datetime_month0(d: DateTime) -> u32 | Get the month component of a DateTime, starting from 0 | $datetime_month0(t"2023-01-01") => 0 |
| $datetime_year(d: DateTime) -> i32 | Get the year component of a DateTime | $datetime_year(t"2023-01-01") => 2023 |
| $dot(a: Tensor, b: Tensor) -> Tensor | Dot product of two tensors \(a \cdot b\); only available when compiled with torch-tensor | |
| $exp<T: Float>(x: T) -> T | Exponential function \(e^x\) | $exp(0.0) => 1.0 |
| $exp2<T: Float>(x: T) -> T | Exponential function \(2^x\) (base 2) | $exp2(2.0) => 4.0 |
| $floor<T: Number>(x: T) -> T | Round down to the closest integer \(\lfloor x \rfloor\) | $floor(0.5) => 0.0 |
| $format(String, Any...) -> String | Format a string | $format("{} + {}", 3, "a") => "3 + a" |
| $hash(Any...) -> u64 | Hash the given values | $hash("a", 3, 5.5) => 5862532063111067262 |
| $log<T: Float>(x: T) -> T | Natural logarithm function \(\text{log}_e(x)\) | $log(1.0) => 0.0 |
| $log2<T: Float>(x: T) -> T | Logarithm function \(\text{log}_2(x)\) (base 2) | $log2(4.0) => 2.0 |
| $max<T: Number>(T...) -> T | Maximum \(\text{max}(x_1, x_2, \dots)\) | $max(4.0, 1.0, 9.5) => 9.5 |
| $min<T: Number>(T...) -> T | Minimum \(\text{min}(x_1, x_2, \dots)\) | $min(4.0, 1.0, 9.5) => 1.0 |
| $pow<T: Integer>(x: T, y: u32) -> T | Integer power function \(x^y\) | $pow(2, 10) => 1024 |
| $powf<T: Float>(x: T, y: T) -> T | Float power function \(x^y\) | $powf(4.0, 0.5) => 2.0 |
| $sign<T: Number>(x: T) -> T | Sign function returning \(-1\), \(0\), or \(1\) in the input type | $sign(-3.0) => -1.0 |
| $sin<T: Float>(x: T) -> T | Sine function \(\text{sin}(x)\) | $sin(0.0) => 0.0 |
| $string_char_at(s: String, i: usize) -> char | Get the i-th character of string s | $string_char_at("hello", 2) => 'l' |
| $string_concat(String...) -> String | Concatenate multiple strings | $string_concat("hello", " ", "world") => "hello world" |
| $string_index_of(s: String, pat: String) -> usize | Find the index of the first occurrence of the pattern pat in string s | $string_index_of("hello world", "world") => 6 |
| $string_length(s: String) -> usize | Get the length of the string | $string_length("hello") => 5 |
| $string_lower(s: String) -> String | Convert to lower-case | $string_lower("LisA") => "lisa" |
| $string_trim(s: String) -> String | Trim whitespace from a string | $string_trim(" hello ") => "hello" |
| $string_upper(s: String) -> String | Convert to upper-case | $string_upper("LisA") => "LISA" |
| $substring(s: String, b: usize, e: usize?) -> String | Get the substring given a begin index and an optional end index | $substring("hello world", 6) => "world" |
| $tan<T: Float>(x: T) -> T | Tangent function \(\text{tan}(x)\) | $tan(0.0) => 0.0 |

Foreign Predicates

Foreign predicates aim to provide programmers with extra capabilities via relational predicates. A traditional Datalog program defines relational predicates using only Horn rules; given a finite input database, these derived predicates are also finite. However, many relational predicates are infinite and cannot easily be expressed by Horn rules. One such example is the range relation. Suppose it is defined as range(begin, end, i), where i lies between begin (inclusive) and end (exclusive). There are infinitely many such triplets, and we cannot simply enumerate all of them. But if the first two arguments begin and end are given, we can reasonably enumerate i.

In Scallop, range is available as a foreign predicate. Notice that range can be applied to any integer type, making it a generic predicate. For example, to use range on i32 data, we invoke range<i32>:

rel result(x) = range<i32>(0, 5, x)

Here we enumerate the value of x from 0 (inclusive) to 5 (exclusive), obtaining result = {0, 1, 2, 3, 4}. For the rest of this section, we describe in detail how foreign predicates are constructed in Scallop and why they are useful.

Foreign Predicate Types

Foreign predicates can be generic and are statically typed. In addition to the argument types, we also need to provide a boundness pattern.

A boundness pattern is a string whose length equals the relation's arity, consisting of the characters b and f. The character b means bounded: the variable at that position is taken as input to the predicate. The character f means free: the variable at that position is generated as output by the predicate.

For example, the full definition of range is

range<T: Integer>(begin: T, end: T, i: T)[bbf]

Notice the [bbf] at the end of the definition. Here, bbf is the boundness pattern for range, indicating that begin and end are provided as input, and i is generated as output.

In the future, we plan to allow the definition of multiple boundness patterns per foreign predicate.
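To build intuition, a foreign predicate with boundness pattern bbf can be thought of as a generator: the bound arguments come in, the free arguments come out. Below is a minimal Python sketch of this idea (the function name range_bbf is ours, not part of Scallop):

```python
def range_bbf(begin, end):
    """Model of range<T>(begin, end, i)[bbf]: begin and end are bound
    (inputs), while i is free (enumerated as output)."""
    i = begin
    while i < end:
        yield i
        i += 1

# Joining a rule body against the predicate instantiates the free column:
result = list(range_bbf(0, 5))  # [0, 1, 2, 3, 4]
```

An empty range such as range_bbf(2, 2) simply yields nothing, which matches the relational reading: no tuples are produced.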

Standard Library of Foreign Predicates (Part A)

In this part, we give an overview of the foreign predicates that are purely discrete.

| Foreign Predicate | Description |
|---|---|
| datetime_ymd(dt: DateTime, y: i32, m: u32, d: u32)[bfff] | Get the year, month, and day from a DateTime value |
| range<T: Integer>(begin: T, end: T, i: T)[bbf] | Generate all the integers i starting from begin and ending at end - 1 |
| string_chars(s: String, i: usize, c: char)[bff] | Generate all the index-character tuples of the string s |
| string_find(s: String, pat: String, begin: usize, end: usize)[bbff] | Generate all the begin-end ranges of the pattern pat's occurrences in the string s |
| string_split(s: String, pat: String, out: String)[bbf] | Split the string s by the pattern pat and generate the out strings |

Reference Guide

We list all the language features supported by Scallop.

Import Files

import "path/to/other/file.scl"

Type Definition

Type Alias Definition

type ObjectId = usize

Sub-Type Definition

type Name <: String

Enum Type Definition

type Action = LEFT | RIGHT | UP | DOWN

Algebraic Data Type Definition

type Expr = Const(i32) | Add(Expr, Expr) | Sub(Expr, Expr)

Relation Type Definition

type edge(x: i32, y: i32)

Constant Definition

const PI: f32 = 3.1415

Relation Definition

Fact Definition

rel edge(1, 2)

Set-of-Tuples Definition

rel edge = {(1, 2), (2, 3), (3, 4)}

Rule Definition

rel path(a, b) = edge(a, b) or path(a, c) and edge(c, b)

Disjunctive Head

rel { assign(v, false); assign(v, true) } = variable(v)

Atom

fib(x - 1, y)

Negation

rel has_no_child(p) = person(p) and not father(p, _) and not mother(p, _)

Constraint

rel number(0)
rel number(i + 1) = number(i) and i < 10

Aggregation

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p))

Foreign Predicate

rel grid(x, y) = range<i32>(0, 5, x) and range<i32>(0, 5, y)

Query Definition

query path

Scallop and Probabilistic Programming

One fundamental concept in machine learning is probability. Scallop, being a neurosymbolic programming language, supports probability and probabilistic programming natively. For example, one can write the following program:

type Action = UP | DOWN | LEFT | RIGHT
rel predicted_action = {0.05::UP, 0.09::DOWN, 0.82::LEFT, 0.04::RIGHT}

where the predicted_action relation encodes a distribution of actions and their probabilities. In particular, the UP action is predicted to have a \(0.05\) probability. Here, the :: symbol is used to suggest that probabilities (such as 0.05) are used to tag the facts (such as UP).

Since we can define probabilities on user-declared facts, the facts derived from them are associated with probabilities too. This means that Scallop performs probabilistic reasoning. The probabilistic semantics of Scallop is defined with the theory of provenance semirings. In this chapter, we give a detailed explanation of the probabilities that appear in Scallop.

Tags and Provenance

Scallop’s probabilistic semantics is realized by the provenance semiring framework. Within this framework, each fact can be tagged with an extra piece of information, which we call a tag. Tags are propagated throughout the execution of a Scallop program according to the provenance, the mathematical object that defines how tags propagate.

Motivating Probabilistic Example

The following example shows a fact earthquake() being tagged by a probability 0.03 (earthquake could happen with a 0.03 probability):

rel 0.03::earthquake()

Concretely, the (external) tag space here is the interval \([0, 1]\) of real numbers between 0 and 1, i.e., the space of probabilities. Similarly, we define another tagged fact burglary():

rel 0.20::burglary()

We can declare a rule saying that “when an earthquake or a burglary happens, an alarm will go off”:

rel alarm() = earthquake() or burglary()
query alarm

Remember that the facts earthquake() and burglary() are probabilistic. Intuitively, the derived fact alarm() will also be associated with a derived probability. Based on probability theory, we have

\[ \begin{align} \Pr(\text{alarm}) &= \Pr(\text{earthquake} \vee \text{burglary}) \\ &= 1 - \Pr(\neg \text{earthquake} \wedge \neg \text{burglary}) \\ &= 1 - \Pr(\neg \text{earthquake}) \cdot \Pr(\neg \text{burglary}) \\ &= 1 - (1 - \Pr(\text{earthquake})) \cdot (1 - \Pr(\text{burglary})) \\ &= 1 - (1 - 0.03) (1 - 0.20) \\ &= 1 - 0.97 \times 0.8 \\ &= 0.224 \end{align} \]

This is indeed what we get if we use the topkproofs provenance (which we discuss later in the chapter) with the scli Scallop interpreter:

> scli alarm.scl
alarm: {0.224::()}
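The derivation above is easy to verify numerically; the following plain-Python check replays the noisy-or computation:

```python
pr_earthquake = 0.03
pr_burglary = 0.20

# alarm fires if earthquake OR burglary; the two events are independent,
# so P(alarm) = 1 - P(neither event happens)
pr_alarm = 1 - (1 - pr_earthquake) * (1 - pr_burglary)
print(round(pr_alarm, 3))  # 0.224
```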

Proofs Provenance

Proofs are fundamental to understanding how Scallop derives conclusions. When Scallop computes a result, it doesn't just give you an answer: it can also tell you why that answer exists by tracking the derivation proofs.

What are Proofs?

In Scallop, a proof is a set of base facts that, when combined together through rules, derive a conclusion. Think of it as the “evidence” or “reasoning chain” that supports a derived fact.

Simple Example

Consider this simple graph program:

rel edge = {(0, 1), (1, 2), (0, 2)}
rel path(a, c) = edge(a, c)
rel path(a, c) = path(a, b), edge(b, c)
query path(0, 2)

The fact path(0, 2) can be derived in two different ways:

  1. Proof 1: Directly from edge(0, 2) (using the first rule)
  2. Proof 2: From edge(0, 1) + edge(1, 2) (using the second rule)

Each proof is a set of fact IDs - unique identifiers for the base facts:

  • Let’s say edge(0, 1) is fact ID 0
  • edge(1, 2) is fact ID 1
  • edge(0, 2) is fact ID 2

Then:

  • Proof 1 = {2} (just uses fact 2)
  • Proof 2 = {0, 1} (uses facts 0 and 1 together)

Proofs Provenance

The proofs provenance tracks all possible derivation paths for each conclusion.

Enabling Proofs Tracking

In Scallop CLI:

scli --provenance proofs program.scl

In Python:

import scallopy

ctx = scallopy.ScallopContext(provenance="proofs")
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(0, 1), (1, 2), (0, 2)])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")
ctx.run()

# Each result includes its proofs
for (proofs_obj, (start, end)) in ctx.relation("path"):
  print(f"Path ({start}, {end}): {proofs_obj}")

Understanding Proof Structure

Proofs are represented as a set of sets of fact IDs:

Proofs = { {fact_id₁, fact_id₂}, {fact_id₃}, ... }
  • The outer set represents alternative derivations (disjunction - OR)
  • Each inner set represents facts used together (conjunction - AND)

Example interpretation:

Proofs = { {0, 1}, {2} }

This means: “The conclusion can be derived by using (fact 0 AND fact 1) OR (fact 2 alone)”

Multiple Proofs for the Same Tuple

When a tuple can be derived in multiple ways, all proofs are tracked:

rel edge = {(0, 1), (1, 2), (0, 2), (0, 3), (1, 3), (2, 3)}
rel path(a, c) = edge(a, c)
rel path(a, c) = path(a, b), edge(b, c)

For path(0, 3), there might be many proofs:

  • Direct: {edge(0,3)}
  • Via 1: {edge(0,1), edge(1,3)}
  • Via 2: {edge(0,2), edge(2,3)}
  • Via 1→2: {edge(0,1), edge(1,2), edge(2,3)}

The proofs provenance tracks all of them.


Top-K Proofs

Tracking all proofs can be expensive: there might be exponentially many derivations! The topkproofs provenance provides a memory-efficient alternative by keeping only the top-K most probable proofs.

Why Top-K?

Consider a graph with many paths. A conclusion might have thousands of alternative derivations. In practice, we often only care about the most likely explanations.

Using Top-K Proofs

In CLI:

scli --provenance topkproofs --k 3 program.scl

In Python:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)

The k parameter controls how many proofs to keep. With k=3, Scallop maintains the 3 most probable derivation paths for each conclusion.

Top-K with Probabilities

Here’s where Top-K shines - with probabilistic facts:

rel edge = {
  0.9::(0, 1),  // High confidence edge
  0.8::(1, 2),  // High confidence edge
  0.2::(0, 2),  // Low confidence edge
  0.7::(1, 3)
}

rel path(a, c) = edge(a, c)
rel path(a, c) = path(a, b), edge(b, c)

query path(0, 2)

For path(0, 2), there are two proofs:

  1. Proof via 1: {edge(0,1), edge(1,2)} with probability 0.9 × 0.8 = 0.72
  2. Direct proof: {edge(0,2)} with probability 0.2

With topkproofs and k=1, only the most probable proof (via node 1) would be kept.

Exact Probability via WMC

Top-K proofs use Weighted Model Counting (WMC) to compute exact probabilities from the kept proofs. The proofs are converted to a Boolean formula and evaluated using a Sentential Decision Diagram (SDD) for efficient computation.

Example:

  • Proofs: {{0, 1}, {2}}
  • Boolean formula: (f₀ ∧ f₁) ∨ f₂
  • With probabilities: P(f₀)=0.9, P(f₁)=0.8, P(f₂)=0.2
  • WMC computes: P = 0.72 + 0.2 - (0.72 × 0.2) = 0.776

The inclusion-exclusion principle ensures we don’t double-count when proofs overlap.
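The same number can be obtained with a brute-force weighted model counter over all truth assignments of the base facts. This is only a sketch for intuition; real implementations compile the formula into an SDD rather than enumerating worlds:

```python
from itertools import product

def wmc(proofs, probs):
    """Sum the weight of every truth assignment (world) over the base
    facts that satisfies at least one proof, i.e., the DNF formula."""
    total = 0.0
    for world in product([False, True], repeat=len(probs)):
        # a world satisfies the DNF if some proof has all its facts true
        if any(all(world[f] for f in proof) for proof in proofs):
            weight = 1.0
            for f, p in enumerate(probs):
                weight *= p if world[f] else 1.0 - p
            total += weight
    return total

p = wmc([{0, 1}, {2}], [0.9, 0.8, 0.2])  # (f0 AND f1) OR f2
print(round(p, 3))  # 0.776
```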


DNF Formula Representation

Internally, Scallop represents proofs as Disjunctive Normal Form (DNF) formulas.

What is DNF?

A DNF formula is a disjunction (OR) of conjunctions (AND):

Formula = (lit₁ ∧ lit₂ ∧ ...) ∨ (lit₃ ∧ lit₄ ∧ ...) ∨ ...
          \_____________/     \_____________/
              Clause 1            Clause 2

Each clause is a conjunction of literals (fact IDs).

Example

Proofs {{0, 1}, {2}} becomes:

DNF = (fact_0 ∧ fact_1) ∨ fact_2

This structure makes it efficient to:

  1. Combine proofs from different rules
  2. Compute probabilities via WMC
  3. Handle negation and disjunctions

Operations on Proofs

Scallop’s provenance framework defines how proofs combine:

Addition (Disjunction): Merge alternative derivations

{{0}} + {{1}} = {{0}, {1}}

Multiplication (Conjunction): Combine proofs from rule bodies

{{0}} × {{1}} = {{0, 1}}
{{0, 1}} × {{2}} = {{0, 1, 2}}

These operations follow semiring properties, making the system mathematically principled.
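The two operations can be modeled directly as set operations on sets of frozensets. This is a sketch of the proofs semiring, not Scallop's actual Rust implementation, and the function names are ours:

```python
def proofs_add(ps1, ps2):
    """Disjunction: union of the alternative proofs."""
    return ps1 | ps2

def proofs_mult(ps1, ps2):
    """Conjunction: each proof of one side joined with each of the other."""
    return {p1 | p2 for p1 in ps1 for p2 in ps2}

f0, f1, f2 = {frozenset({0})}, {frozenset({1})}, {frozenset({2})}

conj = proofs_mult(f0, f1)   # {{0, 1}}: facts 0 and 1 used together
both = proofs_add(conj, f2)  # {{0, 1}, {2}}: two alternative derivations
```

Note that multiplication distributes over addition here, which is exactly what makes the structure a semiring.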


Proofs vs. Other Provenances

Different provenance types have different tradeoffs:

| Provenance | Tracks Proofs? | Memory | Exact Probability | Use Case |
|---|---|---|---|---|
| unit | No | Minimal | N/A | Standard DataLog, no tracking |
| proofs | Yes, all | High | No | Full derivation tracking |
| topkproofs | Yes, top-K | Medium | Yes (via WMC) | Probabilistic reasoning with efficiency |
| minmaxprob | No | Minimal | No (bounds only) | Fast probabilistic bounds |
| addmultprob | No | Minimal | No (approximate) | Fast probabilistic approximation |

Choose proofs when:

  • You need to understand all derivation paths
  • Memory is not a concern
  • You’re debugging logic

Choose topkproofs when:

  • You have probabilistic facts
  • You need exact probabilities
  • Memory efficiency matters
  • You care about top explanations

Choose minmaxprob when:

  • You need fast probabilistic bounds
  • Exact probabilities aren’t critical
  • Memory is very limited

Proof Debugging

For advanced proof debugging, Scallop provides difftopkproofsdebug provenance that exposes fact IDs and full proof structures. See Debugging Proofs for details.

Example: Finding Which Facts Were Used

import torch
import scallopy

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("edge", (int, int))

# Add facts with explicit IDs
ctx.add_facts("edge", [
  ((torch.tensor(0.9), 1), (0, 1)),  # Fact ID 1
  ((torch.tensor(0.8), 2), (1, 2)),  # Fact ID 2
  ((torch.tensor(0.2), 3), (0, 2)),  # Fact ID 3
])

ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")
ctx.run()

# Results include fact IDs in proofs
for (result, tuple_data) in ctx.relation("path"):
  print(f"Tuple: {tuple_data}, Result: {result}")
  # Result contains probability and proofs with fact IDs

The proofs will show exactly which fact IDs were used to derive each path.


Common Patterns

Counting Proofs

Want to know how many ways something can be derived?

rel count_derivations(n) = n = count(p: path(0, 2) with proof p)

(Note: this is conceptual; the actual implementation depends on provenance support.)

Filtering by Proof Confidence

With probabilistic proofs, you can filter for high-confidence derivations:

rel high_confidence_path(a, b) = path(a, b) and confidence(a, b) > 0.9

Analyzing Derivation Depth

By tracking proofs, you can analyze how “deep” a derivation is (how many facts it uses):

  • Proofs with 1 fact = direct facts
  • Proofs with 2 facts = one-hop derivations
  • Proofs with N facts = (N-1)-hop derivations

Summary

  • Proofs = sets of fact IDs that together derive a conclusion
  • proofs provenance = tracks all derivation paths
  • topkproofs provenance = keeps top-K most probable proofs for efficiency
  • DNF formulas = internal representation enabling efficient computation
  • WMC = algorithm for computing exact probabilities from proofs
  • Fact IDs = enable traceability and debugging

Understanding proofs is key to:

  1. Debugging your Scallop programs
  2. Understanding why conclusions are drawn
  3. Optimizing probabilistic reasoning
  4. Building explainable AI systems

Further Reading

Fact with Probability

We can associate facts with probabilities using the :: symbol. This can be done both in the set syntax and in the individual-fact syntax:

rel color = {0.1::"red", 0.8::"green", 0.1::"blue"}

// or

rel 0.1::color("red")
rel 0.8::color("green")
rel 0.1::color("blue")

Mutually exclusive facts

Within the set annotation, if we replace the commas (,) with semicolons (;), we specify mutually exclusive facts. When encoding a categorical distribution, you should specify mutual exclusions by default. Suppose we have two MNIST digits, each of which can be classified as a number between 0 and 9. If we represent each digit by its ID, say A and B, we can write the following program in Scallop:

type ImageID = A | B
type digit(img_id: ImageID, number: i32)

rel digit = {0.01::(A, 0); 0.86::(A, 1); ...; 0.03::(A, 9)}
rel digit = {0.75::(B, 0); 0.03::(B, 1); ...; 0.02::(B, 9)}

Notice that we have specified two sets of digit facts, each forming a mutual exclusion, as indicated by the semicolon separator (;). This means that each of A and B is classified as exactly one of the 10 numbers, not several at once.

Specifying mutually exclusive facts in Scallopy

Probabilistic Rules

In Scallop, rules can be associated with probabilities too, just like facts and tuples. For instance, you might write the following probabilistic rule to denote that “when an earthquake happens, there is an 80% chance that an alarm will go off”:

rel 0.8::alarm() = earthquake()

Combining the above rule with a fact stating that an earthquake happens with a 10% probability, we obtain that the alarm goes off with probability 0.08. Note that this result is obtained under the topkproofs or addmultprob provenances; a provenance such as minmaxprob will give a different result.

rel 0.1::earthquake()

query alarm // 0.08::alarm()
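Intuitively, the rule tag behaves like an extra probabilistic fact conjoined with the rule body, so under a product-based provenance the derived probability is simply the product of the two tags. A hand check, assuming independence:

```python
p_rule = 0.8        # tag on the rule alarm() = earthquake()
p_earthquake = 0.1  # tag on the fact earthquake()

# alarm() is derived only when both the fact holds and the rule "fires"
p_alarm = p_rule * p_earthquake
print(round(p_alarm, 2))  # 0.08
```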

Rule tags with expressions

What is special about rule probabilities is that they can be expressions depending on values within the rule. For instance, here is a set of rules saying that the probability of a path depends on the length of the path, falling off as the length increases:

// A few edges
rel edge = {(0, 1), (1, 2)}

// Compute the length of the paths (note that we encode length with floating point numbers)
rel path(x, y, 1.0) = edge(x, y)
rel path(x, z, l + 1.0) = path(x, y, l) and edge(y, z)

// Compute the probabilistic path using the fall off (1 / length)
rel 1.0 / l :: prob_path(x, y) = path(x, y, l)

// Query the probabilistic paths
query prob_path // prob_path: {1.0::(0, 1), 0.5::(0, 2), 1.0::(1, 2)}

Here, since path(0, 1) and path(1, 2) have length 1, their probability is 1 / 1 = 1. However, path(0, 2) has length 2 so its probability is 1 / 2 = 0.5.

As can be seen, with the support for having expressions in the tag, we can encode more custom probabilistic rules in Scallop. Internally, this is implemented through the use of custom foreign predicates.
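The fall-off computation above can be replayed in plain Python. The sketch below mirrors the two path rules with a naive fixpoint iteration; the variable names are ours:

```python
edges = {(0, 1), (1, 2)}

# path(x, y, l): base case gives length 1.0, recursive case extends by an edge
paths = {(x, y): 1.0 for (x, y) in edges}
changed = True
while changed:
    changed = False
    for (x, y), l in list(paths.items()):
        for (a, b) in edges:
            if a == y and (x, b) not in paths:
                paths[(x, b)] = l + 1.0
                changed = True

# tag each path with 1 / length, as in the rule `1.0 / l :: prob_path(x, y)`
prob_path = {p: 1.0 / l for p, l in paths.items()}
# {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.5}
```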

Rule tags that are not floating points

In general, Scallop supports many forms of tags, including but not limited to probabilities (floating points). For instance, we can use boolean tags as well:

rel constraint(x == y) = digit_1(x) and digit_2(y)
rel b::sat() = constraint(b)

The relation constraint has type (bool), so the variable b in the second rule has type bool as well. With the second rule, we lift the boolean value into the boolean tag associated with the nullary relation sat.

Associating rules with tags from Scallopy

We elaborate on this topic in the Scallopy section as well.

You can also associate rules with tags from Scallopy, so that we are not confined to Scallop's syntax. For instance, the following Python program creates a new Scallop context and inserts a rule with a tag of 0.8.

ctx = scallopy.ScallopContext(provenance="topkproofs")
ctx.add_rule("alarm() = earthquake()", tag=0.8)

Of course, the tag doesn’t need to be a simple constant floating point, since we are operating within the domain of Python. How about using a PyTorch tensor? Certainly!

ctx = scallopy.ScallopContext(provenance="topkproofs")
ctx.add_rule("alarm() = earthquake()", tag=torch.tensor(0.8, requires_grad=True))

Notice that we have specified requires_grad=True. This means that if any Scallop output depends on this rule's tag, PyTorch back-propagation will be able to accumulate gradients on this tensor of 0.8. Optimization can then update the tag, essentially treating it as a parameter. Of course, a bit more setup is needed for the optimization to actually happen; for instance, you will need to register this tensor as a parameter with the optimizer. We delay this discussion to a later section.

Provenance Library

Scallop provides 18 different provenance types covering discrete logic, probabilistic reasoning, and differentiable computation. This reference guide helps you choose the right provenance for your application.

Overview

Provenances are organized into three categories:

  1. Discrete (5 types) - For standard logic programming without probabilities
  2. Probabilistic (6 types) - For reasoning under uncertainty
  3. Differentiable (7+ types) - For integration with neural networks and gradient-based learning

Each provenance defines how tags (like probabilities) propagate through logical rules, following the provenance semiring framework.


Discrete Provenances

Use discrete provenances when you don’t need probabilistic reasoning.

unit - No Tracking

Description: Standard DataLog with no provenance tracking. Fastest and most memory-efficient.

Tag Type: None (unit type)

Use When:

  • Pure logic programming
  • No need for probabilities or proof tracking
  • Maximum performance needed

CLI:

scli program.scl  # unit is default

Python:

ctx = scallopy.ScallopContext(provenance="unit")

Example:

rel edge = {(0, 1), (1, 2)}
rel path(a, c) = edge(a, c)
rel path(a, c) = path(a, b), edge(b, c)
query path  // {(0,1), (0,2), (1,2)}

proofs - Full Derivation Tracking

Description: Tracks all possible derivation proofs for each conclusion. Each proof is a set of base facts that together derive the result.

Tag Type: Proofs (set of sets of fact IDs)

Use When:

  • Debugging logic programs
  • Understanding all derivation paths
  • Explainability is critical
  • Memory is not constrained

CLI:

scli --provenance proofs program.scl

Python:

ctx = scallopy.ScallopContext(provenance="proofs")

Output: Each tuple comes with all its proofs

path: {({0, 1}, (0, 2)), ({0}, (0, 1)), ...}

See: Proofs Provenance for detailed explanation


boolean - Boolean Algebra

Description: Uses boolean semiring where tags are True/False values.

Tag Type: bool

Operations:

  • Addition (OR): true ∨ false = true
  • Multiplication (AND): true ∧ false = false

Use When:

  • Tracking whether facts exist
  • Boolean constraints
  • Reachability analysis

Example:

ctx = scallopy.ScallopContext(provenance="boolean")
ctx.add_facts("reliable_edge", [(True, (0, 1)), (False, (1, 2))])

natural - Natural Numbers

Description: Counts using natural numbers. Useful for counting derivations.

Tag Type: Natural numbers (0, 1, 2, …)

Operations:

  • Addition: Standard addition
  • Multiplication: Standard multiplication

Use When:

  • Counting how many ways something is derived
  • Aggregation over counts

tropical - Tropical Semiring

Description: Min-plus tropical semiring for shortest path problems.

Tag Type: Integers

Operations:

  • Addition: min(a, b)
  • Multiplication: a + b
  • Zero: ∞ (the identity for min)
  • One: 0 (the identity for +)

Use When:

  • Shortest path algorithms
  • Cost minimization
  • Distance metrics

Example:

rel edge = {5::(0, 1), 3::(1, 2), 7::(0, 2)}  // Weighted edges
rel shortest_path(a, c) = edge(a, c)
rel shortest_path(a, c) = shortest_path(a, b), edge(b, c)
query shortest_path  // Finds minimum-cost paths
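The two semiring operations are small enough to state directly. The sketch below uses our own names (trop_add, trop_mul) and the weighted edges from the example above:

```python
INF = float("inf")

def trop_add(a, b):
    return min(a, b)   # semiring "addition" merges alternative paths

def trop_mul(a, b):
    return a + b       # semiring "multiplication" chains edge costs

# Shortest 0 -> 2 cost: direct edge of weight 7, or 5 + 3 going via node 1
best = trop_add(7, trop_mul(5, 3))  # min(7, 8) = 7
```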

Probabilistic Provenances

Use probabilistic provenances for reasoning under uncertainty.

minmaxprob - Conservative Probability Bounds

Description: Fast probabilistic bounds using min/max operations. Not probabilistically exact but provides conservative estimates.

Tag Type: f64 (probability between 0.0 and 1.0)

Operations:

  • Addition (OR): max(p1, p2) - Most optimistic
  • Multiplication (AND): min(p1, p2) - Most pessimistic

Use When:

  • Fast probabilistic reasoning needed
  • Exact probabilities not critical
  • Conservative bounds acceptable
  • Very large graphs

CLI:

scli --provenance minmaxprob program.scl

Python:

ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_facts("edge", [(0.8, (0, 1)), (0.9, (1, 2))])

Example Output:

path: {0.8::(0, 1), 0.8::(0, 2), 0.9::(1, 2)}

Note: Not probabilistically accurate! Path (0,2) uses two edges with p=0.8 and p=0.9, but reports min(0.8, 0.9) = 0.8, not the actual 0.72.
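The two operations are cheap to model, which also makes it clear where the 0.8 above comes from. A sketch with our own function names:

```python
def mmp_or(p1, p2):
    return max(p1, p2)   # disjunction: most optimistic estimate

def mmp_and(p1, p2):
    return min(p1, p2)   # conjunction: most pessimistic estimate

# path (0, 2) goes through two edges tagged 0.8 and 0.9:
p = mmp_and(0.8, 0.9)  # 0.8 -- a bound, not the exact 0.9 * 0.8 = 0.72
```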


addmultprob - Add-Mult Probability

Description: Sum-product semiring with clamping. Fast but approximate.

Tag Type: f64

Operations:

  • Addition (OR): min(p1 + p2, 1.0) - Clamped sum
  • Multiplication (AND): p1 * p2

Use When:

  • Fast probabilistic approximation
  • Probabilities won’t sum > 1.0
  • Exact computation too expensive

Python:

ctx = scallopy.ScallopContext(provenance="addmultprob")

Limitation: Sum of probabilities can exceed 1.0 before clamping, violating probability axioms.
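A model of the clamped sum-product operations (a sketch, with our own function names) makes this limitation concrete:

```python
def amp_or(p1, p2):
    return min(p1 + p2, 1.0)  # clamped sum

def amp_and(p1, p2):
    return p1 * p2            # product

clamped = amp_or(0.7, 0.6)  # the raw sum 1.3 is not a probability, so 1.0
joint = amp_and(0.9, 0.8)   # products stay well-behaved
```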


topkproofs - Top-K Proofs with Exact Probability

Description: Keeps top-K most probable proofs and computes exact probability using Weighted Model Counting (WMC) via Sentential Decision Diagrams (SDD).

Tag Type: DNFFormula internally, f64 output

Use When:

  • Exact probabilities needed
  • Memory efficiency matters
  • Only top explanations relevant
  • Standard probabilistic reasoning

CLI:

scli --provenance topkproofs --k 3 program.scl

Python:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)

Parameters:

  • k: Number of proofs to keep (default: 3)
  • wmc_with_disjunctions: Include mutual exclusion in WMC (default: false)

Example:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=5)
ctx.add_facts("edge", [
  (0.9, (0, 1)),
  (0.8, (1, 2)),
  (0.2, (0, 2))
])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")
ctx.run()

for (prob, (a, b)) in ctx.relation("path"):
  print(f"Path ({a}, {b}): {prob:.4f}")
# Output:
# Path (0, 1): 0.9000
# Path (0, 2): 0.7760  # exact: 0.72 + 0.2 - 0.72*0.2 (inclusion-exclusion)
# Path (1, 2): 0.8000

Key Features:

  • WMC computes exact joint probabilities
  • SDD enables efficient Boolean formula evaluation
  • Mutual exclusion supported via disjunctions
  • Top-K keeps memory bounded

See: Proofs Provenance for WMC details


probproofs - All Proofs with Exact Probability

Description: Like topkproofs but keeps ALL proofs. More accurate but higher memory.

Tag Type: ProbProofs

Use When:

  • Exact probabilities with all derivations
  • Memory not constrained
  • Complete proof tracking needed

Python:

ctx = scallopy.ScallopContext(provenance="probproofs")

Tradeoff: Higher memory than topkproofs, but no loss of proofs.


samplekproofs - Sampled K Proofs

Description: Samples K proofs probabilistically for unbiased statistical approximation.

Tag Type: Sampled proofs

Use When:

  • Stochastic approximation acceptable
  • Memory very limited
  • Statistical estimates sufficient

Python:

ctx = scallopy.ScallopContext(provenance="samplekproofs", k=10)

topbottomkclauses - Top-K and Bottom-K Clauses

Description: Keeps both top-K and bottom-K clauses for full negation support.

Tag Type: Top/bottom clause sets

Use When:

  • Negation in probabilistic programs
  • Need both positive and negative evidence

Python:

ctx = scallopy.ScallopContext(provenance="topbottomkclauses", k=3)

Example (shown with the differentiable variant difftopbottomkclauses, which accepts tensor tags):

import scallopy
import torch

ctx = scallopy.ScallopContext(provenance="difftopbottomkclauses")
ctx.add_relation("obj_color", (int, str))
ctx.add_facts("obj_color", [
  (torch.tensor(0.99), (0, "blue")),
  (torch.tensor(0.01), (0, "green"))
])
ctx.add_rule('num_blue(x) :- x = count(o: obj_color(o, "blue"))')
ctx.run()

Differentiable Provenances

Use differentiable provenances for integration with neural networks and gradient-based learning.

difftopkproofs - Differentiable Top-K Proofs

Description: topkproofs with PyTorch tensor support and gradient computation.

Tag Type: torch.Tensor

Use When:

  • Training with gradient descent
  • Neurosymbolic AI applications
  • End-to-end learning with logic

Python:

import torch
import scallopy

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [
  (torch.tensor(0.9, requires_grad=True), (0, 1)),
  (torch.tensor(0.8, requires_grad=True), (1, 2))
])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.run()

# Results are tensors with gradient support
target = torch.tensor(1.0)  # hypothetical training target for each path probability
for (prob_tensor, (a, b)) in ctx.relation("path"):
  loss = (prob_tensor - target) ** 2
  loss.backward()  # Gradients flow back through logic

Key Feature: Probabilities are PyTorch tensors, enabling backpropagation through logical reasoning.


difftopkproofsdebug - With Stable Fact IDs ⭐

Description: difftopkproofs with user-provided stable fact IDs for debugging and traceability. ONLY provenance supporting stable IDs.

Tag Type: (torch.Tensor, int) - probability and fact ID

Use When:

  • Debugging probabilistic programs
  • Fact tracking and retraction needed
  • Provenance auditing required
  • Building knowledge management systems (HNLE use case)

Python:

import torch
import scallopy

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("edge", (int, int))

# !!! SPECIAL FACT FORMAT with explicit IDs !!!
ctx.add_facts("edge", [
  ((torch.tensor(0.9), 1), (0, 1)),  # Fact ID = 1
  ((torch.tensor(0.8), 2), (1, 2)),  # Fact ID = 2
  ((torch.tensor(0.2), 3), (0, 2)),  # Fact ID = 3
])

ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")
ctx.run()

# Proofs reference stable fact IDs
for (result, tuple_data) in ctx.relation("path"):
  print(f"Tuple: {tuple_data}, Result: {result}")

Fact ID Requirements:

  • IDs must start from 1
  • Must be contiguous (no gaps)
  • Must be unique across all facts
  • User is responsible for ID management

Output Includes Proofs:

When used with forward() in modules, returns (result_tensor, proofs):

proofs = [
  [ # Datapoint 1
    [ # Proofs of tuple 1
      [(True, 1), (True, 2)],  # Proof uses fact IDs 1 and 2
    ],
    [ # Proofs of tuple 2
      [(True, 3)],  # Proof uses fact ID 3
    ]
  ]
]

Proof Structure: List[List[List[List[Tuple[bool, int]]]]]

  • Batch → Datapoint → Proofs → Proof → Literal
  • Each literal: (is_positive, fact_id)
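Given this nested structure, extracting which fact IDs support each tuple is a small traversal. A sketch over the example value shown above:

```python
proofs = [                          # batch
    [                               # datapoint 1
        [[(True, 1), (True, 2)]],   # proofs of tuple 1
        [[(True, 3)]],              # proofs of tuple 2
    ]
]

# fact IDs used positively in any proof of each tuple
used = [
    sorted({fid for proof in tup_proofs
                for (positive, fid) in proof if positive})
    for tup_proofs in proofs[0]
]
# used == [[1, 2], [3]]
```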

Use Case - HNLE MCP: Enables fact retraction by stable ID, critical for knowledge management with complex string data.

Limitations (from FloatWithID research):

  • Display format 0.8 [ID(42)] exists but cannot be parsed from .scl files
  • IDs only exist during API execution (not persisted to .scl)
  • Only this provenance type supports user-provided IDs

See: Debugging Proofs for detailed examples


diffminmaxprob - Differentiable Min-Max

Description: minmaxprob with gradient support.

Tag Type: torch.Tensor

Python:

ctx = scallopy.ScallopContext(provenance="diffminmaxprob")

diffaddmultprob - Differentiable Add-Mult

Description: addmultprob with gradient support.

Tag Type: torch.Tensor

Python:

ctx = scallopy.ScallopContext(provenance="diffaddmultprob")

diffsamplekproofs - Differentiable Sampled Proofs

Description: samplekproofs with gradient support and unbiased gradient estimates.

Tag Type: torch.Tensor

Python:

ctx = scallopy.ScallopContext(provenance="diffsamplekproofs", k=10)

Additional Differentiable Variants

Scallop also provides Python-based differentiable provenances:

  • diffaddmultprob2 - Pure Python implementation of add-mult
  • diffnandmultprob2 - NAND-mult semiring in Python
  • diffmaxmultprob2 - Max-mult semiring in Python

These are useful for experimentation and custom semiring development.


Provenance Selection Guide

Decision Tree

Need gradients for neural networks?

  • → YES: Use diff* provenance (differentiable)
    • Need fact ID tracking? → difftopkproofsdebug
    • Need exact probability? → difftopkproofs
    • Need speed? → diffminmaxprob or diffaddmultprob
  • → NO: Continue below

Need probabilities?

  • → YES: Use probabilistic provenance
    • Need exact probability? → topkproofs (recommended) or probproofs
    • Need speed over accuracy? → minmaxprob or addmultprob
    • Need sampling? → samplekproofs
  • → NO: Use discrete provenance
    • Need proof tracking? → proofs
    • Need boolean logic? → boolean
    • Need shortest paths? → tropical
    • Need standard logic? → unit (fastest)

Performance Characteristics

| Provenance | Speed | Memory | Probability Accuracy | Gradient Support |
|---|---|---|---|---|
| unit | ★★★★★ | ★★★★★ | N/A | No |
| proofs | ★★☆☆☆ | ★☆☆☆☆ | N/A | No |
| minmaxprob | ★★★★★ | ★★★★★ | Bounds only | No |
| addmultprob | ★★★★★ | ★★★★★ | Approximate | No |
| topkproofs | ★★★☆☆ | ★★★☆☆ | Exact | No |
| probproofs | ★★☆☆☆ | ★☆☆☆☆ | Exact | No |
| difftopkproofs | ★★★☆☆ | ★★★☆☆ | Exact | Yes |
| difftopkproofsdebug | ★★☆☆☆ | ★★☆☆☆ | Exact | Yes |

Common Use Cases

Knowledge Graph Reasoning:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=5)

Neurosymbolic AI (Training):

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)

Fast Probabilistic Queries:

ctx = scallopy.ScallopContext(provenance="minmaxprob")

Debugging Logic:

ctx = scallopy.ScallopContext(provenance="proofs")

Fact Tracking / Retraction (HNLE):

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)

Shortest Path:

ctx = scallopy.ScallopContext(provenance="tropical")

Configuration Options

Common Parameters

k - Number of proofs to keep (for top-k provenances)

ctx = scallopy.ScallopContext(provenance="topkproofs", k=5)

wmc_with_disjunctions - Include mutual exclusion in probability computation

ctx = scallopy.ScallopContext(
  provenance="topkproofs",
  k=3,
  wmc_with_disjunctions=True  # Respect mutual exclusion
)

train_k / test_k - Different k values for training vs. testing

module = scallopy.Module(
  provenance="difftopkproofs",
  train_k=3,  # Keep 3 proofs during training
  test_k=10,  # Keep 10 proofs during testing
  ...
)

Summary

  • 18 provenance types covering discrete, probabilistic, and differentiable reasoning
  • Discrete (unit, proofs, boolean, natural, tropical) for pure logic
  • Probabilistic (minmaxprob, addmultprob, topkproofs, etc.) for uncertainty
  • Differentiable (diff* variants) for neural network integration
  • difftopkproofsdebug is special - only one with stable user-provided fact IDs
  • Choose based on: speed, memory, accuracy, gradients, and tracking needs

Further Reading

Aggregation with Probability

With the introduction of probabilities, many existing aggregators are augmented with new semantics, which we typically call multi-world semantics. What’s more, there are new aggregators, such as softmax, rank, and weighted_avg, that make use of the probabilities. We introduce these aggregators one-by-one in this section.

Multi-world Semantics with Aggregators

Let us take the count aggregator as an example. Suppose we have 2 objects, each of which could be big or small with its respective probability:

type OBJ = OBJ_A | OBJ_B
rel size = {0.8::(OBJ_A, "big"); 0.2::(OBJ_A, "small")} // obj A is very likely big
rel size = {0.1::(OBJ_B, "big"); 0.9::(OBJ_B, "small")} // obj B is very likely small

Now let’s say we want to count how many big objects there are, using a counting rule.
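A sketch of such a counting rule (the relation name num_big is illustrative), mirroring the num_animals example from the introduction:

```
// Count the big objects; under multi-world semantics the result is a
// distribution over the possible counts
rel num_big(n) :- n = count(o: size(o, "big"))
```

With the probabilities above, the count is 0 with probability 0.2 × 0.9 = 0.18, 1 with probability 0.8 × 0.9 + 0.2 × 0.1 = 0.74, and 2 with probability 0.8 × 0.1 = 0.08.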

Note that even when using probabilities, one can opt out of the multi-world semantics by appending a ! sign to the end of the aggregator.

New Aggregators using Probabilities

Softmax and Normalize

Rank

Weighted Average and Weighted Sum

Sampling with Probability

In Scallop, samplers share the same syntax as aggregators. They usually work with probabilistic provenances, but can also work without them. Here are some example samplers:

  • top: get the $k$ facts with top probabilities
  • categorical: treat the relation as a categorical distribution and sample from it
  • uniform: treat the relation as a uniform distribution and sample from it

Let’s take top as an example. We can obtain the top ranked symbol by using the following rule:

rel symbols = {0.9::"+", 0.05::"-", 0.02::"3"}
rel top_symbol(s) = s := top<1>(s: symbols(s)) // 0.9::top_symbol("+")

The Scallop Python Binding scallopy

scallopy is the Python binding for Scallop, offering an interface for computing with Scallop in Python. In addition, it can be integrated with PyTorch, allowing users to write neurosymbolic applications connected to PyTorch learning pipelines. In this section, we elaborate on how to install, configure, and use the scallopy library.

For an example, please look at Getting Started. To start reading the documentation, proceed to Scallop Context.

Installation

TODO: Installation with venv

TODO: Installation with Conda

Getting Started with Scallopy

Installation

Prerequisites

Scallopy requires:

  • Python 3.8 or later
  • Rust toolchain (for building from source)

Install from PyPI

The easiest way to install scallopy:

pip install scallopy

Install from Source (Development)

For the latest development version or to contribute:

# Clone the Scallop repository
git clone https://github.com/scallop-lang/scallop.git
cd scallop/etc/scallopy

# Install in development mode
pip install -e .

Development install benefits:

  • Get the latest features
  • Make changes to the source code
  • Run tests and contribute

Verify Installation

Check that scallopy is installed correctly:

import scallopy
print(scallopy.__version__)

Installing PyTorch (Optional)

For machine learning and differentiable programming features, install PyTorch:

pip install torch

See PyTorch installation guide for platform-specific instructions.


Motivating Example

Let’s start with a very simple example illustrating the usage of scallopy.

import scallopy

ctx = scallopy.Context()

ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(1, 2), (2, 3)])

ctx.add_rule("path(a, c) = edge(a, c) or path(a, b) and edge(b, c)")

ctx.run()

print(list(ctx.relation("path"))) # [(1, 2), (1, 3), (2, 3)]

In this very simple edge-path example, we interact with Scallop through a Python class called Context. A Context manages a Scallop program, along with the relations, facts, and execution results corresponding to that program. We create a Context with ctx = scallopy.Context(). Relations, facts, and rules are added through the functions add_relation(...), add_facts(...), and add_rule(...). With everything set, we execute the program inside the context by calling run(). Lastly, we pull the result from ctx by using relation(...). Please refer to Scallop Context for a more detailed explanation of this example.

Machine Learning with Scallopy and PyTorch

When doing machine learning, we usually want to have batched inputs and outputs. Instead of building the Scallop context incrementally and explicitly running the program, we can create a Module all at once and run the program on a batch of inputs. This offers a few advantages, such as optimization during compilation, batched execution for integration with machine learning pipelines, simplified data exchange, and so on. For example, we can create a module and run it like the following:

import scallopy
import torch

# Creating a module for execution
my_sum2 = scallopy.Module(
  program="""
    type digit_1(a: i32), digit_2(b: i32)
    rel sum_2(a + b) = digit_1(a) and digit_2(b)
  """,
  input_mappings={"digit_1": range(10), "digit_2": range(10)},
  output_mappings={"sum_2": range(19)},
  provenance="difftopkproofs")

# Invoking the module with torch tensors. `result` is a tensor of 16 x 19
result = my_sum2(
  digit_1=torch.softmax(torch.randn(16, 10), dim=1),
  digit_2=torch.softmax(torch.randn(16, 10), dim=1))

As can be seen in this example, we have defined a Module which can also be treated as a PyTorch module. Like other PyTorch modules, it can take in torch tensors and return torch tensors. The logical symbols (such as the i32 numbers used in digit_1 and digit_2) are configured through input_mappings and output_mappings, and are automatically converted from and to tensors. We also see that it is capable of handling a batch of inputs (here, the batch size is 16). Internally, Scallop knows to execute the batch in parallel, making it perform much faster than sequential execution. Please refer to Scallop Module for more information.

Scallop Context

The most fundamental point of interaction of scallopy is ScallopContext. The following is a very simple example setting up a ScallopContext to compute the edge-path program:

import scallopy

# Creating a new context
ctx = scallopy.ScallopContext()

# Add relation of `edge`
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(0, 1), (1, 2)])

# Add rule of `path`
ctx.add_rule("path(a, c) = edge(a, c) or path(a, b) and edge(b, c)")

# Run!
ctx.run()

# Check the result!
print(list(ctx.relation("path"))) # [(0, 1), (0, 2), (1, 2)]

Roughly, the program above can be divided into three phases:

  1. Setup the context: this involves defining relations, adding facts to relations, and adding rules that do the computation
  2. Running the program inside of context
  3. Fetch the results

While the 2nd and 3rd steps are where the computation really happens, it is most important for programmers to correctly set up the full context for computation. We now elaborate on the high-level steps of setting up the context.

Configurations

When creating a new ScallopContext, one should configure it with intended provenance. If no argument is supplied, as shown in the above example, the context will be initialized with the default provenance, unit, which resembles untagged semantics (a.k.a. discrete Datalog). To explicitly specify this, you can do

ctx = scallopy.ScallopContext(provenance="unit")

Of course, Scallop can be used to perform reasoning on probabilistic and differentiable inputs. For instance, you can write the following

ctx = scallopy.ScallopContext(provenance="minmaxprob") # Probabilistic
# or
ctx = scallopy.ScallopContext(provenance="diffminmaxprob") # Differentiable

For more information on possible provenances, please refer to the provenance section. It is worth noting that some provenances, such as topkproofs, accept additional parameters such as k. In this case, you can supply them as additional arguments when creating the context:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=5) # top-k-proofs provenance with k = 5

Adding Program

Given that a context has been configured and initialized, the quickest way to set it up is to load a program into the context. One can either load an external .scl file or directly insert a program written as a Python string. To directly add a full program string to the context, one can do

ctx.add_program("""
  rel edge = {(0, 1), (1, 2)}
  rel path(a, c) = edge(a, c) or path(a, b) and edge(b, c)
""")

On the other hand, assuming that there is a file edge_path.scl that contains the same content as the above string, one can do

ctx.import_file("edge_path.scl")

Adding Relations

Instead of adding program as a whole, one can also add relations one-at-a-time. When adding new relations, one would need to supply the name as well as the type of the relation. For example, the edge relation can be defined as follows

ctx.add_relation("edge", (int, int))

Here, we are saying that edge is an arity-2 relation storing pairs of integers. Note that we are specifying the type using Python’s int type. This is equivalent to the i32 type inside Scallop. Therefore, the above instruction translates to the following Scallop code:

rel edge(i32, i32)

Many existing Python types can directly translate to Scallop type. In particular, we have the mapping listed as follows:

Python Type | Scallop Type
int | i32
bool | bool
float | f32
str | String

In case one wants to use types other than the listed ones (e.g., usize), they can be accessed directly using the string "usize", or they can be accessed through predefined types such as scallopy.usize. The example below defines a relation of type (usize, f64, i32):

ctx.add_relation("my_relation", (scallopy.usize, "f64", int))

Specifically for arity-1 relations, users don’t need to use a tuple to specify the type. For instance,

ctx.add_relation("digit", int)

Adding Facts

The most basic operation adds facts into an existing relation inside of an existing context. Here, we assume that the context has the "unit" provenance.

ctx.add_facts("edge", [(1, 2), (2, 3)])

If the relation is declared with arity 1 using a singleton type instead of a 1-tuple, then the facts inside the list do not need to be tuples.

ctx.add_relation("digit", int)
ctx.add_facts("digit", [1, 2, 3])

Probabilistic Facts (Tagged Facts)

When the Scallop context is configured with a provenance other than unit, facts can be tagged. To add facts along with probabilities, wrap each non-probabilistic fact into a tuple whose first element is the probability. For example, if originally we have a fact 1, wrapping it with a corresponding probability gives us (0.1, 1), where 0.1 is the probability.

ctx.add_facts("digit", [1, 2, 3])                      # without probability
ctx.add_facts("digit", [(0.1, 1), (0.2, 2), (0.7, 3)]) # with probability

Of course, if the original facts are tuples, the probabilistic versions require a further level of wrapping:

ctx.add_facts("color", [("A", "blue"), ("A", "green"), ...])               # without probability
ctx.add_facts("color", [(0.1, ("A", "blue")), (0.2, ("A", "green")), ...]) # with probability

We can extend this syntax to tagged facts in general. Suppose we are using the boolean semiring (boolean); we then tag each fact with a value such as True or False.

ctx = scallopy.Context(provenance="boolean")
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(True, (1, 2)), (False, (2, 3))])
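As a rough intuition for what these boolean tags do: a rule body ANDs the tags of its atoms, and alternative derivations of the same fact are ORed together. A plain-Python sketch (not the scallopy API):

```python
# Boolean provenance sketch: each fact carries a True/False tag.
edge = {(1, 2): True, (2, 3): False, (1, 3): True}

# path(1, 3) can be derived directly via edge(1, 3), or via
# edge(1, 2) AND edge(2, 3); the two derivations are OR-ed together.
via_direct = edge[(1, 3)]                 # True
via_hop = edge[(1, 2)] and edge[(2, 3)]   # True and False = False
path_1_3 = via_direct or via_hop          # True
print(path_1_3)  # True
```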

Non-tagged Facts in Tagged Context

Adding Rules

Tagged Rules

Running

Additional Features

There are more features provided by the ScallopContext interface. We hereby list them for reference.

Cloning

One can copy a context to create a new context. The resulting context will contain all of the program, configuration, and provenance information of the original.

new_ctx = ctx.clone()

The cloning feature relates to pseudo-incremental computation and branching computation. We elaborate on this in the Branching Computation section.

Compiling

Iteration Count Limit

One can configure a limit on the number of iterations performed during fixpoint computation.

Early Discarding

Obtaining Context Information

Foreign Functions and Predicates

Saving and Loading

Please refer to the Saving and Loading section for more information.

Branching Executions

One cool feature that scallopy supports is branching execution. Users can create a context, clone it to form new contexts, and modify the new contexts without touching the old ones. This is particularly useful when incremental computation is desired.

# Create the first version of the context
ctx = scallopy.ScallopContext()
ctx.add_relation(...)
ctx.add_facts(...)

# Branch it into another context
ctx1 = ctx.clone()
ctx1.add_relation(...)
ctx1.add_facts(...)
ctx1.run() # Running the first context

# Branch it into one more context; `ctx1` and `ctx2` are completely disjoint
ctx2 = ctx.clone()
ctx2.add_relation(...)
ctx2.add_facts(...)
ctx2.run() # Running the second context

Configuring Provenance

Provenance in Scallopy determines how Scallop tracks and computes probabilities during execution. This guide shows you how to configure provenance in Python and work with probabilistic facts using the ScallopContext API.

Setting Provenance in Python

When creating a ScallopContext, you specify the provenance type as a string parameter.

Basic Provenance Configuration

import scallopy

# Standard DataLog (no probability tracking)
ctx = scallopy.ScallopContext(provenance="unit")

# Min-max probability bounds (fast, conservative)
ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Add-mult probability (fast, approximate)
ctx = scallopy.ScallopContext(provenance="addmultprob")

# Top-K proofs with exact probability
ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)

Available Provenance Types

The provenance parameter accepts the following values:

Discrete Provenances (no probability):

  • "unit" - Standard DataLog, no provenance tracking
  • "proofs" - Collect all derivation proofs
  • "tropical" - Min-plus (tropical) semiring (non-negative integers with infinity)
  • "boolean" - Boolean logic tracking
  • "natural" - Natural number semiring

Probabilistic Provenances:

  • "minmaxprob" - Min-max probability bounds (fast, conservative)
  • "addmultprob" - Add-mult probability (fast, approximate)
  • "topkproofs" - Top-K most probable proofs with exact probability via WMC
  • "probproofs" - All proofs with exact probability
  • "topbottomkclauses" - Top/bottom-K clauses for negation and aggregation

Differentiable Provenances (for PyTorch integration):

  • "difftopkproofs" - Differentiable top-K proofs
  • "difftopkproofsdebug" - With stable fact IDs for debugging
  • "diffminmaxprob" - Differentiable min-max probability
  • "diffaddmultprob" - Differentiable add-mult probability
  • "diffsamplekproofs" - Differentiable unbiased sampling of K proofs
  • "difftopbottomkclauses" - Differentiable top/bottom-K with full negation support

The K Parameter

Provenances like topkproofs and difftopkproofs require a k parameter specifying how many proofs to track:

# Keep top 5 most probable proofs for each derived fact
ctx = scallopy.ScallopContext(provenance="topkproofs", k=5)

Tradeoff:

  • Larger K = More memory, more accurate probabilities, slower execution
  • Smaller K = Less memory, approximate probabilities, faster execution

Typical values: k=3 for development, k=5-10 for production

Train vs. Test K

For machine learning applications, you can use different K values during training and testing:

ctx = scallopy.ScallopContext(
  provenance="difftopkproofs",
  train_k=3,  # Smaller K during training (faster)
  test_k=10   # Larger K during inference (more accurate)
)

Adding Probabilistic Facts

Once you’ve configured provenance, you add facts with probabilities using the add_facts() method.

Basic Probabilistic Facts

The most common format is a list of (probability, tuple) pairs:

import scallopy

ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))

# Add facts with probabilities
ctx.add_facts("edge", [
  (0.1, (0, 1)),  # 10% probability
  (0.2, (1, 2)),  # 20% probability
  (0.3, (2, 3)),  # 30% probability
])

ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")
ctx.run()

# Inspect results with probabilities
for (prob, (start, end)) in ctx.relation("path"):
  print(f"Path {start}→{end}: probability {prob:.3f}")

Output:

Path 0→1: probability 0.100
Path 1→2: probability 0.200
Path 2→3: probability 0.300
Path 0→2: probability 0.100
Path 1→3: probability 0.200
Path 0→3: probability 0.100
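The numbers follow from the min-max semiring: conjunction takes the minimum of the input probabilities, and disjunction takes the maximum. A plain-Python sketch of this computation (illustrative only, not the scallopy internals):

```python
# minmaxprob sketch: AND = min, OR = max over probabilities.
edge = {(0, 1): 0.1, (1, 2): 0.2, (2, 3): 0.3}

path = dict(edge)  # path(a, c) = edge(a, c)
changed = True
while changed:
    changed = False
    # path(a, c) = edge(a, b), path(b, c)
    for (a, b), p1 in edge.items():
        for (x, c), p2 in list(path.items()):
            if x != b:
                continue
            p = min(p1, p2)                # conjunction: min
            if p > path.get((a, c), 0.0):  # disjunction: max
                path[(a, c)] = p
                changed = True

print(path[(0, 3)])  # 0.1 = min(0.1, 0.2, 0.3)
```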

Facts Without Probabilities

If the provenance supports it, you can omit probabilities (each fact then defaults to probability 1.0):

ctx.add_facts("edge", [
  (0, 1),  # Implicitly probability 1.0
  (1, 2),
])

Mutual Exclusion (Disjunctions)

When facts are mutually exclusive (only one can be true), specify disjunctions to get correct probabilities:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=3)
ctx.add_relation("color", (int, str))

# Object 0 can be blue OR green (mutually exclusive)
ctx.add_facts("color", [
  (0.7, (0, "blue")),
  (0.3, (0, "green")),
], disjunctions=[
  [0, 1]  # Indices 0 and 1 are mutually exclusive
])

# Object 1 can be red OR yellow
ctx.add_facts("color", [
  (0.6, (1, "red")),
  (0.4, (1, "yellow")),
], disjunctions=[
  [2, 3]  # Indices 2 and 3 are mutually exclusive
])

Important: The disjunctions parameter uses indices into the facts list being added. Each disjunction is a list of indices that are mutually exclusive.

Multiple disjunction groups:

ctx.add_facts("relation", [
  (0.5, (0,)),  # Index 0
  (0.5, (1,)),  # Index 1
  (0.3, (2,)),  # Index 2
  (0.7, (3,)),  # Index 3
], disjunctions=[
  [0, 1],  # Facts 0 and 1 are mutually exclusive
  [2, 3],  # Facts 2 and 3 are mutually exclusive
])
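To see what these disjunction groups mean probabilistically, one can enumerate the possible worlds: each world picks exactly one fact from every group, and the chosen facts' probabilities multiply. A plain-Python sketch (not the scallopy internals):

```python
from itertools import product

facts = [0.5, 0.5, 0.3, 0.7]     # probabilities of facts 0..3
disjunctions = [[0, 1], [2, 3]]  # each group is mutually exclusive

# A world is one choice of fact index per group; its probability is the
# product of the chosen facts' probabilities.
worlds = []
for choice in product(*disjunctions):
    prob = 1.0
    for idx in choice:
        prob *= facts[idx]
    worlds.append((choice, prob))

total = sum(p for _, p in worlds)
print(len(worlds), total)  # 4 worlds, probabilities summing to 1.0
```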

Loading Facts from CSV

For large datasets, load facts directly from CSV files:

ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Load CSV with implicit probability 1.0
ctx.add_relation("edge", (int, int), load_csv="edges.csv")

CSV format with probabilities (the first column is the probability):

0.9,0,1
0.8,1,2
0.7,2,3

ctx.add_relation("edge", (int, int), load_csv="edges_prob.csv")

For advanced CSV options, see ScallopContext documentation.


Differentiable Provenance

Differentiable provenances integrate with PyTorch, enabling gradient-based learning over symbolic reasoning.

Basic Setup with PyTorch

import torch
import scallopy

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("edge", (int, int))

# Add facts with torch tensors
ctx.add_facts("edge", [
  (torch.tensor(0.9), (0, 1)),
  (torch.tensor(0.8), (1, 2)),
  (torch.tensor(0.7), (2, 3)),
])

ctx.add_rule("path(a, c) = edge(a, c)")
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")
ctx.run()

# Results are tensors with gradients
for (prob_tensor, (start, end)) in ctx.relation("path"):
  print(f"Path {start}→{end}: {prob_tensor}")

Forward Functions for Neural Networks

The most common pattern is using forward_function() to create differentiable modules:

import torch
import scallopy

# Create context and define program
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("digit_1", int, range(10))
ctx.add_relation("digit_2", int, range(10))
ctx.add_rule("sum_2(a + b) = digit_1(a) and digit_2(b)")

# Create forward function
forward = ctx.forward_function("sum_2", list(range(19)))

# Use in training loop
digit_1 = torch.softmax(torch.randn((16, 10), requires_grad=True), dim=1)
digit_2 = torch.softmax(torch.randn((16, 10), requires_grad=True), dim=1)

# Forward pass through Scallop
sum_2 = forward(digit_1=digit_1, digit_2=digit_2)

# Backward pass computes gradients
loss = torch.nn.BCELoss()(sum_2, ground_truth)  # ground_truth: a (16, 19) target tensor
loss.backward()

# Gradients flow back to digit_1 and digit_2

Disjunctions with PyTorch

Mutual exclusion works with differentiable provenances too:

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("obj_color", (int, str))

# Object colors with mutual exclusion
ctx.add_facts("obj_color", [
  (torch.tensor(0.99), (0, "blue")),
  (torch.tensor(0.01), (0, "green")),
], disjunctions=[[0, 1]])

ctx.add_facts("obj_color", [
  (torch.tensor(0.86), (1, "blue")),
  (torch.tensor(0.14), (1, "green")),
], disjunctions=[[0, 1]])

Stable Fact IDs for Debugging

The difftopkproofsdebug provenance supports user-provided stable fact IDs for tracking and debugging:

import torch
import scallopy

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("edge", (int, int))

# Add facts with stable IDs
ctx.add_facts("edge", [
  ((torch.tensor(0.9), 1), (0, 1)),  # Fact ID = 1
  ((torch.tensor(0.8), 2), (1, 2)),  # Fact ID = 2
  ((torch.tensor(0.7), 3), (2, 3)),  # Fact ID = 3
])

Fact ID requirements:

  • Start from 1 (not 0)
  • Contiguous (no gaps)
  • Unique across all facts

Use cases:

  • Debugging: Trace which facts contributed to conclusions
  • HNLE MCP: Retract facts by stable ID
  • Provenance auditing: Track data lineage

For detailed usage, see Debugging Probabilistic Programs.


Custom Provenance

For advanced use cases, you can implement custom provenance semirings in Python.

Built-in Python Provenances

Scallopy includes Python-implemented provenances:

ctx = scallopy.ScallopContext(provenance="diffaddmultprob2")
# Equivalent to diffaddmultprob but implemented in Python

Available Python provenances:

  • "diffaddmultprob2" - Add-mult probability
  • "diffnandmultprob2" - NAND-mult probability
  • "diffmaxmultprob2" - Max-mult probability

Custom Provenance Objects

You can create custom provenance semirings by subclassing ScallopProvenance:

from scallopy import ScallopProvenance

class MyCustomProvenance(ScallopProvenance):
  def __init__(self):
    super().__init__()

  def base(self, tag):
    # Define base tagging
    return tag

  def add(self, tag1, tag2):
    # Define disjunction (OR)
    return tag1 + tag2

  def mult(self, tag1, tag2):
    # Define conjunction (AND)
    return tag1 * tag2

# Use custom provenance
ctx = scallopy.ScallopContext(custom_provenance=MyCustomProvenance())

Provenance semiring operations:

  • base(tag) - Tag a base fact
  • add(t1, t2) - Combine alternative derivations (disjunction)
  • mult(t1, t2) - Combine rule body atoms (conjunction)
  • negate(tag) - Negate a tag (optional, for negation support)
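As a concrete sketch of these hooks, here is how an add-mult style probability semiring could fill them in. This is plain Python mirroring (but not using) the scallopy base class, and the clamping of sums to 1.0 is a simplifying assumption:

```python
# An add-mult probability semiring expressed through the same hooks:
# base tags a fact with its probability, mult combines rule body atoms,
# and add combines alternative derivations.
class AddMultSketch:
    def base(self, prob):
        return prob

    def mult(self, t1, t2):
        return t1 * t2            # conjunction: multiply probabilities

    def add(self, t1, t2):
        return min(t1 + t2, 1.0)  # disjunction: add, clamped to 1 (assumption)

p = AddMultSketch()
# Tag of a fact derived via (A and B), or via C:
tag = p.add(p.mult(p.base(0.9), p.base(0.8)), p.base(0.1))
print(tag)  # roughly 0.82
```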

Common Patterns

Pattern 1: Choosing the Right Provenance

Decision tree:

  1. Do you need probabilities?

    • No → Use "unit" (fastest)
    • Yes → Continue
  2. Do you need gradients for ML?

    • No → Continue to #3
    • Yes → Continue to #4
  3. Non-differentiable probabilistic:

    • Fast bounds okay? → "minmaxprob"
    • Need exact probability? → "topkproofs" with k=5-10
    • Need all proofs? → "probproofs" (expensive)
  4. Differentiable for ML:

    • Standard case → "difftopkproofs" with k=3-5
    • Need negation/aggregation → "difftopbottomkclauses"
    • Debugging needed → "difftopkproofsdebug"

Pattern 2: Incremental Fact Addition

Add facts incrementally during execution:

ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.add_relation("edge", (int, int))

# Add initial facts
ctx.add_facts("edge", [(0.9, (0, 1))])
ctx.add_rule("path(a, c) = edge(a, c)")
ctx.run()

# Add more facts later
ctx.add_facts("edge", [(0.8, (1, 2))])
ctx.run()  # Re-run with new facts

Pattern 3: Batched Facts with Input Mappings

For ML applications, use input mappings to define the domain:

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)

# Define relation with input mapping (domain)
ctx.add_relation("digit", int, input_mapping=range(10))

# Add batched facts (probability distribution over domain)
digit_probs = torch.softmax(torch.randn(10), dim=0)
ctx.add_facts("digit", digit_probs)

Input mapping defines the expected domain, and facts can be provided as:

  • Tensor of shape (domain_size,) - probability distribution
  • List of tuples - sparse facts

Pattern 4: Provenance for Different Reasoning Tasks

Knowledge graph reasoning:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=10)
# Need exact probabilities for multi-hop reasoning

Neurosymbolic learning:

ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
# Gradients for training, K=3 for efficiency

Approximate inference:

ctx = scallopy.ScallopContext(provenance="minmaxprob")
# Fast bounds for large-scale applications

Debugging and explanation:

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=5)
# Stable fact IDs for tracing derivations

Pattern 5: WMC Optimization

Enable WMC with disjunctions for better probability computation:

ctx = scallopy.ScallopContext(
  provenance="topkproofs",
  k=5,
  wmc_with_disjunctions=True  # Better handling of disjunctions
)

This improves probability computation when you have many mutually exclusive facts.
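The difference this flag makes can be seen on two facts for the same object: treated as independent, their disjunction is computed noisy-or style; treated as mutually exclusive, the probabilities simply add. A plain-Python illustration (not the scallopy internals):

```python
p_blue, p_green = 0.7, 0.3

# Without exclusivity information: facts treated as independent,
# so P(blue or green) = 1 - (1 - p_blue) * (1 - p_green)
independent = 1 - (1 - p_blue) * (1 - p_green)

# With the disjunction respected: the facts are mutually exclusive,
# so P(blue or green) = p_blue + p_green
exclusive = p_blue + p_green

print(independent, exclusive)  # roughly 0.79 vs 1.0
```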


Summary

  • Set provenance when creating ScallopContext(provenance="...")
  • 18 provenance types available: discrete, probabilistic, differentiable
  • Add probabilistic facts with add_facts(relation, [(prob, tuple), ...])
  • Mutual exclusion specified via disjunctions parameter
  • Differentiable provenances integrate with PyTorch for gradient-based learning
  • Stable fact IDs available in difftopkproofsdebug for debugging
  • Custom provenances possible by subclassing ScallopProvenance

For more details, please refer to the sections that follow.

Creating Modules

Scallop modules are PyTorch-compatible components that wrap Scallop programs for seamless integration with neural networks. They enable end-to-end differentiable neurosymbolic learning by combining neural perception with symbolic reasoning.

API Overview

Scallop provides two primary APIs for creating differentiable modules:

  • scallopy.Module - High-level PyTorch nn.Module wrapper
  • scallopy.ScallopForwardFunction - Functional interface for forward passes

Both APIs provide the same functionality. Use ScallopForwardFunction for functional-style code or when you need fine-grained control. This guide uses both interchangeably.

Note: All examples in this guide work with both APIs. Replace scallopy.Module(...) with scallopy.ScallopForwardFunction(...) as needed.

What is a Scallop Module?

A Scallop module is a torch.nn.Module subclass that:

  • Wraps a Scallop program (relations, rules, and facts)
  • Accepts PyTorch tensors as probabilistic input facts
  • Performs symbolic reasoning via Scallop’s execution engine
  • Returns PyTorch tensors as output with gradient support
  • Integrates seamlessly into neural network architectures

Key Benefits

1. Declarative Logic: Express reasoning symbolically instead of learning it from data

# Instead of training a neural network to learn addition...
# ...declare it symbolically:
"sum_2(a + b) = digit_a(a) and digit_b(b)"

2. Gradient Flow: Backpropagation works through symbolic reasoning

loss.backward()  # Gradients flow through Scallop module

3. Interpretability: Logic is explicit and human-readable

# Rules are visible: you know exactly what the model is doing
ctx.add_rule("path(a, c) = edge(a, b), path(b, c)")

4. Sample Efficiency: Inject domain knowledge instead of learning everything

# Neural network learns perception (digit classification)
# Scallop handles symbolic reasoning (multi-digit arithmetic)

Important: Program Requirements

Type Declarations Required

When creating modules, your Scallop program must include type declarations for input relations:

# ✓ Correct - includes type declarations
sum_2 = scallopy.Module(
  program="""
    type digit_a(usize), digit_b(usize)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

# ✗ Incorrect - missing type declarations
sum_2 = scallopy.Module(
  program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",  # Will cause warnings
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

Why? Type declarations help Scallop’s compiler optimize the program and ensure type safety.

Batch Dimension Required

Input tensors must have a batch dimension as the first axis:

# ✓ Correct - shape: (batch_size, num_classes)
digit_a = torch.randn(16, 10)  # Batch of 16, 10 digit classes
digit_b = torch.randn(16, 10)
result = sum_2(digit_a=digit_a, digit_b=digit_b)
# result.shape: (16, 19)

# ✗ Incorrect - missing batch dimension
digit_a = torch.randn(10)  # Shape error!
result = sum_2(digit_a=digit_a, digit_b=digit_b)

Why? Scallop processes batches of inputs for efficiency. Even for single examples, use tensor.unsqueeze(0) to add batch dimension.


Creating Basic Modules

There are three ways to create a Scallop module:

Method 1: Inline Program String

The most common approach for simple programs:

import scallopy

sum_2 = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    type digit_a(usize), digit_b(usize)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

When to use: Short programs (< 20 lines), quick prototyping

Method 2: External File

For larger programs, load from a .scl file:

# File: reasoning.scl contains your Scallop program
module = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  file="reasoning.scl",
  input_mappings={"input_rel": range(100)},
  output_mapping=("output_rel", range(50))
)

When to use: Large programs, code reuse, version control

Method 3: Programmatic Construction

Build the program piece by piece:

module = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  relations=[
    ("digit_1", (int,)),
    ("digit_2", (int,)),
  ],
  rules=[
    "sum_2(a + b) = digit_1(a) and digit_2(b)",
    "mult_2(a * b) = digit_1(a) and digit_2(b)",
  ],
  input_mappings={
    "digit_1": range(10),
    "digit_2": range(10),
  },
  output_mappings={
    "sum_2": range(19),
    "mult_2": range(100),
  }
)

When to use: Dynamic program generation, conditional logic structure


Input and Output Mappings

Mappings define how PyTorch tensors are converted to/from Scallop facts.

Input Mappings

Input mappings specify the domain of input relations - the set of possible values:

input_mappings={
  "digit": range(10),  # Domain: digits 0-9
  "color": ["red", "green", "blue"],  # Domain: three colors
}

How it works:

When you call the module with a tensor:

digit_probs = torch.tensor([0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.1])
result = module(digit=digit_probs)

Scallop interprets this as:

digit(0) with probability 0.1
digit(1) with probability 0.0
...
digit(8) with probability 0.8
digit(9) with probability 0.1
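This interpretation step can be sketched in plain Python: zip the tensor's entries with the input mapping's domain, so element i of the tensor becomes the probability of the i-th domain value (a sketch, not the scallopy internals):

```python
# How a dense probability vector pairs up with an input mapping.
probs = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.1]
domain = list(range(10))  # input_mappings={"digit": range(10)}

# Each entry becomes a probabilistic fact (probability, tuple)
facts = [(p, (v,)) for p, v in zip(probs, domain)]
print(facts[8])  # (0.8, (8,))
```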

Formats:

  1. Range: range(n) for integers 0 to n-1

    input_mappings={"digit": range(10)}
    
  2. List: Explicit enumeration

    input_mappings={"color": ["red", "green", "blue"]}
    
  3. List of tuples: For multi-column relations

    input_mappings={
      "edge": [(0,1), (1,2), (2,3), (0,3)]
    }
    

Output Mappings

Output mappings specify how to extract results from Scallop relations:

Single output:

output_mapping=("sum_2", range(19))
# Extracts sum_2(0), sum_2(1), ..., sum_2(18)
# Returns tensor of shape (batch_size, 19)

Multiple outputs:

output_mappings={
  "sum_2": range(19),
  "mult_2": range(100),
}
# Returns dictionary: {"sum_2": tensor1, "mult_2": tensor2}

Tuple outputs:

output_mapping=("path", [(0,1), (0,2), (1,2)])
# Extracts path(0,1), path(0,2), path(1,2)
# Returns tensor of shape (batch_size, 3)

Forward Pass

Once created, use the module like any PyTorch component.

Basic Forward Pass

import torch
import scallopy

# Create module
sum_2 = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    type digit_a(usize), digit_b(usize)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

# Prepare input: probability distributions over digits
digit_a = torch.tensor([
  [0.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # Mostly 1
  [0.0, 0.0, 0.0, 0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],  # Mostly 3
])
digit_b = torch.tensor([
  [0.0, 0.0, 0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # Mostly 2
  [0.0, 0.0, 0.0, 0.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0],  # Mostly 4
])

# Forward pass
result = sum_2(digit_a=digit_a, digit_b=digit_b)

# Result shape: (2, 19) - batch of 2, sums 0-18
print(result)
# Output:
# tensor([[0.0, 0.0, 0.0, 0.72, 0.18, ...],   # 1+2=3 (prob 0.72)
#         [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.72, ...]])  # 3+4=7
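The 0.72 in the first row is just the product of the two dominant input confidences: under a proof-based provenance such as difftopkproofs, a single proof's probability is the product of the probabilities of the facts it uses:

```python
# digit_a = 1 with confidence 0.9, digit_b = 2 with confidence 0.8
# The proof for sum_2(3) multiplies the two:
prob_sum_3 = 0.9 * 0.8
# prob_sum_3 == 0.72
```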

Gradient Computation

Gradients flow through the Scallop module:

# Inputs with gradients
digit_a = torch.randn(16, 10, requires_grad=True)
digit_a = torch.softmax(digit_a, dim=1)

digit_b = torch.randn(16, 10, requires_grad=True)
digit_b = torch.softmax(digit_b, dim=1)

# Forward pass
sum_result = sum_2(digit_a=digit_a, digit_b=digit_b)

# Compute loss
ground_truth = torch.zeros(16, 19)
ground_truth[:, 5] = 1.0  # Target: sum should be 5

loss_fn = torch.nn.BCELoss()
loss = loss_fn(sum_result, ground_truth)

# Backward pass
loss.backward()

# Gradients are computed for digit_a and digit_b
assert digit_a.grad is not None
assert digit_b.grad is not None

Multiple Outputs

When using output_mappings (plural), the module returns a dictionary:

module = scallopy.Module(
  provenance="diffaddmultprob",
  program="""
    rel sum_2(a + b) = digit_1(a) and digit_2(b)
    rel mult_2(a * b) = digit_1(a) and digit_2(b)
  """,
  input_mappings={"digit_1": range(10), "digit_2": range(10)},
  output_mappings={
    "sum_2": range(20),
    "mult_2": range(100),
  }
)

digit_1 = torch.randn(16, 10)
digit_2 = torch.randn(16, 10)

result = module(digit_1=digit_1, digit_2=digit_2)

# Result is a dictionary
print(result["sum_2"].shape)   # (16, 20)
print(result["mult_2"].shape)  # (16, 100)

Integration with Neural Networks

Scallop modules compose naturally with PyTorch layers.

Pattern 1: Symbolic Reasoning Layer

Use Scallop as a reasoning component in a larger network:

import torch
import torch.nn as nn
import scallopy

class DigitAdder(nn.Module):
  def __init__(self):
    super().__init__()

    # Neural perception: classify digits from images
    self.digit_classifier = nn.Sequential(
      nn.Linear(784, 128),
      nn.ReLU(),
      nn.Linear(128, 10),
    )

    # Symbolic reasoning: add digits
    self.adder = scallopy.Module(
      provenance="difftopkproofs",
      k=3,
      program="rel sum(a + b) = digit_a(a) and digit_b(b)",
      input_mappings={"digit_a": range(10), "digit_b": range(10)},
      output_mapping=("sum", range(19))
    )

  def forward(self, img_a, img_b):
    # Neural: classify digits
    logits_a = self.digit_classifier(img_a)
    logits_b = self.digit_classifier(img_b)

    probs_a = torch.softmax(logits_a, dim=1)
    probs_b = torch.softmax(logits_b, dim=1)

    # Symbolic: add them
    sum_probs = self.adder(digit_a=probs_a, digit_b=probs_b)

    return sum_probs

# Training loop
model = DigitAdder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for img_a, img_b, target_sum in dataloader:
  optimizer.zero_grad()

  sum_probs = model(img_a, img_b)
  loss = loss_fn(sum_probs, target_sum)

  loss.backward()
  optimizer.step()

Pattern 2: Knowledge Graph Reasoning

Inject symbolic knowledge into perception:

class KnowledgeEnhancedClassifier(nn.Module):
  def __init__(self):
    super().__init__()

    # Neural perception
    self.encoder = nn.Linear(input_dim, 64)

    # Symbolic reasoning with domain knowledge
    self.reasoner = scallopy.Module(
      provenance="difftopkproofs",
      k=5,
      program="""
        rel parent(p, c) = raw_parent(p, c)
        rel ancestor(a, d) = parent(a, d)
        rel ancestor(a, d) = ancestor(a, c), parent(c, d)
        rel sibling(a, b) = parent(p, a), parent(p, b), a != b
      """,
      input_mappings={"raw_parent": parent_pairs},
      output_mapping=("sibling", sibling_pairs)
    )

  def forward(self, features):
    encoded = self.encoder(features)
    parent_probs = torch.softmax(encoded, dim=1)

    sibling_probs = self.reasoner(raw_parent=parent_probs)
    return sibling_probs

Pattern 3: Multi-Task Learning

Use multiple output relations for joint predictions:

classifier = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    rel is_animal(x) = is_cat(x)
    rel is_animal(x) = is_dog(x)
    rel has_tail(x) = is_animal(x), not is_human(x)
    rel can_fly(x) = is_bird(x)
  """,
  input_mappings={
    "is_cat": range(100),
    "is_dog": range(100),
    "is_bird": range(100),
    "is_human": range(100),
  },
  output_mappings={
    "is_animal": range(100),
    "has_tail": range(100),
    "can_fly": range(100),
  }
)

# Single forward pass computes all outputs jointly
outputs = classifier(is_cat=cat_probs, is_dog=dog_probs, ...)
animal_pred = outputs["is_animal"]
tail_pred = outputs["has_tail"]
fly_pred = outputs["can_fly"]

Common Patterns

Pattern 1: Train/Test K Configuration

Use smaller K during training for speed, larger K during inference for accuracy:

model = scallopy.Module(
  provenance="difftopkproofs",
  train_k=3,   # Fast during training
  test_k=10,   # Accurate during inference
  program=program_str,
  input_mappings=input_maps,
  output_mapping=output_map
)

# Automatically uses train_k during training
model.train()
output = model(input_data)

# Automatically uses test_k during evaluation
model.eval()
output = model(input_data)
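The switching piggybacks on torch.nn.Module's built-in training flag. A minimal sketch of that mechanism, using a stand-in class rather than scallopy's actual implementation:

```python
import torch.nn as nn

class KSwitch(nn.Module):
  """Stand-in sketch: pick k based on the module's training flag."""
  def __init__(self, train_k, test_k):
    super().__init__()
    self.train_k, self.test_k = train_k, test_k

  @property
  def k(self):
    # nn.Module.train() / .eval() toggle self.training
    return self.train_k if self.training else self.test_k

m = KSwitch(3, 10)
m.train()   # m.k is now 3
m.eval()    # m.k is now 10
```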

Pattern 2: JIT Compilation

Enable JIT compilation for faster execution:

model = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program=program_str,
  input_mappings=input_maps,
  output_mapping=output_map,
  jit=True,
  jit_name="my_model"  # Optional: for caching
)

Benefits:

  • First run compiles the model
  • Subsequent runs use cached compiled version
  • Significant speedup for repeated executions

Pattern 3: Sparse Gradients

For large domains with sparse activations:

# Create forward function with sparse Jacobian
ctx = scallopy.ScallopContext(provenance="difftopkproofs", k=3)
ctx.add_relation("input", int, range(1000))
ctx.add_rule("output(x * 2) = input(x)")

forward = ctx.forward_function(
  "output",
  range(2000),
  sparse_jacobian=True  # Use sparse gradients
)

# Gradients are now sparse tensors
input_data = torch.randn(1000, requires_grad=True)
output = forward(input=input_data)
loss = output.sum()
loss.backward()  # Efficient sparse gradient computation

Pattern 4: Non-Probabilistic Inputs

Some inputs don’t need probability tracking:

model = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    rel weighted_sum(x * w + y * w) = value(x), value(y), weight(w)
  """,
  input_mappings={
    "value": range(10),
    "weight": [0.5, 1.0, 1.5, 2.0],
  },
  non_probabilistic=["weight"],  # Weights are fixed
  output_mapping=("weighted_sum", range(100))
)

Pattern 5: Iteration Limits

Control recursion depth:

model = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    rel path(a, b) = edge(a, b)
    rel path(a, c) = path(a, b), edge(b, c)
  """,
  input_mappings={"edge": edge_pairs},
  output_mapping=("path", path_pairs),
  iter_limit=10  # Maximum 10 iterations of recursion
)

Summary

  • Scallop modules integrate symbolic reasoning into PyTorch neural networks
  • Three creation methods: inline program, external file, programmatic
  • Input mappings define domains; tensors represent probability distributions
  • Output mappings extract results; support single or multiple outputs
  • Gradient flow enables end-to-end differentiable neurosymbolic learning
  • Common patterns: train/test K, JIT compilation, sparse gradients, non-probabilistic inputs


Module Input

Input mappings define how PyTorch tensors are converted into Scallop facts. They specify the domain (set of possible values) of input relations, allowing you to pass probability distributions as tensors and have them automatically converted to probabilistic facts.

What is an Input Mapping?

An input mapping establishes a correspondence between tensor indices and Scallop tuples:

input_mappings={"digit": range(10)}

This mapping says: “The digit relation has domain 0-9, and a tensor of shape (10,) represents probabilities for each digit.”

Example:

# Tensor: [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.1]
# Interpreted as:
# digit(0) with probability 0.1
# digit(8) with probability 0.8
# digit(9) with probability 0.1

Input Mapping Formats

Scallop supports multiple formats for defining input mappings:

Format 1: Range (Simple Integer Domain)

The most common format for integer-valued relations:

input_mappings={"digit": range(10)}
# Domain: digit(0), digit(1), ..., digit(9)
# Expected tensor shape: (10,) or (batch_size, 10)

Properties:

  • kind: "list"
  • shape: (10,)
  • dimension: 1
  • is_singleton: True (single-column relation)

Format 2: List (Explicit Enumeration)

For arbitrary values:

input_mappings={
  "color": ["red", "green", "blue"]
}
# Domain: color(red), color(green), color(blue)
# Expected tensor shape: (3,)

With tuples:

input_mappings={
  "edge": [(0,1), (1,2), (2,3), (0,3)]
}
# Domain: edge(0,1), edge(1,2), edge(2,3), edge(0,3)
# Expected tensor shape: (4,)

Properties:

  • kind: "list"
  • shape: (len(list),)
  • is_singleton: False if tuples, True if values

Format 3: Dictionary (Multi-Dimensional)

For relations with multiple columns, use a dictionary mapping dimension indices to value lists:

input_mappings={
  "edge": {
    0: range(5),  # First column: nodes 0-4
    1: range(5),  # Second column: nodes 0-4
  }
}
# Domain: all pairs (i, j) where i, j ∈ {0, 1, 2, 3, 4}
# Expected tensor shape: (5, 5) - 25 possible edges

Mixed types:

input_mappings={
  "likes": {
    0: ["alice", "bob", "charlie"],
    1: ["pizza", "salad", "burger"],
  }
}
# Domain: likes(person, food) for all person-food combinations
# Expected tensor shape: (3, 3)
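The index-to-tuple correspondence for such a (3, 3) tensor can be sketched in plain Python (an illustration only, not scallopy's internal code):

```python
people = ["alice", "bob", "charlie"]
foods = ["pizza", "salad", "burger"]

# Entry [i][j] of the (3, 3) tensor tags the fact likes(people[i], foods[j])
def tuple_at(i, j):
  return (people[i], foods[j])

# tuple_at(0, 1) == ("alice", "salad")
```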

Properties:

  • kind: "dict"
  • shape: (len(dim0), len(dim1), ...)
  • dimension: Number of dimensions
  • is_singleton: False

Format 4: Tuple (Fixed Constant)

For a single fixed tuple:

input_mappings={
  "start_node": (0,)
}
# Domain: start_node(0) only
# Expected tensor shape: ()

Properties:

  • kind: "tuple"
  • shape: ()
  • dimension: 0

Format 5: Value (Single Constant)

For a single value:

input_mappings={
  "threshold": 0.5
}
# Domain: threshold(0.5) only
# Expected tensor shape: ()

Properties:

  • kind: "value"
  • shape: ()
  • dimension: 0
  • is_singleton: True

Tensor Shapes and Batching

Input mappings automatically handle batching.

Single Example

If tensor shape matches the mapping shape exactly, it’s treated as a single example:

im = scallopy.InputMapping(range(10))
tensor = torch.randn(10)  # Single probability distribution

facts = im.process_tensor(tensor, batched=False)
# Returns: list of 10 facts

Batched Input

If tensor has an extra leading dimension, it’s treated as a batch:

im = scallopy.InputMapping(range(10))
tensor = torch.randn(16, 10)  # Batch of 16 distributions

facts = im.process_tensor(tensor, batched=True)
# Returns: list of 16 lists, each with 10 facts

Multi-Dimensional Mappings

For multi-dimensional mappings, the tensor shape must match:

im = scallopy.InputMapping({0: range(5), 1: range(3)})
tensor = torch.randn(5, 3)  # Single example
# OR
tensor = torch.randn(16, 5, 3)  # Batch of 16

facts = im.process_tensor(tensor)

Sparse Inputs

For large domains, you often don’t want to include all facts. Scallop provides filtering mechanisms:

Retain Top-K

Keep only the K highest-probability facts:

input_mappings={
  "digit": scallopy.InputMapping(
    range(10),
    retain_k=3  # Keep only top 3 digits
  )
}

# Tensor: [0.05, 0.02, 0.30, 0.01, 0.40, 0.03, 0.10, 0.02, 0.05, 0.02]
# After retain_k=3: only facts for indices 4 (0.40), 2 (0.30), 6 (0.10) are kept
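The top-K selection can be sketched in plain Python (a hand-rolled illustration of the behavior, not scallopy's internal code):

```python
probs = [0.05, 0.02, 0.30, 0.01, 0.40, 0.03, 0.10, 0.02, 0.05, 0.02]
# Indices of the 3 highest probabilities, in descending order
top3 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:3]
# top3 == [4, 2, 6]
```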

With multi-dimensional mappings:

input_mappings={
  "edge": scallopy.InputMapping(
    {0: range(10), 1: range(10)},
    retain_k=5,  # Keep only top 5 edges across all 100 possibilities
  )
}

Per-dimension sampling:

input_mappings={
  "edge": scallopy.InputMapping(
    {0: range(10), 1: range(10)},
    retain_k=2,
    sample_dim=1,  # Keep top 2 for each value of dimension 1
  )
}
# Result: 10 * 2 = 20 facts (top 2 destinations for each source)

Retain Threshold

Keep only facts above a probability threshold:

input_mappings={
  "digit": scallopy.InputMapping(
    range(10),
    retain_threshold=0.1  # Only keep probabilities > 0.1
  )
}

# Tensor: [0.05, 0.02, 0.30, 0.01, 0.40, 0.03, 0.10, 0.02, 0.05, 0.02]
# After threshold: only facts for indices 2 (0.30), 4 (0.40) are kept
# Note: 0.10 is NOT kept (must be strictly greater than threshold)
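The strict-threshold filter can be sketched in plain Python (an illustration of the behavior described above, not scallopy's internal code):

```python
probs = [0.05, 0.02, 0.30, 0.01, 0.40, 0.03, 0.10, 0.02, 0.05, 0.02]
# Strictly-greater-than comparison: index 6 (0.10) is excluded
kept = [i for i, p in enumerate(probs) if p > 0.1]
# kept == [2, 4]
```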

Categorical Sampling

Instead of deterministic top-K, sample K facts according to their probabilities:

input_mappings={
  "digit": scallopy.InputMapping(
    range(10),
    retain_k=3,
    sample_strategy="categorical"  # Stochastic sampling
  )
}
# Each forward pass samples 3 different digits based on probabilities
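Conceptually this resembles weighted random sampling; a rough stand-in using Python's `random.choices` (which samples with replacement — the actual strategy scallopy uses may differ):

```python
import random

probs = [0.05, 0.02, 0.30, 0.01, 0.40, 0.03, 0.10, 0.02, 0.05, 0.02]
# Draw 3 indices in proportion to their probabilities
sampled = random.choices(range(10), weights=probs, k=3)
```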

Disjunctions in Input Mappings

When facts are mutually exclusive, mark them as disjunctive:

Global Disjunction

All facts in the relation are mutually exclusive:

input_mappings={
  "digit": scallopy.InputMapping(
    range(10),
    disjunctive=True
  )
}
# All 10 digit facts share one mutual exclusion ID

Per-Dimension Disjunction

Mutual exclusion along a specific dimension:

input_mappings={
  "color": scallopy.InputMapping(
    {0: range(3), 1: ["red", "green", "blue"]},
    disjunctive_dim=1  # Each object has mutually exclusive colors
  )
}
# color(0, red), color(0, green), color(0, blue) are mutually exclusive
# color(1, red), color(1, green), color(1, blue) are mutually exclusive
# But color(0, red) and color(1, red) are NOT mutually exclusive
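The grouping can be sketched in plain Python: with disjunctive_dim=1, facts that agree on every other dimension (here, the object in column 0) share one exclusion group. This is an illustration only, not scallopy's internal code:

```python
objects = range(3)
colors = ["red", "green", "blue"]

# One exclusion ID per object: its three color facts are mutually exclusive
exclusion_id = {(o, c): o for o in objects for c in colors}
# exclusion_id[(0, "red")] == exclusion_id[(0, "blue")]  -> same group
# exclusion_id[(0, "red")] != exclusion_id[(1, "red")]   -> different groups
```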

Complete Example

Putting it all together:

import torch
import scallopy

# Create module with complex input mappings
module = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="""
    // Classify objects
    rel class(o, c) = color(o, col), shape(o, sh), classifier(col, sh, c)
  """,
  input_mappings={
    # Simple list
    "color": scallopy.InputMapping(
      {0: range(10), 1: ["red", "green", "blue"]},
      disjunctive_dim=1,  # Each object has one color
      retain_k=1,  # Keep most likely color per object
      sample_dim=1,
    ),

    # Multi-dimensional with threshold
    "shape": scallopy.InputMapping(
      {0: range(10), 1: ["circle", "square", "triangle"]},
      disjunctive_dim=1,
      retain_threshold=0.2,  # Only confident shapes
    ),

    # Fixed classifier (non-probabilistic)
    "classifier": [
      ("red", "circle", "apple"),
      ("green", "circle", "lime"),
      # ... more rules
    ],
  },
  output_mapping=("class", [(i, c) for i in range(10) for c in ["apple", "lime"]])
)

# Use with batched tensors
color_probs = torch.softmax(torch.randn(16, 10, 3), dim=2)
shape_probs = torch.softmax(torch.randn(16, 10, 3), dim=2)

result = module(color=color_probs, shape=shape_probs)
# Result shape: (16, 20) - batch of 16, (object, class) pairs

Summary

  • Input mappings define the domain of input relations
  • Five formats: range, list, dict, tuple, value
  • Batching is automatic - extra leading dimension = batch
  • Sparse inputs via retain_k, retain_threshold, or sample_strategy
  • Disjunctions mark mutually exclusive facts (global or per-dimension)
  • Properties: kind, shape, dimension, is_singleton


Module Output

Output mappings define how Scallop results are extracted and converted back into PyTorch tensors. They specify which tuples from output relations should be included in the final tensor and in what order.

What is an Output Mapping?

An output mapping specifies which facts to extract from a Scallop relation and how to arrange them in the output tensor.

output_mapping=("sum_2", range(19))

This says: “Extract facts sum_2(0), sum_2(1), …, sum_2(18) and return them as a tensor of shape (19,) (or (batch_size, 19) if batched).”

Flow:

Scallop execution → Relations with facts → Output mapping → Tensor

Single Output

Use output_mapping (singular) when your module produces one output relation.

Format: Tuple Notation

output_mapping=("relation_name", mapping_list)

Example 1: Simple integer range

module = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

result = module(digit_a=probs_a, digit_b=probs_b)
# Result shape: (batch_size, 19)
# result[:, 0] = probability of sum_2(0)
# result[:, 1] = probability of sum_2(1)
# ...
# result[:, 18] = probability of sum_2(18)

Format: List of Values

output_mapping=("relation_name", [value1, value2, ...])

Example 2: Explicit list

module = scallopy.Module(
  ...
  output_mapping=("color", ["red", "green", "blue"])
)

result = module(...)
# Result shape: (batch_size, 3)
# result[:, 0] = probability of color("red")
# result[:, 1] = probability of color("green")
# result[:, 2] = probability of color("blue")

Format: List of Tuples

For multi-column relations:

output_mapping=("relation_name", [(tuple1), (tuple2), ...])

Example 3: Multi-column relation

# Extract specific paths
output_mapping=("path", [(0,1), (0,2), (1,2), (2,3)])

result = module(...)
# Result shape: (batch_size, 4)
# result[:, 0] = probability of path(0, 1)
# result[:, 1] = probability of path(0, 2)
# result[:, 2] = probability of path(1, 2)
# result[:, 3] = probability of path(2, 3)

Example 4: All pairs

# Generate all pairs programmatically
all_pairs = [(i, j) for i in range(5) for j in range(5)]
output_mapping=("edge", all_pairs)

result = module(...)
# Result shape: (batch_size, 25)

Multiple Outputs

Use output_mappings (plural) when your module produces multiple output relations.

Format: Dictionary

output_mappings={
  "relation1": mapping1,
  "relation2": mapping2,
  ...
}

Example:

module = scallopy.Module(
  provenance="diffaddmultprob",
  program="""
    rel sum_2(a + b) = digit_1(a) and digit_2(b)
    rel mult_2(a * b) = digit_1(a) and digit_2(b)
  """,
  input_mappings={
    "digit_1": range(10),
    "digit_2": range(10),
  },
  output_mappings={
    "sum_2": range(20),    # 0-19
    "mult_2": range(100),  # 0-99
  }
)

result = module(digit_1=probs_1, digit_2=probs_2)

# Result is a DICTIONARY
print(result["sum_2"].shape)   # (batch_size, 20)
print(result["mult_2"].shape)  # (batch_size, 100)

Accessing Multiple Outputs

# Forward pass
outputs = module(input_a=tensor_a, input_b=tensor_b)

# Access individual outputs
sum_probs = outputs["sum_2"]
mult_probs = outputs["mult_2"]

# Compute losses separately
loss_sum = criterion(sum_probs, target_sum)
loss_mult = criterion(mult_probs, target_mult)

total_loss = loss_sum + loss_mult
total_loss.backward()

Output Formats

Format 1: Range

Most common for integer-valued relations:

output_mapping=("digit", range(10))
# Extracts: digit(0), digit(1), ..., digit(9)

Format 2: List of Values

For explicit enumeration:

output_mapping=("color", ["red", "green", "blue", "yellow"])
# Extracts: color("red"), color("green"), color("blue"), color("yellow")

Format 3: List of Tuples

For multi-column relations:

# Binary relation
output_mapping=("edge", [(0,1), (1,2), (2,3)])

# Ternary relation
output_mapping=("triple", [(a, b, c) for a in range(3) for b in range(3) for c in range(3)])

Format 4: None (No Output Mapping)

When you don’t need tensor output:

output_mapping=None
# Module runs Scallop but doesn't extract results as tensor
# Useful for intermediate computations or side effects

Advanced Patterns

Pattern 1: Filtering Relevant Outputs

Only extract the outputs you care about:

# Don't need all 100 paths, just specific ones
relevant_paths = [(0, 3), (1, 4), (2, 5)]
output_mapping=("path", relevant_paths)

Pattern 2: Multi-Task Learning

Different outputs for different tasks:

module = scallopy.Module(
  ...
  output_mappings={
    "classification": ["cat", "dog", "bird"],
    "has_tail": [True, False],
    "can_fly": [True, False],
  }
)

outputs = module(...)
class_pred = outputs["classification"]
tail_pred = outputs["has_tail"]
fly_pred = outputs["can_fly"]

Pattern 3: Hierarchical Outputs

Extract results at multiple levels:

module = scallopy.Module(
  ...
  program="""
    rel direct_neighbor(a, b) = edge(a, b)
    rel two_hop(a, c) = edge(a, b), edge(b, c)
    rel three_hop(a, d) = two_hop(a, c), edge(c, d)
  """,
  output_mappings={
    "direct_neighbor": pairs,
    "two_hop": pairs,
    "three_hop": pairs,
  }
)

Pattern 4: Dynamic Output Generation

Generate output mappings programmatically:

# Generate all combinations
num_objects = 10
object_pairs = [(i, j) for i in range(num_objects) for j in range(num_objects) if i != j]

module = scallopy.Module(
  ...
  output_mapping=("similarity", object_pairs)
)

Output Tensor Properties

Shape

Single output:

  • Without batch: (num_tuples,)
  • With batch: (batch_size, num_tuples)

Multiple outputs:

  • Returns a dictionary where each value has shape (batch_size, num_tuples_for_that_relation)

Values

Each element is the probability of that tuple existing:

  • 0.0 = tuple doesn’t exist
  • 1.0 = tuple certainly exists
  • 0.0 < p < 1.0 = tuple exists with probability p
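To turn such a probability vector into a hard prediction, the usual move is an argmax over the tuple axis — plain PyTorch, nothing Scallop-specific:

```python
import torch

result = torch.tensor([[0.0, 0.05, 0.72, 0.18, 0.05]])  # batch of 1
pred = result.argmax(dim=1)
# pred is tensor([2]): tuple index 2 is the most probable
```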

Gradients

If inputs have requires_grad=True, output tensors will have gradients:

digit_probs = torch.softmax(
  torch.randn(16, 10, requires_grad=True),
  dim=1
)

result = module(digit=digit_probs)
loss = criterion(result, target)
loss.backward()

# Gradients flow back to digit_probs
assert digit_probs.grad is not None

Complete Example

import torch
import scallopy

# Multi-output module for knowledge graph reasoning
module = scallopy.Module(
  provenance="difftopkproofs",
  k=5,
  program="""
    // Input relations
    type parent(person, person)
    type sibling(person, person)

    // Derived relations
    rel ancestor(a, d) = parent(a, d)
    rel ancestor(a, d) = ancestor(a, c), parent(c, d)
    rel cousin(a, b) = parent(pa, a), parent(pb, b), sibling(pa, pb)
    rel related(a, b) = ancestor(a, b)
    rel related(a, b) = cousin(a, b)
  """,
  input_mappings={
    "parent": person_pairs,
    "sibling": person_pairs,
  },
  output_mappings={
    "ancestor": person_pairs,
    "cousin": person_pairs,
    "related": person_pairs,
  }
)

# Forward pass
parent_probs = torch.softmax(torch.randn(8, len(person_pairs)), dim=1)
sibling_probs = torch.softmax(torch.randn(8, len(person_pairs)), dim=1)

outputs = module(parent=parent_probs, sibling=sibling_probs)

# Three output tensors
ancestor_probs = outputs["ancestor"]  # Shape: (8, len(person_pairs))
cousin_probs = outputs["cousin"]      # Shape: (8, len(person_pairs))
related_probs = outputs["related"]    # Shape: (8, len(person_pairs))

# Compute multi-task loss
loss = (
  criterion(ancestor_probs, ancestor_target) +
  criterion(cousin_probs, cousin_target) +
  criterion(related_probs, related_target)
)

loss.backward()

Summary

  • Output mappings extract Scallop results as PyTorch tensors
  • Single output: output_mapping=("relation", list)
  • Multiple outputs: output_mappings={"rel1": list1, "rel2": list2}
  • Formats: range, list of values, list of tuples
  • Tensor shape: (batch_size, num_tuples) with probability values
  • Gradients: Flow back to inputs for end-to-end learning


Foreign Functions

While Scallop ships with built-in foreign functions such as $hash and $abs, you may need additional functions for specialized computation. scallopy provides an interface for defining foreign functions directly in Python. Here is an example that defines a custom $sum function in Python and later uses it in Scallop:

# Create a new foreign function by annotating an existing function with `@scallopy.foreign_function`
# Note that this function has variable arguments!
@scallopy.foreign_function
def my_sum(*args: int) -> int:
  s = 0
  for x in args:
    s += x
  return s

# Create a context
ctx = scallopy.ScallopContext()

# Register the declared foreign function (`my_sum`)
# Note that the function needs to be registered before it is used
ctx.register_foreign_function(my_sum)

# Add some relations
ctx.add_relation("I", (int, int))
ctx.add_facts("I", [(1, 2), (2, 3), (3, 4)])

# Add a rule which uses the registered function!
ctx.add_rule("R($my_sum(a, b)) = I(a, b)")

# Run the context
ctx.run()

# See the result, should be [(3,), (5,), (7,)]
print(list(ctx.relation("R")))

Now we elaborate on how we define new foreign functions in Python.

Function Signature

The decorator @scallopy.foreign_function analyzes the annotated Python function and checks that it can be accepted as a Scallop foreign function. Types must be annotated on all arguments and on the return value. For simplicity, Python types such as int, bool, and str are mapped to Scallop types (and type families) as follows:

Python type | Scallop type   | Scallop base types
----------- | -------------- | ---------------------------------
int         | Integer family | i8, i16, …, u8, u16, …, usize
float       | Float family   | f32, f64
bool        | bool           | bool
str         | String         | String

If one desires a more fine-grained type, a specific Scallop base type from the table above can be used in the annotation instead of the generic Python type.

Argument Types

Optional Arguments

Variable Arguments

Error Handling

Foreign Predicates

Foreign predicates allow you to implement Scallop predicates in Python, enabling custom logic, external data sources, and integration with Python libraries.

What is a Foreign Predicate?

A foreign predicate is a Python function that Scallop can call during execution to generate facts dynamically. Instead of declaring facts statically, foreign predicates compute facts on-the-fly based on input arguments.

Use cases:

  • Custom logic: Implement complex computations not expressible in Scallop
  • External data: Query databases, APIs, or files during reasoning
  • Python libraries: Use NumPy, scikit-learn, or other Python tools
  • Semantic similarity: Fuzzy matching, embeddings, neural networks

Important: Required Imports and Type Signature

Must Import from scallopy

Foreign predicates require specific imports from the scallopy package:

# ✓ Correct - import required types
from scallopy import foreign_predicate, Facts
from typing import Tuple

@foreign_predicate
def string_length(s: str) -> Facts[float, Tuple[int]]:
  yield (1.0, (len(s),))

# ✗ Incorrect - wrong return annotation, no Facts type
import scallopy

@scallopy.foreign_predicate
def string_length(s: str) -> int:  # Wrong - return type must be Facts[...]
  return len(s)  # Wrong - must yield (tag, tuple) pairs, not return

Return Type Must Be Facts Generator

The return type must be Facts[TagType, TupleType] and use yield, not return:

# ✓ Correct - yields Facts
def my_predicate(x: int) -> Facts[float, Tuple[int]]:
  yield (1.0, (x * 2,))

# ✗ Incorrect - returns value directly
def my_predicate(x: int) -> int:
  return x * 2  # Error: "Return type must be Facts"

Basic Usage

Defining a Foreign Predicate

Use the @foreign_predicate decorator:

from scallopy import foreign_predicate, Facts
from typing import Tuple

@foreign_predicate
def string_length(s: str) -> Facts[float, Tuple[int]]:
  length = len(s)
  yield (1.0, (length,))  # (probability, tuple)

Anatomy:

  • Decorator: @foreign_predicate marks the function
  • Type hints: Input parameters are typed (e.g., s: str)
  • Return type: Facts[TagType, TupleType] - generator of (tag, tuple) pairs
  • Yield: Produce facts lazily using yield (not return)

Registering with Context

import scallopy

ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Register the foreign predicate
ctx.register_foreign_predicate(string_length)

# Use in rules
ctx.add_relation("word", str)
ctx.add_facts("word", [("apple",), ("banana",), ("cat",)])
ctx.add_rule("length(w, l) = word(w) and string_length(w, l)")
ctx.run()

# Results
for (prob, (word, length)) in ctx.relation("length"):
  print(f"{word}: {length} letters (prob={prob})")

Output:

apple: 5 letters (prob=1.0)
banana: 6 letters (prob=1.0)
cat: 3 letters (prob=1.0)

Type Annotations

Foreign predicates require proper type hints for Scallop to understand the interface.

Supported Types

Primitive types:

  • int → i32
  • float → f32
  • bool → bool
  • str → String

Scallop types:

  • i8, i16, i32, i64, i128, isize
  • u8, u16, u32, u64, u128, usize
  • f32, f64
  • char, bool, String

Input Arguments

Input argument types define what Scallop passes to your function:

@foreign_predicate
def add(x: int, y: int) -> Facts[float, Tuple[int]]:
  result = x + y
  yield (1.0, (result,))

Output Types

The Facts type annotation specifies:

  1. Tag type (first parameter): Probability type (usually float)
  2. Tuple type (second parameter): Output tuple structure

Single-column output:

def length(s: str) -> Facts[float, Tuple[int]]:
  yield (1.0, (len(s),))

Multi-column output:

def split_name(full: str) -> Facts[float, Tuple[str, str]]:
  parts = full.split(" ")
  if len(parts) == 2:
    yield (1.0, (parts[0], parts[1]))

Empty tuple (boolean predicate):

def is_palindrome(s: str) -> Facts[float, Tuple]:
  if s == s[::-1]:
    yield (1.0, ())  # Empty tuple = just a boolean check

Yielding Facts

Foreign predicates use yield to produce facts lazily.

Single Fact

@foreign_predicate
def square(x: int) -> Facts[float, Tuple[int]]:
  yield (1.0, (x * x,))

Multiple Facts

@foreign_predicate
def divisors(n: int) -> Facts[float, Tuple[int]]:
  for i in range(1, n + 1):
    if n % i == 0:
      yield (1.0, (i,))

# Usage in Scallop:
# divisors(12, x) generates: x ∈ {1, 2, 3, 4, 6, 12}

Probabilistic Facts

@foreign_predicate
def semantic_similar(s1: str, s2: str) -> Facts[float, Tuple]:
  # Use embedding similarity, edit distance, etc.
  similarity = compute_similarity(s1, s2)
  if similarity > 0.5:
    yield (similarity, ())

Conditional Facts

@foreign_predicate
def classify_age(age: int) -> Facts[float, Tuple[str]]:
  if age < 18:
    yield (1.0, ("minor",))
  elif age < 65:
    yield (1.0, ("adult",))
  else:
    yield (1.0, ("senior",))

Complete Example

Here’s a realistic example using foreign predicates for semantic similarity:

from typing import Tuple
import scallopy
from scallopy import foreign_predicate, Facts

# Foreign predicate for semantic equivalence
@foreign_predicate
def string_semantic_eq(s1: str, s2: str) -> Facts[float, Tuple]:
  """Check if two strings are semantically equivalent"""
  equivalents = {
    ("mom", "mother"): 0.99,
    ("mom", "mom"): 1.0,
    ("mother", "mother"): 1.0,
    ("dad", "father"): 0.99,
    ("dad", "dad"): 1.0,
    ("father", "father"): 1.0,
  }

  if (s1, s2) in equivalents:
    yield (equivalents[(s1, s2)], ())

# Create context and register
ctx = scallopy.ScallopContext(provenance="minmaxprob")
ctx.register_foreign_predicate(string_semantic_eq)

# Add kinship data with varied terminology
ctx.add_relation("kinship", (str, str, str))
ctx.add_facts("kinship", [
  (1.0, ("alice", "mom", "bob")),
  (1.0, ("alice", "mother", "casey")),
  (1.0, ("david", "father", "emma")),
])

# Define rules using foreign predicate
ctx.add_rule("""
  parent(person, child) =
    kinship(person, relation, child) and
    string_semantic_eq(relation, "mother")
""")

ctx.add_rule("""
  parent(person, child) =
    kinship(person, relation, child) and
    string_semantic_eq(relation, "father")
""")

ctx.add_rule("""
  sibling(a, b) =
    parent(p, a) and parent(p, b) and a != b
""")

ctx.run()

# Results
print("Parents:")
for (prob, (person, child)) in ctx.relation("parent"):
  print(f"  {person} is parent of {child} (prob={prob})")

print("\nSiblings:")
for (prob, (a, b)) in ctx.relation("sibling"):
  print(f"  {a} and {b} are siblings (prob={prob})")

Output:

Parents:
  alice is parent of bob (prob=0.99)
  alice is parent of casey (prob=1.0)
  david is parent of emma (prob=1.0)

Siblings:
  bob and casey are siblings (prob=0.99)
  casey and bob are siblings (prob=0.99)

Advanced Patterns

Pattern 1: External Data Source

Query a database during reasoning:

import sqlite3

@foreign_predicate
def lookup_price(product: str) -> Facts[float, Tuple[float]]:
  conn = sqlite3.connect("products.db")
  try:
    cursor = conn.execute("SELECT price FROM products WHERE name = ?", (product,))
    row = cursor.fetchone()
    if row:
      yield (1.0, (row[0],))
  finally:
    # Ensure the connection closes even if the generator is abandoned
    conn.close()

Pattern 2: Python Library Integration

Use NumPy for numerical operations:

import numpy as np

@foreign_predicate
def cosine_similarity(vec_id1: int, vec_id2: int) -> Facts[float, Tuple[float]]:
  # `embeddings` is assumed to be a vector lookup table defined elsewhere
  vec1 = embeddings[vec_id1]
  vec2 = embeddings[vec_id2]
  similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
  yield (1.0, (float(similarity),))

Pattern 3: Caching Results

Avoid redundant computation:

from functools import lru_cache

@lru_cache(maxsize=1000)
def _compute_expensive(x: int) -> int:
  # Expensive computation
  return expensive_function(x)

@foreign_predicate
def cached_predicate(x: int) -> Facts[float, Tuple[int]]:
  result = _compute_expensive(x)
  yield (1.0, (result,))

Pattern 4: Error Handling

Handle errors gracefully:

@foreign_predicate
def safe_divide(a: float, b: float) -> Facts[float, Tuple[float]]:
  try:
    result = a / b
    yield (1.0, (result,))
  except ZeroDivisionError:
    # Don't yield anything - fact doesn't exist
    pass

Summary

  • Foreign predicates implement Scallop predicates in Python
  • @foreign_predicate decorator marks functions
  • Type annotations required for inputs and outputs
  • Facts[float, Tuple[...]] return type with generator
  • yield produces facts lazily
  • ctx.register_foreign_predicate() registers with context
  • Use cases: custom logic, external data, Python libraries


Save and Load

Scallop modules integrate seamlessly with PyTorch’s serialization system, allowing you to save trained models and load them later for inference or continued training.

Why Save and Load?

Training is expensive: After training a neurosymbolic model, you want to save the learned neural parameters without retraining.

Deployment: Load trained models in production environments for inference.

Checkpointing: Save intermediate models during long training runs to resume if interrupted.

Model sharing: Distribute trained models to others.


Basic Saving and Loading

Scallop modules are torch.nn.Module subclasses, so they use standard PyTorch serialization.

Saving a Module

Use torch.save() to save the entire module:

import torch
import scallopy

# Create and train your model
model = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

# ... training code ...

# Save the entire module
torch.save(model, "my_model.pt")

Loading a Module

Use torch.load() to load the saved module:

import torch

# Load the entire module
loaded_model = torch.load("my_model.pt")

# Use immediately for inference
input_data = torch.randn(16, 10)
result = loaded_model(digit_a=input_data, digit_b=input_data)

Important: The Scallop program, rules, and mappings are all preserved when saving/loading.


Saving State Dictionaries

For more flexibility, save only the model parameters (state dict) instead of the entire module.

Saving State Dict

# Save only the parameters
torch.save(model.state_dict(), "model_weights.pth")

Loading State Dict

# First, recreate the model architecture
model = scallopy.Module(
  provenance="difftopkproofs",
  k=3,
  program="rel sum_2(a + b) = digit_a(a) and digit_b(b)",
  input_mappings={"digit_a": range(10), "digit_b": range(10)},
  output_mapping=("sum_2", range(19))
)

# Then load the saved parameters
model.load_state_dict(torch.load("model_weights.pth"))

# Set to evaluation mode
model.eval()

Advantage: State dicts are more portable across PyTorch versions and modifications to the module structure.

Requirement: You must recreate the exact same module architecture before loading the state dict.


Complete Example

Here’s a full workflow showing training, saving, and loading:

import torch
import torch.nn as nn
import scallopy

# Define a neural network with Scallop reasoning
class DigitAdder(nn.Module):
  def __init__(self):
    super().__init__()

    # Neural perception layers
    self.encoder = nn.Sequential(
      nn.Linear(784, 128),
      nn.ReLU(),
      nn.Linear(128, 10),
    )

    # Symbolic reasoning layer
    self.scallop_adder = scallopy.Module(
      provenance="difftopkproofs",
      k=3,
      program="rel sum(a + b) = digit_a(a) and digit_b(b)",
      input_mappings={"digit_a": range(10), "digit_b": range(10)},
      output_mapping=("sum", range(19))
    )

  def forward(self, img_a, img_b):
    logits_a = self.encoder(img_a)
    logits_b = self.encoder(img_b)

    probs_a = torch.softmax(logits_a, dim=1)
    probs_b = torch.softmax(logits_b, dim=1)

    sum_probs = self.scallop_adder(digit_a=probs_a, digit_b=probs_b)
    return sum_probs

# Training
model = DigitAdder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
  for img_a, img_b, target_sum in train_loader:
    optimizer.zero_grad()

    sum_probs = model(img_a, img_b)
    loss = loss_fn(sum_probs, target_sum)

    loss.backward()
    optimizer.step()

  # Save checkpoint after each epoch
  torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss.item(),
  }, f"checkpoint_epoch_{epoch}.pth")

# Save final model
torch.save(model, "digit_adder_final.pt")

# Later: Load for inference
loaded_model = torch.load("digit_adder_final.pt")
loaded_model.eval()

with torch.no_grad():
  result = loaded_model(test_img_a, test_img_b)
  predicted_sum = torch.argmax(result, dim=1)

Checkpointing

For long training runs, save checkpoints with full training state:

Saving Checkpoints

# Save everything needed to resume training
checkpoint = {
  'epoch': epoch,
  'model_state_dict': model.state_dict(),
  'optimizer_state_dict': optimizer.state_dict(),
  'loss': loss.item(),
  'train_accuracy': train_acc,
  'val_accuracy': val_acc,
}

torch.save(checkpoint, f"checkpoint_epoch_{epoch}.pth")

Resuming from Checkpoint

# Recreate model and optimizer
model = DigitAdder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Load checkpoint
checkpoint = torch.load("checkpoint_epoch_42.pth")

# Restore state
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1
last_loss = checkpoint['loss']

# Resume training
model.train()
for epoch in range(start_epoch, num_epochs):
  # Continue training...
  pass

GPU/CPU Compatibility

Handle device mismatches when loading models:

Saving on GPU, Loading on CPU

# Model was trained on GPU
# ...

# Save
torch.save(model, "model_gpu.pt")

# Load on CPU
model = torch.load("model_gpu.pt", map_location=torch.device('cpu'))

Saving on CPU, Loading on GPU

# Load and move to GPU
model = torch.load("model_cpu.pt")
model = model.to('cuda')

# Or in one step
model = torch.load("model_cpu.pt", map_location='cuda:0')

Flexible Loading

# Load to current device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load("model.pt", map_location=device)

Best Practices

1. Save State Dicts for Production

# Recommended: Save state dict
torch.save(model.state_dict(), "model_weights.pth")

# Less recommended: Save entire module
torch.save(model, "model_full.pt")

Why? State dicts are more robust to code changes and PyTorch version updates.

2. Include Metadata

torch.save({
  'model_state_dict': model.state_dict(),
  'optimizer_state_dict': optimizer.state_dict(),
  'epoch': epoch,
  'loss': loss.item(),
  'config': {
    'k': 3,
    'provenance': 'difftopkproofs',
    'learning_rate': 1e-3,
  }
}, "checkpoint.pth")

3. Version Your Models

torch.save({
  'version': '1.0.0',
  'model_state_dict': model.state_dict(),
  # ...
}, f"model_v1.0.0_{timestamp}.pth")

4. Validate After Loading

# Load model
model = torch.load("model.pt")

# Sanity check on validation data
model.eval()
with torch.no_grad():
  val_loss = compute_validation_loss(model, val_loader)
  print(f"Validation loss after loading: {val_loss:.4f}")

5. Use Eval Mode for Inference

# Always set to eval mode after loading for inference
model = torch.load("model.pt")
model.eval()  # Disables dropout, batch norm, etc.

with torch.no_grad():  # Disable gradient computation
  predictions = model(input_data)

Troubleshooting

Error: “Can’t find module”

Problem: Loading a saved module but Scallopy is not imported.

Solution: Import scallopy before loading:

import scallopy
import torch

model = torch.load("model.pt")

Error: “State dict keys don’t match”

Problem: Module architecture changed between saving and loading.

Solution: Ensure the module architecture is identical:

# Must recreate exact same architecture
model = scallopy.Module(
  # ... exact same parameters as when saved ...
)
model.load_state_dict(torch.load("weights.pth"))

Model Behavior Differs After Loading

Problem: Model gives different results after loading.

Checklist:

  1. Set model to eval mode: model.eval()
  2. Disable gradients: with torch.no_grad():
  3. Check device (CPU vs GPU)
  4. Verify input preprocessing is identical

Summary

  • Standard PyTorch: Use torch.save() and torch.load()
  • Two approaches: Save entire module or just state dict
  • State dict recommended: More portable and robust
  • Checkpointing: Save epoch, model, optimizer state for resuming
  • Device handling: Use map_location for GPU/CPU compatibility
  • Best practices: Eval mode, validation, versioning


Debugging Proofs

Scallop offers the capability to debug the internal proofs generated by provenances. This is done through special provenances designed specifically for debugging.

Debugging Top-K Proofs

We can use the provenance difftopkproofsdebug. Taking the sum_2 application as an example, we have:

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("digit_a", int)
ctx.add_relation("digit_b", int)
ctx.add_rule("sum_2(a + b) = digit_a(a) and digit_b(b)")

# !!! SPECIAL TREATMENT WHEN INSERTING FACTS !!!
ctx.add_facts("digit_a", [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))])
ctx.add_facts("digit_b", [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))])

ctx.run()
result = ctx.relation("sum_2")

The majority of the code looks identical to the original example, but special treatment is required when adding new facts to the context. Originally, a tag is just torch.tensor(SOME_PROBABILITY). Now, we need to supply an extra positive integer called the Fact ID. Each added fact should take the following form:

ctx.add_facts("SOME_RELATION", [
  ( ( torch.tensor(0.1),       1       ) , (    0,   ) ), # Fact 1
  #   -- PROBABILITY --  -- FACT ID --
  # --------------- TAG --------------     -- TUPLE --
  ( ( torch.tensor(0.9),       2       ) , (    1,   ) ), # Fact 2
  ( ( torch.tensor(0.1),       3       ) , (    0,   ) ), # Fact 3
  # ...
])

Basically, fact IDs are labels that programmers can add to instrument the computation process. IDs should start from 1 and be distinct and contiguous across all added facts; programmers are responsible for managing the IDs so that there are no collisions. For example, if 10 probabilistic facts are added, the fact IDs should range from 1 to 10, inclusive. For non-probabilistic facts, the whole tag can be specified as None.
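If managing fact IDs by hand becomes error-prone, a small helper can take over the bookkeeping and guarantee the 1..N contiguity requirement. The function below is a hypothetical utility, not part of scallopy, and it uses plain floats where real code would use torch.tensor probabilities:

```python
from itertools import count

def tag_with_fact_ids(relations):
  """Assign contiguous fact IDs (1, 2, 3, ...) across all probabilistic
  facts of several relations. `relations` maps a relation name to a list
  of (probability, tuple) pairs; a None probability marks a
  non-probabilistic fact, whose whole tag then becomes None."""
  next_id = count(start=1)
  tagged = {}
  for rel, facts in relations.items():
    tagged[rel] = [
      (None, tup) if prob is None else ((prob, next(next_id)), tup)
      for prob, tup in facts
    ]
  return tagged

facts = tag_with_fact_ids({
  "digit_a": [(0.1, (1,)), (0.9, (2,))],
  "digit_b": [(0.9, (1,)), (0.1, (2,))],
})
# digit_a gets fact IDs 1 and 2; digit_b continues with 3 and 4
print(facts["digit_b"])  # [((0.9, 3), (1,)), ((0.1, 4), (2,))]
```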

Debug in Forward Mode

When used in forward mode, the module should be set up just like before, with the only change being the provenance configuration:

sum_2 = scallopy.Module(
  provenance="difftopkproofsdebug",
  k=3,
  program="""
    type digit_a(a: i32), digit_b(b: i32)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  output_relation="sum_2",
  output_mapping=[2, 3, 4])

Let’s assume that only the digits 1 and 2 are added for each input in the sum_2 task. That is, digit A can be 1 or 2, and digit B can be 1 or 2 as well. Looking at the following code, the fact IDs are

  • Digit A is 1: Fact ID 1
  • Digit A is 2: Fact ID 2
  • Digit B is 1: Fact ID 3
  • Digit B is 2: Fact ID 4

Notice that the fact IDs all start from 1 and are contiguous (i.e., no gap in the used fact IDs). Also notice that, when in forward mode, the inputs need to be batched. In this example, let’s only focus on one single data-point, say Datapoint 1.

digit_a = [
  [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))], # Datapoint 1
  # ...
]
digit_b = [
  [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))], # Datapoint 1
  # ...
]

After preparing the input facts, we can invoke the debug module as follows:

(result_tensor, proofs) = sum_2(digit_a=digit_a, digit_b=digit_b)

Here, the output will be a tuple (result_tensor, proofs), unlike the non-debug version, which returns a single PyTorch tensor. Specifically, the two components will be

result_tensor = torch.tensor([
  [0.09, 0.8119, 0.09], # Datapoint 1
  # ...
])

proofs = [
  [ # Datapoint 1
    [ # Proofs of (2,)
      [ (True, 1), (True, 3) ], # The first proof is 1 + 1 (using positive fact 1 and 3)
    ],
    [ # Proofs of (3,)
      [ (True, 1), (True, 4) ], # The first proof is 1 + 2 (using positive fact 1 and 4)
      [ (True, 2), (True, 3) ], # The second proof is 2 + 1 (using positive fact 2 and 3)
    ],
    [ # Proofs of (4,)
      [ (True, 2), (True, 4) ], # The first proof is 2 + 2 (using positive fact 2 and 4)
    ]
  ],
  # ...
]

Notice that result_tensor is just the original output probability tensor. The proofs have type List[List[List[List[Tuple[bool, int]]]]], nested as Batch -> Datapoint Results -> Proofs -> Proof -> Literal. Each Literal is a Tuple[bool, int], where the boolean indicates the positivity of the literal and the integer is the fact ID it uses.
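As a sanity check, the proof probabilities can be recombined by hand: a proof's probability is the product of its literals' fact probabilities, and proofs over disjoint, independent facts combine by noisy-or (1 minus the product of the complements), which here matches weighted model counting. The following plain-Python sketch, with fact probabilities copied from the example inputs, reproduces the 0.8119 entry for output (3,):

```python
from math import prod

# probability of each fact ID, copied from the example inputs
fact_prob = {1: 0.1, 2: 0.9, 3: 0.9, 4: 0.1}

# the two proofs of (3,) from the structure above
proofs_of_3 = [
  [(True, 1), (True, 4)],  # 1 + 2
  [(True, 2), (True, 3)],  # 2 + 1
]

def proof_prob(proof):
  # product over literals; a negative literal would contribute 1 - p
  return prod(fact_prob[fid] if pos else 1 - fact_prob[fid]
              for (pos, fid) in proof)

# noisy-or over the proofs: 1 - prod(1 - p_i)
probs = [proof_prob(p) for p in proofs_of_3]
output_prob = 1 - prod(1 - p for p in probs)
print(round(output_prob, 4))  # 0.8119
```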

Getting Started with Scallop Rust API

This guide introduces the scallop-core Rust library for embedding Scallop programs in Rust applications.

Overview

The scallop-core crate provides a complete Rust API for:

  • Compiling and executing Scallop programs
  • Registering foreign functions and predicates in pure Rust
  • Provenance tracking with various semiring types
  • Incremental evaluation for efficient updates
  • Runtime configuration and debugging

Unlike the Python bindings (scallopy), the Rust API gives direct access to Scallop’s core runtime without serialization overhead.

Installation

Add scallop-core to your Cargo.toml:

[dependencies]
scallop-core = { path = "../path/to/scallop/core" }
# Or from crates.io when published:
# scallop-core = "0.2.5"

Note: Requires nightly Rust due to unstable features:

  • min_specialization
  • extract_if
  • hash_extract_if
  • proc_macro_span

Set your toolchain:

rustup default nightly

Quick Start

Example 1: Basic Program

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

fn main() {
    // Create context with unit provenance (standard DataLog)
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    // Add Scallop program
    ctx.add_program(r#"
        rel edge = {(0, 1), (1, 2), (2, 3)}
        rel path(a, b) = edge(a, b)
        rel path(a, c) = path(a, b), edge(b, c)
        query path
    "#).unwrap();

    // Execute
    ctx.run().unwrap();

    // Get results
    let path_relation = ctx.computed_relation_ref("path").unwrap();
    for tuple in path_relation.iter() {
        println!("{:?}", tuple);
    }
}

Output:

(0, 1)
(1, 2)
(2, 3)
(0, 2)
(1, 3)
(0, 3)

Example 2: Adding Facts Programmatically

use scallop_core::common::tuple::Tuple;

fn main() {
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    // Declare relation type
    ctx.add_relation("edge(i32, i32)").unwrap();

    // Add facts programmatically
    ctx.add_facts(
        "edge",
        vec![
            (None, Tuple::from((0i32, 1i32))),
            (None, Tuple::from((1i32, 2i32))),
            (None, Tuple::from((2i32, 3i32))),
        ],
        false, // type_check
    ).unwrap();

    // Add rules
    ctx.add_rule("path(a, b) = edge(a, b)").unwrap();
    ctx.add_rule("path(a, c) = path(a, b), edge(b, c)").unwrap();

    // Execute
    ctx.run().unwrap();

    // Query
    let path = ctx.computed_relation_ref("path").unwrap();
    println!("Path tuples: {}", path.len());
}

Example 3: Probabilistic Reasoning

use scallop_core::runtime::provenance::min_max_prob::MinMaxProbProvenance;

fn main() {
    // Use min-max probability provenance
    let prov_ctx = MinMaxProbProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    ctx.add_relation("edge(i32, i32)").unwrap();

    // Add facts with probabilities
    ctx.add_facts(
        "edge",
        vec![
            (Some(0.8.into()), Tuple::from((0i32, 1i32))),
            (Some(0.9.into()), Tuple::from((1i32, 2i32))),
            (Some(0.7.into()), Tuple::from((2i32, 3i32))),
        ],
        false,
    ).unwrap();

    ctx.add_rule("path(a, b) = edge(a, b)").unwrap();
    ctx.add_rule("path(a, c) = path(a, b), edge(b, c)").unwrap();

    ctx.run().unwrap();

    // Results include probabilities
    let path = ctx.computed_relation_ref("path").unwrap();
    for elem in path.iter() {
        println!("Probability: {}, Tuple: {:?}", elem.tag, elem.tuple);
    }
}

Output:

Probability: 0.8, Tuple: (0, 1)
Probability: 0.9, Tuple: (1, 2)
Probability: 0.7, Tuple: (2, 3)
Probability: 0.8, Tuple: (0, 2)     // min(0.8, 0.9)
Probability: 0.7, Tuple: (1, 3)     // min(0.9, 0.7)
Probability: 0.7, Tuple: (0, 3)     // min(0.8, 0.7)
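As a quick sanity check on these semantics, the same reachability computation can be run to fixpoint in plain Python, taking min along a path (conjunction) and max across alternative paths (disjunction). This is an illustrative sketch, not part of scallop-core:

```python
# edge probabilities from the example
edges = {(0, 1): 0.8, (1, 2): 0.9, (2, 3): 0.7}

# run path(a, b) = edge(a, b); path(a, c) = path(a, b), edge(b, c) to fixpoint
path = dict(edges)
changed = True
while changed:
  changed = False
  for (a, b), p_ab in list(path.items()):
    for (b2, c), p_bc in edges.items():
      if b == b2:
        p = min(p_ab, p_bc)            # conjunction: min
        if p > path.get((a, c), 0.0):  # disjunction: keep the max
          path[(a, c)] = p
          changed = True

for t in sorted(path):
  print(t, path[t])
```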

Core Concepts

IntegrateContext

The main entry point for Scallop programs. Generic over:

  • Provenance type (Prov) - defines reasoning semantics
  • Pointer family (P) - typically RcFamily for reference counting
#![allow(unused)]
fn main() {
pub struct IntegrateContext<Prov: Provenance, P: PointerFamily> {
    // Internal compiler and runtime state
}
}

Key methods:

  • add_program(&mut self, program: &str) - Add complete Scallop program
  • add_relation(&mut self, decl: &str) - Declare relation type
  • add_rule(&mut self, rule: &str) - Add single rule
  • add_facts(&mut self, rel: &str, facts: Vec<_>) - Add fact tuples
  • run(&mut self) - Execute program
  • computed_relation_ref(&mut self, name: &str) - Get query results

Provenance Types

Provenance determines how facts are tagged and how tags combine:

  • UnitProvenance: standard DataLog; tag type Unit; no tracking
  • BooleanProvenance: negation-as-failure; tag type bool; boolean algebra
  • NaturalProvenance: counting; tag type usize; cardinality
  • MinMaxProbProvenance: probabilistic; tag type f64; min/max on paths
  • AddMultProbProvenance: probabilistic; tag type f64; add/mult on paths
  • TopKProofsProvenance: proof tracking; tag type DNFFormula; top-K most probable proofs
  • ProbProofsProvenance: complete proofs; tag type Proofs; all proofs + probabilities

Tuples and Values

Facts are represented as Tuple containing Value elements:

#![allow(unused)]
fn main() {
use scallop_core::common::value::Value;
use scallop_core::common::tuple::Tuple;

// Create a tuple (0, "hello", 3.14)
let tuple = Tuple::from((
    Value::I32(0),
    Value::String("hello".to_string()),
    Value::F64(3.14),
));

// Or use From trait
let tuple: Tuple = (0i32, "hello", 3.14).into();
}

Value types:

  • I8, I16, I32, I64, I128, ISize - Signed integers
  • U8, U16, U32, U64, U128, USize - Unsigned integers
  • F32, F64 - Floating point
  • Bool - Boolean
  • Char - Character
  • String - UTF-8 string
  • Symbol - Interned symbol
  • Entity - Algebraic data type value

Common Patterns

Pattern 1: Incremental Evaluation

fn main() {
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new_incremental(prov_ctx);

    // Initial program
    ctx.add_relation("edge(i32, i32)").unwrap();
    ctx.add_rule("path(a, b) = edge(a, b)").unwrap();
    ctx.add_rule("path(a, c) = path(a, b), edge(b, c)").unwrap();

    // Initial facts
    ctx.add_facts("edge", vec![
        (None, (0i32, 1i32).into()),
        (None, (1i32, 2i32).into()),
    ], false).unwrap();

    ctx.run().unwrap();
    println!("Initial path count: {}", ctx.computed_relation_ref("path").unwrap().len());

    // Add more facts incrementally
    ctx.add_facts("edge", vec![
        (None, (2i32, 3i32).into()),
    ], false).unwrap();

    ctx.run().unwrap();
    println!("Updated path count: {}", ctx.computed_relation_ref("path").unwrap().len());
}

Pattern 2: Query-Driven Execution

fn main() {
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    ctx.add_program(r#"
        rel edge = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
        rel path(a, b) = edge(a, b)
        rel path(a, c) = path(a, b), edge(b, c)

        // Only query paths from node 0
        query path(0, x)
    "#).unwrap();

    ctx.run().unwrap();

    // Only paths starting from 0 are computed
    let result = ctx.computed_relation_ref("path").unwrap();
    for elem in result.iter() {
        println!("{:?}", elem.tuple);
    }
}

Pattern 3: Error Handling

use scallop_core::integrate::IntegrateError;

fn run_scallop_program(program: &str) -> Result<(), IntegrateError> {
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    // All operations return Result
    ctx.add_program(program)?;
    ctx.run()?;

    let result = ctx.computed_relation_ref("result")
        .ok_or_else(|| IntegrateError::RelationNotFound("result".to_string()))?;

    println!("Result: {} tuples", result.len());
    Ok(())
}

fn main() {
    match run_scallop_program("rel result = {1, 2, 3}") {
        Ok(_) => println!("Success"),
        Err(e) => eprintln!("Error: {:?}", e),
    }
}

Configuration Options

Debugging

Enable debug output for different compilation stages:

#![allow(unused)]
fn main() {
ctx.set_debug_front(true);   // Front-end (parsing, type checking)
ctx.set_debug_back(true);    // Back-end (RAM generation)
ctx.set_debug_ram(true);     // RAM execution trace
}

Iteration Limits

Control recursion depth:

#![allow(unused)]
fn main() {
ctx.set_iter_limit(100);     // Maximum 100 iterations
ctx.remove_iter_limit();     // Unlimited (default)
}

Early Discard

Optimize by discarding zero-tagged facts:

#![allow(unused)]
fn main() {
ctx.set_early_discard(true);  // Discard facts with tag=0 early
}

Building and Running

As a Library Dependency

[dependencies]
scallop-core = { path = "../scallop/core" }

// src/main.rs
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

fn main() {
    let prov_ctx = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);

    ctx.add_program(r#"
        rel answer = {42}
        query answer
    "#).unwrap();

    ctx.run().unwrap();

    let result = ctx.computed_relation_ref("answer").unwrap();
    for elem in result.iter() {
        println!("Answer: {:?}", elem.tuple);
    }
}

Build and run:

cargo build
cargo run

Development Setup

# Clone Scallop repository
git clone https://github.com/scallop-lang/scallop.git
cd scallop

# Create example project
cargo new --bin my-scallop-app
cd my-scallop-app

# Add dependency (edit Cargo.toml)
[dependencies]
scallop-core = { path = "../core" }

# Build with nightly
rustup default nightly
cargo build

Next Steps

Common Issues

Nightly Rust Required

Error:

error[E0658]: use of unstable library feature 'min_specialization'

Solution:

rustup default nightly

Missing Relation

Error:

IntegrateError::RelationNotFound("path")

Solution: Ensure relation is declared or computed:

#![allow(unused)]
fn main() {
ctx.add_relation("path(i32, i32)").unwrap();  // Declare first
// Or add query to compute it
ctx.add_rule("query path").unwrap();
}

Type Mismatch

Error:

TypeError: Expected i32, got String

Solution: Match Value types to relation declarations:

#![allow(unused)]
fn main() {
ctx.add_relation("edge(i32, i32)").unwrap();  // Declare types
ctx.add_facts("edge", vec![
    (None, Tuple::from((0i32, 1i32))),  // Use i32, not usize
], false).unwrap();
}

Resources

IntegrateContext API

Overview

IntegrateContext is the main entry point for embedding Scallop programs in Rust applications. It provides a complete API for compiling Scallop code, adding facts programmatically, executing queries, and retrieving results—all from pure Rust.

The context is generic over two type parameters:

  • Prov: Provenance - Determines the reasoning semantics (standard DataLog, probabilistic, differentiable, etc.)
  • P: PointerFamily - Controls pointer representation (typically RcFamily for reference counting)
#![allow(unused)]
fn main() {
pub struct IntegrateContext<Prov: Provenance, P: PointerFamily = RcFamily> {
    // Internal state
}
}

Comparison to Python API

In Python (scallopy) vs. Rust (scallop-core):

  • ScallopContext() corresponds to IntegrateContext::new(prov)
  • ctx.add_program(...) corresponds to ctx.add_program(...)?
  • ctx.run() corresponds to ctx.run()?
  • Exception handling corresponds to Result types with the ? operator

Creating a Context

Basic Creation

The most common way to create an IntegrateContext is with the new() method:

#![allow(unused)]
fn main() {
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

let prov_ctx = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov_ctx);
}

The type parameters can usually be inferred:

#![allow(unused)]
fn main() {
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::new(prov);  // P defaults to RcFamily
}

Incremental Execution

For incremental evaluation where facts are added dynamically over time:

#![allow(unused)]
fn main() {
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::new_incremental(prov);
}

Incremental mode:

  • Maintains internal state between run() calls
  • Only recomputes affected parts when facts are added
  • More efficient for dynamic updates

Choosing Provenance Type

The provenance type determines how facts are tagged and combined:

#![allow(unused)]
fn main() {
// Standard DataLog (no provenance tracking)
use scallop_core::runtime::provenance::unit::UnitProvenance;
let prov = UnitProvenance::default();

// Probabilistic reasoning with min-max semiring
use scallop_core::runtime::provenance::min_max_prob::MinMaxProbProvenance;
let prov = MinMaxProbProvenance::default();

// Top-K proofs tracking
use scallop_core::runtime::provenance::top_k_proofs::TopKProofsProvenance;
let prov = TopKProofsProvenance::<RcFamily>::new(3); // Track top 3 proofs
}

See Provenance Types for complete details on all available provenance types.

Choosing Pointer Family

The pointer family controls how internal data structures are reference-counted:

#![allow(unused)]
fn main() {
use scallop_core::runtime::env::{RcFamily, ArcFamily};

// Single-threaded (default, faster)
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

// Thread-safe (for concurrent access)
let mut ctx = IntegrateContext::<_, ArcFamily>::new(prov);
}

  • RcFamily (default): uses std::rc::Rc; faster but not thread-safe
  • ArcFamily: uses std::sync::Arc; thread-safe but slightly slower

Adding Programs and Rules

Adding Complete Programs

The add_program() method compiles a complete Scallop program from a string:

#![allow(unused)]
fn main() {
ctx.add_program(r#"
    rel edge = {(0, 1), (1, 2), (2, 3)}

    rel path(a, b) = edge(a, b)
    rel path(a, c) = path(a, b), edge(b, c)

    query path
"#)?;
}

Usage notes:

  • Use raw string literals (r#"..."#) to avoid escaping quotes
  • Can include relation declarations, rules, queries, type definitions
  • Returns Result<(), IntegrateError> - use ? to propagate errors
  • Multiple calls append to existing program

Adding Relation Declarations

Declare relation types explicitly:

#![allow(unused)]
fn main() {
ctx.add_relation("edge(i32, i32)")?;
ctx.add_relation("node(i32, String)")?;
ctx.add_relation("weighted_edge(i32, i32, f64)")?;
}

When to use:

  • When adding facts programmatically (before add_facts())
  • To enforce type constraints
  • For better error messages

Adding Individual Rules

Add single rules incrementally:

#![allow(unused)]
fn main() {
ctx.add_rule("path(a, b) = edge(a, b)")?;
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")?;
}

Equivalent to:

#![allow(unused)]
fn main() {
ctx.add_program(r#"
    rel path(a, b) = edge(a, b)
    rel path(a, c) = path(a, b), edge(b, c)
"#)?;
}

Error Handling

All compilation methods return Result<_, IntegrateError>:

#![allow(unused)]
fn main() {
use scallop_core::integrate::IntegrateError;

match ctx.add_program("rel invalid syntax!") {
    Ok(_) => println!("Success"),
    Err(IntegrateError::Compile(errors)) => {
        eprintln!("Compilation failed:");
        for err in errors {
            eprintln!("  {}", err);
        }
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
}

Common error types:

  • IntegrateError::Compile - Syntax or type errors
  • IntegrateError::Runtime - Execution errors
  • IntegrateError::Front - Front-end compilation errors

Adding Facts Programmatically

Basic Fact Insertion

Add facts to existing relations using add_facts():

#![allow(unused)]
fn main() {
use scallop_core::common::tuple::Tuple;

// Declare the relation first
ctx.add_relation("edge(i32, i32)")?;

// Add facts without tags (standard DataLog)
ctx.add_facts("edge", vec![
    (None, Tuple::from((0i32, 1i32))),
    (None, Tuple::from((1i32, 2i32))),
    (None, Tuple::from((2i32, 3i32))),
], false)?;
}

Parameters:

  • predicate: &str - Relation name
  • facts: Vec<(Option<Tag>, Tuple)> - List of (tag, tuple) pairs
  • type_check: bool - Whether to validate types (false for performance)

Creating Tuples

Tuples can be created from Rust values using the From trait:

#![allow(unused)]
fn main() {
// From tuple
let t1: Tuple = (0i32, 1i32).into();

// From explicit values
let t2 = Tuple::from((0i32, "hello", 3.14));

// Manual construction
use scallop_core::common::value::Value;
let t3 = Tuple::from(vec![
    Value::I32(0),
    Value::String("world".to_string()),
    Value::F64(2.71),
]);
}

Type Checking

Enable type checking to validate tuples against relation schemas:

#![allow(unused)]
fn main() {
ctx.add_relation("edge(i32, i32)")?;

// This will fail with type_check = true
ctx.add_facts("edge", vec![
    (None, Tuple::from(("string", 42))),  // Wrong type!
], true)?;  // Errors with TypeError
}

Recommendations:

  • Use type_check = false for performance in tight loops
  • Use type_check = true during development for safety
  • Always ensure types match the declared relation schema

Adding Facts with Probabilities

For probabilistic provenances, tag facts with probabilities:

#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::min_max_prob::MinMaxProbProvenance;

let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::new(prov);

ctx.add_relation("edge(i32, i32)")?;
ctx.add_facts("edge", vec![
    (Some(0.8.into()), Tuple::from((0i32, 1i32))),
    (Some(0.9.into()), Tuple::from((1i32, 2i32))),
    (Some(0.7.into()), Tuple::from((2i32, 3i32))),
], false)?;
}

The tag type (Some(0.8.into())) automatically converts to the provenance’s InputTag type.

Complete Example

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::tuple::Tuple;

fn main() -> Result<(), IntegrateError> {
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Declare relations
    ctx.add_relation("node(i32, String)")?;
    ctx.add_relation("edge(i32, i32)")?;

    // Add node facts
    ctx.add_facts("node", vec![
        (None, Tuple::from((0i32, "Alice"))),
        (None, Tuple::from((1i32, "Bob"))),
        (None, Tuple::from((2i32, "Charlie"))),
    ], false)?;

    // Add edge facts
    ctx.add_facts("edge", vec![
        (None, Tuple::from((0i32, 1i32))),
        (None, Tuple::from((1i32, 2i32))),
    ], false)?;

    // Add queries
    ctx.add_rule("query node")?;
    ctx.add_rule("query edge")?;

    Ok(())
}

Executing Programs

Basic Execution

Execute the program to fixpoint with run():

#![allow(unused)]
fn main() {
ctx.run()?;
}

What happens:

  1. Compiles any new rules/facts since last run
  2. Executes the Scallop program to fixpoint
  3. Stores results internally for querying

Returns:

  • Ok(()) if execution succeeded
  • Err(IntegrateError::Runtime(_)) if execution failed

Incremental Execution

For incremental contexts, multiple run() calls only recompute affected parts:

#![allow(unused)]
fn main() {
let mut ctx = IntegrateContext::new_incremental(prov);

ctx.add_relation("edge(i32, i32)")?;
ctx.add_rule("path(a, b) = edge(a, b)")?;
ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")?;

// Initial facts and run
ctx.add_facts("edge", vec![
    (None, (0i32, 1i32).into()),
    (None, (1i32, 2i32).into()),
], false)?;
ctx.run()?;

println!("Initial results: {}",
    ctx.computed_relation_ref("path").unwrap().len());

// Add more facts and re-run (incremental update)
ctx.add_facts("edge", vec![
    (None, (2i32, 3i32).into()),
], false)?;
ctx.run()?;

println!("Updated results: {}",
    ctx.computed_relation_ref("path").unwrap().len());
}

Iteration Limits

Control recursion depth with iteration limits:

#![allow(unused)]
fn main() {
// Set maximum iterations
ctx.set_iter_limit(100);
ctx.run()?;

// Remove limit (default: unlimited)
ctx.remove_iter_limit();
ctx.run()?;
}

Use cases:

  • Preventing infinite loops in recursive rules
  • Testing convergence behavior
  • Performance benchmarking
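To illustrate why a cap matters, here is a hand-rolled transitive-closure loop in plain Rust with an iteration limit; `closure_size` is an illustrative helper, not part of the scallop_core API, but it mirrors what set_iter_limit bounds inside the runtime:

```rust
use std::collections::BTreeSet;

/// Join `path` with `edges` until a fixpoint is reached or `iter_limit`
/// iterations have run, whichever comes first (illustrative only).
fn closure_size(edges: &[(i32, i32)], iter_limit: usize) -> usize {
    let mut path: BTreeSet<(i32, i32)> = edges.iter().copied().collect();
    for _ in 0..iter_limit {
        // Derive new pairs: (a, d) whenever path(a, b) and edge(b, d)
        let derived: Vec<(i32, i32)> = path
            .iter()
            .flat_map(|&(a, b)| {
                edges
                    .iter()
                    .filter(move |&&(c, _)| c == b)
                    .map(move |&(_, d)| (a, d))
            })
            .collect();
        let before = path.len();
        path.extend(derived);
        if path.len() == before {
            break; // fixpoint reached before hitting the limit
        }
    }
    path.len()
}

fn main() {
    // Chain 0 -> 1 -> 2 -> 3 has 6 reachable ordered pairs
    println!("{}", closure_size(&[(0, 1), (1, 2), (2, 3)], 100));
}
```

With a limit of 1, the same input stops after deriving only the length-2 paths, which is exactly the kind of convergence experiment the limit enables.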

Querying Results

Getting Result Collections

Retrieve computed relations using computed_relation_ref():

#![allow(unused)]
fn main() {
let path_relation = ctx.computed_relation_ref("path")
    .ok_or("Relation 'path' not found")?;

println!("Found {} path tuples", path_relation.len());
}

Returns:

  • Some(&DynamicOutputCollection<Prov>) if relation exists and is computed
  • None if relation doesn’t exist or hasn’t been queried

Iterating Over Results

Each collection provides an iterator over elements:

#![allow(unused)]
fn main() {
let path = ctx.computed_relation_ref("path").unwrap();

for elem in path.iter() {
    println!("Tag: {}, Tuple: {:?}", elem.tag, elem.tuple);
}
}

Element structure:

#![allow(unused)]
fn main() {
pub struct DynamicElement<Prov: Provenance> {
    pub tag: Prov::OutputTag,  // Probability, proof, etc.
    pub tuple: Tuple,           // The fact tuple
}
}

Extracting Values from Tuples

Access tuple elements and convert to Rust types:

#![allow(unused)]
fn main() {
let path = ctx.computed_relation_ref("path").unwrap();

for elem in path.iter() {
    // Pattern match on tuple elements
    if let (Some(Value::I32(from)), Some(Value::I32(to))) =
        (elem.tuple.get(0), elem.tuple.get(1)) {
        println!("Path from {} to {}", from, to);
    }
}
}

Available Value variants:

#![allow(unused)]
fn main() {
use scallop_core::common::value::Value;

match value {
    Value::I32(n) => println!("Integer: {}", n),
    Value::String(s) => println!("String: {}", s),
    Value::F64(f) => println!("Float: {}", f),
    Value::Bool(b) => println!("Boolean: {}", b),
    Value::Char(c) => println!("Char: {}", c),
    // ... and many more
    _ => println!("Other type"),
}
}

Complete Querying Example

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::min_max_prob::MinMaxProbProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::value::Value;

fn main() -> Result<(), IntegrateError> {
    let prov = MinMaxProbProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    ctx.add_program(r#"
        rel edge = {
            0.8::(0, 1),
            0.9::(1, 2),
            0.7::(2, 3)
        }
        rel path(a, b) = edge(a, b)
        rel path(a, c) = path(a, b), edge(b, c)
        query path
    "#)?;

    ctx.run()?;

    // Get results
    let path = ctx.computed_relation_ref("path")
        .ok_or("Path relation not found")?;

    println!("Probabilistic paths:");
    for elem in path.iter() {
        if let (Some(Value::I32(from)), Some(Value::I32(to))) =
            (elem.tuple.get(0), elem.tuple.get(1)) {
            println!("  {} -> {}: probability = {}", from, to, elem.tag);
        }
    }

    Ok(())
}

Output:

Probabilistic paths:
  0 -> 1: probability = 0.8
  1 -> 2: probability = 0.9
  2 -> 3: probability = 0.7
  0 -> 2: probability = 0.8
  1 -> 3: probability = 0.7
  0 -> 3: probability = 0.7
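These tags follow min-max semantics: joining facts (conjunction) takes the minimum of the input tags, while alternative derivations (disjunction) take the maximum. The derived path tags above can be reproduced in plain Rust; `conjoin` is an illustrative helper, not part of the crate:

```rust
/// Conjunction under min-max provenance: the tag of a joined fact is the
/// minimum of its premises' tags (disjunction would take the maximum).
fn conjoin(tags: &[f64]) -> f64 {
    tags.iter().cloned().fold(f64::INFINITY, f64::min)
}

fn main() {
    // Tags of edges 0 -> 1, 1 -> 2, 2 -> 3 from the program above;
    // the path 0 -> 3 therefore gets min(0.8, 0.9, 0.7) = 0.7
    println!("{}", conjoin(&[0.8, 0.9, 0.7]));
}
```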

Configuration Options

Debug Modes

Enable detailed output for different compilation stages:

#![allow(unused)]
fn main() {
// Front-end debugging (parsing, type checking)
ctx.set_debug_front(true);

// Back-end debugging (RAM generation)
ctx.set_debug_back(true);

// RAM execution trace
ctx.set_debug_ram(true);
}

Output goes to stdout:

  • Front debug: AST, type information, relation schemas
  • Back debug: Back-IR, RAM program
  • RAM debug: Execution trace, iteration counts

Iteration Control

Configure recursion limits:

#![allow(unused)]
fn main() {
// Set maximum iterations (prevents infinite loops)
ctx.set_iter_limit(100);

// Remove iteration limit (default: unlimited)
ctx.remove_iter_limit();
}

Early Discard

Optimize by discarding facts with zero tags early:

#![allow(unused)]
fn main() {
ctx.set_early_discard(true);
}

When to use:

  • Provenance types where tag = 0 means “impossible” (probabilities, proofs)
  • Large programs with many zero-probability derivations
  • Memory-constrained environments

When NOT to use:

  • Standard DataLog (UnitProvenance)
  • Provenances where 0 is meaningful
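Conceptually, the optimization is a filter applied as tuples are produced, dropping those whose tag is zero before they feed later joins. The sketch below captures the idea; `early_discard` is an illustrative name, not the runtime’s actual API:

```rust
/// Drop tuples whose probability tag is exactly zero so they never
/// participate in later joins (illustrative only, not the real API).
fn early_discard(facts: Vec<(f64, (i32, i32))>) -> Vec<(f64, (i32, i32))> {
    facts.into_iter().filter(|&(tag, _)| tag > 0.0).collect()
}

fn main() {
    let facts = vec![(0.8, (0, 1)), (0.0, (1, 2)), (0.7, (2, 3))];
    println!("{}", early_discard(facts).len());
}
```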

Configuration Example

#![allow(unused)]
fn main() {
fn create_optimized_context() -> IntegrateContext<MinMaxProbProvenance> {
    let prov = MinMaxProbProvenance::default();
    let mut ctx = IntegrateContext::new(prov);

    // Enable optimizations
    ctx.set_early_discard(true);
    ctx.set_iter_limit(1000);

    // Enable debugging for development
    #[cfg(debug_assertions)]
    {
        ctx.set_debug_front(true);
        ctx.set_debug_ram(true);
    }

    ctx
}
}

Error Handling

IntegrateError Enum

All IntegrateContext methods that can fail return Result<T, IntegrateError>:

#![allow(unused)]
fn main() {
pub enum IntegrateError {
    Compile(Vec<CompileError>),
    Front(FrontCompileError),
    Runtime(RuntimeError),
}
}

Common Error Patterns

Compilation errors:

#![allow(unused)]
fn main() {
match ctx.add_program("rel invalid!") {
    Ok(_) => {},
    Err(IntegrateError::Compile(errors)) => {
        for err in errors {
            eprintln!("Compile error: {}", err);
        }
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
}

Runtime errors:

#![allow(unused)]
fn main() {
match ctx.run() {
    Ok(_) => {},
    Err(IntegrateError::Runtime(err)) => {
        eprintln!("Runtime error: {:?}", err);
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
}

Using the ? Operator

Most code can simply use ? to propagate errors:

fn setup_context() -> Result<IntegrateContext<UnitProvenance>, IntegrateError> {
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::new(prov);

    ctx.add_relation("edge(i32, i32)")?;
    ctx.add_rule("path(a, b) = edge(a, b)")?;

    Ok(ctx)
}

fn main() {
    match setup_context() {
        Ok(ctx) => println!("Context created successfully"),
        Err(e) => eprintln!("Failed to create context: {:?}", e),
    }
}

Type Errors

Type mismatches when adding facts:

#![allow(unused)]
fn main() {
ctx.add_relation("edge(i32, i32)")?;

// This will error if type_check = true
match ctx.add_facts("edge", vec![
    (None, Tuple::from(("not", "integers"))),
], true) {
    Ok(_) => {},
    Err(IntegrateError::Runtime(RuntimeError::Database(
        DatabaseError::TypeError { relation, relation_type, tuple }
    ))) => {
        eprintln!("Type error in relation '{}':", relation);
        eprintln!("  Expected: {}", relation_type);
        eprintln!("  Got: {:?}", tuple);
    }
    Err(e) => eprintln!("Other error: {:?}", e),
}
}

Complete Example

Putting it all together:

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::tuple::Tuple;
use scallop_core::common::value::Value;

fn main() -> Result<(), IntegrateError> {
    // Create context
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Configure
    ctx.set_iter_limit(100);

    // Add program
    ctx.add_program(r#"
        rel edge = {(0, 1), (1, 2), (2, 3)}
        rel path(a, b) = edge(a, b)
        rel path(a, c) = path(a, b), edge(b, c)
        query path
    "#)?;

    // Execute
    ctx.run()?;

    // Query results
    let path = ctx.computed_relation_ref("path")
        .ok_or("Path relation not found")?;

    println!("Paths found: {}", path.len());
    for elem in path.iter() {
        if let (Some(Value::I32(from)), Some(Value::I32(to))) =
            (elem.tuple.get(0), elem.tuple.get(1)) {
            println!("  {} -> {}", from, to);
        }
    }

    Ok(())
}

Next Steps

Foreign Functions

Overview

Foreign functions extend Scallop with pure, deterministic computations implemented in Rust. They allow you to call Rust code from within Scallop programs, enabling operations that Scallop cannot express natively—such as string manipulation, mathematical operations, or domain-specific calculations.

Key characteristics:

  • Pure functions - No side effects; same input always produces same output
  • Deterministic - Single output value for any given input
  • Type-safe - Static type checking enforced at compile time
  • Partial functions - Can return None to indicate failure

Use Cases

  • String operations (length, concatenation, formatting)
  • Mathematical functions (abs, max, min, trigonometry)
  • Type conversions (string to int, etc.)
  • Domain-specific operations (hashing, encoding, etc.)

Comparison to Python

Python @foreign_function vs. Rust ForeignFunction trait:

  • Simple decorator vs. trait implementation
  • Optional type annotations vs. required type annotations
  • Runtime type checking vs. compile-time type checking
  • Return value or None vs. Option<Value> return type

Example usage in Scallop:

// After registering a foreign function
rel lengths(s, len) = strings(s), len = $string_length(s)
rel max_val(a, b, m) = numbers(a, b), m = $max(a, b)

The ForeignFunction Trait

The ForeignFunction trait defines the interface for all foreign functions:

#![allow(unused)]
fn main() {
use scallop_core::common::foreign_function::*;
use scallop_core::common::value::Value;

pub trait ForeignFunction: DynClone {
    // Required methods
    fn name(&self) -> String;
    fn return_type(&self) -> ForeignFunctionParameterType;
    fn execute(&self, args: Vec<Value>) -> Option<Value>;

    // Optional methods (with defaults)
    fn num_generic_types(&self) -> usize { 0 }
    fn generic_type_family(&self, i: usize) -> TypeFamily { ... }
    fn num_static_arguments(&self) -> usize { 0 }
    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType { ... }
    fn num_optional_arguments(&self) -> usize { 0 }
    fn optional_argument_type(&self, i: usize) -> ForeignFunctionParameterType { ... }
    fn has_variable_arguments(&self) -> bool { false }
    fn variable_argument_type(&self) -> ForeignFunctionParameterType { ... }
}
}

Required Methods

name(&self) -> String Returns the function name as it appears in Scallop programs (without the $ prefix):

#![allow(unused)]
fn main() {
fn name(&self) -> String {
    "string_length".to_string()
}
}

Used in Scallop as: $string_length(s)

return_type(&self) -> ForeignFunctionParameterType Specifies the return value type:

#![allow(unused)]
fn main() {
fn return_type(&self) -> ForeignFunctionParameterType {
    ForeignFunctionParameterType::BaseType(ValueType::USize)
}
}

execute(&self, args: Vec<Value>) -> Option<Value> The actual computation logic:

#![allow(unused)]
fn main() {
fn execute(&self, args: Vec<Value>) -> Option<Value> {
    if let Value::String(s) = &args[0] {
        Some(Value::USize(s.len()))
    } else {
        None
    }
}
}
  • Input: Vector of Value objects (arguments)
  • Output: Some(Value) on success, None on error

Optional Methods (Type System)

Static arguments (required parameters):

#![allow(unused)]
fn main() {
fn num_static_arguments(&self) -> usize { 2 }  // e.g., $max(a, b)

fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
    match i {
        0 => ForeignFunctionParameterType::Generic(0),
        1 => ForeignFunctionParameterType::Generic(0),
        _ => panic!("Invalid argument index"),
    }
}
}

Optional arguments:

#![allow(unused)]
fn main() {
fn num_optional_arguments(&self) -> usize { 1 }  // e.g., $substring(s, start, end?)

fn optional_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
    assert_eq!(i, 0);
    ForeignFunctionParameterType::BaseType(ValueType::USize)
}
}

Variable arguments:

#![allow(unused)]
fn main() {
fn has_variable_arguments(&self) -> bool { true }  // e.g., $concat(strs...)

fn variable_argument_type(&self) -> ForeignFunctionParameterType {
    ForeignFunctionParameterType::BaseType(ValueType::String)
}
}

Parameter Type System

The ForeignFunctionParameterType enum describes argument and return types:

#![allow(unused)]
fn main() {
pub enum ForeignFunctionParameterType {
    /// A generic type parameter (e.g., T0, T1)
    Generic(usize),

    /// A type family (Integer, Float, String, etc.)
    TypeFamily(TypeFamily),

    /// A concrete base type (i32, String, f64, etc.)
    BaseType(ValueType),
}
}

BaseType - Concrete Types

Use BaseType for specific, fixed types:

#![allow(unused)]
fn main() {
use scallop_core::common::value_type::ValueType;

// i32 type
ForeignFunctionParameterType::BaseType(ValueType::I32)

// String type
ForeignFunctionParameterType::BaseType(ValueType::String)

// f64 type
ForeignFunctionParameterType::BaseType(ValueType::F64)
}

Available ValueTypes:

  • Integers: I8, I16, I32, I64, I128, ISize
  • Unsigned: U8, U16, U32, U64, U128, USize
  • Floats: F32, F64
  • Others: Bool, Char, String, Symbol

TypeFamily - Type Groups

Use TypeFamily when a function works with multiple related types:

#![allow(unused)]
fn main() {
use scallop_core::common::type_family::TypeFamily;

// Works with any integer type
ForeignFunctionParameterType::TypeFamily(TypeFamily::Integer)

// Works with any numeric type (integers + floats)
ForeignFunctionParameterType::TypeFamily(TypeFamily::Number)

// Works with any type
ForeignFunctionParameterType::TypeFamily(TypeFamily::Any)
}

Available TypeFamilies:

  • TypeFamily::Integer - All integer types (signed and unsigned)
  • TypeFamily::SignedInteger - Only signed integers
  • TypeFamily::UnsignedInteger - Only unsigned integers
  • TypeFamily::Float - F32 and F64
  • TypeFamily::Number - All numeric types
  • TypeFamily::String - String and Symbol
  • TypeFamily::Any - Any type

Generic - Parameterized Types

Use Generic(id) for type parameters that maintain consistency across arguments:

#![allow(unused)]
fn main() {
// Function signature: $max<T: Number>(a: T, b: T) -> T
fn num_generic_types(&self) -> usize { 1 }  // One type parameter T

fn generic_type_family(&self, i: usize) -> TypeFamily {
    assert_eq!(i, 0);
    TypeFamily::Number  // T must be a Number
}

fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
    ForeignFunctionParameterType::Generic(0)  // Both args use T
}

fn return_type(&self) -> ForeignFunctionParameterType {
    ForeignFunctionParameterType::Generic(0)  // Return type is also T
}
}

Type parameter rules:

  • Generic IDs start at 0
  • Return type can only be Generic or BaseType (not TypeFamily)
  • All generic types must be used in arguments

Implementing Simple Functions

Step-by-Step: String Length Function

Let’s implement $string_length(String) -> usize:

Step 1: Create the struct

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct StringLength;
}

Step 2: Implement the trait

#![allow(unused)]
fn main() {
use scallop_core::common::foreign_function::*;
use scallop_core::common::value::Value;
use scallop_core::common::value_type::ValueType;

impl ForeignFunction for StringLength {
    fn name(&self) -> String {
        "string_length".to_string()
    }

    fn num_static_arguments(&self) -> usize {
        1  // Takes one argument
    }

    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        assert_eq!(i, 0);
        ForeignFunctionParameterType::BaseType(ValueType::String)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::USize)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        if let Value::String(s) = &args[0] {
            Some(Value::USize(s.len()))
        } else {
            None  // Type mismatch (shouldn't happen if types are correct)
        }
    }
}
}

Step 3: Register with context

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

fn main() -> Result<(), IntegrateError> {
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Register the function
    ctx.register_foreign_function(StringLength)?;

    // Use it in Scallop
    ctx.add_program(r#"
        rel words = {"hello", "world", "rust"}
        rel lengths(w, len) = words(w), len = $string_length(w)
        query lengths
    "#)?;

    ctx.run()?;

    // Print results
    let results = ctx.computed_relation_ref("lengths").unwrap();
    for elem in results.iter() {
        println!("{:?}", elem.tuple);
    }

    Ok(())
}

Output:

("hello", 5)
("world", 5)
("rust", 4)

Example: Integer Addition

A simple function that adds two integers:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct Add;

impl ForeignFunction for Add {
    fn name(&self) -> String {
        "add".to_string()
    }

    fn num_static_arguments(&self) -> usize {
        2
    }

    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::I32)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::I32)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        if let (Value::I32(a), Value::I32(b)) = (&args[0], &args[1]) {
            Some(Value::I32(a + b))
        } else {
            None
        }
    }
}
}

Usage:

rel numbers = {(1, 2), (3, 4), (5, 6)}
rel sums(a, b, sum) = numbers(a, b), sum = $add(a, b)
query sums

Result:

(1, 2, 3)
(3, 4, 7)
(5, 6, 11)

Generic Functions

Generic functions work with multiple types while maintaining type consistency:

Example: Max Function

Implements $max<T: Number>(a: T, b: T) -> T:

#![allow(unused)]
fn main() {
use scallop_core::common::foreign_function::*;
use scallop_core::common::value::Value;
use scallop_core::common::type_family::TypeFamily;

#[derive(Clone)]
pub struct Max;

impl ForeignFunction for Max {
    fn name(&self) -> String {
        "max".to_string()
    }

    fn num_generic_types(&self) -> usize {
        1  // One type parameter T
    }

    fn generic_type_family(&self, i: usize) -> TypeFamily {
        assert_eq!(i, 0);
        TypeFamily::Number  // T must be a number
    }

    fn num_static_arguments(&self) -> usize {
        2  // Two arguments
    }

    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        // Both arguments have type T (Generic(0))
        ForeignFunctionParameterType::Generic(0)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        // Return type is also T
        ForeignFunctionParameterType::Generic(0)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        // Handle all numeric types
        match (&args[0], &args[1]) {
            (Value::I32(a), Value::I32(b)) => Some(Value::I32(*a.max(b))),
            (Value::I64(a), Value::I64(b)) => Some(Value::I64(*a.max(b))),
            (Value::F64(a), Value::F64(b)) => Some(Value::F64(a.max(*b))),
            (Value::U32(a), Value::U32(b)) => Some(Value::U32(*a.max(b))),
            // Add more types as needed...
            _ => None,
        }
    }
}
}

Type safety in action:

rel int_pairs = {(5, 10), (20, 15)}
rel float_pairs = {(3.14, 2.71), (1.41, 1.73)}

// Valid: both args are i32
rel int_max(a, b, m) = int_pairs(a, b), m = $max(a, b)

// Valid: both args are f64
rel float_max(a, b, m) = float_pairs(a, b), m = $max(a, b)

// Invalid: mixing types would fail at compile time
// rel mixed(a, b, m) = int_pairs(a, _), float_pairs(_, b), m = $max(a, b)

Example: Fibonacci (Generic Integer)

Works with any integer type:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct Fib;

impl ForeignFunction for Fib {
    fn name(&self) -> String {
        "fib".to_string()
    }

    fn num_generic_types(&self) -> usize {
        1
    }

    fn generic_type_family(&self, i: usize) -> TypeFamily {
        assert_eq!(i, 0);
        TypeFamily::Integer  // Only integers, not floats
    }

    fn num_static_arguments(&self) -> usize {
        1
    }

    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        assert_eq!(i, 0);
        ForeignFunctionParameterType::Generic(0)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::Generic(0)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        match &args[0] {
            Value::I32(n) => compute_fib(*n).map(Value::I32),
            Value::I64(n) => compute_fib(*n).map(Value::I64),
            Value::U32(n) => compute_fib(*n).map(Value::U32),
            // ... handle other integer types
            _ => None,
        }
    }
}

fn compute_fib<T: num_traits::PrimInt>(n: T) -> Option<T> {
    // Iterative Fibonacci with base cases fib(0) = fib(1) = 1;
    // checked_add returns None on overflow, which `?` propagates
    let (mut a, mut b) = (T::one(), T::one());
    let mut i = T::zero();
    while i < n {
        let next = a.checked_add(&b)?;
        a = b;
        b = next;
        i = i + T::one();
    }
    Some(a)
}
}

Optional and Variable Arguments

Optional Arguments

Functions can have optional parameters that default if not provided:

Example: Substring with optional end

$substring(s: String, start: usize, end: usize?) -> String

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct Substring;

impl ForeignFunction for Substring {
    fn name(&self) -> String {
        "substring".to_string()
    }

    fn num_static_arguments(&self) -> usize {
        2  // s and start are required
    }

    fn static_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        match i {
            0 => ForeignFunctionParameterType::BaseType(ValueType::String),
            1 => ForeignFunctionParameterType::BaseType(ValueType::USize),
            _ => panic!("Invalid argument index"),
        }
    }

    fn num_optional_arguments(&self) -> usize {
        1  // end is optional
    }

    fn optional_argument_type(&self, i: usize) -> ForeignFunctionParameterType {
        assert_eq!(i, 0);
        ForeignFunctionParameterType::BaseType(ValueType::USize)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::String)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        let s = if let Value::String(s) = &args[0] {
            s
        } else {
            return None;
        };

        let start = if let Value::USize(start) = args[1] {
            start
        } else {
            return None;
        };

        let end = if args.len() > 2 {
            if let Value::USize(end) = args[2] {
                end
            } else {
                return None;
            }
        } else {
            s.len()  // Default: to end of string
        };

        Some(Value::String(s.get(start..end)?.to_string()))
    }
}
}

Usage:

rel text = {"hello world"}

// With both arguments
rel part1(t, sub) = text(t), sub = $substring(t, 0, 5)  // "hello"

// With optional argument omitted
rel part2(t, sub) = text(t), sub = $substring(t, 6)     // "world"

Variable Arguments

Functions that accept unlimited arguments:

Example: String concatenation

$concat(strings: String...) -> String

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct Concat;

impl ForeignFunction for Concat {
    fn name(&self) -> String {
        "concat".to_string()
    }

    fn has_variable_arguments(&self) -> bool {
        true  // Accept any number of arguments
    }

    fn variable_argument_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::String)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::String)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        let mut result = String::new();

        for arg in args {
            if let Value::String(s) = arg {
                result.push_str(&s);
            } else {
                return None;
            }
        }

        Some(Value::String(result))
    }
}
}

Usage:

rel parts = {("Hello", " "), ("world", "!")}

// Concat multiple arguments
rel message = {$concat("Hello", " ", "world", "!")}  // "Hello world!"

// Variable number of arguments, binding the result
rel concat2(r) = parts(a, b), r = $concat(a, b)
rel concat4(r) = parts(a, b), parts(c, d), r = $concat(a, b, c, d)

Note: Optional and variable arguments cannot coexist in the same function.

Error Handling

Returning None for Errors

When execute() encounters an error, return None:

#![allow(unused)]
fn main() {
fn execute(&self, args: Vec<Value>) -> Option<Value> {
    // Type check
    let n = if let Value::I32(n) = args[0] {
        n
    } else {
        return None;  // Wrong type
    };

    // Validation
    if n < 0 {
        return None;  // Invalid input (negative factorial)
    }

    // Computation that might fail
    let result = compute_factorial(n)?;  // Propagate None on overflow

    Some(Value::I32(result))
}
}

When to Return None

  • Type mismatch: Arguments don’t match expected types (shouldn’t happen if trait is correct)
  • Invalid input: Mathematically invalid (sqrt of negative, division by zero)
  • Computation error: Overflow, underflow, out of range
  • External failure: I/O error, resource unavailable (avoid in pure functions!)

Panic vs. None

Use None:

  • Invalid inputs that can occur during normal operation
  • Computation failures (overflow, domain errors)
  • Partial functions (e.g., division by zero)

Use panic!:

  • Programming errors (wrong trait implementation)
  • Internal invariant violations
  • Invalid argument indices in trait methods

Example:

#![allow(unused)]
fn main() {
fn execute(&self, args: Vec<Value>) -> Option<Value> {
    if let Value::I32(n) = args[0] {
        if n < 0 {
            None  // Graceful: negative input is user error
        } else {
            Some(Value::I32(n * 2))
        }
    } else {
        // Should never happen if types are correct
        panic!("Type system violated!")
    }
}
}

Error Propagation

In execute(), use ? to propagate None from fallible operations:

#![allow(unused)]
fn main() {
fn execute(&self, args: Vec<Value>) -> Option<Value> {
    let s = if let Value::String(s) = &args[0] {
        s
    } else {
        return None;
    };

    // Parse string to int (returns Option)
    let n: i32 = s.parse().ok()?;  // ? propagates None on failure

    // More operations...
    let result = some_fallible_op(n)?;

    Some(Value::I32(result))
}
}

Complete Working Example

Here’s a complete example demonstrating multiple foreign functions:

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::foreign_function::*;
use scallop_core::common::value::Value;
use scallop_core::common::value_type::ValueType;
use scallop_core::common::type_family::TypeFamily;

// Function 1: String length
#[derive(Clone)]
struct StrLen;

impl ForeignFunction for StrLen {
    fn name(&self) -> String {
        "str_len".to_string()
    }

    fn num_static_arguments(&self) -> usize {
        1
    }

    fn static_argument_type(&self, _i: usize) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::String)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::BaseType(ValueType::USize)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        if let Value::String(s) = &args[0] {
            Some(Value::USize(s.len()))
        } else {
            None
        }
    }
}

// Function 2: Max of two numbers
#[derive(Clone)]
struct Max;

impl ForeignFunction for Max {
    fn name(&self) -> String {
        "max".to_string()
    }

    fn num_generic_types(&self) -> usize {
        1
    }

    fn generic_type_family(&self, _i: usize) -> TypeFamily {
        TypeFamily::Number
    }

    fn num_static_arguments(&self) -> usize {
        2
    }

    fn static_argument_type(&self, _i: usize) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::Generic(0)
    }

    fn return_type(&self) -> ForeignFunctionParameterType {
        ForeignFunctionParameterType::Generic(0)
    }

    fn execute(&self, args: Vec<Value>) -> Option<Value> {
        match (&args[0], &args[1]) {
            (Value::I32(a), Value::I32(b)) => Some(Value::I32(*a.max(b))),
            (Value::F64(a), Value::F64(b)) => Some(Value::F64(a.max(*b))),
            _ => None,
        }
    }
}

fn main() -> Result<(), IntegrateError> {
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Register foreign functions
    ctx.register_foreign_function(StrLen)?;
    ctx.register_foreign_function(Max)?;

    // Add Scallop program
    ctx.add_program(r#"
        rel words = {"hello", "world", "rust", "scallop"}
        rel numbers = {(5, 10), (20, 15), (8, 12)}

        // Use string length function
        rel word_lengths(w, len) = words(w), len = $str_len(w)

        // Use max function
        rel maximums(a, b, m) = numbers(a, b), m = $max(a, b)

        query word_lengths
        query maximums
    "#)?;

    // Run the program
    ctx.run()?;

    // Query and display results
    println!("Word lengths:");
    let word_lengths = ctx.computed_relation_ref("word_lengths").unwrap();
    for elem in word_lengths.iter() {
        println!("  {:?}", elem.tuple);
    }

    println!("\nMaximums:");
    let maximums = ctx.computed_relation_ref("maximums").unwrap();
    for elem in maximums.iter() {
        println!("  {:?}", elem.tuple);
    }

    Ok(())
}

Output:

Word lengths:
  ("hello", 5)
  ("world", 5)
  ("rust", 4)
  ("scallop", 7)

Maximums:
  (5, 10, 10)
  (20, 15, 20)
  (8, 12, 12)

Next Steps

Foreign Predicates

This guide covers implementing foreign predicates in Rust to extend Scallop with custom fact generators.

Overview

Foreign predicates are non-deterministic relations that generate facts dynamically at runtime. Unlike foreign functions (which are pure and deterministic), foreign predicates can:

  • Yield multiple results for a single input
  • Generate facts from external sources (databases, files, APIs)
  • Support different input/output modes via binding patterns
  • Tag results with probabilities for provenance tracking

Comparison to Foreign Functions:

| Feature | Foreign Functions | Foreign Predicates |
|---|---|---|
| Determinism | Pure, deterministic | Non-deterministic |
| Results | Single value | Multiple tuples |
| Use case | Computation | Fact generation |
| Example | `$string_length(s)` | `range(n, i)` |

Comparison to Python API:

The Rust ForeignPredicate trait corresponds to Python’s @foreign_predicate decorator:

# Python
@foreign_predicate(name="range", output_arg_types=[int])
def range_pred(n: int) -> Facts[float, Tuple[int, int]]:
    for i in range(n):
        yield (1.0, (n, i))
#![allow(unused)]
fn main() {
// Rust equivalent (shown later in this guide)
impl ForeignPredicate for Range {
    fn name(&self) -> String { "range".to_string() }
    fn arity(&self) -> usize { 2 }
    fn num_bounded(&self) -> usize { 1 }
    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        vec![]  // Implementation shown later in this guide
    }
}
}

The ForeignPredicate Trait

Trait Definition

#![allow(unused)]
fn main() {
pub trait ForeignPredicate: DynClone {
    /// Name of the predicate
    fn name(&self) -> String;

    /// Total number of arguments
    fn arity(&self) -> usize;

    /// Type of the i-th argument
    fn argument_type(&self, i: usize) -> ValueType;

    /// Number of bounded (input) arguments
    fn num_bounded(&self) -> usize;

    /// Number of free (output) arguments (computed)
    fn num_free(&self) -> usize {
        self.arity() - self.num_bounded()
    }

    /// Evaluate predicate with bounded arguments, yield free arguments
    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)>;

    /// Optional: evaluate with all arguments provided (for validation)
    fn evaluate_with_all_arguments(&self, args: &[Value]) -> Vec<DynamicInputTag> {
        vec![]  // Default: no validation
    }
}
}

Key Points:

  • name() - Predicate name used in Scallop programs
  • arity() - Total number of arguments (bounded + free)
  • argument_type(i) - ValueType for each argument position
  • num_bounded() - How many arguments are inputs
  • evaluate(bounded) - Core method that generates results

Return Type: Tagged Tuples

The evaluate() method returns:

#![allow(unused)]
fn main() {
Vec<(DynamicInputTag, Vec<Value>)>
}

Structure:

  • Outer Vec - Multiple results (non-deterministic)
  • DynamicInputTag - Probability or ID for provenance tracking
  • Vec<Value> - Complete tuple (bounded + free arguments)

Example:

#![allow(unused)]
fn main() {
vec![
    (DynamicInputTag::None, vec![Value::I32(5), Value::I32(0)]),
    (DynamicInputTag::None, vec![Value::I32(5), Value::I32(1)]),
    (DynamicInputTag::None, vec![Value::I32(5), Value::I32(2)]),
]
// Three results from range(5, i): (5, 0), (5, 1), (5, 2)
}

Binding Patterns

Foreign predicates support different input/output modes based on which arguments are bounded (input) vs free (output).

Binding Pattern Notation

| Pattern | Meaning | Example Call | Description |
|---|---|---|---|
| bb | Both bounded | `pred(5, 10)` | Both arguments provided |
| bf | First bounded, second free | `pred(5, x)` | First is input, second is output |
| fb | First free, second bounded | `pred(x, 10)` | First is output, second is input |
| ff | Both free | `pred(x, y)` | Generate all pairs |

In Scallop programs:

// Binding pattern bf: n is bounded, i is free
rel result(n, i) = n in {5, 10}, range(n, i)
// Calls: range(5, i) and range(10, i)

// Binding pattern bb: both bounded (for validation)
rel check = range(5, 3)
// Calls: range(5, 3) - checks if (5, 3) is valid
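Assuming bounded arguments come first (as in every example in this guide), the pattern string follows directly from `arity()` and `num_bounded()`. A standalone sketch, with no scallop-core dependency:

```rust
// Derive the binding pattern string ("bf", "bb", "fff", ...) from the
// arity and the number of bounded arguments, assuming bounded arguments
// occupy the leading positions.
fn binding_pattern(arity: usize, num_bounded: usize) -> String {
    (0..arity)
        .map(|i| if i < num_bounded { 'b' } else { 'f' })
        .collect()
}

fn main() {
    assert_eq!(binding_pattern(2, 1), "bf");  // range(n, i)
    assert_eq!(binding_pattern(2, 2), "bb");  // range(5, 3) as a check
    assert_eq!(binding_pattern(3, 0), "fff"); // pure generator
}
```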

How Scallop Determines Binding Patterns

Scallop analyzes the query to determine which arguments are bounded (known values) vs free (variables):

#![allow(unused)]
fn main() {
// In foreign predicate implementation:
fn num_bounded(&self) -> usize { 1 }  // First argument is bounded

// Scallop automatically determines:
// - Call with n=5: bounded = [Value::I32(5)]
// - Predicate returns: [(tag, [Value::I32(5), Value::I32(0)]), ...]
}

Important: The bounded slice in evaluate() contains only the bounded arguments, but the returned tuple must contain all arguments (bounded + free).
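This contract can be illustrated without any scallop-core types. The sketch below uses a hypothetical `Value` enum and mimics `range(n, i)`: `evaluate` receives only the bounded `n`, yet every returned tuple carries both `n` and `i`:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    I32(i32),
}

// Simplified stand-in for ForeignPredicate::evaluate (tags omitted):
// input is the bounded arguments only, output tuples are complete.
fn evaluate(bounded: &[Value]) -> Vec<Vec<Value>> {
    if let [Value::I32(n)] = bounded {
        (0..*n)
            .map(|i| vec![Value::I32(*n), Value::I32(i)]) // full tuple (n, i)
            .collect()
    } else {
        vec![] // unexpected binding or type
    }
}

fn main() {
    let results = evaluate(&[Value::I32(3)]);
    assert_eq!(results.len(), 3);
    // Every tuple has arity 2 even though only 1 argument was bounded.
    assert!(results.iter().all(|t| t.len() == 2));
    println!("{:?}", results);
}
```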


Implementing Simple Predicates

Example 1: Range Generator (Pattern: bf)

Generates integers from 0 to n-1 for a given n.

#![allow(unused)]
fn main() {
use scallop_core::common::foreign_predicate::*;
use scallop_core::common::value::*;
use scallop_core::common::input_tag::DynamicInputTag;

#[derive(Clone)]
pub struct Range;

impl ForeignPredicate for Range {
    fn name(&self) -> String {
        "range".to_string()
    }

    fn arity(&self) -> usize {
        2  // (n, i)
    }

    fn argument_type(&self, i: usize) -> ValueType {
        ValueType::I32  // Both arguments are i32
    }

    fn num_bounded(&self) -> usize {
        1  // First argument (n) is bounded
    }

    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        // Extract bounded argument
        if let Value::I32(n) = &bounded[0] {
            // Generate range [0, n)
            (0..*n).map(|i| {
                (
                    DynamicInputTag::None,
                    vec![Value::I32(*n), Value::I32(i)]  // Full tuple: (n, i)
                )
            }).collect()
        } else {
            vec![]  // Type mismatch
        }
    }
}
}

Usage in Scallop:

rel numbers(n, i) = n in {5, 10}, range(n, i)
query numbers

// Results:
// (5, 0), (5, 1), (5, 2), (5, 3), (5, 4)
// (10, 0), (10, 1), ..., (10, 9)

Register with IntegrateContext:

#![allow(unused)]
fn main() {
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

ctx.register_foreign_predicate(Range);

ctx.add_program(r#"
    rel numbers(n, i) = n in {5, 10}, range(n, i)
    query numbers
"#).unwrap();

ctx.run().unwrap();

let numbers = ctx.computed_relation_ref("numbers").unwrap();
for elem in numbers.iter() {
    println!("{:?}", elem.tuple);
}
}

Example 2: String Splitter (Pattern: bf, Multiple Results)

Splits a string into individual characters.

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct StringChars;

impl ForeignPredicate for StringChars {
    fn name(&self) -> String {
        "string_chars".to_string()
    }

    fn arity(&self) -> usize {
        2  // (string, char)
    }

    fn argument_type(&self, i: usize) -> ValueType {
        if i == 0 {
            ValueType::String
        } else {
            ValueType::Char
        }
    }

    fn num_bounded(&self) -> usize {
        1  // First argument (string) is bounded
    }

    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        if let Value::String(s) = &bounded[0] {
            s.chars().map(|c| {
                (
                    DynamicInputTag::None,
                    vec![bounded[0].clone(), Value::Char(c)]
                )
            }).collect()
        } else {
            vec![]
        }
    }
}
}

Usage:

rel word = {"hello", "world"}
rel letters(w, c) = word(w), string_chars(w, c)
query letters

// Results:
// ("hello", 'h'), ("hello", 'e'), ("hello", 'l'), ("hello", 'l'), ("hello", 'o')
// ("world", 'w'), ("world", 'o'), ("world", 'r'), ("world", 'l'), ("world", 'd')

Multiple Binding Patterns

Some predicates support different binding patterns for bidirectional lookup.

Example: Key-Value Store (Patterns: bf, fb, ff)

#![allow(unused)]
fn main() {
use std::collections::HashMap;

#[derive(Clone)]
pub struct Lookup {
    data: HashMap<String, String>,
}

impl Lookup {
    pub fn new() -> Self {
        let mut data = HashMap::new();
        data.insert("name".to_string(), "Alice".to_string());
        data.insert("age".to_string(), "30".to_string());
        data.insert("city".to_string(), "NYC".to_string());
        Self { data }
    }
}

impl ForeignPredicate for Lookup {
    fn name(&self) -> String {
        "lookup".to_string()
    }

    fn arity(&self) -> usize {
        2  // (key, value)
    }

    fn argument_type(&self, _: usize) -> ValueType {
        ValueType::String
    }

    fn num_bounded(&self) -> usize {
        1  // One argument is bounded (this simplified example assumes the first)
    }

    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        // Note: This simplified example only handles bf pattern
        // For multiple patterns, you'd need to track which argument is bounded

        if let Value::String(key) = &bounded[0] {
            // Pattern bf: key → value
            if let Some(value) = self.data.get(key) {
                vec![(
                    DynamicInputTag::None,
                    vec![bounded[0].clone(), Value::String(value.clone())]
                )]
            } else {
                vec![]
            }
        } else {
            vec![]
        }
    }
}
}

Note: Full multi-pattern support requires tracking which arguments are bounded. In practice, you might implement separate predicates for different patterns or use Scallop’s built-in pattern matching.

Usage:

rel keys = {"name", "age"}
rel values(k, v) = keys(k), lookup(k, v)
query values

// Results:
// ("name", "Alice")
// ("age", "30")

Tagging Facts with Probabilities

Foreign predicates can tag results with probabilities for provenance tracking.

Using DynamicInputTag Variants

#![allow(unused)]
fn main() {
pub enum DynamicInputTag {
    None,                              // No tag (unit provenance)
    Bool(bool),                        // Boolean tag
    Natural(usize),                    // Natural number tag
    Float(f64),                        // Probability tag
    // ... other variants
}
}

Example: Probabilistic Results

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct WeatherForecast;

impl ForeignPredicate for WeatherForecast {
    fn name(&self) -> String {
        "forecast".to_string()
    }

    fn arity(&self) -> usize {
        2  // (city, weather)
    }

    fn argument_type(&self, _: usize) -> ValueType {
        ValueType::String
    }

    fn num_bounded(&self) -> usize {
        1  // City is bounded
    }

    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        if let Value::String(city) = &bounded[0] {
            match city.as_str() {
                "NYC" => vec![
                    (DynamicInputTag::Float(0.7), vec![bounded[0].clone(), Value::String("sunny".into())]),
                    (DynamicInputTag::Float(0.2), vec![bounded[0].clone(), Value::String("rainy".into())]),
                    (DynamicInputTag::Float(0.1), vec![bounded[0].clone(), Value::String("cloudy".into())]),
                ],
                "LA" => vec![
                    (DynamicInputTag::Float(0.9), vec![bounded[0].clone(), Value::String("sunny".into())]),
                    (DynamicInputTag::Float(0.1), vec![bounded[0].clone(), Value::String("cloudy".into())]),
                ],
                _ => vec![]
            }
        } else {
            vec![]
        }
    }
}
}

Usage with Probabilistic Provenance:

#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::min_max_prob::MinMaxProbProvenance;

let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

ctx.register_foreign_predicate(WeatherForecast);

ctx.add_program(r#"
    rel city = {"NYC", "LA"}
    rel weather(c, w) = city(c), forecast(c, w)
    query weather
"#).unwrap();

ctx.run().unwrap();

let weather = ctx.computed_relation_ref("weather").unwrap();
for elem in weather.iter() {
    println!("Probability: {}, Tuple: {:?}", elem.tag, elem.tuple);
}
}

Output:

Probability: 0.7, Tuple: ("NYC", "sunny")
Probability: 0.2, Tuple: ("NYC", "rainy")
Probability: 0.1, Tuple: ("NYC", "cloudy")
Probability: 0.9, Tuple: ("LA", "sunny")
Probability: 0.1, Tuple: ("LA", "cloudy")

Complete Working Example

Here’s a full program demonstrating foreign predicates that load external data (the CSV rows are simulated in-memory).

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::foreign_predicate::*;
use scallop_core::common::value::*;
use scallop_core::common::input_tag::DynamicInputTag;

// Foreign predicate: read CSV file
#[derive(Clone)]
pub struct ReadCSV {
    data: Vec<(String, i32, String)>,
}

impl ReadCSV {
    pub fn new() -> Self {
        // Simulated CSV data: (name, age, city)
        Self {
            data: vec![
                ("Alice".into(), 30, "NYC".into()),
                ("Bob".into(), 25, "LA".into()),
                ("Charlie".into(), 35, "Chicago".into()),
            ]
        }
    }
}

impl ForeignPredicate for ReadCSV {
    fn name(&self) -> String {
        "read_csv".to_string()
    }

    fn arity(&self) -> usize {
        3  // (name, age, city)
    }

    fn argument_type(&self, i: usize) -> ValueType {
        match i {
            0 => ValueType::String,  // name
            1 => ValueType::I32,     // age
            2 => ValueType::String,  // city
            _ => panic!("Invalid argument index"),
        }
    }

    fn num_bounded(&self) -> usize {
        0  // All three arguments are free (pattern fff)
    }

    fn evaluate(&self, _bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        self.data.iter().map(|(name, age, city)| {
            (
                DynamicInputTag::None,
                vec![
                    Value::String(name.clone()),
                    Value::I32(*age),
                    Value::String(city.clone()),
                ]
            )
        }).collect()
    }
}

fn main() -> Result<(), IntegrateError> {
    let prov = UnitProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Register foreign predicates
    ctx.register_foreign_predicate(Range);  // Range from Example 1 above
    ctx.register_foreign_predicate(ReadCSV::new());

    ctx.add_program(r#"
        // Load data from CSV
        rel person(name, age, city) = read_csv(name, age, city)

        // Find adults
        rel adult(name) = person(name, age, city) and age >= 30

        // Generate ID range
        rel ids(n, id) = n in {3}, range(n, id)

        query person
        query adult
        query ids
    "#)?;

    ctx.run()?;

    // Display results
    println!("People:");
    let person = ctx.computed_relation_ref("person").unwrap();
    for elem in person.iter() {
        println!("  {:?}", elem.tuple);
    }

    println!("\nAdults:");
    let adult = ctx.computed_relation_ref("adult").unwrap();
    for elem in adult.iter() {
        println!("  {:?}", elem.tuple);
    }

    println!("\nIDs:");
    let ids = ctx.computed_relation_ref("ids").unwrap();
    for elem in ids.iter() {
        println!("  {:?}", elem.tuple);
    }

    Ok(())
}

Expected Output:

People:
  ("Alice", 30, "NYC")
  ("Bob", 25, "LA")
  ("Charlie", 35, "Chicago")

Adults:
  ("Alice")
  ("Charlie")

IDs:
  (3, 0)
  (3, 1)
  (3, 2)

Best Practices

1. Type Safety

Always validate argument types before processing:

#![allow(unused)]
fn main() {
fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
    // Good: Type check
    if let Value::I32(n) = &bounded[0] {
        // Process
    } else {
        return vec![];  // Type mismatch
    }
}
}

2. Return Complete Tuples

The returned tuples must include all arguments (bounded + free):

#![allow(unused)]
fn main() {
// Predicate: range(n, i) with arity=2, num_bounded=1
fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
    if let Value::I32(n) = &bounded[0] {
        (0..*n).map(|i| {
            (
                DynamicInputTag::None,
                vec![
                    bounded[0].clone(),  // Include bounded argument (n)
                    Value::I32(i)        // Add free argument (i)
                ]
            )
        }).collect()
    } else {
        vec![]
    }
}
}

3. Use Appropriate Tags

Match tag type to provenance:

#![allow(unused)]
fn main() {
// For UnitProvenance
DynamicInputTag::None

// For probabilistic provenance
DynamicInputTag::Float(0.8)

// For counting provenance
DynamicInputTag::Natural(5)
}

4. Handle Empty Results

Return empty vec for invalid inputs or no results:

#![allow(unused)]
fn main() {
fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
    if let Value::String(key) = &bounded[0] {
        if let Some(value) = self.lookup(key) {
            vec![/* result */]
        } else {
            vec![]  // Key not found
        }
    } else {
        vec![]  // Type mismatch
    }
}
}

5. Clone Bounded Arguments

When including bounded arguments in results, clone them:

#![allow(unused)]
fn main() {
vec![
    bounded[0].clone(),  // Clone bounded argument
    Value::String(result)  // Add free argument
]
}

Common Patterns

Pattern 1: Database Query

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct SQLQuery {
    // Connection pool, etc.
}

impl ForeignPredicate for SQLQuery {
    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        if let Value::String(table) = &bounded[0] {
            // Execute: SELECT * FROM table
            // Return rows as tuples
        }
        vec![]
    }
}
}

Pattern 2: File Reader

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct ReadLines {
    path: String,
}

impl ForeignPredicate for ReadLines {
    fn evaluate(&self, _bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        std::fs::read_to_string(&self.path)
            .ok()
            .map(|content| {
                content.lines().enumerate().map(|(i, line)| {
                    (
                        DynamicInputTag::None,
                        vec![Value::USize(i), Value::String(line.to_string())]
                    )
                }).collect()
            })
            .unwrap_or_else(Vec::new)
    }
}
}

Pattern 3: API Call

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct RestAPI;

impl ForeignPredicate for RestAPI {
    fn evaluate(&self, bounded: &[Value]) -> Vec<(DynamicInputTag, Vec<Value>)> {
        if let Value::String(endpoint) = &bounded[0] {
            // HTTP GET request
            // Parse JSON response
            // Return fields as tuples
        }
        vec![]
    }
}
}

Resources

  • Trait Definition: scallop-core/src/common/foreign_predicate.rs
  • Test Examples: scallop-core/tests/integrate/adt.rs
  • Python API Comparison: Foreign Predicates (Python)

Provenance Types

This guide covers Scallop’s provenance tracking system - a powerful semiring-based framework for reasoning with different semantics.

Overview

Provenance determines how facts are tagged and how tags combine during reasoning. Scallop’s unified execution engine can switch between discrete logic, probabilistic reasoning, and differentiable computation while maintaining full traceability of derivations.

What is Provenance?

In Scallop, every fact has an associated tag that tracks metadata:

#![allow(unused)]
fn main() {
// Without provenance (standard DataLog)
edge(0, 1)

// With probabilistic provenance
0.8::edge(0, 1)  // 80% confidence

// With counting provenance
5::edge(0, 1)    // Appears 5 times
}

Key insight: The same Scallop program can execute with different provenance types to answer different questions:

  • Unit - “Does this fact hold?” (true/false)
  • Natural - “How many derivations exist?” (count)
  • MinMaxProb - “What’s the confidence?” (probability)
  • TopKProofs - “What are the top-K explanations?” (proofs + probability)

The Three-Stage Tag Flow

Input Facts              Runtime Execution        Output Results
    ↓                           ↓                        ↓
InputTag ──tagging_fn()→ Tag ──operations→ Tag ──recover_fn()→ OutputTag

Example:

#![allow(unused)]
fn main() {
// Input: User provides probability
InputTag = 0.8

// Internal: Converted to provenance tag
Tag = 0.8  // For AddMultProbProvenance

// Output: Result displayed to user
OutputTag = 0.56  // After combining: 0.8 * 0.7 = 0.56
}
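The three stages can be replayed in plain Rust for an add-mult style probability. Function names mirror the trait methods, but this is an illustration of the flow, not the scallop-core API:

```rust
// Illustrative stand-ins for the three stages (not the real trait methods,
// which take &self and operate on associated types).
fn tagging_fn(input: f64) -> f64 { input }    // InputTag -> Tag
fn mult(t1: f64, t2: f64) -> f64 { t1 * t2 }  // conjunction during execution
fn recover_fn(tag: f64) -> f64 { tag }        // Tag -> OutputTag

fn main() {
    let t1 = tagging_fn(0.8);
    let t2 = tagging_fn(0.7);
    let out = recover_fn(mult(t1, t2));
    assert!((out - 0.56).abs() < 1e-9); // 0.8 * 0.7 = 0.56
}
```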

The Provenance Trait

Trait Definition

#![allow(unused)]
fn main() {
pub trait Provenance: Clone + 'static {
    /// The input tag space (what users provide)
    type InputTag: Clone + Debug + StaticInputTag;

    /// The internal tag space (used during execution)
    type Tag: Tag;

    /// The output tag space (what users see in results)
    type OutputTag: Clone + Debug + Display;

    /// Name of the provenance
    fn name(&self) -> String;

    /// Convert input tag to internal tag
    fn tagging_fn(&self, ext_tag: Self::InputTag) -> Self::Tag;

    /// Convert optional input tag (None → one())
    fn tagging_optional_fn(&self, ext_tag: Option<Self::InputTag>) -> Self::Tag {
        match ext_tag {
            Some(et) => self.tagging_fn(et),
            None => self.one(),
        }
    }

    /// Convert internal tag to output tag
    fn recover_fn(&self, t: &Self::Tag) -> Self::OutputTag;

    /// Check if a fact should be discarded
    fn discard(&self, t: &Self::Tag) -> bool;

    /// Zero element (disjunction identity)
    fn zero(&self) -> Self::Tag;

    /// One element (conjunction identity)
    fn one(&self) -> Self::Tag;

    /// Add operation (disjunction, OR)
    fn add(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag;

    /// Multiply operation (conjunction, AND)
    fn mult(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag;

    /// Negate operation (NOT)
    fn negate(&self, t: &Self::Tag) -> Option<Self::Tag> {
        None  // Default: negation not supported
    }

    /// Check if tag has saturated (convergence)
    fn saturated(&self, t_old: &Self::Tag, t_new: &Self::Tag) -> bool;

    /// Get weight of a tag (for ranking)
    fn weight(&self, tag: &Self::Tag) -> f64 {
        1.0  // Default: all tags equally weighted
    }
}
}

Associated Types

InputTag - What users provide when adding facts:

#![allow(unused)]
fn main() {
ctx.add_facts("edge", vec![
    (Some(0.8.into()), (0, 1).into()),  // InputTag = f64 for MinMaxProb
], false)?;
}

Tag - Internal representation during execution:

#![allow(unused)]
fn main() {
// MinMaxProbProvenance uses f64 as Tag
type Tag = f64;

// TopKProofsProvenance uses DNFFormula
type Tag = Rc<DNFFormula>;
}

OutputTag - What users see in results:

#![allow(unused)]
fn main() {
for elem in results.iter() {
    println!("Tag: {}, Tuple: {:?}", elem.tag, elem.tuple);
    // elem.tag is OutputTag type
}
}

Available Provenance Types

Discrete Provenances

UnitProvenance - Standard DataLog

No tracking - Classic logical reasoning.

#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::discrete::unit::UnitProvenance;

impl Provenance for UnitProvenance {
    type InputTag = ();
    type Tag = Unit;
    type OutputTag = Unit;

    fn add(&self, _t1: &Unit, _t2: &Unit) -> Unit { Unit }  // OR
    fn mult(&self, _t1: &Unit, _t2: &Unit) -> Unit { Unit } // AND
}
}

Use case: Traditional DataLog queries without metadata.

#![allow(unused)]
fn main() {
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

ctx.add_facts("edge", vec![
    (None, (0, 1).into()),  // No tag
    (None, (1, 2).into()),
], false)?;
}

BooleanProvenance - Boolean Algebra

Boolean tags - Negation-as-failure support.

#![allow(unused)]
fn main() {
type InputTag = bool;
type Tag = bool;
type OutputTag = bool;

// Semiring operations:
fn add(&self, t1: &bool, t2: &bool) -> bool { *t1 || *t2 }   // OR
fn mult(&self, t1: &bool, t2: &bool) -> bool { *t1 && *t2 }  // AND
fn negate(&self, t: &bool) -> Option<bool> { Some(!*t) }     // NOT
}

Use case: Programs with negation.

NaturalProvenance - Counting

Count multiplicity - Track number of derivations.

#![allow(unused)]
fn main() {
type InputTag = usize;
type Tag = usize;
type OutputTag = usize;

// Semiring operations:
fn add(&self, t1: &usize, t2: &usize) -> usize { t1 + t2 }   // Sum
fn mult(&self, t1: &usize, t2: &usize) -> usize { t1 * t2 }  // Product
}

Use case: Cardinality queries, bag semantics.

#![allow(unused)]
fn main() {
let prov = NaturalProvenance::default();
// Fact appears 3 times
ctx.add_facts("edge", vec![(Some(3), (0, 1).into())], false)?;
}
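The counting semantics described above amount to ordinary arithmetic on multiplicities, which can be checked standalone (a sketch, not the scallop-core implementation):

```rust
// Natural semiring: alternative derivations sum, joint derivations multiply.
fn add(t1: usize, t2: usize) -> usize { t1 + t2 }
fn mult(t1: usize, t2: usize) -> usize { t1 * t2 }

fn main() {
    // edge(0, 1) appears 3 times, edge(1, 2) appears 5 times:
    assert_eq!(mult(3, 5), 15); // path(0, 2) through both edges: 15 derivations
    assert_eq!(add(3, 5), 8);   // a fact derived in two independent ways
}
```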

ProofsProvenance - Derivation Tracking

Proof trees - Track all derivation paths (no probabilities).

#![allow(unused)]
fn main() {
type InputTag = Exclusion;
type Tag = Rc<Proofs>;
type OutputTag = Rc<Proofs>;
}

Use case: Debugging, explainability without probabilities.

Probabilistic Provenances

MinMaxProbProvenance - Probabilistic (Min-Max Semiring)

Probability tracking with min-max semantics.

#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::probabilistic::min_max_prob::MinMaxProbProvenance;

impl Provenance for MinMaxProbProvenance {
    type InputTag = f64;
    type Tag = f64;
    type OutputTag = f64;

    fn add(&self, t1: &f64, t2: &f64) -> f64 {
        t1.max(*t2)  // Best alternative (OR)
    }

    fn mult(&self, t1: &f64, t2: &f64) -> f64 {
        t1.min(*t2)  // Weakest link (AND)
    }

    fn negate(&self, p: &f64) -> Option<f64> {
        Some(1.0 - p)  // Complement probability
    }
}
}

Semiring intuition:

  • add (OR) = take maximum probability (best alternative)
  • mult (AND) = take minimum probability (weakest link)

Use case: Fuzzy logic, confidence propagation.

#![allow(unused)]
fn main() {
let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

ctx.add_facts("edge", vec![
    (Some(0.8.into()), (0, 1).into()),
    (Some(0.7.into()), (1, 2).into()),
], false)?;

// path(0, 2) derives with prob = min(0.8, 0.7) = 0.7
}

AddMultProbProvenance - Probabilistic (Add-Mult Semiring)

Independent events probability.

#![allow(unused)]
fn main() {
fn add(&self, t1: &f64, t2: &f64) -> f64 {
    t1 + t2 - (t1 * t2)  // Inclusion-exclusion (OR)
}

fn mult(&self, t1: &f64, t2: &f64) -> f64 {
    t1 * t2  // Independent events (AND)
}
}

Semiring intuition:

  • add (OR) = inclusion-exclusion principle
  • mult (AND) = independent probability multiplication

Use case: Statistical reasoning, independent events.
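Both operations are plain probability arithmetic and easy to verify standalone (a sketch of the stated semantics):

```rust
// Add-mult semiring: inclusion-exclusion for OR, independence for AND.
fn add(t1: f64, t2: f64) -> f64 { t1 + t2 - t1 * t2 }
fn mult(t1: f64, t2: f64) -> f64 { t1 * t2 }

fn main() {
    assert!((add(0.8, 0.6) - 0.92).abs() < 1e-9);  // 0.8 + 0.6 - 0.48
    assert!((mult(0.8, 0.9) - 0.72).abs() < 1e-9); // independent conjunction
}
```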

TopKProofsProvenance - Top-K Most Probable Proofs

Track top-K derivation proofs with probabilities.

#![allow(unused)]
fn main() {
type InputTag = InputExclusiveProb;
type Tag = Rc<DNFFormula>;
type OutputTag = f64;
}

Internally tracks:

  • DNF formula representing proof combinations
  • Computes probability via Weighted Model Counting (WMC)
  • Returns top-K most probable derivations

Use case: Explanation generation, ranking derivations.

#![allow(unused)]
fn main() {
use scallop_core::runtime::provenance::probabilistic::top_k_proofs::TopKProofsProvenance;

let prov = TopKProofsProvenance::<RcFamily>::new(3);  // Top-3 proofs
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

// Results include probability computed from top proofs
}

ProbProofsProvenance - Exact Probability with All Proofs

Complete proof tracking with exact probabilities.

#![allow(unused)]
fn main() {
type InputTag = ProbProofs<RcFamily>;
type Tag = Rc<ProbProofs<RcFamily>>;
type OutputTag = f64;
}

Tracks all derivation paths and computes exact probability via SDD (Sentential Decision Diagram).

Use case: Exact probabilistic reasoning, complete explanations.

Differentiable Provenances

These provenances support gradient computation for integration with machine learning frameworks like PyTorch.

DiffTopKProofsProvenance<T> - Differentiable Top-K

Backpropagation support for neural-symbolic integration.

#![allow(unused)]
fn main() {
type InputTag = InputExclusiveDiffProb<T>;
type Tag = Rc<DNFFormula>;
type OutputTag = (f64, Vec<T>);  // (probability, gradients)
}

External tag T: Typically a PyTorch tensor for gradient tracking.

Use case: Neural-symbolic learning, gradient-based optimization.

DiffTopKProofsDebugProvenance<T> - Differentiable with Proofs

Debug variant that returns proofs alongside gradients.

#![allow(unused)]
fn main() {
type InputTag = InputExclusiveDiffProbWithID<T>;
type OutputTag = (f64, Vec<T>, Vec<Proofs>);  // (prob, gradients, proofs)
}

Unique feature: Supports user-provided stable IDs for facts.

Use case: Debugging neural-symbolic systems, stable fact identification.


Using Different Provenances

Creating a Context with Provenance

#![allow(unused)]
fn main() {
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::*;
use scallop_core::runtime::env::RcFamily;

// Standard DataLog
let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

// Probabilistic (min-max)
let prov = MinMaxProbProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

// Top-3 proofs
let prov = TopKProofsProvenance::<RcFamily>::new(3);
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
}

Adding Facts with Tags

#![allow(unused)]
fn main() {
use scallop_core::common::tuple::Tuple;

// Unit provenance (no tag)
ctx.add_facts("edge", vec![
    (None, Tuple::from((0i32, 1i32))),
], false)?;

// Probabilistic provenance
ctx.add_facts("edge", vec![
    (Some(0.8.into()), Tuple::from((0i32, 1i32))),
    (Some(0.9.into()), Tuple::from((1i32, 2i32))),
], false)?;

// Natural provenance (count)
ctx.add_facts("edge", vec![
    (Some(5usize), Tuple::from((0i32, 1i32))),  // Appears 5 times
], false)?;
}

Interpreting Output Tags

#![allow(unused)]
fn main() {
ctx.run()?;

let results = ctx.computed_relation_ref("path").unwrap();
for elem in results.iter() {
    match provenance_type {
        "unit" => {
            println!("Tuple: {:?}", elem.tuple);
            // elem.tag is Unit (no info)
        }
        "minmaxprob" => {
            println!("Probability: {}, Tuple: {:?}", elem.tag, elem.tuple);
            // elem.tag is f64
        }
        "natural" => {
            println!("Count: {}, Tuple: {:?}", elem.tag, elem.tuple);
            // elem.tag is usize
        }
        _ => {}
    }
}
}

Semiring Operations

Provenance forms a semiring with two operations:

Addition (Disjunction, OR)

Combines alternative derivations of the same fact.

rel path(0, 2) :- edge(0, 1), edge(1, 2)  // Derivation 1
rel path(0, 2) :- edge(0, 2)               // Derivation 2

How provenances combine alternatives:

| Provenance | add(t1, t2) | Example |
|---|---|---|
| Unit | Unit | No change |
| Boolean | t1 ∨ t2 | true ∨ false = true |
| Natural | t1 + t2 | 3 + 5 = 8 |
| MinMaxProb | max(t1, t2) | max(0.8, 0.6) = 0.8 |
| AddMultProb | t1 + t2 - t1*t2 | 0.8 + 0.6 - 0.48 = 0.92 |

Multiplication (Conjunction, AND)

Combines dependent facts in a rule body.

rel path(a, c) :- edge(a, b), edge(b, c)  // Both facts needed

How provenances combine conjunctions:

| Provenance | mult(t1, t2) | Example |
|---|---|---|
| Unit | Unit | No change |
| Boolean | t1 ∧ t2 | true ∧ false = false |
| Natural | t1 * t2 | 3 * 5 = 15 |
| MinMaxProb | min(t1, t2) | min(0.8, 0.9) = 0.8 |
| AddMultProb | t1 * t2 | 0.8 * 0.9 = 0.72 |

Identity Elements

Every semiring has zero (additive identity) and one (multiplicative identity):

| Provenance | Zero | One |
|---|---|---|
| Unit | Unit | Unit |
| Boolean | false | true |
| Natural | 0 | 1 |
| MinMaxProb | 0.0 | 1.0 |

Properties:

add(t, zero) = t
mult(t, one) = t
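For MinMaxProb these laws are easy to verify in plain Rust (a sketch; the real trait methods take `&self` and tag references):

```rust
// Min-max semiring with its identity elements.
fn add(t1: f64, t2: f64) -> f64 { t1.max(t2) }
fn mult(t1: f64, t2: f64) -> f64 { t1.min(t2) }
const ZERO: f64 = 0.0;
const ONE: f64 = 1.0;

fn main() {
    let t = 0.42;
    assert_eq!(add(t, ZERO), t);  // add(t, zero) = t
    assert_eq!(mult(t, ONE), t);  // mult(t, one) = t
}
```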

Example: MinMaxProb Semiring

rel 0.8::edge(0, 1)
rel 0.9::edge(1, 2)
rel 0.7::edge(0, 2)

rel path(a, b) = edge(a, b)
rel path(a, c) = edge(a, b), edge(b, c)

query path(0, 2)

Derivation:

  1. Direct path: edge(0, 2) → probability = 0.7
  2. Indirect path: edge(0, 1) ∧ edge(1, 2) → probability = min(0.8, 0.9) = 0.8
  3. Combine alternatives: add(0.7, 0.8) = max(0.7, 0.8) = 0.8

Result: path(0, 2) has probability 0.8
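The derivation above is just two semiring operations, which can be replayed directly:

```rust
fn main() {
    let direct = 0.7;                          // edge(0, 2)
    let indirect = f64::min(0.8, 0.9);         // mult = min (weakest link)
    let path_0_2 = f64::max(direct, indirect); // add = max over alternatives
    assert_eq!(path_0_2, 0.8);
}
```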


Complete Example: Probabilistic Path Finding

Here’s a full program demonstrating probabilistic reasoning.

use scallop_core::integrate::*;
use scallop_core::runtime::provenance::probabilistic::min_max_prob::MinMaxProbProvenance;
use scallop_core::runtime::env::RcFamily;
use scallop_core::common::tuple::Tuple;
use scallop_core::common::value::Value;

fn main() -> Result<(), IntegrateError> {
    // Create context with min-max probability provenance
    let prov = MinMaxProbProvenance::default();
    let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);

    // Define schema
    ctx.add_relation("edge(i32, i32)")?;

    // Add probabilistic edges
    ctx.add_facts("edge", vec![
        (Some(0.8.into()), Tuple::from((0i32, 1i32))),  // 80% confidence
        (Some(0.9.into()), Tuple::from((1i32, 2i32))),  // 90% confidence
        (Some(0.7.into()), Tuple::from((2i32, 3i32))),  // 70% confidence
        (Some(0.6.into()), Tuple::from((0i32, 2i32))),  // 60% confidence (shortcut)
    ], false)?;

    // Define transitive closure
    ctx.add_rule("path(a, b) = edge(a, b)")?;
    ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")?;

    // Execute
    ctx.run()?;

    // Query paths with probabilities
    let path = ctx.computed_relation_ref("path")?;

    println!("Probabilistic Paths:");
    for elem in path.iter() {
        if let (Some(Value::I32(from)), Some(Value::I32(to))) =
            (elem.tuple.get(0), elem.tuple.get(1))
        {
            println!("  path({}, {}) with confidence: {:.2}", from, to, elem.tag);
        }
    }

    Ok(())
}

Expected Output:

Probabilistic Paths:
  path(0, 1) with confidence: 0.80
  path(1, 2) with confidence: 0.90
  path(2, 3) with confidence: 0.70
  path(0, 2) with confidence: 0.80  // max(0.6 direct, 0.8 via 1)
  path(1, 3) with confidence: 0.70  // min(0.9, 0.7)
  path(0, 3) with confidence: 0.70  // multiple paths, best is 0.70

Explanation:

  • path(0, 2) has two derivations:

    1. Direct edge with prob 0.6
    2. Via node 1: min(0.8, 0.9) = 0.8
    3. Combined: max(0.6, 0.8) = 0.8
  • path(0, 3) has multiple paths:

    1. Via 1, 2: min(0.8, 0.9, 0.7) = 0.7
    2. Via 2: min(0.8, 0.7) = 0.7 (using shortcut)
    3. Combined: max(0.7, 0.7) = 0.7
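The expected output can be reproduced with a small fixpoint computation in Python. This is an illustrative re-implementation of min-max transitive closure, not the Scallop runtime:

```python
# Probabilistic edges from the example above
edges = {(0, 1): 0.8, (1, 2): 0.9, (2, 3): 0.7, (0, 2): 0.6}

# path(a, b) = edge(a, b); path(a, c) = path(a, b), edge(b, c)
# Under MinMaxProb: mult = min (conjunction), add = max (alternatives)
path = dict(edges)
changed = True
while changed:
    changed = False
    for (a, b), p1 in list(path.items()):
        for (b2, c), p2 in edges.items():
            if b2 == b:
                p = min(p1, p2)                # conjunction: weakest link
                if p > path.get((a, c), 0.0):  # keep the best alternative
                    path[(a, c)] = p
                    changed = True

assert path[(0, 2)] == 0.8  # max(0.6 direct, min(0.8, 0.9) via node 1)
assert path[(1, 3)] == 0.7  # min(0.9, 0.7)
assert path[(0, 3)] == 0.7  # best alternative is still limited by edge(2, 3)
```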

Comparing Probabilistic Provenances

MinMaxProb vs AddMultProb vs TopKProofs

#![allow(unused)]
fn main() {
// Same input facts
let facts = vec![
    (Some(0.8.into()), (0, 1).into()),
    (Some(0.9.into()), (1, 2).into()),
];

// Program: path(a, c) :- edge(a, b), edge(b, c)
}

Results for path(0, 2):

| Provenance | Probability | Interpretation |
|---|---|---|
| MinMaxProb | 0.8 | Weakest link: min(0.8, 0.9) |
| AddMultProb | 0.72 | Independent: 0.8 × 0.9 |
| TopKProofs | 0.72 | Via WMC (equivalent to AddMultProb for a single proof) |

When to use each:

  • MinMaxProb - Fuzzy logic, confidence propagation, when conjunction means “limited by weakest”
  • AddMultProb - Statistical independence, Bayesian reasoning
  • TopKProofs - When you need explanations and multiple derivation paths
  • ProbProofs - When you need exact probabilities with all proof trees

Advanced: Weighted Model Counting (WMC)

For proof-based provenances (TopKProofs, ProbProofs), probabilities are computed via Weighted Model Counting over Boolean formulas.

How It Works

  1. Proofs to Formula:

    Proofs: {{fact_0, fact_1}, {fact_2}}
    Formula: (f₀ ∧ f₁) ∨ f₂
    
  2. Build SDD (Sentential Decision Diagram) for efficient computation

  3. Evaluate with probability semiring:

    f₀ = 0.8, f₁ = 0.9, f₂ = 0.5
    WMC = (0.8 × 0.9) + 0.5 - (0.8 × 0.9 × 0.5)
        = 0.72 + 0.5 - 0.36
        = 0.86
    

Key insight: Proof-based provenances use inclusion-exclusion to compute exact probabilities from potentially overlapping proofs.
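The WMC value above can be checked by brute-force enumeration over all truth assignments. This is a sketch for intuition; real proof-based provenances evaluate an SDD instead of enumerating worlds:

```python
from itertools import product

# Formula from the proofs: (f0 AND f1) OR f2, with the weights above
probs = [0.8, 0.9, 0.5]
formula = lambda f0, f1, f2: (f0 and f1) or f2

# Weighted model counting: sum the weight of every satisfying world
wmc = 0.0
for world in product([True, False], repeat=3):
    if formula(*world):
        weight = 1.0
        for on, p in zip(world, probs):
            weight *= p if on else (1.0 - p)
        wmc += weight

# Matches the inclusion-exclusion computation: (0.8*0.9) + 0.5 - (0.8*0.9*0.5)
assert abs(wmc - 0.86) < 1e-9
```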


Provenance Selection Guide

Quick Reference

| Use Case | Recommended Provenance | Why |
|---|---|---|
| Standard Datalog | UnitProvenance | No overhead, classic semantics |
| Counting derivations | NaturalProvenance | Tracks multiplicity |
| Fuzzy logic / confidence | MinMaxProbProvenance | Simple, efficient |
| Statistical reasoning | AddMultProbProvenance | Models independence |
| Need explanations | TopKProofsProvenance | Provides proof trees |
| Exact probabilities | ProbProofsProvenance | Complete computation |
| ML integration | DiffTopKProofsProvenance | Gradient support |
| Debugging proofs | DiffTopKProofsDebugProvenance | Full observability |

Performance Considerations

Computational cost (low to high):

  1. UnitProvenance - No overhead
  2. NaturalProvenance, MinMaxProbProvenance - Simple arithmetic
  3. AddMultProbProvenance - Inclusion-exclusion
  4. TopKProofsProvenance - WMC with top-K pruning
  5. ProbProofsProvenance - Full WMC (most expensive)
  6. DiffTopKProofsProvenance - WMC + gradient computation

Memory usage:

  • Unit, Boolean, Natural - Minimal (single value)
  • Probabilistic (f64) - 8 bytes per fact
  • Proof-based - Stores DNF formulas (can be large)

Next Steps

Resources

Rust Examples

This page provides an overview of the Rust example projects included with Scallop.

Location

All Rust examples are located in the repository at:

scallop/examples/rust/

Available Examples

The following 6 example projects demonstrate different aspects of the Scallop Rust API:

1. basic_datalog - Getting Started

What it demonstrates:

  • Creating an IntegrateContext
  • Adding Scallop programs
  • Running queries
  • Iterating over results

Difficulty: ⭐ Beginner

Run:

cd examples/rust/basic_datalog
cargo run

2. probabilistic_reasoning - Probabilistic Queries

What it demonstrates:

  • Using MinMaxProbProvenance
  • Adding facts with probabilities
  • Interpreting confidence scores
  • Probabilistic transitive closure

Difficulty: ⭐⭐ Intermediate

Run:

cd examples/rust/probabilistic_reasoning
cargo run

3. foreign_functions - Custom Rust Functions

What it demonstrates:

  • Implementing the ForeignFunction trait
  • String manipulation functions
  • Numeric operations
  • Registering functions with IntegrateContext

Difficulty: ⭐⭐ Intermediate

Run:

cd examples/rust/foreign_functions
cargo run

4. foreign_predicates - Fact Generators

What it demonstrates:

  • Implementing the ForeignPredicate trait
  • Binding patterns (bf, ff)
  • Generating multiple results
  • External data integration

Difficulty: ⭐⭐⭐ Advanced

Run:

cd examples/rust/foreign_predicates
cargo run

5. incremental_evaluation - Dynamic Updates

What it demonstrates:

  • Creating incremental contexts
  • Adding facts incrementally
  • Re-running after updates
  • Efficient incremental computation

Difficulty: ⭐⭐ Intermediate

Run:

cd examples/rust/incremental_evaluation
cargo run

6. complex_reasoning - Advanced Provenance

What it demonstrates:

  • TopKProofsProvenance for proof tracking
  • Extracting derivation proofs
  • Weighted Model Counting
  • Complex reasoning patterns

Difficulty: ⭐⭐⭐ Advanced

Run:

cd examples/rust/complex_reasoning
cargo run

Prerequisites

Nightly Rust Required:

rustup default nightly

Scallop requires nightly Rust due to unstable features:

  • min_specialization
  • extract_if
  • hash_extract_if
  • proc_macro_span

Building All Examples

From the examples/rust/ directory:

for example in basic_datalog probabilistic_reasoning foreign_functions foreign_predicates incremental_evaluation complex_reasoning; do
    echo "Building $example..."
    cd $example && cargo build && cd ..
done

Running All Examples

From the examples/rust/ directory:

for example in basic_datalog probabilistic_reasoning foreign_functions foreign_predicates incremental_evaluation complex_reasoning; do
    echo "=== Running $example ==="
    cd $example && cargo run && cd ..
    echo ""
done

Project Structure

Each example follows this structure:

example_name/
├── Cargo.toml          # Dependencies and project config
├── src/
│   └── main.rs         # Example code with comments
└── README.md           # Example-specific documentation

Common Patterns

Creating a Context

#![allow(unused)]
fn main() {
use scallop_core::integrate::*;
use scallop_core::runtime::provenance::discrete::unit::UnitProvenance;
use scallop_core::runtime::env::RcFamily;

let prov = UnitProvenance::default();
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
}

Adding a Program

#![allow(unused)]
fn main() {
ctx.add_program(r#"
    rel edge = {(0, 1), (1, 2)}
    rel path(a, b) = edge(a, b)
    query path
"#)?;
}

Running and Querying

#![allow(unused)]
fn main() {
ctx.run()?;

let results = ctx.computed_relation_ref("path")?;
for elem in results.iter() {
    println!("{:?}", elem.tuple);
}
}

Troubleshooting

Nightly Rust Error

Error:

error[E0658]: use of unstable library feature 'min_specialization'

Solution:

rustup default nightly

Missing scallop-core Dependency

Error:

error: failed to load manifest for dependency `scallop-core`

Solution: Update Cargo.toml with correct path:

[dependencies]
scallop-core = { path = "../../../core" }

Type Inference Issues

If you encounter type inference errors with generics, specify types explicitly:

#![allow(unused)]
fn main() {
// Instead of:
let mut ctx = IntegrateContext::new(prov);

// Use:
let mut ctx = IntegrateContext::<_, RcFamily>::new(prov);
}

Next Steps

After exploring the examples:

Contributing

Have an example you’d like to add? Contributions welcome!

  1. Create a new directory under examples/rust/
  2. Follow the standard project structure
  3. Add entry to this documentation
  4. Ensure it builds with cargo build
  5. Submit a pull request

Resources

Scallop with Foundation Models

Installing Scallop Plugins

This guide covers how to install and configure Scallop plugins for your environment.

Prerequisites

Before installing plugins, ensure you have:

1. Python Environment

Python 3.8 or higher:

python --version  # Should show Python 3.8+

Virtual environment (recommended):

# Create virtual environment
python -m venv scallop-env

# Activate it
source scallop-env/bin/activate  # On macOS/Linux
# or
scallop-env\Scripts\activate     # On Windows

2. Scallopy Installed

Plugins require the scallopy Python package:

# Install from PyPI
pip install scallopy

# Or install from source
git clone https://github.com/scallop-lang/scallop.git
cd scallop/etc/scallopy
pip install -e .

3. Additional Dependencies

Some plugins have specific requirements:

  • GPU Plugins (CLIP, SAM, Face Detection):

    • CUDA-compatible GPU (optional but recommended)
    • PyTorch with CUDA support
  • API-based Plugins (GPT, Gemini):

    • API keys (see Configuration section)
  • CodeQL Plugin:

    • GitHub CodeQL CLI installed separately

Installing Plugins

Method 1: Install All Plugins (Easiest)

From the Scallop repository root:

cd /path/to/scallop
make -C etc/scallopy-plugins develop

This installs all 11 plugins in development mode, allowing you to modify source code without reinstalling.

Method 2: Install Specific Plugins

Install individual plugin with make:

make -C etc/scallopy-plugins develop-gpt
make -C etc/scallopy-plugins develop-clip
make -C etc/scallopy-plugins develop-gpu

Or install directly with pip:

cd /path/to/scallop/etc/scallopy-plugins/gpt
pip install -e .

cd ../clip
pip install -e .

Method 3: Build and Install Wheels

For production environments, build wheel files:

# Build all plugins
make -C etc/scallopy-plugins build

# Install from wheels
make -C etc/scallopy-plugins install

# Or install specific wheel
pip install etc/scallopy-plugins/gpt/dist/scallop-gpt-*.whl

Development vs Production Installation

| Installation Type | Command | Use Case | Editable |
|---|---|---|---|
| Development | `make develop` or `pip install -e .` | Active development, testing | Yes |
| Production | `make install` or `pip install dist/*.whl` | Deployment, distribution | No |

Development mode creates a symlink, so changes to the source code take effect immediately. Production mode copies files, so a reinstall is required after changes.

Configuration

Environment Variables

Many plugins require configuration via environment variables:

GPT Plugin

export OPENAI_API_KEY="sk-..."

Get an API key: https://platform.openai.com/api-keys

Gemini Plugin

export GEMINI_API_KEY="your-api-key-here"

Get an API key: https://aistudio.google.com/app/apikey

CodeQL Plugin

export CODEQL_PATH="/path/to/codeql-cli"

Install CodeQL: https://github.com/github/codeql-cli-binaries

Persistent Configuration

Add environment variables to your shell profile:

Bash/Zsh (~/.bashrc or ~/.zshrc):

export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="your-key..."
export CODEQL_PATH="/usr/local/bin/codeql"

Or use a .env file:

# .env
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=your-key...

# Load with
source .env

Command-Line Arguments

Most plugins support configuration via CLI arguments:

# GPT configuration
scli program.scl \
  --openai-gpt-model gpt-4 \
  --openai-gpt-temperature 0.0 \
  --num-allowed-openai-request 50

# GPU configuration
scli program.scl --cuda --gpu 0

# CLIP configuration
scli program.scl --clip-model-checkpoint ViT-L/14

List available arguments:

scli --help

Python Configuration

When using the Python API, configure plugins programmatically:

import scallopy

# Create context
ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Create plugin registry
plugin_registry = scallopy.PluginRegistry(load_stdlib=True)

# Load plugins from installed packages
plugin_registry.load_plugins_from_entry_points()

# Configure with arguments
plugin_registry.configure({
    "openai_gpt_model": "gpt-4",
    "num_allowed_openai_request": 50,
    "cuda": True,
    "gpu": 0
}, [])

# Load into context
plugin_registry.load_into_ctx(ctx)

Verification

Check Installed Plugins

From command line:

python -c "import scallopy; registry = scallopy.PluginRegistry(); registry.load_plugins_from_entry_points(); print(registry.loaded_plugins())"

Expected output:

['gpt', 'gemini', 'clip', 'sam', 'transformers', 'opencv', 'face-detection', 'plip', 'codeql', 'gpu']

In Python:

import scallopy

registry = scallopy.PluginRegistry()
registry.load_plugins_from_entry_points()

print("Loaded plugins:", registry.loaded_plugins())

Test a Plugin

Test GPT plugin:

import os
import scallopy

# Set API key
os.environ["OPENAI_API_KEY"] = "sk-..."

# Create context with GPT plugin
ctx = scallopy.ScallopContext()
registry = scallopy.PluginRegistry()
registry.load_plugins_from_entry_points()
registry.configure({}, [])
registry.load_into_ctx(ctx)

# Run simple test
ctx.add_program("""
  rel question = {"What is 2+2?"}
  rel answer(q, a) = question(q), gpt(q, a)
  query answer
""")
ctx.run()

result = list(ctx.relation("answer"))
print("GPT response:", result)

Test CLIP plugin:

import scallopy

ctx = scallopy.ScallopContext()
registry = scallopy.PluginRegistry()
registry.load_plugins_from_entry_points()
registry.configure({}, [])
registry.load_into_ctx(ctx)

# This will trigger model download on first run
ctx.add_program("""
  @clip(labels=["cat", "dog"])
  rel classify(img: Tensor, label: String)

  rel test_img = {$load_image("test.jpg")}
  rel result(img, label) = test_img(img), classify(img, label)
  query result
""")
ctx.run()

Troubleshooting

Plugin Not Found

Error: Plugin 'gpt' not found

Solution: Install the plugin

make -C etc/scallopy-plugins develop-gpt

Import Error

Error: ModuleNotFoundError: No module named 'scallopy'

Solution: Install scallopy first

pip install scallopy

API Key Not Set

Error: OpenAI API key not found

Solution: Set environment variable

export OPENAI_API_KEY="sk-..."

CUDA Out of Memory

Error: RuntimeError: CUDA out of memory

Solution: Use CPU or smaller batch sizes

scli program.scl  # Use CPU (default)
# or reduce model size
scli program.scl --clip-model-checkpoint ViT-B/32  # Smaller model

Model Download Fails

Error: Failed to download model checkpoint

Solution: Check internet connection or download manually

# CLIP models are cached in ~/.cache/clip/
# SAM models in ~/.cache/torch/hub/
# HuggingFace models in ~/.cache/huggingface/

Next Steps

For more help, see the Plugin Reference page.

OpenAI GPT Plugin

The OpenAI GPT plugin integrates GPT-3.5 and GPT-4 language models into Scallop, enabling LLM-powered text processing, classification, information extraction, and generation within logical programs.

Overview

The GPT plugin provides four foreign constructs for different use cases:

  1. $gpt(prompt) - Foreign function for simple text generation
  2. gpt(input, output) - Foreign predicate for flexible fact generation
  3. @gpt - Foreign attribute for few-shot classification/extraction
  4. @gpt_extract_info - Foreign attribute for structured information extraction

Use Cases

  • Text classification: Sentiment analysis, intent detection, category labeling
  • Information extraction: Named entity recognition, relation extraction
  • Text generation: Question answering, translation, summarization
  • Few-shot learning: Provide examples, get consistent results
  • Probabilistic reasoning: Combine LLM outputs with logical rules

Model Support

| Model | Description | Use Case |
|---|---|---|
| gpt-3.5-turbo | Fast, cost-effective | General classification, extraction |
| gpt-4 | Most capable | Complex reasoning, nuanced understanding |
| gpt-4-turbo | Latest GPT-4 | Balanced cost and capability |

Setup and Configuration

Installation

# Install GPT plugin
cd /path/to/scallop
make -C etc/scallopy-plugins develop-gpt

# Or install with pip
cd etc/scallopy-plugins/gpt
pip install -e .

API Key Configuration

The plugin requires an OpenAI API key:

Set environment variable:

export OPENAI_API_KEY="sk-..."

Get an API key:

  1. Visit https://platform.openai.com/api-keys
  2. Sign up or log in
  3. Create a new secret key
  4. Copy and save it securely

Verify configuration:

echo $OPENAI_API_KEY
# Should print: sk-...

Command-Line Options

Configure the plugin when running Scallop programs:

scli program.scl \
  --openai-gpt-model gpt-4 \
  --openai-gpt-temperature 0.0 \
  --num-allowed-openai-request 50

Options:

| Option | Type | Default | Description |
|---|---|---|---|
| --openai-gpt-model | string | gpt-3.5-turbo | GPT model to use |
| --openai-gpt-temperature | float | 0.0 | Sampling temperature (0.0 = deterministic) |
| --num-allowed-openai-request | int | 100 | Maximum API calls per run |

Python API Configuration

import scallopy

# Create context
ctx = scallopy.ScallopContext()

# Load plugin registry
plugin_registry = scallopy.PluginRegistry()
plugin_registry.load_plugins_from_entry_points()

# Configure GPT plugin
plugin_registry.configure({
    "openai_gpt_model": "gpt-4",
    "openai_gpt_temperature": 0.0,
    "num_allowed_openai_request": 50
}, [])

# Load into context
plugin_registry.load_into_ctx(ctx)

# Now use GPT constructs in your program

Rate Limiting

The plugin automatically limits API calls to prevent runaway costs:

  • Default: 100 requests per run
  • Customize with --num-allowed-openai-request
  • Exceeding limit raises exception
  • Memoization reduces redundant calls
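The interaction of memoization and the request cap can be sketched as follows. This is an illustrative model of the described behavior; `call_openai` and `RateLimitedCache` are hypothetical names, not the plugin's actual API:

```python
# Sketch of memoization plus a request cap (hypothetical, for illustration)
class RateLimitedCache:
    def __init__(self, max_requests=100):
        self.max_requests = max_requests
        self.num_requests = 0
        self.cache = {}

    def query(self, prompt, call_openai):
        if prompt in self.cache:          # memoized: no API call made
            return self.cache[prompt]
        if self.num_requests >= self.max_requests:
            raise RuntimeError("Exceeding allowed number of requests")
        self.num_requests += 1
        result = call_openai(prompt)
        self.cache[prompt] = result
        return result

# Repeated prompts cost only one request
limiter = RateLimitedCache(max_requests=1)
fake_llm = lambda p: "4"   # stand-in for a real API call
assert limiter.query("What is 2+2?", fake_llm) == "4"
assert limiter.query("What is 2+2?", fake_llm) == "4"  # served from cache
assert limiter.num_requests == 1
```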

Foreign Function: $gpt

The $gpt foreign function provides simple text-to-text generation:

Signature

$gpt(prompt: String) -> String

Usage

Direct invocation:

rel questions = {
  "What is the capital of France?",
  "Translate to Spanish: Good morning",
  "What is 15 * 24?"
}

rel answers(a) = questions(q), a = $gpt(q)
query answers

Expected output (mock when API key not set):

answers: {
  ("Paris"),
  ("Buenos días"),
  ("360")
}

In expressions:

rel prompts = {"Explain quantum computing in one sentence"}
rel response = {$gpt(p) | prompts(p)}
query response

Memoization

The function automatically memoizes results:

rel repeated = {
  "What is 2+2?",
  "What is 2+2?",  // Same prompt
  "What is 2+2?"   // Same prompt
}

rel answers = {$gpt(q) | repeated(q)}
// Only makes 1 API call!

Error Handling

If API key is not set or rate limit exceeded:

[scallop_openai] `OPENAI_API_KEY` not found, consider setting it in the environment variable

Foreign Predicate: gpt(…)

The gpt foreign predicate provides flexible fact generation with multiple calling patterns:

Signature

gpt(input: String, output: String)

Calling Patterns

Bound-Free (bf) - Generation:

rel questions = {"What is the capital of Spain?", "What is 10 + 5?"}
rel qa(q, a) = questions(q), gpt(q, a)
query qa

// Result: {
//   ("What is the capital of Spain?", "Madrid"),
//   ("What is 10 + 5?", "15")
// }

Bound-Bound (bb) - Verification:

rel candidate_answers = {
  ("What is 2+2?", "4"),
  ("What is 2+2?", "5"),
  ("What is 2+2?", "22")
}

rel verified(q, a) = candidate_answers(q, a), gpt(q, a)
query verified

// Result: {("What is 2+2?", "4")}
// Only the correct answer passes verification

Multiple Outputs

GPT can return multiple completions (requires API configuration):

// With n=3 in API call
rel question = {"What are some programming languages?"}
rel languages(q, lang) = question(q), gpt(q, lang)

// Result (multiple facts from one input):
// languages: {
//   ("What are some programming languages?", "Python, Java, C++"),
//   ("What are some programming languages?", "JavaScript, Ruby, Go"),
//   ("What are some programming languages?", "Rust, Swift, Kotlin")
// }

Foreign Attribute: @gpt

The @gpt attribute provides few-shot classification and extraction with prompt engineering:

Syntax

@gpt(
  header: String,
  prompts: List[{key: value, ...}],
  model: String = "gpt-3.5-turbo",
  debug: bool = false
)
rel relation_name(input1: String, ..., inputN: String, output1: String, ..., outputM: String)

Parameters

| Parameter | Type | Description |
|---|---|---|
| header | String | Instruction/context for the task |
| prompts | List[{…}] | Few-shot examples (input → output) |
| model | String | GPT model to use (default: gpt-3.5-turbo) |
| debug | bool | Print prompts and responses (default: false) |

Example: Sentiment Classification

@gpt(
  header="Classify the sentiment of the following text:",
  prompts=[
    {text: "I love this product!", sentiment: "positive"},
    {text: "This is terrible.", sentiment: "negative"},
    {text: "It's okay, nothing special.", sentiment: "neutral"}
  ],
  model="gpt-3.5-turbo"
)
rel classify_sentiment(text: String, sentiment: String)

rel reviews = {
  "Amazing quality and fast shipping!",
  "Worst purchase ever.",
  "Not bad, could be better.",
  "Absolutely fantastic!"
}

rel results(review, sent) = reviews(review), classify_sentiment(review, sent)
query results

Expected output (mock when API key not set):

results: {
  ("Amazing quality and fast shipping!", "positive"),
  ("Worst purchase ever.", "negative"),
  ("Not bad, could be better.", "neutral"),
  ("Absolutely fantastic!", "positive")
}

Example: Named Entity Extraction

@gpt(
  header="Extract the person's name from the text:",
  prompts=[
    {text: "John went to the store", name: "John"},
    {text: "Mary likes apples", name: "Mary"},
    {text: "Dr. Smith gave a lecture", name: "Dr. Smith"}
  ]
)
rel extract_name(text: String, name: String)

rel sentences = {
  "Alice bought a new car",
  "Bob and Charlie went fishing",
  "Professor Johnson teaches physics"
}

rel people(sentence, person) = sentences(sentence), extract_name(sentence, person)
query people

Expected output (mock when API key not set):

people: {
  ("Alice bought a new car", "Alice"),
  ("Bob and Charlie went fishing", "Bob, Charlie"),
  ("Professor Johnson teaches physics", "Professor Johnson")
}

Example: Multi-Input Pattern

@gpt(
  header="Determine the relationship between two people:",
  prompts=[
    {person1: "Alice", person2: "Bob", relation: "parent"},
    {person1: "Carol", person2: "Dave", relation: "sibling"}
  ]
)
rel infer_relation(person1: String, person2: String, relation: String)

rel pairs = {
  ("John", "Mary"),
  ("Sarah", "Tom"),
  ("Emily", "Emma")
}

rel relationships(p1, p2, rel) = pairs(p1, p2), infer_relation(p1, p2, rel)
query relationships

How It Works

  1. Pattern detection: Analyzes relation signature to determine bound/free variables
  2. Prompt construction: Builds prompt with header + examples + user input
  3. API call: Sends to OpenAI with configured model and temperature
  4. Response parsing: Extracts answer and yields as Scallop fact
  5. Memoization: Caches results to avoid redundant API calls
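Step 2, prompt construction, can be sketched as below. The exact prompt format the plugin uses is internal; this is a plausible illustration, with `build_prompt` a hypothetical helper:

```python
# Hypothetical sketch of few-shot prompt construction (step 2 above);
# the plugin's actual prompt format may differ.
def build_prompt(header, examples, query):
    lines = [header]
    for ex in examples:                        # few-shot examples
        *inputs, output = ex
        lines.append(f"Input: {' | '.join(inputs)}\nOutput: {output}")
    lines.append(f"Input: {query}\nOutput:")   # the fact to complete
    return "\n\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of the following text:",
    [("I love this product!", "positive"),
     ("This is terrible.", "negative")],
    "Amazing quality and fast shipping!",
)
assert prompt.startswith("Classify the sentiment")
assert prompt.endswith("Output:")
```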

Foreign Attribute: @gpt_extract_info

The @gpt_extract_info attribute provides structured information extraction with JSON output:

Syntax

@gpt_extract_info(
  header: String,
  prompts: List[String],
  examples: List[(List[String], List[List[Tuple[...]]])],
  model: String = "gpt-3.5-turbo",
  cot: List[bool] = None,
  debug: bool = false
)
rel relation1(input1: String, ..., output1: String, ...)
rel relation2(input1: String, ..., output2: String, ...)
...

Parameters

| Parameter | Type | Description |
|---|---|---|
| header | String | Task instruction |
| prompts | List[String] | One prompt per relation |
| examples | List[(inputs, outputs)] | Few-shot examples with expected JSON |
| model | String | GPT model (default: gpt-3.5-turbo) |
| cot | List[bool] | Chain-of-thought per relation (default: None) |
| debug | bool | Print debugging info (default: false) |

Example: Multi-Relation Extraction

@gpt_extract_info(
  header="Extract entities and their properties from the text:",
  prompts=[
    "Extract all people mentioned",
    "Extract all locations mentioned",
    "Extract all organizations mentioned"
  ],
  examples=[
    // (inputs, [people_output, location_output, org_output])
    (
      ["Alice works at Google in Mountain View."],
      [
        [("Alice",)],              // people
        [("Mountain View",)],      // locations
        [("Google",)]              // organizations
      ]
    ),
    (
      ["Bob visited Microsoft headquarters in Redmond."],
      [
        [("Bob",)],
        [("Redmond",)],
        [("Microsoft",)]
      ]
    )
  ]
)
rel person(text: String, name: String)
rel location(text: String, place: String)
rel organization(text: String, org: String)

rel texts = {
  "Sarah joined Apple in Cupertino last year.",
  "The meeting with Amazon was held in Seattle."
}

rel all_people(t, p) = texts(t), person(t, p)
rel all_locations(t, l) = texts(t), location(t, l)
rel all_orgs(t, o) = texts(t), organization(t, o)

query all_people
query all_locations
query all_orgs

Expected output (mock when API key not set):

all_people: {
  ("Sarah joined Apple in Cupertino last year.", "Sarah")
}
all_locations: {
  ("Sarah joined Apple in Cupertino last year.", "Cupertino"),
  ("The meeting with Amazon was held in Seattle.", "Seattle")
}
all_orgs: {
  ("Sarah joined Apple in Cupertino last year.", "Apple"),
  ("The meeting with Amazon was held in Seattle.", "Amazon")
}

JSON Output Format

GPT returns JSON like:

{
  "person": [{"name": "Sarah"}],
  "location": [{"place": "Cupertino"}],
  "organization": [{"org": "Apple"}]
}

The plugin automatically:

  • Parses JSON response
  • Maps to declared relations
  • Yields facts with correct types
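The mapping from the JSON response to relation facts can be sketched in a few lines. This is an illustrative parse of the format shown above, not the plugin's actual code; it also omits the bound input columns for brevity:

```python
import json

# JSON response in the format shown above
response = '''{
  "person": [{"name": "Sarah"}],
  "location": [{"place": "Cupertino"}],
  "organization": [{"org": "Apple"}]
}'''

# Map each relation's rows to tuples in declared-field order
schemas = {"person": ["name"], "location": ["place"], "organization": ["org"]}
parsed = json.loads(response)
facts = {
    rel: [tuple(row[field] for field in fields)
          for row in parsed.get(rel, [])]
    for rel, fields in schemas.items()
}

assert facts["person"] == [("Sarah",)]
assert facts["organization"] == [("Apple",)]
```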

Best Practices

Temperature Settings

| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic | Classification, extraction |
| 0.3-0.5 | Slightly varied | Creative text generation |
| 0.7-1.0 | Very creative | Brainstorming, diverse outputs |

Recommendation: Use 0.0 for classification/extraction to ensure consistent results.

Prompt Engineering

✓ Good prompts:

  • Clear, specific instructions
  • 2-5 few-shot examples
  • Consistent formatting
  • Edge cases covered

✗ Bad prompts:

  • Vague instructions
  • No examples
  • Inconsistent formats
  • Missing edge cases

Cost Management

Minimize API calls:

  • Use memoization (automatic)
  • Set --num-allowed-openai-request limit
  • Use gpt-3.5-turbo for simple tasks
  • Batch similar queries

Example:

# Limit to 20 requests for testing
scli program.scl --num-allowed-openai-request 20

Debugging

Enable debug mode to see prompts and responses:

@gpt(
  header="Classify:",
  prompts=[...],
  debug=true  // Enable debugging
)
rel classify(text: String, label: String)

Output:

Prompt: Classify:
Example: {text: "I love this", label: "positive"}
Now classify: {text: "This is great"}
Responses: ["positive"]

Troubleshooting

API Key Not Found

Error:

[scallop_openai] `OPENAI_API_KEY` not found, consider setting it in the environment variable

Solution:

export OPENAI_API_KEY="sk-..."
# Verify
echo $OPENAI_API_KEY

Rate Limit Exceeded

Error:

Exceeding allowed number of requests

Solution:

# Increase limit
scli program.scl --num-allowed-openai-request 200

Incorrect Response Format

If GPT returns unexpected format:

  1. Check prompts: Ensure examples are clear and consistent
  2. Lower temperature: Use 0.0 for deterministic output
  3. Add more examples: 3-5 examples usually work best
  4. Use debug=true: See actual prompts and responses

Model Not Available

If model not accessible:

# Fall back to gpt-3.5-turbo
scli program.scl --openai-gpt-model gpt-3.5-turbo

Next Steps

For API details and pricing, see the OpenAI API Documentation.

Text Embeddings

Text embeddings are vector representations of text that capture semantic meaning. Scallop integrates with various text embedding models to combine neural language understanding with symbolic reasoning.

Overview

Text embeddings enable Scallop to:

  • Match natural language descriptions to structured data
  • Perform semantic similarity comparisons
  • Bridge neural text understanding with logical reasoning
  • Handle multi-modal tasks (text + vision, text + video)

Integration Pattern

Text embeddings are typically provided as input relations to Scallop programs:

import scallopy
import torch
from transformers import AutoTokenizer, AutoModel

# Create embedding model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

def get_embedding(text):
    # Mean-pool the last hidden states into a single vector
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).tolist()

# Create Scallop context
ctx = scallopy.ScallopContext()

# Define input relation for text embeddings
ctx.add_relation("text_embedding", (int, str, list))

# Process text and add embeddings
text = "example description"
embedding = get_embedding(text)
ctx.add_facts("text_embedding", [(0, text, embedding)])

# Add reasoning rules (`similarity` and `target` stand in for a
# user-registered foreign function and a bound target embedding)
ctx.add_rule("match(id) = text_embedding(id, text, emb), similarity(emb, target) > 0.8")

Example: Video-Text Matching (Mugen Dataset)

This example demonstrates using text embeddings with video action recognition to match natural language descriptions to video content.

Neural Components

  • Text Embedding: DistilBERT for text description encoding
  • Vision Embedding: S3D for video frame encoding
  • MLP: 2-layer network (hidden size 256) for feature fusion

Scallop Program

// Input from neural networks
type action(usize, String)        // Video actions detected
type expr(usize, String)          // Text expressions from description
type expr_start(usize)            // Start of text expression
type expr_end(usize)              // End of text expression
type action_start(usize)          // Start of video action
type action_end(usize)            // End of video action

type match_single(usize, usize, usize)      // Single action-expression match
type match_sub(usize, usize, usize, usize)  // Subsequence match

// Check whether a text expression matches a video action
rel match_single(tid, vid, vid + 1) =
    expr(tid, a),
    action(vid, a)

// Match a single text expression to video subsequence
rel match_sub(tid, tid, vid_start, vid_end) =
    match_single(tid, vid_start, vid_end)

rel match_sub(tid, tid, vid_start, vid_end) =
    match_sub(tid, tid, vid_start, vid_mid),
    match_single(tid, vid_mid, vid_end)

// Match a sequence of text expressions to video subsequence
rel match_sub(tid_start, tid_end, vid_start, vid_end) =
    match_sub(tid_start, tid_end - 1, vid_start, vid_mid),
    match_single(tid_end, vid_mid, vid_end)

// Check whether the whole text specification matches the video
rel match() =
    expr_start(tid_start),
    expr_end(tid_end),
    action_start(vid_start),
    action_end(vid_end),
    match_sub(tid_start, tid_end, vid_start, vid_end)

// Integrity constraint: detect too many consecutive identical expressions
rel too_many_consecutive_expr() =
    expr(tid, a),
    expr(tid + 1, a),
    expr(tid + 2, a),
    expr(tid + 3, a)
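The matching rules above can be mirrored with a small fixpoint in Python. This is an illustrative, non-probabilistic re-implementation of the three match_sub rules, not the differentiable Scallop execution:

```python
def matches(exprs, actions):
    """Check whether the full expression sequence matches the full action sequence."""
    # match_single(tid, vid, vid + 1) :- expr(tid, a), action(vid, a)
    single = {(t, v, v + 1) for t, e in enumerate(exprs)
              for v, a in enumerate(actions) if e == a}
    sub = set()
    changed = True
    while changed:
        changed = False
        new = set()
        # match_sub(t, t, vs, ve) :- match_single(t, vs, ve)
        for (t, vs, ve) in single:
            new.add((t, t, vs, ve))
        for (ts, te, vs, vm) in sub:
            for (t2, vm2, ve) in single:
                # match_sub(t, t, vs, ve) :- match_sub(t, t, vs, vm), match_single(t, vm, ve)
                if ts == te == t2 and vm2 == vm:
                    new.add((ts, te, vs, ve))
                # match_sub(ts, te, vs, ve) :- match_sub(ts, te - 1, vs, vm), match_single(te, vm, ve)
                if t2 == te + 1 and vm2 == vm:
                    new.add((ts, t2, vs, ve))
        if not new <= sub:
            sub |= new
            changed = True
    return (0, len(exprs) - 1, 0, len(actions)) in sub

# "walk" may span several identical video actions
assert matches(["walk", "jump"], ["walk", "walk", "jump"])
# Order matters: the sequences must align
assert not matches(["walk", "jump"], ["jump", "walk"])
```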

Training Configuration

  • Dataset: 1K Mugen video-text pairs (training), 1K (testing)
  • Training: 1000 epochs, learning rate 0.0001, batch size 3
  • Loss: BCE-loss for end-to-end training
  • Neural-Symbolic Integration: Embeddings flow into Scallop’s logical reasoning

Key Insights

  1. Structured Matching: Logical rules enforce alignment between text sequence and video sequence
  2. Compositional Reasoning: Text expressions can match video action subsequences
  3. Constraint Enforcement: Integrity constraints detect anomalies (repeated expressions)
  4. Differentiable: Entire pipeline is trainable end-to-end

Common Text Embedding Models

Transformer-based

  • BERT (bert-base-uncased) - General-purpose text understanding
  • DistilBERT (distilbert-base-uncased) - Faster, lighter BERT variant
  • RoBERTa (roberta-base) - Robustly optimized BERT
  • T5 (t5-base) - Text-to-text transformer

Sentence Embeddings

  • Sentence-BERT (sentence-transformers) - Optimized for sentence similarity
  • MPNet - Strong general-purpose sentence embeddings
  • Universal Sentence Encoder - Google’s multilingual embeddings

Domain-Specific

  • BioBERT - Biomedical text
  • SciBERT - Scientific literature
  • CodeBERT - Source code

Integration with Scallop Plugins

The scallop-ext plugin system provides built-in support for text embeddings:

import scallopy

# Use OpenAI embeddings
ctx = scallopy.ScallopContext()
ctx.import_plugin("openai_gpt")

# Text similarity using embeddings
ctx.add_rule("""
  rel similar_docs(d1, d2) =
    document(d1, text1),
    document(d2, text2),
    $openai_text_similarity(text1, text2) > 0.85
""")

Best Practices

  1. Normalize embeddings - Use L2 normalization for cosine similarity
  2. Cache embeddings - Compute once, reuse for multiple queries
  3. Batch processing - Embed multiple texts together for efficiency
  4. Threshold tuning - Adjust similarity thresholds for your domain
  5. Hybrid approaches - Combine embeddings with symbolic rules for robustness
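The first two practices combine naturally: normalize each embedding once, cache the normalized vector, and similarity reduces to a dot product. A minimal pure-Python sketch (real pipelines would use NumPy or PyTorch tensors instead of lists):

```python
import math

def l2_normalize(v):
    """Scale v to unit length; leave the zero vector unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine_similarity(a, b):
    # After L2 normalization, cosine similarity is a plain dot product,
    # so normalized vectors can be cached and compared cheaply.
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.707
```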

Example Use Cases

  • Document retrieval - Semantic search over document collections
  • Text classification - Combine neural embeddings with logical rules
  • Named entity resolution - Match entities using semantic similarity
  • Multi-modal reasoning - Align text with images/video using embeddings
  • Question answering - Match questions to answers semantically

References


For more examples of using embeddings with Scallop, see the OpenAI GPT and Transformers integration guides.

Processing Image Data

Vision Models

Scallop Plugins

Scallop plugins extend the language with external capabilities like large language models, vision processing, code analysis, and more. This guide explains how the plugin system works and what plugins are available.

What are Scallop Plugins?

Scallop plugins are Python packages that extend Scallop’s capabilities by registering foreign functions, foreign predicates, and foreign attributes into the Scallop runtime. They allow you to:

  • Integrate external APIs - Connect to OpenAI GPT, Google Gemini, etc.
  • Process vision and images - Use CLIP, SAM, face detection models
  • Analyze code - Integrate GitHub CodeQL for static analysis
  • Configure execution - Manage GPU/CPU device selection
  • Create domain-specific constructs - Build specialized reasoning tools

Plugins work seamlessly with Scallop’s probabilistic reasoning, provenance tracking, and logical inference capabilities.

Three Extension Mechanisms

Plugins extend Scallop through three primary mechanisms:

  1. Foreign Functions - Pure computations called with $function_name(args)

    rel result = {$gpt("Translate to French: Hello")}
    
  2. Foreign Predicates - Generate facts with bound/free variable patterns

    rel answer(q, a) = question(q), gpt(q, a)
    
  3. Foreign Attributes - Metaprogramming decorators like @gpt, @clip

    @gpt(header="Classify sentiment:", prompts=[...])
    rel classify_sentiment(text: String, sentiment: String)
    

Plugin Architecture

The Three-Hook Lifecycle

Every Scallop plugin implements three lifecycle hooks:

1. setup_argparse(parser) - Declare Command-Line Arguments

def setup_argparse(self, parser):
    parser.add_argument("--gpt-model", type=str, default="gpt-3.5-turbo")
    parser.add_argument("--num-allowed-openai-request", type=int, default=100)

2. configure(args, unknown_args) - Initialize Plugin State

def configure(self, args, unknown_args):
    import os
    self.api_key = os.getenv("OPENAI_API_KEY")
    self.model = args["gpt_model"]

3. load_into_ctx(ctx) - Register Extensions

def load_into_ctx(self, ctx):
    ctx.register_foreign_function(my_function)
    ctx.register_foreign_predicate(my_predicate)
    ctx.register_foreign_attribute(my_attribute)

Plugin Discovery

Plugins are discovered automatically via Python entry points:

# pyproject.toml
[project.entry-points."scallop.plugin"]
gpt = "scallop_gpt:ScallopGPTPlugin"

When you run a Scallop program, the plugin registry:

  1. Discovers all installed plugins via entry points
  2. Calls setup_argparse() to gather CLI arguments
  3. Parses command-line arguments
  4. Calls configure() to initialize plugins
  5. Calls load_into_ctx() to register extensions
  6. Runs your Scallop program with all extensions available

Using Plugins in Python

import scallopy

# Create context and plugin registry
ctx = scallopy.ScallopContext(provenance="minmaxprob")
plugin_registry = scallopy.PluginRegistry(load_stdlib=True)

# Load plugins from installed packages
plugin_registry.load_plugins_from_entry_points()

# Configure plugins with arguments
plugin_registry.configure({"gpt_model": "gpt-4"}, [])

# Load plugins into context
plugin_registry.load_into_ctx(ctx)

# Now use plugins in your program
ctx.add_program("""
  rel question = {"What is the capital of France?"}
  rel answer(q, a) = question(q), gpt(q, a)
  query answer
""")
ctx.run()

Using Plugins with CLI

# Plugins are automatically loaded when using scli
scli program.scl --gpt-model gpt-4 --num-allowed-openai-request 10

# Set API keys via environment variables
export OPENAI_API_KEY="sk-..."
scli program.scl

Available Plugins

Scallop provides 11 plugins across 4 categories:

Language Models (API-based)

Plugin | Description | API Key Required
GPT | OpenAI GPT-3.5/4 integration for text generation, extraction, classification | Yes (OPENAI_API_KEY)
Gemini | Google Gemini 2.0 integration with similar capabilities to GPT | Yes (GEMINI_API_KEY)

Use cases: Sentiment analysis, information extraction, text classification, question answering

Vision Models (Local)

Plugin | Description | Model Download
CLIP | OpenAI CLIP for zero-shot image classification | Auto-download
SAM | Meta’s Segment Anything Model for image segmentation | Auto-download (~2.5GB)
Face Detection | DSFD-based face localization and cropping | Auto-download
OWL-ViT | Open-vocabulary object detection via text queries | Auto-download

Use cases: Image classification, object detection, segmentation, face recognition

Utilities

Plugin | Description | Purpose
GPU | Device management for CUDA/CPU selection | Configure execution device globally
OpenCV | Image I/O and manipulation (load, save, crop, transform) | Image processing pipelines

Use cases: Load/save images, crop regions, GPU acceleration

Specialized

Plugin | Description | Domain
Transformers | HuggingFace models: ViLT (VQA), RoBERTa (text encoding) | Multi-modal AI
PLIP | Protein-ligand interaction prediction (fine-tuned CLIP) | Scientific computing
CodeQL | GitHub CodeQL integration for static code analysis | Software engineering

Use cases: Visual question answering, protein analysis, vulnerability detection

Getting Started

Quick Example: Using GPT Plugin

1. Install the plugin

cd /path/to/scallop
make -C etc/scallopy-plugins develop-gpt

2. Set API key

export OPENAI_API_KEY="sk-..."

3. Create a Scallop program

// sentiment.scl
@gpt(
  header="Classify the sentiment:",
  prompts=[
    {text: "I love this!", sentiment: "positive"},
    {text: "This is terrible.", sentiment: "negative"}
  ]
)
rel classify_sentiment(text: String, sentiment: String)

rel reviews = {
  "Amazing product!",
  "Worst purchase ever.",
  "It's okay."
}

rel result(review, sent) = reviews(review), classify_sentiment(review, sent)
query result

4. Run it

scli sentiment.scl

Expected output:

result: {
  ("Amazing product!", "positive"),
  ("Worst purchase ever.", "negative"),
  ("It's okay.", "neutral")
}

Quick Example: Vision with CLIP

1. Install plugin

make -C etc/scallopy-plugins develop-clip

2. Create program

// classify_images.scl
@clip(labels=["cat", "dog", "car", "person"])
rel classify(img: Tensor, label: String)

rel images = {$load_image("photo1.jpg"), $load_image("photo2.jpg")}
rel result(img, label) = images(img), classify(img, label)
query result

3. Run it

scli classify_images.scl --cuda  # Use GPU if available

Common Workflows

Workflow 1: LLM-Powered Classification

// 1. Define input data
rel documents = {
  "This product exceeded expectations.",
  "Delivery was slow and frustrating.",
  "Average quality for the price."
}

// 2. Use @gpt attribute for classification
@gpt(
  header="Classify as positive/negative/neutral:",
  prompts=[{text: "Great!", label: "positive"}]
)
rel classify(text: String, label: String)

// 3. Apply classification
rel classification(doc, label) = documents(doc), classify(doc, label)

// 4. Aggregate results
rel positive_count(n) = n = count(doc: classification(doc, "positive"))
rel negative_count(n) = n = count(doc: classification(doc, "negative"))

query positive_count
query negative_count

Workflow 2: Vision Pipeline

// 1. Load images
rel image_paths = {"img1.jpg", "img2.jpg", "img3.jpg"}
rel loaded(path, img) = image_paths(path), img = $load_image(path)

// 2. Classify with CLIP
@clip(labels=["indoor", "outdoor"], score_threshold=0.7)
rel classify_scene(img: Tensor, scene: String)

rel scenes(path, scene) = loaded(path, img), classify_scene(img, scene)

// 3. Filter and analyze
rel outdoor_images(path) = scenes(path, "outdoor")

query outdoor_images

Workflow 3: Multi-Plugin Integration

// Combine GPT + Transformers
rel questions = {"What color is the sky?", "How many people?"}

// Use ViLT for visual QA
@vilt(top=3)
rel visual_answer(img: Tensor, q: String, a: String)

// Use GPT to refine answers
@gpt(header="Summarize answer:")
rel refine(raw_answer: String, summary: String)

rel image = {$load_image("scene.jpg")}
rel raw_answers(q, a) = questions(q), image(img), visual_answer(img, q, a)
rel final_answers(q, s) = raw_answers(q, a), refine(a, s)

query final_answers

Documentation Roadmap

Next Steps

  1. Install a plugin - Start with Installation Guide
  2. Try an example - Pick a plugin from the list above and follow its guide
  3. Combine plugins - Use multiple plugins together for complex reasoning
  4. Create your own - Follow the Plugin Development Guide

For questions or issues, see the References page for troubleshooting tips.

Create Your Own Plugin

This guide walks through creating a custom Scallop plugin from scratch. You’ll learn the plugin development workflow, implement foreign constructs, and package your plugin for distribution.

Overview

Creating a Scallop plugin involves:

  1. Project setup - Directory structure and configuration
  2. Plugin class - Implement three lifecycle hooks
  3. Foreign constructs - Add functions, predicates, or attributes
  4. Testing - Verify functionality locally
  5. Distribution - Package and share your plugin

Prerequisites

  • Python 3.8+
  • scallopy installed (pip install scallopy)
  • Basic understanding of Scallop syntax
  • Familiarity with Python decorators

Complete Tutorial: Weather Plugin

We’ll build a plugin that fetches weather data from an API and makes it available in Scallop programs.

Step 1: Project Structure

Create the following directory structure:

scallop-weather/
├── pyproject.toml
├── README.md
└── src/
    └── scallop_weather/
        ├── __init__.py
        └── plugin.py

Create the project directory:

mkdir -p scallop-weather/src/scallop_weather
cd scallop-weather

Step 2: Configuration File

File: pyproject.toml

[project]
name = "scallop-weather"
version = "0.1.0"
description = "Weather data integration for Scallop"
authors = [{name = "Your Name", email = "you@example.com"}]
readme = "README.md"
requires-python = ">=3.8"
dependencies = [
    "scallopy>=0.1.0",
    "requests>=2.28.0",
]

[project.entry-points."scallop.plugin"]
weather = "scallop_weather:ScallopWeatherPlugin"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

Key elements:

  • dependencies - Required packages (scallopy, requests for API calls)
  • project.entry-points."scallop.plugin" - Registers plugin for auto-discovery
  • Entry point format: plugin_name = "module:PluginClass"

Step 3: Plugin Implementation

File: src/scallop_weather/__init__.py

from .plugin import ScallopWeatherPlugin

__all__ = ["ScallopWeatherPlugin"]

File: src/scallop_weather/plugin.py

import os
import requests
import scallopy
from scallopy import foreign_function, foreign_predicate, Facts
from typing import Tuple, Optional

class ScallopWeatherPlugin(scallopy.Plugin):
    """Plugin for fetching weather data in Scallop programs."""

    def __init__(self):
        super().__init__("weather")
        self._api_key: Optional[str] = None
        self._base_url = "https://api.openweathermap.org/data/2.5/weather"
        self._cache = {}  # Memoization cache

    def setup_argparse(self, parser):
        """Hook 1: Declare command-line arguments."""
        parser.add_argument(
            "--weather-api-key",
            type=str,
            help="OpenWeatherMap API key"
        )
        parser.add_argument(
            "--weather-units",
            type=str,
            default="metric",
            choices=["metric", "imperial"],
            help="Temperature units (metric=Celsius, imperial=Fahrenheit)"
        )

    def configure(self, args, unknown_args):
        """Hook 2: Initialize plugin state from arguments."""
        # Get API key from args or environment
        self._api_key = args.get("weather_api_key") or os.getenv("WEATHER_API_KEY")
        self._units = args.get("weather_units", "metric")

        if not self._api_key:
            print("[scallop-weather] Warning: No API key provided.")
            print("  Set WEATHER_API_KEY environment variable or use --weather-api-key")
            print("  Using mock data for demonstrations.")

    def load_into_ctx(self, ctx):
        """Hook 3: Register foreign constructs with Scallop context."""

        # Foreign function: Simple temperature lookup
        @foreign_function(name="get_temperature")
        def get_temperature(city: str) -> float:
            """Get current temperature for a city."""
            if not self._api_key:
                # Mock data when no API key
                mock_temps = {"London": 15.5, "Paris": 18.2, "Tokyo": 22.0}
                return mock_temps.get(city, 20.0)

            # Check cache first
            cache_key = f"temp_{city}"
            if cache_key in self._cache:
                return self._cache[cache_key]

            try:
                response = requests.get(
                    self._base_url,
                    params={
                        "q": city,
                        "appid": self._api_key,
                        "units": self._units
                    },
                    timeout=5
                )
                response.raise_for_status()
                data = response.json()
                temp = data["main"]["temp"]

                # Cache result
                self._cache[cache_key] = temp
                return temp

            except Exception as e:
                print(f"[scallop-weather] Error fetching temperature for {city}: {e}")
                return 0.0

        # Foreign predicate: Full weather data with multiple results
        @foreign_predicate(
            name="weather",
            input_arg_types=[str],
            output_arg_types=[str, float, int]
        )
        def weather_data(city: str) -> Facts[float, Tuple[str, float, int]]:
            """Get weather condition, temperature, and humidity."""
            if not self._api_key:
                # Mock data when no API key
                mock_data = {
                    "London": [("cloudy", 15.5, 72)],
                    "Paris": [("sunny", 18.2, 45), ("partly cloudy", 18.0, 50)],
                    "Tokyo": [("rainy", 22.0, 85)]
                }
                results = mock_data.get(city, [("clear", 20.0, 50)])
                for condition, temp, humidity in results:
                    yield (1.0, (condition, temp, humidity))
                return

            # Check cache
            cache_key = f"weather_{city}"
            if cache_key in self._cache:
                condition, temp, humidity = self._cache[cache_key]
                yield (1.0, (condition, temp, humidity))
                return

            try:
                response = requests.get(
                    self._base_url,
                    params={
                        "q": city,
                        "appid": self._api_key,
                        "units": self._units
                    },
                    timeout=5
                )
                response.raise_for_status()
                data = response.json()

                condition = data["weather"][0]["description"]
                temp = data["main"]["temp"]
                humidity = data["main"]["humidity"]

                # Cache result
                self._cache[cache_key] = (condition, temp, humidity)

                # Yield with probability 1.0 (certain fact)
                yield (1.0, (condition, temp, humidity))

            except Exception as e:
                print(f"[scallop-weather] Error fetching weather for {city}: {e}")

        # Register both constructs with context
        ctx.register_foreign_function(get_temperature)
        ctx.register_foreign_predicate(weather_data)

Step 4: Testing Locally

Install in development mode:

# From scallop-weather/ directory
pip install -e .

Create test Scallop program (test_weather.scl):

// Test foreign function
rel cities = {"London", "Paris", "Tokyo"}
rel temperatures(city, temp) = cities(city), temp = $get_temperature(city)

// Test foreign predicate
rel forecast(city, condition, temp, humidity) =
  cities(city),
  weather(city, condition, temp, humidity)

query temperatures
query forecast

Run the test:

# Without API key (uses mock data)
scli test_weather.scl

# With API key
export WEATHER_API_KEY="your-api-key"
scli test_weather.scl --weather-units metric

Expected output (mock data):

temperatures: {
  ("London", 15.5),
  ("Paris", 18.2),
  ("Tokyo", 22.0)
}

forecast: {
  ("London", "cloudy", 15.5, 72),
  ("Paris", "sunny", 18.2, 45),
  ("Paris", "partly cloudy", 18.0, 50),
  ("Tokyo", "rainy", 22.0, 85)
}

Step 5: Advanced Features

Foreign Attribute Example

Add a foreign attribute for automatic weather monitoring:

# In plugin.py, add to load_into_ctx():

from scallopy import foreign_attribute

@foreign_attribute
def monitor_weather(
    item,
    cities: list,
    update_interval: int = 3600
):
    """
    Foreign attribute that periodically fetches weather.

    Usage:
      @monitor_weather(cities=["London", "Paris"], update_interval=3600)
      rel current_weather(city: String, condition: String, temp: f32)
    """
    # Validate the relation has correct arity
    if not item.is_relation():
        raise Exception("@monitor_weather can only be applied to relations")

    # Generate a foreign predicate
    pred_name = f"_monitor_{item.relation_name()}"

    @foreign_predicate(name=pred_name, output_arg_types=[str, str, float])
    def monitor_impl() -> Facts[float, Tuple[str, str, float]]:
        for city in cities:
            # Call the weather API
            try:
                response = requests.get(
                    self._base_url,
                    params={"q": city, "appid": self._api_key, "units": self._units},
                    timeout=5
                )
                data = response.json()
                condition = data["weather"][0]["description"]
                temp = data["main"]["temp"]
                yield (1.0, (city, condition, temp))
            except Exception as e:
                print(f"[scallop-weather] Error fetching weather for {city}: {e}")

    ctx.register_foreign_predicate(monitor_impl)

    # Add rule to populate the relation
    ctx.add_rule(f"{item.relation_name()}(city, condition, temp) :- {pred_name}(city, condition, temp)")

ctx.register_foreign_attribute(monitor_weather)

Usage in Scallop:

@monitor_weather(cities=["London", "Paris"])
rel current_weather(city: String, condition: String, temp: f32)

query current_weather

Best Practices

1. Error Handling

Always provide graceful fallbacks:

try:
    result = expensive_operation()
    return result
except Exception as e:
    print(f"[plugin-name] Error: {e}")
    return default_value

2. Memoization

Cache expensive operations:

def __init__(self):
    super().__init__("plugin_name")
    self._cache = {}

def expensive_function(self, key):
    if key in self._cache:
        return self._cache[key]

    result = compute_result(key)
    self._cache[key] = result
    return result

3. Lazy Loading

Load heavy dependencies only when needed:

class MyPlugin(scallopy.Plugin):
    def __init__(self):
        super().__init__("my_plugin")
        self._model = None  # Don't load yet

    def _load_model(self):
        if self._model is None:
            import heavy_library
            self._model = heavy_library.load_model()
        return self._model

    def load_into_ctx(self, ctx):
        @foreign_function(name="predict")
        def predict(input_data):
            model = self._load_model()  # Load on first use
            return model.predict(input_data)

        ctx.register_foreign_function(predict)

4. GPU Support

Integrate with GPU utilities plugin:

def load_into_ctx(self, ctx):
    try:
        from scallop_gpu import get_device
        device = get_device()
    except ImportError:
        device = "cpu"

    # Use device for PyTorch models
    model = load_model().to(device)

5. API Key Management

Support multiple configuration methods:

def configure(self, args, unknown_args):
    # Priority: CLI arg > environment > config file
    self._api_key = (
        args.get("my_api_key") or
        os.getenv("MY_API_KEY") or
        self._load_from_config()
    )

    if not self._api_key:
        print("[plugin] Warning: No API key provided")

6. Type Annotations

Always specify types for foreign constructs:

# Foreign function with types
@foreign_function(name="func", return_type=float)
def func(x: int, y: str) -> float:
    ...

# Foreign predicate with input/output types
@foreign_predicate(
    name="pred",
    input_arg_types=[str, int],
    output_arg_types=[float, bool]
)
def pred(s: str, n: int) -> Facts[float, Tuple[float, bool]]:
    ...

7. Mock Data Pattern

Always provide mock data for API-based plugins:

def api_call(self, param):
    if not self._has_credentials():
        # Return mock data with comment
        mock_result = {"data": "example"}
        print("[plugin] Using mock data (no API key)")
        return mock_result

    # Real API call
    return requests.get(self._url, params={"q": param}).json()

Distribution

Building Wheels

Create distributable package:

# Install build tools
pip install build

# Build wheel and source distribution
python -m build

# Output in dist/
# dist/scallop_weather-0.1.0-py3-none-any.whl
# dist/scallop_weather-0.1.0.tar.gz

Publishing to PyPI

Upload to PyPI:

pip install twine

# Test on TestPyPI first
twine upload --repository testpypi dist/*

# Production upload
twine upload dist/*

Users can then install:

pip install scallop-weather

Local Installation Methods

For development:

# Editable install (changes reflected immediately)
pip install -e .

# Or use in Scallop's plugin directory
cd /path/to/scallop/etc/scallopy-plugins
ln -s /path/to/scallop-weather ./weather
make develop-weather

Common Patterns

Pattern 1: Database Integration

@foreign_predicate(name="sql_query", output_arg_types=[str, int])
def sql_query(query: str) -> Facts[float, Tuple[str, int]]:
    """Execute SQL query and return results."""
    import sqlite3
    conn = sqlite3.connect(self._db_path)
    cursor = conn.execute(query)
    for row in cursor:
        yield (1.0, tuple(row))

Pattern 2: File Processing

@foreign_function(name="read_csv")
def read_csv(filepath: str) -> list:
    """Read CSV file and return as list of tuples."""
    import csv
    with open(filepath) as f:
        reader = csv.reader(f)
        return [tuple(row) for row in reader]

Pattern 3: External Tool Integration

@foreign_predicate(name="lint_code", output_arg_types=[str, int, str])
def lint_code(filepath: str) -> Facts[float, Tuple[str, int, str]]:
    """Run linter and yield warnings."""
    import subprocess
    result = subprocess.run(
        ["pylint", filepath],
        capture_output=True,
        text=True
    )
    # Parse output and yield issues; parse_warning is a user-defined helper
    # that extracts a (message, line_number, severity) tuple or returns None
    for line in result.stdout.split('\n'):
        if match := parse_warning(line):
            yield (1.0, match)

Debugging Tips

Enable debug mode:

def __init__(self):
    super().__init__("plugin_name")
    self._debug = False

def setup_argparse(self, parser):
    parser.add_argument("--plugin-debug", action="store_true")

def configure(self, args, unknown_args):
    self._debug = args.get("plugin_debug", False)
    if self._debug:
        print("[plugin] Debug mode enabled")

def load_into_ctx(self, ctx):
    @foreign_function(name="func")
    def func(x):
        if self._debug:
            print(f"[plugin] func called with x={x}")
        return compute(x)

Run with debug flag:

scli program.scl --plugin-debug

Testing Your Plugin

Create test suite (tests/test_plugin.py):

import scallopy
from scallop_weather import ScallopWeatherPlugin

def test_plugin_loads():
    ctx = scallopy.ScallopContext()
    plugin = ScallopWeatherPlugin()
    plugin.configure({}, [])
    plugin.load_into_ctx(ctx)
    assert "get_temperature" in ctx.list_foreign_functions()

def test_foreign_function():
    ctx = scallopy.ScallopContext()
    plugin = ScallopWeatherPlugin()
    plugin.configure({}, [])
    plugin.load_into_ctx(ctx)

    ctx.add_program("""
        rel result = {$get_temperature("London")}
        query result
    """)
    ctx.run()
    results = list(ctx.relation("result"))
    assert len(results) == 1
    assert results[0][0] > 0  # Temperature is positive

def test_foreign_predicate():
    ctx = scallopy.ScallopContext()
    plugin = ScallopWeatherPlugin()
    plugin.configure({}, [])
    plugin.load_into_ctx(ctx)

    ctx.add_program("""
        rel forecast(cond, t, h) = weather("Paris", cond, t, h)
        query forecast
    """)
    ctx.run()
    results = list(ctx.relation("forecast"))
    assert len(results) > 0

Run tests:

pip install pytest
pytest tests/

Next Steps

Now that you’ve created a plugin, explore the following chapters on foreign functions, foreign predicates, and foreign attributes for deeper coverage of each construct.

For questions or contributions, see the Scallop GitHub repository.

Foreign Functions

Foreign functions are pure computational functions that extend Scallop with external capabilities. They allow plugins to provide custom operations that can be called directly within Scallop programs using the $function_name(args) syntax.

What are Foreign Functions?

Definition

A foreign function is a deterministic computation that:

  • Takes one or more input values
  • Returns a single output value
  • Has no side effects (pure function)
  • Can fail gracefully (e.g., divide-by-zero returns no result)

Syntax

Foreign functions are invoked with a dollar sign prefix:

rel result = {$function_name(arg1, arg2, ...)}
rel computed(x, y) = data(x), y = $function_name(x)

Key Characteristics

Pure computation:

  • Same inputs always produce same outputs
  • No randomness, no external state modification
  • Can be memoized for efficiency

Type-safe:

  • Arguments and return types are declared
  • Type checking at compile time
  • Automatic type conversion where possible

Partial functions:

  • May fail on some inputs (division by zero, index out of bounds)
  • Failed computations are silently dropped from results
  • No exceptions are propagated to the Scallop runtime

Using Foreign Functions

Basic Usage

Foreign functions are typically provided by plugins and become available after plugin loading:

import scallopy

# Create context and load plugins
ctx = scallopy.ScallopContext()
plugin_registry = scallopy.PluginRegistry()
plugin_registry.load_plugins_from_entry_points()
plugin_registry.load_into_ctx(ctx)

# Now foreign functions from plugins are available
ctx.add_program("""
  rel image_path = {"photo.jpg"}
  rel loaded(img) = image_path(path), img = $load_image(path)
  query loaded
""")
ctx.run()

In Scallop Programs

As value generators:

rel images = {
  $load_image("photo1.jpg"),
  $load_image("photo2.jpg"),
  $load_image("photo3.jpg")
}

In rule bodies:

rel processed(path, result) =
  image_paths(path),
  img = $load_image(path),
  result = $apply_filter(img, "blur")

With aggregations:

rel total = {$sum(x) | values(x)}
rel concat_all = {$string_join(s, ", ") | strings(s)}

Type Conversion

Scallop automatically converts between compatible types:

Scallop Type | Python Type | Notes
i8, i16, i32, i64, isize | int | Integer family
u8, u16, u32, u64, usize | int | Unsigned integers
f32, f64 | float | Floating point
String | str | Text
bool | bool | Boolean
Tensor | torch.Tensor | PyTorch tensors

Error Handling

Functions that fail produce no output:

rel indices = {0, 1, 2, 5, 10}
rel chars(i, c) = indices(i), c = $string_char_at("hello", i)
query chars

// Result: {(0, 'h'), (1, 'e'), (2, 'l')}
// Indices 5 and 10 are out of bounds and silently dropped
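The same silently-dropped semantics can be sketched in plain Python (`safe_apply` is an illustrative helper, not part of scallopy):

```python
def safe_apply(f, *args):
    """Mimic the runtime: on failure, produce no tuple instead of raising."""
    try:
        return [f(*args)]
    except Exception:
        return []

# $string_char_at("hello", i) succeeds only for valid indices;
# failing calls contribute nothing to the result set.
chars = [(i, c) for i in [0, 1, 2, 5, 10]
         for c in safe_apply(lambda i=i: "hello"[i])]
print(chars)  # [(0, 'h'), (1, 'e'), (2, 'l')]
```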

Argument Types

Foreign functions support a rich type system for arguments and return values.

Basic Types

Python Type | Scallop Types | Description | Example
int | i8, i16, i32, i64, isize | Signed integers | 42, -100
int | u8, u16, u32, u64, usize | Unsigned integers | 255, 1000
float | f32, f64 | Floating point | 3.14, 2.718
str | String | Text strings | "hello"
bool | bool | Boolean values | true, false
torch.Tensor | Tensor | PyTorch tensors | Images, embeddings

Type Annotations

Always use explicit type annotations:

# ✓ Correct: explicit types
@scallopy.foreign_function
def add(x: int, y: int) -> int:
    return x + y

# ✗ Wrong: missing annotations (will fail)
@scallopy.foreign_function
def add(x, y):
    return x + y

Automatic Type Conversion

Scallop performs automatic conversions between compatible types:

Integer conversions:

@scallopy.foreign_function
def process(x: int) -> int:
    return x * 2

# Works with any Scallop integer type:
# i32(5) → 10
# u64(3) → 6

Float conversions:

@scallopy.foreign_function
def square_root(x: float) -> float:
    import math
    return math.sqrt(x)

# Accepts f32 or f64:
# f32(16.0) → 4.0
# f64(25.0) → 5.0

String handling:

@scallopy.foreign_function
def uppercase(s: str) -> str:
    return s.upper()

# String("hello") → "HELLO"

Tensor Types

PyTorch tensors are first-class types in Scallop:

import torch
import scallopy

@scallopy.foreign_function
def normalize_image(img: scallopy.Tensor) -> scallopy.Tensor:
    """Normalize image tensor to [0, 1] range."""
    tensor = img.float()
    return tensor / 255.0

@scallopy.foreign_function
def tensor_shape(img: scallopy.Tensor) -> str:
    """Get tensor shape as string."""
    return str(tuple(img.shape))

Usage:

rel image = {$load_image("photo.jpg")}
rel normalized(n) = image(img), n = $normalize_image(img)
rel shape(s) = image(img), s = $tensor_shape(img)

Generic Types

Use generic type parameters for functions that work with multiple types:

T = scallopy.ScallopGenericTypeParam(scallopy.Number)

@scallopy.foreign_function
def maximum(*values: T) -> T:
    """Return maximum of any numeric type."""
    return max(values)

# Works with any numeric type:
# $maximum(1, 5, 3) → 5 (integers)
# $maximum(1.5, 2.7, 0.3) → 2.7 (floats)

Built-in generic constraints:

  • scallopy.Number - Any numeric type (int or float)
  • scallopy.Any - Any Scallop type

Optional Arguments

Foreign functions support optional arguments with default values.

Basic Optional Arguments

@scallopy.foreign_function
def greet(name: str, title: str = "Mr./Ms.") -> str:
    """Greet someone with optional title."""
    return f"Hello, {title} {name}!"

Scallop usage:

// With default title
rel greeting1 = {$greet("Smith")}
// Result: {"Hello, Mr./Ms. Smith!"}

// With custom title
rel greeting2 = {$greet("Johnson", "Dr.")}
// Result: {"Hello, Dr. Johnson!"}

Multiple Optional Arguments

@scallopy.foreign_function
def format_number(
    value: float,
    decimals: int = 2,
    prefix: str = "",
    suffix: str = ""
) -> str:
    """Format number with optional prefix, suffix, and precision."""
    formatted = f"{value:.{decimals}f}"
    return f"{prefix}{formatted}{suffix}"

Scallop usage:

rel numbers = {3.14159, 2.71828}

// Just value (all defaults)
rel simple(n, s) = numbers(n), s = $format_number(n)
// Result: {(3.14159, "3.14"), (2.71828, "2.72")}

// With precision
rel precise(n, s) = numbers(n), s = $format_number(n, 4)
// Result: {(3.14159, "3.1416"), (2.71828, "2.7183")}

// With all options
rel fancy(n, s) = numbers(n), s = $format_number(n, 2, "$", " USD")
// Result: {(3.14159, "$3.14 USD"), (2.71828, "$2.72 USD")}
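Since the function body is ordinary Python, the rounding behavior shown above can be checked standalone (decorator omitted here so the sketch runs without Scallop):

```python
# Standalone sketch of format_number (scallopy decorator dropped);
# mirrors the Scallop results shown above.
def format_number(value: float, decimals: int = 2,
                  prefix: str = "", suffix: str = "") -> str:
    formatted = f"{value:.{decimals}f}"
    return f"{prefix}{formatted}{suffix}"

print(format_number(3.14159))                  # 3.14
print(format_number(3.14159, 4))               # 3.1416
print(format_number(2.71828, 2, "$", " USD"))  # $2.72 USD
```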

Optional with None

Use None as default for truly optional parameters:

from typing import Optional

STORAGE = {"existing_key": "value"}  # example backing store

@scallopy.foreign_function
def fetch_or_default(key: str, default: Optional[str] = None) -> str:
    """Fetch value or return default."""
    if key in STORAGE:
        return STORAGE[key]
    return default if default is not None else "N/A"

Scallop usage:

rel keys = {"existing_key", "missing_key"}

// Without default
rel results1(k, v) = keys(k), v = $fetch_or_default(k)
// Result: {("existing_key", "value"), ("missing_key", "N/A")}

// With default
rel results2(k, v) = keys(k), v = $fetch_or_default(k, "DEFAULT")
// Result: {("existing_key", "value"), ("missing_key", "DEFAULT")}

Variable Arguments

Foreign functions can accept variable numbers of arguments using *args.

Basic Variable Arguments

@scallopy.foreign_function
def sum_all(*args: int) -> int:
    """Sum any number of integers."""
    return sum(args)

Scallop usage:

// Different numbers of arguments
rel sum2 = {$sum_all(1, 2)}           // 3
rel sum3 = {$sum_all(1, 2, 3)}        // 6
rel sum5 = {$sum_all(1, 2, 3, 4, 5)}  // 15

String Concatenation

@scallopy.foreign_function
def concat(*strings: str) -> str:
    """Concatenate any number of strings."""
    return "".join(strings)

Scallop usage:

rel greeting = {$concat("Hello", ", ", "World", "!")}
// Result: {"Hello, World!"}

rel path = {$concat("/", "usr", "/", "local", "/", "bin")}
// Result: {"/usr/local/bin"}

Variable Arguments with Separator

@scallopy.foreign_function
def join_with(separator: str, *parts: str) -> str:
    """Join strings with specified separator."""
    return separator.join(parts)

Scallop usage:

rel csv = {$join_with(",", "apple", "banana", "cherry")}
// Result: {"apple,banana,cherry"}

rel path = {$join_with("/", "home", "user", "documents")}
// Result: {"home/user/documents"}

Mixed Fixed and Variable Arguments

@scallopy.foreign_function
def weighted_average(weight: float, *values: float) -> float:
    """Compute weighted average."""
    if not values:
        return 0.0
    return sum(v * weight for v in values) / len(values)

Scallop usage:

rel numbers = {1.0, 2.0, 3.0, 4.0, 5.0}
rel weighted(w, avg) = w = 0.8, avg = $weighted_average(w, 1.0, 2.0, 3.0)
// Result: {(0.8, 1.6)}

Generic Variable Arguments

T = scallopy.ScallopGenericTypeParam(scallopy.Number)

@scallopy.foreign_function
def min_value(*values: T) -> T:
    """Find minimum of any numeric type."""
    return min(values)

@scallopy.foreign_function
def max_value(*values: T) -> T:
    """Find maximum of any numeric type."""
    return max(values)

Scallop usage:

rel int_min = {$min_value(5, 2, 8, 1, 9)}      // 1
rel float_max = {$max_value(1.5, 3.2, 0.7)}    // 3.2

Error Handling in Foreign Functions

Foreign functions should handle errors gracefully to maintain Scallop’s declarative semantics.

Exception Handling

When a foreign function raises an exception, Scallop drops that computation from results:

@scallopy.foreign_function
def safe_divide(a: float, b: float) -> float:
    """Divide with zero-check."""
    if b == 0:
        raise ValueError("Division by zero")
    return a / b

Scallop behavior:

rel operations = {(10.0, 2.0), (15.0, 3.0), (8.0, 0.0), (20.0, 4.0)}
rel results(a, b, r) = operations(a, b), r = $safe_divide(a, b)
query results

// Result: {(10.0, 2.0, 5.0), (15.0, 3.0, 5.0), (20.0, 4.0, 5.0)}
// (8.0, 0.0) is silently dropped - no error message
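The drop-on-exception semantics can be sketched in plain Python: apply the function to each tuple and silently skip the ones that raise. (This is an illustration of the semantics, not Scallop's actual machinery.)

```python
# Plain-Python sketch of Scallop's drop-on-exception behaviour.
def safe_divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Division by zero")
    return a / b

def apply_dropping(fn, rows):
    out = []
    for row in rows:
        try:
            out.append(row + (fn(*row),))
        except Exception:
            pass  # dropped, mirroring Scallop
    return out

rows = [(10.0, 2.0), (15.0, 3.0), (8.0, 0.0), (20.0, 4.0)]
print(apply_dropping(safe_divide, rows))
# [(10.0, 2.0, 5.0), (15.0, 3.0, 5.0), (20.0, 4.0, 5.0)]
```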

Graceful Degradation

Return default values instead of raising exceptions when appropriate:

@scallopy.foreign_function
def safe_index(s: str, idx: int) -> str:
    """Get character at index, return empty string if out of bounds."""
    try:
        return s[idx]
    except IndexError:
        return ""  # Graceful fallback

Scallop usage:

rel text = {"hello"}
rel indices = {0, 1, 2, 10, 20}
rel chars(i, c) = text(t), indices(i), c = $safe_index(t, i)

// Result: {(0, "h"), (1, "e"), (2, "l"), (10, ""), (20, "")}
// Out-of-bounds indices return empty string instead of failing
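The fallback behavior is easy to verify as plain Python (decorator omitted):

```python
# Standalone sketch of the graceful-degradation pattern above.
def safe_index(s: str, idx: int) -> str:
    try:
        return s[idx]
    except IndexError:
        return ""  # graceful fallback instead of an exception

pairs = [(i, safe_index("hello", i)) for i in (0, 1, 2, 10, 20)]
print(pairs)  # [(0, 'h'), (1, 'e'), (2, 'l'), (10, ''), (20, '')]
```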

Partial Functions

Some functions are inherently partial (undefined for some inputs). Use exceptions to signal undefined cases:

@scallopy.foreign_function
def sqrt(x: float) -> float:
    """Square root - undefined for negative numbers."""
    import math
    if x < 0:
        raise ValueError("Cannot take square root of negative number")
    return math.sqrt(x)

Scallop behavior:

rel numbers = {16.0, 25.0, -9.0, 36.0}
rel roots(n, r) = numbers(n), r = $sqrt(n)

// Result: {(16.0, 4.0), (25.0, 5.0), (36.0, 6.0)}
// -9.0 is dropped (undefined)

Input Validation

Validate inputs and raise exceptions for invalid cases:

@scallopy.foreign_function
def parse_age(s: str) -> int:
    """Parse age string; must be a valid integer between 0 and 150."""
    try:
        age = int(s)
    except ValueError:
        raise ValueError(f"Cannot parse age from '{s}'")
    if age < 0 or age > 150:
        raise ValueError(f"Invalid age: {age}")
    return age

Scallop usage:

rel age_strings = {"25", "30", "invalid", "200", "45"}
rel ages(s, a) = age_strings(s), a = $parse_age(s)

// Result: {("25", 25), ("30", 30), ("45", 45)}
// "invalid" (not a number) and "200" (out of range) are dropped

Logging Errors

Log errors for debugging while still maintaining graceful behavior:

import logging

@scallopy.foreign_function
def fetch_data(url: str) -> str:
    """Fetch data from URL with error logging."""
    import requests
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        logging.error(f"Failed to fetch {url}: {e}")
        raise  # Re-raise to drop from Scallop results

Behavior:

  • Successful fetches return data
  • Failed fetches are logged and dropped from results
  • Scallop program continues execution

Best Practices for Error Handling

✓ Good practices:

# 1. Clear error messages
@scallopy.foreign_function
def validate_email(email: str) -> bool:
    if "@" not in email:
        raise ValueError(f"Invalid email format: {email}")
    return True

# 2. Explicit None checks
@scallopy.foreign_function
def safe_operation(value: Optional[str]) -> str:
    if value is None:
        raise ValueError("Value cannot be None")
    return value.upper()

# 3. Type validation
@scallopy.foreign_function
def process_positive(x: int) -> int:
    if x <= 0:
        raise ValueError(f"Expected positive integer, got {x}")
    return x * 2

✗ Avoid:

# Don't silence all errors
@scallopy.foreign_function
def bad_function(x: str) -> str:
    try:
        return risky_operation(x)
    except:  # Too broad!
        return ""  # Hides real problems

# Don't use print() for errors
@scallopy.foreign_function
def bad_logging(x: int) -> int:
    if x < 0:
        print("Error: negative value")  # User won't see this
        raise ValueError("Negative value")
    return x

Error Recovery Patterns

Pattern 1: Try multiple strategies

@scallopy.foreign_function
def flexible_parse(s: str) -> float:
    """Try multiple parsing strategies."""
    # Strategy 1: Direct float conversion
    try:
        return float(s)
    except ValueError:
        pass

    # Strategy 2: Remove commas
    try:
        return float(s.replace(",", ""))
    except ValueError:
        pass

    # Strategy 3: Extract first number
    import re
    match = re.search(r'-?\d+\.?\d*', s)
    if match:
        return float(match.group())

    raise ValueError(f"Cannot parse number from '{s}'")

Pattern 2: Fallback values

DATABASE = {}  # stand-in for some preloaded key-value store

@scallopy.foreign_function
def get_or_default(key: str, default: str = "UNKNOWN") -> str:
    """Get value with fallback."""
    if key in DATABASE:
        return DATABASE[key]
    return default  # No exception, returns default

Pattern 3: Validation before computation

@scallopy.foreign_function
def safe_compute(x: int, y: int) -> int:
    """Compute with pre-validation."""
    # Validate inputs first
    if x < 0 or y < 0:
        raise ValueError("Inputs must be non-negative")
    if x + y > 1000000:
        raise ValueError("Result would be too large")

    # Safe to compute
    return expensive_operation(x, y)

Examples from Plugins

OpenCV Plugin: Image I/O

The OpenCV plugin provides several foreign functions for image manipulation:

Loading images:

@scallopy.foreign_function
def load_image(image_dir: str) -> scallopy.Tensor:
    from PIL import Image
    import torch, numpy

    image = Image.open(image_dir).convert("RGB")
    image_tensor = torch.tensor(numpy.asarray(image))
    return image_tensor

Usage:

rel image_paths = {"cat.jpg", "dog.jpg", "bird.jpg"}
rel images(path, img) = image_paths(path), img = $load_image(path)
query images

Cropping images:

@scallopy.foreign_function
def crop_image(
    img: scallopy.Tensor,
    bbox_x: scallopy.u32,
    bbox_y: scallopy.u32,
    bbox_w: scallopy.u32,
    bbox_h: scallopy.u32,
    loc: Optional[str] = None
) -> scallopy.Tensor:
    # Crop implementation with optional location modifier
    # ...
    return img[y1:y2, x1:x2, :]

Usage:

rel original = {$load_image("photo.jpg")}
rel face_region(cropped) = original(img), cropped = $crop_image(img, 100, 50, 200, 200)
rel enlarged(result) = face_region(img), result = $crop_image(img, 0, 0, 300, 300, "enlarge(1.5)")

GPT Plugin: Text Generation

The GPT plugin provides a simple text-to-text foreign function:

Implementation:

@scallopy.foreign_function
def gpt(prompt: str) -> str:
    if prompt in STORAGE:  # Memoization
        return STORAGE[prompt]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    result = response["choices"][0]["message"]["content"].strip()
    STORAGE[prompt] = result
    return result

Usage:

rel questions = {
  "What is the capital of France?",
  "Translate to Spanish: Hello",
  "What is 15 * 24?"
}

rel answers(q, a) = questions(q), a = $gpt(q)
query answers

// Expected output (mock when API key not set):
// answers: {
//   ("What is the capital of France?", "Paris"),
//   ("Translate to Spanish: Hello", "Hola"),
//   ("What is 15 * 24?", "360")
// }

Built-in Mathematical Functions

Scallop includes many built-in foreign functions:

rel numbers = {-5, 0, 3, 7}

// Absolute value
rel abs_vals(x, y) = numbers(x), y = $abs(x)
// Result: {(-5, 5), (0, 0), (3, 3), (7, 7)}

// String formatting
rel formatted(s) = numbers(x), s = $format("Value: {}", x)
// Result: {("Value: -5"), ("Value: 0"), ...}

// Hash function
rel hashes(x, h) = numbers(x), h = $hash(x)

Creating Foreign Functions in Plugins

Basic Foreign Function

To create a foreign function in a plugin:

import scallopy

class MyPlugin(scallopy.Plugin):
    def __init__(self):
        super().__init__("my_plugin")

    def load_into_ctx(self, ctx):
        # Define and register foreign function
        @scallopy.foreign_function
        def double(x: int) -> int:
            return x * 2

        ctx.register_foreign_function(double)

With Optional Arguments

@scallopy.foreign_function
def greet(name: str, title: str = "Mr./Ms.") -> str:
    return f"Hello, {title} {name}!"

# Can be called as:
# $greet("Smith") → "Hello, Mr./Ms. Smith!"
# $greet("Smith", "Dr.") → "Hello, Dr. Smith!"

With Variable Arguments

@scallopy.foreign_function
def my_sum(*args: int) -> int:
    return sum(args)

# Can be called with any number of arguments:
# $my_sum(1, 2) → 3
# $my_sum(1, 2, 3, 4, 5) → 15

With Generic Types

T = scallopy.ScallopGenericTypeParam(scallopy.Number)

@scallopy.foreign_function
def maximum(*values: T) -> T:
    return max(values)

# Works with any numeric type:
# $maximum(1, 5, 3) → 5 (integers)
# $maximum(1.5, 2.7, 0.3) → 2.7 (floats)

Error Handling

Foreign functions should handle errors gracefully:

@scallopy.foreign_function
def safe_divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Division by zero")  # Handled by Scallop
    return a / b

When the function raises an exception, Scallop drops that computation:

rel operations = {(10.0, 2.0), (15.0, 3.0), (8.0, 0.0)}
rel results(a, b, r) = operations(a, b), r = $safe_divide(a, b)
query results

// Result: {(10.0, 2.0, 5.0), (15.0, 3.0, 5.0)}
// (8.0, 0.0) is dropped due to division by zero

Best Practices

Memoization for Expensive Operations

Cache results of expensive computations:

CACHE = {}

@scallopy.foreign_function
def expensive_operation(x: str) -> str:
    if x not in CACHE:
        # Expensive computation here
        CACHE[x] = compute_result(x)
    return CACHE[x]

Lazy Loading

Load heavy dependencies only when needed:

_MODEL = None

@scallopy.foreign_function
def use_model(input: str) -> str:
    global _MODEL
    if _MODEL is None:
        import heavy_ml_library
        _MODEL = heavy_ml_library.load_model()
    return _MODEL.predict(input)

Type Safety

Always annotate types explicitly:

# ✓ Good: explicit types
@scallopy.foreign_function
def process(x: int, y: str) -> bool:
    return len(y) > x

# ✗ Bad: missing type annotations
@scallopy.foreign_function
def process(x, y):  # Will fail at registration
    return len(y) > x

Next Steps

For more details on the language-level foreign function syntax, see Foreign Functions (Language).

Foreign Predicates

Foreign predicates are fact generators that extend Scallop by dynamically producing facts based on input patterns. Unlike foreign functions that return single values, foreign predicates can yield multiple results and support flexible bound/free variable patterns.

What are Foreign Predicates?

Definition

A foreign predicate is a Python generator function that:

  • Takes input arguments (bound variables)
  • Yields zero or more facts (bound + free variables)
  • Can produce probabilistic facts with tags
  • Supports pattern-based querying

Syntax

Foreign predicates are called like regular Scallop predicates:

rel result(input, output) = data(input), my_predicate(input, output)
rel check() = data(x) and my_predicate(x)  // Boolean predicate

Foreign Predicates vs Foreign Functions

Feature    | Foreign Function      | Foreign Predicate
-----------|-----------------------|-----------------------------
Invocation | $function(args)       | predicate(args)
Returns    | Single value          | Zero or more facts
Pattern    | All inputs bound      | Supports bound/free patterns
Use case   | Pure computation      | Fact generation, search
Tag        | Inherits from context | Per-fact probability

Key Characteristics

Multi-valued:

  • Can yield multiple results for a single input
  • Empty results are valid (no facts generated)
  • Each result is independent

Pattern-driven:

  • Supports multiple calling patterns (bb, bf, fb, ff)
  • “b” = bound (input provided), “f” = free (output to generate)
  • Example: gpt(input, output) supports bb (check) and bf (generate)

Probabilistic:

  • Each yielded fact has a probability/tag
  • Tags integrate with Scallop’s provenance system
  • Enables fuzzy matching and uncertain reasoning
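The bound/free calling patterns can be sketched in plain Python over a small fact set (the relation and names here are illustrative, not from a real plugin):

```python
# Illustrative fact set: (child, parent) pairs.
FACTS = {("ann", "bob"), ("ann", "carol"), ("dan", "bob")}

def parent_bf(child):
    """bf: child bound, parent free - enumerate matching parents."""
    for c, p in FACTS:
        if c == child:
            yield p

def parent_bb(child, parent):
    """bb: both bound - verification; yields the empty tuple on success."""
    if (child, parent) in FACTS:
        yield ()

print(sorted(parent_bf("ann")))         # ['bob', 'carol']
print(list(parent_bb("dan", "bob")))    # [()]
print(list(parent_bb("dan", "carol")))  # []
```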

Using Foreign Predicates

Basic Usage

Foreign predicates become available after plugin loading:

import scallopy

# Create context with provenance
ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Load plugins (e.g., GPT plugin)
plugin_registry = scallopy.PluginRegistry()
plugin_registry.load_plugins_from_entry_points()
plugin_registry.load_into_ctx(ctx)

# Now use foreign predicates from plugins
ctx.add_program("""
  rel question = {"What is the capital of France?"}
  rel answer(q, a) = question(q), gpt(q, a)
  query answer
""")
ctx.run()

Calling Patterns

Bound-Free (bf) - Generation:

// Input is bound, output is free - generate answers
rel questions = {"What is 2+2?", "What is the capital of Spain?"}
rel answers(q, a) = questions(q), gpt(q, a)

Bound-Bound (bb) - Verification:

// Both bound - check if answer is correct
rel question_answer_pairs = {
  ("What is 2+2?", "4"),
  ("What is 2+2?", "5")
}
rel verified(q, a) = question_answer_pairs(q, a), gpt(q, a)
// Only ("What is 2+2?", "4") passes verification

Multiple results:

// Foreign predicate can yield multiple answers
rel query = {"What are some French cities?"}
rel cities(c) = query(q), list_cities(q, c)
// Yields: Paris, Lyon, Marseille, Nice, ...

Integration with Rules

Chaining predicates:

rel document = {"The quick brown fox jumps over the lazy dog"}
rel extracted_entity(doc, entity) = document(doc), extract_entities(doc, entity)
rel classified(entity, type) = extracted_entity(_, entity), classify_entity(entity, type)

Filtering results:

rel candidates(x) = source(s), generate_options(s, x)
rel filtered(x) = candidates(x), validate(x)  // Boolean predicate

Aggregation:

rel all_answers(a) = question(q), gpt(q, a)
rel num_unique_answers(n) = n = count(a: all_answers(a))

Examples from Plugins

GPT Plugin: Text Generation

The GPT plugin provides a foreign predicate that queries the OpenAI API:

Implementation:

@scallopy.foreign_predicate
def gpt(s: str) -> scallopy.Facts[None, str]:
    # Check memoization cache
    if s in STORAGE:
        response = STORAGE[s]
    else:
        # Call OpenAI API
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": s}],
            temperature=0.0,
        )
        STORAGE[s] = response

    # Iterate through all choices (typically 1)
    for choice in response["choices"]:
        result = choice["message"]["content"].strip()
        yield (result,)

Usage:

rel questions = {
  "What is the capital of France?",
  "Translate to Spanish: Good morning",
  "List three colors"
}

rel qa(question, answer) = questions(question), gpt(question, answer)
query qa

// Expected output (mock when API key not set):
// qa: {
//   ("What is the capital of France?", "Paris"),
//   ("Translate to Spanish: Good morning", "Buenos días"),
//   ("List three colors", "Red, blue, green")
// }

Multiple outputs:

// Configure GPT to return multiple answers
rel question = {"What are some programming languages?"}
rel languages(q, lang) = question(q), gpt_multi(q, lang)

// With n=3 completions, yields multiple facts:
// languages: {
//   ("What are some programming languages?", "Python, Java, C++"),
//   ("What are some programming languages?", "JavaScript, Ruby, Go"),
//   ("What are some programming languages?", "Rust, Swift, Kotlin")
// }

Custom Example: Semantic Similarity

@scallopy.foreign_predicate
def string_semantic_eq(s1: str, s2: str) -> scallopy.Facts[float, Tuple]:
    """Fuzzy string matching for kinship terms"""
    equivalents = {
        ("mom", "mother"): 0.99,
        ("mom", "mom"): 1.0,
        ("mother", "mother"): 1.0,
        ("dad", "father"): 0.99,
        ("dad", "dad"): 1.0,
        ("father", "father"): 1.0,
    }

    if (s1, s2) in equivalents:
        yield (equivalents[(s1, s2)], ())

Usage:

rel kinship = {
  ("alice", "mom", "bob"),
  ("alice", "mother", "casey"),
  ("david", "father", "emma")
}

rel parent(person, child) =
  kinship(person, relation, child),
  string_semantic_eq(relation, "mother")

rel parent(person, child) =
  kinship(person, relation, child),
  string_semantic_eq(relation, "father")

query parent

// Result (with probabilities):
// parent: {
//   0.99::("alice", "bob"),      // "mom" ~= "mother" with prob 0.99
//   1.0::("alice", "casey"),     // "mother" == "mother" exactly
//   1.0::("david", "emma")       // "father" == "father" exactly
// }

Custom Example: Divisor Generation

@scallopy.foreign_predicate
def divisors(n: int) -> scallopy.Facts[float, Tuple[int]]:
    """Generate all divisors of n"""
    for i in range(1, n + 1):
        if n % i == 0:
            yield (1.0, (i,))

Usage:

rel numbers = {12, 15, 20}
rel has_divisor(num, div) = numbers(num), divisors(num, div)
query has_divisor

// Result:
// has_divisor: {
//   (12, 1), (12, 2), (12, 3), (12, 4), (12, 6), (12, 12),
//   (15, 1), (15, 3), (15, 5), (15, 15),
//   (20, 1), (20, 2), (20, 4), (20, 5), (20, 10), (20, 20)
// }
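The generator above is ordinary Python, so the divisor sets can be checked directly (decorator omitted):

```python
# Standalone version of the divisors foreign predicate.
def divisors(n: int):
    for i in range(1, n + 1):
        if n % i == 0:
            yield (1.0, (i,))  # (tag, fact-tuple)

def divisor_list(n: int):
    return [t[0] for (_tag, t) in divisors(n)]

print(divisor_list(12))  # [1, 2, 3, 4, 6, 12]
print(divisor_list(15))  # [1, 3, 5, 15]
```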

Creating Foreign Predicates in Plugins

Basic Foreign Predicate

To create a foreign predicate in a plugin:

import scallopy
from scallopy import Facts
from typing import Tuple

class MyPlugin(scallopy.Plugin):
    def __init__(self):
        super().__init__("my_plugin")

    def load_into_ctx(self, ctx):
        @scallopy.foreign_predicate
        def my_pred(x: int) -> Facts[float, Tuple[int]]:
            # Generate facts
            yield (1.0, (x * 2,))
            yield (1.0, (x * 3,))

        ctx.register_foreign_predicate(my_pred)

Type Signature

The return type must be Facts[TagType, TupleType]:

from scallopy import Facts
from typing import Tuple

# Single output column
def single_output(x: int) -> Facts[float, Tuple[int]]:
    yield (1.0, (x * 2,))

# Multiple output columns
def multi_output(x: int) -> Facts[float, Tuple[int, str]]:
    yield (1.0, (x, "even" if x % 2 == 0 else "odd"))

# Boolean predicate (empty tuple)
def is_prime(n: int) -> Facts[float, Tuple]:
    if check_prime(n):
        yield (1.0, ())

Yielding Facts

Use yield to produce facts lazily:

@scallopy.foreign_predicate
def range_values(start: int, end: int) -> Facts[float, Tuple[int]]:
    for i in range(start, end + 1):
        yield (1.0, (i,))

# Generates facts: 1, 2, 3, 4, 5 for range_values(1, 5)

Probabilistic Facts

Tag values integrate with provenance:

@scallopy.foreign_predicate
def fuzzy_match(s1: str, s2: str) -> Facts[float, Tuple]:
    # Calculate similarity score
    similarity = compute_similarity(s1, s2)

    # Only yield if similarity is high enough
    if similarity > 0.5:
        yield (similarity, ())

# Usage:
# rel similar(a, b) = strings(a), strings(b), fuzzy_match(a, b)

Error Handling

Handle errors gracefully by not yielding:

@scallopy.foreign_predicate
def safe_operation(x: int) -> Facts[float, Tuple[int]]:
    try:
        result = risky_computation(x)
        yield (1.0, (result,))
    except Exception as e:
        # Don't yield - fact doesn't exist for this input
        pass

Memoization

Cache expensive computations:

CACHE = {}

@scallopy.foreign_predicate
def expensive_pred(x: str) -> Facts[float, Tuple[str]]:
    if x not in CACHE:
        CACHE[x] = expensive_api_call(x)

    for result in CACHE[x]:
        yield (1.0, (result,))

Best Practices

Use Predicates for Multi-Valued Results

✓ Good - Multiple results with predicate:

@foreign_predicate
def get_synonyms(word: str) -> Facts[float, Tuple[str]]:
    for synonym in lookup_synonyms(word):
        yield (1.0, (synonym,))

✗ Bad - Single result with function:

@foreign_function
def get_synonyms(word: str) -> str:
    return ", ".join(lookup_synonyms(word))  # Returns string, loses structure

Pattern Validation

Document and validate expected patterns:

@scallopy.foreign_predicate
def my_pred(x: int, y: int) -> Facts[float, Tuple[int]]:
    """
    Supports patterns:
    - (bound, bound) → (free): Given x and y, compute result
    - (bound, free) → not supported (would require search)
    """
    result = x + y
    yield (1.0, (result,))

Lazy Evaluation

Take advantage of generator laziness:

@scallopy.foreign_predicate
def infinite_sequence(start: int) -> Facts[float, Tuple[int]]:
    # This won't run forever - Scallop only pulls what it needs
    i = start
    while True:
        yield (1.0, (i,))
        i += 1
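The laziness the comment relies on is standard Python generator behavior; a consumer pulling a finite prefix never forces the infinite loop:

```python
import itertools

# Only the requested prefix of the infinite generator is materialized.
def infinite_sequence(start: int):
    i = start
    while True:
        yield (1.0, (i,))
        i += 1

first_three = list(itertools.islice(infinite_sequence(10), 3))
print(first_three)  # [(1.0, (10,)), (1.0, (11,)), (1.0, (12,))]
```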

Next Steps

For detailed Python API documentation, see Foreign Predicates (Python API).

Foreign Attributes

Foreign attributes are metaprogramming decorators that transform Scallop declarations at load time. They allow plugins to provide high-level abstractions that automatically generate foreign functions, foreign predicates, or complex rule patterns based on declaration annotations.

What are Foreign Attributes?

Definition

A foreign attribute is a Python function that:

  • Processes Scallop declarations (relations, functions, types)
  • Receives attribute parameters from user code
  • Generates and registers foreign predicates/functions dynamically
  • Can validate types and argument patterns

Syntax

Foreign attributes are applied with @ syntax:

@attribute_name(param1, param2, key=value)
rel relation_name(arg1: Type1, arg2: Type2)

@attribute_name(params)
type function_name(args) -> ReturnType

Foreign Attributes vs Functions/Predicates

Feature    | Foreign Function      | Foreign Predicate   | Foreign Attribute
-----------|-----------------------|---------------------|---------------------------
Applied to | Called in expressions | Called as relations | Decorates declarations
When runs  | Query execution       | Query execution     | Program load time
Purpose    | Compute values        | Generate facts      | Transform declarations
Output     | Single value          | Facts               | Function/predicate/nothing
Example    | $load_image(path)     | gpt(input, output)  | @gpt(prompts=[...])

Key Characteristics

Metaprogramming:

  • Runs at program load time, not query execution
  • Can inspect and validate declaration structure
  • Generates code dynamically

High-level abstraction:

  • Wraps complex patterns with simple syntax
  • Provides domain-specific language extensions
  • Reduces boilerplate for common operations

Type-aware:

  • Can check argument types and patterns
  • Validates adornment (bound/free patterns)
  • Ensures correct usage at load time

How Attributes Work

Attribute Lifecycle

  1. User writes Scallop program:

    @gpt(header="Classify:", prompts=[...])
    rel classify(text: String, label: String)
    
  2. Scallop parser creates AST with attribute attached to declaration

  3. Plugin’s attribute processor is called:

    @scallopy.foreign_attribute
    def gpt(item, header, prompts):
        # Receives the declaration and parameters
        # Returns a foreign predicate or function
    
  4. Generated predicate/function is registered in the context

  5. User code can call the generated construct:

    rel result(t, l) = texts(t), classify(t, l)
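The lifecycle can be sketched in plain Python: an "attribute" runs once at load time, inspects a declaration description, and returns the generated predicate that replaces it. (The names and the keyword matcher below are illustrative stand-ins, not the real @gpt implementation.)

```python
# Load-time transform sketch: declaration description in, predicate out.
def make_classify_attr(decl_name, labels):
    def generated_predicate(text):
        # Trivial stand-in for a real model: substring matching.
        for label in labels:
            if label in text:
                yield (1.0, (label,))
    generated_predicate.__name__ = decl_name  # registered under the decl's name
    return generated_predicate

classify = make_classify_attr("classify", ["positive", "negative"])
print(list(classify("this is a positive review")))  # [(1.0, ('positive',))]
```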
    

Attribute Parameters

Foreign attributes receive:

Positional parameters:

@clip(["cat", "dog", "bird"])  // labels list

Keyword parameters:

@gpt(header="Question:", model="gpt-4", temperature=0.0)

In Python:

@scallopy.foreign_attribute
def my_attr(
    item,                    # The AST item being decorated
    pos_param,               # Positional parameter
    *,                       # Force keyword-only arguments
    key_param="default",     # Keyword parameter with default
    optional_param=None      # Optional parameter
):
    # Process item and parameters
    pass

Inspecting Declarations

The item parameter provides access to the declaration structure:

Check declaration type:

item.is_relation_decl()  # Is it a relation declaration?
item.is_function_decl()  # Is it a function declaration?
item.is_type_decl()      # Is it a type declaration?

Access relation details:

relation_decl = item.relation_decl(0)
name = relation_decl.name.name  # Relation name
args = relation_decl.arg_bindings  # Argument list

for arg in args:
    arg_name = arg.name.name    # Argument name
    arg_type = arg.ty            # Type (String, Tensor, etc.)
    arg_adornment = arg.adornment  # Bound/free annotation

Check argument adornment (bound/free pattern):

pattern = "".join([
    "b" if ab.adornment and ab.adornment.is_bound() else "f"
    for ab in relation_decl.arg_bindings
])
# Example patterns: "bf", "bbf", "bff"
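The same pattern computation can be run standalone with stand-in binding objects (the real ones come from scallopy's AST):

```python
# Stand-in classes mimicking the relevant bits of scallopy's AST bindings.
class Adornment:
    def __init__(self, bound):
        self._bound = bound
    def is_bound(self):
        return self._bound

class Binding:
    def __init__(self, adornment):
        self.adornment = adornment

def pattern_of(bindings):
    return "".join(
        "b" if ab.adornment and ab.adornment.is_bound() else "f"
        for ab in bindings
    )

bindings = [Binding(Adornment(True)), Binding(Adornment(False)), Binding(None)]
print(pattern_of(bindings))  # bff
```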

Returning Constructs

Attributes return what should replace the declaration:

Return a foreign predicate:

@scallopy.foreign_attribute
def my_attr(item, ...):
    @scallopy.foreign_predicate(name=relation_name)
    def generated_predicate(...):
        # Implementation
        yield (tag, tuple)

    return generated_predicate

Return a foreign function:

@scallopy.foreign_attribute
def my_attr(item, ...):
    @scallopy.foreign_function(name=function_name)
    def generated_function(...):
        return result

    return generated_function

Return None (remove declaration):

@scallopy.foreign_attribute
def my_attr(item, ...):
    # Attribute has side effects but doesn't create a construct
    do_something_with(item)
    return None  # Declaration is removed from program

Examples from Plugins

GPT Plugin: @gpt Attribute

The @gpt attribute provides LLM-powered predicates with few-shot learning:

Usage:

@gpt(
  header="Classify the sentiment:",
  prompts=[
    {text: "I love this!", sentiment: "positive"},
    {text: "This is terrible.", sentiment: "negative"},
    {text: "It's okay.", sentiment: "neutral"}
  ],
  model="gpt-3.5-turbo",
  temperature=0.0
)
rel classify_sentiment(text: String, sentiment: String)

rel reviews = {
  "Amazing quality!",
  "Worst purchase ever.",
  "Not bad, could be better."
}

rel results(review, sent) = reviews(review), classify_sentiment(review, sent)
query results

Expected output (mock when API key not set):

results: {
  ("Amazing quality!", "positive"),
  ("Worst purchase ever.", "negative"),
  ("Not bad, could be better.", "neutral")
}

Implementation details:

@scallopy.foreign_attribute
def gpt(
    item,
    prompt: str,
    *,
    header: str = "",
    examples: List[List[str]] = [],
    model: Optional[str] = None,
    debug: bool = False,
):
    # Validate: must be relation declaration
    assert item.is_relation_decl()

    # Extract relation info
    relation_decl = item.relation_decl(0)
    arg_names = [ab.name.name for ab in relation_decl.arg_bindings]
    arg_types = [ab.ty for ab in relation_decl.arg_bindings]

    # Check pattern: must be "b*f+" (zero or more bound, then one or more free)
    pattern = get_pattern(relation_decl.arg_bindings)
    assert re.match("^(b*)(f+)$", pattern), "Pattern must be bound* followed by free+"

    # Build prompt from header, examples, and user inputs
    # ...

    # Generate foreign predicate
    @scallopy.foreign_predicate(name=relation_decl.name.name)
    def invoke_gpt(*args):
        # Call OpenAI API with filled prompt
        # Parse response
        # Yield facts
        pass

    return invoke_gpt

CLIP Plugin: @clip Attribute

The @clip attribute provides zero-shot image classification:

Usage:

@clip(
  labels=["cat", "dog", "bird", "car"],
  score_threshold=0.3
)
rel classify_image(img: Tensor, label: String)

rel images = {
  $load_image("photo1.jpg"),
  $load_image("photo2.jpg")
}

rel classifications(img, label) = images(img), classify_image(img, label)
query classifications

With dynamic labels:

@clip(score_threshold=0.5)
rel classify_dynamic(img: Tensor, labels: String, label: String)

rel image = {$load_image("photo.jpg")}
rel labels_str = {"cat;dog;bird;fish"}  // Semicolon-separated
rel result(img, label) = image(img), labels_str(ls), classify_dynamic(img, ls, label)

Implementation highlights:

@scallopy.foreign_attribute
def clip(
    item,
    labels: Optional[List[str]] = None,
    *,
    score_threshold: float = 0,
    unknown_class: str = "?",
    debug: bool = False,
):
    relation_decl = item.relation_decl(0)
    args = relation_decl.arg_bindings

    # Static labels: (img: Tensor, label: String)
    if labels is not None:
        assert len(args) == 2
        assert args[0].ty.is_tensor() and args[0].adornment.is_bound()
        assert args[1].ty.is_string() and args[1].adornment.is_free()

        @scallopy.foreign_predicate(name=relation_decl.name.name)
        def clip_classify(img: scallopy.Tensor):
            # Run CLIP model
            # Yield (probability, (label,)) for each class
            pass

        return clip_classify

    # Dynamic labels: (img: Tensor, labels: String, label: String)
    else:
        assert len(args) == 3
        # Similar but parse labels from input string
        pass
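In the dynamic-label case, the labels arrive as one semicolon-separated string (as in the usage example above), so the generated predicate must split it before classifying. A minimal sketch of that parsing step; `parse_labels` is a hypothetical helper name, not part of the plugin API:

```python
# Hedged sketch: split a semicolon-separated label string, dropping
# surrounding whitespace and empty entries.
def parse_labels(labels_str: str):
    return [l.strip() for l in labels_str.split(";") if l.strip()]

print(parse_labels("cat;dog;bird;fish"))  # ['cat', 'dog', 'bird', 'fish']
```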

Stdlib: @cmd_arg Attribute

The @cmd_arg attribute binds command-line arguments to relations:

Usage:

@cmd_arg("-n", long="--num-iterations", default=10)
rel num_iterations(n: i32)

// Run: scli program.scl --num-iterations 20
// num_iterations: {(20,)}

Implementation:

@foreign_attribute
def cmd_arg(item, short: str, *, long: Optional[str] = None, default: Optional[Any] = None):
    relation_type_decl = item.relation_decl(0)
    name = relation_type_decl.name.name

    # Must be arity-1
    assert len(relation_type_decl.arg_bindings) == 1
    arg_type = relation_type_decl.arg_bindings[0].ty

    # Create argument parser
    parser = ArgumentParser()
    if long is not None:
        parser.add_argument(short, long, default=default, type=arg_type.to_python_type())
    else:
        parser.add_argument(short, default=default, type=arg_type.to_python_type())

    @foreign_predicate(name=name, output_arg_types=[arg_type])
    def get_arg():
        # unknown_args: leftover command-line arguments forwarded to plugins
        args, _ = parser.parse_known_args(unknown_args)
        if len(args.__dict__) > 0:
            value = list(args.__dict__.values())[0]
            if value is not None:
                yield (value,)

    return get_arg

Stdlib: @py_eval Attribute

The @py_eval attribute evaluates Python expressions:

Usage:

@py_eval
type eval_python(expr: String) -> i32

rel expressions = {"2 + 2", "10 * 5", "3 ** 4"}
rel results(expr, val) = expressions(expr), val = $eval_python(expr)
query results

// Result: {("2 + 2", 4), ("10 * 5", 50), ("3 ** 4", 81)}

Implementation:

@foreign_attribute
def py_eval(item, *, suppress_warning=True):
    assert item.is_function_decl()

    name = item.function_decl_name()
    arg_types = item.function_decl_arg_types()
    ret_type = item.function_decl_ret_type()

    assert len(arg_types) == 1 and arg_types[0].is_string()

    @foreign_function(name=name, ret_type=ret_type)
    def python_evaluate(text: str):
        return eval(text, None, None)

    return python_evaluate
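Note that `$eval_python` passes user strings straight to `eval`. A more cautious variant might whitelist arithmetic syntax before evaluating; the sketch below is illustrative and not part of the Scallop stdlib:

```python
# Hedged sketch: only allow pure arithmetic expressions through eval.
# This is illustrative hardening, not a full security boundary.
import ast

def safe_arith_eval(text: str):
    """Evaluate an arithmetic expression, rejecting any other syntax."""
    tree = ast.parse(text, mode="eval")
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp,
               ast.Constant, ast.operator, ast.unaryop)
    for node in ast.walk(tree):
        if not isinstance(node, allowed):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    # Empty builtins so names like __import__ are unavailable
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}}, {})

print(safe_arith_eval("2 + 2"))   # 4
print(safe_arith_eval("3 ** 4"))  # 81
```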

Advanced Usage

Pattern Validation

Ensure correct adornment patterns:

@scallopy.foreign_attribute
def my_attr(item, ...):
    relation_decl = item.relation_decl(0)

    # Build pattern string
    pattern = "".join([
        "b" if ab.adornment and ab.adornment.is_bound() else "f"
        for ab in relation_decl.arg_bindings
    ])

    # Validate pattern
    if pattern == "bf":
        # Input-output pattern: good
        pass
    elif pattern == "bbf":
        # Two inputs, one output: good
        pass
    elif pattern == "ff":
        # No inputs: error
        raise ValueError("Attribute requires at least one bound argument")
    else:
        raise ValueError(f"Unsupported pattern: {pattern}")
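The pattern check above can be exercised in isolation. A runnable miniature using stand-in classes (`Adornment`, `ArgBinding`) in place of the real scallopy AST objects:

```python
# Stand-in types mimicking the shape of scallopy's arg bindings.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Adornment:
    bound: bool
    def is_bound(self) -> bool:
        return self.bound

@dataclass
class ArgBinding:
    adornment: Optional[Adornment]

def pattern_of(arg_bindings) -> str:
    # "b" for bound arguments, "f" for free (or unadorned) ones
    return "".join(
        "b" if ab.adornment and ab.adornment.is_bound() else "f"
        for ab in arg_bindings
    )

args = [ArgBinding(Adornment(True)), ArgBinding(None)]
print(pattern_of(args))  # "bf"
```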

Type Checking

Validate argument types:

@scallopy.foreign_attribute
def my_attr(item, ...):
    relation_decl = item.relation_decl(0)
    args = relation_decl.arg_bindings

    # Check first arg is Tensor
    assert args[0].ty.is_tensor(), "First argument must be Tensor"

    # Check all output args are String
    for arg in args[1:]:
        if not arg.adornment or arg.adornment.is_free():
            assert arg.ty.is_string(), "Output arguments must be String"

Prompt Engineering

Build prompts from attribute parameters:

def build_prompt(header, examples, user_inputs, arg_names):
    prompt = header + "\n\n"

    # Add few-shot examples
    for example in examples:
        example_str = ", ".join([
            f"{name}: {example[name]}"
            for name in arg_names
        ])
        prompt += f"Example: {example_str}\n"

    # Add user inputs
    input_str = ", ".join([
        f"{name}: {value}"
        for name, value in zip(arg_names, user_inputs)
    ])
    prompt += f"\nNow classify: {input_str}\n"
    prompt += "Answer:"

    return prompt
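Running the helper on sample data shows the resulting prompt shape; the example sentiment data here is hypothetical:

```python
# Runnable copy of the helper above, plus a sample invocation.
def build_prompt(header, examples, user_inputs, arg_names):
    prompt = header + "\n\n"
    for example in examples:
        example_str = ", ".join(f"{name}: {example[name]}" for name in arg_names)
        prompt += f"Example: {example_str}\n"
    input_str = ", ".join(
        f"{name}: {value}" for name, value in zip(arg_names, user_inputs)
    )
    prompt += f"\nNow classify: {input_str}\n"
    prompt += "Answer:"
    return prompt

p = build_prompt(
    header="Classify the sentiment:",
    examples=[{"text": "Great!", "sentiment": "positive"}],
    user_inputs=["Amazing product"],   # zip pairs inputs with arg_names
    arg_names=["text", "sentiment"],
)
print(p)
```

The output ends with `Answer:` so the model completes the final field.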

Memoization at Attribute Level

Cache results across invocations:

@scallopy.foreign_attribute
def cached_attr(item, ...):
    CACHE = {}  # Shared across all calls to generated predicate

    @scallopy.foreign_predicate(...)
    def cached_predicate(*args):
        key = tuple(args)
        if key not in CACHE:
            CACHE[key] = expensive_operation(*args)

        for result in CACHE[key]:
            yield result

    return cached_predicate
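The key design point is that `CACHE` lives in the attribute's closure, so every invocation of the generated predicate shares it. A self-contained demonstration of that pattern, with a counter standing in for `expensive_operation`:

```python
# Closure-based cache demo: the dict persists across predicate calls.
def make_cached_predicate():
    CACHE = {}
    calls = {"n": 0}  # tracks how often the expensive path runs

    def expensive_operation(*args):
        calls["n"] += 1
        return [sum(args)]

    def cached_predicate(*args):
        key = tuple(args)
        if key not in CACHE:
            CACHE[key] = expensive_operation(*args)
        yield from CACHE[key]

    return cached_predicate, calls

pred, calls = make_cached_predicate()
list(pred(1, 2))
list(pred(1, 2))        # second call hits the cache
print(calls["n"])       # 1
```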

Error Messages

Provide clear error messages with attribute name:

ERR_HEAD = "[@my_attr]"

@scallopy.foreign_attribute
def my_attr(item, ...):
    assert item.is_relation_decl(), \
        f"{ERR_HEAD} must be applied to a relation declaration"

    assert len(item.relation_decls()) == 1, \
        f"{ERR_HEAD} cannot annotate multiple relations"

    relation_decl = item.relation_decl(0)
    args = relation_decl.arg_bindings

    assert len(args) >= 2, \
        f"{ERR_HEAD} requires at least 2 arguments, got {len(args)}"

Best Practices

Use Attributes for High-Level Patterns

✓ Good - Complex pattern wrapped in attribute:

@gpt(header="Extract name:", prompts=[...])
rel extract_name(text: String, name: String)

✗ Bad - Manual implementation every time:

rel extract_name(text, name) = text(text), gpt_raw(complex_prompt, name)
// User has to build prompt manually each time

Validate Early

Fail fast at load time, not query time:

@scallopy.foreign_attribute
def my_attr(item, param):
    # ✓ Check at load time
    assert item.is_relation_decl(), "Must be relation"
    assert param > 0, "Param must be positive"

    @scallopy.foreign_predicate(...)
    def pred(*args):
        # Don't check here - too late!
        pass

Document Patterns

Clearly document supported patterns:

@scallopy.foreign_attribute
def my_attr(item, ...):
    """
    Attribute for custom processing.

    Supported patterns:
    - (bound Tensor, free String) → bf pattern
    - (bound String, bound String, free String) → bbf pattern

    Example:
        @my_attr(param=value)
        rel classify(img: Tensor, label: String)
    """
    pass

Keep Attribute Logic Simple

Attributes should orchestrate, not implement:

# ✓ Good - delegate to helper functions
@scallopy.foreign_attribute
def my_attr(item, ...):
    validate_declaration(item)
    config = build_config(item, params)
    predicate = create_predicate(config)
    return predicate

# ✗ Bad - too much logic in attribute
@scallopy.foreign_attribute
def my_attr(item, ...):
    # 100 lines of complex logic here...
    pass

Next Steps

For implementation details, see the Plugin Development Guide.

Plugin Quick Reference

This page provides quick reference tables for all documented Scallop plugins, common configuration patterns, and troubleshooting guidance.

Plugin Overview

| Plugin | Purpose | API Key Required | Models Run Locally | GPU Support |
|---|---|---|---|---|
| GPT | LLM text processing | ✅ OPENAI_API_KEY | ❌ Cloud API | N/A |
| Gemini | LLM text processing | ✅ GEMINI_API_KEY | ❌ Cloud API | N/A |
| Transformers | Vision & language models | ❌ No | ✅ Local | ✅ CUDA |
| PLIP | Protein-ligand analysis | ❌ No | ✅ Local | ✅ CUDA |
| CodeQL | Static code analysis | ❌ No | ✅ Local | ❌ No |
| GPU | Device management | ❌ No | N/A | ✅ CUDA |

Foreign Constructs by Plugin

GPT Plugin

| Type | Name | Signature | Description |
|---|---|---|---|
| Function | $gpt | String → String | Simple text generation |
| Predicate | gpt | (String, String) | Fact generation with memoization |
| Attribute | @gpt | Various | Few-shot classification/extraction |
| Attribute | @gpt_extract_info | Various | Structured JSON extraction |
| Attribute | @gpt_encoder | String → Tensor | Text embeddings |

Example:

// Foreign function
rel answer = {$gpt("What is 2+2?")}

// Foreign attribute
@gpt(header="Classify:", prompts=[...])
rel classify(text: String, label: String)

Gemini Plugin

| Type | Name | Signature | Description |
|---|---|---|---|
| Function | $gemini | String → String | Simple text generation |
| Predicate | gemini | (String, String) | Fact generation with memoization |
| Attribute | @gemini | Various | Few-shot classification/extraction |
| Attribute | @gemini_extract_info | Various | Structured JSON extraction |

Example:

// Foreign function
rel answer = {$gemini("Translate to French: Hello")}

// Foreign attribute
@gemini(header="Extract:", prompts=[...])
rel extract(text: String, entity: String)

Transformers Plugin

| Type | Name | Signature | Description |
|---|---|---|---|
| Attribute | @vilt | (Tensor, String, String) | Visual question answering |
| Attribute | @owl_vit | (Tensor, String, ...) | Open-vocabulary object detection |
| Attribute | @roberta_encoder | String → Tensor | Text embeddings |

Example:

// ViLT for VQA
@vilt(question="What is in the image?", top=5)
rel answer(img: Tensor, ans: String)

// OWL-ViT for detection
@owl_vit(object_queries=["cat", "dog"], output_fields=["class", "bbox-x", "bbox-y"])
rel detect(img: Tensor, class: String, x: u32, y: u32)

// RoBERTa for embeddings
@roberta_encoder
type encode(text: String) -> Tensor

PLIP Plugin

| Type | Name | Signature | Description |
|---|---|---|---|
| Attribute | @plip | (Tensor, String) | Protein-ligand classification |

Example:

@plip(labels=["active", "inactive"], score_threshold=0.5)
rel classify(img: Tensor, label: String)

CodeQL Plugin

| Type | Name | Signature | Description |
|---|---|---|---|
| Attribute | @codeql_database | Various | Extract code analysis relations |

Available Relations:

  • get_class_definition(class_id, class_name, package, source_file)
  • get_method_definition(method_id, method_name, class_id, return_type)
  • get_local_dataflow_edge(from_node, to_node)
  • get_dataflow_node(node_id, node_type, node_value)

Example:

@codeql_database(debug=false)
rel get_class_definition(class_id: String, class_name: String, package: String, file: String)

GPU Plugin

No foreign constructs - provides device management via configuration only.

Configuration Reference

Command-Line Arguments

| Plugin | Flag | Type | Default | Description |
|---|---|---|---|---|
| GPT | --openai-gpt-model | string | gpt-3.5-turbo | OpenAI model name |
| GPT | --openai-gpt-temperature | float | 0.0 | Sampling temperature |
| GPT | --num-allowed-openai-request | int | 100 | Request limit |
| Gemini | --gemini-model | string | gemini-2.0-flash | Gemini model name |
| Gemini | --gemini-temperature | float | 0.0 | Sampling temperature |
| Gemini | --num-allowed-gemini-request | int | 100 | Request limit |
| CodeQL | --codeql-db | string | - | Path to CodeQL database |
| CodeQL | --codeql-path | string | - | Path to CodeQL CLI |
| GPU | --cuda | flag | false | Enable CUDA |
| GPU | --gpu | int | 0 | GPU device ID |

Environment Variables

| Plugin | Variable | Required | Description |
|---|---|---|---|
| GPT | OPENAI_API_KEY | ✅ Yes | OpenAI API key from platform.openai.com |
| Gemini | GEMINI_API_KEY | ✅ Yes | Google Gemini key from aistudio.google.com |
| CodeQL | CODEQL_PATH | ⚠️ Optional | Path to CodeQL CLI (if not in PATH) |
| Weather (example) | WEATHER_API_KEY | ⚠️ Optional | For custom weather plugin |

Python API Configuration

import scallopy

ctx = scallopy.ScallopContext()
plugin_registry = scallopy.PluginRegistry()
plugin_registry.load_plugins_from_entry_points()

# Configure plugins
plugin_registry.configure({
    # GPT configuration
    "openai_gpt_model": "gpt-4",
    "openai_gpt_temperature": 0.0,
    "num_allowed_openai_request": 50,

    # Gemini configuration
    "gemini_model": "gemini-1.5-pro",
    "gemini_temperature": 0.0,

    # CodeQL configuration
    "codeql_db": "./my-java-db",
    "codeql_path": "/usr/local/bin/codeql",

    # GPU configuration
    "cuda": True,
    "gpu": 0,
}, [])

plugin_registry.load_into_ctx(ctx)

Installation Quick Reference

Install All Plugins

cd /path/to/scallop
make -C etc/scallopy-plugins develop

Install Specific Plugin

# Using make
make -C etc/scallopy-plugins develop-gpt
make -C etc/scallopy-plugins develop-gemini
make -C etc/scallopy-plugins develop-transformers
make -C etc/scallopy-plugins develop-plip
make -C etc/scallopy-plugins develop-codeql
make -C etc/scallopy-plugins develop-gpu

# Using pip
cd etc/scallopy-plugins/gpt
pip install -e .

Install from Wheels

# Build wheels
cd etc/scallopy-plugins
make wheel-<plugin_name>

# Install wheel
pip install dist/scallop_<plugin>-*.whl

Common Patterns

Pattern 1: Few-Shot Classification

GPT/Gemini:

@gpt(
  header="Classify the sentiment:",
  prompts=[
    {text: "Great!", sentiment: "positive"},
    {text: "Terrible", sentiment: "negative"},
    {text: "Okay", sentiment: "neutral"}
  ]
)
rel classify(text: String, sentiment: String)

rel reviews = {"Amazing product", "Waste of money"}
rel results(r, s) = reviews(r), classify(r, s)
query results

Pattern 2: Information Extraction

GPT with extract_info:

@gpt_extract_info(
  header="Extract entities:",
  prompts=["Extract all people", "Extract all companies"],
  examples=[
    (
      ["Alice works at Google."],
      [[("Alice",)], [("Google",)]]
    )
  ]
)
rel person(text: String, name: String)
rel company(text: String, org: String)

rel text = {"Bob joined Microsoft."}
rel people(n) = text(t), person(t, n)
query people

Pattern 3: Visual Question Answering

ViLT:

@vilt(question="What color is the car?", top=3)
rel answer_question(img: Tensor, answer: String)

rel image = {$load_image("photo.jpg")}
rel answers(a) = image(img), answer_question(img, a)
query answers

Pattern 4: Object Detection

OWL-ViT:

@owl_vit(
  object_queries=["person", "car"],
  output_fields=["class", "bbox-x", "bbox-y", "bbox-w", "bbox-h"],
  score_threshold=0.3
)
rel detect(img: Tensor, cls: String, x: u32, y: u32, w: u32, h: u32)

rel image = {$load_image("street.jpg")}
rel detections(c, x, y, w, h) = image(img), detect(img, c, x, y, w, h)
query detections

Pattern 5: Code Analysis

CodeQL:

@codeql_database
rel get_class_definition(cid: String, cname: String, pkg: String, file: String)
rel get_method_definition(mid: String, mname: String, cid: String, rtype: String)

// Find classes in specific package
rel security_classes(cid, cname) =
  get_class_definition(cid, cname, "com.example.security", _)

// Count methods per class
rel method_count(cid, n) =
  get_class_definition(cid, _, _, _),
  n = count(mid: get_method_definition(mid, _, cid, _))

query security_classes
query method_count

Pattern 6: GPU Acceleration

Any vision/language plugin:

# Use GPU for faster inference
scli program.scl --cuda --gpu 0

# Or in Python
plugin_registry.configure({"cuda": True, "gpu": 0}, [])

Troubleshooting Guide

API Key Issues

Error: “API key not found”

Symptoms:

[scallop_gpt] `OPENAI_API_KEY` not found in environment variable

Solutions:

# Set environment variable
export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="your-key"

# Or use command-line flag (if supported)
scli program.scl --openai-api-key "sk-..."

# Verify
echo $OPENAI_API_KEY

Error: “Invalid API key”

Solutions:

  1. Check key is correct (no extra spaces)
  2. Verify key is active on provider website
  3. Check account has credits/quota
  4. Try regenerating key

Model Loading Issues

Error: “Failed to download model”

Symptoms:

Failed to download model checkpoint from HuggingFace

Solutions:

# Check internet connection
ping huggingface.co

# Manually download
python -c "from transformers import ViltForQuestionAnswering; ViltForQuestionAnswering.from_pretrained('dandelin/vilt-b32-finetuned-vqa')"

# Check HuggingFace cache
ls ~/.cache/huggingface/hub/

Error: “Out of memory”

Symptoms:

RuntimeError: CUDA out of memory

Solutions:

  1. Use CPU instead: Remove --cuda flag
  2. Use smaller model checkpoints
  3. Reduce batch size / top-k / limit parameters
  4. Free GPU memory: torch.cuda.empty_cache()
  5. Use different GPU: --cuda --gpu 1

CodeQL Issues

Error: “codeql executable not found”

Solutions:

# Install CodeQL CLI
curl -L https://github.com/github/codeql-cli-binaries/releases/latest/download/codeql-osx64.zip -o codeql.zip
unzip codeql.zip
mv codeql /usr/local/bin/

# Set path
export CODEQL_PATH="/usr/local/bin/codeql"

# Or use flag
scli program.scl --codeql-path /usr/local/bin/codeql

Error: “Database not finalized”

Solutions:

# Finalize database
codeql database finalize my-java-db

# Verify
codeql database info my-java-db

Rate Limiting

Error: “Exceeding allowed number of requests”

Solutions:

# Increase limit
scli program.scl --num-allowed-openai-request 200

# Or in Python
plugin_registry.configure({
    "num_allowed_openai_request": 200
}, [])

GPU Issues

Error: “CUDA not available”

Solutions:

# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# If False:
# 1. Install CUDA toolkit from nvidia.com
# 2. Install PyTorch with CUDA support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# 3. Verify NVIDIA drivers
nvidia-smi

Error: “Wrong GPU selected”

Solutions:

# Use specific GPU
scli program.scl --cuda --gpu 1

# Or in Python
plugin_registry.configure({"cuda": True, "gpu": 1}, [])

Plugin Not Found

Error: “Plugin not loaded”

Solutions:

# Verify plugin installed
pip list | grep scallop

# Reinstall plugin
cd etc/scallopy-plugins/gpt
pip install -e .

# Check entry points
python -c "import scallopy; print(scallopy.PluginRegistry().available_plugins())"

Performance Tips

Optimization Strategies

| Scenario | Recommendation | Improvement |
|---|---|---|
| Vision models (CLIP, ViLT, PLIP) | Use GPU (--cuda) | ~10x faster |
| Multiple API calls | Rely on memoization | Automatic caching |
| Large batch processing | Increase request limits | Avoid premature stops |
| Slow model loading | Use lazy loading pattern | Faster startup |
| Repeated queries | Cache results externally | Reduce API costs |
| High memory usage | Use smaller models | Lower memory footprint |
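As a sketch of the "cache results externally" strategy, responses can be persisted on disk keyed by a hash of the prompt, so re-running a program does not re-pay for identical calls. The file name and `call_llm` parameter are illustrative stand-ins, not part of any plugin API:

```python
# Hedged sketch: persist model responses to a JSON file across runs.
import hashlib
import json
import os

def cached_call(prompt, call_llm, cache_file="llm_cache.json"):
    """Return a cached response for prompt, invoking the model only on a miss."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)
        with open(cache_file, "w") as f:
            json.dump(cache, f)
    return cache[key]

# Demo with a fake model; a second call would be served from the file.
import tempfile
demo_file = os.path.join(tempfile.mkdtemp(), "llm_cache.json")
print(cached_call("What is 2+2?", lambda p: "4", cache_file=demo_file))
```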

Model Size Comparison

| Model | Size | Speed (CPU) | Speed (GPU) | Use Case |
|---|---|---|---|---|
| ViLT | ~450MB | Medium | Fast | Visual QA |
| OWL-ViT | ~500MB | Slow | Medium | Object detection |
| RoBERTa-base | ~500MB | Fast | Very Fast | Text encoding |
| PLIP | ~600MB | Medium | Fast | Protein analysis |
| GPT-3.5 | API | N/A | N/A | General LLM tasks |
| Gemini Flash | API | N/A | N/A | Fast LLM tasks |

Type Reference

Scallop ↔ Python Type Mapping

| Scallop Type | Python Type | Example |
|---|---|---|
| i8, i16, i32, i64 | int | 42 |
| u8, u16, u32, u64 | int | 100 |
| f32, f64 | float | 3.14 |
| bool | bool | True |
| String | str | "hello" |
| Tensor | torch.Tensor | Image or embedding |
| (T1, T2, ...) | tuple | (1, "a", 3.14) |

Common Output Type Patterns

from typing import Tuple
from scallopy import foreign_function, foreign_predicate, Facts

# Single output
@foreign_function(name="func")
def func(x: int) -> float:
    return x * 1.5

# Multiple outputs (use predicate)
@foreign_predicate(
    name="pred",
    input_arg_types=[int],
    output_arg_types=[float, str]
)
def pred(x: int) -> Facts[float, Tuple[float, str]]:
    yield (1.0, (x * 1.5, "result"))

Next Steps

For more examples, see the /examples/plugins/ directory.

Scallop CLI

The Scallop toolchain provides command-line tools for working with Scallop programs. These tools enable you to run programs, experiment interactively, and compile Scallop code.

Available Tools

Scallop includes three main command-line tools:

scli - Scallop Interpreter

The primary tool for running Scallop programs from .scl files.

scli program.scl

Use cases:

  • Execute Scallop programs
  • Test and debug logic
  • Run with different provenances
  • Query specific relations

Full documentation →

sclrepl - Interactive REPL

An interactive Read-Eval-Print-Loop for experimenting with Scallop.

sclrepl

Use cases:

  • Interactive exploration
  • Quick prototyping
  • Learning Scallop syntax
  • Testing small programs

Full documentation →

sclc - Scallop Compiler

Compiles Scallop programs (future feature).

sclc program.scl

Use cases:

  • Compile to standalone executables
  • Optimize performance
  • Generate intermediate representations

Full documentation →


Installation

From Binary Releases

Download prebuilt binaries from the GitHub releases page:

# Download and extract
tar -xzf scallop-<version>-<platform>.tar.gz

# Move to PATH
sudo mv scli sclrepl sclc /usr/local/bin/

From Source

Build from source using Cargo:

# Clone repository
git clone https://github.com/scallop-lang/scallop.git
cd scallop

# Build release binaries
cargo build --release

# Binaries in target/release/
./target/release/scli --version

Verify Installation

scli --version
# Output: scli 0.2.5

Quick Start

Running Your First Program

Create a file hello.scl:

rel greeting = {"Hello", "Bonjour", "Hola"}
rel target = {"World", "Monde", "Mundo"}
rel message(g, t) = greeting(g), target(t)

query message

Run it:

scli hello.scl

Output:

message: {("Hello", "World"), ("Hello", "Monde"), ...}

With Probabilistic Reasoning

Create prob_example.scl:

rel 0.9::reliable_edge(0, 1)
rel 0.8::reliable_edge(1, 2)
rel path(a, b) = reliable_edge(a, b)
rel path(a, c) = path(a, b), reliable_edge(b, c)

query path

Run with provenance:

scli --provenance minmaxprob prob_example.scl

Output:

path: {0.9::(0, 1), 0.8::(1, 2), 0.8::(0, 2)}
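The last fact illustrates how minmaxprob combines tags: conjunction takes the minimum and disjunction the maximum, so path(0, 2) gets min(0.9, 0.8). Recomputing by hand:

```python
# Edge probabilities from prob_example.scl
edge = {(0, 1): 0.9, (1, 2): 0.8}

# path(0, 2) has one derivation: edge(0, 1) AND edge(1, 2) -> take the min
p_path_0_2 = min(edge[(0, 1)], edge[(1, 2)])
print(p_path_0_2)  # 0.8
```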

Common Workflows

Development Workflow

  1. Write - Create .scl file
  2. Test - Run with scli
  3. Debug - Add --debug flags
  4. Iterate - Modify and re-run

Experimentation Workflow

  1. Explore - Use sclrepl for quick tests
  2. Prototype - Develop logic interactively
  3. Save - Export to .scl file
  4. Run - Execute with scli

Production Workflow

  1. Develop - Write and test programs
  2. Integrate - Use Python API (scallopy)
  3. Deploy - Embed in applications
  4. Monitor - Use debug flags if needed

Summary

  • Three tools: scli (interpreter), sclrepl (REPL), sclc (compiler)
  • scli is the main tool for running .scl programs
  • sclrepl provides interactive exploration
  • sclc compiles programs (future)
  • Install from releases or build from source

For detailed usage:

Scallop Interpreter (scli)

scli is the Scallop interpreter for running Scallop programs from .scl files. It supports probabilistic reasoning, debugging, and query execution.

Basic Usage

scli <input-file>

Example:

scli program.scl

Options

Provenance Configuration

--provenance / -p - Set the provenance type

scli --provenance minmaxprob program.scl
scli -p topkproofs program.scl

Available provenances: unit, proofs, minmaxprob, addmultprob, topkproofs, probproofs, etc.

Default: unit (discrete DataLog)

--top-k / -k - Set K value for top-K provenances

scli --provenance topkproofs --top-k 5 program.scl
scli -p topkproofs -k 10 program.scl

Default: 3

Query Options

--query / -q - Query a specific relation

scli --query path program.scl

Without this flag, all query declarations in the file are executed.

--output-all - Output all relations (including hidden ones)

scli --output-all program.scl

Execution Control

--iter-limit - Set iteration limit for recursion

scli --iter-limit 100 program.scl

Useful for preventing infinite loops in recursive programs.

--stop-at-goal - Stop when goal relation is derived

scli --stop-at-goal program.scl

Terminates execution as soon as the goal relation has facts.

--no-early-discard - Disable early discarding

scli --no-early-discard program.scl

Keeps all intermediate results instead of discarding low-probability facts.

Optimization Options

--do-not-remove-unused-relations - Keep unused relations

scli --do-not-remove-unused-relations program.scl

By default, relations not used in queries are removed for efficiency.

--wmc-with-disjunctions - Use WMC for disjunctions

scli --wmc-with-disjunctions program.scl

Enables weighted model counting with disjunctive facts for better probability computation.

--scheduler - Set execution scheduler

scli --scheduler <scheduler-type> program.scl

Controls execution order of rules.

Debugging Options

--debug / -d - Enable general debugging

scli --debug program.scl

Prints execution information and intermediate states.

--debug-front - Debug front-end IR

scli --debug-front program.scl

Shows intermediate representation after parsing.

--debug-back - Debug back-end IR

scli --debug-back program.scl

Shows intermediate representation before execution.

--debug-ram - Debug RAM program

scli --debug-ram program.scl

Shows the compiled RAM (Relational Algebra Machine) program.

--debug-runtime - Monitor runtime execution

scli --debug-runtime program.scl

Prints detailed execution traces.

--debug-tag - Monitor tag propagation

scli --debug-tag program.scl

Shows how provenance tags propagate through execution.

Other Options

--version / -V - Print version

scli --version

--help / -h - Print help

scli --help

Examples

Basic Execution

# Run simple program
scli edge_path.scl

Probabilistic Reasoning

# Run with min-max probability
scli --provenance minmaxprob uncertain_graph.scl

# Run with top-K proofs
scli -p topkproofs -k 5 uncertain_graph.scl

Query Specific Relation

# Only output the 'result' relation
scli --query result computation.scl

Debugging

# Debug execution
scli --debug program.scl

# Monitor runtime with tag propagation
scli --debug-runtime --debug-tag program.scl

Performance Tuning

# Limit recursion depth
scli --iter-limit 50 recursive_program.scl

# Use WMC optimization
scli --wmc-with-disjunctions --provenance topkproofs program.scl

Common Patterns

Development

# Quick test
scli test.scl

# With debugging
scli --debug test.scl

Testing Different Provenances

# Compare results
scli --provenance unit program.scl
scli --provenance minmaxprob program.scl
scli --provenance topkproofs -k 5 program.scl

Production

# Optimized execution
scli --provenance topkproofs -k 10 \
     --wmc-with-disjunctions \
     --iter-limit 1000 \
     production.scl

Summary

  • Basic: scli program.scl
  • Provenance: -p <type> and -k <value>
  • Query: -q <relation> for specific output
  • Debug: --debug* flags for troubleshooting
  • Optimize: --iter-limit, --wmc-with-disjunctions
  • Version: scli --version (current: 0.2.5)

For more details:

Scallop REPL

sclrepl is the interactive Read-Eval-Print-Loop for Scallop, allowing you to experiment with Scallop programs interactively.

Starting the REPL

sclrepl

This starts an interactive session where you can enter Scallop declarations and queries.


Basic Usage

Entering Declarations

Type Scallop declarations directly:

scallop> rel edge = {(0, 1), (1, 2), (2, 3)}

Defining Rules

scallop> rel path(a, b) = edge(a, b)
scallop> rel path(a, c) = path(a, b), edge(b, c)

Querying Relations

scallop> query path
path: {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}

REPL Commands

Special Commands

:help - Show available commands

scallop> :help

:quit or :exit - Exit the REPL

scallop> :quit

:clear - Clear all declarations

scallop> :clear

:provenance <type> - Change provenance

scallop> :provenance minmaxprob

:relations - List all relations

scallop> :relations

Interactive Workflow

Experimentation

scallop> rel number = {1, 2, 3, 4, 5}
scallop> rel square(n, n * n) = number(n)
scallop> query square
square: {(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)}

Iterative Development

scallop> rel person = {"alice", "bob", "charlie"}
scallop> query person
person: {"alice", "bob", "charlie"}

scallop> rel friend(a, b) = person(a), person(b), a != b
scallop> query friend
friend: {("alice", "bob"), ("alice", "charlie"), ("bob", "alice"), ...}

Probabilistic Reasoning

scallop> :provenance minmaxprob
scallop> rel 0.8::edge(0, 1)
scallop> rel 0.9::edge(1, 2)
scallop> rel path(a, b) = edge(a, b)
scallop> query path
path: {0.8::(0, 1), 0.9::(1, 2)}

Tips and Tricks

Multi-line Input

Use \ for line continuation:

scallop> rel complicated_rule(x, y) = \
         some_relation(x), \
         another_relation(y), \
         x > y

Inspecting Relations

scallop> :relations
Available relations:
  - edge: (i32, i32)
  - path: (i32, i32)

Quick Testing

scallop> rel test = {1, 2, 3}
scallop> query test
test: {1, 2, 3}
scallop> :clear
scallop> # Start fresh

Common Use Cases

Learning Scallop

Quickly test syntax and semantics:

scallop> rel fact = {"a", "b", "c"}
scallop> query fact

Prototyping

Develop logic interactively before saving to files:

scallop> # Try different rule formulations
scallop> rel version1(x) = ...
scallop> query version1
scallop> # Refine
scallop> rel version2(x) = ...
scallop> query version2

Debugging

Test problematic rules in isolation:

scallop> :provenance unit
scallop> # Add minimal facts
scallop> rel edge = {(0, 1)}
scallop> # Test rule
scallop> rel path(a, b) = edge(a, b)
scallop> query path

Summary

  • Start: sclrepl
  • Enter declarations, rules, and queries interactively
  • Commands: :help, :quit, :clear, :provenance, :relations
  • Use for: Learning, prototyping, debugging
  • Multi-line: Use \ for continuation

For more details:

Scallop Compiler (sclc)

sclc is the Scallop compiler, which compiles Scallop programs into optimized executables or intermediate representations.

Status

Note: The Scallop compiler is currently under development. This documentation describes planned features.

Basic Usage

sclc [OPTIONS] <input-file>

Example:

sclc program.scl -o program

Planned Features

Compilation to Native Code

Compile Scallop programs to standalone executables:

sclc program.scl -o program
./program

Intermediate Representations

Generate IR for inspection and optimization:

# Generate LLVM IR
sclc --emit-llvm program.scl

# Generate assembly
sclc --emit-asm program.scl

# Generate object file
sclc --emit-obj program.scl

Optimization Levels

Control optimization:

# No optimization (fast compile)
sclc -O0 program.scl

# Basic optimization
sclc -O1 program.scl

# Full optimization (default)
sclc -O2 program.scl

# Aggressive optimization
sclc -O3 program.scl

Static Analysis

Analyze programs without execution:

# Type checking
sclc --check program.scl

# Unused relation detection
sclc --warn-unused program.scl

# Complexity analysis
sclc --analyze program.scl

Use Cases

Production Deployment

Compile programs for production use:

sclc -O3 --static production.scl -o prod_binary

Cross-Platform Builds

Target different platforms:

sclc --target x86_64-linux program.scl
sclc --target aarch64-macos program.scl

Library Generation

Create reusable libraries:

sclc --lib program.scl -o libscallop_logic.a

Current Alternative

While sclc is under development, use:

For execution: Use scli to run programs

scli program.scl

For Python integration: Use scallopy

import scallopy
ctx = scallopy.ScallopContext()
# ...

For optimization: Use scli flags

scli --wmc-with-disjunctions \
     --iter-limit 1000 \
     program.scl

Summary

  • Status: Under development
  • Goal: Compile Scallop to native code
  • Current: Use scli for execution
  • Future: Standalone executables, optimization, static analysis

For more details:

Developer Guide

This section covers the internal architecture of Scallop for contributors and developers who want to understand or extend the system.

Overview

Scallop is implemented in Rust and consists of several major components:

  • Compiler - Parses Scallop programs and produces intermediate representations
  • Runtime - Executes programs using various provenance semirings
  • Bindings - Python (scallopy), C, and other language integrations
  • Utils - Common utilities, data structures, and algorithms

Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────┐
│                     User Programs                       │
│  .scl files, Python API calls, CLI commands             │
└───────────────────┬─────────────────────────────────────┘
                    │
        ┌───────────┴──────────┐
        │                      │
        ▼                      ▼
┌───────────────┐      ┌──────────────┐
│   Compiler    │      │   Bindings   │
│  (Front-end)  │      │  (scallopy)  │
└───────┬───────┘      └──────┬───────┘
        │                     │
        ▼                     ▼
┌──────────────────────────────────────┐
│          Runtime Engine               │
│  - Provenance semirings               │
│  - Execution strategies               │
│  - Storage and indexing               │
└──────────────────────────────────────┘

Component Overview

Compiler (core/src/compiler/)

  • Lexer and parser (LALRPOP-based)
  • Type inference and checking
  • Intermediate representation (IR) generation
  • Query planning and optimization

Runtime (core/src/runtime/)

  • Provenance semiring framework
  • Execution engine (semi-naive evaluation, stratification)
  • Storage backend (relations, tuples, indexes)
  • Foreign function/predicate interface

Bindings (etc/scallopy/, C bindings)

  • Python API (scallopy)
  • PyTorch integration
  • Foreign function registration

Utils (core/src/utils/)

  • Data structures (B-trees, tries, etc.)
  • SDD (Sentential Decision Diagram) for WMC
  • Type system utilities

Code Organization

Repository Structure

scallop/
├── core/                    # Core Scallop implementation (Rust)
│   ├── src/
│   │   ├── compiler/        # Front-end compilation
│   │   ├── runtime/         # Execution engine
│   │   ├── common/          # Shared types and utilities
│   │   ├── integrate/       # Integration layer
│   │   └── utils/           # Helper utilities
│   └── tests/               # Integration tests
├── etc/
│   ├── scallopy/            # Python bindings
│   ├── scli/                # CLI tools
│   └── sclrepl/             # REPL
├── doc/                     # Documentation (mdBook)
└── examples/                # Example programs

Key Modules

Compiler Modules:

  • compiler/front/ - Front-end (parsing, AST)
  • compiler/type_check/ - Type inference and checking
  • compiler/back/ - Back-end (IR generation)
  • compiler/ram/ - RAM (Relational Algebra Machine) compilation

Runtime Modules:

  • runtime/provenance/ - Provenance semiring implementations
  • runtime/database/ - Storage and indexing
  • runtime/monitor/ - Execution monitoring
  • runtime/env/ - Execution environment

Common Modules:

  • common/expr/ - Expression types
  • common/foreign_function/ - Foreign function interface
  • common/tuple/ - Tuple types and operations
  • common/value/ - Value types

Key Concepts for Contributors

1. Provenance Semirings

Scallop’s core abstraction is the provenance semiring:

#![allow(unused)]
fn main() {
pub trait Provenance {
    type InputTag;        // What users provide
    type OutputTag;       // What users get back
    type Tag;             // Internal representation

    fn tagging_fn(&self, input_tag: Self::InputTag) -> Self::Tag;
    fn recover_fn(&self, tag: &Self::Tag) -> Self::OutputTag;

    fn add(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag;  // OR operation
    fn mult(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag; // AND operation
    fn negate(&self, tag: &Self::Tag) -> Self::Tag;              // NOT operation
}
}

All 18 provenance types implement this trait.
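
To make the trait's algebra concrete, here is an illustrative Python mirror of a min/max probability provenance, where a tag is a probability in [0, 1]. The class and method names simply echo the trait above; this is a sketch, not the scallopy API.

```python
class MinMaxProbProvenance:
    """Illustrative mirror of the Provenance trait: tags are probabilities."""

    def tagging_fn(self, input_tag):   # InputTag -> Tag
        return float(input_tag)

    def recover_fn(self, tag):         # Tag -> OutputTag
        return tag

    def add(self, t1, t2):             # OR: keep the most likely proof
        return max(t1, t2)

    def mult(self, t1, t2):            # AND: bounded by the weakest premise
        return min(t1, t2)

    def negate(self, tag):             # NOT: complement probability
        return 1.0 - tag

prov = MinMaxProbProvenance()
# A fact derived by (0.9 AND 0.7) OR 0.4 gets tag max(min(0.9, 0.7), 0.4)
tag = prov.add(prov.mult(0.9, 0.7), 0.4)   # -> 0.7
```

The runtime calls `mult` when joining body atoms and `add` when the same tuple is derived by multiple rules, so a provenance only needs to define this small algebra.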

2. Compilation Pipeline

.scl source
    ↓ Parse (LALRPOP)
AST (Abstract Syntax Tree)
    ↓ Type inference
Typed AST
    ↓ Lower to IR
Front IR
    ↓ Transform
Back IR
    ↓ Compile
RAM Program
    ↓ Execute
Results

3. Execution Model

Scallop uses semi-naive evaluation for recursive rules:

  1. Start with base facts (iteration 0)
  2. Apply rules to derive new facts (iteration i)
  3. Require at least one body atom to match a fact newly derived in iteration i-1 (the "delta"); this is the semi-naive refinement
  4. Repeat until fixpoint (no new facts)
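
The steps above can be sketched as a small fixpoint loop. This is an illustrative Python rendition of semi-naive transitive closure, not Scallop's actual engine code.

```python
def semi_naive_closure(edges):
    """Compute reachability over `edges` with semi-naive evaluation:
    each iteration joins only the newly derived facts (the delta)
    against the base relation, instead of re-deriving everything."""
    total = set(edges)   # all facts derived so far
    delta = set(edges)   # facts new in the previous iteration
    while delta:
        # Join delta(a, b) with edge(b, c) to derive path(a, c)
        derived = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
        delta = derived - total   # keep only genuinely new facts
        total |= delta            # fixpoint reached when delta is empty
    return total

paths = semi_naive_closure({(0, 1), (1, 2), (2, 3)})
# paths adds (0, 2), (1, 3), (0, 3) to the base edges
```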

4. Storage Backend

Relations are stored in efficient data structures:

  • Extensional relations (base facts) - stored as sorted vectors
  • Intensional relations (derived) - computed on demand or materialized
  • Indexes - B-trees for fast lookups
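
The sorted-vector idea can be illustrated in a few lines of Python (assuming integer keys; this is a sketch of the concept, not the actual storage code): because tuples are kept sorted, all tuples sharing a first-column value are contiguous, and a prefix lookup is two binary searches.

```python
import bisect

class SortedRelation:
    """Store tuples in a sorted list; look up by first column via binary search."""

    def __init__(self, tuples):
        self.data = sorted(tuples)

    def lookup(self, key):
        # (key,) sorts before any (key, x), and (key + 1,) after,
        # so two bisections bound the matching range.
        lo = bisect.bisect_left(self.data, (key,))
        hi = bisect.bisect_left(self.data, (key + 1,))
        return self.data[lo:hi]

edge = SortedRelation([(1, 2), (0, 1), (1, 3), (2, 0)])
edge.lookup(1)   # -> [(1, 2), (1, 3)]
```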

Building from Source

Prerequisites

  • Rust 1.70+ (rustup install stable)
  • Cargo (comes with Rust)
  • Python 3.8+ (for scallopy bindings)
  • PyTorch (optional, for ML integration)

Build Steps

# Clone repository
git clone https://github.com/scallop-lang/scallop.git
cd scallop

# Build CLI tools
cd core
cargo build --release

# Binaries in target/release/
./target/release/scli --version

# Build Python bindings
cd ../etc/scallopy
pip install -e .

Running Tests

# Core tests
cd core
cargo test

# Integration tests
cargo test --test integrate

# Python tests
cd ../etc/scallopy
python -m pytest tests/

Contributing

Getting Started

  1. Read the codebase - Start with /core/src/lib.rs
  2. Run examples - cargo run --example <name>
  3. Add tests - All new features need tests
  4. Follow conventions - Match existing code style

Code Style

  • Follow Rust conventions (rustfmt)
  • Document public APIs
  • Write integration tests for user-facing features
  • Keep PRs focused and atomic

Testing Guidelines

Unit tests - In the same file as the code:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_my_feature() {
        // Test code
    }
}
}

Integration tests - In core/tests/integrate/:

#![allow(unused)]
fn main() {
#[test]
fn test_new_language_feature() {
    let program = r#"
        rel edge = {(0, 1), (1, 2)}
        rel path(a, b) = edge(a, b)
        query path
    "#;
    // Test execution
}
}

Next Steps

For questions, join the Scallop community or open an issue on GitHub.

Implementing Language Constructs

This guide explains how to add new language features to Scallop. We’ll walk through the process from grammar to runtime execution.

Overview

Adding a new language feature typically involves these steps:

  1. Grammar - Define syntax in LALRPOP grammar
  2. AST - Add AST nodes to represent the construct
  3. Type checking - Implement type inference/checking
  4. IR generation - Lower AST to intermediate representation
  5. Execution - Implement runtime behavior
  6. Testing - Add comprehensive tests

Step 1: Grammar Definition

Grammar is defined in /core/src/compiler/front/grammar.lalrpop using LALRPOP.

Example: Adding a New Operator

Suppose we want to add a ** (power) operator:

// In grammar.lalrpop

pub Expr: Expr = {
    // Existing operators
    <l:Expr> "+" <r:Factor> => Expr::Binary(BinaryOp::Add, Box::new(l), Box::new(r)),
    <l:Expr> "-" <r:Factor> => Expr::Binary(BinaryOp::Sub, Box::new(l), Box::new(r)),

    // New power operator (a real grammar would give `**` higher
    // precedence than `*` and make it right-associative)
    <l:Expr> "**" <r:Factor> => Expr::Binary(BinaryOp::Pow, Box::new(l), Box::new(r)),

    Factor,
}

Key Grammar Sections

Expressions (Expr):

  • Arithmetic: +, -, *, /
  • Comparison: ==, !=, <, >, <=, >=
  • Logical: and, or, not
  • Aggregation: count, sum, max, etc.

Atoms (Atom):

  • Predicates: rel_name(args)
  • Constraints: x > 5
  • Pattern matching: case x is Variant(y)

Rules (Rule):

  • Basic: head = body
  • Disjunctive: { head1; head2 } = body
  • Conjunctive: head1; head2 = body

Step 2: AST Nodes

AST definitions are in /core/src/compiler/front/ast.rs.

Adding to AST

#![allow(unused)]
fn main() {
// In ast.rs

#[derive(Clone, Debug, PartialEq)]
pub enum BinaryOp {
    Add,
    Sub,
    Mul,
    Div,
    Pow,  // New operator
    // ... other ops
}

#[derive(Clone, Debug)]
pub enum Expr {
    Binary(BinaryOp, Box<Expr>, Box<Expr>),
    // ... other expression types
}
}

AST Best Practices

  1. Use Box for recursive types - Prevents infinite size
  2. Implement Clone, Debug - Required for compiler passes
  3. Add Span information - For error reporting
  4. Document semantics - Explain what the node represents

Step 3: Type Checking

Type checking is in /core/src/compiler/type_check/.

Type Inference

#![allow(unused)]
fn main() {
// In type_check/expr.rs

impl Expr {
    pub fn infer_type(&self, env: &TypeEnv) -> Result<Type, TypeError> {
        match self {
            Expr::Binary(BinaryOp::Add, l, r) => {
                let lt = l.infer_type(env)?;
                let rt = r.infer_type(env)?;
                unify_numeric(lt, rt)
            }
            Expr::Binary(BinaryOp::Pow, l, r) => {
                // New: Type check power operator
                let lt = l.infer_type(env)?;
                let rt = r.infer_type(env)?;
                match (lt, rt) {
                    (Type::Int, Type::Int) => Ok(Type::Int),
                    (Type::Float, Type::Float) => Ok(Type::Float),
                    (Type::Int, Type::Float) | (Type::Float, Type::Int) => Ok(Type::Float),
                    _ => Err(TypeError::InvalidPowerOp(lt, rt))
                }
            }
            // ... other cases
        }
    }
}
}

Type System Components

  • Type - Represents Scallop types (int, float, string, ADTs)
  • TypeEnv - Environment mapping variables to types
  • TypeError - Type checking errors
  • unify - Type unification for inference

Step 4: IR Generation

Lower AST to intermediate representation in /core/src/compiler/back/.

Generating IR

#![allow(unused)]
fn main() {
// In back/compile.rs

impl Compiler {
    fn compile_expr(&mut self, expr: &Expr) -> IRExpr {
        match expr {
            Expr::Binary(BinaryOp::Add, l, r) => {
                let l_ir = self.compile_expr(l);
                let r_ir = self.compile_expr(r);
                IRExpr::Binary(IRBinaryOp::Add, Box::new(l_ir), Box::new(r_ir))
            }
            Expr::Binary(BinaryOp::Pow, l, r) => {
                // New: Compile power operator
                let l_ir = self.compile_expr(l);
                let r_ir = self.compile_expr(r);
                IRExpr::Binary(IRBinaryOp::Pow, Box::new(l_ir), Box::new(r_ir))
            }
            // ... other cases
        }
    }
}
}

IR Structure

IR is closer to execution than AST:

  • Variables become explicit
  • Types are attached
  • Control flow is explicit
  • Aggregations are lowered to loops
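
For example, a group-by `count` aggregation conceptually lowers to a loop that buckets tuples by the group-by column and accumulates. An illustrative Python sketch (the function name is made up; this is not the actual IR):

```python
from collections import defaultdict

def lower_count_group_by(tuples):
    """Lowered form of a rule like `rel n(g, c) :- c = count(x: r(g, x))`:
    loop over the relation, bucket by the group-by column, then count."""
    groups = defaultdict(int)
    for (g, _x) in tuples:
        groups[g] += 1
    return set(groups.items())

r = [(1, "a"), (1, "b"), (2, "c")]
lower_count_group_by(r)   # -> {(1, 2), (2, 1)}
```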

Step 5: Runtime Execution

Implement execution in /core/src/runtime/.

Adding Runtime Support

#![allow(unused)]
fn main() {
// In runtime/eval.rs

impl Executor {
    fn eval_binary(&self, op: &BinaryOp, l: &Value, r: &Value) -> Result<Value> {
        match op {
            BinaryOp::Add => {
                match (l, r) {
                    (Value::I32(a), Value::I32(b)) => Ok(Value::I32(a + b)),
                    (Value::F32(a), Value::F32(b)) => Ok(Value::F32(a + b)),
                    _ => Err(RuntimeError::TypeError)
                }
            }
            BinaryOp::Pow => {
                // New: Execute power operator
                match (l, r) {
                    (Value::I32(a), Value::I32(b)) => {
                        // A real implementation should reject negative exponents
                        Ok(Value::I32(a.pow(*b as u32)))
                    }
                    (Value::F32(a), Value::F32(b)) => {
                        Ok(Value::F32(a.powf(*b)))
                    }
                    _ => Err(RuntimeError::TypeError)
                }
            }
            // ... other cases
        }
    }
}
}

Runtime Components

  • Value - Runtime value representation
  • Executor - Executes IR
  • Database - Stores relations
  • ProvenanceContext - Manages provenance

Step 6: Testing

Add tests in /core/tests/integrate/.

Integration Test Example

#![allow(unused)]
fn main() {
// In tests/integrate/operators.rs

#[test]
fn test_power_operator() {
    let program = r#"
        rel base = {2, 3, 4}
        rel power(x, y) = base(x), y = x ** 2
        query power
    "#;

    let expected = vec![
        (2, 4),
        (3, 9),
        (4, 16),
    ];

    test_program(program, "power", expected);
}

#[test]
fn test_power_with_float() {
    let program = r#"
        rel nums = {2.0, 3.0}
        rel result(v ** 0.5) = nums(v)
        query result
    "#;

    let expected = vec![
        1.414,  // sqrt(2)
        1.732,  // sqrt(3)
    ];

    test_program_float(program, "result", expected, 0.01);
}
}

Test Categories

  • Unit tests - Test individual components
  • Integration tests - Test complete programs
  • Type error tests - Ensure bad programs fail
  • Provenance tests - Test with different provenances
  • Performance tests - Benchmark critical operations

Complete Example: Adding let Bindings

Let’s walk through a complete example: adding local variable bindings.

1. Grammar

// Add to Atom rule
pub Atom: Atom = {
    // Existing patterns...

    // New: let binding
    "let" <var:Name> "=" <val:Expr> "," <body:Atom> => {
        Atom::Let(var, val, Box::new(body))
    },
}

2. AST

#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub enum Atom {
    // Existing variants...
    Let(String, Expr, Box<Atom>),
}
}

3. Type Checking

#![allow(unused)]
fn main() {
impl Atom {
    pub fn type_check(&self, env: &mut TypeEnv) -> Result<(), TypeError> {
        match self {
            Atom::Let(var, val, body) => {
                let val_type = val.infer_type(env)?;
                env.insert(var.clone(), val_type);
                body.type_check(env)?;
                env.remove(var);
                Ok(())
            }
            // ... other cases
        }
    }
}
}

4. IR Generation

#![allow(unused)]
fn main() {
impl Compiler {
    fn compile_atom(&mut self, atom: &Atom) -> Vec<IRStmt> {
        match atom {
            Atom::Let(var, val, body) => {
                let val_ir = self.compile_expr(val);
                let var_id = self.fresh_var();

                vec![
                    IRStmt::Assign(var_id, val_ir),
                    IRStmt::Scope {
                        bindings: vec![(var.clone(), var_id)],
                        body: Box::new(self.compile_atom(body)),
                    }
                ]
            }
            // ... other cases
        }
    }
}
}

5. Execution

#![allow(unused)]
fn main() {
impl Executor {
    fn eval_stmt(&mut self, stmt: &IRStmt) -> Result<()> {
        match stmt {
            IRStmt::Scope { bindings, body } => {
                // Push new scope
                for (name, var_id) in bindings {
                    let value = self.read_var(*var_id)?;
                    self.env.push(name.clone(), value);
                }

                // Execute body
                self.eval_stmt(body)?;

                // Pop scope
                for (name, _) in bindings {
                    self.env.pop(name);
                }

                Ok(())
            }
            // ... other cases
        }
    }
}
}

6. Tests

#![allow(unused)]
fn main() {
#[test]
fn test_let_binding() {
    let program = r#"
        rel edge(0, 1)
        rel edge(1, 2)

        rel result(a, c, d) =
            edge(a, b),
            let x = a + b,
            edge(b, c),
            let y = b + c,
            d = x + y

        query result
    "#;

    // Only a=0, b=1, c=2 completes the body: x=1, y=3, so d=4
    let expected = vec![(0, 2, 4)];
    test_program(program, "result", expected);
}
}

Common Pitfalls

1. Forgetting Provenance

New operations must handle provenance correctly:

#![allow(unused)]
fn main() {
// ✗ Bad: ignores provenance
fn eval_binary(l: Value, r: Value) -> Value {
    Value::new(l.data + r.data)
}

// ✓ Good: propagates provenance
fn eval_binary(&self, l: TaggedValue, r: TaggedValue) -> TaggedValue {
    let tag = self.provenance.mult(&l.tag, &r.tag);  // Combine tags
    TaggedValue::new(l.value + r.value, tag)
}
}

2. Not Handling All Types

Operations must work with all relevant types:

#![allow(unused)]
fn main() {
// ✗ Bad: only handles i32
match (l, r) {
    (Value::I32(a), Value::I32(b)) => Value::I32(a + b),
    _ => panic!("Unexpected types")
}

// ✓ Good: handles all numeric types
match (l, r) {
    (Value::I32(a), Value::I32(b)) => Value::I32(a + b),
    (Value::F32(a), Value::F32(b)) => Value::F32(a + b),
    (Value::I64(a), Value::I64(b)) => Value::I64(a + b),
    _ => Err(TypeError::InvalidOperation)
}
}

3. Breaking Semi-Naive Evaluation

New constructs must preserve monotonicity for correctness. Non-monotone features such as negation and aggregation must be stratified, so that each stratum reaches its fixpoint before its results are negated or aggregated.
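
Monotonicity means adding input facts can only add, never retract, derived facts; semi-naive evaluation relies on this to avoid revisiting old facts. A small illustrative check of the property on transitive closure:

```python
def closure(edges):
    """Naive fixpoint for transitive closure (a monotone rule: output only grows)."""
    total = set(edges)
    while True:
        derived = {(a, c) for (a, b) in total for (b2, c) in total if b == b2}
        if derived <= total:   # fixpoint: nothing new
            return total
        total |= derived

small = closure({(0, 1), (1, 2)})
large = closure({(0, 1), (1, 2), (2, 3)})   # superset of the input
# Monotone: every fact derivable from the smaller input survives
assert small <= large
```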


Summary

To add a new language feature:

  1. Update grammar in grammar.lalrpop
  2. Add AST nodes in ast.rs
  3. Implement type checking in type_check/
  4. Generate IR in back/
  5. Implement execution in runtime/
  6. Write comprehensive tests in tests/integrate/

Language Bindings

This guide explains how to create bindings for Scallop in other programming languages. We’ll focus on the Python bindings (scallopy) as a reference implementation.

Overview

Language bindings expose Scallop’s Rust API to other languages. The process involves:

  1. FFI Layer - Foreign Function Interface in Rust
  2. Wrapper Layer - Language-specific wrapper (Python, C, etc.)
  3. High-Level API - Idiomatic API for the target language
  4. Integration - Package and distribute

Python Bindings Architecture

The Python bindings (scallopy) use PyO3 for Rust-Python interop.

Architecture Layers

┌────────────────────────────────────────┐
│   Python User Code                     │
│   ctx = scallopy.ScallopContext()      │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│   Python API Layer (scallopy/)         │
│   - ScallopContext                     │
│   - Module, Forward                    │
│   - Type conversions                   │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│   PyO3 Bindings (src/)                 │
│   - #[pyclass], #[pyfunction]          │
│   - Rust ↔ Python conversions          │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│   Scallop Core (scallop-core)         │
│   - Compiler, Runtime                  │
└────────────────────────────────────────┘

PyO3 Basics

PyO3 is a Rust library for Python interop.

Exposing Rust Structs to Python

#![allow(unused)]
fn main() {
use pyo3::prelude::*;

#[pyclass]
pub struct ScallopContext {
    internal: scallop_core::runtime::Context,
}

#[pymethods]
impl ScallopContext {
    #[new]
    fn new(provenance: Option<String>) -> PyResult<Self> {
        let prov = provenance.unwrap_or("unit".to_string());
        let internal = scallop_core::runtime::Context::new(&prov)?;
        Ok(Self { internal })
    }

    fn add_rule(&mut self, rule: String) -> PyResult<()> {
        self.internal.add_rule(&rule)
            .map_err(|e| PyErr::new::<pyo3::exceptions::PyValueError, _>(e.to_string()))
    }

    fn run(&mut self) -> PyResult<()> {
        self.internal.run()
            .map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))
    }
}
}

Module Definition

#![allow(unused)]
fn main() {
#[pymodule]
fn scallopy_internal(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<ScallopContext>()?;
    m.add_function(wrap_pyfunction!(version, m)?)?;
    Ok(())
}

#[pyfunction]
fn version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}
}

Type Conversions

Converting between Rust and Python types is crucial.

Rust → Python

#![allow(unused)]
fn main() {
use pyo3::types::{PyList, PyTuple};

impl ScallopContext {
    fn relation(&self, py: Python, name: &str) -> PyResult<PyObject> {
        let tuples = self.internal.relation(name)?;

        // Convert Vec<Tuple> to Python list
        let py_list = PyList::new(py, tuples.iter().map(|tuple| {
            match tuple.arity() {
                1 => tuple.get(0).to_py_object(py),
                _ => {
                    let items: Vec<PyObject> = tuple.iter()
                        .map(|v| v.to_py_object(py))
                        .collect();
                    PyTuple::new(py, items).to_object(py)
                }
            }
        }));

        Ok(py_list.to_object(py))
    }
}
}

Python → Rust

#![allow(unused)]
fn main() {
impl ScallopContext {
    fn add_facts(&mut self, relation: String, facts: &PyAny) -> PyResult<()> {
        let py_list = facts.downcast::<PyList>()?;

        let rust_facts: Vec<Tuple> = py_list.iter()
            .map(|item| {
                if let Ok(py_tuple) = item.downcast::<PyTuple>() {
                    // Convert Python tuple to Rust Tuple
                    let values: Vec<Value> = py_tuple.iter()
                        .map(|v| python_to_value(v))
                        .collect::<Result<_, _>>()?;
                    Ok(Tuple::from(values))
                } else {
                    // Single value
                    Ok(Tuple::from(vec![python_to_value(item)?]))
                }
            })
            .collect::<Result<_, PyErr>>()?;

        self.internal.add_facts(&relation, rust_facts)?;
        Ok(())
    }
}

fn python_to_value(obj: &PyAny) -> PyResult<Value> {
    if let Ok(i) = obj.extract::<i32>() {
        Ok(Value::I32(i))
    } else if let Ok(f) = obj.extract::<f64>() {
        Ok(Value::F64(f))
    } else if let Ok(s) = obj.extract::<String>() {
        Ok(Value::String(s))
    } else if let Ok(b) = obj.extract::<bool>() {
        Ok(Value::Bool(b))
    } else {
        Err(PyErr::new::<pyo3::exceptions::PyTypeError, _>("Unsupported type"))
    }
}
}

Handling Errors

Convert Rust errors to Python exceptions properly.

Error Conversion

#![allow(unused)]
fn main() {
use pyo3::exceptions::{PyValueError, PyRuntimeError};

impl From<scallop_core::Error> for PyErr {
    fn from(err: scallop_core::Error) -> Self {
        match err {
            scallop_core::Error::CompileError(msg) => {
                PyValueError::new_err(format!("Compile error: {}", msg))
            }
            scallop_core::Error::RuntimeError(msg) => {
                PyRuntimeError::new_err(format!("Runtime error: {}", msg))
            }
            scallop_core::Error::TypeError(msg) => {
                PyValueError::new_err(format!("Type error: {}", msg))
            }
            _ => PyRuntimeError::new_err(err.to_string())
        }
    }
}
}

Using Error Conversion

#![allow(unused)]
fn main() {
#[pymethods]
impl ScallopContext {
    fn add_rule(&mut self, rule: String) -> PyResult<()> {
        self.internal.add_rule(&rule)
            .map_err(|e| e.into())  // Automatically converts to PyErr
    }
}
}

PyTorch Integration

Scallopy integrates with PyTorch for differentiable reasoning.

Tensor Conversion

#![allow(unused)]
fn main() {
use pyo3::types::PyAny;

fn python_tensor_to_rust(tensor: &PyAny) -> PyResult<Vec<f32>> {
    // Get tensor as numpy array
    let numpy = tensor.call_method0("cpu")?.call_method0("numpy")?;

    // Extract values
    let values: Vec<f32> = numpy.extract()?;
    Ok(values)
}

fn rust_tensor_to_python(py: Python, values: Vec<f32>, shape: Vec<usize>) -> PyResult<PyObject> {
    // Import torch
    let torch = py.import("torch")?;

    // Create tensor
    let tensor = torch.call_method1("tensor", (values,))?
        .call_method1("reshape", (shape,))?;

    Ok(tensor.to_object(py))
}
}

Gradient Support

#![allow(unused)]
fn main() {
#[pyclass]
pub struct ScallopForward {
    context: ScallopContext,
    provenance: String,
}

#[pymethods]
impl ScallopForward {
    fn forward(&mut self, py: Python, inputs: &PyDict) -> PyResult<PyObject> {
        // Extract input tensors
        let rust_inputs = self.extract_inputs(inputs)?;

        // Run Scallop forward pass
        let outputs = self.context.forward(rust_inputs)?;

        // Convert to PyTorch tensors with gradient support
        self.create_output_tensors(py, outputs)
    }
}
}

Building and Packaging

Build Configuration

Cargo.toml:

[package]
name = "scallopy"
version = "0.2.5"
edition = "2021"

[lib]
name = "scallopy_internal"
crate-type = ["cdylib"]  # Create dynamic library for Python

[dependencies]
pyo3 = { version = "0.19", features = ["extension-module"] }
scallop-core = { path = "../../core" }

Python Setup

setup.py or pyproject.toml:

# pyproject.toml
[build-system]
requires = ["maturin>=0.14,<0.15"]
build-backend = "maturin"

[project]
name = "scallopy"
version = "0.2.5"
requires-python = ">=3.8"
dependencies = ["torch>=1.13"]

[tool.maturin]
bindings = "pyo3"
module-name = "scallopy.scallopy_internal"

Building

# Install maturin
pip install maturin

# Build in debug mode
maturin develop

# Build release wheel
maturin build --release

# Install from wheel
pip install target/wheels/scallopy-*.whl

C Bindings

For languages without Rust interop, expose a C API.

C Header Generation

#![allow(unused)]
fn main() {
// In src/c_api.rs

#[no_mangle]
pub extern "C" fn scallop_context_new(provenance: *const c_char) -> *mut Context {
    let prov_str = unsafe {
        assert!(!provenance.is_null());
        CStr::from_ptr(provenance).to_str().unwrap()
    };

    let context = Box::new(Context::new(prov_str).unwrap());
    Box::into_raw(context)
}

#[no_mangle]
pub extern "C" fn scallop_context_free(ctx: *mut Context) {
    if !ctx.is_null() {
        unsafe { Box::from_raw(ctx) };
    }
}

#[no_mangle]
pub extern "C" fn scallop_add_rule(ctx: *mut Context, rule: *const c_char) -> bool {
    let context = unsafe {
        assert!(!ctx.is_null());
        &mut *ctx
    };

    let rule_str = unsafe {
        assert!(!rule.is_null());
        CStr::from_ptr(rule).to_str().unwrap()
    };

    context.add_rule(rule_str).is_ok()
}
}

C Header File

// scallop.h

#ifndef SCALLOP_H
#define SCALLOP_H

#include <stdint.h>
#include <stdbool.h>

typedef struct ScallopContext ScallopContext;

ScallopContext* scallop_context_new(const char* provenance);
void scallop_context_free(ScallopContext* ctx);
bool scallop_add_rule(ScallopContext* ctx, const char* rule);
bool scallop_run(ScallopContext* ctx);

#endif

Testing Bindings

Python Tests

# tests/test_context.py

import scallopy

def test_basic_program():
    ctx = scallopy.ScallopContext()
    ctx.add_relation("edge", (int, int))
    ctx.add_facts("edge", [(0, 1), (1, 2)])
    ctx.add_rule("path(a, b) = edge(a, b)")
    ctx.add_rule("path(a, c) = path(a, b), edge(b, c)")
    ctx.run()

    result = list(ctx.relation("path"))
    assert (0, 1) in result
    assert (1, 2) in result
    assert (0, 2) in result

def test_probabilistic():
    ctx = scallopy.ScallopContext(provenance="minmaxprob")
    ctx.add_relation("edge", (int, int))
    ctx.add_facts("edge", [(0.8, (0, 1)), (0.9, (1, 2))])
    ctx.add_rule("path(a, b) = edge(a, b)")
    ctx.run()

    result = list(ctx.relation("path"))
    assert len(result) == 2
    assert result[0][0] == 0.8  # Probability
    assert result[0][1] == (0, 1)  # Tuple

Integration Tests

# tests/test_pytorch.py

import torch
import scallopy

def test_differentiable_forward():
    sum_2 = scallopy.ScallopForwardFunction(
        program="rel sum_2(a + b) = digit_a(a), digit_b(b)",
        provenance="difftopkproofs",
        input_mappings={"digit_a": list(range(10)), "digit_b": list(range(10))},
        output_mappings={"sum_2": list(range(19))}
    )

    digit_a = torch.randn(16, 10, requires_grad=True)
    digit_b = torch.randn(16, 10, requires_grad=True)

    result = sum_2(digit_a=digit_a, digit_b=digit_b)

    assert result.shape == (16, 19)
    assert result.requires_grad

    # Test gradient flow
    loss = result.sum()
    loss.backward()

    assert digit_a.grad is not None
    assert digit_b.grad is not None

Summary

To create language bindings:

  1. Use FFI framework - PyO3 for Python, cbindgen for C
  2. Convert types carefully - Handle all Scallop types
  3. Map errors properly - Convert to target language exceptions
  4. Test thoroughly - Unit tests, integration tests, examples
  5. Package properly - Use language-specific tools (maturin, setuptools)

Full Scallop Grammar

SCALLOP_PROGRAM ::= ITEM*

ITEM ::= TYPE_DECL
       | RELATION_DECL
       | CONST_DECL
       | QUERY_DECL

TYPE ::= u8 | u16 | u32 | u64 | u128 | usize
       | i8 | i16 | i32 | i64 | i128 | isize
       | f32 | f64 | char | bool
       | String
       | CUSTOM_TYPE_NAME

TYPE_DECL ::= type CUSTOM_TYPE_NAME = TYPE
            | type CUSTOM_TYPE_NAME <: TYPE
            | type ENUM_TYPE_NAME = VARIANT1 [= VAL1] | VARIANT2 [= VAL2] | ...
            | type ADT_TYPE_NAME = CONSTRUCTOR1(TYPE*) | CONSTRUCTOR2(TYPE*) | ...
            | type RELATION_NAME(TYPE*)
            | type RELATION_NAME(VAR1: TYPE1, VAR2: TYPE2, ...)
            | type $FUNCTION_NAME(VAR1: TYPE1, VAR2: TYPE2, ...) -> TYPE_RET

CONST_DECL ::= const CONSTANT_NAME : TYPE = CONSTANT
             | const CONSTANT_NAME1 [: TYPE1] = CONSTANT1, CONSTANT_NAME2 [: TYPE2] = CONSTANT2, ...

RELATION_DECL ::= FACT_DECL
                | FACTS_SET_DECL
                | RULE_DECL

CONSTANT ::= true | false | NUMBER_LITERAL | STRING_LITERAL

CONST_TUPLE ::= CONSTANT | (CONSTANT1, CONSTANT2, ...)

FOREIGN_FN ::= hash | string_length | string_concat | substring | abs

BIN_OP ::= + | - | * | / | % | == | != | <= | < | >= | > | && | || | ^

UNARY_OP ::= ! | -

CONST_EXPR ::= CONSTANT
             | CONST_EXPR BIN_OP CONST_EXPR | UNARY_OP CONST_EXPR
             | $ FOREIGN_FN(CONST_EXPR*)
             | if CONST_EXPR then CONST_EXPR else CONST_EXPR
             | ( CONST_EXPR )

TAG ::= true | false | NUMBER_LITERAL  // true/false is for boolean tags; NUMBER_LITERAL is used for probabilities

FACT_DECL ::= rel RELATION_NAME(CONST_EXPR*)         // Untagged fact
            | rel TAG :: RELATION_NAME(CONST_EXPR*)  // Tagged fact

FACTS_SET_DECL ::= rel RELATION_NAME = {CONST_TUPLE1, CONST_TUPLE2, ...}                  // Untagged tuples
                 | rel RELATION_NAME = {TAG1 :: CONST_TUPLE1, TAG2 :: CONST_TUPLE2, ...}  // Tagged tuples
                 | rel RELATION_NAME = {TAG1 :: CONST_TUPLE1; TAG2 :: CONST_TUPLE2; ...}  // Tagged tuples forming annotated disjunction

EXPR ::= VARIABLE
       | CONSTANT
       | EXPR BIN_OP EXPR | UNARY_OP EXPR
       | new CONSTRUCTOR(EXPR*)
       | $ FOREIGN_FN(EXPR*)
       | if EXPR then EXPR else EXPR
       | ( EXPR )

ATOM ::= RELATION_NAME(EXPR*)

RULE_DECL ::= rel ATOM :- FORMULA | rel ATOM = FORMULA                // Normal rule
            | rel TAG :: ATOM :- FORMULA | rel TAG :: ATOM = FORMULA  // Tagged rule

FORMULA ::= ATOM
          | not ATOM | ~ ATOM                                                   // negation
          | FORMULA1, FORMULA2, ... | FORMULA and FORMULA | FORMULA /\ FORMULA  // conjunction
          | FORMULA or FORMULA | FORMULA \/ FORMULA                             // disjunction
          | FORMULA implies FORMULA | FORMULA => FORMULA                        // implies
          | case VARIABLE is ENTITY
          | CONSTRAINT
          | AGGREGATION
          | ( FORMULA )

ENTITY ::= CONSTRUCTOR(ENTITY*)

CONSTRAINT ::= EXPR // When expression returns a boolean value

AGGREGATOR ::= count | sum | prod | min | max | exists | forall | unique

AGGREGATION ::= VAR* = AGGREGATOR(VAR* : FORMULA)                             // Normal aggregation
              | VAR* = AGGREGATOR(VAR* : FORMULA where VAR* : FORMULA)        // Aggregation with group-by condition
              | VAR* = AGGREGATOR[VAR*](VAR* : FORMULA)                       // Aggregation with arg (only applied to AGGREGATOR = min or max)
              | VAR* = AGGREGATOR[VAR*](VAR* : FORMULA where VAR* : FORMULA)  // Aggregation with arg and group-by condition (only applied to AGGREGATOR = min or max)
              | forall(VAR* : FORMULA)
              | exists(VAR* : FORMULA)

QUERY_DECL ::= query RELATION_NAME
             | query ATOM

Contributors