Developer Guide
This section covers the internal architecture of Scallop for contributors and developers who want to understand or extend the system.
Overview
Scallop is implemented in Rust and consists of several major components:
- Compiler - Parses Scallop programs and produces intermediate representations
- Runtime - Executes programs using various provenance semirings
- Bindings - Python (scallopy), C, and other language integrations
- Utils - Common utilities, data structures, and algorithms
Architecture
High-Level Architecture
┌─────────────────────────────────────────────────────────┐
│ User Programs │
│ .scl files, Python API calls, CLI commands │
└───────────────────┬─────────────────────────────────────┘
│
┌───────────┴──────────┐
│ │
▼ ▼
┌───────────────┐ ┌──────────────┐
│ Compiler │ │ Bindings │
│ (Front-end) │ │ (scallopy) │
└───────┬───────┘ └──────┬───────┘
│ │
▼ ▼
┌──────────────────────────────────────┐
│ Runtime Engine │
│ - Provenance semirings │
│ - Execution strategies │
│ - Storage and indexing │
└──────────────────────────────────────┘
Component Overview
Compiler (core/src/compiler/)
- Lexer and parser (LALRPOP-based)
- Type inference and checking
- Intermediate representation (IR) generation
- Query planning and optimization
Runtime (core/src/runtime/)
- Provenance semiring framework
- Execution engine (semi-naive evaluation, stratification)
- Storage backend (relations, tuples, indexes)
- Foreign function/predicate interface
Bindings (etc/scallopy/, C bindings)
- Python API (scallopy)
- PyTorch integration
- Foreign function registration
Utils (core/src/utils/)
- Data structures (B-trees, tries, etc.)
- SDD (Sentential Decision Diagram) for WMC
- Type system utilities
Code Organization
Repository Structure
scallop/
├── core/ # Core Scallop implementation (Rust)
│ ├── src/
│ │ ├── compiler/ # Front-end compilation
│ │ ├── runtime/ # Execution engine
│ │ ├── common/ # Shared types and utilities
│ │ ├── integrate/ # Integration layer
│ │ └── utils/ # Helper utilities
│ └── tests/ # Integration tests
├── etc/
│ ├── scallopy/ # Python bindings
│ ├── scli/ # CLI tools
│ └── sclrepl/ # REPL
├── doc/ # Documentation (mdBook)
└── examples/ # Example programs
Key Modules
Compiler Modules:
compiler/front/- Front-end (parsing, AST)compiler/type_check/- Type inference and checkingcompiler/back/- Back-end (IR generation)compiler/ram/- RAM (Relational Algebra Machine) compilation
Runtime Modules:
runtime/provenance/- Provenance semiring implementationsruntime/database/- Storage and indexingruntime/monitor/- Execution monitoringruntime/env/- Execution environment
Common Modules:
common/expr/- Expression typescommon/foreign_function/- Foreign function interfacecommon/tuple/- Tuple types and operationscommon/value/- Value types
Key Concepts for Contributors
1. Provenance Semirings
Scallop’s core abstraction is the provenance semiring:
#![allow(unused)]
fn main() {
pub trait Provenance {
type InputTag; // What users provide
type OutputTag; // What users get back
type Tag; // Internal representation
fn tagging_fn(&self, input_tag: Self::InputTag) -> Self::Tag;
fn recover_fn(&self, tag: &Self::Tag) -> Self::OutputTag;
fn add(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag; // OR operation
fn mult(&self, t1: &Self::Tag, t2: &Self::Tag) -> Self::Tag; // AND operation
fn negate(&self, tag: &Self::Tag) -> Self::Tag; // NOT operation
}
}
All 18 provenance types implement this trait.
2. Compilation Pipeline
.scl source
↓ Parse (LALRPOP)
AST (Abstract Syntax Tree)
↓ Type inference
Typed AST
↓ Lower to IR
Front IR
↓ Transform
Back IR
↓ Compile
RAM Program
↓ Execute
Results
3. Execution Model
Scallop uses semi-naive evaluation for recursive rules:
- Start with base facts (iteration 0)
- Apply rules to derive new facts (iteration i)
- Only use facts from iteration i-1 (semi-naive)
- Repeat until fixpoint (no new facts)
4. Storage Backend
Relations are stored in efficient data structures:
- Extensional relations (base facts) - stored as sorted vectors
- Intensional relations (derived) - computed on demand or materialized
- Indexes - B-trees for fast lookups
Building from Source
Prerequisites
- Rust 1.70+ (
rustup install stable) - Cargo (comes with Rust)
- Python 3.8+ (for scallopy bindings)
- PyTorch (optional, for ML integration)
Build Steps
# Clone repository
git clone https://github.com/scallop-lang/scallop.git
cd scallop
# Build CLI tools
cd core
cargo build --release
# Binaries in target/release/
./target/release/scli --version
# Build Python bindings
cd ../etc/scallopy
pip install -e .
Running Tests
# Core tests
cd core
cargo test
# Integration tests
cargo test --test integrate
# Python tests
cd ../etc/scallopy
python -m pytest tests/
Contributing
Getting Started
- Read the codebase - Start with
/core/src/lib.rs - Run examples -
cargo run --example <name> - Add tests - All new features need tests
- Follow conventions - Match existing code style
Code Style
- Follow Rust conventions (rustfmt)
- Document public APIs
- Write integration tests for user-facing features
- Keep PRs focused and atomic
Testing Guidelines
Unit tests - In the same file as the code:
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_my_feature() {
// Test code
}
}
}
Integration tests - In core/tests/integrate/:
#![allow(unused)]
fn main() {
#[test]
fn test_new_language_feature() {
let program = r#"
rel edge = {(0, 1), (1, 2)}
rel path(a, b) = edge(a, b)
query path
"#;
// Test execution
}
}
Next Steps
- Language Constructs - Implementing new language features
- Bindings - Adding language bindings
- Contributing Guide
For questions, join the Scallop community or open an issue on GitHub.