Implementing Language Constructs
This guide explains how to add new language features to Scallop. We’ll walk through the process from grammar to runtime execution.
Overview
Adding a new language feature typically involves these steps:
- Grammar - Define syntax in LALRPOP grammar
- AST - Add AST nodes to represent the construct
- Type checking - Implement type inference/checking
- IR generation - Lower AST to intermediate representation
- Execution - Implement runtime behavior
- Testing - Add comprehensive tests
Step 1: Grammar Definition
Grammar is defined in /core/src/compiler/front/grammar.lalrpop using LALRPOP.
Example: Adding a New Operator
Suppose we want to add a ** (power) operator:
// In grammar.lalrpop
pub Expr: Expr = {
// Existing operators
<l:Expr> "+" <r:Factor> => Expr::Binary(BinaryOp::Add, Box::new(l), Box::new(r)),
<l:Expr> "-" <r:Factor> => Expr::Binary(BinaryOp::Sub, Box::new(l), Box::new(r)),
// New power operator
<l:Expr> "**" <r:Factor> => Expr::Binary(BinaryOp::Pow, Box::new(l), Box::new(r)),
Factor,
}
Key Grammar Sections
Expressions (Expr):
- Arithmetic:
+,-,*,/ - Comparison:
==,!=,<,>,<=,>= - Logical:
and,or,not - Aggregation:
count,sum,max, etc.
Atoms (Atom):
- Predicates:
rel_name(args) - Constraints:
x > 5 - Pattern matching:
case x is Variant(y)
Rules (Rule):
- Basic:
head = body - Disjunctive:
{ head1; head2 } = body - Conjunctive:
head1; head2 = body
Step 2: AST Nodes
AST definitions are in /core/src/compiler/front/ast.rs.
Adding to AST
#![allow(unused)]
fn main() {
// In ast.rs
#[derive(Clone, Debug, PartialEq)]
pub enum BinaryOp {
Add,
Sub,
Mul,
Div,
Pow, // New operator
// ... other ops
}
#[derive(Clone, Debug)]
pub enum Expr {
Binary(BinaryOp, Box<Expr>, Box<Expr>),
// ... other expression types
}
}
AST Best Practices
- Use
Boxfor recursive types - Prevents infinite size - Implement
Clone,Debug- Required for compiler passes - Add
Spaninformation - For error reporting - Document semantics - Explain what the node represents
Step 3: Type Checking
Type checking is in /core/src/compiler/type_check/.
Type Inference
#![allow(unused)]
fn main() {
// In type_check/expr.rs
impl Expr {
pub fn infer_type(&self, env: &TypeEnv) -> Result<Type, TypeError> {
match self {
Expr::Binary(BinaryOp::Add, l, r) => {
let lt = l.infer_type(env)?;
let rt = r.infer_type(env)?;
unify_numeric(lt, rt)
}
Expr::Binary(BinaryOp::Pow, l, r) => {
// New: Type check power operator
let lt = l.infer_type(env)?;
let rt = r.infer_type(env)?;
match (lt, rt) {
(Type::Int, Type::Int) => Ok(Type::Int),
(Type::Float, Type::Float) => Ok(Type::Float),
(Type::Float, Type::Int) => Ok(Type::Float),
_ => Err(TypeError::InvalidPowerOp(lt, rt))
}
}
// ... other cases
}
}
}
}
Type System Components
Type- Represents Scallop types (int, float, string, ADTs)TypeEnv- Environment mapping variables to typesTypeError- Type checking errorsunify- Type unification for inference
Step 4: IR Generation
Lower AST to intermediate representation in /core/src/compiler/back/.
Generating IR
#![allow(unused)]
fn main() {
// In back/compile.rs
impl Compiler {
fn compile_expr(&mut self, expr: &Expr) -> IRExpr {
match expr {
Expr::Binary(BinaryOp::Add, l, r) => {
let l_ir = self.compile_expr(l);
let r_ir = self.compile_expr(r);
IRExpr::Binary(IRBinaryOp::Add, Box::new(l_ir), Box::new(r_ir))
}
Expr::Binary(BinaryOp::Pow, l, r) => {
// New: Compile power operator
let l_ir = self.compile_expr(l);
let r_ir = self.compile_expr(r);
IRExpr::Binary(IRBinaryOp::Pow, Box::new(l_ir), Box::new(r_ir))
}
// ... other cases
}
}
}
}
IR Structure
IR is closer to execution than AST:
- Variables become explicit
- Types are attached
- Control flow is explicit
- Aggregations are lowered to loops
Step 5: Runtime Execution
Implement execution in /core/src/runtime/.
Adding Runtime Support
#![allow(unused)]
fn main() {
// In runtime/eval.rs
impl Executor {
fn eval_binary(&self, op: &BinaryOp, l: &Value, r: &Value) -> Result<Value> {
match op {
BinaryOp::Add => {
match (l, r) {
(Value::I32(a), Value::I32(b)) => Ok(Value::I32(a + b)),
(Value::F32(a), Value::F32(b)) => Ok(Value::F32(a + b)),
_ => Err(RuntimeError::TypeError)
}
}
BinaryOp::Pow => {
// New: Execute power operator
match (l, r) {
(Value::I32(a), Value::I32(b)) => {
Ok(Value::I32(a.pow(*b as u32)))
}
(Value::F32(a), Value::F32(b)) => {
Ok(Value::F32(a.powf(*b)))
}
_ => Err(RuntimeError::TypeError)
}
}
// ... other cases
}
}
}
}
Runtime Components
Value- Runtime value representationExecutor- Executes IRDatabase- Stores relationsProvenanceContext- Manages provenance
Step 6: Testing
Add tests in /core/tests/integrate/.
Integration Test Example
#![allow(unused)]
fn main() {
// In tests/integrate/operators.rs
#[test]
fn test_power_operator() {
let program = r#"
rel base = {2, 3, 4}
rel power(x, y) = base(x), y = x ** 2
query power
"#;
let expected = vec![
(2, 4),
(3, 9),
(4, 16),
];
test_program(program, "power", expected);
}
#[test]
fn test_power_with_float() {
let program = r#"
rel x = {2.0, 3.0}
rel result(x ** 0.5) = x(x)
query result
"#;
let expected = vec![
1.414, // sqrt(2)
1.732, // sqrt(3)
];
test_program_float(program, "result", expected, 0.01);
}
}
Test Categories
- Unit tests - Test individual components
- Integration tests - Test complete programs
- Type error tests - Ensure bad programs fail
- Provenance tests - Test with different provenances
- Performance tests - Benchmark critical operations
Complete Example: Adding let Bindings
Let’s walk through a complete example: adding local variable bindings.
1. Grammar
// Add to Atom rule
pub Atom: Atom = {
// Existing patterns...
// New: let binding
"let" <var:Name> "=" <val:Expr> "," <body:Atom> => {
Atom::Let(var, val, Box::new(body))
},
}
2. AST
#![allow(unused)]
fn main() {
#[derive(Clone, Debug)]
pub enum Atom {
// Existing variants...
Let(String, Expr, Box<Atom>),
}
}
3. Type Checking
#![allow(unused)]
fn main() {
impl Atom {
pub fn type_check(&self, env: &mut TypeEnv) -> Result<(), TypeError> {
match self {
Atom::Let(var, val, body) => {
let val_type = val.infer_type(env)?;
env.insert(var.clone(), val_type);
body.type_check(env)?;
env.remove(var);
Ok(())
}
// ... other cases
}
}
}
}
4. IR Generation
#![allow(unused)]
fn main() {
impl Compiler {
fn compile_atom(&mut self, atom: &Atom) -> Vec<IRStmt> {
match atom {
Atom::Let(var, val, body) => {
let val_ir = self.compile_expr(val);
let var_id = self.fresh_var();
vec![
IRStmt::Assign(var_id, val_ir),
IRStmt::Scope {
bindings: vec![(var.clone(), var_id)],
body: Box::new(self.compile_atom(body)),
}
]
}
// ... other cases
}
}
}
}
5. Execution
#![allow(unused)]
fn main() {
impl Executor {
fn eval_stmt(&mut self, stmt: &IRStmt) -> Result<()> {
match stmt {
IRStmt::Scope { bindings, body } => {
// Push new scope
for (name, var_id) in bindings {
let value = self.read_var(*var_id)?;
self.env.push(name.clone(), value);
}
// Execute body
self.eval_stmt(body)?;
// Pop scope
for (name, _) in bindings {
self.env.pop(name);
}
Ok(())
}
// ... other cases
}
}
}
}
6. Tests
#![allow(unused)]
fn main() {
#[test]
fn test_let_binding() {
let program = r#"
rel edge(0, 1)
rel edge(1, 2)
rel result(a, c, d) =
edge(a, b),
let x = a + b,
edge(b, c),
let y = b + c,
d = x + y
query result
"#;
let expected = vec![(0, 1, 2), (1, 2, 4)];
test_program(program, "result", expected);
}
}
Common Pitfalls
1. Forgetting Provenance
New operations must handle provenance correctly:
#![allow(unused)]
fn main() {
// ✗ Bad: ignores provenance
fn eval_binary(l: Value, r: Value) -> Value {
Value::new(l.data + r.data)
}
// ✓ Good: propagates provenance
fn eval_binary(&self, l: TaggedValue, r: TaggedValue) -> TaggedValue {
let tag = self.provenance.mult(&l.tag, &r.tag); // Combine tags
TaggedValue::new(l.value + r.value, tag)
}
}
2. Not Handling All Types
Operations must work with all relevant types:
#![allow(unused)]
fn main() {
// ✗ Bad: only handles i32
match (l, r) {
(Value::I32(a), Value::I32(b)) => Value::I32(a + b),
_ => panic!("Unexpected types")
}
// ✓ Good: handles all numeric types
match (l, r) {
(Value::I32(a), Value::I32(b)) => Value::I32(a + b),
(Value::F32(a), Value::F32(b)) => Value::F32(a + b),
(Value::I64(a), Value::I64(b)) => Value::I64(a + b),
_ => Err(TypeError::InvalidOperation)
}
}
3. Breaking Semi-Naive Evaluation
New constructs must preserve monotonicity for correctness.
Summary
To add a new language feature:
- Update grammar in
grammar.lalrpop - Add AST nodes in
ast.rs - Implement type checking in
type_check/ - Generate IR in
back/ - Implement execution in
runtime/ - Write comprehensive tests in
tests/integrate/
For more details:
- Developer Guide - Architecture overview
- Bindings - Language bindings
- LALRPOP Book - Parser generator