Foreign Predicates

Foreign predicates are fact generators that extend Scallop by dynamically producing facts based on input patterns. Unlike foreign functions that return single values, foreign predicates can yield multiple results and support flexible bound/free variable patterns.

What are Foreign Predicates?

Definition

A foreign predicate is a Python generator function that:

  • Takes input arguments (bound variables)
  • Yields zero or more facts (bound + free variables)
  • Can produce probabilistic facts with tags
  • Supports pattern-based querying

Syntax

Foreign predicates are called like regular Scallop predicates:

rel result(input, output) = data(input), my_predicate(input, output)
rel check = data(x) and my_predicate(x)  // Boolean predicate

Foreign Predicates vs Foreign Functions

| Feature    | Foreign Function      | Foreign Predicate            |
|------------|-----------------------|------------------------------|
| Invocation | $function(args)       | predicate(args)              |
| Returns    | Single value          | Zero or more facts           |
| Pattern    | All inputs bound      | Supports bound/free patterns |
| Use case   | Pure computation      | Fact generation, search      |
| Tag        | Inherits from context | Per-fact probability         |

Key Characteristics

Multi-valued:

  • Can yield multiple results for a single input
  • Empty results are valid (no facts generated)
  • Each result is independent

Pattern-driven:

  • Supports multiple calling patterns (bb, bf, fb, ff)
  • “b” = bound (input provided), “f” = free (output to generate)
  • Example: gpt(input, output) supports bb (check) and bf (generate)

Probabilistic:

  • Each yielded fact has a probability/tag
  • Tags integrate with Scallop’s provenance system
  • Enables fuzzy matching and uncertain reasoning
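
The bound/free distinction can be illustrated without Scallop at all. The sketch below is plain Python, not the scallopy API: `capital_of` is a hypothetical generator standing in for a foreign predicate, `CAPITALS` is made-up data, and `call_bf` / `call_bb` are illustrative helpers. It shows a bf call enumerating outputs and a bb call keeping only the outputs that match an already-bound value.

```python
from typing import Iterator, List, Tuple

Fact = Tuple[float, Tuple[str]]  # (tag, output tuple), as in this chapter's examples

# Hypothetical lookup table used only for this illustration.
CAPITALS = {"France": ["Paris"], "Bolivia": ["La Paz", "Sucre"]}

def capital_of(country: str) -> Iterator[Fact]:
    """Stand-in for a foreign predicate: input is bound, output is free."""
    for city in CAPITALS.get(country, []):
        yield (1.0, (city,))

def call_bf(country: str) -> List[Fact]:
    """bf pattern: enumerate every output for the bound input."""
    return list(capital_of(country))

def call_bb(country: str, city: str) -> List[Fact]:
    """bb pattern: generate outputs, then keep those equal to the bound value."""
    return [fact for fact in capital_of(country) if fact[1] == (city,)]

print(call_bf("Bolivia"))          # two facts for one input
print(call_bb("France", "Paris"))  # one fact: the pair holds
print(call_bb("France", "Lyon"))   # empty list: no fact, so the check fails
```

An input with no entry in the table simply produces an empty list, which corresponds to "no facts generated" in the list above.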

Using Foreign Predicates

Basic Usage

Foreign predicates become available after plugin loading:

import scallopy

# Create context with provenance
ctx = scallopy.ScallopContext(provenance="minmaxprob")

# Load plugins (e.g., GPT plugin)
plugin_registry = scallopy.PluginRegistry()
plugin_registry.load_plugins_from_entry_points()
plugin_registry.load_into_ctx(ctx)

# Now use foreign predicates from plugins
ctx.add_program("""
  rel question = {"What is the capital of France?"}
  rel answer(q, a) = question(q), gpt(q, a)
  query answer
""")
ctx.run()

Calling Patterns

Bound-Free (bf) - Generation:

// Input is bound, output is free - generate answers
rel questions = {"What is 2+2?", "What is the capital of Spain?"}
rel answers(q, a) = questions(q), gpt(q, a)

Bound-Bound (bb) - Verification:

// Both bound - check if answer is correct
rel question_answer_pairs = {
  ("What is 2+2?", "4"),
  ("What is 2+2?", "5")
}
rel verified(q, a) = question_answer_pairs(q, a), gpt(q, a)
// A pair is kept only when gpt yields exactly that answer for the question,
// so ("What is 2+2?", "4") passes and ("What is 2+2?", "5") does not

Multiple results:

// Foreign predicate can yield multiple answers
// (`query` is a reserved keyword, so the relation needs a different name)
rel city_question = {"What are some French cities?"}
rel cities(c) = city_question(q), list_cities(q, c)
// Yields: Paris, Lyon, Marseille, Nice, ...

Integration with Rules

Chaining predicates:

rel document = {"The quick brown fox jumps over the lazy dog"}
rel extracted_entity(doc, entity) = document(doc), extract_entities(doc, entity)
rel classified(entity, kind) = extracted_entity(_, entity), classify_entity(entity, kind)

Filtering results:

rel candidates(x) = source(s), generate_options(s, x)
rel filtered(x) = candidates(x), validate(x)  // Boolean predicate

Aggregation:

rel all_answers(a) = question(q), gpt(q, a)
rel num_unique_answers(n) = n := count(a: all_answers(a))

Examples from Plugins

GPT Plugin: Text Generation

The GPT plugin provides a foreign predicate that queries the OpenAI API:

Implementation:

import openai
import scallopy

STORAGE = {}  # memoization cache: prompt -> raw API response

@scallopy.foreign_predicate
def gpt(s: str) -> scallopy.Facts[None, str]:
    # Check memoization cache
    if s in STORAGE:
        response = STORAGE[s]
    else:
        # Call OpenAI API (legacy interface, removed in openai 1.0)
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": s}],
            temperature=0.0,
        )
        STORAGE[s] = response

    # Iterate through all choices (typically 1)
    for choice in response["choices"]:
        result = choice["message"]["content"].strip()
        # Tag type is None, so only the output tuple is yielded
        yield (result,)

Usage:

rel questions = {
  "What is the capital of France?",
  "Translate to Spanish: Good morning",
  "List three colors"
}

rel qa(question, answer) = questions(question), gpt(question, answer)
query qa

// Expected output (mock when API key not set):
// qa: {
//   ("What is the capital of France?", "Paris"),
//   ("Translate to Spanish: Good morning", "Buenos días"),
//   ("List three colors", "Red, blue, green")
// }

Multiple outputs:

// Configure GPT to return multiple answers
rel question = {"What are some programming languages?"}
rel languages(q, lang) = question(q), gpt_multi(q, lang)

// With n=3 completions, yields multiple facts:
// languages: {
//   ("What are some programming languages?", "Python, Java, C++"),
//   ("What are some programming languages?", "JavaScript, Ruby, Go"),
//   ("What are some programming languages?", "Rust, Swift, Kotlin")
// }

Custom Example: Semantic Similarity

import scallopy
from typing import Tuple

@scallopy.foreign_predicate
def string_semantic_eq(s1: str, s2: str) -> scallopy.Facts[float, Tuple]:
    """Fuzzy string matching for kinship terms"""
    equivalents = {
        ("mom", "mother"): 0.99,
        ("mom", "mom"): 1.0,
        ("mother", "mother"): 1.0,
        ("dad", "father"): 0.99,
        ("dad", "dad"): 1.0,
        ("father", "father"): 1.0,
    }

    if (s1, s2) in equivalents:
        yield (equivalents[(s1, s2)], ())

Usage:

rel kinship = {
  ("alice", "mom", "bob"),
  ("alice", "mother", "casey"),
  ("david", "father", "emma")
}

rel parent(person, child) =
  kinship(person, relation, child),
  string_semantic_eq(relation, "mother")

rel parent(person, child) =
  kinship(person, relation, child),
  string_semantic_eq(relation, "father")

query parent

// Result (with probabilities):
// parent: {
//   0.99::("alice", "bob"),      // "mom" ~= "mother" with prob 0.99
//   1.0::("alice", "casey"),     // "mother" == "mother" exactly
//   1.0::("david", "emma")       // "father" == "father" exactly
// }
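
The probabilities in this result come from combining each kinship fact's tag with the tag yielded by string_semantic_eq. The plain-Python sketch below assumes a min-max style provenance (such as the minmaxprob used under Basic Usage), where a conjunction takes the minimum of its tags and alternative derivations take the maximum. It also assumes that untagged input facts count as 1.0. `derive_parents` and the module-level tables are illustrative names, not part of scallopy.

```python
from typing import Dict, List, Tuple

# Entries of the similarity table above that the two `parent` rules can hit.
EQUIVALENTS: Dict[Tuple[str, str], float] = {
    ("mom", "mother"): 0.99,
    ("mother", "mother"): 1.0,
    ("dad", "father"): 0.99,
    ("father", "father"): 1.0,
}

KINSHIP: List[Tuple[str, str, str]] = [
    ("alice", "mom", "bob"),
    ("alice", "mother", "casey"),
    ("david", "father", "emma"),
]

def derive_parents() -> Dict[Tuple[str, str], float]:
    parents: Dict[Tuple[str, str], float] = {}
    for person, relation, child in KINSHIP:
        for target in ("mother", "father"):  # one pass per `parent` rule
            similarity = EQUIVALENTS.get((relation, target))
            if similarity is None:
                continue  # the predicate yields nothing, so no fact is derived
            tag = min(1.0, similarity)  # conjunction: min of the joined tags
            key = (person, child)
            parents[key] = max(parents.get(key, 0.0), tag)  # disjunction: max
    return parents

print(derive_parents())
```

Under a different provenance the combination rules change, so the same program can produce different tags.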

Custom Example: Divisor Generation

import scallopy
from typing import Tuple

@scallopy.foreign_predicate
def divisors(n: int) -> scallopy.Facts[float, Tuple[int]]:
    """Generate all divisors of n"""
    for i in range(1, n + 1):
        if n % i == 0:
            yield (1.0, (i,))

Usage:

rel numbers = {12, 15, 20}
rel has_divisor(num, div) = numbers(num), divisors(num, div)
query has_divisor

// Result:
// has_divisor: {
//   (12, 1), (12, 2), (12, 3), (12, 4), (12, 6), (12, 12),
//   (15, 1), (15, 3), (15, 5), (15, 15),
//   (20, 1), (20, 2), (20, 4), (20, 5), (20, 10), (20, 20)
// }
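
The listed result can be checked without running Scallop. The sketch below reuses the generator body without the scallopy decorator and emulates the join in the has_divisor rule with a list comprehension. The `has_divisor` Python function is an illustrative stand-in for the rule, not something scallopy provides.

```python
from typing import Iterator, List, Tuple

def divisors(n: int) -> Iterator[Tuple[float, Tuple[int]]]:
    """Same body as the predicate above, without the scallopy decorator."""
    for i in range(1, n + 1):
        if n % i == 0:
            yield (1.0, (i,))

def has_divisor(numbers: List[int]) -> List[Tuple[int, int]]:
    """Emulate: has_divisor(num, div) = numbers(num), divisors(num, div)."""
    return [(num, div) for num in numbers for _tag, (div,) in divisors(num)]

pairs = has_divisor([12, 15, 20])
print(pairs)
```

Because `range(1, n + 1)` is empty for n below 1, zero and negative inputs produce no facts at all.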

Creating Foreign Predicates in Plugins

Basic Foreign Predicate

To create a foreign predicate in a plugin:

import scallopy
from scallopy import Facts
from typing import Tuple

class MyPlugin(scallopy.Plugin):
    def __init__(self):
        super().__init__("my_plugin")

    def load_into_ctx(self, ctx):
        @scallopy.foreign_predicate
        def my_pred(x: int) -> Facts[float, Tuple[int]]:
            # Generate facts
            yield (1.0, (x * 2,))
            yield (1.0, (x * 3,))

        ctx.register_foreign_predicate(my_pred)

Type Signature

The return type must be Facts[TagType, TupleType]:

from scallopy import Facts
from typing import Tuple

# Single output column
def single_output(x: int) -> Facts[float, Tuple[int]]:
    yield (1.0, (x * 2,))

# Multiple output columns
def multi_output(x: int) -> Facts[float, Tuple[int, str]]:
    yield (1.0, (x, "even" if x % 2 == 0 else "odd"))

# Boolean predicate (empty tuple)
# check_prime is a user-supplied helper defined elsewhere
def is_prime(n: int) -> Facts[float, Tuple]:
    if check_prime(n):
        yield (1.0, ())
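
The is_prime signature relies on a check_prime helper whose body does not appear in this chapter. One possible implementation, offered only as a sketch, is trial division up to the integer square root. The generator is repeated here without type annotations from scallopy so the block runs on its own.

```python
import math

def check_prime(n: int) -> bool:
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # 2 is the only even prime
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def is_prime(n: int):
    """Boolean predicate body: yield one empty tuple when n is prime."""
    if check_prime(n):
        yield (1.0, ())

print([n for n in range(20) if check_prime(n)])
print(list(is_prime(7)), list(is_prime(8)))
```

A composite input yields nothing, which is how a boolean predicate reports "false".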

Yielding Facts

Use yield to produce facts lazily:

@scallopy.foreign_predicate
def range_values(start: int, end: int) -> Facts[float, Tuple[int]]:
    for i in range(start, end + 1):
        yield (1.0, (i,))

# range_values(1, 5, x) generates x = 1, 2, 3, 4, 5 (the end bound is inclusive)

Probabilistic Facts

Tag values integrate with provenance:

@scallopy.foreign_predicate
def fuzzy_match(s1: str, s2: str) -> Facts[float, Tuple]:
    # Calculate similarity score in [0, 1]
    # (compute_similarity is a user-supplied helper defined elsewhere)
    similarity = compute_similarity(s1, s2)

    # Only yield if similarity is high enough
    if similarity > 0.5:
        yield (similarity, ())

# Usage:
# rel similar(a, b) = strings(a), strings(b), fuzzy_match(a, b)
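
The fuzzy_match example leaves compute_similarity undefined. As a stand-in, the sketch below uses `difflib.SequenceMatcher` from the Python standard library, which scores character overlap rather than meaning. That choice is an assumption made for illustration; an embedding-based scorer would be closer to true semantic similarity. The decorator is omitted so the block runs without scallopy.

```python
from difflib import SequenceMatcher
from typing import Iterator, Tuple

def compute_similarity(s1: str, s2: str) -> float:
    """Hypothetical scorer: case-insensitive difflib ratio in [0, 1]."""
    return SequenceMatcher(None, s1.lower(), s2.lower()).ratio()

def fuzzy_match(s1: str, s2: str) -> Iterator[Tuple[float, tuple]]:
    """Yield a single tagged empty tuple when the strings are similar enough."""
    similarity = compute_similarity(s1, s2)
    if similarity > 0.5:
        yield (similarity, ())

print(list(fuzzy_match("Mother", "mother")))  # identical after lowercasing
print(list(fuzzy_match("abc", "xyz")))        # no characters in common
```

The 0.5 threshold is the one from the example above. Raising it trades recall for fewer low-confidence facts.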

Error Handling

Handle errors gracefully by not yielding:

@scallopy.foreign_predicate
def safe_operation(x: int) -> Facts[float, Tuple[int]]:
    try:
        result = risky_computation(x)
        yield (1.0, (result,))
    except Exception:
        # Don't yield - fact doesn't exist for this input
        pass
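
Catching every exception silently can hide real bugs. A variation on the pattern above, sketched here in plain Python, catches only the failure that is expected and logs it before yielding nothing. `risky_computation` is a hypothetical stand-in that fails on zero, and the decorator is omitted so the block runs without scallopy.

```python
import logging
from typing import Iterator, Tuple

logger = logging.getLogger(__name__)

def risky_computation(x: int) -> int:
    """Hypothetical computation that fails for some inputs."""
    return 100 // x  # raises ZeroDivisionError when x == 0

def safe_operation(x: int) -> Iterator[Tuple[float, Tuple[int]]]:
    try:
        result = risky_computation(x)
    except ZeroDivisionError as err:
        # Expected failure: record it and produce no fact for this input.
        logger.warning("safe_operation(%r) produced no fact: %s", x, err)
        return
    yield (1.0, (result,))

print(list(safe_operation(4)))
print(list(safe_operation(0)))
```

Keeping the `yield` outside the `try` block also means the handler covers only the computation itself.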

Memoization

Cache expensive computations:

CACHE = {}

@scallopy.foreign_predicate
def expensive_pred(x: str) -> Facts[float, Tuple[str]]:
    if x not in CACHE:
        CACHE[x] = expensive_api_call(x)

    for result in CACHE[x]:
        yield (1.0, (result,))
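
One detail matters when caching: the cached value has to be replayable. If the backend returns a generator, storing it directly would leave an exhausted iterator in the cache after the first use. The sketch below materializes results into a list and counts backend calls to show the cache working. `expensive_api_call` is a hypothetical backend that just splits its input into words, and `CALLS` exists only for the demonstration.

```python
from typing import Dict, Iterator, List, Tuple

CACHE: Dict[str, List[str]] = {}
CALLS = {"count": 0}  # counts backend invocations, for demonstration only

def expensive_api_call(x: str) -> List[str]:
    """Hypothetical slow backend; here it just splits the input into words."""
    CALLS["count"] += 1
    return x.split()

def expensive_pred(x: str) -> Iterator[Tuple[float, Tuple[str]]]:
    if x not in CACHE:
        # Store a list, not a generator, so cached results can be replayed.
        CACHE[x] = list(expensive_api_call(x))
    for result in CACHE[x]:
        yield (1.0, (result,))

first = list(expensive_pred("red green"))
second = list(expensive_pred("red green"))
print(first == second, CALLS["count"])
```

A module-level dictionary grows without bound, so a long-running process may want a size limit or eviction policy.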

Best Practices

Use Predicates for Multi-Valued Results

✓ Good - Multiple results with predicate:

@scallopy.foreign_predicate
def get_synonyms(word: str) -> Facts[float, Tuple[str]]:
    for synonym in lookup_synonyms(word):
        yield (1.0, (synonym,))

✗ Bad - Single result with function:

@scallopy.foreign_function
def get_synonyms(word: str) -> str:
    return ", ".join(lookup_synonyms(word))  # Returns string, loses structure

Pattern Validation

Document and validate expected patterns:

@scallopy.foreign_predicate
def my_pred(x: int, y: int) -> Facts[float, Tuple[int]]:
    """
    Supports patterns:
    - (bound, bound) → (free): Given x and y, compute result
    - (bound, free) → not supported (would require search)
    """
    result = x + y
    yield (1.0, (result,))

Lazy Evaluation

Take advantage of generator laziness:

@scallopy.foreign_predicate
def infinite_sequence(start: int) -> Facts[float, Tuple[int]]:
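    # Terminates only if the runtime pulls facts incrementally; if results are
    # collected eagerly, this loop never returns. Verify the behavior of your
    # scallopy version before relying on an unbounded generator.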
    i = start
    while True:
        yield (1.0, (i,))
        i += 1
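
When it is unclear whether the caller consumes the generator incrementally, an explicit bound keeps the predicate finite either way. The sketch below is plain Python with the decorator omitted. `bounded_sequence` and its `limit` input are hypothetical additions for illustration, not part of the example above.

```python
from itertools import count, islice
from typing import Iterator, Tuple

def bounded_sequence(start: int, limit: int) -> Iterator[Tuple[float, Tuple[int]]]:
    """Yield at most `limit` consecutive integers beginning at `start`."""
    # max(limit, 0) guards against negative limits, which islice rejects.
    for i in islice(count(start), max(limit, 0)):
        yield (1.0, (i,))

print(list(bounded_sequence(3, 4)))
print(list(bounded_sequence(3, 0)))
```

The generator is still lazy, so a consumer that stops early does no extra work, while a consumer that drains it fully is guaranteed to finish.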

Next Steps

For detailed Python API documentation, see Foreign Predicates (Python API).