Text Embeddings

Text embeddings are vector representations of text that capture semantic meaning. Scallop integrates with various text embedding models to combine neural language understanding with symbolic reasoning.

Overview

Text embeddings enable Scallop to:

Match natural language descriptions to structured data
Perform semantic similarity comparisons
Bridge neural text understanding with logical reasoning
Handle multi-modal tasks (text + vision, text + video)

Integration Pattern

Text embeddings are typically provided as input relations to Scallop programs:

import scallopy
from transformers import AutoTokenizer, AutoModel

# Create embedding model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

# Create Scallop context
ctx = scallopy.ScallopContext()

# Define input relation for text embeddings
ctx.add_relation("text_embedding", (int, str, list))

# Process text and add embeddings
text = "example description"
embedding = get_embedding(text)  # Get embedding vector
ctx.add_facts("text_embedding", [(0, text, embedding)])

# Add reasoning rules
ctx.add_rule("match(id) = text_embedding(id, text, emb), similarity(emb, target) > 0.8")

Example: Video-Text Matching (Mugen Dataset)

This example demonstrates using text embeddings with video action recognition to match natural language descriptions to video content.

Neural Components

Text Embedding: DistilBERT for text description encoding
Vision Embedding: S3D for video frame encoding
MLP: 2-layer network (hidden size 256) for feature fusion

Scallop Program

// Input from neural networks
type action(usize, String)        // Video actions detected
type expr(usize, String)          // Text expressions from description
type expr_start(usize)            // Start of text expression
type expr_end(usize)              // End of text expression
type action_start(usize)          // Start of video action
type action_end(usize)            // End of video action

type match_single(usize, usize, usize)      // Single action-expression match
type match_sub(usize, usize, usize, usize)  // Subsequence match

// Check whether a text expression matches a video action
rel match_single(tid, vid, vid + 1) =
    expr(tid, a),
    action(vid, a)

// Match a single text expression to video subsequence
rel match_sub(tid, tid, vid_start, vid_end) =
    match_single(tid, vid_start, vid_end)

rel match_sub(tid, tid, vid_start, vid_end) =
    match_sub(tid, tid, vid_start, vid_mid),
    match_single(tid, vid_mid, vid_end)

// Match a sequence of text expressions to video subsequence
rel match_sub(tid_start, tid_end, vid_start, vid_end) =
    match_sub(tid_start, tid_end - 1, vid_start, vid_mid),
    match_single(tid_end, vid_mid, vid_end)

// Check whether the whole text specification matches the video
rel match() =
    expr_start(tid_start),
    expr_end(tid_end),
    action_start(vid_start),
    action_end(vid_end),
    match_sub(tid_start, tid_end, vid_start, vid_end)

// Integrity constraint: detect too many consecutive identical expressions
rel too_many_consecutive_expr() =
    expr(tid, a),
    expr(tid + 1, a),
    expr(tid + 2, a),
    expr(tid + 3, a)

Training Configuration

Dataset: 1K Mugen video-text pairs (training), 1K (testing)
Training: 1000 epochs, learning rate 0.0001, batch size 3
Loss: BCE-loss for end-to-end training
Neural-Symbolic Integration: Embeddings flow into Scallop’s logical reasoning

Key Insights

Structured Matching: Logical rules enforce alignment between text sequence and video sequence
Compositional Reasoning: Text expressions can match video action subsequences
Constraint Enforcement: Integrity constraints detect anomalies (repeated expressions)
Differentiable: Entire pipeline is trainable end-to-end

Common Text Embedding Models

Transformer-based

BERT (bert-base-uncased) - General-purpose text understanding
DistilBERT (distilbert-base-uncased) - Faster, lighter BERT variant
RoBERTa (roberta-base) - Robustly optimized BERT
T5 (t5-base) - Text-to-text transformer

Sentence Embeddings

Sentence-BERT (sentence-transformers) - Optimized for sentence similarity
MPNet - Strong general-purpose sentence embeddings
Universal Sentence Encoder - Google’s multilingual embeddings

Domain-Specific

BioBERT - Biomedical text
SciBERT - Scientific literature
CodeBERT - Source code

Integration with Scallop Plugins

The scallop-ext plugin system provides built-in support for text embeddings:

import scallopy

# Use OpenAI embeddings
ctx = scallopy.ScallopContext()
ctx.import_plugin("openai_gpt")

# Text similarity using embeddings
ctx.add_rule("""
  rel similar_docs(d1, d2) =
    document(d1, text1),
    document(d2, text2),
    $openai_text_similarity(text1, text2) > 0.85
""")

Best Practices

Normalize embeddings - Use L2 normalization for cosine similarity
Cache embeddings - Compute once, reuse for multiple queries
Batch processing - Embed multiple texts together for efficiency
Threshold tuning - Adjust similarity thresholds for your domain
Hybrid approaches - Combine embeddings with symbolic rules for robustness

Example Use Cases

Document retrieval - Semantic search over document collections
Text classification - Combine neural embeddings with logical rules
Named entity resolution - Match entities using semantic similarity
Multi-modal reasoning - Align text with images/video using embeddings
Question answering - Match questions to answers semantically

References

Scallop Paper: Scallop: A Language for Neurosymbolic Programming
Mugen Dataset: [Hayes et al. 2022] Video-text alignment benchmark
Transformers: Hugging Face Transformers
Sentence-BERT: Sentence-Transformers

For more examples of using embeddings with Scallop, see the OpenAI GPT and Transformers integration guides.

Keyboard shortcuts

Scallop Book