Database Services
Sim2l provides a database architecture with five integrated components for managing simulation data, results, files, and metadata.
Overview
The database architecture consists of:
Run Database - Per-run SQLite databases for complete isolation
Cache Service - Distributed cache with session-based access control
Catalog Service - Central registry for all sim2l tools and versions
Results Service - Introspected simulation results with searchable parameters
File Manager - File and folder management for sim2l runs
Quick Start
from sim2l import configure
from sim2l.database import (
RunDatabase,
CacheClient,
CatalogClient,
ResultsClient,
FileManager,
get_session_manager
)
# Configure sim2l with database services
session = get_session_manager().create_anonymous_session()
configure(
use_run_database=True,
cache_service_url="http://localhost:8001",
cache_session_id=session.session_id,
catalog_service_url="http://localhost:8002",
catalog_session_id=session.session_id,
results_service_url="http://localhost:8003",
results_session_id=session.session_id
)
Run Database
Per-run SQLite databases provide complete isolation for each simulation execution.
Features
Complete Run Isolation - Each execution gets its own database
Portable - Single file contains all run data
Rich Schema - 17 tables for inputs, outputs, logs, metrics, provenance
Fast Access - Local SQLite for high performance
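The isolation idea can be illustrated with nothing but the standard library. The sketch below is not the sim2l implementation: the helper names `write_run` and `read_inputs` and the single `inputs` table are assumptions made for illustration (the real schema has 17 tables). It only shows that one SQLite file per execution keeps runs from seeing each other's data.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_run(base_dir: Path, execution_id: str, inputs: dict) -> Path:
    """Create (or reuse) a dedicated SQLite file for one execution."""
    db_path = base_dir / f"{execution_id}.db"
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS inputs "
                "(name TEXT PRIMARY KEY, value REAL, units TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO inputs VALUES (?, ?, ?)",
                [(name, value, units) for name, (value, units) in inputs.items()],
            )
    finally:
        conn.close()
    return db_path


def read_inputs(db_path: Path) -> dict:
    """Read back the inputs stored in a single run file."""
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT name, value FROM inputs"))
    finally:
        conn.close()


# Two executions, two files: neither can overwrite the other's data.
base = Path(tempfile.mkdtemp())
a = write_run(base, "exec-a", {"temperature": (350, "kelvin")})
b = write_run(base, "exec-b", {"temperature": (400, "kelvin")})
print(a.name, read_inputs(a))
print(b.name, read_inputs(b))
```

Because each run is a single file, moving or archiving a run amounts to copying that file.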
Usage
from sim2l.database import RunDatabase
# Open a run database
run_db = RunDatabase("exec-2024-001")
# Get run summary
summary = run_db.get_summary()
print(f"Simulation: {summary['simulation_name']}")
print(f"Status: {summary['status']}")
print(f"Duration: {summary['duration_seconds']}s")
# Get inputs and outputs
inputs = run_db.get_inputs()
outputs = run_db.get_outputs()
# Get logs
logs = run_db.get_logs(level='ERROR')
# Get artifacts (files)
artifacts = run_db.get_artifacts()
# Get metrics
metrics = run_db.get_metrics(category='performance')
Context Manager
with RunDatabase("exec-2024-001") as run_db:
    run_db.initialize_run("thermal_sim", "1.0.0")
    run_db.save_input("temperature", 350, "number", units="kelvin")
    run_db.save_output("max_stress", 1.5e8, "number", units="pascal")
    run_db.complete_run("completed", duration_seconds=42.5)
Database Location
Run databases are stored at:
Default: ~/.sim2l/runs/{execution_id}.db
Configurable via the SIM2L_RUN_DB_BASE_PATH environment variable
Custom path via configure(run_db_base_path="/custom/path")
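A small sketch of how the three location options could be combined into one path. The helper `resolve_run_db_path` is hypothetical, and the precedence shown (explicit path, then environment variable, then default) is an assumption; the source does not state which option wins.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_BASE = Path.home() / ".sim2l" / "runs"


def resolve_run_db_path(
    execution_id: str,
    base_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper: pick the base directory, then append the file name.

    Assumed precedence: explicit base_path > SIM2L_RUN_DB_BASE_PATH > default.
    """
    env = os.environ if env is None else env
    base = base_path or env.get("SIM2L_RUN_DB_BASE_PATH") or str(DEFAULT_BASE)
    return Path(base).expanduser() / f"{execution_id}.db"


print(resolve_run_db_path("exec-2024-001", env={}))
print(resolve_run_db_path("exec-2024-001", base_path="/custom/path"))
```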
Cache Service
Distributed cache service for sharing simulation results across users and machines.
Features
Distributed Caching - Share results across team
Session-Based Auth - Role-based access control
TTL Support - Automatic expiration
Invalidation - Pattern-based cache clearing
Statistics - Hit rates and usage metrics
Dual Backend - SQLite (dev) or PostgreSQL (prod)
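To make the TTL, invalidation, and statistics features concrete, here is a minimal in-memory sketch. It is not the Cache Service itself: the class `TTLCache`, its method signatures, and the use of shell-style glob patterns for invalidation are assumptions chosen for illustration.

```python
import fnmatch
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Illustrative cache with expiration, pattern invalidation, and hit stats."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self.hits = 0
        self.misses = 0

    def set(self, key: str, value: Any, ttl_hours: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_hours * 3600)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or entry[1] <= self._clock():
            self._entries.pop(key, None)  # drop expired entries lazily
            self.misses += 1
            return None
        self.hits += 1
        return entry[0]

    def invalidate(self, pattern: str) -> int:
        """Remove every key matching a glob pattern; return how many were removed."""
        doomed = [k for k in self._entries if fnmatch.fnmatchcase(k, pattern)]
        for key in doomed:
            del self._entries[key]
        return len(doomed)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.set("thermal_sim/350K/result", {"run_db_path": "/path/to/run.db"}, ttl_hours=24)
print(cache.get("thermal_sim/350K/result"))  # still fresh
now[0] += 25 * 3600                          # jump past the 24 hour TTL
print(cache.get("thermal_sim/350K/result"))  # expired
print(cache.hit_rate())
```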
Starting the Service
# SQLite backend (development)
python -m sim2l.services.cache_service --backend sqlite --port 8001
# PostgreSQL backend (production)
python -m sim2l.services.cache_service \
--backend postgresql \
--db-url "postgresql://user:pass@localhost/sim2l_cache" \
--port 8001
# Docker
docker-compose up cache-service
Using the Client
from sim2l.database import CacheClient, get_session_manager
# Create session
session = get_session_manager().create_anonymous_session()
# Initialize client
cache = CacheClient(
"http://localhost:8001",
session_id=session.session_id
)
# Store in cache
cache.set(
cache_key="thermal_sim/350K/result",
simulation_id=42,
simulation_name="thermal_sim",
simulation_version="1.0.0",
execution_id="exec-001",
squid_id="thermal_sim/1.0.0/xyz",
input_hash="hash123",
run_db_path="/path/to/run.db",
ttl_hours=24
)
# Retrieve from cache
result = cache.get("thermal_sim/350K/result")
if result:
    print(f"Cache hit! Run DB: {result['run_db_path']}")
else:
    print("Cache miss")
# Invalidate cache
cache.invalidate(simulation_id=42, reason="new version")
# Get statistics
stats = cache.get_stats(simulation_id=42)
print(f"Hit rate: {stats['hit_rate']}")
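The example above passes a placeholder `input_hash="hash123"`. The source does not say how sim2l derives this value; one common approach, sketched here under that caveat, is to hash a canonical JSON encoding of the inputs so that the same inputs always produce the same key regardless of dictionary order. The helpers `hash_inputs` and `make_cache_key` and the key layout are hypothetical.

```python
import hashlib
import json


def hash_inputs(inputs: dict) -> str:
    """Hash a canonical (sorted, compact) JSON encoding of the inputs."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_cache_key(name: str, version: str, inputs: dict) -> str:
    """Hypothetical key layout: name/version/short-hash."""
    return f"{name}/{version}/{hash_inputs(inputs)[:16]}"


inputs = {"temperature": 350, "material": "steel"}
print(hash_inputs(inputs))
print(make_cache_key("thermal_sim", "1.0.0", inputs))
```

Sorting keys before hashing matters: without it, two logically identical input sets could hash differently and cause spurious cache misses.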
API Endpoints
GET /cache/{cache_key} - Retrieve cached result
POST /cache - Store cache entry
POST /cache/invalidate - Invalidate entries
GET /cache/stats - Get statistics
GET /health - Health check
Catalog Service
Central registry for all sim2l tools, versions, and executions.
Features
Tool Registry - All simulations and versions
Search - Find tools by name, tags, metadata
Auto-Sync - Automatically register new installations
Execution Tracking - Record all runs
Statistics - Usage analytics
Access Control - Privilege-based updates
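As a rough picture of what searching a registry by name and tags involves, the sketch below filters an in-memory list. It is not the Catalog Service: `ToolEntry`, `search`, and the matching rules (case-insensitive substring match on name or description, all requested tags required) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ToolEntry:
    name: str
    version: str
    description: str = ""
    tags: List[str] = field(default_factory=list)


def search(
    entries: Iterable[ToolEntry],
    query: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> List[ToolEntry]:
    """Return entries whose name or description contains `query`
    and which carry every tag in `tags` (assumed AND semantics)."""
    needle = query.lower() if query else None
    wanted = set(tags or [])
    results = []
    for entry in entries:
        text = f"{entry.name} {entry.description}".lower()
        if needle and needle not in text:
            continue
        if not wanted.issubset(entry.tags):
            continue
        results.append(entry)
    return results


registry = [
    ToolEntry("thermal_sim", "1.0.0", "Thermal stress analysis", ["thermal", "physics"]),
    ToolEntry("fluid_sim", "2.1.0", "Fluid flow solver", ["physics"]),
]
for tool in search(registry, query="thermal", tags=["physics"]):
    print(f"{tool.name} v{tool.version}: {tool.description}")
```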
Starting the Service
# SQLite backend
python -m sim2l.services.catalog_service --backend sqlite --port 8002
# PostgreSQL backend
python -m sim2l.services.catalog_service \
--backend postgresql \
--db-url "postgresql://user:pass@localhost/sim2l_catalog" \
--port 8002
Using the Client
from sim2l.database import CatalogClient, get_session_manager
session = get_session_manager().create_anonymous_session(
privileges=['developer'] # Needed for registration
)
catalog = CatalogClient(
"http://localhost:8002",
session_id=session.session_id
)
# Register a simulation
catalog.register_simulation(
name="thermal_sim",
version="1.0.0",
description="Thermal stress analysis",
tags=["thermal", "physics"],
schema={
"inputs": {"temperature": {"type": "number", "units": "kelvin"}},
"outputs": {"max_stress": {"type": "number", "units": "pascal"}}
}
)
# Search for simulations
results = catalog.search(query="thermal", tags=["physics"])
for sim in results:
    print(f"{sim['name']} v{sim['version']}: {sim['description']}")
# Get simulation info
sim = catalog.get_simulation("thermal_sim", "1.0.0")
print(f"Schema: {sim['schema']}")
# Record execution
catalog.record_execution(
simulation_id=sim['id'],
execution_id="exec-001",
status="completed",
duration_seconds=42.5
)
# Get statistics
stats = catalog.get_simulation_stats(sim['id'])
print(f"Total executions: {stats['total_executions']}")
print(f"Success rate: {stats['success_rate']}")
Auto-Sync
Automatically register installed simulations:
from sim2l import configure
configure(
catalog_service_url="http://localhost:8002",
catalog_session_id=session.session_id,
catalog_auto_sync=True # Enable auto-sync
)
# New simulations are automatically registered when used
Results Service
The Results Service introspects simulation results and stores them in a searchable database. It is the modern replacement for registerSquidpgSimtool.
Features
Automatic Introspection - Extracts parameter schemas
Searchable Storage - Query by parameter values
Parameter Statistics - Min, max, average across runs
Type-Aware - Understands parameter types
REST API - Programmatic access
Dual Backend - SQLite or PostgreSQL
Starting the Service
# SQLite backend
python -m sim2l.services.results_service --backend sqlite --port 8003
# PostgreSQL backend
python -m sim2l.services.results_service \
--backend postgresql \
--db-url "postgresql://user:pass@localhost/sim2l_results" \
--port 8003
Using the Client
from sim2l.database import ResultsClient, get_session_manager
session = get_session_manager().create_anonymous_session()
client = ResultsClient(
"http://localhost:8003",
session_id=session.session_id
)
# Register a result (introspects run database automatically)
result = client.register_result("exec-2024-001")
print(f"Registered as result ID: {result['result_id']}")
# Search by parameter values
results = client.search(
simulation_name="thermal_sim",
input_filters={'temperature': 350},
output_filters={'max_stress': 1.5e8}
)
for result in results:
    print(f"Execution: {result['execution_id']}")
    print(f"  Inputs: {result['input_params']}")
    print(f"  Outputs: {result['output_params']}")
# Get parameter statistics
stats = client.get_parameter_stats(
"thermal_sim",
"max_stress",
param_class="output"
)
print(f"Average max_stress: {stats['avg_value']}")
print(f"Range: {stats['min_value']} - {stats['max_value']}")
print(f"Analyzed {stats['count']} runs")
What Gets Introspected
The Results Service automatically extracts from each run:
Parameter Schemas - Types, units, min/max, defaults
Input Values - Actual input parameter values
Output Values - Computed output values
Metadata - Simulation name, version, status, duration
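Once input and output values have been extracted, per-parameter statistics reduce to SQL aggregates. The sketch below uses an in-memory SQLite database; the `params` table layout and the `parameter_stats` helper are assumptions, not the service's actual schema. The returned keys mirror those used in the client example (`min_value`, `max_value`, `avg_value`, `count`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE params ("
    "execution_id TEXT, simulation_name TEXT, "
    "param_class TEXT, name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?, ?, ?)",
    [
        ("exec-001", "thermal_sim", "input", "temperature", 300.0),
        ("exec-001", "thermal_sim", "output", "max_stress", 1.0e8),
        ("exec-002", "thermal_sim", "input", "temperature", 350.0),
        ("exec-002", "thermal_sim", "output", "max_stress", 1.5e8),
        ("exec-003", "thermal_sim", "input", "temperature", 400.0),
        ("exec-003", "thermal_sim", "output", "max_stress", 2.0e8),
    ],
)


def parameter_stats(conn, simulation_name, name, param_class):
    """Aggregate one parameter across all registered runs of a simulation."""
    row = conn.execute(
        "SELECT MIN(value), MAX(value), AVG(value), COUNT(*) FROM params "
        "WHERE simulation_name = ? AND name = ? AND param_class = ?",
        (simulation_name, name, param_class),
    ).fetchone()
    return dict(zip(("min_value", "max_value", "avg_value", "count"), row))


stats = parameter_stats(conn, "thermal_sim", "max_stress", "output")
print(f"Range: {stats['min_value']} - {stats['max_value']}")
print(f"Analyzed {stats['count']} runs")
```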
File Manager
Manage files generated by sim2l runs.
Features
File Retrieval - Get files from run databases
Export - Export files to filesystem
Organization - Folder hierarchies
Metadata - Rich file information
Batch Operations - Process multiple files
Usage
from sim2l.database import FileManager
fm = FileManager()
# Get all files from a run
files = fm.get_run_files("exec-2024-001")
for file in files:
    print(f"{file['name']}: {file['size']} bytes")
    print(f"  Category: {file['category']}")
    print(f"  Type: {file['content_type']}")
# Export a file
success = fm.export_run_file(
"exec-2024-001",
"output.dat",
"/tmp/output.dat"
)
# Get all files for a simulation
files = fm.get_simulation_files("thermal_sim", "1.0.0")
# Batch export
import os
output_dir = "/tmp/exports"
os.makedirs(output_dir, exist_ok=True)
for file in files:
    output_path = os.path.join(output_dir, file['name'])
    fm.export_run_file(file['execution_id'], file['name'], output_path)
Organizing Files
# Create folder
folder = fm.create_folder(
name="simulation_outputs",
creator="user123"
)
# Create file entry
fm.create_file(
name="result.csv",
size=2048,
uri="/data/result.csv",
creator="user123",
parent_id=folder['id']
)
# List folder contents
contents = fm.list_folder(folder['id'])
for item in contents:
    print(item['name'])
Session Management
Authentication and privilege checking for all services.
Features
User Management - Create users with roles
Session Tokens - JWT-like session IDs
Role-Based Access - User, developer, admin privileges
Anonymous Sessions - Temporary access
TTL Support - Session expiration
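The session token and expiration features can be pictured with a short standard-library sketch. The `Session` dataclass and `create_session` function below are hypothetical and do not describe how sim2l generates or stores sessions; they only show an unguessable ID paired with an expiry time.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Session:
    role: str
    expires_at: datetime
    privileges: frozenset = frozenset()
    # Random, URL-safe token; 32 bytes of entropy
    session_id: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        """A session is valid until its expiry time is reached."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at


def create_session(
    role: str, ttl_hours: float = 1.0, privileges: Iterable[str] = ()
) -> Session:
    expires = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    return Session(role=role, expires_at=expires, privileges=frozenset(privileges))


anon = create_session("anonymous", ttl_hours=1, privileges=["read_cache"])
print(anon.session_id)
print(anon.is_valid())
```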
Usage
from sim2l.database import CacheClient, get_session_manager
# Get global session manager
manager = get_session_manager()
# Create user
manager.create_user(
username="alice",
password="secret",
role="developer",
email="alice@example.com"
)
# Authenticate
session = manager.authenticate("alice", "secret")
print(f"Session ID: {session.session_id}")
print(f"Role: {session.role}")
# Check privileges
has_privilege = manager.check_privilege(
session.session_id,
"register_simulation"
)
# Create anonymous session
anon_session = manager.create_anonymous_session(
privileges=['read_cache'],
ttl_hours=1
)
# Use session with services
cache = CacheClient(
"http://localhost:8001",
session_id=session.session_id
)
Roles and Privileges
User Role:
Read cache
Read catalog
Execute simulations
Developer Role:
All user privileges
Register simulations
Update catalog
Invalidate cache
Admin Role:
All developer privileges
User management
Service administration
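The role hierarchy above is cumulative: each role inherits everything from the one below it. A minimal sketch of that lookup follows. Only `read_cache` and `register_simulation` appear as privilege names in this document; the other privilege strings, and the `effective_privileges` and `check_privilege` helpers, are assumed for illustration.

```python
from typing import Dict, Optional, Set

# Privileges granted directly by each role (names partly assumed)
ROLE_PRIVILEGES: Dict[str, Set[str]] = {
    "user": {"read_cache", "read_catalog", "execute_simulation"},
    "developer": {"register_simulation", "update_catalog", "invalidate_cache"},
    "admin": {"manage_users", "administer_services"},
}

# Each role inherits from its parent
ROLE_PARENT: Dict[str, Optional[str]] = {
    "user": None,
    "developer": "user",
    "admin": "developer",
}


def effective_privileges(role: str) -> Set[str]:
    """Collect a role's own privileges plus everything it inherits."""
    privileges: Set[str] = set()
    current: Optional[str] = role
    while current is not None:
        privileges |= ROLE_PRIVILEGES[current]
        current = ROLE_PARENT[current]
    return privileges


def check_privilege(role: str, privilege: str) -> bool:
    return privilege in effective_privileges(role)


print(check_privilege("user", "register_simulation"))
print(check_privilege("developer", "register_simulation"))
print(sorted(effective_privileges("admin")))
```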
Deployment
Local Development
from sim2l import configure
# Use local databases only
configure(use_run_database=True)
# Run simulation (assumes `sim` is a simulation object created elsewhere)
result = sim.run(temperature=350)
Docker Deployment
# Start all services with PostgreSQL
cd docker
docker-compose --profile prod up -d
# Services available at:
# - Cache: http://localhost:8001
# - Catalog: http://localhost:8002
# - Results: http://localhost:8003
Hybrid Deployment
from sim2l import configure
from sim2l.database import get_session_manager
# Run databases locally, services remote
session = get_session_manager().create_anonymous_session()
configure(
use_run_database=True, # Local
cache_service_url="http://cache-server:8001", # Remote
cache_session_id=session.session_id,
catalog_service_url="http://catalog-server:8002", # Remote
catalog_session_id=session.session_id
)
Environment Variables
All services support environment variable configuration:
# Run Database
export SIM2L_USE_RUN_DATABASE=true
export SIM2L_RUN_DB_BASE_PATH=$HOME/.sim2l/runs
# Cache Service
export SIM2L_CACHE_SERVICE_URL=http://localhost:8001
export SIM2L_CACHE_SESSION_ID=session-id-here
# Catalog Service
export SIM2L_CATALOG_SERVICE_URL=http://localhost:8002
export SIM2L_CATALOG_SESSION_ID=session-id-here
export SIM2L_CATALOG_AUTO_SYNC=true
# Results Service
export SIM2L_RESULTS_SERVICE_URL=http://localhost:8003
export SIM2L_RESULTS_SESSION_ID=session-id-here
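For readers wiring these variables into their own tooling, the sketch below collects them into a dictionary keyed like the `configure()` arguments shown earlier. The `load_config` helper is hypothetical, and the set of strings treated as true is an assumption; the source does not say how sim2l parses boolean variables.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the SIM2L_* variables into a configure()-style dictionary."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        return env.get(name, "").strip().lower() in _TRUTHY

    return {
        "use_run_database": flag("SIM2L_USE_RUN_DATABASE"),
        "run_db_base_path": env.get("SIM2L_RUN_DB_BASE_PATH"),
        "cache_service_url": env.get("SIM2L_CACHE_SERVICE_URL"),
        "cache_session_id": env.get("SIM2L_CACHE_SESSION_ID"),
        "catalog_service_url": env.get("SIM2L_CATALOG_SERVICE_URL"),
        "catalog_session_id": env.get("SIM2L_CATALOG_SESSION_ID"),
        "catalog_auto_sync": flag("SIM2L_CATALOG_AUTO_SYNC"),
        "results_service_url": env.get("SIM2L_RESULTS_SERVICE_URL"),
        "results_session_id": env.get("SIM2L_RESULTS_SESSION_ID"),
    }


config = load_config(
    {
        "SIM2L_USE_RUN_DATABASE": "true",
        "SIM2L_CACHE_SERVICE_URL": "http://localhost:8001",
    }
)
print(config["use_run_database"], config["cache_service_url"])
```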
Performance
Benchmarks
| Component | Metric | Value |
|---|---|---|
| Run Database | Write speed | ~10,000 inserts/sec |
| Run Database | Typical size | 1-10 MB |
| Cache Service | Latency (local) | <10ms |
| Cache Service | Throughput (SQLite) | ~1,000 req/sec |
| Cache Service | Throughput (PostgreSQL) | ~10,000 req/sec |
| Catalog Service | Search speed | <100ms |
| Results Service | Introspection | <1s per run |
Best Practices
Use Run Databases - Always enable for complete tracking
Cache Strategically - Cache expensive computations
Register Results - Make simulations discoverable
Auto-Sync Catalog - Keep catalog up-to-date
Export Important Files - Don’t rely on run DBs forever
Troubleshooting
Service Won’t Start
# Check port availability
lsof -i :8001
# Test database connectivity
psql -h localhost -U user -d sim2l_cache
# View service logs
python -m sim2l.services.cache_service --backend sqlite 2>&1 | tee cache.log
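Where `lsof` is unavailable, the port check can also be done from Python. This is a generic socket sketch, not part of sim2l; the `port_in_use` helper is hypothetical. It reports whether something is accepting TCP connections on the given port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


for name, port in [("cache", 8001), ("catalog", 8002), ("results", 8003)]:
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```

A port reported as "in use" before the service starts suggests another process is already bound to it.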
Can’t Connect to Service
# Check service health
import requests
response = requests.get("http://localhost:8001/health")
print(response.json())
# Verify session (session_id is the ID you passed to configure() or to the client)
from sim2l.database import get_session_manager
manager = get_session_manager()
session = manager.get_session(session_id)
if session and session.is_valid():
    print("Session is valid")
else:
    print("Session is invalid or expired")
Run Database Not Found
import os
from sim2l.database import RunDatabase
execution_id = "exec-2024-001"  # the run you are looking for
# Check if database exists (default location)
db_path = os.path.expanduser(f"~/.sim2l/runs/{execution_id}.db")
if os.path.exists(db_path):
    print(f"Database exists: {db_path}")
else:
    print(f"Database not found: {db_path}")
# Try to open
try:
    run_db = RunDatabase(execution_id)
    print("Successfully opened run database")
except Exception as e:
    print(f"Error: {e}")
See Also
Quick Start Guide - Getting started guide
API Reference - Complete API documentation
Examples - Code examples
GitHub: Database Architecture Documentation