Federated Queries

Placino enables secure federated analytics across multiple database systems without moving sensitive data. Write queries once, execute across PostgreSQL, ClickHouse, Snowflake, BigQuery, Redshift, MySQL, S3/Parquet, and Kafka—all while enforcing differential privacy and k-anonymity.

What Are Federated Queries?

Federated queries allow you to analyze data across multiple systems as if they were a single unified database. Placino's query engine:

Executes queries locally: Computations run on each data owner's infrastructure. No data is centralized or moved.

Supports 8 database systems: PostgreSQL, ClickHouse, MySQL, Snowflake, BigQuery, Redshift, S3/Parquet, Kafka streams.

Cross-database joins: Join tables across systems transparently. The query engine optimizes execution paths to minimize data movement.

Privacy-first results: All numeric outputs receive differential privacy noise. Results automatically suppressed if k-anonymity threshold not met.

5 Query Modes

Mode 1: Template Queries

Pre-approved, company-vetted queries. Fastest and most secure. Zero data science expertise required.

POST /api/v1/projects/ID/query/template{ "template": "intersection_size", "dataset_left": "advertiser_customers", "dataset_right": "publisher_audience" }

Use case: Intersection size, audience overlap, standard cohort counts.

Mode 2: SQL Federated Queries

Write custom SQL across all connected databases. Placino validates access and enforces privacy automatically.

POST /api/v1/projects/ID/query/sql{ "sql": "SELECT country, COUNT(*) as audience_size FROM users WHERE purchase_value > 1000 GROUP BY country", "privacy_epsilon": 1.5, "k_anonymity_min": 50 }

Supports SELECT, JOIN, GROUP BY, HAVING, WHERE. Blocked: UPDATE, DELETE, ALTER, raw PII columns without grant.

Mode 3: Aggregation Queries

Request summary statistics across federated datasets. Optimized for COUNT, SUM, AVG, MIN, MAX, STDDEV.

POST /api/v1/projects/ID/query/aggregate{ "aggregation": "count_distinct", "dataset": "customer_transactions", "groupby": ["country", "device_type"], "filter": "transaction_date >= '2026-01-01'" }

Results cached, indexed by dataset + aggregation + filter. Automatic k-anonymity on GROUP BY cells.

Mode 4: Natural Language Queries

Describe your question in English or Turkish. AI translates to secure federated SQL, enforces privacy, returns results.

POST /api/v1/projects/ID/query/nl{ "question": "How many UK customers purchased both mobile and desktop in the last 90 days?", "language": "en", "privacy_epsilon": 2.0 }

AI validates privacy constraints server-side before execution. Results must meet k-anonymity threshold.

Mode 5: Export Queries

Download federated query results as CSV or Parquet with privacy transformations applied.

POST /api/v1/projects/ID/query/export{ "dataset": "intersection_results", "format": "parquet", "apply_k_anonymity": true, "apply_differential_privacy": true, "storage_uri": "s3://secure-bucket/results" }

Results sanitized, PII removed, k-anonymity enforced, stored in encrypted cloud storage.

Privacy Constraints

Every federated query automatically enforces multiple privacy guardrails:

Differential Privacy (DP): All numeric results receive Laplace noise calibrated to epsilon budget. Prevents inference attacks even if results are combined with external data.

K-Anonymity: Result cells with fewer than k records (default k=50) are automatically suppressed or aggregated. Prevents identity disclosure.

Epsilon Budget: Each project receives a monthly epsilon budget (default 10.0). Every query consumes epsilon based on query scope. Once depleted, queries are rejected until reset.

Column-Level Access: Users can only query columns they have explicit permission for. SSN, phone, email, credit card require separate grant.

Result Set Limits: Queries limited to 100,000 rows. Aggregations limited to 1,000 groups. Prevents data exfiltration.

Query Audit Logging: All federated queries logged with user ID, timestamp, databases accessed, epsilon consumed, and result summary. Enables compliance audits.

Cross-Database Joins

Placino joins data across different database systems without moving it. The query engine optimizes execution by pushing down predicates and aggregations to each database.

Supported Join Types

INNER JOIN: Returns rows present in both datasets. Most common for audience overlap.

LEFT OUTER JOIN: Returns all rows from left dataset + matching rows from right. Use for finding non-matches.

CROSS JOIN (Cartesian): Combines all rows. Rare for federated queries; watch for explosion in result size.

Example: Federated Join Across PostgreSQL + ClickHouse

{ "query_type": "federated_join", "left": { "database": "postgres_customers", "table": "users", "columns": ["id", "country"] }, "right": { "database": "clickhouse_events", "table": "events", "columns": ["user_id", "event_type"] }, "join_on": "id = user_id", "aggregate": "COUNT(DISTINCT id)" }

Query engine: Pushes country filter to PostgreSQL. Sends matching user IDs to ClickHouse. ClickHouse filters events and aggregates. Returns COUNT(DISTINCT id) = 15,234 with DP noise applied.

Join Optimization Strategy

Placino automatically:

1. Estimates result cardinality for each join predicate
2. Pushes down WHERE filters to reduce data movement
3. Executes aggregations (GROUP BY, COUNT) locally on each database
4. Shuffles only aggregated results between databases
5. Applies differential privacy at the final aggregation step

Query Optimization Tips

Writing efficient federated queries requires thinking about network cost and privacy budget. Follow these best practices:

1. Use Indexes on Join Keys

Ensure join columns (user_id, customer_id, etc.) are indexed in all databases. Placino query planner checks for indexes and may reject unindexed joins on large tables.

2. Partition by Time

Use date partitioning (event_date, created_at). Add a WHERE clause on event_date to eliminate cold partitions before the join.

3. Aggregate Locally, Join Results

Avoid joining raw tables. Pre-aggregate to user or session level, then join aggregated results.

4. Minimize PII Exposure

Only request necessary columns. Join on IDs, not email. Placino flags queries that expose PII.

5. Estimate Epsilon Cost

Larger queries (more rows, more GROUP BY dimensions) consume more epsilon. Test with privacy_epsilon: 0.5 first.

Query Plan Review

-- Bad: Scans entire table SELECT * FROM events WHERE user_id IN (SELECT id FROM users WHERE country = 'UK') -- Good: Indexed join, aggregation SELECT COUNT(DISTINCT e.user_id) FROM events e INNER JOIN users u ON e.user_id = u.id WHERE u.country = 'UK' AND e.event_date >= '2026-01-01'

Common Federated Patterns

Pattern 1: Audience Intersection

Find overlap between two audience datasets (e.g., advertiser's customers + publisher's subscribers). Template or SQL with INNER JOIN + COUNT(DISTINCT).

Pattern 2: Lookalike Audience

Analyze shared attributes of intersecting users (age, country, device) to build lookalike cohorts. SQL with GROUP BY + COUNT on intersection.

Pattern 3: Attribution

Join impression events + conversion events across systems. Calculate conversion rate by channel, device, geography with DP noise.

Pattern 4: Cohort Analysis

Segment users by signup date or behavior cohort. Compare cohorts across federated datasets. JOIN + GROUP BY + time windowing.

Pattern 5: Data Quality Audit

Compare record counts, null rates, and value distributions across databases. UNION ALL + aggregation across federated sources.

Differential Privacy Budget

Epsilon is your privacy currency. Each project receives a monthly budget. Higher epsilon = less privacy, less noise. Lower epsilon = stronger privacy, more noise in results.

Epsilon Budget Tracking

{ "project_id": "prj_acme", "epsilon_total": 10.0, "queries_month": [ { "query_id": "q1", "epsilon_used": 1.5 }, { "query_id": "q2", "epsilon_used": 2.0 }, { "query_id": "q3", "epsilon_used": 1.2 } ], "epsilon_remaining": 5.3 }

Epsilon	Privacy Level	Noise Range (COUNT)	Use Case
0.1	Extremely strict	±1000s	HIPAA, financial records
1.0	Strong	±100s	Default recommendation
5.0	Moderate	±10s	Exploratory analysis
10.0+	Weak	±1–5	High-volume operational queries

Placino recommends epsilon=1.0 for most use cases. Epsilon resets monthly. Admins can grant additional budget for specific projects or extend reset date.

Error Handling & Troubleshooting

Federated queries can fail for privacy, data consistency, or infrastructure reasons. Understand common errors and how to resolve them.

Error: Privacy Budget Exhausted

{ "error": "privacy_budget_exhausted", "message": "Epsilon budget depleted for project prj_acme", "epsilon_remaining": 0.0, "epsilon_requested": 1.5, "reset_date": "2026-05-12T00:00:00Z", "http_status": 429 }

Resolution: Wait for monthly reset, or request budget increase from admin. Consider combining queries or using higher-privacy templates.

Error: K-Anonymity Threshold Not Met

{ "error": "k_anonymity_threshold_not_met", "message": "Result set contains 3 cells below k=50 threshold", "cells_suppressed": 3, "cells_total": 12, "suggested_action": "Increase grouping dimensions or use higher k value" }

Resolution: Broaden grouping dimensions (remove granular attributes). Increase time range. Increase k threshold (default 50) with admin approval.

Error: Column Access Denied

Message: User lacks permission for column: ssn. Resolution: Request column grant from data owner or use alternate column with lower sensitivity.

Error: Query Timeout

Message: Query exceeded 300s timeout. Resolution: Add WHERE filters to reduce cardinality. Partition by time. Pre-aggregate data before querying.

Error: Data Consistency Warning

Message: Results may not be consistent: ClickHouse updated during query execution. Resolution: Re-run query. Placino will retry with consistent snapshot.

Supported Databases

Placino currently supports 8 database connectors. All data stays local; queries execute on each system.

PostgreSQL: Full SQL support. Recommended for customer data, transactions.

ClickHouse: Optimized for analytics. Supports aggregations, time series, high-cardinality dimensions.

MySQL: Full SQL support. Common for legacy systems.

Snowflake: Cloud warehouse. Supports large-scale joins and aggregations.

BigQuery: Google Cloud analytics. Supports federated tables and streaming inserts.

Redshift: AWS data warehouse. Full SQL support, columnar compression.

S3/Parquet: Data lake format. Queries via Athena or Spark SQL engine.

Kafka: Stream aggregations. Windowed joins and real-time counts.

Data Encryption & Transport Security

Federated queries use encryption to protect data in transit and at rest.

Envelope Encryption: All data encrypted with AES-256-GCM. Encryption keys wrapped with RSA-4096 per data owner.

TLS 1.3: All inter-database communication encrypted. Mutual certificate verification between Placino and data owners.

Zero Data Movement Principle: Aggregations computed where data lives. Only aggregated results (counts, sums) leave the owner's infrastructure.

Audit Logging: All federated queries logged with encryption. Query logs retained for 1 year, encrypted at rest.