Federated Queries
Placino enables secure federated analytics across multiple database systems without moving sensitive data. Write queries once, execute across PostgreSQL, ClickHouse, Snowflake, BigQuery, Redshift, MySQL, S3/Parquet, and Kafka—all while enforcing differential privacy and k-anonymity.
What Are Federated Queries?
Federated queries allow you to analyze data across multiple systems as if they were a single unified database. Placino's query engine:
Executes queries locally: Computations run on each data owner's infrastructure. No data is centralized or moved.
Supports 8 database systems: PostgreSQL, ClickHouse, MySQL, Snowflake, BigQuery, Redshift, S3/Parquet, Kafka streams.
Cross-database joins: Join tables across systems transparently. The query engine optimizes execution paths to minimize data movement.
Privacy-first results: All numeric outputs receive differential privacy noise. Results automatically suppressed if k-anonymity threshold not met.
5 Query Modes
Mode 1: Template Queries
Pre-approved, company-vetted queries. Fastest and most secure. Zero data science expertise required.
Use case: Intersection size, audience overlap, standard cohort counts.
Mode 2: SQL Federated Queries
Write custom SQL across all connected databases. Placino validates access and enforces privacy automatically.
Supports SELECT, JOIN, GROUP BY, HAVING, WHERE. Blocked: UPDATE, DELETE, ALTER, raw PII columns without grant.
Mode 3: Aggregation Queries
Request summary statistics across federated datasets. Optimized for COUNT, SUM, AVG, MIN, MAX, STDDEV.
Results cached, indexed by dataset + aggregation + filter. Automatic k-anonymity on GROUP BY cells.
Mode 4: Natural Language Queries
Describe your question in English or Turkish. AI translates to secure federated SQL, enforces privacy, returns results.
AI validates privacy constraints server-side before execution. Results must meet k-anonymity threshold.
Mode 5: Export Queries
Download federated query results as CSV or Parquet with privacy transformations applied.
Results sanitized, PII removed, k-anonymity enforced, stored in encrypted cloud storage.
Privacy Constraints
Every federated query automatically enforces multiple privacy guardrails:
Differential Privacy (DP): All numeric results receive Laplace noise calibrated to epsilon budget. Prevents inference attacks even if results are combined with external data.
K-Anonymity: Result cells with fewer than k records (default k=50) are automatically suppressed or aggregated. Prevents identity disclosure.
Epsilon Budget: Each project receives a monthly epsilon budget (default 10.0). Every query consumes epsilon based on query scope. Once depleted, queries are rejected until reset.
Column-Level Access: Users can only query columns they have explicit permission for. SSN, phone, email, credit card require separate grant.
Result Set Limits: Queries limited to 100,000 rows. Aggregations limited to 1,000 groups. Prevents data exfiltration.
Query Audit Logging: All federated queries logged with user ID, timestamp, databases accessed, epsilon consumed, and result summary. Enables compliance audits.
Cross-Database Joins
Placino joins data across different database systems without moving it. The query engine optimizes execution by pushing down predicates and aggregations to each database.
Supported Join Types
INNER JOIN: Returns rows present in both datasets. Most common for audience overlap.
LEFT OUTER JOIN: Returns all rows from left dataset + matching rows from right. Use for finding non-matches.
CROSS JOIN (Cartesian): Combines all rows. Rare for federated queries; watch for explosion in result size.
Example: Federated Join Across PostgreSQL + ClickHouse
Query engine: Pushes country filter to PostgreSQL. Sends matching user IDs to ClickHouse. ClickHouse filters events and aggregates. Returns COUNT(DISTINCT id) = 15,234 with DP noise applied.
Join Optimization Strategy
Placino automatically:
- 1. Estimates result cardinality for each join predicate
- 2. Pushes down WHERE filters to reduce data movement
- 3. Executes aggregations (GROUP BY, COUNT) locally on each database
- 4. Shuffles only aggregated results between databases
- 5. Applies differential privacy at the final aggregation step
Query Optimization Tips
Writing efficient federated queries requires thinking about network cost and privacy budget. Follow these best practices:
1. Use Indexes on Join Keys
Ensure join columns (user_id, customer_id, etc.) are indexed in all databases. Placino query planner checks for indexes and may reject unindexed joins on large tables.
2. Partition by Time
Use date partitioning (event_date, created_at). Add a WHERE clause on event_date to eliminate cold partitions before the join.
3. Aggregate Locally, Join Results
Avoid joining raw tables. Pre-aggregate to user or session level, then join aggregated results.
4. Minimize PII Exposure
Only request necessary columns. Join on IDs, not email. Placino flags queries that expose PII.
5. Estimate Epsilon Cost
Larger queries (more rows, more GROUP BY dimensions) consume more epsilon. Test with privacy_epsilon: 0.5 first.
Query Plan Review
Common Federated Patterns
Pattern 1: Audience Intersection
Find overlap between two audience datasets (e.g., advertiser's customers + publisher's subscribers). Template or SQL with INNER JOIN + COUNT(DISTINCT).
Pattern 2: Lookalike Audience
Analyze shared attributes of intersecting users (age, country, device) to build lookalike cohorts. SQL with GROUP BY + COUNT on intersection.
Pattern 3: Attribution
Join impression events + conversion events across systems. Calculate conversion rate by channel, device, geography with DP noise.
Pattern 4: Cohort Analysis
Segment users by signup date or behavior cohort. Compare cohorts across federated datasets. JOIN + GROUP BY + time windowing.
Pattern 5: Data Quality Audit
Compare record counts, null rates, and value distributions across databases. UNION ALL + aggregation across federated sources.
Differential Privacy Budget
Epsilon is your privacy currency. Each project receives a monthly budget. Higher epsilon = less privacy, less noise. Lower epsilon = stronger privacy, more noise in results.
Epsilon Budget Tracking
| Epsilon | Privacy Level | Noise Range (COUNT) | Use Case |
|---|---|---|---|
| 0.1 | Extremely strict | ±1000s | HIPAA, financial records |
| 1.0 | Strong | ±100s | Default recommendation |
| 5.0 | Moderate | ±10s | Exploratory analysis |
| 10.0+ | Weak | ±1–5 | High-volume operational queries |
Placino recommends epsilon=1.0 for most use cases. Epsilon resets monthly. Admins can grant additional budget for specific projects or extend reset date.
Error Handling & Troubleshooting
Federated queries can fail for privacy, data consistency, or infrastructure reasons. Understand common errors and how to resolve them.
Error: Privacy Budget Exhausted
Resolution: Wait for monthly reset, or request budget increase from admin. Consider combining queries or using higher-privacy templates.
Error: K-Anonymity Threshold Not Met
Resolution: Broaden grouping dimensions (remove granular attributes). Increase time range. Increase k threshold (default 50) with admin approval.
Error: Column Access Denied
Message: User lacks permission for column: ssn. Resolution: Request column grant from data owner or use alternate column with lower sensitivity.
Error: Query Timeout
Message: Query exceeded 300s timeout. Resolution: Add WHERE filters to reduce cardinality. Partition by time. Pre-aggregate data before querying.
Error: Data Consistency Warning
Message: Results may not be consistent: ClickHouse updated during query execution. Resolution: Re-run query. Placino will retry with consistent snapshot.
Supported Databases
Placino currently supports 8 database connectors. All data stays local; queries execute on each system.
PostgreSQL: Full SQL support. Recommended for customer data, transactions.
ClickHouse: Optimized for analytics. Supports aggregations, time series, high-cardinality dimensions.
MySQL: Full SQL support. Common for legacy systems.
Snowflake: Cloud warehouse. Supports large-scale joins and aggregations.
BigQuery: Google Cloud analytics. Supports federated tables and streaming inserts.
Redshift: AWS data warehouse. Full SQL support, columnar compression.
S3/Parquet: Data lake format. Queries via Athena or Spark SQL engine.
Kafka: Stream aggregations. Windowed joins and real-time counts.
Data Encryption & Transport Security
Federated queries use encryption to protect data in transit and at rest.
Envelope Encryption: All data encrypted with AES-256-GCM. Encryption keys wrapped with RSA-4096 per data owner.
TLS 1.3: All inter-database communication encrypted. Mutual certificate verification between Placino and data owners.
Zero Data Movement Principle: Aggregations computed where data lives. Only aggregated results (counts, sums) leave the owner's infrastructure.
Audit Logging: All federated queries logged with encryption. Query logs retained for 1 year, encrypted at rest.
Related Documentation
Learn more about privacy and security in federated analytics: