Query Engine

Placino offers 5 query modes, from pre-built templates to natural language queries. All queries automatically enforce privacy constraints and apply differential privacy.

5 Query Modes

Mode 1: Template Queries (Safest)

Pre-built, company-approved query templates for common analyses. Fastest, most secure. No data scientist needed.

POST /api/v1/projects/ID/query/template
Body:
{ "template": "intersection_size", "dataset_left": "brand_a", "dataset_right": "brand_b" }

Templates run pre-vetted SQL with a pre-defined epsilon cost, and results are rounded to the k-anonymity threshold.
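As a rough illustration of calling the template endpoint from code, the sketch below builds the POST request with Python's standard library. The base URL is borrowed from the curl example on this page. The helper name `build_template_request` is hypothetical, and no authentication header is shown because this page does not describe one. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: base URL taken from the curl example on this page.
BASE_URL = "http://localhost:8080"


def build_template_request(project_id, template, dataset_left, dataset_right):
    """Build (but do not send) a POST request for a template query.

    Hypothetical helper for illustration; not part of any Placino SDK.
    """
    body = {
        "template": template,
        "dataset_left": dataset_left,
        "dataset_right": dataset_right,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/projects/{project_id}/query/template",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_template_request("ID", "intersection_size", "brand_a", "brand_b")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Replace "ID" with a real project identifier before sending.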

Mode 2: SQL (Flexible)

Write custom SQL queries. Placino validates column access and enforces privacy constraints automatically.

POST /api/v1/projects/ID/query/sql
Body:
{ "sql": "SELECT COUNT(*) FROM customers WHERE age_group = '25-34'", "privacy_epsilon": 1.0 }

Standard SQL is accepted, subject to privacy checks. Blocked: raw PII columns and joins across sensitive tables. Allowed: aggregations, filters, and GROUP BY.

Mode 3: Natural Language (AI-Powered)

Describe your question in English or Turkish. AI translates to safe SQL, applies privacy, returns results.

POST /api/v1/projects/ID/query/nl
Body:
{ "question": "How many 25-34 year olds are in both datasets?", "language": "en" }

The generated SQL is validated against the privacy constraints server-side before execution.

Mode 4: Aggregations (Summary Stats)

Request pre-defined aggregations: COUNT, SUM, AVG, STDDEV. Optimized for low-latency summary statistics.

POST /api/v1/projects/ID/query/aggregate
Body:
{ "aggregation": "count", "dataset": "customers", "groupby": ["age_group", "brand"] }

Cached results. Automatic k-anonymity enforcement on GROUP BY cells.

Mode 5: Export (Results Download)

Export query results as CSV or Parquet with privacy-preserving transformations applied.

POST /api/v1/projects/ID/query/export
Body:
{ "dataset": "intersection_results", "format": "csv", "apply_k_anonymity": true }

Results are sanitized, PII is removed, k-anonymity is enforced, and the output is stored in an encrypted S3 or GCS bucket.

Privacy Constraints

Every query automatically enforces these privacy guardrails:

Column-Level Access: Users can only query columns they have permission for. SSN, phone, email require explicit grant.

Epsilon Budget: Each query consumes epsilon from the user's budget (default 10.0). Once the budget is depleted, further queries are rejected. The budget resets monthly or on admin approval.

K-Anonymity: Result cells with fewer than k records (default k=5) are automatically suppressed or rounded.

Result Set Size: Queries are limited to 100,000 rows and aggregations to 1,000 groups. These caps help prevent data exfiltration.
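The guardrails above can be pictured as a pre-flight check that runs before a query executes. The following is a minimal sketch under stated assumptions, not Placino's implementation: the names `UserPolicy`, `QueryRejected`, and `authorize` are hypothetical, and the order of checks is a guess. The defaults (budget 10.0, 1,000 groups) come from the text.

```python
from dataclasses import dataclass

MAX_GROUPS = 1_000  # aggregation group limit from the text


class QueryRejected(Exception):
    """Raised when a query violates a guardrail (hypothetical name)."""


@dataclass
class UserPolicy:
    allowed_columns: set[str]
    epsilon_budget: float = 10.0  # default budget from the text
    epsilon_spent: float = 0.0


def authorize(policy, columns, epsilon, expected_groups=0):
    """Check column access, group limit, and budget; return remaining epsilon."""
    denied = set(columns) - policy.allowed_columns
    if denied:
        raise QueryRejected(f"no permission for columns: {sorted(denied)}")
    if expected_groups > MAX_GROUPS:
        raise QueryRejected(f"more than {MAX_GROUPS} groups requested")
    if policy.epsilon_spent + epsilon > policy.epsilon_budget:
        raise QueryRejected("epsilon budget exhausted")
    # Only charge the budget once every check has passed.
    policy.epsilon_spent += epsilon
    return policy.epsilon_budget - policy.epsilon_spent


demo = UserPolicy(allowed_columns={"age_group", "brand"})
print(authorize(demo, ["age_group"], epsilon=1.0))
try:
    authorize(demo, ["email"], epsilon=1.0)
except QueryRejected as exc:
    print("rejected:", exc)
```

In this sketch a rejected query does not consume budget; whether Placino charges for rejected queries is not stated on this page.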

Differential Privacy in Action

All numeric results get Laplace noise added. The amount depends on epsilon:

Epsilon   Privacy       Noise Level
0.1       Very strict   +/- 1000s
1.0       Strong        +/- 100s
5.0       Moderate      +/- 10s
10.0      Weak          +/- 5s

Lower epsilon = stronger privacy guarantees but noisier results. Placino recommends epsilon=1.0 for most use cases.
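To make the epsilon trade-off concrete, here is a minimal sketch of the textbook Laplace mechanism, where the noise scale is sensitivity divided by epsilon. It assumes a sensitivity of 1 (a single count), so its magnitudes will not match the table above, which presumably reflects Placino's own calibration. The function names are hypothetical.

```python
import random


def laplace_scale(epsilon, sensitivity=1.0):
    """Scale b of the Laplace distribution for a given epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return true_count plus Laplace(0, sensitivity / epsilon) noise."""
    rng = rng or random.Random()
    scale = laplace_scale(epsilon, sensitivity)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


for eps in (0.1, 1.0, 5.0, 10.0):
    print(f"epsilon={eps}: scale={laplace_scale(eps):.2f}, "
          f"sample={noisy_count(1000, eps):.1f}")
```

The scale shrinks as epsilon grows, which is the same direction as the table: smaller epsilon means wider noise.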

Example Queries

Template: Find intersection size

How many customers overlap between two datasets?

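curl -X POST http://localhost:8080/api/v1/projects/ID/query/template -H "Content-Type: application/json" -d '{"template": "intersection_size", "dataset_left": "brand_a", "dataset_right": "brand_b"}'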

SQL: Count by age group

Summary of customers by age cohort with k-anonymity enforcement.

SELECT age_group, COUNT(*) as cnt FROM customers GROUP BY age_group
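The effect of k-anonymity on this query can be tried locally. The sketch below runs the same GROUP BY against an in-memory SQLite table filled with made-up toy rows, then drops cells below k=5 (the default from the Privacy Constraints section). It shows suppression only; Placino may round instead, and it also adds noise, which is omitted here. The helper `suppress_small_cells` is hypothetical.

```python
import sqlite3

K = 5  # default k from the Privacy Constraints section

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age_group TEXT)")

# Toy data for illustration only: 3, 8, and 6 customers per cohort.
rows = [("18-24",)] * 3 + [("25-34",)] * 8 + [("35-44",)] * 6
conn.executemany("INSERT INTO customers (age_group) VALUES (?)", rows)

query = "SELECT age_group, COUNT(*) AS cnt FROM customers GROUP BY age_group"
raw_cells = conn.execute(query).fetchall()


def suppress_small_cells(cells, k=K):
    """Keep only (group, count) cells with at least k records."""
    return [(group, count) for group, count in cells if count >= k]


released = suppress_small_cells(raw_cells)
print(dict(released))
```

With this toy data the "18-24" cohort has only 3 records, so it is left out of the released result.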

NL: Natural language query

Ask in plain English, system translates and enforces privacy.

"What's the audience overlap between brand A and brand B in the 25-34 age group?"

Next Steps

Learn about the privacy mechanisms that protect results:

Privacy Controls Deep Dive