Data Ingestion Channels

Placino supports six data ingestion channels, ranging from one-time CSV uploads to real-time streaming from enterprise data warehouses.

Overview

Channel 1: CSV Upload

Drag-and-drop CSV files via web UI or API. Automatic schema detection, encoding handling, and encryption.

Best for: Ad hoc datasets, pilot programs

Channel 2: Parquet Streaming

Stream Parquet files from S3, GCS, or local storage. Columnar compression reduces ingestion bandwidth 50-80x.

Best for: Large batch exports, data warehouse snapshots

Channel 3: Kafka Topics

Real-time streaming from Kafka topics. Automatic offset tracking, back-pressure handling, and idempotent writes.

Best for: Real-time event streams, CDP integrations

Channel 4: PostgreSQL CDC

Change Data Capture from PostgreSQL. Listens to logical replication slots for INSERT/UPDATE/DELETE events.

Best for: CRM sync, transactional data pipelines

Channel 5: BigQuery Export

Direct SQL query export from BigQuery. Supports scheduled exports and incremental snapshots with dedup.

Best for: Cloud data warehouse ingestion

Channel 6: Salesforce API

Ingest Salesforce Contacts, Leads, and Accounts. Automatic field mapping, incremental updates via SOQL.

Best for: CRM data collaboration, lead scoring

Channel Details

CSV Upload

Simplest option for testing or one-time datasets.

curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/ingest/csv \
  -F "file=@customers.csv" \
  -F "dataset_name=customers"

Max size: 1GB per file. Automatic UTF-8 detection. Supports quotes, escapes, and custom delimiters.
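The encoding detection and delimiter handling described above can be sketched client-side with the standard library. This is an illustrative approximation of the idea, not Placino's server-side implementation: try UTF-8 first, fall back to Latin-1, and let `csv.Sniffer` guess the delimiter.

```python
import csv

def sniff_csv(raw: bytes) -> tuple[str, str]:
    """Guess encoding and delimiter for a CSV payload (client-side sketch)."""
    # Try UTF-8 first; fall back to Latin-1, which decodes any byte sequence.
    try:
        text = raw.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
        encoding = "latin-1"
    # Sniff the delimiter from a sample of the file.
    dialect = csv.Sniffer().sniff(text[:1024])
    return encoding, dialect.delimiter

enc, delim = sniff_csv(b"email;age_group\nalice@example.com;25-34\n")
```

Running this before upload lets you confirm what the automatic detection will see, which is useful when a file uses semicolons or tabs instead of commas.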

Parquet Streaming

Efficient bulk ingestion for data warehouse snapshots.

curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/ingest/parquet \
  -H "X-Parquet-Source: s3://bucket/export.parquet" \
  -F "dataset_name=warehouse_export"

Supports S3, GCS, Azure Blob, local file://, http://. Automatic decompression (snappy, gzip, lz4).
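The automatic decompression step can be illustrated with magic-byte sniffing: a payload is inspected for a known compression signature before being decoded. The sketch below handles only gzip (whose magic bytes are in the standard library's reach); snappy and lz4, which the service also supports, would need third-party codecs. This is illustrative, not Placino code.

```python
import gzip

# gzip files start with the two-byte magic number 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"

def maybe_decompress(payload: bytes) -> bytes:
    """Transparently decompress a payload if it is gzip-compressed (sketch)."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    # Otherwise assume the payload is already uncompressed.
    return payload

data = maybe_decompress(gzip.compress(b"parquet-bytes"))
```

The same dispatch pattern extends to snappy (`\xff\x06\x00\x00sNaPpY` for framed streams) and lz4 frames, each keyed on its own magic number.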

Kafka Streaming

Real-time event ingestion from message brokers.

# Configure Kafka source
curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/sources/kafka \
  -H "Content-Type: application/json" \
  -d '{"broker": "kafka:9092", "topics": ["customer_events"]}'

Supports Avro, JSON, Protobuf schemas. Automatic offset management. Exactly-once delivery semantics.
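The idempotent-write guarantee above can be sketched as a sink that deduplicates by Kafka coordinates: a record is uniquely identified by its (topic, partition, offset) triple, so a redelivery after a consumer retry is dropped rather than written twice. This is an illustrative model, not Placino's implementation.

```python
class IdempotentSink:
    """Writes each Kafka record at most once, keyed by its coordinates (sketch)."""

    def __init__(self) -> None:
        self.seen: set[tuple[str, int, int]] = set()  # committed coordinates
        self.rows: list[object] = []

    def write(self, topic: str, partition: int, offset: int, value: object) -> bool:
        key = (topic, partition, offset)
        if key in self.seen:
            # Duplicate delivery (e.g. after a rebalance or retry): drop it.
            return False
        self.seen.add(key)
        self.rows.append(value)
        return True

sink = IdempotentSink()
sink.write("customer_events", 0, 42, {"event": "signup"})
sink.write("customer_events", 0, 42, {"event": "signup"})  # redelivery, ignored
```

Combined with offsets committed only after a successful write, this dedup step is what turns at-least-once delivery into effectively-exactly-once ingestion.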

PostgreSQL CDC

Listen to database changes from PostgreSQL logical replication.

curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/sources/pg_cdc \
  -H "Content-Type: application/json" \
  -d '{"connection_string": "postgresql://user:pass@host/db", "tables": ["customers", "orders"]}'

Requires logical replication slot on source database. Supports any table schema.
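How INSERT/UPDATE/DELETE events map onto table state can be sketched as replaying a change stream against a local replica keyed by primary key: inserts and updates upsert, deletes remove. The event shape here (`op`, `pk`, `row`) is an assumption for illustration, not the logical-replication wire format.

```python
def apply_events(replica: dict, events: list[dict]) -> dict:
    """Replay a CDC event stream onto a primary-key-indexed replica (sketch)."""
    for ev in events:
        if ev["op"] in ("INSERT", "UPDATE"):
            replica[ev["pk"]] = ev["row"]      # upsert by primary key
        elif ev["op"] == "DELETE":
            replica.pop(ev["pk"], None)        # tolerate deletes of unseen rows
    return replica

state = apply_events({}, [
    {"op": "INSERT", "pk": 1, "row": {"name": "Alice"}},
    {"op": "UPDATE", "pk": 1, "row": {"name": "Alice B."}},
    {"op": "DELETE", "pk": 1},
])
```

Because each event carries the full row, replaying the stream from the replication slot's confirmed position always reconverges on the source table's state.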

BigQuery Export

Export query results from BigQuery with scheduled runs.

curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/sources/bigquery \
  -H "Content-Type: application/json" \
  -d '{"project_id": "my-gcp-project", "query": "SELECT * FROM dataset.customers WHERE updated_at > @cutoff"}'

Supports parameterized queries for incremental loads. Automatic service account setup.
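The dedup step for incremental snapshots can be sketched as keeping the newest row per identifier when consecutive exports overlap at the @cutoff boundary. The `id`/`updated_at` field names are assumptions matching the query above; this is illustrative, not Placino's merge logic.

```python
def dedup_snapshots(rows: list[dict]) -> list[dict]:
    """Keep only the most recent row per id across overlapping snapshots (sketch)."""
    latest: dict = {}
    for row in rows:
        current = latest.get(row["id"])
        # ISO-8601 timestamps compare correctly as strings.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())

merged = dedup_snapshots([
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z", "region": "EU"},
    {"id": 1, "updated_at": "2024-02-01T00:00:00Z", "region": "US"},
    {"id": 2, "updated_at": "2024-01-15T00:00:00Z", "region": "EU"},
])
```

Overlapping the @cutoff slightly between runs plus this dedup is a common way to avoid both gaps and duplicates in incremental loads.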

Salesforce API

Sync Contacts, Leads, Accounts from Salesforce CRM.

curl -X POST http://localhost:8080/api/v1/projects/PROJECT_ID/sources/salesforce \
  -H "Content-Type: application/json" \
  -d '{"client_id": "YOUR_OAUTH_ID", "sobject": "Contact"}'

OAuth 2.0 flow. Field-level mapping. Incremental sync via SystemModstamp.
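Incremental sync via SystemModstamp amounts to querying only records modified since a stored watermark, then advancing the watermark. SystemModstamp is a standard Salesforce audit field; the helper and field list below are illustrative, not Placino's query builder.

```python
def incremental_soql(sobject: str, fields: list[str], since: str) -> str:
    """Build a SOQL query that fetches records modified after `since` (sketch).

    SOQL datetime literals are unquoted, so the watermark is interpolated as-is.
    """
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE SystemModstamp > {since} "
        f"ORDER BY SystemModstamp"
    )

query = incremental_soql("Contact", ["Id", "Email"], "2024-01-01T00:00:00Z")
```

After each run, the largest SystemModstamp seen becomes the next watermark; ordering by SystemModstamp makes that the last record in the batch.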

Encryption During Ingestion

All ingestion channels apply envelope encryption automatically:

Sensitive columns (email, phone, SSN) are encrypted with AES-256-GCM.

PII hashes (for matching) are ephemeral and only exist during query execution.

Non-sensitive columns (age_group, brand, region) remain queryable in plaintext.

Ingestion logs are immutable in Merkle-chain audit trail.

Schema Management

Placino detects column types and sensitivity levels automatically, or lets you specify them explicitly:

# Define schema with sensitivity levels
[
  { "column": "email",     "type": "string", "sensitive": true },
  { "column": "age_group", "type": "string", "sensitive": false }
]
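A schema with these sensitivity flags determines which columns get encrypted and which stay queryable in plaintext. The split can be sketched in a few lines; the schema shape follows the JSON example above, and the helper name is illustrative.

```python
def split_by_sensitivity(schema: list[dict]) -> tuple[list[str], list[str]]:
    """Partition columns into encrypted vs. plaintext sets (sketch)."""
    sensitive = [c["column"] for c in schema if c["sensitive"]]
    plaintext = [c["column"] for c in schema if not c["sensitive"]]
    return sensitive, plaintext

schema = [
    {"column": "email", "type": "string", "sensitive": True},
    {"column": "age_group", "type": "string", "sensitive": False},
]
encrypted_cols, plaintext_cols = split_by_sensitivity(schema)
```

Columns in the first set would be envelope-encrypted at ingestion; columns in the second remain directly queryable, as described in the encryption section above.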