The rise of IoT has led to an explosion of real-time data—from logistics fleets to power grids to factory floors. Sensors are constantly emitting data. This telemetry unlocks immense value: smarter routing, predictive maintenance, and responsive infrastructure.
But here’s the catch: this data often arrives as deeply nested JSON, emitted either directly by the devices or through aggregation gateways. Traditional databases choke on this complexity. They force teams to flatten data upfront via ETL pipelines—slow, brittle, and ill-suited for real-time demands.
✨ Kinetica: Real-Time Analytics on Raw Sensor Data
Kinetica eliminates the need for external ETL. It ingests raw nested JSON directly, stores it in native JSON columns, and lets you query it with standard SQL—all at sub-second speeds.
Let’s walk through how this works, using some sample sensor JSON data.
Sample Nested JSON Payload
{
  "deviceId": "sensor-4389",
  "lastUpdated": "2025-04-07T15:45:00Z",
  "readings": [
    {
      "scenario": "Baseline_Readings",
      "metrics": [{"type": "temperature", "value": 22.5}, {"type": "humidity", "value": 45}],
      "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}
    },
    {
      "scenario": "Alert_Readings",
      "metrics": [{"type": "vibration", "value": 0.34}],
      "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}
    },
    {
      "scenario": "Diagnostic_Readings",
      "metrics": [{"type": "voltage", "value": 220.1}],
      "status": {"ReadStatus": "ERROR", "ReadErrorType": "SENSOR_FAILURE", "ReadErrorMessage": "Voltage spike detected"}
    }
  ]
}
This one JSON object includes:
- A deviceId and a lastUpdated timestamp
- A readings array with multiple scenarios (Baseline, Alert, Diagnostic), each containing:
  - A metrics array, where each metric has a type and a value
  - A status object with a read status, a read error type, and a read error message
This is a common telemetry structure—rich, flexible, and hard to wrangle using traditional SQL.
Step 1: Ingest the raw JSON directly into Kinetica
With Kinetica, you skip the flattening step and store the payload as-is:
CREATE OR REPLACE TABLE json_extraction.base (
    data JSON
);
Then insert your raw JSON data directly:
INSERT INTO json_extraction.base VALUES ('{...}');
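For example, loading the sample payload from above looks like this. The JSON goes in as a single string literal, and since the payload uses double quotes throughout, nothing inside needs escaping:
INSERT INTO json_extraction.base
VALUES ('{
  "deviceId": "sensor-4389",
  "lastUpdated": "2025-04-07T15:45:00Z",
  "readings": [
    {"scenario": "Baseline_Readings",
     "metrics": [{"type": "temperature", "value": 22.5}, {"type": "humidity", "value": 45}],
     "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}},
    {"scenario": "Alert_Readings",
     "metrics": [{"type": "vibration", "value": 0.34}],
     "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}},
    {"scenario": "Diagnostic_Readings",
     "metrics": [{"type": "voltage", "value": 220.1}],
     "status": {"ReadStatus": "ERROR", "ReadErrorType": "SENSOR_FAILURE", "ReadErrorMessage": "Voltage spike detected"}}
  ]
}');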
Now we have the raw data in the base table.
Step 2: Flatten in SQL—Layer by Layer
Kinetica’s built-in JSON functions (json_value, json_query, unnest_json_array) make it easy to flatten even complex structures.
Extract top-level fields and the readings array:
To begin flattening, we extract the top-level fields (deviceId, lastUpdated) and the nested readings array from the raw JSON.
json_value() is used to extract scalar key-value pairs, while json_query() pulls out lists or objects. Since json_query() returns a string, we need to cast it to JSON using the json() function before we can work with it further.
Here’s how that looks in SQL:
SELECT
    json_value(data, '$.deviceId') AS deviceId,
    json_value(data, '$.lastUpdated') AS lastUpdated,
    json(json_query(data, '$.readings')) AS readings_array
FROM json_extraction.base;
Explode the readings array:
Now that we’ve isolated the readings array, the next step is to explode it—turning each individual reading (e.g., Baseline, Alert, Diagnostic) into its own row.
unnest_json_array() is Kinetica’s function for flattening JSON arrays into rows. You can use it in the FROM clause to iterate over the elements of a JSON array column. Optionally, you can include WITH ORDINALITY to track the original position of each item.
Here’s how we apply it to the readings_array:
FROM ..., unnest_json_array(readings_array) AS reading
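Combined with the extraction query from the previous step, the full statement might look like the sketch below. It assumes unnest_json_array can be applied to a column produced by a common table expression, consistent with the FROM-clause usage described above:
WITH base_extract AS (
    SELECT
        json_value(data, '$.deviceId') AS deviceId,
        json_value(data, '$.lastUpdated') AS lastUpdated,
        json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
)
SELECT
    deviceId,
    lastUpdated,
    reading  -- one row per array element: Baseline, Alert, Diagnostic
FROM base_extract, unnest_json_array(readings_array) AS reading;
Adding WITH ORDINALITY after the function call would also surface each reading's original position in the array as an extra column.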
Extract scenario, metrics, and status:
With each reading now on its own row, we extract the nested fields: scenario (a string), status (an object), and metrics (an array). These use the same functions as before—json_value() for scalar values and json_query() (cast to JSON) for objects and arrays.
SELECT
    json_value(reading, '$.scenario') AS scenario,
    json(json_query(reading, '$.metrics')) AS metrics_array,
    json(json_query(reading, '$.status')) AS status
FROM ...;
This gives us each reading broken down by its scenario, diagnostic status, and a ready-to-unnest metrics_array.
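Written out against the unnested rows from the previous step, this stage might look like the following sketch (reusing the base_extract CTE from above):
WITH base_extract AS (
    SELECT json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
)
SELECT
    json_value(reading, '$.scenario') AS scenario,
    json(json_query(reading, '$.metrics')) AS metrics_array,  -- still an array; unnested next
    json(json_query(reading, '$.status')) AS status
FROM base_extract, unnest_json_array(readings_array) AS reading;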
Unnest the metrics and extract final values:
Now that we have the metrics_array and the status object for each reading, the final step is to:
- Unnest the metrics_array to get individual metric entries, and
- Extract diagnostic fields from the status JSON object.
Each resulting row will represent a single metric (like temperature or vibration) along with its scenario context and any associated error information.
SELECT
    json_value(metric, '$.type') AS type,
    json_value(metric, '$.value') AS value,
    json_value(status, '$.ReadStatus') AS ReadStatus,
    json_value(status, '$.ReadErrorType') AS ReadErrorType,
    json_value(status, '$.ReadErrorMessage') AS ReadErrorMessage
FROM ..., unnest_json_array(metrics_array) AS metric;
At this point, the JSON has been fully flattened into an analytics-ready format: a clean, tabular dataset with one row per metric reading, complete with scenario context and status flags.
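For reference, the whole pipeline can be chained into a single statement. This is a sketch under the same assumptions as the step-by-step queries above (chained CTEs, unnest_json_array in the FROM clause):
WITH base_extract AS (
    -- Step 1 of the flattening: top-level fields plus the readings array
    SELECT
        json_value(data, '$.deviceId') AS deviceId,
        json_value(data, '$.lastUpdated') AS lastUpdated,
        json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
),
readings AS (
    -- One row per reading, with its scenario, metrics array, and status
    SELECT
        deviceId,
        lastUpdated,
        json_value(reading, '$.scenario') AS scenario,
        json(json_query(reading, '$.metrics')) AS metrics_array,
        json(json_query(reading, '$.status')) AS status
    FROM base_extract, unnest_json_array(readings_array) AS reading
)
-- Final output: one row per metric, with scenario and status context
SELECT
    deviceId,
    lastUpdated,
    scenario,
    json_value(metric, '$.type') AS type,
    json_value(metric, '$.value') AS value,
    json_value(status, '$.ReadStatus') AS ReadStatus,
    json_value(status, '$.ReadErrorType') AS ReadErrorType,
    json_value(status, '$.ReadErrorMessage') AS ReadErrorMessage
FROM readings, unnest_json_array(metrics_array) AS metric;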
ELT > ETL for Streaming Sensor Data
This workflow flips the traditional model. Instead of extracting, transforming, and then loading (ETL), you use an ELT pattern:
- Extract from the source
- Load raw into Kinetica
- Transform using in-database SQL
This approach is faster, more flexible, and much better suited for evolving IoT schemas where fields and structures change frequently.
Built for Real-Time Ops
Kinetica’s architecture enables:
- Instant ingestion of streaming JSON
- Native SQL access to nested fields
- High-speed transformation with GPU acceleration
- Materialized views and stored procedures to automate routing and processing
You don’t need Spark, Airflow, or external ETL tools. The entire pipeline lives inside the database.
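As a minimal illustration of that last point, a materialized view can keep a flattened projection continuously fresh as new payloads stream in. The sketch below assumes Kinetica's REFRESH ON CHANGE option; the full flattening query from earlier could be substituted for the simple projection shown here (device_latest is a hypothetical view name for this example):
CREATE MATERIALIZED VIEW json_extraction.device_latest
REFRESH ON CHANGE AS
SELECT
    json_value(data, '$.deviceId') AS deviceId,
    json_value(data, '$.lastUpdated') AS lastUpdated
FROM json_extraction.base;
Every new row landing in the base table is reflected in the view automatically, so downstream consumers always query an up-to-date tabular dataset without an external pipeline.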
What’s Next
In upcoming posts, we’ll go deeper on:
- Using delta-based materialized views for CDC-style updates
- Writing stored procedures to split readings by metric
- Creating Python UDFs to handle custom logic or API enrichments
Conclusion
Sensor data is messy. JSON is messy. Kinetica makes it manageable—without sacrificing speed.
By loading raw JSON and flattening it with SQL, you get a real-time pipeline that’s powerful, flexible, and production-ready.