The rise of IoT has led to an explosion of real-time data—from logistics fleets to power grids to factory floors. Sensors are constantly emitting data. This telemetry unlocks immense value: smarter routing, predictive maintenance, and responsive infrastructure.
But here’s the catch: this data often arrives as deeply nested JSON, emitted either directly by the devices or through aggregation gateways. Traditional databases choke on this complexity. They force teams to flatten data upfront via ETL pipelines—slow, brittle, and ill-suited for real-time demands.
✨ Kinetica: Real-Time Analytics on Raw Sensor Data
Kinetica eliminates the need for external ETL. It ingests raw nested JSON directly, stores it in native JSON columns, and lets you query it with standard SQL—all at sub-second speeds.
Let’s walk through how this works, using some sample sensor JSON data.
Sample Nested JSON Payload
{
  "deviceId": "sensor-4389",
  "lastUpdated": "2025-04-07T15:45:00Z",
  "readings": [
    {
      "scenario": "Baseline_Readings",
      "metrics": [{"type": "temperature", "value": 22.5}, {"type": "humidity", "value": 45}],
      "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}
    },
    {
      "scenario": "Alert_Readings",
      "metrics": [{"type": "vibration", "value": 0.34}],
      "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}
    },
    {
      "scenario": "Diagnostic_Readings",
      "metrics": [{"type": "voltage", "value": 220.1}],
      "status": {"ReadStatus": "ERROR", "ReadErrorType": "SENSOR_FAILURE", "ReadErrorMessage": "Voltage spike detected"}
    }
  ]
}
This one JSON object includes:
- A deviceId and a lastUpdated timestamp
- A readings array with multiple scenarios (Baseline, Alert, Diagnostic), each containing:
  - A metrics array, where each metric has a type and a value
  - A status object with a read status, a read error type, and a read error message
This is a common telemetry structure—rich, flexible, and hard to wrangle using traditional SQL.
Step 1: Ingest the raw JSON directly into Kinetica
With Kinetica, you skip the flattening step and store the payload as-is:
CREATE OR REPLACE TABLE json_extraction.base (
    data JSON
);
Then insert your raw JSON data directly:
INSERT INTO json_extraction.base VALUES ('{...}');
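For example, loading the sample payload from above looks like this. The JSON goes in as a single string literal, and since the payload uses double quotes throughout, nothing inside needs escaping:
INSERT INTO json_extraction.base
VALUES ('{
  "deviceId": "sensor-4389",
  "lastUpdated": "2025-04-07T15:45:00Z",
  "readings": [
    {"scenario": "Baseline_Readings",
     "metrics": [{"type": "temperature", "value": 22.5}, {"type": "humidity", "value": 45}],
     "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}},
    {"scenario": "Alert_Readings",
     "metrics": [{"type": "vibration", "value": 0.34}],
     "status": {"ReadStatus": "OK", "ReadErrorType": "", "ReadErrorMessage": ""}},
    {"scenario": "Diagnostic_Readings",
     "metrics": [{"type": "voltage", "value": 220.1}],
     "status": {"ReadStatus": "ERROR", "ReadErrorType": "SENSOR_FAILURE", "ReadErrorMessage": "Voltage spike detected"}}
  ]
}');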
Now we have the raw data in the base table.
Step 2: Flatten in SQL—Layer by Layer
Kinetica’s built-in JSON functions (json_value, json_query, unnest_json_array) make it easy to flatten even complex structures.
Extract top-level fields and the readings array:
To begin flattening, we extract the top-level fields (deviceId, lastUpdated) and the nested readings array from the raw JSON.
json_value() is used to extract scalar key-value pairs, while json_query() pulls out lists or objects. Since json_query() returns a string, we need to cast it to JSON using the json() function before we can work with it further.
Here’s how that looks in SQL:
SELECT
    json_value(data, '$.deviceId') AS deviceId,
    json_value(data, '$.lastUpdated') AS lastUpdated,
    json(json_query(data, '$.readings')) AS readings_array
FROM json_extraction.base;
Explode the readings array:
Now that we’ve isolated the readings array, the next step is to explode it—turning each individual reading (e.g., Baseline, Alert, Diagnostic) into its own row.
unnest_json_array() is Kinetica’s function for flattening JSON arrays into rows. You can use it in the FROM clause to iterate over the elements of a JSON array column. Optionally, you can include WITH ORDINALITY to track the original position of each item.
Here’s how we apply it to the readings_array:
FROM ..., unnest_json_array(readings_array) AS reading
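Combined with the extraction query from the previous step, the full statement might look like the sketch below. It assumes unnest_json_array can be applied to a column produced by a common table expression, consistent with the FROM-clause usage described above:
WITH base_extract AS (
    SELECT
        json_value(data, '$.deviceId') AS deviceId,
        json_value(data, '$.lastUpdated') AS lastUpdated,
        json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
)
SELECT
    deviceId,
    lastUpdated,
    reading  -- one row per array element: Baseline, Alert, Diagnostic
FROM base_extract, unnest_json_array(readings_array) AS reading;
Adding WITH ORDINALITY after the function call would also surface each reading's original position in the array as an extra column.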
Extract scenario, metrics, and status:
With each reading now on its own row, we extract the nested fields: scenario (a string), status (an object), and metrics (an array). These use the same functions as before—json_value() for scalar values and json_query() (cast to JSON) for objects and arrays.
SELECT
    json_value(reading, '$.scenario') AS scenario,
    json(json_query(reading, '$.metrics')) AS metrics_array,
    json(json_query(reading, '$.status')) AS status
FROM ...;
This gives us each reading broken down by its scenario, diagnostic status, and a ready-to-unnest metrics_array.
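Written out against the unnested rows from the previous step, this stage might look like the following sketch (reusing the base_extract CTE from above):
WITH base_extract AS (
    SELECT json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
)
SELECT
    json_value(reading, '$.scenario') AS scenario,
    json(json_query(reading, '$.metrics')) AS metrics_array,  -- still an array; unnested next
    json(json_query(reading, '$.status')) AS status
FROM base_extract, unnest_json_array(readings_array) AS reading;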
Unnest the metrics and extract final values:
Now that we have the metrics_array and the status object for each reading, the final step is to:
- Unnest the metrics_array to get individual metric entries, and
- Extract diagnostic fields from the status JSON object.
Each resulting row will represent a single metric (like temperature or vibration) along with its scenario context and any associated error information.
SELECT
    json_value(metric, '$.type') AS type,
    json_value(metric, '$.value') AS value,
    json_value(status, '$.ReadStatus') AS ReadStatus,
    json_value(status, '$.ReadErrorType') AS ReadErrorType,
    json_value(status, '$.ReadErrorMessage') AS ReadErrorMessage
FROM ..., unnest_json_array(metrics_array) AS metric;
At this point, the JSON has been fully flattened into an analytics-ready format: a clean, tabular dataset with one row per metric reading, complete with scenario context and status flags.
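For reference, the whole pipeline can be chained into a single statement. This is a sketch under the same assumptions as the step-by-step queries above (chained CTEs, unnest_json_array in the FROM clause):
WITH base_extract AS (
    -- Step 1 of the flattening: top-level fields plus the readings array
    SELECT
        json_value(data, '$.deviceId') AS deviceId,
        json_value(data, '$.lastUpdated') AS lastUpdated,
        json(json_query(data, '$.readings')) AS readings_array
    FROM json_extraction.base
),
readings AS (
    -- One row per reading, with its scenario, metrics array, and status
    SELECT
        deviceId,
        lastUpdated,
        json_value(reading, '$.scenario') AS scenario,
        json(json_query(reading, '$.metrics')) AS metrics_array,
        json(json_query(reading, '$.status')) AS status
    FROM base_extract, unnest_json_array(readings_array) AS reading
)
-- Final output: one row per metric, with scenario and status context
SELECT
    deviceId,
    lastUpdated,
    scenario,
    json_value(metric, '$.type') AS type,
    json_value(metric, '$.value') AS value,
    json_value(status, '$.ReadStatus') AS ReadStatus,
    json_value(status, '$.ReadErrorType') AS ReadErrorType,
    json_value(status, '$.ReadErrorMessage') AS ReadErrorMessage
FROM readings, unnest_json_array(metrics_array) AS metric;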
ELT > ETL for Streaming Sensor Data
This workflow flips the traditional model. Instead of extracting, transforming, and then loading (ETL), you use an ELT pattern:
- Extract from the source
- Load raw into Kinetica
- Transform using in-database SQL
This approach is faster, more flexible, and much better suited for evolving IoT schemas where fields and structures change frequently.
Built for Real-Time Ops
Kinetica’s architecture enables:
- Instant ingestion of streaming JSON
- Native SQL access to nested fields
- High-speed transformation with GPU acceleration
- Materialized views and stored procedures to automate routing and processing
You don’t need Spark, Airflow, or external ETL tools. The entire pipeline lives inside the database.
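As a minimal illustration of that last point, a materialized view can keep a flattened projection continuously fresh as new payloads stream in. The sketch below assumes Kinetica's REFRESH ON CHANGE option; the full flattening query from earlier could be substituted for the simple projection shown here (device_latest is a hypothetical view name for this example):
CREATE MATERIALIZED VIEW json_extraction.device_latest
REFRESH ON CHANGE AS
SELECT
    json_value(data, '$.deviceId') AS deviceId,
    json_value(data, '$.lastUpdated') AS lastUpdated
FROM json_extraction.base;
Every new row landing in the base table is reflected in the view automatically, so downstream consumers always query an up-to-date tabular dataset without an external pipeline.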
What’s Next
In upcoming posts, we’ll go deeper on:
- Using delta-based materialized views for CDC-style updates
- Writing stored procedures to split readings by metric
- Creating Python UDFs to handle custom logic or API enrichments
Conclusion
Sensor data is messy. JSON is messy. Kinetica makes it manageable—without sacrificing speed.
By loading raw JSON and flattening it with SQL, you get a real-time pipeline that’s powerful, flexible, and production-ready.