PostGIS vs Kinetica vs GeoMesa vs Apache Sedona
You'll want a system that's flexible, fast, scalable and easy to use.
What Tools for Geospatial Analysis at Scale?
Location data is exploding. Location-enabled chips for cellular connectivity are appearing in more and more devices, and new apps and the expansion of 5G networks are helping collect ever greater volumes of geospatial data. Add to that the increasing affordability of analyzable imagery from satellites, drones and surveillance cameras.
But designing systems that keep up with the volume and speed that modern location intelligence workloads demand can be tough. Traditional GIS tools and databases are no longer up to the task. To manage IoT data at scale, you'll need to turn to a new generation of data-centric tools for tracking objects and events.
So, what are some options?
Kinetica is a high performance real-time database designed from the ground-up for analysis of geospatial data at speed and scale.
Kinetica scales horizontally with a memory-first distributed architecture, which allows you to handle larger geospatial data sets with ease. An eventually consistent, no-locking design allows data to be ingested and analyzed simultaneously, without either workload waiting on the other.
Kinetica has been built from the ground up to take full advantage of modern processors: it is able to vectorize queries for significantly faster results on ad-hoc exploratory queries. Spatial joins, which can be particularly complex, benefit most from this unique capability.
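To make the spatial join mentioned above concrete, here is a minimal sketch in plain Python of what such a join computes: which points fall inside which polygons. It is a naive nested-loop illustration and not Kinetica's implementation; the function names, zone names and vehicle positions are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]      # (x, y), e.g. (longitude, latitude)
Polygon = Sequence[Point]        # ring of vertices, implicitly closed


def point_in_polygon(pt: Point, ring: Polygon) -> bool:
    """Ray casting: count how many edges a ray going in +x from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: Dict[str, Point],
                 zones: Dict[str, Polygon]) -> List[Tuple[str, str]]:
    """Naive join: every point is tested against every polygon."""
    return [
        (point_id, zone_id)
        for point_id, pt in points.items()
        for zone_id, ring in zones.items()
        if point_in_polygon(pt, ring)
    ]


# Hypothetical sample data: two square zones and three vehicle positions.
zones = {
    "west": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "east": [(5.0, 0.0), (9.0, 0.0), (9.0, 4.0), (5.0, 4.0)],
}
points = {"truck_a": (1.0, 1.0), "truck_b": (6.5, 2.0), "truck_c": (20.0, 20.0)}

matches = spatial_join(points, zones)
```

The work grows with the number of points multiplied by the number of polygons, which is why engines lean on indexing, partitioning or vectorized execution for this operation at scale.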
Kinetica can be accessed through SQL, ODBC/JDBC, REST APIs, or Kinetica Workbench, an interactive SQL-based notebook for sequential analysis. Over 130 spatial functions are available. Workbench makes it easy for business users to get up and running and to interact with large volumes of data. It gives users an interactive data exploration experience at virtually unlimited scale, quickly and simply on a single system.
Kinetica's visualization engine is able to perform server-side visualization of data for maps, which enables interactivity with massive, detailed geospatial datasets. Kinetica also includes support for user defined functions, graph solving (map-matching, supply-demand analysis, isochrones), and machine learning capabilities.
Kinetica is a good choice for those looking for a complete system purpose-built for developing geospatial applications or analyzing geospatial data at scale. While it is not open-source, a developer version is available for single-node use for free.
PostGIS is a library that provides geospatial data types, functions, and queries for use with PostgreSQL.
PostGIS offers geospatial functionality for use at a moderate scale. It works with PostgreSQL – an easy to understand relational database that is well suited to many OLTP-type applications. Its architecture is general purpose and flexible, but not specifically optimized for analytics. As a result, non-indexed queries and aggregations will be slower to return results.
PostGIS provides a very robust geospatial function library covering 2D, 3D, and raster data. It has full support for many spatial reference systems (SRS), and plugs into popular 3rd party tools to provide missing functionality, such as visualization or routing algorithms. It’s SQL compliant and also allows users to bring in User Defined Functions.
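As a rough illustration of that SQL function library, the sketch below builds a parameterized radius search. ST_DWithin, ST_Distance, ST_MakePoint and ST_SetSRID are PostGIS functions; the sensors table, its geom column and the nearby_query helper are hypothetical names made up for this example. Nothing is executed here: the function only returns the SQL text and its parameters.

```python
from typing import Tuple

# Assumption: a hypothetical table sensors(id, geom) holding points in SRID 4326.
# Casting to geography makes ST_DWithin and ST_Distance work in meters.
NEARBY_SQL = """
SELECT id,
       ST_Distance(geom::geography,
                   ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography) AS meters
FROM sensors
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                 %s)
ORDER BY meters
"""

Params = Tuple[float, float, float, float, float]


def nearby_query(lon: float, lat: float, radius_m: float) -> Tuple[str, Params]:
    """Return (sql, params) for sensors within radius_m meters of (lon, lat)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("lon/lat outside the WGS 84 range")
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    # ST_MakePoint takes x (longitude) first, then y (latitude).
    return NEARBY_SQL, (lon, lat, lon, lat, radius_m)


sql, params = nearby_query(-122.4, 37.8, 500.0)
```

The returned pair can be handed to the execute method of a Python DB-API driver for PostgreSQL that uses %s placeholders, such as psycopg2.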
Without in-built support for visualization, users of PostGIS may face difficulties when trying to display large quantities of real-time data on a map. Many popular visualization vendors rely on client-side rendering, which limits visualizations to small and manageable datasets.
Users will also run into limitations with PostGIS when working with geospatial data at scale. PostgreSQL is not inherently designed to scale out, it does not offer advanced vectorized queries, and it is not ready for high-velocity data ingestion from streaming feeds. This makes it unsuitable for IoT-style use cases where data volumes can be expected to increase quickly.
GeoMesa is an open-source suite of tools for large-scale geospatial querying and analytics on distributed computing systems – such as HBase, Accumulo, Cassandra, Redis, Kafka and Spark.
GeoMesa will enable you to store, index, query, and transform spatio-temporal data at scale on systems that are horizontally scalable but weren't intrinsically designed for geospatial and temporal data. GeoMesa has become known as a good addition to the stack for big data use-cases which need to add geospatial queries on top of an established data management structure.
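One common way to make a sorted key-value store answer spatial queries is to map each coordinate pair onto a space-filling curve, so that two dimensions collapse into a single sortable key. GeoMesa's documentation describes indexes based on Z-order curves; the sketch below illustrates only the general idea in plain Python and is not GeoMesa code. The function names and the 16-bit resolution are assumptions chosen for the example.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    cells = 1 << bits
    index = int((value - lo) / (hi - lo) * cells)
    return min(index, cells - 1)  # value == hi belongs to the last cell


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Interleave the bits of the quantized longitude and latitude."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # even bit positions: longitude
        key |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions: latitude
    return key


# Approximate city coordinates, used only as sample input.
key_san_francisco = z_order_key(-122.42, 37.77)
key_oakland = z_order_key(-122.27, 37.80)
key_sydney = z_order_key(151.21, -33.87)
```

Points that are close on the map usually share the high-order bits of their keys, so a bounding-box query can be turned into a small number of key-range scans. The match is not perfect, since neighbors that straddle a major cell boundary can end up with distant keys.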
GeoMesa provides support for stream processing of spatial data on top of Apache Kafka, and it works with data stores including Redis, Cassandra, S3, HDFS, Accumulo and others. In addition, it supports Spark Analytics through various APIs.
But GeoMesa is just a collection of technologies to aid in building a large scale geospatial system. Effective usage requires mastery of multiple underlying data systems and technologies. Stitching together these pieces is often time consuming and prone to error, leading to less than optimal results. Performance is limited by choices made in the underlying data infrastructure.
Apache Sedona is another distributed computing framework for processing geospatial data at scale.
Apache Sedona (incubating), formerly GeoSpark, extends Apache Spark with distributed Spatial Datasets and Spatial SQL that efficiently load, process, transform and analyze large volumes of geospatial data across machines. It can be deployed on cloud platforms such as AWS, Azure and GCP. You can work with data using Spark's RDD API in popular languages like Java, Scala, Python and R, or by using SQL and ST_ functions.
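The idea behind spreading spatial data across machines can be sketched without Spark at all. In the plain-Python illustration below, points are grouped into partitions by grid cell, and a range query inspects only the partitions its bounding box overlaps. This is not Sedona's API or its partitioning scheme: the uniform grid, the function names and the sample points are all assumptions made for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def cell_of(pt: Point, size: float) -> Cell:
    """Grid cell containing pt, for square cells with the given side length."""
    return (math.floor(pt[0] / size), math.floor(pt[1] / size))


def partition(points: Dict[str, Point], size: float) -> Dict[Cell, Dict[str, Point]]:
    """Group points by grid cell; each group stands in for one worker's share."""
    parts: Dict[Cell, Dict[str, Point]] = defaultdict(dict)
    for point_id, pt in points.items():
        parts[cell_of(pt, size)][point_id] = pt
    return dict(parts)


def range_query(parts: Dict[Cell, Dict[str, Point]], box: Box, size: float) -> List[str]:
    """Return ids of points inside box, visiting only the overlapping cells."""
    min_x, min_y, max_x, max_y = box
    cx0, cy0 = cell_of((min_x, min_y), size)
    cx1, cy1 = cell_of((max_x, max_y), size)
    hits: List[str] = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            for point_id, (x, y) in parts.get((cx, cy), {}).items():
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append(point_id)
    return sorted(hits)


# Hypothetical sample data on a grid of 10 x 10 unit cells.
points = {"a": (1.0, 1.0), "b": (12.0, 3.0), "c": (14.0, 18.0), "d": (35.0, 35.0)}
parts = partition(points, size=10.0)
found = range_query(parts, box=(10.0, 0.0, 20.0, 20.0), size=10.0)
```

Because each partition can be scanned independently, the per-cell loop is the part a distributed engine would hand out to separate workers.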
Sedona leans on Apache Flink for stream processing, and can be hooked up to tools such as Zeppelin for data analysis. However, this also requires stitching together multiple technologies to create a working system.
Sedona is best suited when adding geospatial functionality to existing Spark projects. But it is earlier in its development lifecycle than GeoMesa, and still lacks features and capabilities. Consider it as a component for large scale geospatial analysis use-cases with Spark.
Kinetica: Designed for the Next Generation of Geospatial Applications
Book a Demo!
The best way to appreciate the possibilities that Kinetica brings to high-performance real-time analytics is to see it in action.
Contact us, and we'll give you a tour of Kinetica. We can also help you get started using it with your own data, your own schemas and your own queries.