Data Science Lab
 
University of Piraeus

ARGO: A Big Data Framework for Online Trajectory Prediction

Petros Petrou, Panagiotis Nikitopoulos, Panagiotis Tampakis, Apostolos Glenis, Nikolaos Koutroumanis, Georgios M. Santipantakis, Kostas Patroumpas, Akrivi Vlachou, Harris Georgiou, Eva Chondrodima, Christos Doulkeridis, Nikos Pelekis, Gennady L. Andrienko, Georg Fuchs, Yannis Theodoridis, George A. Vouros



Abstract

In this paper, we present a big data framework for the prediction of streaming trajectory data, enriched with data from other sources and exploiting mined patterns of trajectories from integrated data, allowing accurate long-term predictions with low latency. To meet this goal, we follow a multi-step methodology. First, we efficiently compress surveillance data in an online fashion, by constructing trajectory synopses that are spatio-temporally linked with streaming and archival data from a variety of heterogeneous data sources. The enriched stream of trajectory synopses is stored in a distributed RDF store, thus supporting data exploration via simple SPARQL queries. Moreover, the enriched stream of synopses, along with the raw data, is consumed by trajectory prediction algorithms that exploit mined patterns from the RDF store, namely medoids of (sub-)trajectory clusters, which prolong the temporal window of useful predictions. The framework is also extended with an offline and an online interactive visual analytics tool to facilitate real-world analysis in the maritime and aviation domains.

Introduction

As the maritime and air-traffic management (ATM) domains have a major impact on the global economy, there is a constant need to advance the capacity of systems to improve the safety and effectiveness of critical operations involving a large number of moving entities in large geographical areas [1]. Towards this goal, the correlated exploitation of heterogeneous data sources offering vast quantities of archival and high-rate streaming data is important for increasing the accuracy of computations when analysing and predicting future states of moving entities. However, operational systems for predicting trajectories in these domains are still limited to a short-term horizon, while facing increased uncertainty and a lack of accuracy in mobility data.

Synopses Generator

The Synopses Generator (SG) [2][3] provides online, summarized representations of the trajectories of vessels and aircraft.
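To make the idea of an online synopsis concrete, the following is a minimal sketch, not the SG implementation described in [2][3]. It assumes that a synopsis keeps only the first fix and the fixes at which the heading changes sharply; the names `Fix`, `SynopsisBuilder` and the 15-degree threshold are hypothetical, and the bearing is a planar approximation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fix:
    t: float    # timestamp in seconds
    lon: float  # longitude in degrees
    lat: float  # latitude in degrees


def bearing(a: Fix, b: Fix) -> float:
    """Planar approximation of the bearing from a to b, in degrees [0, 360)."""
    return math.degrees(math.atan2(b.lon - a.lon, b.lat - a.lat)) % 360.0


class SynopsisBuilder:
    """Consumes fixes one at a time and emits only 'critical' ones.

    Assumption for illustration: a fix is critical if it is the first one
    seen, or if the heading of the segment leaving it differs from the
    heading of the segment entering it by more than the threshold.
    """

    def __init__(self, turn_threshold_deg: float = 15.0) -> None:
        self.turn_threshold_deg = turn_threshold_deg
        self._prev: Optional[Fix] = None
        self._heading: Optional[float] = None

    def push(self, fix: Fix) -> Optional[Fix]:
        if self._prev is None:
            self._prev = fix
            return fix  # always keep the first fix
        if (fix.lon, fix.lat) == (self._prev.lon, self._prev.lat):
            return None  # no displacement, so the heading is undefined
        new_heading = bearing(self._prev, fix)
        emitted = None
        if self._heading is not None:
            diff = abs(new_heading - self._heading)
            diff = min(diff, 360.0 - diff)  # shortest angular difference
            if diff > self.turn_threshold_deg:
                emitted = self._prev  # the turn took place at the previous fix
        self._heading = new_heading
        self._prev = fix
        return emitted
```

Because each call to `push` touches only constant state, the sketch processes a stream in a single pass. It compares consecutive segments only, so a slow, gradual turn would go unnoticed; a real synopsis would also track other events such as stops or speed changes.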

Semantic Integrator

The compressed stream of trajectory positions is received by the Semantic Integrator (SI), which performs two tasks: (a) data transformation to RDF [4], and (b) spatio-temporal link discovery (LD) against other data sources [5]. The output of the SI is a stream of integrated data, representing enriched trajectory synopses [6].
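The two SI tasks can be illustrated with a small sketch. It is not RDF-Gen [4] or the link discovery method of [5]; it only assumes that each synopsis point becomes a handful of triples and that a link is created when the point falls inside a region's bounding box during the region's validity interval. The namespace `http://example.org/argo#`, the property names and the `Region` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

EX = "http://example.org/argo#"  # hypothetical namespace, for illustration only


@dataclass(frozen=True)
class Region:
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    t_start: float
    t_end: float

    def contains(self, lon: float, lat: float, t: float) -> bool:
        """True if the point lies in the box while the region is valid."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat
                and self.t_start <= t <= self.t_end)


def node_iri(entity_id: str, t: float) -> str:
    return f"<{EX}node/{entity_id}/{int(t)}>"


def point_triples(entity_id: str, t: float, lon: float, lat: float) -> List[str]:
    """Task (a): turn one synopsis point into N-Triples-style lines."""
    node = node_iri(entity_id, t)
    return [
        f"{node} <{EX}ofMovingObject> <{EX}object/{entity_id}> .",
        f'{node} <{EX}timestamp> "{int(t)}" .',
        f'{node} <{EX}asWKT> "POINT ({lon} {lat})" .',
    ]


def link_triples(entity_id: str, t: float, lon: float, lat: float,
                 regions: List[Region]) -> List[str]:
    """Task (b): link the point to every region that contains it in space and time."""
    node = node_iri(entity_id, t)
    return [
        f"{node} <{EX}within> <{EX}region/{r.name}> ."
        for r in regions
        if r.contains(lon, lat, t)
    ]
```

The linear scan over `regions` is only adequate for a handful of regions; a streaming link discovery component would normally rely on a spatial index or grid to avoid comparing every point with every region.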

Data Manager

The fundamental module of the Data Manager is the distributed spatiotemporal RDF engine. It comprises two distinct layers: (a) the distributed storage layer [7], and (b) the parallel processing layer [8].

(Sub-)Trajectory Clustering Module

The objective of the (Sub-)Trajectory Clustering (STC) module is to first partition trajectories into sub-trajectories and then identify the most representative ones that will act as cluster pivots [9][10]. The role of this module in the overall architecture of ARGO is to take as input data selected from the RDF store, apply STC, and provide the resulting representatives (i.e. cluster medoids) both to the prediction module and back to the RDF store.
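As an illustration of what a cluster medoid is, the sketch below picks the member of a cluster with the smallest total distance to all other members. It is not the sampling-based algorithm of [9][10]: it assumes that the sub-trajectories of a cluster have already been resampled to the same number of points, and it uses a simple mean point-to-point Euclidean distance. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
SubTrajectory = Sequence[Point]


def traj_distance(a: SubTrajectory, b: SubTrajectory) -> float:
    """Mean Euclidean distance between points at the same index.

    Assumption: both sub-trajectories are non-empty and have equal length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sub-trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def medoid(cluster: List[SubTrajectory]) -> SubTrajectory:
    """Return the cluster member minimizing the sum of distances to all members."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    return min(cluster, key=lambda c: sum(traj_distance(c, o) for o in cluster))
```

Unlike a centroid, the medoid is always an actual observed sub-trajectory, which is what allows it to be stored back in the RDF store and reused as a reference route. The exhaustive pairwise comparison is quadratic in the cluster size, so it serves only to show the definition.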

Future Location Prediction Module

The Future Location Prediction (FLP) module aims to make an accurate estimation of the next movement of a moving object within a specific look-ahead time frame. FLP exploits the cluster representatives mined from historical data as a reference for producing FLP forecasts aligned with the closest-matched route in the maximum-likelihood sense.
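The way a medoid can serve as a reference route is sketched below. This is a deliberately simplified stand-in: it replaces the maximum-likelihood matching mentioned above with a plain nearest-point search, and it assumes that medoids are sampled at a fixed rate so that the look-ahead can be expressed as a number of samples. The function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def predict_along_medoid(current: Point,
                         medoids: List[Sequence[Point]],
                         steps_ahead: int) -> Point:
    """Predict a future position by following the closest medoid.

    1. Find the medoid point nearest to the current position.
    2. Advance `steps_ahead` samples along that medoid, stopping at its end.
    3. Shift the result by the current offset from the matched point.
    """
    best = None  # (distance, route, index of the matched point)
    for route in medoids:
        for i, p in enumerate(route):
            d = math.dist(current, p)
            if best is None or d < best[0]:
                best = (d, route, i)
    if best is None:
        raise ValueError("at least one non-empty medoid is required")
    _, route, idx = best
    ref = route[idx]
    target = route[min(idx + steps_ahead, len(route) - 1)]
    offset = (current[0] - ref[0], current[1] - ref[1])
    return (target[0] + offset[0], target[1] + offset[1])
```

For example, with an eastbound medoid `[(0, 0), (1, 0), (2, 0), (3, 0)]` and a current position of `(1, 0.5)`, a look-ahead of two samples follows the medoid to `(3, 0)` and preserves the lateral offset. Matching on a single position ignores direction of travel; matching on a window of recent positions would be more robust.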

System Architecture

The proposed framework for trajectory prediction is implemented as a big data architecture and illustrated in Figure 3. It comprises two layers: (a) a stream processing layer, and (b) a batch processing layer, which interact in order to provide the desired functionality.

In brief, the stream processing layer processes the stream of surveillance data and performs data cleaning, noise elimination, compression and semantic data integration in an online manner. The synopsized and enriched data stream, represented in RDF, can be consumed as is, thus enabling the deployment of data analysis pipelines, and it is also stored in a distributed spatiotemporal RDF store for batch processing. This store supports scalable and efficient processing of SPARQL queries with spatiotemporal constraints, providing filtered, integrated, spatiotemporal data for higher-level analysis tasks. Offline analysis of integrated data (e.g., for trajectory clustering) generates mined patterns, which are exploited in conjunction with the enriched data stream during the online operation of the trajectory prediction module.

Video Showcase - Scenario 1: Interactive pattern discovery

Video Showcase - Scenario 2: Online trajectory prediction

References

[1] G. A. Vouros, C. Doulkeridis, G. Santipantakis, A. Vlachou, N. Pelekis, H. Georgiou, Y. Theodoridis, K. Patroumpas, E. Alevizos, A. Artikis, G. Fuchs, M. Mock, G. Andrienko, N. Andrienko, C. Ray, C. Claramunt, E. Camossi, A.-L. Jousselme, D. Scarlatti, J. Manuel. 2018. Big Data Analytics for Time Critical Mobility Forecasting: Recent Progress and Research Challenges. In Proceedings of EDBT.
[2] K. Patroumpas, E. Alevizos, A. Artikis, M. Vodas, N. Pelekis, Y. Theodoridis. 2016. Online Event Recognition from Moving Vessel Trajectories. GeoInformatica, 21(2), 389-427.
[3] K. Patroumpas, N. Pelekis, Y. Theodoridis. 2018. On-the-fly Mobility Event Detection over Aircraft Trajectories. In Proceedings of SIGSPATIAL.
[4] G. M. Santipantakis, K. I. Kotis, G. A. Vouros, C. Doulkeridis. 2018. RDF-Gen: Generating RDF from Streaming and Archival Data. In Proceedings of WIMS.
[5] G. M. Santipantakis, A. Vlachou, C. Doulkeridis, A. Artikis, I. Kontopoulos, G. A. Vouros. 2018. A Stream Reasoning System for Maritime Monitoring. In Proceedings of TIME.
[6] G. M. Santipantakis, A. Glenis, K. Patroumpas, A. Vlachou, C. Doulkeridis, G. A. Vouros, N. Pelekis, Y. Theodoridis. SPARTAN: Semantic Integration of Big Spatio-temporal Data from Streaming and Archival Sources. Future Generation Computer Systems, to appear.
[7] A. Vlachou, C. Doulkeridis, A. Glenis, G. Santipantakis, G. A. Vouros. 2019. Efficient Spatio-temporal RDF Query Processing in Large Dynamic Knowledge Bases. In Proceedings of SAC.
[8] P. Nikitopoulos, A. Vlachou, C. Doulkeridis, G. A. Vouros. 2018. DiStRDF: Distributed Spatio-temporal RDF Queries on Spark. EDBT/ICDT Workshops.
[9] N. Pelekis, P. Tampakis, M. Vodas, C. Panagiotakis, Y. Theodoridis. 2017. In-DBMS Sampling-based Sub-trajectory Clustering. In Proceedings of EDBT.
[10] P. Tampakis, N. Pelekis, N. Andrienko, G. Andrienko, G. Fuchs, Y. Theodoridis. 2018. Time-aware Sub-Trajectory Clustering in Hermes@PostgreSQL (demo paper). In Proceedings of ICDE.

Supplementary Content - Synopses Generator