Data Science Lab
 
University of Piraeus

Time-aware Sub-Trajectory Clustering in Hermes@PostgreSQL

Panagiotis Tampakis, Nikos Pelekis, Natalia Andrienko, Gennady Andrienko, Georg Fuchs, Yannis Theodoridis



Abstract

In this paper, we present an efficient in-DBMS framework for progressive time-aware sub-trajectory cluster analysis. In particular, we address two variants of the problem: (a) spatiotemporal sub-trajectory clustering and (b) index-based time-aware clustering at querying environment. Our approach for (a) relies on a two-phase process: a voting-and-segmentation phase followed by a sampling-and-clustering phase. Regarding (b), we organize data into partitions that correspond to groups of sub-trajectories, which are incrementally maintained in a hierarchical structure. Both approaches have been implemented in Hermes@PostgreSQL, a real Moving Object Database engine built on top of PostgreSQL, enabling users to perform progressive cluster analysis via simple SQL. The framework is also extended with a Visual Analytics (VA) tool to facilitate real world analysis. Our system is demonstrated over a real MOD consisting of a fleet of aircrafts approaching airports of the London metropolitan area.

S2T-Clustering

In general, the objective of sub-trajectory clustering is to partition trajectories into sub-trajectories and then form groups of similar ones, while at the same time separate those (called outliers) that fit into no group. S2T-Clustering [1] is a state-of-the-art algorithm consists of two phases: during the first phase, a Neighborhood-aware Trajectory Segmentation (NaTS) method is applied over trajectories, thus splitting them in sub-trajectories; during the second phase, Sampling, Clustering and Outlier (SaCO) detection steps are performed in order to provide the final result. NaTS relies on a voting and segmentation process that detects homogenized sub-trajectories in the MOD w.r.t. how many other objects move in their neighborhood, while SaCO selects the most representative ones to serve as the seeds of the clusters, around which the clusters are formed (also, the outliers are isolated).

In more detail, during the adopted voting process each 3D trajectory segment of a given trajectory is voted by other trajectories w.r.t. their mutual distance. The voting received by each segment is a value ranging from 0 to N (N being the cardinality of the MOD) that has the physical meaning of how many trajectories co-move with that trajectory for a certain period of time. After the voting process takes place, the trajectory segmentation process follows. The goal of this step is to partition each trajectory into sub-trajectories having homogenous representativeness, irrespectively of their shape complexity. However, the goal of sub-trajectory clustering is to partition the entire dataset into groups (clusters) and to detect the outliers among the sub-trajectories identified by the trajectory segmentation step. Therefore, in our proposal, we first select the appropriate sampling set S and then, we tackle the problem of clustering according to the following idea: each sub-trajectory in the sampling set is considered to be a cluster representative. So, the sampling set should contain highly voted trajectories of the MOD which, at the same time, would cover the 3D space occupied by the entire dataset as much as possible. Then, the clustering is done building the clusters around those representatives. For more details about S2T-Clustering, please refer to [1].

QuT-Clustering

In summary, QuT-Clustering [2] relies upon a hierarchical structure, called ReTraTree (for Representative Trajectory Tree) that effectively indexes a MOD for sub-trajectory clustering purposes. ReTraTree consists of four levels: the first two levels operate on the temporal dimension, the third level builds clusters upon the spatio-temporal characteristics of the trajectories, and the fourth level is the actual data storage.

Given a MOD indexed according to ReTraTree structure and a temporal period W of interest, QuT-Clustering efficiently retrieves the subset of the MOD, actually the clusters and outliers at sub-trajectory level, that temporally intersect W.

System Architecture

The architecture of our framework is illustrated in the figure below. As expected, the core of the framework is the ReTraTree. The trajectories composing the MOD are partitioned according to the in-memory part of the structure and stored on disk-based partitions. The trajectories assigned to an existing representative trajectory are archived on disk in dedicated R-tree indexed partitions (called pg3D-Rtree-k). On the other hand, outlier trajectories are organized on disk in a separate partition. When the size of a partition exceeds a pre-defined threshold, S2T-Clustering takes action: it applies Voting among trajectories, with the voting results indicating the Segmentation of trajectories into sub-trajectories that should take place. The resulting sub-trajectories along with their voting descriptors feed the Sampling module that selects new representatives, which are then back-propagated to the in-memory part of ReTraTree. The new representative trajectories as well as the raw sub-trajectories form the input of the GreedyClustering module: if a sub-trajectory is clustered around a new representative, it is archived in its appropriate partition on disk; otherwise, it is considered outlier and is re-inserted to ReTraTree, as it may now be accommodated in the index. Note that our implementation is completely independent from PostGIS. This implies that the underlying R-tree index, coined pg3D-Rtree has also been implemented from scratch on top of GiST.

Having this functionality in hand, the data analyst is able to perform interactive clustering analysis, by providing different values of W as input, through either the SQL interface of Hermes@PostgreSQL [3] or the incorporated V-Analytics tool [4].

Video Showcase - Scenario 1

Video Showcase - Scenario 2

References

[1] N. Pelekis, P. Tampakis, M. Vodas, C. Panagiotakis, Y. Theodoridis. 2017. In-DBMS Sampling-based Sub-trajectory Clustering, In Proceedings of EDBT.
[2] N. Pelekis, P. Tampakis, M. Vodas, C. Doulkeridis Y. Theodoridis. 2017. On Temporal-Constrained Sub-Trajectory Cluster Analysis, Data Mining and Knowledge Discovery, 31(5):1294-1330.
[3] N. Pelekis, and Y. Theodoridis. 2014. Mobility Data Management and Exploration. Springer.
[4] G. Andrienko, N. Andrienko, P. Bak, D. Keim, and S. Wrobel. Visual Analytics of Movement. Springer, 2013.