[av_heading heading='Mining Massive Datasets' tag='h2' color='' style='' padding='10']
[av_textblock ]
In the data-generating world we live in, when something more advanced than simple querying or multi-dimensional analysis is required, we turn to data mining, which uncovers the knowledge hidden in the database (hence, Knowledge Discovery in Databases, KDD). In general, data mining tasks can be classified as:
• Clustering: determining a finite set of implicit classes that describe the data.
• Classification: finding rules to assign data items to pre-existing classes.
• Dependency analysis: finding rules to predict the value of an attribute on the basis of the values of other attributes.
• Deviation and outlier analysis: searching for data items that exhibit unexpected deviations or differences from some norm.
• Trend detection: fitting lines and curves to data to summarize the database.
• Generalization and characterization: obtaining a compact description of the database, for example, as a relatively small set of logical statements that condense the information in the database.
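As a purely illustrative sketch of two of the task types above (trend detection and deviation analysis), the following Python snippet fits a least-squares line to a small series and flags values that lie far from the mean. The function names, the toy data, and the threshold of two standard deviations are assumptions made for this example; they are not taken from the publications listed on this page, and real massive-dataset mining would need scalable, distributed variants.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Trend detection: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_outliers(values, threshold=2.0):
    """Deviation analysis: indices of values lying more than `threshold`
    population standard deviations away from the mean (assumed rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical toy data: speed readings over six hours, and a noisy sensor series.
hours = [0, 1, 2, 3, 4, 5]
speeds = [50, 52, 54, 56, 58, 60]
slope, intercept = fit_line(hours, speeds)

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = find_outliers(readings)

print(f"trend: speed = {slope:.1f} * hour + {intercept:.1f}")
print(f"outlier indices: {outliers}")
```

Both functions make a single pass over in-memory lists, which keeps the idea visible but sidesteps the volume, velocity, and distribution issues that make the massive-data setting hard.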
Evidently, all four ‘V’ challenges (Volume, Velocity, Variety, lack of Veracity), as well as the ‘D’ challenge (Distribution of data sources) of the Big Data world, make mining massive datasets the ultimate challenge for data scientists.
As a representative example of our research focus, all of the above tasks and challenges also apply to mobility data mining, which has recently produced many success stories in discovering interesting behavioral patterns of moving objects that can be exploited in several fields. Example domains include traffic engineering, climatology, social anthropology, and zoology, where mining techniques are applied to vehicle position data, hurricane track data, human movement data, and animal movement data, respectively.
[/av_textblock]
[av_iconlist position='left']
[av_iconlist_item title='Selected Publications' link='' linktarget='' icon='ue84d' font='entypo-fontello']
C. Panagiotakis, N. Pelekis, I. Kopanakis, E. Ramasso, Y. Theodoridis: “Segmentation and Sampling of Moving Object Trajectories based on Representativeness”, IEEE Transactions on Knowledge and Data Engineering, 24(7):1328-1343, July 2012. IEEE CS Press.
N. Pelekis, I. Kopanakis, E.E. Kotsifakos, E. Frentzos, Y. Theodoridis: “Clustering Uncertain Trajectories”, Knowledge and Information Systems (KAIS), 28(1):117-147, 2011. Springer.
H. Karanikas, G. Koundourakis, I. Kopanakis, T. Mavroudakis, N. Pelekis: “Discovering market trends in the biotechnology industry”, Int. J. Business Intelligence and Data Mining (IJBIDM) 2011, Vol. 6, No. 2, 201.
N. Pelekis, I. Kopanakis, E. Kotsifakos, E. Frentzos, Y. Theodoridis: “Clustering Trajectories of Moving Objects in an Uncertain World”, Proceedings of the 9th IEEE Int’l Conference on Data Mining, ICDM’09, Miami – FL, USA, December 2009. IEEE CS Press. Best application paper award.
[/av_iconlist_item]
[/av_iconlist]