Mining Massive Datasets

[av_heading heading='Mining Massive Datasets' tag='h2' color='' style='' padding='10']

[av_textblock ]

In the data-generating world we live in, when something more advanced than simple querying or multidimensional analysis is required, we turn to data mining, which uncovers the knowledge hidden in the database (hence the term Knowledge Discovery in Databases, or KDD). In general, data mining tasks can be classified as follows:

• Clustering: determining a finite set of implicit classes that describe the data.
• Classification: finding rules to assign data items to pre-existing classes.
• Dependency analysis: finding rules to predict the value of an attribute on the basis of the values of other attributes.
• Deviation and outlier analysis: searching for data items that exhibit unexpected deviations or differences from some norm.
• Trend detection: fitting lines and curves to data to summarize the database.
• Generalization and characterization: obtaining a compact description of the database, for example, as a relatively small set of logical statements that condense the information in the database.
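To make two of the task types above concrete, here is a minimal Python sketch that combines trend detection (an ordinary least-squares line fit) with deviation analysis (flagging points whose residual from that line exceeds a chosen number of standard deviations). It is an illustration only, not a method from the publications listed on this page; the function names `fit_line` and `deviations`, the 2.0 threshold and the synthetic data are all hypothetical choices.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Trend detection: ordinary least-squares fit of y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def deviations(xs, ys, a, b, threshold=2.0):
    """Deviation analysis: indices whose residual from the fitted trend
    exceeds `threshold` population standard deviations (assumed rule)."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # perfect fit: nothing deviates
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sigma]


# Synthetic data: a linear trend y = 2x + 1 with one injected anomaly.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[7] += 15

a, b = fit_line(xs, ys)
print(deviations(xs, ys, a, b))  # [7]
```

A single global threshold on a straight-line fit is the simplest possible choice; data with curved trends or changing variance would call for a curve fit or a more robust spread estimate.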

Clearly, all four ‘V’ challenges (Volume, Velocity, Variety and lack of Veracity), together with the ‘D’ challenge (Distribution of data sources) of the big data world, make mining massive datasets the ultimate challenge for data scientists.

Mobility data mining, a representative example of our research focus, is an area where all of the above tasks and challenges apply. It has recently produced many success stories in discovering interesting behavioral patterns of moving objects, patterns that can be exploited in several fields. Example domains include traffic engineering, climatology, social anthropology and zoology, which involve applying the various mining techniques to vehicle position data, hurricane track data, and human and animal movement data, respectively.
[/av_textblock]

[av_iconlist position='left']
[av_iconlist_item title='Selected Publications' link='' linktarget='' icon='ue84d' font='entypo-fontello']

C. Panagiotakis, N. Pelekis, I. Kopanakis, E. Ramasso, Y. Theodoridis: “Segmentation and Sampling of Moving Object Trajectories based on Representativeness”, IEEE Transactions on Knowledge and Data Engineering, 24(7):1328-1343, July 2012. IEEE CS Press.

N. Pelekis, I. Kopanakis, E.E. Kotsifakos, E. Frentzos, Y. Theodoridis: “Clustering Uncertain Trajectories”, Knowledge and Information Systems (KAIS), 28(1):117-147, 2011. Springer.

H. Karanikas, G. Koundourakis, I. Kopanakis, T. Mavroudakis, N. Pelekis: “Discovering market trends in the biotechnology industry”, Int. J. Business Intelligence and Data Mining (IJBIDM), 6(2):201, 2011.

N. Pelekis, I. Kopanakis, E. Kotsifakos, E. Frentzos, Y. Theodoridis: “Clustering Trajectories of Moving Objects in an Uncertain World”, Proceedings of the 9th IEEE Int’l Conference on Data Mining, ICDM’09, Miami, FL, USA, December 2009. IEEE CS Press. Best Application Paper Award.

[/av_iconlist_item]
[/av_iconlist]
