Eiad Rostom:
Unsupervised Clustering and Multi-Label Classification of Ticket Data
Abstract
Issue tracking systems have become a key tool for companies to manage and maintain reported customer issues. A ticket in an issue tracking system describes a particular problem together with its state, creation date, reporter, assignee, summary and other relevant data, and this information is mostly assigned to the ticket manually. commercetools GmbH has used an issue tracking system to support its customers for the last couple of years, and the resulting data has so far remained unused and unexplored. This thesis consists of two parts. The goal of the first part is to explore the data obtained from the issue tracking system. For this purpose, I used an unsupervised learning approach to cluster the textual data of the tickets, and then visualized and analyzed the data using different methods to observe how the clusters developed over the last two years. The goal of the second part is to find out whether part of the support process at commercetools can be automated by predicting the part of the product causing the reported issue and the team responsible for it. Because these two attributes are assigned to a ticket as labels, this is a multi-label classification problem. To predict these labels, I trained four classifiers (a k-NN classifier, a decision tree classifier, a logistic regression classifier and a neural network) using different multi-label classification approaches, and evaluated their performance using micro-averaged recall, precision and F1 score. The results showed that the different classifiers performed similarly for most of the approaches, with logistic regression and the neural network performing slightly better.
The best performance was achieved by the neural network using the multi-label classification approach called "label powerset", with an F1 score of 54%. This is a good result, but not high enough to fully automate this part of the support process.
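The label powerset approach named above turns the multi-label task into an ordinary multi-class task: every distinct combination of labels seen in training becomes one class, a single-label classifier is trained on those classes, and a predicted class is mapped back to its label set. The following is a minimal Python sketch of that transformation together with micro-averaged precision, recall and F1, which pool true positives, false positives and false negatives over all labels before computing the scores. It is not the thesis implementation: the example tickets and label names (such as `component:payments` and `team:checkout`) are invented, and the 1-nearest-neighbour classifier over word overlap is a hypothetical stand-in for the classifiers actually trained.

```python
# Minimal sketch: label powerset transformation + micro-averaged metrics.
# Tickets, labels and the 1-NN classifier below are hypothetical stand-ins.


def to_powerset(label_sets):
    """Map each distinct label combination to one class id (label powerset)."""
    classes = {}
    class_ids = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        class_ids.append(classes[key])
    return class_ids, classes


def from_powerset(class_id, classes):
    """Map a predicted class id back to its set of labels."""
    inverse = {cid: key for key, cid in classes.items()}
    return set(inverse[class_id])


def tokenize(text):
    return set(text.lower().split())


def predict_1nn(train_texts, train_ids, text):
    """Hypothetical multi-class classifier: 1-NN by Jaccard similarity of word sets."""
    words = tokenize(text)

    def jaccard(other):
        other_words = tokenize(other)
        union = words | other_words
        return len(words & other_words) / len(union) if union else 0.0

    best = max(range(len(train_texts)), key=lambda i: jaccard(train_texts[i]))
    return train_ids[best]


def micro_scores(gold_sets, pred_sets):
    """Pool TP/FP/FN over all tickets and labels, then compute P, R and F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example data: each ticket has one component label and one team label.
train_texts = [
    "checkout fails with payment error",
    "payment provider timeout on checkout",
    "product import job crashes",
    "search returns wrong products",
]
train_labels = [
    {"component:payments", "team:checkout"},
    {"component:payments", "team:checkout"},
    {"component:import", "team:integrations"},
    {"component:search", "team:discovery"},
]
test_texts = ["payment error during checkout", "import job crashes on csv"]
test_labels = [
    {"component:payments", "team:checkout"},
    {"component:import", "team:platform"},
]

train_ids, classes = to_powerset(train_labels)
predictions = [
    from_powerset(predict_1nn(train_texts, train_ids, t), classes) for t in test_texts
]
precision, recall, f1 = micro_scores(test_labels, predictions)
print(f"micro P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # micro P=0.75 R=0.75 F1=0.75
```

The second test ticket illustrates a known limitation of label powerset: it can only predict label combinations that occur in the training data, so the unseen pair `component:import` / `team:platform` cannot be produced and the prediction is only partially correct.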