Dominik Schlomo Moog:
Spam detection in crowdsourced ideation
Requirements
- basic knowledge in machine learning
Contents
Context
Crowd ideation is considered a promising way to collect creative ideas because it involves participants from different backgrounds and generates a large number of ideas. However, the main challenge is finding the useful and innovative ideas among them. Moreover, allowing the crowd to generate ideas freely gives some participants the opportunity to submit dummy text. To tackle this problem, our model defines a number of quality gates that improve the ideation output. One of these gates is spam and duplicate detection.
Problem
During idea generation, some MTurk workers submit text copied and pasted from Wikipedia, enter a single word, or combine random words.
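Two of these behaviours can be screened with simple heuristics before any learning is involved. The following is a minimal sketch, not a proposed solution: it flags submissions that are too short and submissions that overlap heavily with a known text, using the Jaccard similarity of word 3-grams. The function names, the `min_words` and `dup_threshold` values, and the idea that reference texts (e.g. Wikipedia passages or earlier submissions) are available as plain strings are all assumptions made for illustration. Random word combinations are not caught by this sketch.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=3):
    """Set of overlapping word n-grams (shingles) in the text."""
    toks = tokens(text)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_idea(idea, known_texts, min_words=3, dup_threshold=0.5):
    """Return the reasons an idea looks like dummy text (empty list if none).

    min_words and dup_threshold are illustrative values, not tuned ones.
    """
    reasons = []
    if len(tokens(idea)) < min_words:
        reasons.append("too_short")
    idea_shingles = shingles(idea)
    if idea_shingles and any(
        jaccard(idea_shingles, shingles(t)) >= dup_threshold for t in known_texts
    ):
        reasons.append("near_duplicate")
    return reasons
```

In practice the thresholds would have to be calibrated on labelled submissions, and comparing against a large corpus would need an index (e.g. MinHash) rather than a linear scan.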
Objectives
Carry out a study of the algorithms used in the literature to detect spam and nonsense text
Possible procedure
Look for algorithms used to detect spam (e.g. in email)
Adapt an existing algorithm or propose a new one to detect such dummy text
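As a starting point for the first step, email spam filtering is often approached with a Naive Bayes classifier, one of the two methods compared in the first reference below. The following is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written from scratch so that it needs no external library. The class name, the label strings, and the smoothing value `alpha` are assumptions for illustration; whether this approach transfers to short crowd-generated ideas is exactly what the thesis would have to examine.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # Laplace smoothing constant (assumed value)
        self.word_counts = {}       # label -> Counter of token frequencies
        self.doc_counts = Counter() # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokens(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Unnormalised log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for tok in tokens(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

Training requires labelled examples of both dummy text and genuine ideas, for instance `NaiveBayes().fit(texts, labels).predict(new_idea)`, where `texts` and `labels` are parallel lists collected from earlier ideation sessions.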
References
Androutsopoulos, Ion, Georgios Paliouras, Vangelis Karkaletsis, Constantine D Spyropoulos and Panagiotis Stamatopoulos: Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach. March 2013.
Blanzieri, Enrico and Anton Bryl: A Survey of Learning-Based Techniques of Email Spam Filtering. Artificial Intelligence Review, 29(1):63–92, March 2008, ISSN 0269-2821, 1573-7462. http://link.springer.com/10.1007/s10462-009-9109-6 (accessed 2020-01-15).