TY  - THES
ID  - 134715223
TI  - Using a structural classification model for the clustering of scientific abstracts
AU  - Keersmaekers, Alek
AU  - Thijs, Bart
AU  - KU Leuven. Faculteit Ingenieurswetenschappen. Opleiding Master of Artificial Intelligence (Leuven)
PY  - 2016
PB  - Leuven KU Leuven. Faculteit Ingenieurswetenschappen
DB  - UniCat
UR  - https://www.unicat.be/uniCat?func=search&query=sysid:134715223
AB  - In this internship we examine the possibility of clustering scientific abstracts on the basis of their content. A previous approach based the clusters on noun phrases (NPs) extracted from the abstracts' sentences, but the resulting clusters were found to be rather inconsistent: some were formed on the basis of the abstracts' object of study, while others arose because the abstracts employed similar techniques. Hence we devise an approach that automatically detects an abstract's structure, so that we can determine whether an NP occurs in the 'introduction', 'methodology' or 'conclusion' section. A Random Forest classifier with a wide range of structural, morpho-syntactic and lexical features performed best for this task. The accuracy of the model was fairly satisfactory (about 87%), although the recall for methodology sentences was quite low. However, the clusters formed on the basis of this classification were rather unsatisfactory. We suggest that the most likely explanation is the lack of any criterion for distinguishing 'irrelevant' extracted NPs from relevant ones, and we propose several solutions to this problem.
ER  - 