TY  - THES
ID  - 137184006
TI  - Corpus of Indian English Fiction: A Prototype
AU  - Tatariya, Kushal
AU  - Speelman, Dirk
AU  - KU Leuven. Faculteit Wetenschappen. Opleiding Master of Digital Humanities (Leuven)
PY  - 2022
CY  - Leuven
PB  - KU Leuven. Faculteit Wetenschappen
DB  - UniCat
UR  - https://www.unicat.be/uniCat?func=search&query=sysid:137184006
AB  - This thesis project presents a working prototype of a corpus of Indian English fictional texts. It begins with a discussion of the qualitative background to the need for such a corpus, laying out the debates and arguments made in previous studies. It argues that the study of Indian English literature needs quantitative and corpus-stylistic research, and aims to fill that gap by building a prototype of a corpus that can be used to that end. The report then discusses the state of the art in corpus design and annotation principles, available corpora similar to the one proposed, and available corpus architectures. Two main approaches to corpus architecture are discussed: the IMS Corpus Workbench and the relational database approach. The latter is the one used for building the prototype. An overview of the corpus prototype follows, along with a description of the web interface, the possible queries, and the rationale behind the decisions made in creating it. The pros and cons of the approach taken are discussed. The main advantage is that the relational database approach allows for a simplified and scalable model for building a corpus. A drawback is that, despite claims that it allows for faster queries, the database tables can be quite large, resulting in slower execution times. The strengths and shortcomings of the prototype are also laid out. The prototype is an extremely valuable starting point for scaling up into a fully fledged working corpus with a web interface. At present, however, it lacks markup that separates the text from the non-textual elements, which creates discrepancies in the frequency and n-gram tables. In conclusion, the future prospects of the prototype are discussed, including the possibility of adding markup and additional annotations, along with improvements to the user interface.
ER  -