UniCat-Search

Union Catalogue of Belgian Libraries

Book

Building and exploring Web Corpora : proceedings of the 3rd Web as Corpus workshop, incorporating Cleaneval
Authors: Cleaneval --- Web as Corpus workshop
ISBN: 9782874630828, 2874630829
Year: 2007
Volume: 4
Publisher: Louvain-la-Neuve : Presses universitaires de Louvain-la-Neuve [presses de l'U.C.L.]

Abstract
More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and the use of crawled Web data for NLP purposes. Any use of Web data requires that it be cleaned to get rid of unwanted material such as HTML markup, navigation bars, and advertisements. To date there has been no sharing of resources or expertise in this particular domain, and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.

Keywords
Conferences - Meetings --- Linguistics --- Computational linguistics --- Computer science --- Linguistic corpus
