Web Corpus Construction (Synthesis Lectures on Human Language Technologies)
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. This approach has several advantages: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, who can linguistically post-process and query it with the tools of their choice. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups, including boilerplate removal and removal of duplicated content. Linguistic processing, and the problems that the different kinds of noise in web corpora cause for it, are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).
For additional material please visit the companion website: sites.morganclaypool.com/wcc
Table of Contents: Preface / Acknowledgments / Web Corpora / Data Collection / Post-Processing / Linguistic Processing / Corpus Evaluation and Comparison / Bibliography / Authors' Biographies
|Publisher:||Morgan & Claypool Publishers|
|Illustrations:||black & white|
|Item Weight:||0.58 pounds|
|Item Size:||0.33 x 9.25 x 9.25 inches|
|Package Weight:||0.75 pounds|
|Package Size:||7.5 x 0.33 x 0.33 inches|