Summer School 2016 Workshop 1

Corpus-based approaches to the study of language (Martin Hilpert)

Martin Hilpert is an Assistant Professor of English Linguistics at the University of Neuchâtel. He holds a PhD from Rice University and did postdoctoral research at the International Computer Science Institute in Berkeley and at the Freiburg Institute for Advanced Studies. He is interested in cognitive linguistics, language change, construction grammar, and corpus linguistics.

This workshop is a gentle introduction to corpus linguistics that assumes no prior experience with it. Why should you learn how to use corpora in your research? The short answer is that corpus-linguistic tools are incredibly empowering: they allow you to ask (and answer) a broad range of research questions that you could not tackle otherwise. At the same time, a lot of current linguistic research is corpus-based, so you need to know about corpus methods to better understand what other linguists are doing.

The three days of the workshop will be structured as follows. Each day will consist of a two-hour presentation and two hours of hands-on exercises. Day 1 will be dedicated to a basic corpus-linguistic tool: concordancing software. We will examine what kinds of questions can be answered through the simple retrieval of key words in context. You will also get to know a few tricks for effective searching, including so-called regular expressions.
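To give a rough idea of what "key words in context" means, here is a minimal sketch in Python. It is only an illustration: the workshop itself uses dedicated concordancing software and R, and the function name `kwic`, the sample sentence, and the search pattern are all made up for this example. The regular expression matches the word "walk" with an optional ending, and each hit is returned together with a few characters of text on either side.

```python
import re

def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for every regex match."""
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Hypothetical sample text; a real corpus would be read from files.
sample = "She walked home. He is walking now, and they walk daily. Nobody talks."

# \b marks a word boundary; (s|ed|ing)? allows an optional inflectional ending.
hits = kwic(sample, r"\bwalk(s|ed|ing)?\b", width=15)

for left, key, right in hits:
    # Right-align the left context so the key words line up in one column.
    print(f"{left:>15} [{key}] {right}")
```

Lining up the matches in a single column is what makes a concordance easy to scan: recurring patterns to the left and right of the key word become visible at a glance.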

Days 2 and 3 will examine 'what concordance programs can't do'. For a considerable period of time, concordance programs have defined the scope of corpus-linguistic work. Whatever the software could do, researchers could do; whatever it couldn't, researchers couldn't. Recent years have seen a very liberating development: instead of using ready-made programs, more and more corpus linguists turn to tools that they can flexibly adapt. One such tool that I will present is R, a software environment that not only allows you to perform corpus-linguistic tasks, but that you can also use for visualization and for the statistical analysis of linguistic data. On days 2 and 3, we will use R to answer a few basic linguistic questions that would be quite hard to investigate with a regular concordance program.

Participants should bring their own laptop computers. There is no need to pre-install any software; we will install everything together during the workshop.
