The website provides secure online access to two oral corpora of contemporary English: PAC and LVTI.
THE PAC CORPUS
The PAC corpus was launched in 2003. As of 2016, it comprises 37 surveys of native varieties across 9 English-speaking countries, representing interviews with 314 informants and 264 hours of validated recordings (average length: 50 min). Further surveys are expected in the coming months for Bath (UK), Laval and Quebec (Canada) and Michigan (USA).
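As a quick consistency check on these figures, the average recording length can be recomputed from the totals. This is only a sketch: it assumes one validated recording per informant, which the text does not state explicitly, and the function name is illustrative.

```python
# Figures quoted in the text for the state of the PAC corpus in 2016.
HOURS_RECORDED = 264
INFORMANTS = 314


def average_minutes(hours: float, informants: int) -> float:
    """Mean recording length in minutes, assuming one recording per informant."""
    return hours * 60 / informants


avg = average_minutes(HOURS_RECORDED, INFORMANTS)
print(f"{avg:.1f} min per informant")
```

Under that assumption the result rounds to the 50 minutes quoted above.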
The methodology is inspired by the classical work of Labov (1966, 1972, 1994, 2001, 2010) in that, for each selection of speakers, it involves the reading of a word list and a passage as well as formal and informal conversation. In each area surveyed, however, the speakers (usually groups of 10 to 20 informants) are selected on a network principle well known in the United Kingdom, particularly from the work of the Milroys and their associates (see Milroy 1980, 1987). The corpus relies on geographical criteria so as to cover both hemispheres of the English-speaking world. In this way, a large variety of Englishes with different historical, political and socio-cultural foundations and functions may be described.
The selection of informants requires sound knowledge of the linguistic community under study. The choice of a network of informants (ideally between 10 and 20 informants in a PAC survey) must be consistent with three major criteria:
- only speakers with native status, or with a sufficient number of years at school within the same linguistic community, are retained
- a balance in the number of male and female speakers is preferred
- whenever possible, a survey should cover three age ranges (e.g. 70+, 40+, 20+)
A biographical sheet is drawn up for each informant in order to control for basic socio-economic parameters (family background, education, professional status, ethnicity, other languages spoken within the community, etc.). A consent form and full anonymisation ensure that the data in our database are handled appropriately.
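The selection criteria and the biographical sheet can be pictured as a small record plus a set of checks. The sketch below is purely illustrative and is not part of the PAC tooling: the `Informant` fields, the `check_survey` function, the 20% imbalance tolerance and the reading of "20+, 40+, 70+" as the bands 20-39, 40-69 and 70 or over are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Informant:
    code: str                  # anonymised identifier, never a real name
    sex: str                   # "F" or "M"
    age: int
    native_or_schooled: bool   # native status or enough local schooling


def age_band(age: int) -> Optional[str]:
    """Map an age to one of the three example ranges (assumed boundaries)."""
    if age >= 70:
        return "70+"
    if age >= 40:
        return "40+"
    if age >= 20:
        return "20+"
    return None


def check_survey(informants: List[Informant], max_imbalance: float = 0.2) -> List[str]:
    """Return warnings for a candidate network; an empty list means no issue found."""
    warnings = []
    retained = [i for i in informants if i.native_or_schooled]
    rejected = len(informants) - len(retained)
    if rejected:
        warnings.append(f"{rejected} informant(s) fail the native-status criterion")
    if not 10 <= len(retained) <= 20:
        warnings.append("network size outside the ideal range of 10 to 20")
    women = sum(1 for i in retained if i.sex == "F")
    men = sum(1 for i in retained if i.sex == "M")
    if retained and abs(women - men) / len(retained) > max_imbalance:
        warnings.append("male/female numbers are unbalanced")
    present = {age_band(i.age) for i in retained}
    missing = [b for b in ("20+", "40+", "70+") if b not in present]
    if missing:
        warnings.append("missing age range(s): " + ", ".join(missing))
    return warnings


# Hypothetical survey of 12 informants, alternating M/F, spread over three age ranges.
ages = [22, 25, 31, 44, 47, 52, 58, 63, 71, 74, 78, 80]
survey = [Informant(f"S{n:02d}", "F" if n % 2 else "M", age, True)
          for n, age in enumerate(ages)]
print(check_survey(survey))
```

Because the criteria are stated as preferences ("ideally", "whenever possible"), the sketch reports warnings rather than rejecting a survey outright.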
THE LVTI CORPUS
The LVTI corpus was launched in 2011 with two parallel corpora of spoken British English and French: Greater Manchester, England / Greater Toulouse, France.
The first three-year period (2011-14) yielded data from 61 speakers in Toulouse and 67 speakers in Greater Manchester.
LVTI – GREATER MANCHESTER: A MAP OF THE SURVEY (2011-2014)
67 speakers (36 ♀, 31 ♂)
LVTI – GREATER TOULOUSE: A MAP OF THE SURVEY (2011-2014)
61 speakers (33 ♀, 28 ♂)
Both the PFC and PAC protocols are applied to cohorts of speakers at each location, each time with specific sociolinguistic, psycholinguistic, sociological and pedagogical concerns, and with specific extensions to both corpora (primary and secondary schools…).
As far as possible, the corpora include ecological audiovisual recordings (in the sociolinguistic sense), although care is needed regarding the comparability of the data.
The common protocol relies heavily on the PAC / PFC formats, with reading tasks (word lists and a text), a semi-guided conversation between the informant and the interviewer (with the usual PAC / PFC information sheet) and a free conversation between the informant and a relative, friend or colleague.
A supplementary thematic conversation (in a semi-conversational style, with a questionnaire) is added to address the themes of language, work, urban life and identity and to support some of the linguistic issues at stake. In the questionnaire, three sets of questions relating to urban life, work and language are put to each informant.
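The components of the common protocol described above can be summarised as a simple checklist. This is a hypothetical sketch, not the project's own data model: the `Task` type, the task keys and the `missing_tasks` helper are invented for illustration, and the order follows the description in the text rather than any documented recording order.

```python
from typing import Iterable, List, NamedTuple


class Task(NamedTuple):
    key: str          # short label, chosen for this example only
    description: str
    speakers: str


LVTI_TASKS = (
    Task("wordlists", "reading of word lists", "informant"),
    Task("text", "reading of a text", "informant"),
    Task("semi_guided", "semi-guided conversation", "informant and interviewer"),
    Task("free", "free conversation", "informant and a relative, friend or colleague"),
    Task("thematic", "thematic conversation with questionnaire "
         "(urban life, work, language)", "not specified in the text"),
)


def missing_tasks(recorded: Iterable[str]) -> List[str]:
    """List the protocol tasks that are absent from one informant's session."""
    done = set(recorded)
    return [task.key for task in LVTI_TASKS if task.key not in done]


# Hypothetical session in which only the first three tasks were recorded.
print(missing_tasks(["wordlists", "text", "semi_guided"]))
```

A checklist of this kind is one way to keep sessions comparable across locations, since every informant is checked against the same set of tasks.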
In the LVTI programme, the speech of children and teenagers must also be taken into account, which requires a few adjustments to the protocol:
- a school version of the protocol for Manchester Grammar School (recording pupils from year 7 onwards) and an equivalent high-achieving school for girls, together with two lower-achieving schools for boys and girls in Greater Manchester
- another version of the protocol at a level suitable for primary schools (children aged 5-11), with a shorter version of the text and map tasks designed with material available from the Human Communication Research Centre Map Task Corpus, Universities of Edinburgh & Glasgow [http://www.hcrc.ed.ac.uk/]
One of our goals is to create a large interdisciplinary database that will be shared across disciplines.