Lan-fen Huang

Shih Chien University, Taiwan



This paper reports the compilation of a corpus of Taiwanese students’ spoken English, which is one of the sub-corpora of the Louvain International Database of Spoken English Interlanguage (LINDSEI) (Gilquin, De Cock, & Granger, 2010). LINDSEI is one of the largest corpora of learner speech. The compilation process follows the design criteria of LINDSEI so as to ensure comparability across the sub-corpora. The participants, the procedures for data collection and the process of transcription are all documented. Fifty third- or fourth-year English majors in Taiwan were interviewed in English, and the interviews were recorded. Each interview was accompanied by a profile containing information about learner variables (age, gender, mother tongue, country, English learning context, knowledge of other foreign languages, and amount of time spent in English-speaking countries) and interviewer variables (gender, mother tongue, knowledge of foreign languages and degree of familiarity with the interviewees). Data on one further variable, the learners’ English proficiency level based on the results of international standardised tests, were also collected; this information is not available in other sub-corpora of LINDSEI. The participants’ proficiency was similarly distributed across B1 to C1 levels in the Common European Framework of Reference. The structure of the Taiwanese sub-corpus is discussed in comparison with eleven other published sub-corpora. The preliminary investigation, using corpus-linguistic approaches, reveals overall statistical information about the Taiwanese component and Version 1 of LINDSEI. The lexical analyses of the top 50 words and chunks show the characteristics of spoken English in the Taiwanese sub-corpus. The contributions and research potential of this newly developed learner corpus are discussed, followed by an example of Contrastive Interlanguage Analysis of the most common chunk, I think, in the Taiwanese learners’ speech. The release of this learner corpus is merely the first step. It is hoped that more corpus research will be done on Taiwanese learners, that corpora of other speech genres will be compiled and that research results will contribute to relevant areas in Applied Linguistics.


Key Words: LINDSEI, interlanguage, learner corpus, Taiwanese learners of English, I think








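The Louvain International Database of Spoken English Interlanguage (LINDSEI) (Gilquin et al., 2010) is one of the largest corpora of English learners’ speech, and twenty international research teams currently take part in it. To ensure comparability across the sub-corpora, the corpus of Taiwanese learners’ spoken English was built according to the LINDSEI design criteria. This paper details the compilation process: recruiting participants, conducting interviews and transcribing the audio recordings. Differing slightly from the other sub-corpora, the Taiwanese sub-corpus records the participants’ English proficiency test results; measured against the Common European Framework of Reference (CEFR), their levels mostly fall between B1 and C1. This study uses the Taiwanese sub-corpus and the eleven sub-corpora of LINDSEI Version 1 to carry out quantitative corpus analysis, word analysis and chunk analysis. Taking I think, the most frequent chunk in the Taiwanese sub-corpus, as an example, it then offers a preliminary quantitative comparison of interlanguages and discusses the research potential of the corpus. Through international collaboration, the corpus of Taiwanese learners’ spoken English will be made available to researchers in Taiwan and abroad and will serve as a reference for future corpus compilation.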


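Key Words: Louvain International Database of Spoken English Interlanguage (LINDSEI), interlanguage, learner corpus, Taiwanese learners of English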