How are corpus-informed materials different? – Daniela A. Meyer


You may have already noticed that nowadays several ELT materials are corpus-informed.  Do you know what that means, and what exactly the difference is? First of all, what is a corpus? As Michael McCarthy points out, “a corpus is a collection of texts, written or spoken, usually stored in a computer database.  A corpus may be quite small, for example, containing only 50,000 words of text, or very large, containing millions of words.  The Cambridge International Corpus, collected by Cambridge University Press, has 700 million words of texts, and consists of a wide variety of both written and spoken material: from newspapers, magazines and books, to phone calls, meetings, media broadcasts, and casual conversations.”
Both written and spoken corpora show us how language is used in real life and in many different contexts.  Once a corpus is stored in a database, we can analyze it and search for information about the language.  We can, for example, get answers to questions like:
 
  • What are the most frequent words and phrases in English?
  • What are the differences between spoken and written English?
  • Which tenses do people use most frequently?
  • How do people use words like can, may, and might?
  • How often do people use idiomatic expressions, and why?
 
With corpora and software tools to analyze them, we no longer have to rely on intuition alone to know what we say or write; we can see how language is really used. Therefore, materials developed with corpora can be more authentic and can illustrate language as it is really used. Let me give you a few examples. We know that corpora have been used to design dictionaries for learners. But how can a corpus help a course book author? Authors analyze the Corpus to look for the most frequent and typical uses of everyday words. For example, how do people most typically use the verb can? As well as having the meaning “ability” (e.g., I can swim underwater), conversations in the Spoken Corpus show that an even more common use of can occurs when people talk about what it is possible to do in different places and situations (e.g., In New York, you can go to the top of the Empire State Building). So, authors of course books can include this meaning and give it priority. This will enable learners to use the language more naturally in their own conversations and classroom speaking activities.
 
A corpus is a very rich resource for writers because it gives them a detailed view of how people speak and write in everyday situations. Writers and editors learn about vocabulary, grammar, formality and informality, the differences between spoken and written language, how we perform basic functions (e.g., requesting, greeting, apologizing), how people open and close conversations, how we change the subject, and so on. A corpus can also provide very useful statistics to help textbook writers present grammar and vocabulary in the best way. In choosing the vocabulary items to include in a course, frequency lists derived from the analysis of the Corpus are very helpful. Authors research these word lists from the Spoken and Written Corpus and make judgments about which words are the most important to include. For example, there is a wide range of words in English for describing colors. Which ten words would you teach first if you were writing a course book? Before the advent of corpus-informed materials, the author would decide based solely on his or her intuition and preferences. But now, by searching the Corpus, authors establish a list of the most common color words in order of their frequency, based on the Spoken Corpus of, say, North American English. The ten most frequent color words, in order of frequency, are white, black, red, blue, brown, green, yellow, gray, pink, and orange. These are prioritized, and the less frequent ones are taught in later lessons.
 
How else can Corpus research help authors and editors? The most basic tool for analyzing the texts in a corpus is the frequency list. A frequency list tells us which words and phrases are used most often. The most frequent word of all in spoken North American English is I. You is also among the top 20 words, because conversation is very interactive. In the Written Corpus, however, I and you appear less frequently. Most of the 50 most frequent words are grammar words (pronouns, prepositions, articles, conjunctions, auxiliary verbs, etc.), but not all of them. “Non-words” such as uh, um, and oh are also high-frequency items. They are important as ways of showing that one is listening and reacting, as silence is not normal in ordinary conversation, even when we’re listening. From these frequency lists, we can learn a lot about how people communicate, and this information can be used to design appropriate materials and activities for the conversation class.
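
For readers curious about what building a frequency list involves, here is a minimal sketch in Python. It is only an illustration: the function name frequency_list and the short sample text are invented for this example, and a real corpus tool would work on millions of words with far more careful tokenization.

```python
import re
from collections import Counter

def frequency_list(text, top_n=5):
    """Return the top_n most common word tokens with their counts."""
    # Lowercase the text and treat runs of letters/apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

# A tiny invented sample standing in for a spoken corpus (an assumption,
# not data from any real corpus).
sample = "Oh I know. I think you can go, um, I mean you can try. Oh, I see."

print(frequency_list(sample, top_n=3))
# [('i', 4), ('oh', 2), ('you', 2)]
```

Even in this toy sample, I, you, and oh rise to the top, which mirrors the kind of pattern described for spoken language above.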
 
Another way of learning from the Corpus is the study of collocation, that is, the likelihood that two words will occur together. So, for example, the word blond is likely to be used with hair, curls, woman, etc., but not with car or jacket. Beige, on the other hand, occurs with carpet, jacket, etc., but not with hair. We say, therefore, that blond collocates with hair, but beige does not. Knowledge of collocations is vital for effective language use, and a sentence that is grammatically correct will still look or sound awkward if collocational preferences are not respected. We say blond hair but not blond car; lean meat but not slim meat; perform a play but not perform a meeting. Developing learners’ awareness of and familiarity with patterns of collocation from the beginning of their learning will enhance their speaking abilities.
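
The idea of collocation can also be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name right_collocates and the sample sentences are invented here, and the code only counts the word immediately after a chosen word. Real corpus software typically looks at a wider window of words and uses statistical measures rather than raw counts.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the words that immediately follow `node` in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with the one after it, then keep pairs starting with node.
    pairs = zip(tokens, tokens[1:])
    return Counter(nxt for word, nxt in pairs if word == node)

# Invented sentences for illustration only (not taken from a real corpus).
sample = ("She has blond hair. He had blond curls and blond hair. "
          "They bought a beige carpet and a beige jacket.")

print(right_collocates(sample, "blond").most_common())
# [('hair', 2), ('curls', 1)]
print(right_collocates(sample, "beige")["hair"])
# 0
```

In this sample, hair follows blond but never follows beige, which is the kind of evidence a writer would look for, on a much larger scale, before deciding which combinations to teach.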
 
The software tools used to analyze corpora, together with frequency lists and collocation statistics, enable textbook writers to get at the facts about language use in a way that would be very difficult using intuition alone or studying only a small number of texts. When authors spend long hours interpreting and mediating their corpus research, they have, as McCarthy mentions, “three broad goals in mind:
 

  • To identify authentic, motivating language
  • To weave these findings into a carefully crafted syllabus
  • To create course books that are familiar in structure and easy to use”

 
Teachers and learners should expect that, in most ways, corpus-informed materials will look like traditionally prepared materials. The presentation of new language and the activity types will be familiar. Certainly, teachers do not need any additional knowledge to use them. Beneath the surface, however, corpus-informed materials are genuinely special: they are based on actual usage; the examples in them are not invented (though they may be edited or adapted); the contexts in which words and grammar structures appear are authentic; and the writers can anticipate common errors by looking at corpora of learners’ work. Last but not least, successful learning is all about motivation, and corpus-informed materials motivate both teachers and learners because they can be sure that the language they are practicing is modern, used in everyday situations, targeted to situations they are likely to find themselves in, and consistent with what they will hear and see in real conversations, movies, radio and TV shows, Internet texts, and magazines. It is not artificial or invented language, but consists of the most widely used words, phrases, and grammar. Therefore, learners are sure to produce more effective communication!
 
References:
McCarthy, M. (2004) Touchstone: From Corpus to Course Book. Cambridge: Cambridge University Press.
O’Keeffe, A., McCarthy, M. and Carter, R. From Corpus to Classroom. Cambridge: Cambridge University Press.
Richards, J. C. (2008) Moving Beyond the Plateau: From Intermediate to Advanced Levels in Language Learning. Cambridge: Cambridge University Press.
 
 

This post is offered by: Cambridge University Press


 
 
