VOCABULARY INPUT IN ESL TEXTBOOKS: A CORPUS-BASED ANALYSIS

Although textbooks are a major source of input in language learning, few studies have explored the extent to which vocabulary input in ESL textbooks supports English language learning. The present study compares the vocabulary input of two English textbooks prescribed for Indian university undergraduate students. A corpus is constructed from the words in the textbooks using online software. Comparison of word frequencies against different corpora reveals differences in the rank order of words. The results are discussed from both a quantitative and a qualitative perspective. The analysis reveals that many words in the textbooks occur only occasionally in everyday language use and that the textbooks vary greatly in the number and selection of vocabulary items. The study therefore indicates that the communication needs of learners are afforded greater weight than frequency criteria.


Introduction
Vocabulary frequency information provides a rational basis for ensuring that learners get the best return for their vocabulary learning effort, because the words studied will be met often. Vocabulary frequency lists that take account of range have a significant role in curriculum design and in setting learning goals. The essential thing that can be learned from studying the language contained in a corpus is how frequently any particular word occurs. A frequency list records the number of times each word occurs in a text and can therefore provide interesting information about the words that appear (and do not appear) in it. A word list can be arranged in order of first occurrence, alphabetically, or in frequency order. First-occurrence order serves as a quick guide to the distribution of words in a text, an alphabetical listing is built mainly for indexing purposes, and a frequency-ordered listing highlights the most commonly occurring words in the text. Word frequency counting dates back to Hellenistic times, and word frequency counts in English were begun for the purpose of making word lists. Since the initiation of word counting by Kaeding (mentioned by Engels 1968), there have been several developments and innovations in the study of word frequency. A few important works include Thorndike's Teacher's Word Book in 1921 (Bright and McGregor 1970), Michael West's A General Service List of English Words (1953), and the work of Kučera and Francis (1967). With the development of computerized corpora, word frequency study has been "revolutionized" (Read 2004: 156), and very large corpora have been established, from which more reliable, authentic, and up-to-date word lists based on frequency counts have been prepared. English word lists based on the British National Corpus, the COBUILD Corpus, and the Cambridge International Corpus are worth mentioning.
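The three orderings of a word list described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name word_list, the sample sentence, and the simple pattern used to split the text into words are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_list(text):
    """Return one word list in three orderings, each as (word, count) pairs."""
    # Assumed splitting rule: runs of letters, with internal hyphens or apostrophes.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts = Counter(tokens)
    # A Counter keeps keys in insertion order, i.e. order of first occurrence.
    first_occurrence = list(counts.items())
    alphabetical = sorted(counts.items())
    # Highest count first; ties are broken alphabetically.
    by_frequency = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return first_occurrence, alphabetical, by_frequency


sample = "the cat saw the dog and the dog saw a bird"
first, alpha, freq = word_list(sample)
print(first[:3])  # [('the', 3), ('cat', 1), ('saw', 2)]
print(alpha[:3])  # [('a', 1), ('and', 1), ('bird', 1)]
print(freq[:3])   # [('the', 3), ('dog', 2), ('saw', 2)]
```

Only the frequency-ordered listing brings the most common words to the top, which is why that ordering is the one used for coverage analysis.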

Review of literature
Fully knowing a word means knowing its meaning and form, register, collocation, and association, among other things (Nation, 2013). According to Tyler (2012), to gain this knowledge and to enable the automatization of word knowledge and recognition, a word must be encountered in various contexts. According to Aitchinson (2012), words are not stored in isolation in our brains but are connected in semantic networks. When learners encounter the same word repeatedly and in different contexts, those connections are likely to be internalized and thus remembered (Nation, 2008). As Cameron (2001) emphasizes, language learning materials should be constructed so that learners can meet new words repeatedly and in different contexts. How often a word should recur in a textbook to enable learning, however, is unclear. The figures vary among researchers, with estimates ranging from five or six (Cameron, 2001; Nation, 1990) to twenty (Waring & Takaki, 2003) occurrences. Repeated exposure has been shown to be three to four times more important for beginners than for advanced learners (Zahar, Cobb & Spada, 2001). Studies suggest that knowledge of high-frequency words is closely related to text coverage and successful language learning (Nation, 2006; Nation & Beglar, 2007).
As regards the relevance of word frequency in language learning, it seems natural that a learner can pick up frequently occurring items more easily than rarely occurring ones in the course of learning. This implies the need to provide for the repeated occurrence of words in language learners' course materials. Moreover, particularly in the initial stages of language learning, it seems reasonable to emphasize those vocabulary items that occur very frequently in the language concerned. Learning such items will be more helpful for learners in general than learning items that are rarely used in the language. Word frequency is highly important and, notwithstanding other criteria, has been considered an essential criterion for vocabulary selection in language pedagogy (Honeyfield, 1987; Verghese, 1989). The primary purpose of relying on frequency is to assist learners in developing "a survival-level repertoire for comprehension and production in language" (McCarthy 1990: 79). An exploration of this kind helps in predicting how far learners' exposure to the target language can facilitate a comfortable mastery of vocabulary and thus contribute to language development, so that they can function well in situations that demand authentic, real-world English.
The present study investigates the vocabulary input contained in selected textbooks for ESL teaching in the Degree first- and second-year undergraduate program of an Indian university. It pursued the following specific objectives: 1) to find out the number of words in the textbooks at the two levels; 2) to identify the different words contained in the textbooks; 3) to draw a profile of the distribution of word categories in the whole textbook corpus; 4) to determine whether there is an increase in vocabulary input from the first-year to the second-year text.

Method
A corpus consisting of two English textbooks was created for the study. The analysis examines to what extent the vocabulary in the textbooks corresponds to common everyday language use and compares lexical occurrences across adjectives, verbs, and nouns. The university curriculum at the advanced level states that learners need to develop the ability to use language in everyday life and to focus particular attention on the words they use and how they use them. The textbooks also mention that, through exposure to and study of the language, learners need to develop everyday verbal communication that is accurate and effective. These textbooks are designed to develop command of the English language with specimens of modern English, ordinary conversations, personal letters, formal speech, reflective writing, fiction, and drama activities. The objectives of teaching the second-year learners are to develop the LSRW (listening, speaking, reading, and writing) skills, and the exercises and classroom activities are specifically designed to train the students in compositional and communication skills. The English prose selections serve as models for content and style and develop writing and communication skills through the exercises and activities. The two word lists for the Degree first- and second-year undergraduate English textbooks were prepared using Web Frequency Indexer v1.3. In the present study, the unit of analysis is the word, defined by Carter (1998: 4) as "any sequence of letters (and a limited number of other characteristics such as hyphen and apostrophe) bounded on either side by a space or punctuation mark". Lemmatization is described by Read (2000: 18) as the grouping under the same heading of the base and inflected forms of a word, and we are aware that in most corpus studies words are lemmatized. The study aimed to see how far the frequency patterns of words found in a large corpus with a wide range of sources match those of the selected ESL textbook corpus.
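Carter's definition of the word can be approximated with a regular expression. The sketch below is illustrative only and does not reproduce the behaviour of Web Frequency Indexer: the pattern, the function name tokenize, and the sample sentence are assumptions. It also assumes that no lemmatization is applied, so that teacher and teachers count as separate types.

```python
import re

# Assumed reading of Carter's definition: a run of letters that may contain
# internal hyphens or apostrophes, bounded by spaces or punctuation.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(text):
    """Split text into lower-cased word forms without lemmatizing them."""
    return [match.group(0).lower() for match in WORD_PATTERN.finditer(text)]


tokens = tokenize("The well-educated teacher didn't teach; teachers teach.")
types = set(tokens)
print(tokens)
print(len(tokens), len(types))  # 7 tokens, 6 types
```

Because inflected forms are kept apart, the resulting list shows exactly which word forms a text contains, at the cost of a larger type count than a lemmatized list would give.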
From a pedagogical view, it is essential to know which word forms are included in the textbooks and which are not. Researchers such as Laufer (1991), Nation (1990), Richards (1976), and Wallace (1992) have claimed that knowing a word means knowing its different word forms. The present study is quantitative: it counts the number of words comprising the vocabulary input in a sample of textbooks and establishes ratios and percentages to compare the vocabulary load of the textbooks.

Results
The number of words contained in the textbooks at each level is presented below. The analysis of the textbooks at the two levels addresses the following points.
1) The number of words contained in each textbook; 2) The top fifty words (both grammatical and content word types); 3) The top hundred content words; 4) Common words found in textbooks at different levels.

Number of words in the textbooks
The results indicate a marked difference in the number of types: 2424 in the Degree first-year textbook compared to 3777 in the second-year textbook, a difference of 1353. The number of tokens (the total number of running words) also differs: 10071 in the Degree first-year textbook versus 16827 in the Degree second-year textbook, a difference of 6756. The lexical variation of the two textbooks is 22.46 and 24.06, so the difference between them is not as high as it might appear from the raw figures. The analysis of word types and tokens can be summarized as follows:
Degree first-year textbook: 2424 types, 10071 tokens
Degree second-year textbook: 3777 types, 16827 tokens
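The relation between the type and token counts can be made explicit with a small calculation. The sketch assumes that lexical variation is the type-token ratio expressed as a percentage (types divided by tokens, multiplied by 100); this is a common definition, but the source does not state which formula was used, and the function name is illustrative. Computed this way, the figures come to roughly 24.07 for the first-year textbook and 22.45 for the second-year textbook, close to the 24.06 and 22.46 reported above; the small gap may reflect rounding or a slightly different counting procedure.

```python
def lexical_variation(types, tokens):
    """Type-token ratio as a percentage (assumed definition of lexical variation)."""
    if tokens == 0:
        raise ValueError("a text with no tokens has no type-token ratio")
    return 100 * types / tokens


# Type and token counts as reported for the two textbooks.
first_year = lexical_variation(2424, 10071)
second_year = lexical_variation(3777, 16827)

print(round(first_year, 2), round(second_year, 2))
print(3777 - 2424, 16827 - 10071)  # differences in types and tokens
```

Although the second-year textbook has 1353 more types, it also has 6756 more tokens, so its ratio is in fact slightly lower than that of the first-year textbook.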

Vocabulary size and text coverage
One of the most important points of interest in word frequency studies has been to see how many words cover what percentage of the entire corpus. There have been attempts to establish several frequency bands (levels) in terms of the most frequent words (e.g., the most frequent 100 words, 500 words, or 1000 words) and to calculate the percentage of the entire corpus that the items falling within a given frequency band occupy. The result gives an estimate of the coverage of a given frequency band in the given corpus. Against this background, an attempt has been made to establish different bands of word frequency, as given in the table below, and to calculate the coverage of the items falling into them for the two corpora separately. The frequency analysis shows that the crucial part of the learner's vocabulary lies in the first 1000 words, which have 80-90% coverage of the whole text. The fifty most frequent function words in these word lists contribute coverage of 44.2% and 44.7% of the whole text. For the following 1000 words, there is an increase of 5% in coverage for every 500 words. The remaining vocabulary (roughly 1000 words) contributes 10% or less, and the distribution of function words and content words across the two textbooks maintains the same range (i.e., coverage and contribution). The percentage coverage of words in the whole textbook corpus is presented in Table 2, along with the graphs. Table 2 compares the frequency of the textbooks with well-established corpora.
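The coverage of a frequency band can be computed by ranking the types by frequency and summing the counts of the top N. The following Python sketch shows one way of doing this under stated assumptions: the function name band_coverage, the default band sizes, and the toy token list are illustrative and are not taken from the study.

```python
from collections import Counter


def band_coverage(tokens, bands=(50, 100, 500, 1000)):
    """Return the percentage of all tokens covered by the N most frequent types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {n: 0.0 for n in bands}
    # Token counts of the types, ranked from most to least frequent.
    ranked = [count for _, count in counts.most_common()]
    coverage = {}
    for n in bands:
        coverage[n] = 100 * sum(ranked[:n]) / total
    return coverage


# Toy text of ten tokens: 'the' x5, 'of' x3, 'day' x2.
toy_tokens = ["the"] * 5 + ["of"] * 3 + ["day"] * 2
toy = band_coverage(toy_tokens, bands=(1, 2, 3))
print(toy)  # {1: 50.0, 2: 80.0, 3: 100.0}
```

Even in this toy text the single most frequent type covers half of the running words, which mirrors on a small scale the pattern that a small number of items accounts for a large share of a text.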
The top fifty words for each textbook, alongside their occurrences with respect to the total number of tokens in the text, are shown in decreasing order. The information given in the table above is depicted in the figures (G1, G2) below, which highlight the coverage pattern of words in the two corpora across different levels. When compared, the Degree second-year textbook has greater coverage than the first-year textbook. As the figures depict, there is a slight disparity between the percentage values derived from the two in the first frequency band (50 words level), and the disparity tends to increase gradually at each higher band until the curve reaches the five hundred band. After that, the change in the disparity is negligible, and we can notice a more or less parallel increase in the coverage of the given frequency band in the two corpora up to the 300 words level. Despite the difference between the two corpora in the percentage values seen in all the frequency bands, a general principle established by several word frequency studies of quite large corpora, namely that a small number of items account for a very high proportion of general use (Branford 1967), seems to apply to both corpora. Table 3 presents the frequency lists from three different corpora for comparison: the Cambridge International Corpus (CIC), the Cambridge and Nottingham Corpus of Discourse in English (CANCODE), and the British National Corpus (BNC). The fourth and fifth lists are those of the Degree first- and second-year textbooks. Function words, including pronouns, determiners, prepositions, modal verbs, auxiliary verbs, and conjunctions, are predominant in the two textbooks. Several lexical words, such as know, well, got, think, and right, are among the first fifty words. Function words dominate the highest frequencies of all five lists, and indeed one of the defining criteria of function words is their high frequency.
As we go down the frequency list, there is no absolute cutoff between function words and high-frequency lexical words such as thing, young, science, and day. On closer examination, some of the 'lexical' words that appear among the high-frequency function words prove to be elements of interpersonal markers (e.g., you know, I think) or single-word organizational markers (well, right). The ranks of most of the words in the textbook corpus conform closely to those obtained in the Brown corpus (Kučera & Francis, 1967), BNC, CIC, and CANCODE. The relative frequencies of words, ranging from the most frequent (the) to the least frequent (been), are the same. Some words differ in rank order, which can be ascribed to differences in the size of the corpora. Derived words contribute only 0.61% in the Degree first-year textbook and 0.56% in the Degree second-year textbook. The fifty most frequent function words contribute 44.2% of the whole vocabulary in the Degree first-year textbook and 44.7% in the second-year textbook. The 100 most frequent content words contribute only 9.6% in the Degree first-year textbook and 8.01% in the Degree second-year textbook in terms of coverage of the whole text.

The top hundred content words contained in the textbooks
The content words are grouped into nouns, verbs, adjectives, and adverbs. There is a similar distribution in each textbook. It was easy to find out from the above list which word among a set of synonyms is the most frequent. For example, if we take start (0), (0); begin (0), (1); and commence (0), (0), begin is more frequent than start, and commence is the least frequent, with no occurrence as against one occurrence for begin. Words that commonly go together, such as knife, fork, and spoon, are not included in the textbooks, but loan words such as teacher, educator, and tutor, which learners are familiar with and which occur in their mother tongue, are given. Many new words, such as punctiliousness and yanked, are introduced at the advanced level; these are infrequent content words on which learners need to spend more time. Idioms whose complete meaning cannot be deduced from the sum of their parts are also introduced, such as scrambled out, throttle down, on the wane, and leaps and bounds. These words include inflected forms of nouns such as plural and possessive forms: neighbour - neighbours, beggar - beggars, person - persons, child - children. The word lists extracted from these texts do not include all the inflected forms of the relevant categories. The words in italics do not occur in the word lists; they are included here for comparison.

Commonly occurring prefixes
Many irregular verb forms occur, mainly past tense forms such as took, cut, and went, while the corresponding present tense forms do not. The comparative and superlative inflections -er and -est (sooner, soonest; quicker, quickest) have been found. The content word list contains many derived nouns, verbs, adjectives, and adverbs, but only specific derivational vocabulary is frequent and significant in learning and teaching. Derived words here mainly consist of words formed by adding derivational prefixes and suffixes to a stem. These affixes often change the part of speech of the existing word. Some of the derivational affixes listed below are very productive. The most commonly occurring prefixes in these textbooks are in-, dis-, de-, un-, over-, under-, fore-, re-, and ir-. Table 5 displays the distribution of prefixes throughout the textbooks. In the two textbooks they have 0.56% and 0.61% coverage of the whole corpus. The prefix un- had the highest frequency and topped the list. Some prefixes were used an equal number of times in each textbook, while a few occurred more often than others. For example, non-, de-, and anti- are equally shared between the textbooks. Many affixes are still productive and are used to generate new words. Words formed with the most common of these prefixes include the following.
unfriendly - unhelpful or harmful
ungreen - not concerned about or harmful to the environment
unwaged - unemployed
unleaded - not containing lead
Although it is sometimes difficult to process the meaning of words that have been formed by affixation, new ones are generally transparent. Once the learner knows the meanings of the root and of the affix, the meaning of the combination can be easily predicted.
The prefix anti- can also be used in the sense of preventing or neutralizing. The prefix de- means to remove or reverse, and only two occurrences are contained in the text. The prefix non- means negation, exclusion, or refusal. The second language learner has to learn the most productive affixes and also their patterns of distribution, e.g., teach - teacher - teaching. The affix -ing here does not change the part of speech, as both teacher and teaching are nouns; rather, the grammatical meaning of the word is changed. A teacher is one who teaches; teaching is the classroom activity of a teacher. Table 6 shows the distribution of types, tokens, and the odds of encounter with suffixes in the textbooks. Compared with the distribution of prefixes shown in Tables 7 and 8, many types of suffixes are included. A suffix is a group of letters added to the end of a word that changes how the word is used as a part of speech. Suffixes can carry grammatical information (inflectional suffixes) or lexical information (derivational suffixes). The most commonly used inflectional suffixes are -est, -ies, -ed, and -ing, and the most commonly used derivational suffixes are -tion, -ly, -ness, and -ful. A majority of suffixes are included in the textbooks. The combination of a base and suffix may decrease readers' opportunities to encounter other variants. Compared with prefixes, suffixes exhibited substantially more types and tokens in the textbooks. Though suffixes with various types and many tokens (e.g., -er and -ly) are attached to certain bases, which may limit expansion of the reader's suffix knowledge, they can be useful tools for teaching suffixes and are relatively easy to enhance using other materials such as graded readers and supplemental reading. Using these suffixes in explicit instruction is also effective in helping learners increase their morphological knowledge (cf. Bowers, Kirby, & Deacon, 2010). The suffix -able occurs 27 times and is shared by both texts.
The examples are extracted from the Degree first- and second-year frequency lists. The suffix -dom occurs four times in the second-year Degree textbook. Most of these words are abstract nouns. The suffix -er has the highest frequency and coverage. It has been observed that when more than one affix is involved in derivation, the resulting words are more complex and hence present difficulties in acquisition. Often these words have different meanings and different collocational restrictions from those of their bases. This makes language learning very difficult, since each such word has to be acquired as a separate item. However, if the frequencies of words are correlated with their usage patterns, it seems that prefixation in derivation is less frequent than suffixation and hence presents greater difficulties in acquisition.
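Counting the types and tokens associated with each affix can be sketched as follows. This is a rough illustration rather than the method used for the tables in this study: it matches affixes by spelling alone, so it also picks up words that merely begin or end with the same letters (uncle for un-), and the function name affix_profile, the exclude parameter, and the sample word list are all assumptions. In practice the matches would need to be checked by hand.

```python
from collections import Counter


def affix_profile(tokens, prefixes=(), suffixes=(), exclude=frozenset()):
    """Map each affix to (types, tokens) for the word forms that carry it.

    Matching is by spelling only; 'exclude' lists forms rejected after
    manual checking (a hypothetical convenience, not part of the study).
    """
    counts = Counter(token for token in tokens if token not in exclude)
    profile = {}
    for prefix in prefixes:
        hits = [n for word, n in counts.items()
                if word.startswith(prefix) and word != prefix]
        profile[prefix + "-"] = (len(hits), sum(hits))
    for suffix in suffixes:
        hits = [n for word, n in counts.items()
                if word.endswith(suffix) and word != suffix]
        profile["-" + suffix] = (len(hits), sum(hits))
    return profile


sample = ["unhappy", "unhappy", "unkind", "uncle",
          "teacher", "teacher", "reader", "kindness"]
raw = affix_profile(sample, prefixes=("un",), suffixes=("er", "ness"))
checked = affix_profile(sample, prefixes=("un",), suffixes=("er", "ness"),
                        exclude={"uncle"})
print(raw)      # {'un-': (3, 4), '-er': (2, 3), '-ness': (1, 1)}
print(checked)  # {'un-': (2, 3), '-er': (2, 3), '-ness': (1, 1)}
```

The difference between the two results for un- shows why purely orthographic counts tend to overstate how often a prefix or suffix really occurs.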

Contracted forms and their frequencies
Some of the contracted forms are acquired later than the full forms. These examples suggest that learners begin lexical development by using features that vary along perceptual dimensions. The forms in the leftmost column of the table are (abbreviated) spelling variants of the corresponding forms on the right; however, they occur less frequently.
In formal learning or teaching one may avoid the use of such variants. Learners acquire nouns earlier than verbs, but the learnability criteria explain how learners establish word classes that are equivalent to noun or verb.

Compound words
A few of the compound words are difficult to process; ESL learners take time to understand the semantic relations between the parts and hence find these forms difficult. One problem is that learners are not sure whether compounding is possible. A second problem is the order in which the parts of a compound appear. There are pie apples and apple pies; there is gum chewing (often used as an adjective) and chewing gum. This is undoubtedly confusing to an L2 learner, as shown in the word order errors from Hatch & Brown (1995: 194).

All time I have my book map in my bag. Soldiers have glasses shoes.
We eat in room banquet. Prince married shoes girl.
Such errors suggest that learners do not learn all compounds as new items but rather put the parts together to create the compounds. In doing so, they use the typical word order of their first language. Another possibility is that they place the words so that the meaning on which they want to focus comes first and the modifying part is then added. Some of the interesting compounds presented in the textbooks are as follows.
cock-eyed, river-bed, half-opened, heart-boiled, cold-blooded, noble-hearted, many-horned, well-oiled, many-getting, well-educated, over-shadowed, easy-going, fast-changing, price-winning, tele-education, self-satisfaction

Conclusion
To conclude, the corpus study provided a basis for analyzing the textbooks, and such analysis has an important role to play in setting learning goals and in curriculum design. In the present study, a quantitative analysis of the vocabulary input in two textbooks from two educational levels was carried out. Depending on the prescribed textbooks, learners may be exposed to different words, and they may be provided with word lists as a major source for vocabulary learning activities. Course designers can refer to word lists when they consider the vocabulary component of a language course, and teachers need reference lists to judge whether a particular word deserves attention, as frequency provides a key indication of a word's importance. Percentage coverage was the main criterion for selecting what has to be taught to degree-level students. Frequency information allows teachers to focus appropriately on the most common words, ensuring that learners know and actively use them. The less frequent words, e.g., benefactor, despotic, and flutter, are topic-specific and can be acquired when needed. The common words need less learning effort, as frequent exposure makes them easier to understand. For language textbook designers, this study provides an analysis of two textbooks which, it is hoped, will prompt designers of teaching materials to reflect on the need to follow common objectives. Textbooks for different levels should contain words appropriate to the learners' age. For language teachers, the results of this study highlight the nature of the input that textbooks should include. As textbooks provide students with different kinds of input, and this difference may have an effect on language learning, teachers need to find out what criteria are used in vocabulary selection.
From a research perspective, this study contributes to the field of second language acquisition and teaching with a description of the type of vocabulary contained in English language textbooks as representative of an important educational genre. It provides empirical evidence of the vocabulary input contained in a small sample of books. However, further analysis comparing different ESL textbooks might help in arriving at definite conclusions. Finally, word frequency is an important dimension of ESL textbooks, along with other aspects such as vocabulary input, lexical density, and lexical variation. Issues that require investigation include the influence of the number of word encounters on word learning and the effect of word length and word class on the acquisition of vocabulary. Another aspect that needs further investigation is the relationship between the vocabulary input provided by textbooks and learners' vocabulary output at each level. In studying it, it is essential to control variables that may affect this relationship, such as teachers' treatment of the vocabulary contained in the textbook or learners' strategies for retaining and recalling unfamiliar words.