Extensive Reading and Language Acquisition--Two Studies

Steve Schackne
 
Background

     I first became interested in extensive reading in 1984 while in a graduate TESOL program at the State University of New York at Albany. The Chinese students in the program possessed a communicative competence which appeared to surpass that of their counterparts from other countries. When questioned, these students gave remarkably similar answers--their schools used audio-lingual drills and extensive reading as the main foundations of their language programs. Could the audio-lingual drills account for their superior mastery? I doubted it. Many of the other students were trained primarily through the audio-lingual method, and the results were mixed. Could it be the extensive reading? Possibly. None of the other students were trained by this method. It also made intuitive sense--my American friends and I all agreed that the truly accomplished practitioners of the written and spoken word we knew were all extensive readers as well.
     The literature yielded nothing specific. Most of the articles focused on the teaching of reading and reading strategies. When I did run into articles on extensive reading (Jiang 1984; Hubbard, Jones, Thornton and Wheeler 1983), the beneficial effects of extensive reading were treated descriptively, as accepted common knowledge, with no empirical evidence.
     It was against this backdrop that I decided to try to measure what everybody was so sure of--if I manipulated one variable, extensive reading, could I measure significant English level gains over a four month period?

 Instrumentation
 
     A cloze test was chosen for its ease of construction and its high correlation with standardized English level tests (Hanania & Shikhani 1986; Aitken 1975; Stubbs and Tucker 1974; Oller 1973). A narrative passage from Jack London's To Build a Fire was chosen. The first three and last three sentences were left intact. Fifty words were deleted, one at every seventh word. The deletions included articles, pronouns, prepositions, adverbs, verbs, adjectives, nouns, and conjunctions. The test was pretested on two native speakers, who both scored 30 on exact response scoring.
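The construction procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to prepare the test: the function name, its parameters, the blank marker, and the naive sentence splitter are all hypothetical, and a real passage would still need checking by hand.

```python
import re

BLANK = "_____"  # hypothetical blank marker

def make_cloze(text, keep_first=3, keep_last=3, every=7, max_blanks=50):
    """Blank every `every`-th word of the middle sentences.

    Returns (cloze_text, answer_key). All names here are illustrative.
    """
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= keep_first + keep_last:
        return text.strip(), []  # passage too short to blank anything

    cut = len(sentences) - keep_last
    head, body, tail = sentences[:keep_first], sentences[keep_first:cut], sentences[cut:]

    words = " ".join(body).split()
    answers = []
    for i in range(every - 1, len(words), every):
        if len(answers) == max_blanks:
            break
        # Keep surrounding punctuation outside the blank: "fire." -> "_____."
        lead, core, trail = re.fullmatch(r"(\W*)(.+?)(\W*)", words[i]).groups()
        answers.append(core)
        words[i] = lead + BLANK + trail

    return " ".join(head + words + tail), answers
```

Calling make_cloze(passage) returns the blanked passage together with the list of deleted words, which then serves as the key for exact response scoring.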

Subjects

     Four classes from the Department of Foreign Languages and Literature at Tunghai University, Taiwan were selected to participate: day school sophomore, night school sophomore-experimental, night school sophomore-control, and day school freshman remedial. Only the two night school classes were paired for experimental purposes, however, since they differed only in treatment.
     The night school classes had a nearly identical makeup in terms of age, sex, prerequisite courses taken, and current course load. In addition, they used the same textbooks in each course, and followed the same syllabus.
     The teacher of the control group was chosen because of her similarity to the teacher of the experimental group--both were non-authoritarian, outgoing professionals in their thirties who enjoyed excellent rapport with their students.

Experiment

     Night school sophomore-experimental and night school sophomore-control were each taught two hours of deductive grammar, two hours of paragraph level composition, and two hours of functional conversation a week. Night school sophomore-experimental read (as did day school sophomore and day school freshman remedial) an average of 12 simplified readers over a 4-month semester, while night school sophomore-control did not.
     The readers were mostly fiction, ranging from 600 headword to 4000 headword level. Students, except for day school freshman remedial, chose them based on interest and level. The only requirement was that the students must be able to go through them relatively quickly without the constant hindrance of a dictionary, and that the students must read for pleasure.

Results

     All classes registered an increase along four criteria: acceptable response median (ar median), acceptable response mean (ar mean), exact response median (er median), and exact response mean (er mean). All three classes which utilized extensive reading made greater gains than the control group along the four measures, with one exception--night school sophomore-control and day school sophomore both registered a +2 gain on exact response median. This was a greater percentage gain for the control group since it started at a lower raw median.
     Comparing the control and experimental groups, raw and percentage gains made by the experimental group were substantially higher along all four measures than gains made by the control group. In raw gains, the experimental group outdistanced the control group by +100% (ar median), +159% (ar mean), +100% (er median), and +64% (er mean); in percentage gains, it outdistanced the control group by +90%, +159%, +120%, and +60% along the same four measures.
     The experimental group's advantage over the control group was statistically significant along both acceptable response (p<.02) and exact response (p<.08).
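The experimental-versus-control comparisons quoted above can be reproduced from the pre and post scores in the results tables. The sketch below is a worked check rather than the original calculation; the function names are hypothetical, and it assumes the percentage comparison was made on the one-decimal percentage gains printed in the tables.

```python
# Pre/post scores copied from the 1986 results tables (night school classes).
MEASURES = ["ar median", "ar mean", "er median", "er mean"]
control = {"ar median": (17.0, 18.5), "ar mean": (17.0, 18.7),
           "er median": (11.0, 13.0), "er mean": (11.2, 12.6)}
experimental = {"ar median": (18.0, 21.0), "ar mean": (17.0, 21.4),
                "er median": (10.0, 14.0), "er mean": (11.5, 13.8)}

def raw_gain(pre, post):
    """Post-test score minus pre-test score, to one decimal as in the tables."""
    return round(post - pre, 1)

def pct_gain(pre, post):
    """Raw gain as a percentage of the pre-test score, to one decimal."""
    return round(100 * raw_gain(pre, post) / pre, 1)

def advantage(exp_value, ctl_value):
    """How much larger the experimental figure is, as a whole-number percentage."""
    return round(100 * (exp_value / ctl_value - 1))

raw_adv = [advantage(raw_gain(*experimental[m]), raw_gain(*control[m])) for m in MEASURES]
pct_adv = [advantage(pct_gain(*experimental[m]), pct_gain(*control[m])) for m in MEASURES]

for m, r, p in zip(MEASURES, raw_adv, pct_adv):
    print(f"{m:10s} raw gain advantage +{r}%   percentage gain advantage +{p}%")
```

Under these assumptions the script reproduces +100%, +159%, +100%, +64% for raw gains and +90%, +159%, +120%, +60% for percentage gains. Working from unrounded percentage gains instead gives +89% rather than +90% for ar median; the other figures are unchanged.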

Conclusion

     There is strong evidence that extensive reading promotes substantial language level increase within a short period of time as measured by cloze.
     A technique that is effective obviously has many applications. Here is an activity that is not only student centered, but an activity a student can pursue independently and be relatively sure of positive results--that would make it not only effective, but cheap and convenient as well. Also, it is an activity that supplies teachers with an effective weapon, a trump card to use when confronted with stagnant, ineffective programs which, unfortunately, due to the moneymaking potential of English language institutes, abound worldwide.
     General support for and agreement about extensive reading, backed by quantitative evidence, also raises a question:  Why isn't extensive reading both encouraged and used in more EFL-ESL programs throughout the world?
 

Study #2 at the University of Macau

     Recently the subject of extensive reading has attracted increased interest, spurred to a great extent by Stephen Krashen's The Power of Reading. In this research review, Krashen cites evidence for a link between what he calls free voluntary reading and overall language competence.
     Literature exists, going back fifty years, to support the salubrious effects extensive reading has on language development. Studies that specifically examine the effects of "pleasure" reading on second language development in EFL classes, however, are rare. (I use the term "EFL" for teaching English to second language learners in an area other than North America, the United Kingdom, or Australia, as opposed to "ESL," teaching English to second language learners in North America, the United Kingdom, or Australia. An English speaking environment and the variety of experiences students encounter in that environment would constitute too great a threat to the validity of the study.)
     In the 1986 study, I cited Jiang 1984; and Hubbard, Jones, Thornton and Wheeler 1983 as two articles dealing with extensive reading. However, I lamented the fact that in these two articles beneficial effects of extensive reading were treated descriptively, as accepted common knowledge, with no empirical evidence. Krashen has collected an extensive corpus, some of it empirical, dating back to 1948, citing correlations between free voluntary reading (interchangeable, I feel, with extensive or pleasure reading) and language development. The Krashen research review, however, deals primarily with native language development, not second language development.
     We know that language input is a major factor in native language development, but can we generalize when two major factors, age and language, are manipulated; that is, is the same input factor that influences children in native language development relevant to adults learning a second language? In 1995 at the University of Macau, a replication study using a construct and measurement similar to those of the 1986 study was set up to try to ascertain this.

Instrumentation

     A cloze test was chosen for its ease of construction and its high correlation with standardized English level tests (Hanania & Shikhani 1986; Hinofotis 1979; Aitken 1975; Stubbs and Tucker 1974; Oller 1973). A narrative passage from Raymond Carver's A Small, Good Thing was chosen. The first and last sentences were left intact. Fifty words were deleted, one at every seventh word. The deletions included articles, pronouns, prepositions, adverbs, verbs, adjectives, nouns, and conjunctions.

Subjects

     Two classes from the entering freshman class at large at the University of Macau, Macau were selected to participate. The classes included students who will major in a variety of subjects, but who are required to take a freshman EFL course. The control and experimental classes had a nearly identical makeup in terms of age, sex, and current course load since freshman students at the University of Macau have a common first year. Prerequisite courses might differ slightly since Macau high schools offer a Chinese and an English track. However, the English 110 course, which the subjects were taking, draws most of its students from traditional Chinese track programs. The same textbook was used for the experimental and control groups, and the same teacher taught both.

Experiment

     Two freshman English 110 classes were taught two hours and forty minutes of EFL a week. The course covered all four skills with a minor emphasis on paragraph writing. Interactions II--A Reading Skills Book was bought by all the students. The experimental class read an average of 11 simplified readers over one 14-week semester, while the control group did not.
     The readers were Longman and Heinemann graded readers, mostly fiction with some biography and general interest, ranging from stage 1 to stage 6 (300 to 3000 headword) level. Students chose books based on interest and on a level that was accessible to them. The only requirement was that the students must be able to go through them relatively quickly without the constant hindrance of a dictionary, and that the students must read for pleasure.

Results

     The control class registered raw and percentage gains along three criteria: acceptable response median (ar median), exact response median (er median), and exact response mean (er mean). The control group registered a slight raw and percentage loss along one criterion, acceptable response mean (ar mean).
     The experimental class registered raw and percentage gains along three criteria, acceptable response median (ar median), acceptable response mean (ar mean), and exact response mean (er mean). The experimental group registered no raw or percentage gain along one criterion, exact response median (er median).
     The experimental class made greater gains than the control class along three out of four criteria: acceptable response median (ar median), acceptable response mean (ar mean), and exact response mean (er mean). The control group made greater gains than the experimental group along one criterion, exact response median (er median). Subsequent t-testing determined that although the experimental group made a slightly greater average gain along the criterion of exact response, there was no statistically significant difference in experimental-control group gains along that criterion. However, the experimental group significantly outdistanced the control group along the criterion of acceptable response, p<.03.
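For readers who want to see how a comparison of group gains of this kind can be tested, here is a minimal sketch of a two-sample t statistic. It rests on several assumptions: the text does not say which t-test variant was used, so Welch's unequal-variance form is shown as one reasonable choice; the function name is hypothetical; and the gain scores are made-up placeholders, since per-student scores are not reported here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# HYPOTHETICAL per-student gains (post minus pre); NOT data from either study.
experimental_gains = [6, 4, 5, 7, 3]
control_gains = [1, 2, 0, 3, -1]

t, df = welch_t(experimental_gains, control_gains)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t and degrees of freedom would then be checked against a t distribution (a printed table or a statistics package) to obtain a p value. With samples as small as the final n=11 and n=10, a few individual scores can move the result considerably.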

Conclusions

     There is evidence that extensive reading promotes language level increase within a short period of time as measured by cloze. But even a cursory examination of the 1986 study and this one reveals that the 1995 results are not nearly as pronounced as the 1986 results.
     Let me briefly address the factors that might (or might not) have skewed results. First, our original sample (control n=16, experimental n=12) pretested at about the same level; the control group and the experimental group were within a point of each other along four criteria--median exact response, median acceptable response, mean exact response, mean acceptable response. Through drop-add, our final pre-post test sample was control n=11, experimental n=10, with pretest scores differing by one to three points along the four grading criteria and the experimental group starting at a higher level along all four. One could argue that the experimental group, starting at a higher level, may have had a better foundation or aptitude for language development than the control group. Conversely, starting at a lower level, the control group may have benefited from having a greater statistical range in which to improve, sort of the opposite of the "Hawthorne effect."
     Second, the measurement period was two weeks shorter than in 1986, and the average number of books read was one fewer.
     Thirdly, the statistically significant results occurred in the area of acceptable response, the scoring criterion most susceptible to scoring error, due to both grammatical and textual factors. One could ask whether scoring error accounted for the different results along the criteria of exact response and acceptable response, although, upon examination by outside scorers, no significant scoring errors were discovered in the 21 scripts. Conversely, one could argue that acceptable response gains on the part of the experimental group simply indicate superior creativity in productive language skills.
     The research on the two scoring methods is still unclear--Hinofotis 1976 and Oller 1972 suggest that the acceptable word scoring method yields more reliable scores and provides more accurate information about ESL proficiency levels; however, Stubbs and Tucker 1974 and Oller et al. 1974 indicate very little difference between the two scoring methods. Brown 1978 felt that the acceptable word method was more appropriate for measuring productive language skills, but there is no consensus for either method.
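The difference between the two scoring methods discussed above can be made concrete with a short sketch. Exact response scoring credits only the word originally deleted; acceptable response scoring also credits other words a scorer judges to fit the context. Everything named below is hypothetical: the function, the three sample items, and the lists of acceptable alternatives, which in practice come from scorer judgment and are where scoring error can enter.

```python
def score_cloze(responses, answer_key, acceptable=None):
    """Return (exact_score, acceptable_score) for one test script.

    `acceptable` maps an item index to a set of alternative words a scorer
    has judged contextually acceptable (an illustrative data layout).
    """
    acceptable = acceptable or {}
    exact = 0
    accepted = 0
    for i, (given, key) in enumerate(zip(responses, answer_key)):
        given_norm = given.strip().lower()
        if given_norm == key.lower():
            exact += 1       # the original word counts under both methods
            accepted += 1
        elif given_norm in acceptable.get(i, set()):
            accepted += 1    # counts under acceptable response scoring only
    return exact, accepted

# Hypothetical three-item key and one student's answers.
answer_key = ["cold", "the", "walked"]
alternatives = {0: {"freezing"}, 2: {"went", "hiked"}}
student = ["freezing", "the", "ran"]

exact_score, acceptable_score = score_cloze(student, answer_key, alternatives)
print(exact_score, acceptable_score)
```

In this made-up script the student earns 1 point under exact response scoring and 2 under acceptable response scoring, which shows why a group's acceptable response gain can exceed its exact response gain.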

Final Word

     Based on the 1986 and 1995 results, one or more replication studies should be undertaken with the following suggestions:*

     a) sample size >15 consistent from pre to post; possible multiple class study increasing sample size.
 
     b) pretesting any cloze instrument with native speakers to eliminate any linguistically controversial items.

     c) possible multi-instrument evaluation along different individual skill areas.

     d) sample taken from non-Sinitic language group in home country; e.g., Brazilians or Hungarians.
 
 

     *In 1994, Dr. James Sims undertook a similar study at Tunghai University in Taichung, Taiwan. Sims used a multi-instrument evaluation with several classes constituting several hundred students. His study showed statistically significant gains in the experimental groups which engaged in free voluntary reading.
 
 
 

Day School Freshman Remedial (N=10)
        ar median              ar mean                er median              er mean
pre     10.5                   12.0                   6.5                    7.7
post    (+4) 14.5 (+38.2%)     (+3.1) 15.1 (+25.8%)   (+3.5) 10.0 (+53.8%)   (+2.6) 10.3 (+33.8%)

Night School Sophomore-Control (N=16)
        ar median              ar mean                er median              er mean
pre     17.0                   17.0                   11.0                   11.2
post    (+1.5) 18.5 (+8.8%)    (+1.7) 18.7 (+10%)     (+2) 13.0 (+18.2%)     (+1.4) 12.6 (+12.5%)

Night School Sophomore-Experimental (N=17)
        ar median              ar mean                er median              er mean
pre     18.0                   17.0                   10.0                   11.5
post    (+3) 21.0 (+16.7%)     (+4.4) 21.4 (+25.9%)   (+4) 14.0 (+40%)       (+2.3) 13.8 (+20%)

Day School Sophomore (N=14)
        ar median              ar mean                er median              er mean
pre     24.0                   23.4                   15.0                   14.6
post    (+3.5) 27.5 (+14.6%)   (+4.3) 27.7 (+18.4%)   (+2) 17.0 (+13.3%)     (+2.8) 17.4 (+19.2%)

Post rows show (raw gain) post score (percentage gain).

ar = acceptable response (experimental vs. control gain, p<.02)

er = exact response (experimental vs. control gain, p<.08)
 
 

English 110-Control (N=11)
        ar median              ar mean                er median              er mean
pre     16.0                   16.9                   12.0                   12.3
post    (+1) 17.0 (+6.3%)      (-.2) 16.7 (-1.2%)     (+3) 15.0 (+25%)       (+1) 13.3 (+8.1%)

English 110-Experimental (N=10)
        ar median              ar mean                er median              er mean
pre     17.5                   18.0                   15.0                   14.5
post    (+5.5) 23.0 (+31.4%)   (+4.8) 22.8 (+26.7%)   (+0) 15.0 (+0%)        (+1.4) 15.9 (+9.7%)

Post rows show (raw gain) post score (percentage gain).

ar = acceptable response
er = exact response

The experimental group made statistically significant gains over the control group in acceptable response (p<.03).
 
 

References (1986 Study)
 

Aitken, K.G. 1975. Problems in cloze testing re-examined. TESOL Reporter, 8:2.

Hanania, E. & M. Shikhani, 1986. Interrelationships among three tests of language proficiency: standardized ESL, cloze,
     and writing. TESOL Quarterly, 20, 97-109.

Hubbard, P., Jones, H., Thornton, B., & R. Wheeler, 1983. A Training Course for TEFL. Hong Kong: Oxford University
     Press.

Jiang, H.S. 1984. Teaching extensive reading. Forum, 22, 4, 37.

London, J. 1964. "To build a fire." In Grindell, R.M., Marelli, L.R.&H. Nadler (eds.) American readings. 192-193.
     New York: McGraw-Hill.

Oller, J.W. 1973. Cloze tests of second language proficiency and what they measure. Language Learning 23, 105-118.

Stubbs, J.B. & G.R. Tucker, 1974. The cloze test as a measure of English proficiency. Modern Language Journal 58,
     239-241.

Zukowski-Faust, J., Johnston, S., Atkinson, C., & E. Templin, 1982. In Context. New York: CBS College Publishing.
 

References (1995 Study)

Aitken, K.G. 1975. Problems in cloze testing re-examined. TESOL Reporter, 8:2.

Brown, J.D. 1978. Correlational study of four methods for scoring cloze tests. Master's thesis, University of California,
     Los Angeles.

Hanania, E. & M. Shikhani, 1986. Interrelationships among three tests of language proficiency: standardized ESL, cloze, and
     writing. TESOL Quarterly, 20, 97-109.

Hinofotis, F.B. 1976. An investigation of the concurrent validity of cloze testing as a measure of overall proficiency in English
     as a second language. Doctoral dissertation, Southern Illinois University.

_______________1980. Cloze testing: an overview. CATESOL Occ. Papers, 6: 51-55.

Hubbard, P., Jones, H., Thornton, B., & R. Wheeler, 1983. A Training Course for TEFL. Hong Kong: Oxford University
     Press.

Jiang, H.S. 1984. Teaching extensive reading. Forum, 22, 4, 37.

Krashen, Stephen. 1994. The Power of Reading. Englewood, Colo.: Libraries Unlimited.

Oller, J.W. Jr. 1972. Scoring methods and difficulty levels for cloze tests in English as a second language. Modern
     Language Journal 56, 3, 151-157.

_____________1973. Cloze tests of second language proficiency and what they measure. Language Learning 23, 105-118.

_____________, Irvine, P., and P. Atai, 1974. Cloze, dictation, and the test of English as a foreign language. Language
     Learning 24, 2, 245-252.

Schackne, Stephen. 1986. Reading for pleasure and language acquisition. Unpublished research. Tunghai University,
     Taichung, Taiwan.

Stubbs, J.B. & G.R. Tucker, 1974. The cloze test as a measure of English proficiency. Modern Language Journal 58,
     239-241.