I first became interested in
extensive reading in 1984 while in a graduate TESOL program at the
State University of New York at Albany. The Chinese students in the
program possessed a communicative competence which appeared to surpass
that of their counterparts from other countries. Upon questioning,
these students yielded remarkably similar answers--their schools used
audio-lingual drills and extensive reading as the main foundations of
their language programs. Could the audio-lingual drills account for
their superior mastery? I doubted it. Many of the other students were
trained primarily through the audio-lingual method, and the results
were mixed. Could it be the extensive reading? Possibly. None of the
other students were trained by this method. It also made intuitive
sense--my American friends and I agreed that the truly accomplished
practitioners of the written and spoken word we knew were all
extensive readers as well.
The literature yielded nothing specific. Most of the articles focused on the teaching of reading and reading strategies.
When I did find articles on extensive reading (Jiang 1984;
Hubbard, Jones, Thornton and Wheeler 1983), the beneficial effects of
extensive reading were treated descriptively, as accepted common
knowledge, with no empirical evidence.
It was against this backdrop that I decided to try to measure what
everybody was so sure of--if I manipulated one variable, extensive
reading, could I measure significant gains in English level over a
four-month period?
Instrumentation
A cloze test was chosen for its ease of
construction and its high correlation with standardized English level
tests (Hanania & Shikhani 1986; Aitken 1975; Stubbs and Tucker
1974; Oller 1973). A narrative passage from Jack London's "To Build a Fire"
was chosen. The first three and last three sentences were left intact.
In the remaining text, every seventh word was deleted, for a total of
fifty deletions. The deletions included articles, pronouns, prepositions,
adverbs, verbs, adjectives, nouns, and conjunctions. The test was
pretested on two native speakers, who both scored 30 under exact
response scoring.
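The fixed-ratio deletion procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used to prepare the test: the function name make_cloze, the naive sentence splitter, and the blank format are all hypothetical choices made for the example.

```python
import re

def make_cloze(text, n=7, max_blanks=50, keep_first=3, keep_last=3):
    """Delete every n-th word of the passage body; return (cloze_text, answers).

    Hypothetical helper: sentence splitting is naive (., ! or ? followed by
    whitespace), so abbreviations such as "Mr." would need manual checking.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= keep_first + keep_last:
        raise ValueError("passage is too short to leave those sentences intact")
    cut = len(sentences) - keep_last
    head, body, tail = sentences[:keep_first], sentences[keep_first:cut], sentences[cut:]

    answers, out, count = [], [], 0
    for token in " ".join(body).split():
        # Separate the word from any punctuation attached to it.
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", token).groups()
        if not core:  # pure punctuation such as "--" is never counted or blanked
            out.append(token)
            continue
        count += 1
        if count % n == 0 and len(answers) < max_blanks:
            answers.append(core)              # the deleted word goes to the answer key
            out.append(f"{lead}_____{trail}")  # punctuation stays around the blank
        else:
            out.append(token)
    return " ".join(head + out + tail), answers
```

Calling make_cloze(passage) with the defaults mirrors the settings described here (every seventh word, fifty blanks, three intact sentences at each end); passing keep_first=1 and keep_last=1 would leave only the first and last sentences intact.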
Subjects
Four classes from the
Department of Foreign Languages and Literature at Tunghai University,
Taiwan were selected to participate: day school sophomore, night school
sophomore-experimental, night school sophomore-control, and day school
freshman remedial. Only the two night school classes were paired for
experimental purposes, however, since they differed only in treatment.
The night school classes had a nearly
identical makeup in terms of age, sex, prerequisite courses taken, and
current course load. In addition, they used the same textbooks in each
course, and followed the same syllabus.
The teacher of the control group was
chosen because of her similarity to the teacher of the
experimental group--both were non-authoritarian, outgoing professionals in
their thirties who enjoyed excellent rapport with their students.
Experiment
Night school sophomore-experimental and night school sophomore-control were
each taught two hours of deductive grammar, two hours of paragraph-level
composition, and two hours of functional conversation a week.
Night school sophomore-experimental (like day school sophomore
and day school freshman remedial) read an average of 12 simplified readers
over a 4-month semester, while night school sophomore-control did not.
The readers were mostly fiction, ranging from the
600-headword to the 4000-headword level. Students, except for day school
freshman remedial, chose them based on interest and level. The only
requirements were that the students be able to get through them
relatively quickly without the constant hindrance of a dictionary, and
that they read for pleasure.
Results
All classes registered an
increase along four criteria: acceptable response median
(ar median), acceptable response mean (ar mean), exact response median
(er median), and exact response mean (er mean). All three classes that
used extensive reading made greater gains than the control group
along the four measures, with one exception--night school
sophomore-control and day school sophomore both registered a +2 gain on
exact response median. This was a greater percentage gain for the
control group since it started at a lower raw median.
Comparing the control and experimental groups,
raw and percentage gains made by the experimental group were
substantially higher along all four measures than gains made by the
control group. The experimental group's raw gains exceeded the control
group's by 100%, 159%, 100%, and 64% on ar median, ar mean, er median,
and er mean respectively; its percentage gains exceeded the control
group's by 90%, 159%, 120%, and 60% on the same four measures.
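The arithmetic behind these comparisons can be checked with a short Python sketch. The gain figures are taken from the 1986 results table at the end of this article, on the assumption that the rows used here are the control and experimental groups (they are the rows whose figures reproduce the percentages quoted above); the function name relative_advantage is hypothetical.

```python
# Order of measures: ar median, ar mean, er median, er mean.
control_raw = [1.5, 1.7, 2.0, 1.4]        # raw point gains, control group
experimental_raw = [3.0, 4.4, 4.0, 2.3]   # raw point gains, experimental group
control_pct = [8.8, 10.0, 18.2, 12.5]     # percentage gains, control group
experimental_pct = [16.7, 25.9, 40.0, 20.0]  # percentage gains, experimental group

def relative_advantage(experimental, control):
    """How much larger each experimental gain is than the control gain, in percent."""
    return [round((e / c - 1) * 100) for e, c in zip(experimental, control)]

raw_advantage = relative_advantage(experimental_raw, control_raw)
pct_advantage = relative_advantage(experimental_pct, control_pct)
print(raw_advantage)  # [100, 159, 100, 64]
print(pct_advantage)  # [90, 159, 120, 60]
```

Each figure is the experimental gain divided by the control gain, minus one, expressed as a percentage, so +100% means the experimental gain was double the control gain.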
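The experimental group's advantage over the control group reached
p<.02 along acceptable response and p<.08 along exact response; the
former is statistically significant at the conventional .05 level,
while the latter falls just short of it.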
Conclusion
There is strong evidence
that extensive reading promotes substantial language level increase
within a short period of time as measured by cloze.
A technique that is effective obviously has
many applications. Here is an activity that is not only student
centered, but an activity a student can pursue independently and be
relatively sure of positive results--that would make it not only
effective, but cheap and convenient as well. Also, it is an activity
that supplies teachers with an effective weapon, a trump card to use
when confronted with stagnant, ineffective programs which,
unfortunately, due to the moneymaking potential of English language
institutes, abound worldwide.
General support for and agreement about
extensive reading, backed by quantitative evidence, also raises a
question: Why isn't extensive reading both encouraged and used in
more EFL-ESL programs throughout the world?
Study #2 at the University of Macau
Recently the subject of
extensive reading has attracted increased interest, spurred on, to a
great extent, by Stephen Krashen's The Power of Reading. In
this research review, Krashen cites evidence for a link between what he
calls free voluntary reading and overall language competence.
Literature going back fifty years
supports the salubrious effects of extensive reading on language
development. Studies that specifically examine the effects of
"pleasure" reading on second language development in EFL classes,
however, are rare. (I use the term "EFL" to mean teaching English to second language
learners in an area other than North America, the United Kingdom, or
Australia, as opposed to "ESL," teaching English to second language
learners in North America, the United Kingdom, or Australia. An
English-speaking environment, and the variety of experiences students encounter
in it, would constitute too great a threat to the validity
of such a study.)
In the 1986 study, I cited Jiang 1984 and
Hubbard, Jones, Thornton and Wheeler 1983 as two articles dealing with
extensive reading. However, I lamented the fact that in these two
articles beneficial effects of extensive reading were treated
descriptively, as accepted common knowledge, with no empirical
evidence. Krashen has collected an extensive corpus, some of it
empirical, dating back to 1948, citing correlations between free
voluntary reading (interchangeable, I feel, with extensive or pleasure
reading) and language development. The Krashen research review,
however, deals primarily with native language development, not second
language development.
We know that language input is a major factor
in native language development, but can we generalize when two major
factors, age and language, are changed? That is, is the same input
factor that influences children in native language development relevant
to adults learning a second language? In 1995 at the University of
Macau, a replication study using a construct and measurement similar to
those of the 1986 study was set up to try to answer this question.
Instrumentation
A cloze test was chosen for
its ease of construction and its high correlation with standardized
English level tests (Hanania & Shikhani 1986; Hinofotis 1979;
Aitken 1975; Stubbs and Tucker 1974; Oller 1973). A narrative passage
from Raymond Carver's "A Small, Good Thing" was chosen. The first and
last sentences were left intact. In the remaining text, every seventh
word was deleted, for a total of fifty deletions. The deletions included
articles, pronouns, prepositions, adverbs, verbs, adjectives, nouns, and
conjunctions.
Subjects
Two classes from the entering freshman class at large at the University of Macau were selected to participate. The classes included students who would major in a variety of subjects but who were required to take a freshman EFL course. The control and experimental classes had a nearly identical makeup in terms of age, sex, and current course load, since freshman students at the University of Macau have a common first year. Prerequisite courses might have differed slightly, since Macau high schools offer a Chinese and an English track. However, the English 110 course, which the subjects were taking, draws most of its students from traditional Chinese track programs. The same textbook was used for the experimental and control groups, and the same teacher taught both.
Experiment
Two freshman English 110
classes were each taught two hours and forty minutes of EFL a week. The
course covered all four skills with a minor emphasis on paragraph
writing. All the students bought Interactions II--A Reading Skills
Book. The experimental class read an average of 11
simplified readers over one 14-week semester, while the control group
did not.
The readers were Longman and
Heinemann graded readers, mostly fiction with some biography and
general interest, ranging from stage 1 to stage 6 (300- to 3000-headword)
level. Students chose books based on interest and on a level that
was accessible to them. The only requirements were that the students
be able to get through them relatively quickly without the constant
hindrance of a dictionary and that they read for pleasure.
Results
The control class registered
raw and percentage gains along three criteria: acceptable response
median (ar median), exact response median (er median), and exact
response mean (er mean). The control
group registered a slight raw and percentage loss along one criterion,
acceptable response mean (ar mean).
The
experimental class registered raw and percentage gains along three
criteria, acceptable response median (ar median), acceptable response
mean (ar mean), and exact response mean (er mean). The experimental
group registered no raw or percentage gain along one criterion, exact
response median (er median).
The experimental class made greater gains than
the control class along three out of four criteria: acceptable response
median (ar median), acceptable response mean (ar mean), and exact
response mean (er mean). The control group made greater gains than the
experimental group along one criterion, exact response median (er
median). Subsequent t-testing determined that, although the experimental
group made a slightly greater average gain along the criterion of exact
response, the difference between experimental and control group gains
on that criterion was not statistically significant.
However, the experimental group significantly outdistanced the control
group along the criterion of acceptable response (p<.03).
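For readers who want to run a comparable test on their own pre-post gain scores, here is a minimal Python sketch. The article does not say which form of t-test was used, so the choice of Welch's unequal-variance version is an assumption, as is the function name welch_t; the numbers in the usage line are toy values, not data from the study.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(gains_a, gains_b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    Each argument is a list of per-student gain scores (posttest minus pretest).
    """
    va = variance(gains_a) / len(gains_a)  # squared standard error, group A
    vb = variance(gains_b) / len(gains_b)  # squared standard error, group B
    t = (mean(gains_a) - mean(gains_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(gains_a) - 1) + vb ** 2 / (len(gains_b) - 1))
    return t, df

# Toy gain scores for illustration only (NOT the study's data).
t, df = welch_t([2, 3, 4], [1, 2, 3])
print(round(t, 3), round(df, 1))
```

The resulting t and df would then be looked up in a t table, or passed to a statistics package, to obtain a p value.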
Conclusions
There is evidence that
extensive reading promotes language level increase within a short
period of time as measured by cloze. But even a cursory examination of
both the 1986 study and this one reveals that the 1995 results are not
nearly as pronounced as the 1986 results.
Let me briefly address the factors that might (or might not) have
skewed the results. First, our original sample (control n=16, experimental
n=12) pretested at about the same level: the control group and the
experimental group were within a point of each other along all four
criteria--median exact response, median acceptable response, mean exact
response, and mean acceptable response. Through drop-add, our final
pre-post test sample was control n=11, experimental n=10, with pretest
scores differing by one to three points along the four grading
criteria and the experimental group starting at a higher level along
all four. One could argue that the experimental group, starting
at a higher level, may have had a better foundation or aptitude for language
development than the control group. Conversely, starting at a lower
level, the control group may have benefited from having a greater statistical
range in which to improve, sort of the opposite of the "Hawthorne effect."
Second, the measurement period
was two weeks shorter than in 1986, and the average number of books read
was one fewer.
Third, the statistically significant results
occurred in the area of acceptable response, the scoring criterion most
susceptible to scoring error, owing to both grammatical and textual
factors. One could ask whether scoring error accounted for the
different results along the criteria of exact response and acceptable
response, although, upon examination by outside scorers, no significant
scoring errors were discovered in the 21 scripts. Conversely, one could
argue that acceptable response gains on the part of the experimental
group simply indicate superior creativity in productive language skills.
The research on the two scoring methods is
still unclear--Hinofotis 1976 and Oller 1972 suggest that the acceptable
word scoring method yields more reliable scores and provides more
accurate information about ESL proficiency levels; however, Stubbs and
Tucker 1974 and Oller et al. 1974 indicate very little difference
between the two scoring methods. Brown 1978 felt that the acceptable
word method was more appropriate for measuring productive language
skills, but there is no consensus in favor of either method.
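The difference between the two scoring methods can be made concrete with a small Python sketch. It is an illustration under assumptions, not the scoring procedure used in either study: the function name score_cloze is hypothetical, and the sets of acceptable alternatives stand in for judgments that human scorers would have to make for each blank.

```python
def score_cloze(responses, exact_key, acceptable_key):
    """Return (exact_score, acceptable_score) for one test script.

    responses      -- the student's answer for each blank
    exact_key      -- the word originally deleted at each blank
    acceptable_key -- for each blank, a set of other words judged to fit
    """
    exact = acceptable = 0
    for response, original, alternatives in zip(responses, exact_key, acceptable_key):
        answer = response.strip().lower()
        if answer == original.lower():
            exact += 1        # the original word counts under both methods
            acceptable += 1
        elif answer in {alt.lower() for alt in alternatives}:
            acceptable += 1   # a contextually appropriate substitute
    return exact, acceptable

# Hypothetical three-blank example.
print(score_cloze(["cold", "big", "xyz"],
                  ["cold", "large", "ran"],
                  [set(), {"big"}, {"walked"}]))  # (1, 2)
```

Because every exact answer is also an acceptable answer, the acceptable response score can never be lower than the exact response score; the gap between them depends entirely on how the list of alternatives is drawn up, which is where scorer judgment enters.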
Final Word
Based on the 1986 and 1995 results, one or more replication studies should be undertaken, with the following suggestions:*
a) a sample size greater than 15 that stays consistent from pretest to posttest, possibly through a multiple-class study to increase sample size.
b) pretesting any cloze instrument with native speakers to eliminate any linguistically controversial items.
c) possible multi-instrument evaluation along different individual skill areas.
d) a sample taken from a non-Sinitic language group in its home country; e.g., Brazilians or Hungarians.
*In 1994, Dr. James Sims undertook a similar study at Tunghai University in Taichung, Taiwan.
Sims used a multi-instrument evaluation with several classes
constituting several hundred students.
His study showed statistically significant gains in the experimental
groups which engaged in free voluntary reading.
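1986 study (Tunghai University): cloze scores before and after the semester
Columns, left to right: ar median, ar mean, er median, er mean

Day school freshman remedial
pre   10.5   12.0   6.5   7.7
post  (+4)14.5(+38.2%)   (+3.1)15.1(+25.8%)   (+3.5)10.0(+53.8%)   (+2.6)10.3(+33.8%)

Night school sophomore-control
pre   17.0   17.0   11.0   11.2
post  (+1.5)18.5(+8.8%)   (+1.7)18.7(+10%)   (+2)13.0(+18.2%)   (+1.4)12.6(+12.5%)

Night school sophomore-experimental
pre   18.0   17.0   10.0   11.5
post  (+3)21.0(+16.7%)   (+4.4)21.4(+25.9%)   (+4)14.0(+40%)   (+2.3)13.8(+20%)

Day school sophomore
pre   24.0   23.4   15.0   14.6
post  (+3.5)27.5(+14.6%)   (+4.3)27.7(+18.4%)   (+2)17.0(+13.3%)   (+2.8)17.4(+19.2%)

ar=acceptable response p<.02
er=exact response p<.08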
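1995 study (University of Macau): cloze scores before and after the semester
Columns, left to right: ar median, ar mean, er median, er mean

Control
pre   16.0   16.9   12.0   12.3
post  (+1)17.0(+6.3%)   (-.2)16.7(-1.2%)   (+3)15.0(+25%)   (+1)13.3(+8.1%)

Experimental
pre   17.5   18.0   15.0   14.5
post  (+5.5)23.0(+31.4%)   (+4.8)22.8(+26.7%)   (+0)15.0(+0%)   (+1.4)15.9(+9.7%)

ar=acceptable response
er=exact response
The experimental group made statistically significant gains over the control
group in acceptable response, p<.03.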
References (1986 Study)
Aitken, K.G. 1975. Problems in a cloze testing re-examined. TESOL Reporter, 8:2.
Hanania, E. & M. Shikhani, 1986. Interrelationships among three tests of language proficiency: standardized ESL, cloze,
and writing. TESOL Quarterly, 20, 97-109.
Hubbard, P., Jones, H., Thornton, B., & R. Wheeler, 1983. A Training Course for TEFL. Hong Kong: Oxford University
Press.
Jiang, H.S. 1984. Teaching extensive reading. Forum, 22, 4, 37.
London, J. 1964. "To build a fire." In Grindell, R.M., Marelli, L.R.&H. Nadler (eds.) American readings. 192-193.
New York: McGraw-Hill.
Oller, J.W. 1973. Cloze tests of second language proficiency and what they measure. Language Learning 23, 105-118.
Stubbs, J.B. & G.R. Tucker, 1974. The cloze test as a measure of English proficiency. Modern Language Journal 58,
239-241.
Zukowski-Faust, J., Johnston, S., Atkinson, C., & E. Templin, 1982. In Context. New York: CBS College Publishing.
References (1995 Study)
Aitken, K.G. 1975. Problems in a cloze testing re-examined. TESOL Reporter, 8:2.
Brown, J.D. 1978. Correlational study of four methods for scoring cloze tests. Master's thesis, University of California,
Los Angeles.
Hanania, E. & M. Shikhani, 1986. Interrelationships among three tests of language proficiency: standardized ESL, cloze, and
writing. TESOL Quarterly, 20, 97-109.
Hinofotis, F.B. 1976. An investigation of the concurrent validity of cloze testing as a measure of overall proficiency in
English as a second language. Doctoral dissertation, Southern Illinois University.
_______________1980. Cloze testing: an overview. CATESOL Occ. Papers, 6: 51-55.
Hubbard, P., Jones, H., Thornton, B., & R. Wheeler, 1983. A Training Course for TEFL. Hong Kong: Oxford University
Press.
Jiang, H.S. 1984. Teaching extensive reading. Forum, 22, 4, 37.
Krashen, Stephen. 1994. The Power of Reading. Englewood, Colo.: Libraries Unlimited.
Oller, J.W. Jr. 1972. Scoring methods and difficulty levels for cloze tests in English as a second language. Modern
Language Journal 56, 3, 151-157.
_____________1973. Cloze tests of second language proficiency and what they measure. Language Learning 23, 105-118.
_____________, Irvine, P., and P. Atai, 1974. Cloze, dictation, and the test of English as a foreign language. Language
Learning 24, 2, 245-252.
Schackne, Stephen. 1986. Reading for pleasure and language acquisition. Unpublished research. Tunghai University,
Taichung, Taiwan.
Stubbs, J.B. & G.R. Tucker, 1974. The cloze test as a measure of English proficiency. Modern Language Journal 58,
239-241.