
Memory: Testing

One of the golden rules of school life is that students hate tests. But I would like to refine this rule to say students hate "high-stakes" tests. The papers in this section will outline a finding that has possibly transformed my lessons more than any other: the power of regular, low-stakes testing. By low-stakes, I mean short tests (typically 3 to 5 questions) that the students take at the start of every single lesson, based on content from earlier in the year (to take advantage of the Spacing and Interleaving effects). They mark them themselves, the marks are not recorded anywhere, I don't know their scores, students don't know each other's scores, and I go through the answers straight away so students have immediate feedback. You will not believe me when I say this, but the students love them. More importantly, the effects on their retention have been profound. The biggest impact this has had on my teaching is that I now see tests as "learning events", as opposed to just a means of assessment. When I interviewed Dani Quinn for my podcast, she explained how Michaela Community School makes extensive use of regular weekly tests. In this section I attempt to get to the bottom of the power of testing.
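For anyone who wants to automate the choice of questions, the routine described above could be sketched roughly as follows. This is only an illustration in Python, not something taken from the papers below: the question bank, the topic names and the function `pick_starter_quiz` are all hypothetical. It draws a handful of questions from topics that have already been taught, at most one per topic, so that earlier content is both revisited and mixed.

```python
import random

# Hypothetical question bank (illustrative only): topic -> list of questions.
QUESTION_BANK = {
    "fractions": ["Work out 1/2 + 1/3", "Simplify 6/8"],
    "angles": ["What do the angles in a triangle add up to?"],
    "percentages": ["Find 15% of 80", "Write 0.35 as a percentage"],
    "algebra": ["Solve 3x + 2 = 11", "Expand 2(x + 5)"],
}


def pick_starter_quiz(bank, taught_topics, n_questions=4, rng=None):
    """Pick up to n_questions, at most one per topic, from topics already taught."""
    if rng is None:
        rng = random.Random()
    # Keep only topics that have been taught and that have questions in the bank.
    eligible = [topic for topic in taught_topics if bank.get(topic)]
    # Sample distinct topics so the quiz mixes (interleaves) earlier content.
    chosen_topics = rng.sample(eligible, min(n_questions, len(eligible)))
    return [(topic, rng.choice(bank[topic])) for topic in chosen_topics]


if __name__ == "__main__":
    quiz = pick_starter_quiz(
        QUESTION_BANK, ["fractions", "angles", "algebra"], n_questions=3
    )
    for number, (topic, question) in enumerate(quiz, start=1):
        print(f"{number}. [{topic}] {question}")
```

Nothing here records or stores marks, which is deliberate: the sketch only chooses questions, and the marking stays with the students.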

Research Paper Title: Ten Benefits of Testing and their Application to Educational Practice
Author(s): Henry L. Roediger III, Adam L. Putnam and Megan A. Smith
My Takeaway:
This paper is stunning. It details (as the title suggests) ten benefits of testing, citing research to support each claim. Each benefit is worth discussion here, but I am going to limit myself to just three.
1) The Testing Effect: retrieval aids later retention. A fascinating study described here explains that students who followed a pattern of study-study-study-test for a topic performed better on an immediate final test than a group who followed study-test-test-test. But - and here is the key - when tested a week later, the exact opposite had occurred, even though the latter group had much less exposure to the material. Testing led to better long-term retention. The Testing Effect is examined further later in this section.
2) Testing causes students to learn more from the next learning episode. When students take a test and then restudy material, they learn more from the presentation than they would if they restudied without taking a test, hence it helps with revision.
3) Testing improves transfer of knowledge to new contexts. This is the holy grail! We have seen in the Cognitive Science section that the inability to transfer knowledge to new situations is one of the characteristics that defines novice learners and inhibits their learning. It seems that testing can help with that, possibly by enabling learners to form and strengthen schemas that they can later draw upon in new contexts.
Add to all of this the fact that testing gives you, the teacher, information about gaps in students' knowledge which you can resolve in class, and that it promotes the benefits of testing to students so that they may use it in their revision, and you have possibly one of the most important teaching tools ever.
My favourite quote:
We have reviewed 10 reasons why increased testing in educational settings is beneficial to learning and memory, as a self-study strategy for students or as a classroom tactic. The benefits can be indirect—students study more and attend more fully if they expect a test – but we have emphasized the direct effects of testing. Retrieval practice from testing provides a potent boost to future retention. Retrieval practice provides a relatively straightforward method of enhancing learning and retention in educational settings.

Research Paper Title: The Critical Importance of Retrieval for Learning
Author(s): Jeffrey D. Karpicke, et al
My Takeaway:
For years I have underestimated just how powerful actively practising retrieval can be, relative to other more common revision techniques. Three findings in this paper particularly stood out to me.
1) Repeated studying after learning had no effect on delayed recall, but repeated testing produced a large positive effect. I have seen this myself - students think they have cracked a topic like adding fractions because they have got questions correct in the past, and so during revision may merely glance over their notes on fractions (studying) instead of trying questions out (retrieval). Such revision feels easy, and hence has little effect on long term retention - as we have seen from the work of Bjork in the Memory section above, learning needs to be difficult.
2) Students’ predictions of their performance were uncorrelated with actual performance. The students in the study were not aware of the benefits of practising recall, even after they had done it! This is both fascinating and worrying, and suggests that students need to be convinced of the power of this strategy long before they start their revision. Regular low-stakes testing in the classroom seems a sensible way of achieving this.
3) Finally - and I have saved the best until last - if retrieval is so important for learning (as you will see in the next paper, it can actually cause learning), then is it a good idea to just do a load of past papers for exam preparation, as I have done for many years? NO! Why? Well, exam papers are designed to cater to a wide variety of abilities. Hence, they do not always encourage retrieval from long-term memory, but instead test elements of problem solving (specifically, elements not stored in long-term memory). We have seen in the Cognitive Load Theory section that problem solving does not always lead to learning, and hence students can work through exam papers and not actually learn anything. I conclude that until students have covered the full course, regular, low-stakes tests on topics that students have already studied are the best way to profit from the positive effect of retrieval.
My favourite quote:
The conventional wisdom shared among students and educators is that if information can be recalled from memory, it has been learned and can be dropped from further practice, so students can focus their effort on other material. Research on students’ use of self-testing as a learning strategy shows that students do tend to drop facts from further practice once they can recall them. However, the present research shows that the conventional wisdom existing in education and expressed in many study guides is wrong. Even after items can be recalled from memory, eliminating those items from repeated retrieval practice greatly reduces long-term retention. Repeated retrieval induced through testing (and not repeated encoding during additional study) produces large positive effects on long-term retention.
 

Research Paper Title: Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping
Author(s): Jeffrey D. Karpicke, et al
My Takeaway:
This paper is by the same team of authors as the previous one, but makes two key additional points.
1) It makes explicit the finding that practising retrieval is more effective for long-term retention than attempting to encode information during the learning process, for example by using strategies such as mind-mapping and making notes. Practising retrieval was found to be more effective than these more traditional revision strategies when students were later tested on both factual recall and more problem-solving style questions.
2) However, possibly the more interesting point is that the researchers found that retrieval practice can actually produce learning (as opposed to being neutral). That is, tests act not only as passive assessments of what is stored in memory (as is often the traditional perspective in education) but also as vehicles that modify what is stored in memory. This startling finding is called the Testing Effect (or the Retrieval Effect) and is possibly due to the cognitive strain experienced when trying to reconstruct knowledge, which is related to the fascinating paper on Desirable Difficulties discussed later in this section. Once again, for me this emphasises how important it is that students are aware of the power of self-testing during revision (and not necessarily of complete exam papers, as discussed in the paper above), and the importance of low-stakes tests in the classroom.
My favourite quote:
Research on retrieval practice suggests a view of how the human mind works that differs from everyday intuition. Retrieval is not merely a read out of the knowledge stored in one’s mind; the act of reconstructing knowledge itself enhances learning. This dynamic perspective on the human mind can pave the way for the design of new educational activities based on consideration of retrieval processes.

Research Paper Title: Test anxiety in UK schoolchildren: Prevalence and demographic patterns
Author(s): David W. Putwain
My Takeaway:
Test Anxiety is something I have observed many times in my students over the years, but I had never attributed this particular label to it. It can be defined as a psychological condition in which people experience extreme distress and anxiety in testing situations. I notice this most in the build-up to high-stakes exams, such as GCSEs or A Levels. Students who have been calm and composed throughout the rest of the year begin to fall apart - they cannot sleep, they look anxious in class, they start getting things wrong that they never previously had a problem with. This is a huge issue, and one I have had many a long conversation with concerned parents about. This fascinating paper sheds some light on the issue. The author was specifically looking at UK students in their final two years of compulsory schooling (i.e. Key Stage 4), which he describes as being “of critical importance to the future life trajectory of the student”. Firstly, it was clear that tests induce anxiety in students, and that this anxiety is likely to inhibit performance. One explanation proposed for this is that the "worry" component of anxiety takes up valuable working memory space, which makes the processing of the kind of complex tasks you are likely to find on a high-stakes exam more difficult. However, contrary to his predictions, Putwain did not find that the higher-stakes exams produced the most anxiety; instead he found the exam with the lowest stakes produced the highest anxiety. But what did he mean by that? Well, Putwain labelled a mock exam as his lowest-stakes variable. It is possible that mock exams could be highly anxiety-producing for students if their perception of these exams was that current performance will predict future performance. In this situation the way in which the “mock” exams were presented to the students would be very important - "I got an E on this mock, that is what I am going to get in the real thing, my life is officially over!"
Also, in Putwain’s research all self-reported anxiety measures were administered after taking the exam in question. Students taking mock exams may realise how much more they need to prepare, which may have caused the higher self-reported anxiety levels after the mock exams in his study. So, what are we to make of all this? Well, firstly it has made me acutely aware that test anxiety is a real and serious thing, with implications both for students' psychological well-being and their performance on tests. Secondly, the finding that "lower-stakes" tests seemingly produce more anxiety than their high-stakes cousins has in fact made me more convinced of the validity and importance of an increase in low-stakes testing! Let me try to explain. Students in the study exhibited the most anxiety from mock exams. Why? Well, possibly because they were not used to taking exams, and this was the first time their ability to retrieve knowledge was put to the test. We have seen how reading notes makes it far easier to convince yourself you know something versus explicitly testing retrieval. So, it is no surprise that when students struggled with retrieval in the mocks, got a rubbish mark that they never saw coming ("but I knew it all when I read it last night, sir") and then realised their actual exams were a matter of weeks away, panic ensued. How to combat this? For me, it is regular low-stakes tests. These have all the benefits explained earlier on in this section (identifying areas of weakness, making future study more effective, and even causing learning), with the added bonus that they are likely to get students more comfortable with testing as a concept, and hence hopefully reduce the overall level of anxiety. We are never likely to remove test anxiety completely, and possibly nor should we strive to. It is important students take high-stakes tests seriously; we just do not want their performance to be inhibited by something that we can possibly control.
My favourite quote:
Later models such as the ‘processing efficiency’ theory (Eysenck & Calvo, 1992) suggest that the additional demands on working memory resources made by task-irrelevant worry cognitions may reduce processing efficiency, but not necessarily the effectiveness. A highly test anxious student could maintain effectiveness on tasks requiring low working memory demands with extra effort to compensate for lowered efficiency. A decline in processing effectiveness would only be predicted on assessment demands making heavy demands on working memory resources (e.g. difficult questions, high memory load, tasks involving coordinative complexity, etc.). Only under these conditions, would a decline in task performance manifest.


Research Paper Title: Both Multiple-Choice and Short-Answer Quizzes Enhance Later Exam Performance in Middle and High School Classes
Author(s): Kathleen B. McDermott, Pooja K. Agarwal, Laura D’Antonio, Henry L. Roediger, III, and Mark A. McDaniel
My Takeaway:
This paper is key for proponents of low-stakes testing. There is little surprise that the authors found that practising retrieval of recently studied information enhances the likelihood of the learner retrieving that information in the future, as we have seen the benefits of testing for retrieval throughout this section. But the key difference with this paper is that the format of the quiz (multiple-choice or short-answer) did not need to match the format of the critical test (e.g. end of unit exam) for this benefit to emerge. This supports the research related to deliberate practice, whereby the activities involved in practice do not need to exactly replicate the final performance for the practice to be effective. The authors also find that frequent classroom quizzing with feedback improves student learning and retention, and multiple-choice quizzing is as effective as short-answer quizzing for this purpose. There is more discussion of the merits of multiple-choice questioning in the Formative Assessment section, but a key takeaway for me here is that both multiple-choice questions and short-form skill-based questions are ideal to use in the classroom for regular low-stakes testing.
My favourite quote:
First, we consistently observed that the format of the quizzes did not have to match the format of the unit exam for the quizzing benefits to occur. This is the most important finding of the present report, in that it is novel, was unanticipated from the laboratory literature, and is a critically important practical point for teachers. Even quick, easily administered multiple-choice quizzes aid student learning, as measured by unit exams (either in multiple-choice or short-answer format). Further, the benefits were long lasting: Robust effects were seen on the end-of-semester exams in Experiments 1a, 1b, and 2; that is, both multiple-choice and short-answer quizzing enhanced performance on end-of-semester class exams (again, in both multiple-choice and short-answer formats).

Research Paper Title: The Pretesting Effect: Do Unsuccessful Retrieval Attempts Enhance Learning?
Author(s): Lindsey E. Richland, Nate Kornell and Liche Sean Kao
My Takeaway:
So far we have seen the power of testing to enhance retrieval and long-term learning. If that wasn't impressive enough, the next two papers in this section highlight another, rather surprising, benefit of testing - the power of the Pretest. Firstly, unlikely as it sounds, generating a wrong answer increases our chances of learning the right answer. In this study, one group of students was given the text on which they would be tested (passages with key facts marked), a second group was given the opportunity to memorise the questions they would be asked, a third group was given an extended study period, and a fourth group was given a pretest. Even though they got almost every pretest question wrong, students in the pretesting group out-performed all other groups on a final test, including those students who had been allowed to memorise the test questions. It would seem that the act of unsuccessfully attempting to answer questions has a greater effect on learning than studying the questions on which you are to be tested. The simple takeaway here is that it doesn't matter if students get things wrong - it is the act of attempting to retrieve the answer that is the key. However, we are once again faced with the same Learning v Performance dilemma - a dip in short-term performance (students getting the initial questions wrong) is the price to be paid for an increase in long-term learning. This, of course, needs communicating to teachers and students, and is expressed beautifully in the quote below.
My favourite quote:
When a learner makes an unsuccessful attempt to answer a question, both learners and educators often view the test as a failure, and assume that poor test performance is a signal that learning is not progressing. Thus, compared with presenting information to students, which is not associated with poor performance, tests can seem counterproductive. Tests are rarely thought of as learning events, and most educators would probably assume that giving students a test on material before they had learned it would have little impact on student learning beyond providing teachers with insight into their students’ knowledge base. In terms of long-term learning, however, unsuccessful tests fall into the same category as a number of other effective learning phenomena - providing challenges for learners leads to low initial test performance, thereby alienating learners and educators, while simultaneously enhancing long-term learning. The current research suggests that tests can be valuable learning events, even if learners cannot answer test questions correctly, as long as the tested material has educational value and is followed by instruction that provides answers to the tested questions.

Research Paper Title: Unsuccessful Retrieval Attempts Enhance Subsequent Learning
Author(s): Nate Kornell, Matthew Jensen Hays, and Robert A. Bjork
My Takeaway:
The previous paper outlined the potential benefit to long-term learning of the Pretest Effect, but surely being given a test on something students have never studied before is a complete waste of time? Surely students will just end up guessing, and we all know that guessing is pointless. Well, apparently not! This study sought to answer this question: does an unsuccessful retrieval attempt impede future learning or enhance it? The authors examined this question using materials that ensured that retrieval attempts would be unsuccessful. They found that as long as students are given appropriate feedback, they will still see a testing benefit even if they get the answer wrong on a pretest. So, here we have a crucial consideration - the benefits of the Pretesting Effect are only realised if immediate feedback is given. Students need to know they are wrong, and what the right answer is. Often I give my students some form of baseline test before starting a new topic. More often than not, as discussed in the Formative Assessment section, this involves asking a series of Diagnostic Questions at the start of my lesson, listening to their answers and explanations, and adapting my teaching accordingly. The benefit of this approach (or so I had assumed) was purely that I could get a sense of the current levels of understanding in the class, address any misconceptions, and move on to the new topic when I deemed the class ready. Amazingly, it seems that the students were also benefiting from this approach, even if they were being tested on material they had never encountered before. So, a written baseline test before teaching a new topic is likely to have a positive effect, so long as you go through the answers immediately with students. But, if you use it in a formative assessment setting, then the benefits are likely to be even greater.
One word of caution: students may become demoralised if you give them a test on material they cannot do (and indeed should not be expected to do if they have not studied it before). My advice here is to explain to them, openly and honestly, that this test is purely for you, their teacher, to find out where they are at so you can better help them, and that the mere act of them attempting to answer the questions is actually helping them learn.
My favourite quote:
The current findings support Izawa’s (1970) argument that tests potentiate the learning that occurs when an answer is presented after a test, even if the test is unsuccessful. The results also suggest that, in situations where tests and study opportunities are interleaved or testing is followed by feedback, the benefits of testing go beyond the benefits attributable to the learning that happens on successful tests. With respect to theoretical explanations of the testing effect, this finding is important because it demonstrates that the benefits of testing are not limited to the benefits of successful retrieval; rather, for a theory to fully explain the benefits of tests, it needs to explain the benefits of retrieval failure as well as the benefits of retrieval success. Successful tests obviously play a role, and perhaps a unique role—the findings do not imply that unsuccessful tests and successful tests are equally effective or that they are necessarily effective for the same reasons—but unsuccessful tests can also have a positive effect on long-term retention.

Research Paper Title: The generation effect: A meta-analytic review
Author(s): Sharon Bertsch, Bryan J Pesta, Richard Wiscott and Michael A McDaniel
My Takeaway:
Closely related to the Testing Effect comes the Generation Effect. In a typical generation experiment, participants are asked either to generate the to-be-learned items themselves, for example by producing opposites when presented with a word (e.g., hot–???), or to simply read the items (e.g., long–short). A later retention test is then administered, which usually consists of presenting the cues (hot–???, long–???) and asking participants to recall their corresponding targets (cold, short). The finding from this extensive meta-analysis is that when people learn material by generating components of it themselves, the effect on long-term retention is far greater than when they simply read it. Once again, we are faced with the issue that most of the experiments done are not in the realm of mathematics, but the number of rules that need to be learned in maths suggests that a strategy such as this, which improves recall, could prove very useful. Something as simple as providing students with a set of notes (on the rules of fractions, for example) and leaving out key words for them to fill in themselves is likely to be far more beneficial than a set of notes without the gaps, as may be found in a revision guide.
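The gapped-notes idea is easy to prototype. The sketch below is a minimal Python illustration under stated assumptions: the function name `blank_out_keywords`, the choice of blank marker and the sample sentence about fractions are all made up for this example and do not come from the meta-analysis. It replaces each chosen key word in a set of notes with a blank and keeps the removed words as an answer list.

```python
import re


def blank_out_keywords(notes, keywords, blank="_____"):
    """Return (gapped_notes, answers): notes with keywords blanked, answers in order."""
    if not keywords:
        return notes, []
    # Try longer keywords first so "common denominator" wins over "denominator".
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in alternatives) + r")\b",
        flags=re.IGNORECASE,
    )
    answers = []

    def replace(match):
        # Remember the word that was removed so it can go on an answer sheet.
        answers.append(match.group(0))
        return blank

    return pattern.sub(replace, notes), answers


if __name__ == "__main__":
    # Illustrative notes only; swap in your own text and key words.
    notes = "To add fractions, first find a common denominator, then add the numerators."
    gapped, answers = blank_out_keywords(notes, ["common denominator", "numerators"])
    print(gapped)
    print("Answers:", ", ".join(answers))
```

Handing out the gapped version and going through the answer list immediately afterwards keeps the activity in line with the feedback point made in the pretesting papers above.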