# Reflections on My First Year of Standards-Based Assessment, Part 3

###### image credit: *http://laurentnajman.org*

In my final post reflecting on last year’s shift to SBAR (Parts 1 and 2 can be found here and here), I’d like to take a look at some data. It should be noted up front that the sample size we’re dealing with is *very* small (i.e., two years) since 2012/13 is my only comparable pre-SBAR year, at least as far as calculus is concerned. During 2011/12, I was using a drastically different curriculum (which ended up becoming the basis for my current Applied Calculus course, perhaps soon to become Mathematical Modeling – more to come on that later). Since I just started in the math department halfway through 2010/11, I don’t think any comparisons with that year would be fair. Prior to 2010, my calculus teaching was done at the undergraduate level for a very different student population. That said, while I don’t want to read *too* much into what follows, I do think it bears examination. Time to dive into this data rabbit hole.

**Final Grades**

First let’s compare the distribution of final grades from 2013/14 (SBAR) to those from 2012/13 (pre-SBAR). At the beginning of the year, I would have confidently predicted that the grades were going to shift toward As since students would be allowed to reassess as often as they wanted in pursuit of a higher score. Here’s what actually shook out: (Note that there were two sections of Calculus in 2013/14 and only one in 2012/13, hence the significant difference in totals.)

Scaled up for the increased number of students, the distribution looks nearly identical. For both years, the median and mean final grades were in the B range. When I saw how the grades were shaping up towards the end of the first semester (that is, how similar they looked to previous years), I was initially surprised. Given how much control students had over their grades, I really did expect them to be pushing for As. Instead it seemed like many students were satisfied with Bs. I certainly had several students I thought capable of earning As, but I just couldn’t seem to motivate them enough to put in the extra effort. Similarly, there were some students who earned Cs who could have pulled off Bs. Perhaps I need to work on better motivating students, or perhaps my standards are so high that some students don’t feel like the reward is worth the amount of work required. Probably both.

While the grade distribution didn’t seem to change in any significant way, I am more confident that the grades from last year do a better job of reflecting how much students know and can do. Due to my sensitivity to the destructive capability of a zero, prior to SBAR I graded in such a way as to not absolutely wreck anyone’s grade so long as they were exhibiting effort (or appearing to, at any rate). When grading tests and quizzes, I awarded enough partial credit so that a (hypothetical) student who was absolutely clueless but at least tried things that looked like what we’d discussed in class would earn close to 50%. This means that I was really just compressing the grading scale to 50–100, but it also means that everything I graded had some built-in padding that distorted the apparent level of student understanding. Now I am much more confident that when a student earns a B, I can tell you at what level they understand what percentage of the material (and with a quick glance at my gradebook, I can tell you about the specific topics as well).

**Exam Score Prediction**

It seems reasonable to think of a student’s average going into an exam as a predictor of how they should perform on the exam. Given that context, let’s examine how well my students’ pre-exam averages did in predicting their exam scores by looking at the difference between the two quantities for each student and seeing how many students ended up within certain point ranges of the prediction. I took care to look at each semester separately for a number of reasons, not the least of which is that second-semester seniors tend to be a bit less concerned about exams than usual, so I don’t think it’s fair to compare Fall apples with Spring oranges.
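The computation behind this comparison can be sketched in a few lines. Here’s a minimal sketch using made-up numbers (the scores below are hypothetical, not my actual data set):

```python
# Hypothetical pre-exam averages and corresponding exam scores
# (illustrative only -- not the real class data).
pre_exam = [88, 92, 75, 81, 90, 68]
exam     = [83, 90, 72, 70, 88, 66]

# Prediction error: positive means the student beat the prediction,
# negative means the pre-exam average overshot the exam score.
errors = [e - p for p, e in zip(pre_exam, exam)]

mean_error = sum(errors) / len(errors)
within_5   = sum(abs(d) <= 5  for d in errors)  # within 5 points
within_10  = sum(abs(d) <= 10 for d in errors)  # within a letter grade

print(f"mean error: {mean_error:.1f}")
print(f"within 5 points: {within_5}/{len(errors)}")
print(f"within 10 points: {within_10}/{len(errors)}")
```

The sign convention matters: a negative mean error means pre-exam averages tended to overshoot exam performance, which is the pattern in both years below.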

The average errors in prediction from Fall 2012 and Fall 2013 were -4.2 and -3.6, respectively. So on average, pre-exam grades were marginally better at predicting fall exam scores under SBAR. Pre-SBAR, two thirds of students were within 5 points of their predicted score while only half were within 5 points under SBAR. Pre-SBAR, four fifths of students scored within 10 points (i.e., within a letter grade) of their predicted score. A slightly greater proportion of students were within the same range under SBAR (27 out of 32). All in all, not much difference.

Looking at the spring exam scores, on the other hand, is a bit worrisome.

The average errors in prediction from Spring 2013 and Spring 2014 were -7.6 and -8.8, respectively. So on average, pre-exam grades were about a point worse at predicting spring exam scores under SBAR. Pre-SBAR, one third of students were within 5 points of their predicted score while slightly fewer (10 out of 32) were within 5 points under SBAR. Pre-SBAR, two thirds of students scored within 10 points (i.e., within a letter grade) of their predicted score. Only half of the students were within the same range under SBAR.

The large number of students (half) whose pre-exam averages under SBAR overshot their spring exam scores by more than 15 points bothers me. There *are* other factors at play, so perhaps this is no cause for concern. For instance, the subject order in which our semester exams are given is determined by a rotating schedule. Math exams were first in Fall 2012 and second to last in Spring 2014, so it could be that the bump on the left side of the histogram represents exhausted students who did not prepare or perform as well as a result. Perhaps students who were better able to stave off the fatigue account for the boost on the right side, which is also abnormally large (judging by our admittedly small data set). On the other hand, maybe my system exhausted the students by the end, or maybe I just didn’t do a very good job with the second semester material. Next spring the math exam will rotate back around to being first, so time will tell. I also plan to disaggregate student exam performance on individual learning goals for my own records, so I think that will lead to better data in the future.

The complete exam score/pre-exam average data set can be found here. Note that for 2013/14, when assigning a numeric grade before exams, I gave the highest grade in the range for whatever letter students had earned (e.g., an A- would receive a grade of 92 rather than 90 or 91). This was done to provide students with the benefit of the doubt when averaging pre-exam grades with exam scores as well as when averaging semester grades at the end of the year.
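The benefit-of-the-doubt conversion amounts to a lookup table from letter grades to the top of each numeric range. A sketch, assuming a conventional 10-point scale with +/- cutoffs (only the A- = 92 case is from my actual scheme; the other cutoffs here are illustrative):

```python
# Hypothetical top-of-range mapping; only "A-" -> 92 is taken from
# the actual grading scheme described above.
TOP_OF_RANGE = {
    "A": 100, "A-": 92,
    "B+": 89, "B": 86, "B-": 82,
    "C+": 79, "C": 76, "C-": 72,
}

def pre_exam_numeric(letter):
    """Return the highest numeric grade in the letter's range,
    giving the student the benefit of the doubt before averaging."""
    return TOP_OF_RANGE[letter]

print(pre_exam_numeric("A-"))
```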

Because predictions which are too low are just as bad as (or maybe worse than) those which are too high, perhaps I should be looking at a standard deviation instead. Or perhaps the idea of thinking of pre-exam average as a predictor of exam score is fundamentally flawed. Maybe giving much attention at all to exam performance is itself problematic. After all, part of what led me to SBAR in the first place was my uneasiness with giving significant weight to a student’s performance on a single day. On the other hand, we do need a meaningful summative measure of student achievement and perhaps this points to an opportunity for improvement in my exam-crafting.
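To make the distinction concrete: the mean error lets over- and under-predictions cancel, while standard deviation and root-mean-square error penalize misses by magnitude regardless of sign. A quick sketch with hypothetical errors:

```python
import math

# Hypothetical prediction errors (exam score minus pre-exam average);
# illustrative values, not the real data.
errors = [-5, 3, -12, 1, -8, 4]
n = len(errors)

# Mean error: over- and under-predictions cancel each other out.
mean = sum(errors) / n

# Standard deviation: spread of the errors around the mean error.
std_dev = math.sqrt(sum((d - mean) ** 2 for d in errors) / n)

# RMSE: penalizes every miss by its magnitude, regardless of sign.
rmse = math.sqrt(sum(d ** 2 for d in errors) / n)

print(f"mean error: {mean:.2f}, std dev: {std_dev:.2f}, RMSE: {rmse:.2f}")
```

A mean error near zero with a large RMSE would mean the predictions are unbiased but unreliable, which is a different problem than the systematic overshoot seen in the spring data.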

In any event, this is what I’ve thought to look at so far. Does any of it point to anything significant or useful? Since the grading policy had to be shoe-horned into SBAR, do the grade outcomes say anything about the efficacy of SBAR? What else bears examination? What should I be looking at next year as I roll out SBAR in all three of my regular courses?
