The Problem With Standardized Tests: The Kobayashi Maru Theorem
This is going to be an education rant in the guise of a science fiction rant, or maybe a science fiction rant in the guise of an education rant – your choice, really. My day job is as an educator and one of my former education-related jobs was working for a test prep company, teaching students how to take the SAT. I have a lot of opinions about tests, not all of them bad, but I will say this: standardized tests, in order to function as designed, need completely unrealistic infrastructure and, for that reason, they should be universally abolished or completely changed.
By way of demonstrating what I’m talking about, let’s talk about that iconic science fiction test, the Kobayashi Maru of Starfleet Academy.
For those of you not in the know, the Kobayashi Maru is a bridge crew simulation that Starfleet runs for its cadets, placing them in an impossible, unwinnable situation for the purpose of seeing how the cadets will react under such pressures. On the surface, this is a wholly reasonable and even intelligently designed test, very much in the vein of what a standardized test aspires to be: a test that can be applied equally to everyone and that generates completely unbiased results, allowing you to evaluate all students who take it on equal terms. It’s also the purest of test-design fiction – it literally cannot exist as displayed and actually work.
The reason for this is very simple: as soon as students learn how the test works (and they inevitably will, since students always talk to other students about tests), the test will cease to be an accurate measure of a cadet’s capabilities because cadets will know it’s a no-win scenario going in. This will necessarily change their behavior towards the test and will, therefore, throw off the results. So, sure, for the first few years (if we’re being generous) the Kobayashi Maru will be a perfectly reasonable test because no one will actually know it is no-win, but before long somebody will find out. Once they find out, it is in their interest (1) that the test not change and (2) that they guide their friends in how to take it. Furthermore, instructors – whose capabilities will likewise be judged by how their students perform on the test – will inevitably skew their instruction (even surreptitiously) to reflect the qualities the test aims for.
Before long, certainly long before the events of The Wrath of Khan, the test would be bracketed by all manner of pedagogical apparatuses that serve to help students perform well on the test, and that isn’t even counting the cheaters like Kirk, who would doubtless be more numerous.
The solution, of course, is for Starfleet to change the test somehow, but even if they change that test, if it is designed to test the same thing (a no-win scenario), it will inevitably be vulnerable to the same kind of gaming as it was before. Test gamesters will just have to modulate their strategies somewhat, and the Academy will be right back where it started.
This is exactly the problem real-world standardized testing faces. There is nothing inherently wrong with a standardized test. A standardized test attempts to be an assessment tool that will tell you (with some degree of accuracy) the aptitude of any student in some specific set of skills, regardless of who they are or when they take the test. All SAT results, in other words, are supposed to be comparable with all other SAT results. This is a useful tool! Everyone comes from a different school system and is taught by different teachers, and, no matter how hard anyone tries, GPA is not and just cannot be a completely even or universally measurable kind of assessment (some people’s schools are easier/harder than others! Some schools “don’t believe” in GPA! Etc., etc.). Given all that, having some kind of universal yardstick by which to assess everybody is great!
But it can’t work! And here’s why:
1: It only works if nobody knows HOW it works
Here’s the thing: multiple-choice standardized tests are a game. They are a game because they are (and sort of have to be) graded by machines and they have to assess all the exact same skills in the exact same proportions. Once you “solve” how the game works, the test becomes monumentally easier. Like the Kobayashi Maru, it is inevitable that people will figure out how the game works and, once they do that, they throw off all the results, since the results are designed to compare your performance against everyone else’s performance. So, unless you can keep the content of your test some kind of state secret (and good luck with that!), any given standardized test is only good for as long as it takes for the test takers to figure out the rules.
2: The people who know how it works will inevitably be the best connected/wealthiest people
Okay, fine – so suppose your test has been cracked by somebody. The damage, at least, would be somewhat mitigated if everyone had access to the tools needed to crack the test themselves. That’s never the case, though! The people who will be taught to crack the test will inevitably be the ones who can afford the tutors or who happen to have the connections or live in the kinds of privileged communities that get these kinds of advantages. This is NOT everyone, and this inequality automatically invalidates the test results, since some people are actually taking the test (the regular folks) and others are simply cracking the game behind the test (the test preppers). You can’t have an accurate standardized test that is testing two different cohorts of people in two different ways for two different things. Want to know something funny? The students who routinely did the worst on SAT math were almost always the best at math in all other environments. Why? They didn’t see the game. The best way to do well on SAT math is to do as little math as possible. Don’t believe me? Well, this lazy math student scored in the 95th percentile on SAT math by doing just that. And I did it several times over.
3: The test cannot be fundamentally changed without invalidating its own existence
“Just change the test” sounds like a great plan, but if you fundamentally change how the test works, you automatically invalidate all the preceding test scores. In other words, if part of the purpose of a test like the SAT (or MCAS or TOEFL or LSAT or MCAT or whatever) is to produce scores that can be compared (and this is their purpose), then changing how the test works means your test has lost the very thing that makes it useful.
4: Testing warps instruction!
Because these tests are so important and because they are also so crackable, teachers have a vested interest in teaching students to crack the test, knowing (as I point out in #2) that not all of their students will have the resources to crack it on their own. So, instead of actually learning things, students learn how to take a test. Pretty much everybody knows about this problem at this point – it’s cliché to even point it out – but it is also 100% true. Students who are taught to take tests have less knowledge, fewer skills, and impaired critical thinking when placed against previous generations who were not saddled with these things. I know this because I’ve been teaching at the collegiate level for 15 years and have watched both the amount of testing rise and the quality of incoming students drop simultaneously. Granted, that’s anecdotal – maybe I’m wrong (I hope I am!) – but I somewhat doubt it.
5: A perfect test doesn’t exist in the first place!
And all of this is just assuming the test is actually able to test the thing it claims to! Sure, a well-designed standardized test might give us an accurate picture of the average, neurotypical student, but this hardly covers everyone! Furthermore, I sort of doubt there’s any reliable way you could get a standardized test to apply equally to everybody – people, and how they think or approach test taking, are just too different.
So, if Starfleet Academy is still giving the Kobayashi Maru after however many years it takes for Kirk to go from being a cadet to being on the verge of retirement, it’s safe to assume that it is no longer performing the function it once did. It should be abolished and replaced with something new. Furthermore, we should reassess the need to compare students to other students in these kinds of universal, simplistic ways. When looking to the future, we should try to imagine something more nuanced, more accurate, and fairer. You know, the sort of thing the Federation might cook up.