I was at dinner with a colleague this week, midterm week. Predictably, talk turned to the scourge of all professors: grading essays. There are few tasks in the life of a college professor less fulfilling than grading student essays. Every once in a while a really good essay jolts me to consciousness. I am elated by such encounters. To be honest, however, reading essays is for the most part stultifying. This is not the fault of the students, many of whom are brilliant and exuberant writers. I find it trying to wade through 25 essays discussing the same book, offering varying opinions and theories, while keeping my attention and interest. How many different ways can one ask for a thesis, talk about the importance of transition sentences, and correct grammar? For some time it is fun, in a way. One learns new things and is captivated by comparing how bright young minds see things. But after years, grading essays becomes the worst part of a great job.
So how might my colleagues and I react to news that EdX, the influential Harvard-MIT-led consortium offering online courses, has developed software that will grade college student essays? I imagine it is sort of like how people felt when the dishwasher was invented. You mean we can cook and feast and don’t have to scrub pots and wash dishes? It promises to allow us to focus on teaching well without having to do that part of our job that we truly dread.
The appeal of computer grading is obvious and broad. Not only will many professors and teachers be freed from unwanted tedium, but also it may help our students. One advantage of computer grading is that it is nearly instantaneous. Students can hand in their work and get a grade and feedback seconds later. Too often essays are handed back days or even weeks after they are submitted. By then the students have lost interest in their paper and forgotten the inspiration that breathed life into their writing. To receive immediate feedback will allow students to see what they did wrong and how they could improve while the generative impulse underlying the paper is still fresh. Computer grading might encourage students to turn in numerous drafts of a paper; it may very well help teach students to write better, something that professorial comments delivered after a week rarely accomplish.
Another putative advantage of computer grading is its objectivity and consistency. Every professor knows that it matters when we read essays and in what order. Some essays find us awake and attentive. Others meet my eyes as they struggle to remain open. As much as I try to ignore the names on the top of the page, I can’t deny that my reading and grading are personalized to the students. I teach at a small liberal arts college where I know the students. If I read a particularly difficult sentence by a student I have come to trust, I often make a second effort. My personal attention has advantages but it is of course discriminatory. The computer will not do that, which may be seen by some as more fair. What is more, the computer doesn’t get tired or need caffeine.
Perhaps the most important advantage for administrators considering these programs is the cost savings. If computers relieve professors of the burden of grading, that means professors can teach more. It may also mean that fewer TAs are necessary in large lecture courses, thus saving money for strapped universities. There may even be a further side benefit to these programs. If universities need fewer TAs to grade papers, they may admit fewer graduate students to their programs, thus going some way towards alleviating the extraordinary and irresponsible over-production of young professors that is swelling the ranks of unemployable Ph.D.s.
There are, of course, real worries about computer grading of essays. My concern is not that the computers will make mistakes (so do I); or that we lack studies that show that computers can grade as well as human professors—for I doubt professors are on the whole excellent graders. The real issue is elsewhere.
According to the group “Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment,” the problem with computer grading of essays is simple: Machines cannot read. Here is what the group says in a statement:
Let’s face the realities of automatic essay scoring. Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.
What needs to be taken seriously is not that computers can’t grade as well as humans. In many ways they grade better. More consistently. More honestly. With less grade inflation. And more quickly. But computer grading will be different from human grading. It will be less nuanced and aspire to clearly defined criteria. Are sentences grammatical? Is there a clear statement of the thesis? Are there examples given? Is there a transition between sentences? All of these are important parts of good writing and the computer can be trained to look for these characteristics in an essay. What this means, however, is that computers will demand the kind of clear, precise, and logical writing that computers can understand and that many professors and administrators demand from students. What this also means, however, is that writing will become more mechanical.
There is much to be learned here from an analogy with the rise of computer chess. The great grandmaster Garry Kasparov, who famously lost to Deep Blue, has perceptively argued that machines have changed the ways chess is played and redefined what a good chess move and a well-played chess game look like. As I have written before:
The heavy use of computer analysis has pushed the game itself in new directions. The machine doesn’t care about style or patterns or hundreds of years of established theory. It counts up the values of the chess pieces, analyzes a few billion moves, and counts them up again. (A computer translates each piece and each positional factor into a value in order to reduce the game to numbers it can crunch.) It is entirely free of prejudice and doctrine and this has contributed to the development of players who are almost as free of dogma as the machines with which they train. Increasingly, a move isn’t good or bad because it looks that way or because it hasn’t been done that way before. It’s simply good if it works and bad if it doesn’t. Although we still require a strong measure of intuition and logic to play well, humans today are starting to play more like computers. One way to put this is that as we rely on computers and begin to value what computers value and think like computers think, our world becomes more rational, more efficient, and more powerful, but also less beautiful, less unique, and less exotic.
Much the same might be expected from the increasing use of computers to grade (and eventually to write) essays. Students will learn to write in ways expected from computers, just as they today try to learn to write in ways desired by their professors. The difference is that different professors demand and respond to varying styles. Computers will consistently and logically drive writing towards a more mechanical and logical style. Writing, like chess playing, will likely become more rational, more efficient, and more effective, but also less beautiful, less unique, and less eccentric. In other words, writing will become less human.
It turns out that many secondary school districts already use computers to grade essays. But according to John Markoff in The New York Times, the EdX software promises to bring the technology into college classrooms as well as online courses.
It is quite possible that in the near future, my colleagues and I will no longer have to complain about grading essays. But that is unlikely at Bard. More likely is that such software will be used in large university lecture courses. In such courses with hundreds of students, professors already shorten questions or replace essays with multiple-choice tests. Or they use armies of underpaid graduate students to grade these essays. It is quite likely that software will actually augment the educational value of writing assignments at college in these large lecture halls.
In seminars, however, and in classes at small liberal arts colleges like Bard where I teach, such software will not likely free my colleagues and me from reading essays. The essays I assign are not simple responses to questions in which there are clear criteria for grading. I look for elegance, brevity, insight, and the human spark (please no comments on my writing). Whether or not I am good at evaluating writing or at teaching writing, that is my aspiration. I seek to encourage writing that is thoughtful rather than writing that is simply accurate. When I have time to make meaningful comments on papers, they concern structure, elegance, and depth. It is not only a way to grade an essay, but also a way to connect with my students and help them to see what it means to write and think well.
And yet, I can easily imagine making use of such a computer-grading program. I rarely have time to grade essays as well or as quickly as I would like. I would love to have my students submit drafts of their essays to the EdX computer program.
If they could repeatedly submit their essays and receive such feedback and use the computer to catch not only grammatical errors but also poor sentences, redundancies, repetitions, and whatever other mistakes the computer can be trained to recognize, that would allow them to respond and rework their essays many times before I see them. Used well, I hope, such grading programs might really augment my capacities as a professor and their experiences as students.
I have real fears that grading technology will rarely be used well. Rather, it will too often replace human grading altogether and, in large lectures, high schools, and standardized tests, will impose a new and inhuman standard on the way we write and thus the way we think. We should greet such new technologies enthusiastically and skeptically. But first, we should try to understand them. Towards that end, it is well worth reading John Markoff’s excellent account of the new EdX computer grading software in The New York Times. It is your weekend read.