(Reuters) – American high school students are terrible writers, and one education reform group thinks it
has an answer: robots.
Or, more accurately, robo-readers – computers programmed to scan student essays and spit out a grade.
The theory is that
teachers would assign more writing if they didn’t have to read it. And the more writing students do, the better at it
they’ll become – even if the primary audience for their prose is a string of algorithms.
That sounds logical to Mark
Shermis, dean of the College of Education at the University of Akron. He’s helping to supervise a contest, set up by the
William and Flora Hewlett Foundation, that promises $100,000 in prize money to programmers who write the best automated
grading software.
“If you’re a high school teacher and you give a writing assignment, you’re walking home with 150
essays,” Shermis said. “You’re going to need some help.”
But help from a robo-reader?
“Wow,” said Thomas Jehn,
director of the Harvard College Writing Program. He paused a moment.
“It’s horrifying,” he said at last.
Automated essay grading was first proposed in the 1960s, but computers back then were not up to the task. In the
late 1990s, as technology improved, several textbook and testing companies jumped into the field.
Today, computers are
used to grade essays on South Dakota’s student writing assessments and a handful of other high-stakes exams, including the
TOEFL test of English fluency, taken by foreign students.
But machines do not grade essays on either the SAT or the
ACT, the two primary college entrance exams. And American teachers by and large have been reluctant to turn their students’
homework assignments over to robo-graders.
The Hewlett contest aims to change that by demonstrating that computers can
grade as perceptively as English teachers – only much more quickly and without all that depressing red ink.
Automated
essay scoring is “nonjudgmental,” Shermis said. “And it can be done 24/7. If students finish an essay at 10 p.m., they get
feedback at 10:01.”
Take, for instance, the Intelligent Essay Assessor, a web-based tool marketed by Pearson
Education, Inc. Within seconds, it can analyze an essay for spelling, grammar, organization and other traits and prompt
students to make revisions. The program scans for key words and analyzes semantic patterns, and Pearson boasts it “can
‘understand’ the meaning of text much the same as a human reader.”
Jehn, the Harvard writing instructor, isn’t so sure.
He argues that the best way to teach good writing is to help students wrestle with ideas; misspellings and
syntax errors in early drafts should be ignored in favor of talking through the thesis. “Try to find the idea that’s
percolating,” he said. “Then start looking for whether the commas are in the right place.” No computer, he said, can do
that.
What’s more, Jehn said he worries that students will give up striving to craft a beautiful metaphor or
insightful analogy if they know their essays will not be read, but scanned for a split second by a computer
program.
“I like to know I’m writing for a real flesh-and-blood reader who is excited by the words on the page,” Jehn
said. “I’m sure children feel the same way.”
Even supporters of robo-grading acknowledge its limitations.
A prankster could outwit many scoring programs by jumbling key phrases in a nonsensical order. An essay about
Christopher Columbus might ramble on about Queen Isabella sailing with 1492 soldiers to the Island of Ferdinand – and
still be rated as solidly on topic, Shermis said.
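A minimal sketch of why jumbled key phrases can slip through, assuming a hypothetical scorer that only checks which topic words appear and ignores their order. The keyword list and the `topic_score` function are illustrative assumptions, not any vendor's actual method.

```python
import re

# Hypothetical keyword list for a Columbus essay prompt (an assumption for illustration).
TOPIC_WORDS = {"columbus", "isabella", "ferdinand", "1492", "sailing", "island"}


def topic_score(essay: str) -> float:
    """Return the fraction of topic keywords that appear anywhere in the essay."""
    tokens = set(re.findall(r"[a-z0-9]+", essay.lower()))
    return len(TOPIC_WORDS & tokens) / len(TOPIC_WORDS)


# Two made-up essays: one coherent, one scrambled, both using the same keywords.
sensible = "Columbus, backed by Isabella and Ferdinand, went sailing in 1492 and reached an island."
nonsense = "Queen Isabella went sailing with 1492 soldiers to the Island of Ferdinand, said Columbus."

print(topic_score(sensible), topic_score(nonsense))
```

Because the score depends only on which words are present, both essays are rated equally on topic. A scorer would need to model word order or meaning to tell them apart.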
Computers also have a hard time dealing with experimental prose. They favor conformity
over creativity.
“They hate poetry,” said David Williamson, senior research director at the nonprofit Educational
Testing Service, which received a patent in late 2010 for an Automatic Essay Scoring System.
But Williamson argues
that automated graders aren’t meant to identify the next James Joyce. They don’t judge artistic merit; they measure how
effectively a writer communicates basic ideas. That’s a skill many U.S. students lack. Just one in four high-school seniors
was rated proficient on the most recent national writing assessment.
The Hewlett Foundation kicked off its
robo-grading contest by testing several programs already on the market. Results won’t be released for several weeks, but
Hewlett officials said the programs did very well.
Hewlett then challenged amateurs to come up with their own algorithms.
The contest, hosted on the data science website Kaggle.com, has drawn hundreds of competitors from all
walks of life. They have until April 30 to write programs that will judge essays studded with awkward phrases such as, “I
slouch my bag on to my shoulder” or “When I got my stitches some parts hurted.”
The goal is to get the computer to
give each essay the same score a human grader would.
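One simple way to measure that goal is the share of essays on which the machine and the human give the identical score. The sketch below assumes integer scores; the function name and the sample scores are made up for illustration, and the article does not say which agreement measure the contest uses.

```python
def exact_agreement(machine, human):
    """Return the fraction of essays where machine and human scores are identical."""
    if len(machine) != len(human):
        raise ValueError("score lists must have the same length")
    if not machine:
        raise ValueError("score lists must not be empty")
    matches = sum(m == h for m, h in zip(machine, human))
    return matches / len(machine)


# Invented scores for five essays, for illustration only.
human_scores = [3, 4, 2, 5, 3]
machine_scores = [3, 4, 3, 5, 2]

print(exact_agreement(machine_scores, human_scores))
```

Exact agreement treats a one-point miss the same as a four-point miss, so a real evaluation would likely use a measure that weights the size of the disagreement.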
Martin O’Leary, a glacier scientist at the University of
Michigan, has been working on the contest for weeks.
Poring over thousands of sample essays, he discovered that human
graders generally don’t give students extra points for using sophisticated vocabulary. So he scrapped plans to have his
computer scan the essays for rare words.
Instead, he has his robo-grader count punctuation marks. “The number of
commas is a very strong predictor of score,” O’Leary said. “It’s kind of weird. But the more, the better.”
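Counting punctuation is easy to sketch. The example below only shows how such a feature might be extracted from an essay. The function name and sample sentence are assumptions for illustration, and the article does not describe how O’Leary’s program turns the counts into a score.

```python
import string
from collections import Counter


def punctuation_counts(essay: str) -> Counter:
    """Count each ASCII punctuation mark that appears in the essay."""
    return Counter(ch for ch in essay if ch in string.punctuation)


# Made-up sample sentence, for illustration only.
sample = "First, we packed. Then, after lunch, we left!"
counts = punctuation_counts(sample)

print(counts[","])  # number of commas in the sample
```

A count like this would be one input among many to a scoring model, which would still have to learn from human-graded essays how much weight to give it.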
As he
digs into the data, O’Leary has run into a dismaying truth: The human graders he’s trying to match are inconsistent. They
disagree with one another on the merits of a given essay. They award scores that seem random. Indeed, studies have shown that
human readers are influenced by factors that should be irrelevant, such as how neatly a student writes.
“The reality
is, humans are not very good at doing this,” said Steve Graham, a Vanderbilt University professor who has researched essay
grading techniques. “It’s inevitable,” he said, that robo-graders will soon take over.
O’Leary won’t mind when that
day comes. He tests his program against student prose that has already been graded by a teacher. When the scores diverge,
O’Leary reads the essay to find out why.
“More often than not,” he said, “I agree with the computer.”
(Editing by Jonathan Weber and Philip Barbara)