MOOCs (Massive Open Online Courses) are slightly under fire these days because of their low completion rates, among other things – plenty of articles on that.
However, they are also bringing the world interesting insights about the assessment process (e.g. “did you learn something?”), such as those published in the attached paper. Because of the large (okay, massive) scale of these courses, different approaches to grading are needed if there’s a desire to go beyond the multiple choice question. This is what the paper studied in two iterations of a course with over 3,600 students each, submitting a total of 13,972 assignments.
The “untuned” way of peer assessment goes like this:
- Student creates an assignment like an essay
- Student submits the assignment digitally
- Assignment is graded by 3-4 other students in the class, blinded to the student’s demographics
- The student is also required to grade 3-4 other assignments
- Each student’s grade is the average of the 3-4 peer assessments
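The “untuned” scheme above comes down to a simple average. As a minimal sketch (the scores here are illustrative, not from the paper):

```python
# "Untuned" peer grading: a student's grade is simply the mean of the
# 3-4 peer scores their submission received. Nothing corrects for easy
# or harsh graders.

def untuned_grade(peer_scores):
    """Average the peer assessments for one submission."""
    return sum(peer_scores) / len(peer_scores)

print(untuned_grade([8, 9, 7]))  # three peers scored the essay out of 10 -> 8.0
```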
Completion rate of one of the courses I took – not so good, but I learned a ton
So far so good; it’s a great way to deal with the fact that a few instructors can’t grade 65,000 papers. I’ve actually done it myself, and I liked the process. And I’m not shy; here’s a link to my assignment (see: Finished my 3rd MOOC, and moving to renewable Energy in Washington, DC | Ted Eytan, MD – oops I should spell check my work next time, see what blogging does to me) and the peer grades attached to it are below (in the actually very good course I took on Climate Literacy).
This approach appears to break down, though, because some of my graders may be better students than others, some may tend to give everyone a good grade, some may not, and with only 3 graders, my “true grade” and the grade I’m given may be different. No fair!
What the researchers did in this analysis is use statistical methods to see if they could adjust (“tune”) the peer grading so that it was more likely to reflect the real grade. Here, in bullet point form are their findings/assessments:
- If every student graded every paper, you’d get a true grade, more accurate than instructors doing the grading (because they have biases, too). Unfortunately that doesn’t scale well.
- If you take into account a student’s PRIOR grading habits, you can improve grading accuracy – the easy grader’s score gets adjusted down, the hard grader’s score gets adjusted up
- If you take into account a student’s prior GRADES, you also improve grading accuracy – the better student is a better grader, the not-as-good student is a worse grader
- Interestingly, the BEST students are not as good graders as the above-average students; they are also harshest on the worst students, and the worst students inflate the scores of the best students
- If you look at time spent grading, about 20 minutes is the sweet spot; taking less time means the grader is not paying enough attention, taking more means the grader may not fully grasp the content
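To make the bias-correction idea concrete, here is a hedged sketch of one of the “tuning” ideas: estimate each grader’s bias from how their earlier scores compared to the consensus, then subtract that bias before averaging. The paper fits a full probabilistic model; this only shows the intuition, and all numbers and names are illustrative assumptions.

```python
# Sketch of bias-corrected peer grading. A grader's bias is estimated
# from prior grading history as the average gap between the score they
# gave and the consensus score; an "easy" grader has positive bias.

from statistics import mean

def grader_bias(history):
    """history: list of (score_given, consensus_score) pairs from prior rounds."""
    return mean(given - consensus for given, consensus in history)

def tuned_grade(peer_scores, histories):
    """Average peer scores after removing each grader's estimated bias."""
    return mean(score - grader_bias(h) for score, h in zip(peer_scores, histories))

# One easy grader (consistently +1 vs. consensus) and one neutral grader:
easy = [(9, 8), (7, 6)]
neutral = [(6, 6), (8, 8)]
print(tuned_grade([9, 7], [easy, neutral]))  # 7.5, vs. an untuned mean of 8.0
```

The same machinery could in principle weight graders by reliability (the prior-GRADES finding above), but that is beyond this toy version.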
The interesting thing about all of this is that there is a way to tune peer grading so that it is scalable. Using these methods and technology, the “best” grader combinations can assess each student’s work for greater reliability.
Could this work for doctors?
The public may not realize this, but we take tests, lots of them, every 1-10 years depending on the specialty and stage of career.
When we measure “cognitive skills,” which is part of the requirement to maintain our board certification, we typically are tested using multiple choice exams that are up to 8 hours in duration, in special testing centers with computers, cameras, and gentle pat downs/ID checks.
So the question is, in a world where we (society, colleagues) want to assess a physician’s ability to know how to help a patient produce health, is peer grading a possibility?
I don’t know, but I think this research is promising. For several reasons:
- Peer grading allows one to assess almost anything – from a video of a surgeon doing an operation, to an email reply back to a patient
- Peers don’t just have to be physicians – they could be nurses, they could be patients.
- Per the study above, a different dimension of assessment can be added, which is “are you a good grader?” So it’s not just whether you know the material or do a good job; it’s whether you are good at assessing others.
Right now, my impression of the maintenance of certification experience is that it’s not social, bordering on isolating.
Imagine instead an assessment activity that brings people in (colleagues, the people they serve, the community) to support learning. Since we’re in the health profession, that seems like a healthier direction.
This shouldn’t imply that assessment should become more complicated, time-consuming, or punitive-feeling for the learner. Perhaps it could be just as rigorous, yet more accurate, more natural, and more connected to people’s everyday existence, and therefore to the compassionate care they strive to deliver on a regular basis.
In my own experience (above), I enjoyed reading essays written by students from all over the world (and even saved a few) – they provided great insights on the topic and created a sense of community.
When we assess doctors throughout their lifetime, we want to assess how well they perform for the people they serve. We’re learning (well, we know) that having “knowledge” results in good performance only if a physician knows how to connect this knowledge to the life experience of a patient and their family. I think it was said best in 1966 (!):
It may be that computers will soon diagnose better than doctors. But the facts fed to computers will still have to be the result of intimate, individual recognition of the patient.
Has this ever been tried with doctors?
Yes it has. I’ll post tomorrow about a recently published experiment in peer reviews of surgical technique.
H/T Candace Thille, from the Office of the Vice Provost for Online Learning at Stanford University for sending me this article.
What do you think? Open mike.