Embrace the Random: A Case for Randomizing Acceptance of Borderline Papers
A case for using randomization in the selection of borderline academic papers, a particular use case which has parallels with many other possible scenarios.
By Balázs Kégl, TektosData.
One of the strongest opening lines is from this seminal book on pattern recognition (aka machine learning):
“Life is just a long random walk.”
It stuck in my mind. All I could add was “perhaps, maybe, with a drift” that you control, but that’s it. Our life is full of random events that we later try to fit into a well-constructed narrative, with more or less success.
There are several systems that proudly embrace the random: the US green card lottery, the Vietnam War draft lottery, or, more recently, the Paris medical student selection draw, not without controversy of course. There are others which cling to the deterministic narrative, even at the price of several person-years of work. One of these is our procedure for selecting papers at high-profile scientific conferences.
I was an area chair of ICML this year. I had sixteen papers. Two of them were strong accepts with three positive reviews, some enthusiastic. Eight of them were clear rejects: half-written, technically flawed, irrelevant, or without novelty (yes, a surprisingly high fraction). The remaining six were what we call borderline. They were all technically correct, novel, relevant, and clearly written. Reviews varied because of “significance”, a highly subjective criterion based on 80% reviewer taste and 20% experience. The unofficial target acceptance rate (dictated by two factors largely irrelevant to paper quality: the number of submitted papers and the capacity of the conference program) was 25%, which meant that roughly one third of the borderline papers could be accepted. How did we make the decision? With a lot of hard head-scratching, reading and re-reading the papers and the reviews, initiating reviewer discussions, and a lot of work by the program chairs (I don’t think Nina and Kilian slept a lot that week). What I found the most ludicrous is the realigning of reviewer scores: to sweeten the bitter pill of rejection, we pressure the positive reviewers to lower their marks, to make it look like the decision was non-random.
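The one-third figure follows directly from the counts in my batch. Here is a minimal check of that arithmetic; the variable names are illustrative, and applying the 25% target to a single area chair's batch, rounded to whole papers, is an assumption made only for this sketch.

```python
from fractions import Fraction

# Counts from the area chair batch described above.
total = 16
strong_accepts = 2
clear_rejects = 8
borderline = total - strong_accepts - clear_rejects  # the remaining six

# Unofficial target acceptance rate of 25%.
target_rate = Fraction(1, 4)

# Assumption: the target is applied to this batch alone and rounded
# to a whole number of papers.
accept_slots = round(total * target_rate)

# Slots left for borderline papers once the strong accepts are in.
borderline_slots = accept_slots - strong_accepts

# Fraction of borderline papers that can be accepted.
borderline_accept_fraction = Fraction(borderline_slots, borderline)

print(borderline, borderline_slots, borderline_accept_fraction)
```

With these counts, two of the six borderline papers fit under the target, which is the one third mentioned above.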
I think we should make a random decision on borderline papers (possibly with a biased coin for authors with multiple papers or with other penalties to discourage flooding the system), or accept all of them but randomly select those to be presented. By borderline papers, again, I mean: technically correct, clearly written, relevant, and novel, but with questionable significance.
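To make the proposal concrete, here is a minimal sketch of what such a draw could look like. Everything in it is hypothetical: the function names `borderline_lottery` and `flooding_weights`, the paper and author labels, and in particular the penalty rule (dividing a paper's weight by the number of borderline papers sharing its author), which is just one possible reading of "biased coin". It is not a description of any procedure actually used at ICML or NIPS.

```python
import random
from collections import Counter


def flooding_weights(authors):
    """Hypothetical penalty: a paper's weight is 1 / (number of borderline
    papers submitted by the same author)."""
    counts = Counter(authors)
    return [1.0 / counts[a] for a in authors]


def borderline_lottery(papers, slots, weights=None, seed=None):
    """Draw up to `slots` papers without replacement.

    With weights=None the draw is uniform; otherwise each paper is picked
    with probability proportional to its weight among the papers still
    in the pool.
    """
    rng = random.Random(seed)  # a published seed makes the draw auditable
    pool = list(papers)
    w = [1.0] * len(pool) if weights is None else list(weights)
    if len(w) != len(pool):
        raise ValueError("need exactly one weight per paper")

    accepted = []
    while pool and len(accepted) < slots and sum(w) > 0:
        i = rng.choices(range(len(pool)), weights=w, k=1)[0]
        accepted.append(pool.pop(i))
        w.pop(i)
    return accepted


# Illustrative data: six borderline papers, two of them by the same author.
papers = ["P1", "P2", "P3", "P4", "P5", "P6"]
authors = ["a", "a", "b", "c", "d", "e"]

accepted = borderline_lottery(papers, slots=2,
                              weights=flooding_weights(authors), seed=2024)
print(accepted)
```

The alternative I mention, accepting all borderline papers and randomly selecting those to be presented, could reuse the same draw, with `slots` set to the number of presentation slots instead of the number of acceptances.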
Here are my arguments.
- We have known it’s random since the now famous NIPS experiment. The numbers above (roughly 12% clear accept, 12% random out of 40% borderline) actually fit the data. My suggestion does not make it more or less random; it makes us embrace its randomness.
- It is cost-effective. We spend about 25 to 50 person-years to review the yearly batch of ICML and NIPS papers. That is a full research career. Any savings there can go into other productive work.
- Reviews can be more objective. We could concentrate on clarity, technical quality, relevance (which is not the same as significance), and novelty. Significance is hard to judge and highly subjective. As a reviewer, I’m there to keep up paper quality. But who am I to decide what people should work on?
- It’s fair and psychologically easier to accept.
- It randomizes search. We are arguably fixating too much and too fast on certain topics. This generates our famous waves of Hollywoodesque hypes, bubbles, and stars. Randomization alone will not solve this but it will alleviate it.
Original. Reposted with permission.