We’ve seen an explosion of the deep RL field since the DQN paper back in 2013. Every conference, more and more people jump on the bandwagon to work on one more shiny algorithm that brings AGI a little bit closer.

However, the authors often forget that they are doing science and play the ‘academia’ game too hard, churning out more and more papers for the sake of publishing and climbing higher on the career ladder. They release sloppy research, they cherry-pick results, and they oversell their ‘biologically inspired’ algorithms to get more followers on Twitter and become AI influencers.

In this post, I’ll identify the most popular dark patterns in deep RL research that get some labs 100500 accepted papers per conference. I won’t name any names here, but if you read this blog post and recognize yourself, shame on you! I pity you. This might be hard to hear. You might be a professor at a top-tier university or a research scientist in a top industrial lab, but deep down you know you’re a fraud, don’t you?


Designing the task to make your algorithm/story work.

It’s no exaggeration to say that every third deep RL paper uses this one. You make up a task that your algorithm is born to solve. There’s nothing wrong with experimental design, but when you design both the problem and the solution, you might end up with a useless problem and an algorithm that is useless anywhere outside of your useless problem.

Vision -> Problem -> Solution -> Adding more stuff -> ??? -> Profit

This one is my favorite. You mention that we need X for AGI. Then you say that X is hindered by Y. You introduce your algorithm to fix Y and show huge performance benefits. Somewhere along the way, however, you add five more complications to your algorithm that have no relation to X or Y. You forget to add ablations, and somewhere after the second submission, the reviewers forget to ask.

Not tuning the baseline.

Everyone is guilty of this one. You run the baseline and get some numbers within the first couple of weeks of your shiny new project. Then you spend the remaining four months tuning your baby algorithm. You might also change the experimental setting a bit, but why would you rerun the baseline? It won’t work anyway, right? RIGHT?

Cherry picking the seeds.

Another popular tool in the toolbox of a successful researcher. Run your stuff until you get decent performance, run the baseline until you get enough bad seeds, average, and be happy.
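In code, the trick is embarrassingly simple. Below is a minimal, entirely made-up sketch: both “methods” are drawn from the same distribution, so an honest comparison shows no gap, yet keeping your best seeds and the baseline’s worst seeds manufactures one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-seed evaluation returns; both "methods" are identical by construction.
ours = rng.normal(loc=100.0, scale=30.0, size=20)
baseline = rng.normal(loc=100.0, scale=30.0, size=20)

# The honest report: average over all seeds.
print(f"honest:        ours {ours.mean():6.1f}  vs  baseline {baseline.mean():6.1f}")

# The dark pattern: keep your five best seeds and the baseline's five worst.
cherry_ours = np.sort(ours)[-5:]
cherry_baseline = np.sort(baseline)[:5]
print(f"cherry-picked: ours {cherry_ours.mean():6.1f}  vs  baseline {cherry_baseline.mean():6.1f}")
```

Same algorithm on both sides, and the second line still makes a lovely bar chart.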

Resubmitting the paper without incorporating the reviewers’ feedback.

This one is new to me as a reviewer. You point out some flaws in the paper, and the paper gets rejected. You bid for the paper at the next conference, and you see IT: the same paper, without a single mention of the flaws you pointed out. Academia is a huge lottery, and whoever buys the most tickets wins!

Pseudo code release.

This one is easy. Release the implementation of your algorithm, but never disclose the experimental pipeline. That would be dangerous: people might find out you’re sloppy or cheating, or they might find bugs in your evaluation pipeline. Never release the full code. Life is too short to deal with pesky issues on GitHub.

Implementation hacks.

People have been talking about this one for quite a while, and it’s related to the second item on this list. You sell a big vision, you sell a story. But it doesn’t work. What do you do? You add complications to the code that no one will ever check. You might not even need to cherry-pick in this case!
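To make it concrete, here is a hedged sketch of the kind of unadvertised tricks I mean. None of these functions come from any particular paper or codebase; they are hypothetical examples of code-level details that quietly carry the performance while the method section talks about the big vision.

```python
import numpy as np

def clip_reward(reward, bound=1.0):
    # Silent reward clipping: one line in the repo, zero lines in the paper.
    return float(np.clip(reward, -bound, bound))

def anneal_lr(base_lr, step, total_steps):
    # Linear learning-rate decay that appears nowhere in the method section.
    return base_lr * (1.0 - step / total_steps)

def clip_advantages(advantages, low=-10.0, high=10.0):
    # Advantage clipping "for numerical stability", discovered only by reading the code.
    return np.clip(advantages, low, high)
```

Each of these is a legitimate engineering choice on its own; the dark pattern is attributing the resulting curve to the vision instead of the tricks.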

Paper beats rock, curves beat explanations.