A lot of our laboratory’s research concerns the attention gap for human genes: a small fraction of human genes receive a large majority of our collective research attention. Often, biologists choose to research genes that are already well-studied, even though understudied genes are just as likely to be intimately associated with human disease.
To play a small role in fighting this discrepancy, I created Gene of the Day, a Mastodon bot that posts about one new human gene every day. Biomedical researchers tend not to interact with genes they know little about, so I figured this would be an amusing and enlightening source of information for scientists worldwide.
This isn’t my first foray into botmaking; I previously created a Twitter bot that posts the daily United States air quality forecast. This time around, I was hesitant to launch the bot on Twitter given the current state of the platform and its developer API. Now that many scientists have moved over to Mastodon, I thought it was better suited to host the bot at this time.
The bot uses data from NCBI Gene, PubTator, RCSB PDB, and UniProt, and is hosted on botsin.space, a bot-friendly Mastodon instance. Every day, the bot selects a random human gene (excluding previously posted genes) and posts bibliometric and database-specific information about it, as well as an experimentally derived protein structure if one is available.
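The daily selection step can be illustrated with a short Python sketch. This is not the bot’s actual source code: the function name `pick_gene`, the placeholder gene identifiers, and the idea of keeping posted genes in a simple set are all assumptions made for illustration.

```python
import random


def pick_gene(all_gene_ids, posted_ids, rng=random):
    """Return one gene ID chosen uniformly at random from those not yet posted.

    Hypothetical helper: `all_gene_ids` and `posted_ids` are any iterables
    of hashable identifiers; `rng` is anything with a `choice` method.
    """
    # Sort so the result is reproducible when a seeded generator is passed in.
    remaining = sorted(set(all_gene_ids) - set(posted_ids))
    if not remaining:
        raise ValueError("every gene has already been posted")
    return rng.choice(remaining)


if __name__ == "__main__":
    # Placeholder identifiers, not real gene symbols or database IDs.
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    posted = {"GENE_B"}

    todays_gene = pick_gene(genes, posted)
    posted.add(todays_gene)  # remember it so it is not chosen again
    print(todays_gene)
```

In a real bot the set of posted genes would need to persist between daily runs, for example in a file or small database, which this sketch leaves out.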
I chose to center bibliometric information on the gene over functional information for three reasons:
- I would like to demonstrate how many human genes have very little scholarship available about them. I expect that many of the genes Gene of the Day highlights will have neither a single publication featuring them nor an experimentally derived protein structure. Although this will make some posts quite boring, I believe it is important to remind people that there are dark areas in our understanding of the human genome.
- Many human genes have no associated functional annotation. Restricting the bot’s posts to genes that have already been functionally annotated would privilege better-studied genes.
- Some functional annotations are derived from publications that are almost certainly paper mill products. For instance, the article PMID:29254169 reports a novel function for MIR182 but is almost certainly a bogus paper mill product. That article has been detected and retracted, but many more like it remain undetected in the biomedical literature. I am wary of including functional annotations in case the bot reports the findings of paper mill products as fact.
The bot’s source code is available under the GPL-3.0 license on GitHub. If you have feedback or questions, I encourage you to submit them as issues on the bot’s GitHub page. I also encourage you to reply to posts with fun facts about the daily gene if it happens to be one of your favorites. Here is an example post:
I believe that projects like this one can help spur discussion and foster curiosity in the sciences, and there are plenty of other examples of this kind of creative botmaking (see @netzschleuder, @strangeattrbot, @ThreeBodyBot or browse botwiki for more examples). I encourage others to think about making similar bots. Making bots for Mastodon is relatively easy, so if you are interested, do not hesitate to reach out. I hope you enjoy Gene of the Day!