A Mastodon bot for learning about new genes

A lot of our laboratory’s research concerns the attention gap for human genes: a small fraction of human genes receive a large majority of our collective research attention. Often, biologists choose to research genes that are already well-studied, even though understudied genes are just as likely to be intimately associated with human disease.

To play a small role in fighting this discrepancy, I created Gene of the Day, a Mastodon bot that posts about one new human gene every day. Biomedical researchers tend not to interact with genes they know little about, so I figured this would be an amusing and enlightening source of information for scientists worldwide.

This isn’t my first foray into botmaking; I previously created a Twitter bot that posts the daily United States Air Quality forecast. This time around, I was hesitant to launch the bot on Twitter given the current state of the platform and its developer API. Now that many scientists have moved to Mastodon, I thought it was the better host for the bot.

The bot uses data from NCBI Gene, PubTator, RCSB PDB and UniProt and is hosted on botsin.space, a bot-friendly Mastodon instance. Every day, the bot selects a random human gene (excluding previously-posted genes) and posts bibliometric and database-specific information about it, as well as an experimentally-derived protein structure if one is available.
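As a rough illustration of the daily selection step, here is a minimal Python sketch of picking a random gene while excluding ones that have already been posted. It is not taken from the bot’s source code: the function name `pick_gene`, its parameters, and the tiny example list of NCBI Gene IDs are all assumptions made for this example.

```python
import random


def pick_gene(all_gene_ids, posted_ids, rng=None):
    """Return a random gene ID that has not been posted yet.

    Hypothetical helper for illustration only; returns None once
    every gene in all_gene_ids has already been posted.
    """
    rng = rng or random.Random()
    # Sort the remaining IDs so a seeded rng gives reproducible picks.
    remaining = sorted(set(all_gene_ids) - set(posted_ids))
    if not remaining:
        return None
    return rng.choice(remaining)


# Illustrative NCBI Gene IDs: EGFR (1956), TP53 (7157), BRCA1 (672).
all_genes = [1956, 7157, 672]
posted = {1956}  # pretend EGFR was already featured

choice = pick_gene(all_genes, posted)
print(choice)  # one of the two genes not yet posted
```

In a real bot, the set of posted IDs would need to persist between runs (for example in a small file or database) so that the exclusion survives restarts.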

I chose to center bibliometric information on the gene over functional information for three reasons:

  1. I want to demonstrate how many human genes have very little scholarship about them. I expect that many of the genes Gene of the Day highlights will not have been featured in a single publication and will not have an experimentally-derived protein structure. Although this will make some posts quite boring, I believe that it is important to remind people that there are dark areas in our understanding of the human genome.
  2. Many human genes have no associated functional annotation. Restricting the bot’s posts to genes that have already been functionally annotated would privilege better-studied genes.
  3. Some functional annotations are derived from publications that are almost certainly paper mill products. For instance, consider the article PMID:29254169, which reports a novel function for MIR182 but is almost certainly a bogus paper mill product. This article has been detected and retracted, but many more like it remain undetected in the biomedical literature. I am wary of including functional annotations in case the bot were to report the findings of paper mill products as factual.

The bot’s source code is available under the GPL-3.0 license on GitHub. If you have feedback or questions, I encourage you to submit them as issues on the bot’s GitHub page. I also encourage you to reply to posts with fun facts about the daily gene if it happens to be one of your favorites. Here is an example post:

Example post about EGFR from Gene of the Day

I believe that projects like this one can help spur discussion and foster curiosity in the sciences, and there are plenty of other examples of this kind of creative botmaking (see @netzschleuder, @strangeattrbot, @ThreeBodyBot or browse botwiki for more examples). I encourage others to think about making similar bots. Making bots for Mastodon is relatively easy, so if you are interested, do not hesitate to reach out. I hope you enjoy Gene of the Day!