The Challenges of the Literature Review Process and the Promise of Artificial Intelligence

by Michael Shensky

A comprehensive and well-conducted literature review is the foundation on which new research is built, yet carrying out a successful review of published academic work on a particular topic can be challenging. Given the importance of the task and the significant work involved in completing it correctly, it is worth asking whether anything can be done to make this critical part of the research process easier and more efficient. As new tools, technologies, and methods are developed, opportunities will inevitably arise to apply them to existing workflows, but not every new approach will lead to improvements in every workflow. Consequently, new ways of approaching long-standing challenges need to be tested and explored to determine whether they actually offer a benefit. One widely heralded “new” tool that many are exploring for improving established workflows across a variety of fields and disciplines is artificial intelligence. While artificial intelligence as a concept has existed for decades, and techniques like machine learning have been used for years to solve data processing challenges, there remain opportunities to explore its potential for improving aspects of the research process and, in particular, for facilitating engagement with academic literature.

Developing strong familiarity with the academic literature in one’s area of research is important because, at the outset of a new research study, it is imperative to have a firm understanding of what has already been discovered in the chosen area of inquiry, to know which methodologies have proven effective for carrying out specific research workflows, to have context for the work one will be engaging in, and to be familiar with the research questions that have been suggested as grounds for new investigation. Without the foundation offered by a comprehensive literature review, a researcher risks using suboptimal or out-of-date methodologies, drawing conclusions from only a limited sample of available data, testing hypotheses that others have already examined, and encountering other research problems. To avoid such pitfalls, researchers should strive to develop a robust literature review workflow that allows them to accomplish several key tasks: searching all relevant databases of academic literature to find potentially significant scholarly publications, filtering out published studies that are not relevant, downloading manuscript files for easy access and detailed review, organizing downloaded manuscripts to ensure that none are lost or overlooked, accessing research data associated with relevant studies when it is significant to the new research they will be engaging in, and maintaining the ability to efficiently find new relevant literature over time as it is published.

Each of these tasks can be difficult and time-consuming to execute successfully when carrying out a literature review manually, and there are many potential benefits to automating them where possible. Scripts written in languages such as R and Python now offer the ability to carry out some or all of these tasks. Some are fairly straightforward to automate, such as using a Python script to send a search query to multiple academic publisher databases through their respective APIs in order to retrieve metadata and, in some cases, full text for scholarly publications (a minimal sketch of this kind of API request appears after this paragraph); other tasks are less suited to hard-coded processes. Perhaps the trickiest aspect of the literature review process from the standpoint of automation is determining which publications are relevant to an individual researcher given their particular research interests. Carefully crafted search queries are helpful when looking for literature, but they are often not adequate on their own for determining which studies are directly relevant to a particular area of research and worth consulting before undertaking new work in that area. This is where artificial intelligence may be able to play a helpful role.
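
To illustrate the more straightforward end of this spectrum, the short Python sketch below sends a single search query to one public scholarly metadata API (the Crossref REST API is used here purely as an example endpoint; other publisher APIs would need their own request logic) and collects basic metadata for the matching records. The function name and the fields selected are illustrative assumptions, not part of any particular established workflow.

```python
# Minimal sketch: query a public scholarly metadata API for matching works.
# The Crossref REST API is used here only as an example endpoint.
import requests

def search_crossref(query, rows=20):
    """Return basic metadata for works matching a search query."""
    response = requests.get(
        "https://api.crossref.org/works",
        params={"query": query, "rows": rows},
        timeout=30,
    )
    response.raise_for_status()
    items = response.json()["message"]["items"]
    return [
        {
            "doi": item.get("DOI"),
            "title": (item.get("title") or [""])[0],
            "year": item.get("issued", {}).get("date-parts", [[None]])[0][0],
        }
        for item in items
    ]

if __name__ == "__main__":
    # Example query string chosen for illustration only.
    for record in search_crossref("literature review automation"):
        print(record["year"], record["doi"], record["title"])
```

In practice, a script like this could be extended to loop over several APIs and merge the results, but the request-and-parse pattern stays essentially the same.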

Currently, researchers who are manually reviewing scholarly publications such as academic journal articles are already deciding which publications they will read, reference, and cite, and which they will not engage with in depth, based on their perception of each publication’s relevance to their work. As new research on their topic of interest emerges, researchers must individually review each new publication to determine whether it is potentially relevant, which requires periodically searching for newly published research and then manually reviewing anything new that is found. Artificial intelligence models that can learn from the data generated by this human classification of publications into discrete categories offer a potential solution to the time-consuming challenge of periodically finding and evaluating newly published research in a particular area of study. This potential exists in part because researchers are typically already sorting the scholarly publications they find during an initial literature review into the categories of relevant (those worth reading in full, referencing, etc.) and not relevant. If the results of this classification process are recorded as well-structured data, in the form of a CSV file for example, that data could be used to train an algorithm to carry out a learned imitation of the same sorting logic on other scholarly publications (a minimal sketch of this idea appears after this paragraph). Such an algorithm could then be deployed through a scripted process that periodically searches for new academic literature for the algorithm to evaluate. With this capability in place, it is possible to imagine a fully automated literature review workflow, personalized to the research interests of the users who trained it, that identifies, retrieves, downloads, and organizes scholarly publications similar to those the users have already classified as relevant to their work, and that accesses any associated research data. Considering the time savings, reproducibility, and standardized organization of downloaded resources such an automated workflow would offer, there is good justification for pursuing the development of a script that leverages AI to improve the literature review process for researchers.
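
As a sketch of how those recorded decisions could be used, the example below trains a simple text classifier on a hypothetical CSV of past relevance judgments and then scores new abstracts. The file name, column names, and choice of model (TF-IDF features with logistic regression via scikit-learn) are assumptions made for illustration, not a claim about what an eventual tool would have to use.

```python
# Minimal sketch of learning a researcher's relevance decisions from a
# hypothetical CSV ("reviewed_articles.csv") with two columns:
#   "abstract" - title/abstract text of a previously reviewed publication
#   "relevant" - 1 if the researcher kept the paper, 0 if not
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Load the researcher's past relevance decisions as training data.
labeled = pd.read_csv("reviewed_articles.csv")
X_train, X_test, y_train, y_test = train_test_split(
    labeled["abstract"], labeled["relevant"], test_size=0.2, random_state=42
)

# Learn an imitation of the researcher's sorting logic from word usage.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# Score newly retrieved publications; anything above a chosen probability
# threshold could be queued for download and closer reading.
new_abstracts = ["Placeholder abstract text returned by a periodic API search."]
relevance_scores = model.predict_proba(new_abstracts)[:, 1]
print(relevance_scores)
```

Paired with a periodic search script like the API sketch above, this kind of classifier is the piece that would let an automated workflow filter new publications on a researcher’s behalf rather than simply collecting everything a query returns.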