The Semantic Reader Project

submitted by Style Pass, 2024-10-10 06:00:03

The exponential growth in the rate of scientific publication [4] and the increasingly interdisciplinary nature of scientific progress [27] make it ever harder for scholars to keep up with the latest developments. Academic search engines, such as Google Scholar and Semantic Scholar, help scholars discover research papers. Techniques such as automated summarization help scholars triage research papers [5]. But when it comes to actually reading research papers, the process, often based on a static PDF format, has remained largely unchanged for many decades. This is a problem because digesting technical research papers in their conventional formats is difficult [2].

In contrast, interactive and personalized documents have seen significant adoption in domains outside of academic research. For example, news websites such as The New York Times often present interactive articles with explorable visualizations that allow readers to understand complex data in a personalized way. E-readers, such as the Kindle, provide in situ context to help readers better comprehend complex documents, showing inline term definitions and tracking the occurrence of characters in a long novel. While prior work has envisioned how authoring tools can reduce the effort of creating interactive scientific documents [13], such tools have not seen widespread adoption. Furthermore, millions of research papers are locked in the rigid and static PDF format, whose low-level syntax makes it extremely difficult for systems to access semantic content, add interactivity, or even provide basic reading functionality for assistive tools such as screen readers.

The experience of reading information-dense scientific papers has remained unchanged for decades, relying on aging formats with static content and low accessibility.