With the aim of making existing and future research and insights on the novel coronavirus more accessible, metasphere is building tools that help researchers navigate and make better sense of the growing corpus of scientific literature.
As research into COVID-19 surges and a multitude of routes are being explored, the sheer volume of papers makes it hard to keep an overview of the current state of research. To help new research teams working on COVID-19 related topics get up to speed faster, an international group of volunteers is building a visual map of the current research landscape, along with tools that let researchers work through the growing body of information collectively, faster, and more effectively.
To make the complex corpus of research papers on COVID-19 more accessible, our approach is to cluster papers according to their semantic similarity and plot them on a map, thereby making the research landscape explorable as a whole.
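The post does not name the models behind the clustering. As an illustration only, here is a minimal sketch of grouping papers by semantic similarity. It assumes bag-of-words TF-IDF vectors as a stand-in for whatever text representation the team actually uses, and spherical k-means (k-means driven by cosine similarity) as the clustering step. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of at least three letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 3]


def tfidf_vectors(docs):
    """Build one L2-normalised sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so terms that occur in every document keep a small positive weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalised_sum(vectors):
    """Sum sparse vectors and rescale to unit length (same direction as their mean)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def spherical_kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one cluster label per vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    # Deterministic farthest-point initialisation: start with the first document, then
    # repeatedly add the document least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = normalised_sum(members)
    return labels


if __name__ == "__main__":
    # Hypothetical toy "abstracts"; a real run would use titles/abstracts from CORD-19.
    papers = [
        "coronavirus spike protein binding receptor",
        "spike protein receptor binding domain structure",
        "social distancing policy reduces transmission",
        "distancing policy lockdown transmission rates",
    ]
    print(spherical_kmeans(tfidf_vectors(papers), k=2))  # [0, 0, 1, 1]
```

The clustering loop only relies on cosine similarity between unit vectors. Dense sentence embeddings could therefore replace TF-IDF without touching `spherical_kmeans`, as long as they are normalised and stored in the same dict form.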
We are currently working with the CORD-19 dataset, which consists of roughly 30,000 articles. When a corpus this large is made visually navigable in its entirety, with the many interrelations between papers taken into account, the resulting complexity makes most visualizations hard to make sense of. Our approach is to borrow the design language of geospatial maps to visualize clusters of papers and the networks connecting them, so that exploring them becomes more effortless.
Visualizing the research landscape as a topographic map lets us build on well-learned navigational patterns and make the corpus navigable in the literal sense. Researchers can pan around to explore the landscape and zoom in and out to explore topical clusters. This form of visualization also lets us encode structural attributes of the corpus (e.g. the relevance of authors or papers, or the citation structure of connected papers) in spatial features (e.g. size, elevation or proximity).
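To make the idea of encoding relevance in elevation more concrete, here is a minimal sketch of a height field built from Gaussian "hills". It assumes each paper already has a 2D position on the map, for example from a dimensionality reduction of its text vector. It uses a log-scaled citation count as a hypothetical relevance measure. `elevation_grid`, `sigma` and the log scaling are illustrative choices, not metasphere's actual landform algorithm.

```python
import math


def elevation_grid(papers, width, height, sigma=1.5):
    """Return a height field (list of rows) built from Gaussian hills.

    papers: iterable of (x, y, citations) tuples, with x as column and y as row,
    already in grid coordinates (hypothetical input format).
    Each paper raises the terrain around its position. The hill's height grows with
    log(1 + citations), so heavily cited papers form peaks, and dense groups of
    nearby papers merge into larger landforms.
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py, citations in papers:
        amplitude = math.log1p(citations)
        for row in range(height):
            for col in range(width):
                d2 = (col - px) ** 2 + (row - py) ** 2
                grid[row][col] += amplitude * math.exp(-d2 / (2 * sigma ** 2))
    return grid


if __name__ == "__main__":
    # Hypothetical input: one heavily cited paper and one rarely cited paper.
    grid = elevation_grid([(2, 2, 100), (7, 5, 3)], width=10, height=8)
    peak = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    print("highest point at row", peak[1], "col", peak[2])  # row 2, col 2
```

Summing smooth kernels means that proximity and relevance both shape the terrain. Contour lines or colour bands can then be drawn by thresholding the grid at a few elevation levels.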
Moreover, placing the content on landforms adds heterogeneous visual cues, which lets us address the problem of uniformity in complex network visualizations. Common representations of clusters that use uniform visual indicators, such as colored dots, tend to overwhelm readers because they cannot tell those indicators apart. As a result, network visualizations that look “simple” become hard to comprehend once they show a large number of interrelations. Since humans are good at picking out patterns in seemingly random noise, we give readers shapes to visually latch on to, making it easier to tell on which topical “continent” a given piece of information “lives”.
To better understand which features would make our tools useful, we are currently conducting a survey on how researchers go about their scholarly work. We hope to learn how different kinds of researchers read research papers, how they use software to support their insight generation, and what is missing from such software.
Your participation in our short survey (it takes only about 5–10 minutes) will greatly help us make our tools as relevant as possible. So if you are a researcher in any field, please consider helping us by filling out the survey below:
In the last month, we focused our efforts on two aspects: clustering research papers into topics, and visualizing those topics on the topographic knowledge map. The former is driven by our data scientists, who are exploring different machine learning models to cluster the research effectively and extract keywords from it. The latter is shaped by our designers and developers, working hand in hand with the data scientists who oversee the language processing. We have achieved workable results in both areas.
While working on the data processing and visualization, we have started gathering input from the research community on desirable features for the tools. This ongoing dialogue with researchers allows us to define crucial features for the interface of the research explorer. Our current focus is on combining the semantic map and the reading experience into one coherent interface, which will be released as the first public version.
Version 1: topographic map of the CORD-19 dataset and an interface for reading and exploring papers
extract main topics
extract sub-topics within main topics
extract clustering information based on semantic proximity
extract keywords for topics
find optimal number of sub-topics within each main cluster
test, validate and fine-tune model
distribution of sub-topics within main clusters
distribution of papers within sub-topics
algorithm for landform generation
styling of landform generation
gather insights on feature-set from research community
define interface features for version 1
test and release
Version 2: improved version with live data and the option to include preprints
add external sources to update the dataset
option to include papers released on pre-print servers
add a timeline feature
compile and download a summarized selection of papers
Version 3: improved version open to all areas of research
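The roadmap lists finding the optimal number of sub-topics within each main cluster, but it does not say how this is done. A widely used heuristic, sketched here as an assumption, runs k-means for several candidate values of k and keeps the one with the highest mean silhouette coefficient. The silhouette compares each point's average distance to its own cluster (a) with its average distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch assumes sub-topic candidates are numeric vectors such as 2D map positions. All names are hypothetical.

```python
import math


def kmeans(points, k, max_iter=50):
    """Plain Lloyd's k-means on tuples of floats with deterministic farthest-point init."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0 by convention."""
    if len(set(labels)) < 2:
        return -1.0  # undefined for a single cluster; rank it as the worst option
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(d) / len(d)
            for d in ([math.dist(p, q) for j, q in enumerate(points) if labels[j] == other]
                      for other in set(labels) if other != labels[i])
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_range=range(2, 6)):
    """Pick the k whose k-means labelling scores the highest silhouette (smaller k on ties)."""
    scored = [(silhouette(points, kmeans(points, k)), k)
              for k in k_range if 2 <= k <= len(points)]
    return max(scored, key=lambda s: (s[0], -s[1]))[1]


if __name__ == "__main__":
    # Hypothetical 2D positions of papers inside one main cluster: three tight groups.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    print(best_k(pts))  # 3
```

The silhouette needs no ground truth, so it can be computed separately inside each main cluster. Its cost grows quadratically with the number of papers, so larger clusters may call for sampling.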
This project is driven by volunteers who invest their expertise in developing tools that help solve some of the problems we currently face in the information sphere. We share interests in scientific research, data visualization and cartography, natural language processing and machine learning. We are open to expanding our team with anyone who shares these interests. If you want to contribute or share insights into the topics we are focusing on, we would be very happy to hear from you through the form below:
We are an international group of collaborators from diverse backgrounds. We came together at a hackathon on COVID-19 related projects, where we started working on a prototype of the research explorer. We are now continuing our efforts with the goal of creating a tool that helps scientists extract knowledge from vast information landscapes.
Julian is a design director and strategist working at the intersection of design and technology. He initiated metasphere as a design approach for navigating complexity and shapes the long-term strategy and focus of the project.
Yashar is a social researcher by training and has been working at the nexus of technology, strategy and education. His main focus has been on theories of human action, in particular on how structured methodologies influence behavior by prompting action, and how they modify personal theories of thought and action.
Kikuo is a software developer with expertise in a wide variety of programming languages. He helps implement visualization algorithms and build the backend architecture of the project. He is particularly interested in serverless architectures and single-page applications.
Having studied computer science and worked as a developer in the web industry, he shifted his focus to design. For metasphere he contributes as an interface and information designer. In his spare time he researches how to apply the compositional methods of book design to responsive web design.
Ewa is an epidemiologist whose overarching goal is to contribute to disease prevention through disease surveillance and research. She has experience working on various domestic and global projects in academia, NGOs, and local health departments. Her specific interests include infectious disease surveillance and spatial epidemiology.
In addition to the currently active team, contributions to this project have been made by Beatriz Yumi Simões de Castro, Donatus Herre, Agnes Ferenczi, Celine Kuttler, Anja Krivograd, and Andrea Giacobino.
Thank you all so much ♥