2019-08-15T14:54:53Z (GMT) by Lance C Novak
The scale of the scholarly community complicates searches within scholarly databases, necessitating keywords to index the topics of any given work. As a result, an author’s choice of keywords affects the visibility of each publication, making the sum of these choices a key representation of the author’s academic profile. As such, the underlying network of investigators is often viewed through the lens of its keyword network. Current keyword networks connect publications only if they use exactly the same keyword, so uncontrolled keyword choice prevents connections between semantically similar works. Computational understanding of semantic similarity has already been achieved through word embedding, which transforms words into numerical vectors with context-correlated values. The resulting vectors preserve semantic relations and can be analyzed mathematically. Here we develop a model that uses embedded keywords to construct a network that circumvents the limitations caused by uncontrolled vocabulary. The model pipeline begins with a set of faculty, whose publications and keywords are retrieved through the SCOPUS API. These keywords are processed and then embedded. This work develops a novel method of network construction that leverages the interdisciplinarity of each publication, resulting in a unique network for any given set of publications. After construction, the network is visualized and analyzed with topological data analysis (TDA). TDA is used to calculate the connectivity of the network and the holes within it, referred to as the zeroth and first homology, respectively. These homologies inform how authors connect and where publication data is sparse. This platform has successfully modelled collaborations within the biomedical department at Purdue University and provides insight into potential future collaborations.
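The idea of linking keywords by semantic similarity and then reading connectivity and holes off the resulting network can be illustrated with a minimal sketch. It is not the method developed in this work: the toy vectors, the similarity threshold, and the function names are all hypothetical, and the network is treated as a plain graph (a one-dimensional complex with no filled triangles), for which the zeroth Betti number is the count of connected components and the first is the count of independent cycles.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings; real vectors would come from a trained
# word-embedding model applied to the retrieved keywords.
EMBEDDINGS = {
    "tissue engineering": (0.9, 0.1, 0.0),
    "biomaterials": (0.8, 0.3, 0.1),
    "neural imaging": (0.1, 0.9, 0.2),
    "MRI": (0.0, 0.8, 0.3),
    "drug delivery": (0.7, 0.2, 0.6),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_edges(embeddings, threshold):
    """Connect every pair of keywords whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold
    ]


def betti_numbers(nodes, edges):
    """Return (b0, b1) for a graph viewed as a one-dimensional complex.

    b0 counts connected components (found with union-find);
    b1 = |E| - |V| + b0 counts independent cycles.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    b0 = len({find(n) for n in nodes})
    b1 = len(edges) - len(nodes) + b0
    return b0, b1


if __name__ == "__main__":
    edges = build_edges(EMBEDDINGS, threshold=0.8)  # threshold is an assumption
    print(edges)
    print(betti_numbers(list(EMBEDDINGS), edges))
```

Unlike an exact-match keyword network, this construction can join two keywords that share no characters, provided their vectors point in similar directions. A full TDA treatment would typically track how these counts change as the threshold varies rather than fixing a single value.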