Author name disambiguation for collaboration network analysis and visualization

Andreas Strotmann, Dangzhi Zhao and Tania Bubela

In this paper we outline an algorithm for disambiguating the author names of publications via deterministic clustering, based on well-defined similarity measures between the publications on which those names appear. The algorithm is designed for constructing a collaboration network, i.e., a graph of author nodes and co-author links. In this context, the goal is to produce a co-authorship graph whose network characteristics are close to those of the “true” collaboration network, so that meaningful network metrics can be determined from it. The algorithm is fairly easy to comprehend because it does not depend on any black-box AI techniques. This is important in the context of policy studies, where we have applied it successfully, because it enables policy makers to judge the soundness of the methodology with considerable confidence. It is also fast, making it possible to run large-scale analyses (here, on the order of a hundred thousand publications and a million names to be disambiguated) on a moderately sized desktop computer within a few days.
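
To make the idea of deterministic clustering of name mentions concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the algorithm of this paper: the blocking key (surname plus first initial), the similarity measure (number of shared co-author keys), the threshold min_shared, and all function names are hypothetical choices made for the example. Merging is done by single-linkage clustering with a union-find structure, so the result does not depend on random initialization.

```python
from collections import defaultdict
from itertools import combinations


def name_key(name):
    """Hypothetical blocking key: lower-cased surname plus first initial."""
    surname, _, given = name.partition(",")
    return (surname.strip().lower(), given.strip()[:1].lower())


def disambiguate(publications, min_shared=1):
    """Cluster name mentions into presumed authors (illustrative only).

    publications maps a publication id to a list of "Surname, Given" strings.
    Two mentions with the same key are merged when their publications share
    at least min_shared other co-author keys (an assumed similarity rule).
    Returns a sorted list of (name key, tuple of publication ids).
    """
    parent = {}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Always keep the smaller mention as root.
            parent[max(ra, rb)] = min(ra, rb)

    blocks = defaultdict(list)   # name key -> mentions carrying that key
    coauthors = {}               # mention -> keys of the other authors
    for pub_id in sorted(publications):
        keys = sorted({name_key(n) for n in publications[pub_id]})
        for key in keys:
            mention = (pub_id, key)
            parent[mention] = mention
            blocks[key].append(mention)
            coauthors[mention] = set(keys) - {key}

    # Compare mentions only within the same block, in a fixed order.
    for key in sorted(blocks):
        for a, b in combinations(blocks[key], 2):
            if len(coauthors[a] & coauthors[b]) >= min_shared:
                union(a, b)

    clusters = defaultdict(list)
    for mention in parent:
        clusters[find(mention)].append(mention[0])
    return sorted((root[1], tuple(sorted(pubs)))
                  for root, pubs in clusters.items())


if __name__ == "__main__":
    # Made-up records: two "J. Smith" mentions share a co-author, one does not.
    pubs = {
        "p1": ["Smith, John", "Chen, Wei"],
        "p2": ["Smith, J.", "Chen, W."],
        "p3": ["Smith, Jane", "Lee, Ann"],
    }
    for key, pub_ids in disambiguate(pubs):
        print(key, pub_ids)
```

In this toy input the "Smith, J" mentions on p1 and p2 end up in one cluster because both co-occur with "Chen, W", while the mention on p3 stays separate. Each resulting cluster would become one author node in a co-authorship graph. The all-pairs comparison inside a block is quadratic in block size and is kept here only for brevity.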

