
Socio-technical Systems for Identifying Latent Knowledge Gaps


Asymmetric relationships between creators and consumers in peer-produced knowledge repositories produce inequitable knowledge representation, or knowledge gaps. These gaps result in unequal access to information, and downstream technologies that leverage peer-produced data perpetuate these inequities. Effective knowledge gap identification is a necessary first step towards equitable knowledge representation. However, while prior work has uncovered a few important biases (e.g., gender, political, and cultural bias), no comprehensive and systematic method for identifying knowledge gaps exists. In this dissertation, through two studies, we investigate current approaches to mitigating known knowledge gaps and propose novel methods for identifying latent knowledge gaps. In other words, we ask: 1) how do editors currently address known unknowns, and 2) how do we identify unknown unknowns? In Study 1 we interview members of Wikipedia's editor community in order to better understand existing methods for knowledge gap identification. Study 1 documents editors' definitions of knowledge gaps, potential causes of knowledge gaps, and the social and technical framework editors use to identify missing subjects and to create new content. We show that editors use a system of lightweight markers to distribute work throughout the community and to systematically "fill in" certain topical areas that are traditionally underrepresented. Ultimately, we argue that new technical systems need to leverage these existing social and technical frameworks, rather than rely on the creation of new workflows, in order to be successful. Our findings from Study 1 reinforce much of the existing empirical work on knowledge gaps, but offer a unique perspective grounded in the editor community.
Study 2 investigates one potential method for latent knowledge gap identification: a reader-sourced approach, which leverages knowledge from Wikipedia's reader community in order to identify new knowledge gaps. We build on data produced by Wikipedia's Article Feedback Tool (AFT). Study 2 finds that, while it is challenging to build a machine classifier that can perfectly predict whether reader feedback will be helpful or unhelpful, we can still reduce the editor workload associated with triaging reader feedback.
