
Model Positionality: A Novel Framework for Data Science with Subjective Target Concepts


Data Science and related fields such as Artificial Intelligence, Machine Learning, and Statistics provide indispensable research methods for understanding a wide variety of phenomena from large datasets. However, as methodical and empirical as these methods aim to be, the data scientist must make many subjective and discretionary choices in order to build models and analyze data, and these choices are often not discussed or disclosed. One reason for this could be that data scientists are rarely taught practices that promote acknowledgement of, reflection on, and discussion of these subjective choices. In this dissertation, I discuss how qualitative researchers have developed concepts, methods, and practices for conducting and sharing research that inherently incorporates subjective choices, and how these norms help to foster critical reflection on those choices. I first discuss the benefits of understanding statistical and machine learning models as having a positionality with respect to the phenomenon being modeled and the sociotechnical system that the model represents and is embedded within. Next, I discuss how current approaches to understanding label divergence within a labeled dataset are based on inter-annotator agreement metrics. These metrics summarize the differences of judgement among many thousands of pairs of annotators in a single number, making it difficult to identify the specific contexts in which annotators disagree and the specific annotator groups that disagree. To address this, I developed the inferred agreement and annotator fingerprinting methods, which help to better characterize and track the divergence of annotator perspectives. These two methods can be used with unsupervised clustering algorithms to identify groups of annotators who are most likely to disagree, in a process that I call position mining.
Finally, I show how these methods can be used together in a framework that incorporates critical reflection on the various perspectives on the phenomenon being modeled, contextualized in relation to the crowd annotators’ positions, the data scientist’s position, and the model’s position. As the emerging field of Data Science matures and cements its epistemological position and pedagogy, this thesis argues that data scientists should do more to reflect on the social construction of the knowledge the field yields by incorporating more techniques and concepts from qualitative research.


