Dialectograms: Machine Learning Differences between Discursive Communities
Word embeddings provide an unsupervised way to understand differences in word usage between discursive communities. A number of recent papers have focused on identifying words that are used differently by two or more communities. But word embeddings are complex, high-dimensional spaces, and a focus on identifying differences captures only a fraction of their richness. Here, we take a step towards leveraging the richness of the full embedding space by using word embeddings to map out how words are used differently. Specifically, we describe the construction of dialectograms, an unsupervised way to visually explore the characteristic ways in which each community uses a focal word. Based on these dialectograms, we provide a new measure of the degree to which words are used differently, one that overcomes the tendency of existing measures to pick out low-frequency or polysemous words. We apply our methods to explore the discourses of two US political subreddits and show how they identify stark affective polarisation of politicians and political entities, differences in the assessment of proper political action, and disagreement about whether certain issues require political intervention at all.
This project is available as a preprint at: https://arxiv.org/abs/2302.05657
Fixing Fieldnotes: Developing and Testing a Digital Tool for the Collection, Processing, and Analysis of Ethnographic Data
Ethnographic fieldnotes can contain richer and more thorough descriptions of social phenomena than other data sources. Their open-ended and flexible character makes them especially useful in explorative research. However, fieldnotes are typically highly unstructured and personalized by individual researchers, which makes them harder to use as a method for data collection in collaborative and mixed methods research. More precisely, the unstructured nature of ethnographic fieldnotes presents three distinct challenges: 1) Organizability: it can be difficult to search and sort fieldnotes and thus to get an overview of them, 2) Integrability: it is difficult to meaningfully integrate fieldnotes with other, more quantitative data types such as surveys or geospatial data, and 3) Computational Processability: it is hard to process and analyze fieldnotes with computational methods such as topic models and network analysis. To solve these three challenges, we present a new digital tool for the systematic collection, processing, and analysis of ethnographic fieldnotes. The tool is developed and tested as part of an interdisciplinary mixed methods pilot study on attention dynamics at a political festival in Denmark. Through case examples from this study, we show how adopting this new digital tool allowed our team to overcome the three aforementioned challenges while retaining the flexible and explorative character that is a key strength of ethnographic fieldwork.
Paper published in Social Science Computer Review and can be found here: Link to paper
Ideological scaling of the Danish Parliament using word embeddings
Developing tools for Computational ethnography.
As part of my Ph.D., I work on integrating ethnographic fieldwork with computational and qual/quant methods. As part of this work, I am involved in developing an app for ethnographers to store, access, and manage fieldnotes in a more structured way. This allows for after-the-fact NLP analysis of the notes, and also lets ethnographers sort and access fieldnotes based on time, place, keyword, etc.
Read more about the app and desktop version and sign up here: https://ethnote.org/
This work was generously supported by the Carlsberg Foundation’s research infrastructure grant. Read more here: Link to Carlsberg Foundation
A paper introducing the benefits of working with the app is also currently being prepared.
Responsiveness of politicians to social media feedback
This is a main project of my Ph.D., in which I examine to what extent social media feedback, such as “likes”, can be said to drive the political agenda. The project is built on 8 years of Facebook data labeled using deep learning.
I am currently writing the results into a paper.