Topics for Student Projects

We supervise students at the University of Konstanz, who can work on one of the topics listed below. If you want to work with us on one of these topics, send your CV and course transcript to david.garcia@uni-konstanz.de. We also welcome project ideas from students. If you want to propose your own, please include a description similar to the ones below.

  1. Replication and validation of bot detection on Twitter
    Implementing a machine learning predictor of automated Twitter accounts based on previous classifiers, including Botometer, totervogel, and other recent approaches (Sayyadiharikandeh et al., 2020; González-Bailón & De Domenico, 2021). Evaluation using standard datasets of automated Twitter accounts and emerging data on novel bots to provide a critical benchmark and comparison of methods.
  2. Corpus and analysis of #yes2* and #no2* hashtags
    Creating a list of hashtags that start with #yes2 and #no2 from a random tweet sample and collecting tweets that use those hashtags to produce a corpus. Annotating the hashtags by topic and testing whether #yes2* hashtags are more successful than #no2* hashtags.
  3. Network analysis of colexification
    Analysis of networks connecting words through common translations, following our recent work studying affective meaning in this kind of network (Di Natale, Pellert & Garcia, 2021). This analysis can focus on various topics, such as the topological structure of the network (e.g. communities, comparison with other word networks), cultural aspects when including word ratings, and gender biases that generate stereotypical associations.
  4. Zero-shot and few-shot emotion detection with large language models
    Applying zero-shot models based on Natural Language Inference and few-shot models (e.g. GPT-3, GPT-Neo) to detect emotional expression and sentiment in various kinds of text. Initial results of this approach are promising (Yin, Hay, & Roth, 2019) and can be systematically extended with exhaustive sentiment and emotion benchmark datasets.
  5. Data and resource integration for sentiment analysis: sentiment lexica
    Integrating data can be one of the most impactful scientific tasks when resources are scattered. Currently, there are many lexica of words and phrases annotated for emotion or sentiment in multiple languages, including LIWC translations, adaptations of ANEW, and other multilingual resources. This project would curate and document a single source that integrates all these lexica.
  6. Improvement of sentiment analysis with irony and sarcasm detection
    Integration of existing approaches to detect irony using established annotated datasets, testing if their inclusion in sentiment analysis methods improves classification performance.
  7. Ideology estimation of Twitter users from retweet data
    Adaptation of methods that identify the political alignment of social media users from following data (Barberá et al., 2015), to test whether archival retweet data can also serve as a data source for this measurement. Validation of the method against aggregated data (e.g. US regions) and application to track changes over time.
  8. Testing an association between replicability of research and gender of authors
    Combining a database of replications of experiments in psychology and other behavioral sciences with manual or automatic gender annotations of authors, to test whether experiments led by or involving female scientists are more likely to replicate.
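
To give a sense of what a first step can look like, the hashtag listing in topic 2 could start from something like the sketch below. It is only an illustration under our own assumptions: the function name, the regular expression, and the sample texts are invented here, and a real project would read tweets from a collected sample rather than from a hard-coded list.

```python
import re
from collections import Counter

# Assumed pattern: a hashtag that starts with #yes2 or #no2, case-insensitive,
# and is not glued to a preceding word character.
HASHTAG_RE = re.compile(r"(?<!\w)#(yes2|no2)(\w+)", re.IGNORECASE)


def count_stance_hashtags(tweets):
    """Count lowercased #yes2*/#no2* hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        for prefix, rest in HASHTAG_RE.findall(text):
            counts["#" + (prefix + rest).lower()] += 1
    return counts


# Hypothetical sample texts, invented for illustration only.
sample = [
    "Vote today! #Yes2Parks",
    "Not convinced. #no2parks #No2Tolls",
    "Parks for everyone #yes2parks",
    "Unrelated #yesterday tag",
]

counts = count_stance_hashtags(sample)
for tag, n in counts.most_common():
    print(tag, n)
```

Lowercasing merges spelling variants such as #Yes2Parks and #yes2parks into one entry, which is one possible design choice; whether variants should be merged before annotation is something the project itself would need to decide.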