Programmed differently? Testing for gender differences in Python programming style and quality on GitHub

Siân Brooke, Programmed differently? Testing for gender differences in Python programming style and quality on GitHub, Journal of Computer-Mediated Communication, Volume 29, Issue 1, January 2024, zmad049, https://doi.org/10.1093/jcmc/zmad049

The underrepresentation of women in open-source software is frequently attributed to women’s lack of innate aptitude compared to men: natural gender differences in technical ability (Trinkenreich et al., 2021). Approaching code as a form of communication, I conduct a novel empirical study of gender differences in Python programming on GitHub. Based on 1,728 open-source projects, I ask if there is a gender difference in the quality and style of Python code, measured by adherence to PEP-8 guidelines. I find significant gender differences in structure and in how Python files are organized. While there is gendered variation in programming style, there is no evidence of a gender difference in code quality. Using a Random Forest model, I show that the gender of a programmer can be predicted from the style of their Python code. The study concludes that gender differences in Python code are a matter of style, not quality.

This study examines whether there is a difference in Python programming styles between gender groups. I examine code publicly available on GitHub, a cloud-based platform for hosting and collaborating on version-controlled projects, often used in open-source software development. First, I infer the gender of users from their usernames and the information provided on their profiles, labeling users as feminine, masculine, ambiguous, or anonymous. Anonymous users had no gender-based markers on their profiles, while ambiguous users had both feminine and masculine characteristics. I then collect the publicly available projects of these users written in Python. Next, I analyze and generate statistics on Python files’ adherence to style guidelines using a linter, a tool that automatically checks source code for programmatic and stylistic errors. My findings reveal a gendered difference in the structure and components of Python files. However, I found no gender difference in violations of Python style guidelines or in code quality. This study shows a gender difference in Python programming styles but not in the standard or quality of the code.
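
To make the linting step concrete, the following is a minimal sketch of how style violations in a Python file might be counted. It is not the linter used in the study, which this summary does not name. The function `count_style_violations` and its three checks are hypothetical stand-ins that cover only a small subset of PEP 8 (the 79-character line limit, trailing whitespace, and tab indentation).

```python
from collections import Counter

# PEP 8 limits code lines to 79 characters.
MAX_LINE_LENGTH = 79


def count_style_violations(source: str) -> Counter:
    """Count a few PEP 8-style violations in a Python source string.

    Hypothetical helper for illustration only; a real linter checks
    far more rules than these three.
    """
    counts = Counter()
    for line in source.splitlines():
        # Lines longer than the PEP 8 limit.
        if len(line) > MAX_LINE_LENGTH:
            counts["line_too_long"] += 1
        # Whitespace left at the end of a line.
        if line != line.rstrip():
            counts["trailing_whitespace"] += 1
        # Tabs in the leading indentation (PEP 8 prefers spaces).
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            counts["tab_indentation"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample input, not data from the study.
    sample = "def f(x):\n\treturn x  \n" + "y = " + "1" * 80 + "\n"
    for rule, n in sorted(count_style_violations(sample).items()):
        print(rule, n)
```

Per-file counts of this kind could then be aggregated by user group and compared, which is the sort of statistic the summary describes. How the study itself aggregates or normalizes violations is not stated here.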
