Exploratory Analysis of Cultural Differences in Programming Language Communities

worked on by: Anton Wille

Initial Outline

While much attention has been dedicated to dissecting and comparing the structural features of programming languages, less has been done to analyze the communities that form around them. In this thesis, we aim to find and analyze some of the differences between these communities and their origins by applying a Grounded Theory Methodology (GTM) approach to discussions on the programming forum StackOverflow. Taking Richard A. Shweder’s definition of culture as “[...] conceptions of what is true, good, beautiful, and efficient”, we will examine conversations about how to program well in three programming languages: Ruby, Python and Perl. These three languages share many similarities, having been introduced in the same era, primarily as dynamically typed, high-level scripting languages. Each also exhibits distinct characteristics that merit closer examination, for example:

Perl’s motto of “There’s more than one way to do it” runs against Python’s idea of offering exactly one way to do “it”. Ruby seems to sit in the middle, defining a “rubyesque” way of doing things while acknowledging different approaches.

All three languages thrive in different application areas: Perl’s advanced regex capabilities and ubiquity on Unix systems made it a preferred tool for sysadmins. Since the inception of Ruby on Rails in 2004, Ruby has found popularity in the web development community. Python’s easy interoperability with C made it the default choice for machine learning and data science in general, and its simple, clean syntax means it is often taught in universities and recommended to beginners.

While Perl was developed in the US, Python hails from the Netherlands, and Ruby from Japan. The cultural differences of their creators and initial communities might be reflected in the way people want to program in these languages or perceive their style.

These points can serve as a good initial focus for examination, in combination with the categories we can derive from Shweder's definition of culture. However, given the open nature of GTM and the breadth of the topic, the direction of the research might shift during coding and theory building.

Abstract of thesis

Taking a Grounded Theory (GT) approach to discussions on the Q&A platform StackOverflow, this exploratory work aims to find, analyse and compare cultural characteristics of the communities forming around Perl, Python and Ruby, and to contrast them wherever they diverge. The underlying idea is that differences in design decisions, contexts of practice, and histories lead to different manifestations of culture. During the exploratory process, multiple interesting directions of investigation opened up, with two main questions emerging: “Should there be one or multiple ways of doing things in a programming language?” and “What does it mean to write idiomatic code?”. The thesis offers some preliminary answers to these questions and insights into the character of these communities, and can hopefully serve as a springboard for future investigations.

Timeline

A more detailed timeline can be found under /notes in the thesis repository.

Week 1-2:

  • Creating scripts to export and format the data sources
  • Setting up MAXQDA
  • Searching for and reading some comparable papers to get a feel for the topic

Week 3-6:

  • Initial rounds of coding StackOverflow posts and refining ideas
  • Trying to develop axes and find the right scope for the investigation
  • Trying to get a better understanding of what constitutes culture and how the different communities express their own

Week 7-8:

  • Preparation for the presentation in the seminar "Beiträge zum Software Engineering"
  • Developing a "narrative" or framing for the thesis
  • Reading more about GTM
  • Writing memos (it was an oversight not to have done this earlier)

Week 9:

I was hit by Covid and, apart from some formatting and light rereading, did not get much done.

Week 10-12:

  • Writing of the actual thesis, more or less chronologically with Introduction, Methodology, Application, Results and Discussion/Conclusion
  • A lot of restructuring, reinterpretation and additional coding was required
  • Finally, formatting and hand-in
  • A small quantitative analysis was started but not finished in time for the deadline

Difficulties, Mistakes, Regrets

Given the broad research question, the main difficulty of the thesis was finding the right scope. During the open coding phase, many of the directions I took looked promising and could probably have turned out well, which possibly led to more breadth than depth. This is a slight regret, but also a natural limitation of the short time frame of a Bachelor thesis. In fact, the discussion section could easily have been three times as long; I cut a lot in order not to exceed the recommended word count by too much.

Two mistakes are worth mentioning as well. First, I severely underestimated the utility of memos (despite this being constantly mentioned in the literature). Given the short time frame and limited scope, I thought it would be fine to just take notes and use version control, but looking back this made documentation more difficult and sometimes left me confused about my own intentions or interpretations. Often, what seems clear in thought is significantly less sharp in writing. This became very apparent once I started writing the text of the thesis; some of my arguments and connections in the data did not work out as I thought they would. Starting relatively late with writing the actual text was my second mistake; it led to a tight deadline and some minor inconsistencies in the written paper.

Project Repository: https://github.com/AntonWille/PatternPilot - All code and notes, as well as later additions such as Zotero files, can be found here