The Stack Overflow 2024 Survey, with 65,437 participants, is summarized in four dashboards that analyze how respondents learned to code, correlations between the tools they use (languages, databases, platforms, operating systems, AI), and their views on AI.
I have a dataset containing the results of a survey conducted by Stack Overflow (https://survey.stackoverflow.co/2024/). The main dataset consists of 65,437 rows and 114 columns. A second dataset describes the schema and consists of 87 rows and 6 columns.
The survey results consist mostly of string values, so I will need to perform some data manipulation to extract meaningful insights from them during the analysis phase.
Some columns hold multiple values in a single cell, and I will need to separate these as part of the data structuring process. I will first determine the group of data I want to analyze and then use Python's pandas library to organize the necessary values. I will store the data I plan to analyze for each dashboard in separate CSV files.
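As a rough illustration of this step, the sketch below splits a multi-valued column into one row per value and writes the result out as CSV. It is not the exact code used for the dashboards. The column names (`ResponseId`, `LanguageHaveWorkedWith`, `DatabaseHaveWorkedWith`), the `;` separator, the sample rows, and the helper `explode_multi_value` are all assumptions made for the example.

```python
import io

import pandas as pd

# Tiny stand-in for the survey data. Column names, the ";" separator,
# and the rows themselves are assumptions made for this example.
raw = pd.DataFrame(
    {
        "ResponseId": [1, 2, 3],
        "LanguageHaveWorkedWith": ["Python;SQL", "JavaScript", None],
        "DatabaseHaveWorkedWith": ["PostgreSQL", "MySQL;SQLite", "PostgreSQL"],
    }
)


def explode_multi_value(df, id_col, value_col, sep=";"):
    """Return one row per (respondent, single value) for a multi-valued column."""
    # Keep only the columns needed and drop respondents who skipped the question.
    subset = df[[id_col, value_col]].dropna(subset=[value_col])
    # Split each cell into a list, then expand each list element into its own row.
    long = (
        subset.assign(**{value_col: subset[value_col].str.split(sep)})
        .explode(value_col)
        .reset_index(drop=True)
    )
    # Remove stray whitespace around the individual values.
    long[value_col] = long[value_col].str.strip()
    return long


languages = explode_multi_value(raw, "ResponseId", "LanguageHaveWorkedWith")

# One CSV per dashboard; an in-memory buffer stands in for a file on disk here.
buffer = io.StringIO()
languages.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Keeping the respondent ID next to each value makes it possible to count values per respondent or join the long table back to other columns later. To write a real file, pass a path to `to_csv` instead of the buffer.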
In my data cleaning process, the first major task was to create several main data groups from the survey dataset. I achieved this by selecting specific columns of interest and categorizing them into distinct data groups for better organization and analysis. Here's a brief overview of the groups I created: