Exploring Mental Health in the Tech Industry

Project Introduction

The tech industry's demanding work environment has highlighted the importance of addressing mental health issues among its workers. This project aims to analyze a comprehensive dataset collected by Open Source Mental Illness (OSMI) from 2014 to 2019 to gain insights into tech workers' attitudes towards mental health and the prevalence of mental illnesses in the industry. We will use Python and SQL to clean and process the data to ensure its accuracy and reliability.

Research Objectives

Analyze the Structure of Survey Questions

Understand Worker Perspectives on Mental Health

Explore the Frequency of Mental Health Disorders among Tech Professionals

Assess Survey Comprehensiveness

Hypotheses

EDA Questions

We will start by loading the data into a Pandas dataframe and performing exploratory data analysis.

This will include creating statistical summaries and charts, testing for anomalies, checking for correlations and other relations between variables, and other EDA elements.

We will provide clear explanations in our notebook to inform the reader what we are trying to achieve, what results we got, and what these results mean.

We will also provide suggestions about how our analysis can be improved.

Data Loading and Inspection:

We can begin by loading the data and then examining the structure of each table, the type of data it contains, and identifying any missing values that may require attention.
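The loading and inspection step can be sketched as follows. This is a minimal example, not the project's actual code: the table and column names are assumptions based on the identifiers mentioned in this notebook, and a tiny in-memory SQLite database stands in for the real OSMI file so that the example runs on its own.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the OSMI database. The real project would connect to the
# downloaded SQLite file instead; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Survey (SurveyID INTEGER, Description TEXT);
    CREATE TABLE Question (QuestionID INTEGER, QuestionText TEXT);
    CREATE TABLE Answer (AnswerText TEXT, SurveyID INTEGER,
                         UserID INTEGER, QuestionID INTEGER);
    INSERT INTO Survey VALUES (2014, 'survey 2014'), (2016, 'survey 2016');
    INSERT INTO Question VALUES (1, 'What is your age?'),
                                (2, 'What is your gender?');
    INSERT INTO Answer VALUES ('37', 2014, 1, 1),
                              ('Female', 2014, 1, 2),
                              ('-1', 2016, 2, 1);
""")

# Load every table into its own DataFrame.
tables = {
    name: pd.read_sql_query(f"SELECT * FROM {name}", conn)
    for name in ("Survey", "Question", "Answer")
}
conn.close()

# Structure, dtypes, missing values and duplicate rows for each table.
for name, df in tables.items():
    print(f"{name}: shape={df.shape}")
    print(df.dtypes)
    print("missing values:", int(df.isna().sum().sum()))
    print("duplicate rows:", int(df.duplicated().sum()))
```

Keeping the three tables in a dictionary makes it easy to run the same checks on each of them in one loop.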

Initial thoughts: the relationships between 'SurveyID', 'UserID', and 'QuestionID' are maintained, allowing for a clear association between surveys, users, and their responses.

The dataset appears well-prepared and organized, providing a solid foundation for further analysis.

Survey DataFrame

The Survey DataFrame contains information about the surveys conducted by Open Source Mental Illness (OSMI) from 2014 to 2019. It has the following characteristics:

Question DataFrame

The Question DataFrame contains information about the questions asked in the surveys. It has the following characteristics:

Answer DataFrame

The Answer DataFrame contains answers to the questions, including the answer text, survey ID, user ID, and question ID. It has the following characteristics:

The absence of missing values and duplicate rows across all dataframes suggests that the data is complete, unique, and suitable for analysis.

Additionally, the -1 values in the dataset may be an alternative representation of NULL values. We will investigate this further during the exploratory data analysis (EDA) process.

This clean state may minimize the need for extensive data cleaning, making further exploration straightforward, depending on the state and meaning of -1.
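A short sketch of how the -1 placeholder could be investigated, using toy data. It assumes the answers are stored as text in a column named AnswerText, and that -1 marks a skipped question; that interpretation still has to be confirmed against the real data.

```python
import pandas as pd

# Toy answers; the column names are assumptions.
answers = pd.DataFrame({
    "AnswerText": ["37", "-1", "Yes", "-1", "No"],
    "QuestionID": [1, 1, 33, 10, 10],
})

# Share of answers per question that use the -1 placeholder.
is_placeholder = answers["AnswerText"].astype(str).str.strip().eq("-1")
placeholder_share = is_placeholder.groupby(answers["QuestionID"]).mean()
print(placeholder_share)

# If -1 turns out to mean "no answer", treat it as missing.
cleaned = answers.assign(AnswerText=answers["AnswerText"].mask(is_placeholder))
print("missing after cleaning:", int(cleaned["AnswerText"].isna().sum()))
```

Looking at the share per question first shows whether -1 is concentrated in optional or conditional questions, which would support reading it as a NULL.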

Therefore, we can proceed to address the EDA questions, conduct hypothesis testing, and accomplish some of the objectives.

Analyze the Structure of Survey Questions

Understand Worker Perspectives on Mental Health

Specific Questions we will look into:

  1. What is your age? (Q1)
  2. What is your gender? (Q2)
  3. How many employees does your company or organization have? (Q8)
  4. Does your employer provide mental health benefits as part of healthcare coverage? (Q10)

Mentions of the "mental health" keyword and its variations rose sharply in 2017, exceeding the 2014 level by 28%, and peaked at 38% in 2018. The share then dropped to 34.6% in 2019, so the upward trajectory appears to be reversing. Overall, these fluctuations suggest that the topic of mental health gained relevance over the survey period.
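One way such a yearly keyword share could be computed is sketched below on toy data. It assumes the metric is the percentage of text answers per survey year that match the keyword pattern; the notebook's actual definition may differ, and the column names are assumptions.

```python
import pandas as pd

# Toy answers, not the project's data.
answers = pd.DataFrame({
    "SurveyID": [2014, 2014, 2017, 2017, 2018, 2018],
    "AnswerText": [
        "No comment",
        "My mental health was never discussed",
        "Mental-health days would help",
        "Nothing to add",
        "Better mental health coverage",
        "mentalhealth stigma is real",
    ],
})

# Matches "mental health", "mental-health" and "mentalhealth", ignoring case.
pattern = r"mental[\s-]?health"
mentions = answers["AnswerText"].str.contains(pattern, case=False, regex=True, na=False)

# Percentage of answers per survey year that mention the keyword.
keyword_share = (mentions.groupby(answers["SurveyID"]).mean() * 100).round(1)
print(keyword_share)
```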

To gain a deeper understanding of these fluctuations, further exploration is essential to identify potential influencing factors. Analysis of factors such as age, gender, company size, and the availability of mental health resources could uncover the reasons behind this shift.

Despite the surge in "mental health" keyword mentions in 2018, the average age of the individuals affected stayed at 35 in both 2017 and 2018 and rose only slightly, to 35.5, in 2019.
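The average age per survey year could be derived from the answers to Q1 roughly as follows. The toy values, the column names, and the plausibility window of 18 to 75 years are all assumptions made for illustration.

```python
import pandas as pd

# Toy answers to Q1 ("What is your age?").
answers = pd.DataFrame({
    "SurveyID": [2017, 2017, 2017, 2018, 2018, 2019, 2019],
    "QuestionID": [1] * 7,
    "AnswerText": ["34", "36", "-1", "35", "329", "30", "41"],
})

ages = answers.loc[answers["QuestionID"] == 1].copy()
ages["Age"] = pd.to_numeric(ages["AnswerText"], errors="coerce")

# Assumed plausibility window: drops placeholders such as -1 and typos such as 329.
ages = ages[ages["Age"].between(18, 75)]

mean_age = ages.groupby("SurveyID")["Age"].mean()
print(mean_age)
```

Filtering before averaging matters here, because a single implausible entry can shift a yearly mean noticeably.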

This suggests that individuals experiencing mental health issues may be becoming more resilient and mature as they age, possibly due to improved awareness and treatment options.

Moreover, the declining trend in keyword usage during these years could indicate that some individuals are effectively managing their mental health challenges.

In light of these observations, we now turn our attention to gender to determine whether males, females, or other groups were more likely to use the mental health keyword in their responses.

While the gender distribution of respondents is predominantly male, with over 60%, followed by women and others, it is noteworthy that the percentage of women participating is increasing alongside the trend of "mental health" keyword usage.

In 2014, only 19.6% of respondents were women, and by 2018, this percentage had increased to 30%, suggesting a potential association between female gender and increased mental health concerns.

However, this trend appears to reverse when female participation declines. In 2019, female participation decreased by 2.2 percentage points, from 30% in 2018 to 27.8%.
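A sketch of how the gender shares per year might be computed from the free-text answers to Q2. The mapping into three coarse groups is an assumed simplification, and the data is a toy sample.

```python
import pandas as pd

# Toy answers to Q2 ("What is your gender?").
answers = pd.DataFrame({
    "SurveyID": [2014, 2014, 2014, 2014, 2018, 2018, 2018, 2018],
    "QuestionID": [2] * 8,
    "AnswerText": ["Male", "male", "M", "Female", "female", "Male", "Non-binary", "F"],
})


def normalize_gender(raw):
    """Collapse free-text gender answers into three coarse groups (assumed mapping)."""
    value = str(raw).strip().lower()
    if value in {"male", "m", "man"}:
        return "Male"
    if value in {"female", "f", "woman"}:
        return "Female"
    return "Other"


gender = answers.loc[answers["QuestionID"] == 2].copy()
gender["Gender"] = gender["AnswerText"].map(normalize_gender)

# Row-normalised crosstab: the shares within each year sum to 100%.
gender_share = pd.crosstab(gender["SurveyID"], gender["Gender"], normalize="index") * 100
print(gender_share.round(1))

# Year-over-year change of the female share, in percentage points.
print(gender_share["Female"].diff())
```

Differences between two shares are percentage points, not percent, which is why the last line uses a plain subtraction.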

Furthermore, we can proceed to examine whether the impact is related to the company size and its resources towards addressing mental health challenges.

We can observe that when mental health terms and their variations were used the most, smaller companies were more involved.

This suggests that workloads may be considerably higher there and that stress levels are rising.

Even though the number of survey participants is decreasing, we notice that fewer people discuss mental health in larger, well-established companies with employee counts of 1000 or more.

Therefore, we need to examine the trend of mental health resources available for employees. However, before we do that, it would be beneficial to investigate the relationship between CompanySize and AttitudesCount.

This will help us understand how attitudes towards mental health vary across different company sizes, which could either confirm or deny our observations.
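Because company size is an ordered category rather than a number, a rank correlation is one reasonable way to relate CompanySize to AttitudesCount. The sketch below uses toy counts and assumed size bands; it computes Spearman's rho as the Pearson correlation of the ranks, so no extra statistics library is needed.

```python
import pandas as pd

# Assumed size bands, ordered from smallest to largest; the counts are toy values.
size_order = ["1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000"]
summary = pd.DataFrame({
    "CompanySize": size_order,
    "AttitudesCount": [40, 55, 60, 38, 20, 25],
})

# Encode the size bands as ordered integer codes 0..5.
summary["SizeRank"] = pd.Categorical(
    summary["CompanySize"], categories=size_order, ordered=True
).codes

# Spearman's rho is the Pearson correlation of the two rank series.
rho = summary["SizeRank"].rank().corr(summary["AttitudesCount"].rank())
print(f"Spearman rho = {rho:.3f}")
```

A negative rho would be consistent with mental health being mentioned less often as company size grows, but with only six bands the estimate is rough.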

Our observations indicate that smaller companies tend to discuss mental health more openly than larger companies, particularly during periods when mental health terms are more prevalent in the surveys.

This suggests that smaller organizations may experience higher workloads and stress levels, leading to a greater need for open dialogue about mental health concerns.

In contrast, despite having fewer survey participants, larger companies with over 1000 employees exhibited a lower frequency of discussions about mental health.

This finding raises questions about the availability of mental health resources and support systems in these larger organizations.

Next, we check what percentage of respondents do not know whether mental health resources are available to them.

We can observe that the percentage of people who do not know about the existence of mental health resources in their company ranged from 32.4% in 2014 to 23.5% in 2018.

Furthermore, the analysis shows that larger companies (with over 1000 employees) tended to report a higher availability of mental health resources.

This suggests that larger organizations may have more resources available to support their employees' mental health.
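To compare resource availability across company sizes, the long answer table can be pivoted so that each respondent has one row with the answers to Q8 and Q10 side by side. This is a sketch on toy data: the answer labels and column names are assumptions, and it relies on each UserID answering each question at most once.

```python
import pandas as pd

# Toy answers to Q8 (company size) and Q10 (mental health benefits).
answers = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 3, 3, 4, 4],
    "QuestionID": [8, 10, 8, 10, 8, 10, 8, 10],
    "AnswerText": [
        "6-25", "I don't know",
        "6-25", "No",
        "More than 1000", "Yes",
        "More than 1000", "Yes",
    ],
})

# One row per respondent, one column per question.
wide = answers.pivot(index="UserID", columns="QuestionID", values="AnswerText")
wide = wide.rename(columns={8: "CompanySize", 10: "Benefits"})

# Share of each benefits answer within every company-size band.
benefits_by_size = pd.crosstab(
    wide["CompanySize"], wide["Benefits"], normalize="index"
) * 100
print(benefits_by_size.round(1))
```

The "I don't know" column of this table gives the share of respondents who are unaware of available resources, broken down by company size.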

Summary of Worker Perspectives on Mental Health analysis

The discussion of "mental health" and related terms surged in 2017 and 2018, with increases of 34% and 38%, respectively, indicating a growing relevance of the topic.

However, the share fell to 34.65% in 2019, suggesting declining interest.

Despite the peak in keyword usage in 2018, the average age of affected individuals remained stable at about 35 years, indicating potential resilience and improved awareness.

The gender distribution shifted, with an increasing percentage of women participating alongside the rise in mental health discussions.

However, female participation declined in 2019, suggesting a potential correlation between keyword usage and female participation that can be analyzed further.

Smaller companies engaged more in mental health discussions during peak periods, possibly due to higher workloads.

Larger companies, despite fewer participants, discussed mental health less, raising questions about resource availability.

Larger companies with over 1000 employees demonstrated a higher availability of mental health resources, suggesting better support systems.

Explore the Frequency of Mental Health Disorders among Tech Professionals

Investigate the prevalence and changes in mental health disorders among tech professionals over time.

Specific Questions we will look into:

  1. Have you had a mental health disorder in the past? (Q32)
  2. Do you currently have a mental health disorder? (Q33)
  3. Have you ever been diagnosed with a mental health disorder? (Q34)

Reported mental health disorders peaked in 2018, with 57% of respondents answering 'yes.'

However, there was a reversal in 2019, with only 51.5% responding 'yes.'

This trend aligns with the increase in the percentage of female participants and the corresponding percentage of reported mental health disorders.

This suggests that females may be more open about mental health issues, or there may be a general trend in this direction.

Therefore, it would be worthwhile to investigate the correlation between response count and disorder count to determine if there is any significant relationship.

Finally, it would be valuable to explore in future research whether female participants are genuinely more open about mental health or if they experience higher levels of workplace stress due to gender-related issues and stereotypes.

The correlation between the percentages of reported mental health disorders and the availability of mental health resources is very close to zero.

This means that, based on the data we have, there is no clear pattern or trend suggesting that when the percentage of reported mental health disorders increases or decreases, the percentage of available mental health resources tends to do the same.

In simpler terms, these two factors don't seem to be strongly connected in a straightforward, predictable way.

It is important to keep in mind that correlation does not tell us about causation, and other factors not considered in our analysis might play a role.

The lack of a strong correlation suggests that, at least in a linear sense, these particular aspects of mental health disorders and resource availability do not show a clear relationship in our dataset.
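The correlation check described above can be sketched as a Pearson correlation between two yearly percentage series. The numbers below are toy values chosen for illustration, not the project's results, and the column names are assumptions.

```python
import pandas as pd

# Toy yearly percentages, not the project's results.
yearly = pd.DataFrame(
    {
        "DisorderPct": [40.0, 45.0, 50.0, 55.0, 50.0],
        "ResourcesPct": [50.0, 45.0, 55.0, 48.0, 52.0],
    },
    index=pd.Index([2014, 2016, 2017, 2018, 2019], name="SurveyID"),
)

# Series.corr computes the Pearson correlation by default.
r = yearly["DisorderPct"].corr(yearly["ResourcesPct"])
print(f"Pearson r = {r:.3f} over n = {len(yearly)} survey years")
```

With only a handful of yearly data points, any such coefficient is very uncertain, so a value near zero should be read as "no evidence of a linear relationship" rather than as proof that none exists.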

Summary of the Frequency of Mental Health Disorders among Tech Professionals

The analysis shows that the share of respondents who reported having mental health disorders changed noticeably over the years studied.

The share peaked in 2018 (57%) and then decreased in 2019 (51.5%).

This trend coincides with an increase in the share of female participants, and a corresponding increase in the percentage of them reporting mental health disorders.

These findings raise questions about whether this shift is due to increased openness among females or reflects broader societal trends, possibly linked to workplace stress related to gender issues.

Correlation analysis shows that there is a very weak correlation between reported mental health disorders and the availability of mental health resources.

This suggests that there is no clear linear relationship between these two factors. It is important to remember that correlation does not equal causation.

The lack of a strong correlation highlights the need for a more in-depth understanding of the factors that influence mental health trends in this dataset.

Further research is needed to identify additional factors that contribute to the observed patterns.

Assess Survey Comprehensiveness

The output we have is a list of topics that the LDA model has identified in survey questions, with each topic represented by its top 5 words. By analyzing these words, we can see that the survey should cover the following topics:

Topic 1 seems to be about mental health resources provided by the employer.

Topic 2 might be about previous experiences with mental health in the workplace.

Topic 3 could be about conversations around mental health at work.

Topic 4 appears to be about potential mental health issues in the workplace.

Topic 5 might be about wellness programs or initiatives related to mental health.

Based on this model and the questions we have analyzed, we can confidently say that we have identified the most crucial questions that directly impact employee mental health.

Project Summary

Research Objectives

Analyzing the Structure of Survey Questions

Understanding Worker Perspectives on Mental Health

Exploring the Frequency of Mental Health Disorders among Tech Professionals

Assessing Survey Comprehensiveness

Hypotheses

EDA Questions

Findings

Worker Perspectives on Mental Health

Frequency of Mental Health Disorders

Survey Comprehensiveness

Future Improvements

  1. Diversity and Inclusion: Explore opportunities to enhance diversity and inclusion by incorporating additional demographic questions that capture a broader range of identities, such as ethnicity, sexual orientation, and disability status.

  2. Geographical Specifics: Extend geographical data collection by incorporating more detailed location-related questions. This could include regional information, allowing for a more granular analysis of mental health trends based on location.

  3. Remote Work Impact: Further investigate the impact of remote work on mental health by refining questions related to remote work conditions, challenges, and support. This could provide insights into the unique challenges faced by remote workers.