Unveiling Listener Preferences through Podcast Reviews
This project analyzes podcast review data using Python and SQLite. The
dataset consists of 2 million reviews for 100,000 podcasts. We conduct exploratory data analysis (EDA) to
uncover insights into podcast popularity, review sentiment, category trends, trends over time, and reviewer
behavior. The analysis uses Python libraries such as Pandas, Matplotlib, and Plotly for data manipulation and
visualization.
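
As a rough illustration of the Python and SQLite workflow, the sketch below loads reviews from a SQLite database into a Pandas DataFrame. The table name `reviews` and its columns (`podcast_id`, `content`, `rating`, `created_at`) are assumptions made for this example, and an in-memory database with three made-up rows stands in for the real file.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the real dataset holds about 2 million reviews.
rows = [
    ("p1", "Love this show", 5, "2020-06-01"),
    ("p1", "Great hosts", 5, "2020-06-02"),
    ("p2", "Too many ads", 2, "2019-12-24"),
]

# ":memory:" keeps the example self-contained; a real run would pass the
# path to the SQLite file instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (podcast_id TEXT, content TEXT, rating INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Pull the query result straight into a DataFrame for further analysis.
reviews = pd.read_sql_query(
    "SELECT podcast_id, content, rating, created_at FROM reviews", conn
)
conn.close()
```

From here, `reviews` can be passed to the usual Pandas, Matplotlib, or Plotly calls.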
Summary / Findings
First, we cleaned the data by removing entries in languages other than English,
converting all text to lowercase, and removing unnecessary punctuation and stop words. We also added a new
column to the data recording the length of each review.
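
The cleaning steps described above could look roughly like the following sketch. The stop-word set here is a small hypothetical sample, not the list used in the project, and the review length is assumed to be a character count of the original text. Language filtering is not shown because it depends on a language-detection library.

```python
import string

# Hypothetical, deliberately tiny stop-word set used only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "and", "this", "to", "of", "it", "i"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text: str) -> str:
    """Lowercase the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


raw = "This is a GREAT podcast, I love it!"
cleaned = clean_review(raw)

# Assumption: length is measured in characters of the original review.
review_length = len(raw)
```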
One way we measured podcast popularity was by looking at the number of reviews each one
received. The podcast "Crime Junkie" had the most reviews, but interestingly, we found no clear connection
between the number of reviews and the average rating of a podcast. We also examined the relationship between
the length of a review and its rating. There was a weak negative correlation, meaning that longer
reviews tended to have slightly lower ratings.
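
A minimal sketch of these two checks with Pandas follows. The column names (`podcast`, `rating`, `review_length`) and the six sample rows are invented for the example, and the Pearson coefficient (the default of `Series.corr`) is assumed as the correlation measure.

```python
import pandas as pd

# Made-up sample data; longer reviews are given lower ratings on purpose.
reviews = pd.DataFrame({
    "podcast": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 5, 4, 5, 3, 1],
    "review_length": [10, 20, 40, 15, 80, 120],
})

# Review count and average rating per podcast, most reviewed first.
per_podcast = reviews.groupby("podcast").agg(
    n_reviews=("rating", "size"),
    avg_rating=("rating", "mean"),
).sort_values("n_reviews", ascending=False)

most_reviewed = per_podcast.index[0]

# Pearson correlation between popularity and average rating (per podcast).
count_vs_rating = per_podcast["n_reviews"].corr(per_podcast["avg_rating"])

# Pearson correlation between review length and rating (per review).
length_vs_rating = reviews["review_length"].corr(reviews["rating"])
```

A coefficient near zero indicates no clear linear relationship, and a small negative value corresponds to the weak negative correlation described above.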
The majority of reviews (86.66%) gave podcasts a perfect rating of 5, indicating a high
level of listener satisfaction. Ratings of 4, 3, 2, and 1 were significantly less common. We also performed
sentiment analysis to identify common words and phrases used in positive and negative reviews. The most
frequent words in negative reviews were generic ones such as "podcast," "like," and "listen," which signal
dissatisfaction only in context, while positive reviews frequently included words like "love" and "great,"
suggesting enjoyment and appreciation.
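
One simple way to produce figures like these is to count ratings and then count words separately for high-rated and low-rated reviews. The sketch below assumes that ratings of 4 or 5 count as positive and ratings of 1 or 2 as negative; those thresholds and the sample reviews are illustrative, not taken from the project.

```python
from collections import Counter

# Made-up (rating, cleaned text) pairs standing in for the cleaned dataset.
reviews = [
    (5, "love great show"),
    (5, "great hosts love"),
    (5, "love it"),
    (4, "fun podcast"),
    (1, "boring podcast"),
    (2, "hard listen boring"),
]

# Share of each rating as a percentage of all reviews.
rating_counts = Counter(rating for rating, _ in reviews)
total = sum(rating_counts.values())
rating_share = {r: round(100 * n / total, 2) for r, n in rating_counts.items()}

# Word frequencies per sentiment group (thresholds are an assumption).
positive_words = Counter()
negative_words = Counter()
for rating, text in reviews:
    if rating >= 4:
        positive_words.update(text.split())
    elif rating <= 2:
        negative_words.update(text.split())

top_positive = positive_words.most_common(1)
top_negative = negative_words.most_common(1)
```

Raw frequency counts tend to surface generic words in both groups, so comparing the two lists side by side is usually more informative than reading either one alone.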
Looking at podcast categories, "Society & Culture" had the most reviews, and these were
mostly positive. In contrast, categories like "TV & Film" and "Sports" had a wider range of ratings. We also
investigated trends over time, finding that the number of reviews increased over time, peaking in June 2020.
The average rating also rose at first, peaking in 2018 before declining slightly. When we looked at
variations by month, we found that January had the highest number of reviews and December had the lowest.
February had the highest average rating, while December again had the lowest.
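
The time-based views above can be sketched with Pandas date handling. The column name `created_at` and the sample rows are assumptions for illustration. Grouping by year-month gives the trend over time, while grouping by month number alone gives the seasonal comparison.

```python
import pandas as pd

# Made-up sample data with a review timestamp and a rating.
reviews = pd.DataFrame({
    "created_at": [
        "2018-01-05", "2018-01-20", "2018-02-11",
        "2020-06-01", "2020-06-15", "2020-06-30", "2020-12-24",
    ],
    "rating": [5, 5, 5, 4, 5, 3, 2],
})
reviews["created_at"] = pd.to_datetime(reviews["created_at"])

# Trend over time: one row per calendar month (for example 2020-06).
monthly = reviews.groupby(reviews["created_at"].dt.to_period("M"))["rating"].agg(
    n_reviews="size", avg_rating="mean"
)
peak_month = monthly["n_reviews"].idxmax()

# Seasonality: one row per month of the year (1 = January, 12 = December).
by_month_of_year = reviews.groupby(reviews["created_at"].dt.month)["rating"].agg(
    n_reviews="size", avg_rating="mean"
)
lowest_rated_month = by_month_of_year["avg_rating"].idxmin()
```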
Finally, we analyzed the behavior of individual reviewers. Review activity varied widely
between authors. Four of the five most active reviewers consistently gave podcasts perfect scores, while the
fifth gave slightly lower ratings (averaging around 4.04). Overall, the top 5 most active reviewers all had
high average ratings, suggesting that they generally had a positive view of the podcasts they reviewed. These
reviewers also favored certain categories: "Business" was the category they reviewed most often, followed by
"Comedy" and "Education." However, they still reviewed podcasts from a wide range of categories.
Data Exploration / Analysis