South Korea COVID-19 Exploratory Data Analysis
In this project, I analyze COVID-19 data for South Korea using Python libraries such as Pandas, NumPy, SciPy, and Plotly. The study examines the geographic distribution of cases, patient characteristics, trends over time, demographic patterns, and government policy responses. I’ve also built interactive Plotly charts to visualize different aspects of the pandemic in South Korea.
Summary / Findings
The analysis of COVID-19 data for South Korea yields several key findings. Daegu is the hardest-hit region, and Nam-gu, a district of Daegu, is the most affected city-level area. Group infections, often linked to international travel or contact with infected people, account for a large share of the high case count. The PatientInfo dataset shows that most patients are female, that a notable share of cases is tied to international arrivals, and that transmission frequently occurs through contact within close communities.
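The regional ranking and the group-infection share described above come down to a few pandas aggregations. The sketch below is a minimal illustration, not the notebook's actual code. It assumes Case-style columns named `province`, `city`, `group`, and `confirmed` and a PatientInfo-style `sex` column, and it uses small made-up rows in place of the real CSV files. The helpers `confirmed_by_area` and `group_share` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the Case table. Column names are assumed; the counts are
# invented for illustration and are NOT the real figures.
cases = pd.DataFrame({
    "province": ["Daegu", "Daegu", "Seoul", "Seoul", "Gyeonggi-do"],
    "city": ["Nam-gu", "Dalseo-gu", "Guro-gu", "Jung-gu", "Seongnam-si"],
    "group": [True, True, True, False, False],
    "confirmed": [50, 5, 8, 12, 3],
})

# Toy stand-in for PatientInfo: one row per patient, assumed "sex" column.
patients = pd.DataFrame({"sex": ["female", "female", "male", "female", "male"]})


def confirmed_by_area(df: pd.DataFrame) -> pd.DataFrame:
    """Sum confirmed cases per (province, city), largest first."""
    return (
        df.groupby(["province", "city"], as_index=False)["confirmed"]
        .sum()
        .sort_values("confirmed", ascending=False, ignore_index=True)
    )


def group_share(df: pd.DataFrame) -> float:
    """Fraction of confirmed cases attributed to group (cluster) infections."""
    return df.loc[df["group"], "confirmed"].sum() / df["confirmed"].sum()


ranking = confirmed_by_area(cases)
print(ranking.head(3))
print(f"group share: {group_share(cases):.2f}")
# Share of patients by sex (proportions rather than raw counts).
print(patients["sex"].value_counts(normalize=True))
```

With the real data, the same calls would run on the loaded Case and PatientInfo frames instead of the toy rows.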
The time-series data show a surge in confirmed cases in February 2020, followed by a steady rise in recoveries and a relatively stable death rate. The TimeAge dataset shows confirmed cases rising in every age group, with people in their 20s accounting for the most cases. The TimeGender dataset shows more confirmed cases among women, consistent with the PatientInfo breakdown, but a higher fatality rate among men. The TimeProvince dataset identifies Daegu as the epicenter of the outbreak, and the Region dataset shows that the share of elderly residents varies across provinces.
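These time-based comparisons rest on two steps: converting cumulative counts into daily ones, and comparing groups on the latest available date. The following is a minimal sketch. It assumes cumulative `confirmed`/`deceased` columns plus `sex` and `age` columns similar to those in the Time, TimeGender, and TimeAge tables. All data frames are toy stand-ins, and `fatality_by_sex` is a hypothetical helper that computes deceased / confirmed.

```python
import pandas as pd

# Toy cumulative counts shaped like the Time table (assumed columns, invented values).
time = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-20", "2020-02-21"]),
    "confirmed": [10, 20, 45, 90],
    "deceased": [0, 0, 1, 2],
})

# Cumulative -> daily new counts; the first day keeps its cumulative value.
cumulative = time.set_index("date")[["confirmed", "deceased"]]
daily = cumulative.diff().fillna(cumulative).astype(int)

# Toy TimeGender-style rows: cumulative counts per sex per date.
gender = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01"] * 2 + ["2020-03-02"] * 2),
    "sex": ["male", "female"] * 2,
    "confirmed": [40, 60, 50, 70],
    "deceased": [1, 1, 2, 1],
})


def fatality_by_sex(df: pd.DataFrame) -> pd.Series:
    """Case fatality rate (deceased / confirmed) per sex on the latest date."""
    latest = df.loc[df["date"] == df["date"].max()].set_index("sex")
    return latest["deceased"] / latest["confirmed"]


# Toy TimeAge-style rows: reshape to one column per age band.
ages = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01"] * 3),
    "age": ["10s", "20s", "30s"],
    "confirmed": [5, 30, 12],
})
wide = ages.pivot(index="date", columns="age", values="confirmed")
top_age = wide.iloc[-1].idxmax()  # age band with the most cases on the last date

print(daily)
print(fatality_by_sex(gender))
print(top_age)
```

Pivoting to a wide table also puts the data in a convenient shape for plotting one line per age band.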
The Weather dataset suggests a weak positive correlation between average temperature and case counts. The SearchTrend dataset shows public interest in “Coronavirus” peaking around March 2020. After a logarithmic transformation, the SeoulFloating dataset reveals fluctuations in Seoul's floating population. Finally, the Policy dataset outlines government policies aimed at curbing the spread of the virus, protecting public health, and supporting individuals and businesses affected by the pandemic.
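The temperature correlation, the search-trend peak, and the log transform each take only one or two pandas calls. This sketch makes several assumptions: one weather row per date (for example, a single province or a national average), columns named `avg_temp`, `new_confirmed`, `coronavirus`, and `fp_num`, and toy values throughout. It uses `np.log1p` rather than a plain logarithm so that zero counts stay finite. That choice belongs to this sketch and is not necessarily the transformation used in the analysis.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"])

# Toy weather and daily-case tables (assumed columns, invented values).
weather = pd.DataFrame({"date": dates, "avg_temp": [2.0, 4.0, 6.0, 8.0]})
new_cases = pd.DataFrame({"date": dates, "new_confirmed": [10, 12, 11, 15]})

# Align the two series on date, then compute Pearson's r (the default method).
merged = weather.merge(new_cases, on="date", how="inner")
r = merged["avg_temp"].corr(merged["new_confirmed"])

# Toy search-trend rows: find the date with the highest relative search volume.
search = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-25", "2020-03-01", "2020-03-05"]),
    "coronavirus": [40.0, 95.0, 70.0],
})
peak_date = search.loc[search["coronavirus"].idxmax(), "date"]

# Toy floating-population counts; log1p(x) = log(1 + x) compresses large values
# while mapping 0 to 0 instead of -inf.
floating = pd.DataFrame({"fp_num": [0, 9, 99, 999]})
floating["log_fp"] = np.log1p(floating["fp_num"])

print(f"Pearson r: {r:.3f}")
print(peak_date.date())
print(floating)
```

`Series.corr` computes Pearson's r by default. Passing `method="spearman"` gives a rank-based alternative that is less sensitive to outlier days.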
Data Exploration / Analysis