South Korea COVID-19 Exploratory Data Analysis
In this project, I analyze COVID-19 data for South Korea using Python libraries such as
Pandas, NumPy, SciPy, and Plotly. The study covers the geographic distribution of cases, patient details, trends
over time, demographic information, and government measures. I’ve also created interactive graphs with Plotly to
visualize different aspects of the pandemic in South Korea.
Summary / Findings
The analysis of COVID-19 data for South Korea yields several findings. Daegu
is the hardest-hit region, with Nam-gu the most affected city within it. Group infections, often linked
to international travel or contact with infected people, account for a large share of the case count. The
PatientInfo dataset shows a majority of female patients, cases tied to international arrivals, and a pattern
of transmission through contact within close communities.
The time-series data show a surge in confirmed cases in February 2020, followed by a steady
rise in recoveries and a relatively constant death rate. The TimeAge dataset indicates an increase in
confirmed cases across all age groups, with people in their 20s accounting for the most cases. The TimeGender dataset
shows more confirmed cases and a higher mortality rate among men. The TimeProvince dataset identifies Daegu as
the epicenter of the outbreak. The Region dataset shows variations in the proportion of the elderly population
across different provinces.
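A group-wise mortality comparison like the one drawn from the TimeGender dataset can be sketched as below. This is a minimal illustration, not the project's actual code: the column names (`date`, `sex`, `confirmed`, `deceased`) and the assumption that counts are cumulative are mine, and the rows are made-up placeholder values rather than real data.

```python
import pandas as pd

# Made-up placeholder rows, NOT values from the real TimeGender dataset.
# Column names are assumptions chosen for illustration.
time_gender = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "sex": ["male", "female", "male", "female"],
    "confirmed": [100, 150, 120, 180],
    "deceased": [2, 1, 3, 2],
})
time_gender["date"] = pd.to_datetime(time_gender["date"])

# Assuming the counts are cumulative, the latest row per group holds the running total.
latest = (
    time_gender.sort_values("date")
    .groupby("sex")
    .tail(1)
    .set_index("sex")
)

# Fatality rate as a percentage of confirmed cases in each group.
latest["fatality_rate_pct"] = 100 * latest["deceased"] / latest["confirmed"]
print(latest[["confirmed", "deceased", "fatality_rate_pct"]])
```

Comparing rates rather than raw death counts keeps the groups comparable when their confirmed-case totals differ.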
The Weather dataset suggests a weak positive correlation between average temperature
and case numbers. The SearchTrend dataset indicates a peak in public interest in “Coronavirus” around March
2020. After applying logarithmic transformations, the SeoulFloating dataset reveals fluctuations in the
floating population of Seoul. Finally, the Policy dataset outlines government policies aimed at curbing the
spread of the virus, protecting public health, and supporting individuals and businesses impacted by the
pandemic.
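The temperature correlation and the logarithmic transformation mentioned above can be sketched as follows. This is a hedged illustration under stated assumptions: the column names `avg_temp`, `new_cases`, and `fp_num` are hypothetical, the numbers are made up, and `np.log1p` is one possible choice of log transform, not necessarily the one used in the analysis.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up placeholder values, NOT real Weather or case data.
daily = pd.DataFrame({
    "avg_temp": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
    "new_cases": [10, 14, 9, 20, 16, 22],
})

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = stats.pearsonr(daily["avg_temp"], daily["new_cases"])
print(f"Pearson r = {r:.3f}, p = {p_value:.3f}")

# Made-up floating-population counts spanning several orders of magnitude.
floating = pd.Series([1200, 35000, 480, 91000], name="fp_num")

# log1p computes log(1 + x), which compresses large values and tolerates zeros.
log_floating = np.log1p(floating)
print(log_floating)
```

A correlation coefficient describes association only; it does not show that temperature drives case numbers.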
Data Exploration / Analysis