Vytautas Bunevičius

Travel Insurance Prediction: Who's Buying It?

I dug into a dataset from a Tour & Travels Company to figure out which customers are likely to grab their travel insurance package—complete with Covid cover. Using 2019 data on about 2,000 customers, I built a model to predict who's in and who's out, based on stuff like age, income, and travel habits.

What I Did

Data Dive: Loaded up a dataset with customer details—age, job type, income in Euros (converted from rupees), family size, and more. Cleaned it up, tossed duplicates, and peeked at patterns with charts.
Stats Check: Ran some tests (like Chi-Square and Mann-Whitney U) to spot what drives insurance buys. Income and travel experience stood out big time.
Model Building: Tested four models—Logistic Regression, Random Forest, Gradient Boosting, and SVM. Tweaked them with random search to boost recall (catching more buyers), then mixed them into an ensemble for fun.
Results: Random Forest came out on top. The tuned version hit a solid ROC AUC of 0.70, and the recall-optimized one reached 100% recall, catching every buyer at the cost of a lot of false positives (precision 0.40).
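
For a feel of what the Chi-Square check in the Stats Check step does, here is a minimal sketch in plain Python. It assumes a 2x2 table (say, frequent flyer yes/no against bought insurance yes/no); the counts are made up for illustration and the function name is hypothetical, not lifted from the project code. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its numbers would differ slightly from this uncorrected version.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table = [[a, b], [c, d]]; no continuity correction is applied.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts, for illustration only.
# Rows: frequent flyer yes / no. Columns: bought insurance yes / no.
toy_table = [[30, 20], [40, 110]]
stat, p = chi_square_2x2(toy_table)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

A small p-value here says the two yes/no variables are unlikely to be independent, which is the kind of signal reported for frequent flying and overseas travel.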

How It Played Out

Data Prep: Started with 1,987 records, but 738 were repeats—yep, 37% duplicates. Dropped those, leaving 1,249 unique customers. Converted rupees to Euros (1 INR = 0.011 EUR) for a European vibe. No missing values, which was nice. Age ranged 25-35, income €3,300-€19,800.
Exploration: Plotted histograms and bar charts. Age peaked around 28-29 and 32-33; income had clumps at €6,600 and €13,200. Most folks were private sector workers, grads, and not big flyers. Stats showed income, frequent flying, and overseas trips tied strongly to buying insurance.
Model Run: First pass, Gradient Boosting led with 0.76 accuracy, but recall was meh (0.49). After tuning for recall, Random Forest shone—hit 100% recall with a 0.70 ROC AUC. The ensemble was decent too (0.68 ROC AUC). Trade-off? More false positives to catch everyone.
Key Finds: Income's the star—higher earners buy more. Frequent flyers and globetrotters are in too. Age and family size matter a bit, but education and chronic diseases? Not so much.
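
The data prep step boils down to two moves: drop exact duplicate rows, then convert income at the 0.011 rate. A minimal sketch in plain Python, assuming each customer is a tuple of (age, annual income in INR); the toy rows and the `prepare` helper are placeholders rather than the real dataset or project code. In pandas, the dedup part would be `DataFrame.drop_duplicates()`.

```python
INR_TO_EUR = 0.011  # conversion rate quoted in the write-up

def prepare(rows, income_index):
    """Drop exact duplicate rows (keeping the first copy) and append income in EUR."""
    unique_rows = list(dict.fromkeys(rows))  # dicts preserve insertion order
    return [
        row + (round(row[income_index] * INR_TO_EUR, 2),)
        for row in unique_rows
    ]

# Toy rows of (age, annual_income_inr); not taken from the real data.
toy = [(25, 300_000), (25, 300_000), (31, 1_200_000), (35, 1_800_000)]
clean = prepare(toy, income_index=1)
dropped = len(toy) - len(clean)
print(f"dropped {dropped} of {len(toy)} toy rows ({dropped / len(toy):.0%})")
print(clean)

# Sanity check on the figures from the write-up: 738 of 1,987 rows.
print(f"duplicate share in the write-up: {738 / 1987:.0%}")  # about 37%
```

The toy incomes of 300,000 and 1,800,000 INR were chosen because they convert to the €3,300 and €19,800 range endpoints mentioned above.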

The Details

Data: Post-cleanup, 1,249 customers. Features like age (mean 29.8), family size (2-9), chronic diseases (33% yes), and insurance uptake (39% yes).
Stats: Buyers earned clearly more than non-buyers (p < 0.001; income CI €11,381-€12,116 for buyers vs. €9,100-€9,608 for non-buyers). Frequent flyers (p = 0.000009) and customers who had traveled abroad (p ≈ 10^-26) were way more likely to buy.
Model Specs: Random Forest won after 50 tuning rounds: max_depth 12, 193 trees, balanced class weights. The recall-optimized threshold dropped to 0.0188 to reach 100% recall. Precision took a hit (0.40, roughly the 39% share of buyers in the data), so at that threshold the model flags nearly everyone. That's the cost of not missing anyone.
Challenges: Duplicates were a mess; 37% is wild. The small sample after cleanup (1,249) limited statistical power. The recall focus meant more false positives.
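
To see why a recall-first threshold can end up as low as 0.0188, here is a rough sketch of one way to pick it: take the lowest predicted probability among the actual buyers, so every buyer clears the bar. This is an assumption about the approach, not the project's exact code, and the labels and scores below are made up.

```python
def threshold_for_full_recall(y_true, scores):
    """Return the highest threshold that still flags every true buyer.

    A customer is flagged when score >= threshold, so the answer is
    the lowest score among the actual buyers.
    """
    return min(s for y, s in zip(y_true, scores) if y == 1)

def precision_recall(y_true, scores, threshold):
    """Compute precision and recall when flagging score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
    fn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels (1 = bought) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.91, 0.40, 0.02, 0.35, 0.10, 0.77, 0.05, 0.60, 0.55, 0.01]

t = threshold_for_full_recall(y_true, scores)
precision, recall = precision_recall(y_true, scores, t)
print(f"threshold={t}, precision={precision:.2f}, recall={recall:.2f}")
```

A single buyer with a very low score drags the threshold down for everyone, which is why precision falls toward the base rate.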

What's Next

More data would help—1,249's a start, but thin for rare cases.
Could tweak thresholds based on business goals—fewer false positives if precision matters more.
Adding features like trip frequency or destination might sharpen it.
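
If precision matters more, the threshold idea flips: set a precision floor and take the threshold with the best recall that still meets it. A minimal sketch with made-up scores; the helper name and the 0.75 floor are illustrative assumptions, not numbers from the project.

```python
def best_threshold(y_true, scores, min_precision):
    """Pick the threshold with the highest recall whose precision meets the floor.

    Candidates are the observed scores; a customer is flagged when
    score >= threshold. Returns (threshold, precision, recall) or None.
    """
    positives = sum(y_true)
    if positives == 0:
        return None
    best = None
    for t in sorted(set(scores)):
        flagged = [s >= t for s in scores]
        tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
        fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
        precision = tp / (tp + fp)  # at least one score is flagged at t
        recall = tp / positives
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best

# Made-up labels (1 = bought) and predicted probabilities.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
probs = [0.91, 0.40, 0.02, 0.35, 0.10, 0.77, 0.05, 0.60, 0.55, 0.01]

choice = best_threshold(labels, probs, min_precision=0.75)
print(choice)
```

Raising the floor pushes the threshold up and recall down, so the right setting depends on what a missed buyer costs versus a wasted follow-up.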

Why It's Cool

This model flags who's likely to buy travel insurance—great for targeting marketing without bugging everyone. Random Forest gives a solid balance, or the recall-optimized version catches all potentials if you're okay with extra follow-ups. It's not a crystal ball, but it's a handy tool for the company to boost sales smarter. Code's on my GitHub if you want a look!

What the Data Showed