Project Overview
This project applies interpretable machine learning techniques to predict employee attrition using logistic regression, with a focus on transparency and stakeholder communication. Both global and local model explanations are included, leveraging model coefficients and SHAP values.
Objective
Predict which employees are at risk of leaving the company and understand why, using:
- Logistic Regression for interpretability.
- SHAP Values for individualized explanations and global patterns.
- A structured, end-to-end ML pipeline with preprocessing and evaluation.
1. Logistic Regression Coefficients (Global Explainability)
Logistic regression provides a direct mapping between feature values and their contribution to the log-odds of attrition.
- Positive coefficients increase attrition risk.
- Negative coefficients reduce it.
Key Highlights
Strongest Positive Predictors:
EducationField_Technical Degree,JobRole_Research Scientist, andBusinessTravel_Non-Travel.Strongest Negative Predictors:
JobRole_Healthcare Representative,JobRole_Manager, andBusinessTravel_Travel_Rarely.Minimal Impact Features:
Features likeGender,Education, andJobRole_Human Resourcesshowed negligible coefficient values.
2. SHAP Value Interpretation (Global + Local Explainability)
SHAP values explain predictions on a per-observation basis and provide model-agnostic insights.
- Each dot shows how a feature contributed to an individual prediction.
- Color indicates feature value (blue = low, red = high).
- Horizontal position reflects direction/magnitude of influence.
Key Global Insights (SHAP)
Most Influential Features:
NumCompaniesWorked,TotalWorkingYears,YearsWithCurrManager, andEnvironmentSatisfaction.Contrast With Coefficients:
Some features with low coefficients (e.g.NumCompaniesWorked) had high SHAP impact—emphasizing their interaction effects or conditional relevance.
3. Final Thoughts
- Coefficients tell us how the model is built.
- SHAP tells us how the model behaves.
- Together, they give a full picture: the “rules” (coefficients) and the “realities” (SHAP).
This project demonstrates the value of combining linear interpretability with model-agnostic explanation tools to surface actionable insights in HR analytics.
Repository Structure
.
├── data/
│ ├── raw/
│ └── processed/
├── models/
│ └── final_model_pipeline.pkl
├── notebooks/
│ ├── 01_eda.ipynb
│ ├── 02_preprocessing.ipynb
│ ├── 03_modeling.ipynb
│ ├── 04_explainability.ipynb
│ └── 05_final_report.ipynb
└── README.md
Next Steps
- Add SHAP-based cohort profiling for team-level analysis.
- Implement visual dashboards using Streamlit or Tableau.
- Extend analysis with Random Forest or XGBoost for performance benchmarking.