First look at the data

## Rows: 1,470
## Columns: 35
## $ Age                      <int> 41, 49, 37, 33, 27, 32, 59, 30, 38, 36, 35...
## $ Attrition                <fct> Yes, No, Yes, No, No, No, No, No, No, No, ...
## $ BusinessTravel           <fct> Travel_Rarely, Travel_Frequently, Travel_R...
## $ DailyRate                <int> 1102, 279, 1373, 1392, 591, 1005, 1324, 13...
## $ Department               <fct> Sales, Research & Development, Research & ...
## $ DistanceFromHome         <int> 1, 8, 2, 3, 2, 2, 3, 24, 23, 27, 16, 15, 2...
## $ Education                <int> 2, 1, 2, 4, 1, 2, 3, 1, 3, 3, 3, 2, 1, 2, ...
## $ EducationField           <fct> Life Sciences, Life Sciences, Other, Life ...
## $ EmployeeCount            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ EmployeeNumber           <int> 1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, ...
## $ EnvironmentSatisfaction  <int> 2, 3, 4, 4, 1, 4, 3, 4, 4, 3, 1, 4, 1, 2, ...
## $ Gender                   <fct> Female, Male, Male, Female, Male, Male, Fe...
## $ HourlyRate               <int> 94, 61, 92, 56, 40, 79, 81, 67, 44, 94, 84...
## $ JobInvolvement           <int> 3, 2, 2, 3, 3, 3, 4, 3, 2, 3, 4, 2, 3, 3, ...
## $ JobLevel                 <int> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 1, ...
## $ JobRole                  <fct> Sales Executive, Research Scientist, Labor...
## $ JobSatisfaction          <int> 4, 2, 3, 3, 2, 4, 1, 3, 3, 3, 2, 3, 3, 4, ...
## $ MaritalStatus            <fct> Single, Married, Single, Married, Married,...
## $ MonthlyIncome            <int> 5993, 5130, 2090, 2909, 3468, 3068, 2670, ...
## $ MonthlyRate              <int> 19479, 24907, 2396, 23159, 16632, 11864, 9...
## $ NumCompaniesWorked       <int> 8, 1, 6, 1, 9, 0, 4, 1, 0, 6, 0, 0, 1, 0, ...
## $ Over18                   <fct> Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, ...
## $ OverTime                 <fct> Yes, No, Yes, Yes, No, No, Yes, No, No, No...
## $ PercentSalaryHike        <int> 11, 23, 15, 11, 12, 13, 20, 22, 21, 13, 13...
## $ PerformanceRating        <int> 3, 4, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, ...
## $ RelationshipSatisfaction <int> 1, 4, 2, 3, 4, 3, 1, 2, 2, 2, 3, 4, 4, 3, ...
## $ StandardHours            <int> 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80...
## $ StockOptionLevel         <int> 0, 1, 0, 0, 1, 0, 3, 1, 0, 2, 1, 0, 1, 1, ...
## $ TotalWorkingYears        <int> 8, 10, 7, 8, 6, 8, 12, 1, 10, 17, 6, 10, 5...
## $ TrainingTimesLastYear    <int> 0, 3, 3, 3, 3, 2, 3, 2, 2, 3, 5, 3, 1, 2, ...
## $ WorkLifeBalance          <int> 1, 3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 3, 2, 3, ...
## $ YearsAtCompany           <int> 6, 10, 0, 8, 2, 7, 1, 1, 9, 7, 5, 9, 5, 2,...
## $ YearsInCurrentRole       <int> 4, 7, 0, 7, 2, 7, 0, 0, 7, 7, 4, 5, 2, 2, ...
## $ YearsSinceLastPromotion  <int> 0, 1, 0, 3, 2, 3, 0, 0, 1, 7, 0, 0, 4, 1, ...
## $ YearsWithCurrManager     <int> 5, 7, 0, 0, 2, 6, 0, 0, 8, 7, 3, 8, 3, 2, ...
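
In the preview above, EmployeeCount, Over18 and StandardHours show the same value in every displayed row. A minimal sketch of how such constant columns could be detected, using a tiny stand-in data frame typed in from the first five values of the preview. The names `preview`, `is_constant`, `constant_cols` and `preview_clean` are illustrative; the name of the real data frame is not shown in the text.

```r
# Stand-in for the real data: first five values copied from the preview above.
preview <- data.frame(
  Age           = c(41L, 49L, 37L, 33L, 27L),
  Attrition     = factor(c("Yes", "No", "Yes", "No", "No")),
  EmployeeCount = c(1L, 1L, 1L, 1L, 1L),
  Over18        = factor(c("Y", "Y", "Y", "Y", "Y")),
  StandardHours = c(80L, 80L, 80L, 80L, 80L)
)

# Compact structure overview (base R counterpart of the preview above).
str(preview)

# A column is constant if it holds exactly one distinct value.
is_constant   <- vapply(preview, function(col) length(unique(col)) == 1L, logical(1))
constant_cols <- names(preview)[is_constant]
print(constant_cols)

# Constant columns carry no information for correlation or PCA, so drop them.
preview_clean <- preview[, !is_constant, drop = FALSE]
```

Whether these columns are constant across all 1,470 rows would have to be confirmed by running the same check on the full data set.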

Correlation plots

##                                 Age    DailyRate DistanceFromHome   Education
## Age                      1.00000000  0.010660943     -0.001686120  0.20803373
## DailyRate                0.01066094  1.000000000     -0.004985337 -0.01680643
## DistanceFromHome        -0.00168612 -0.004985337      1.000000000  0.02104183
## Education                0.20803373 -0.016806433      0.021041826  1.00000000
## EnvironmentSatisfaction  0.01014643  0.018354854     -0.016075327 -0.02712831
## HourlyRate               0.02428654  0.023381422      0.031130586  0.01677483
##                         EnvironmentSatisfaction  HourlyRate JobInvolvement
## Age                                  0.01014643  0.02428654    0.029819959
## DailyRate                            0.01835485  0.02338142    0.046134874
## DistanceFromHome                    -0.01607533  0.03113059    0.008783280
## Education                           -0.02712831  0.01677483    0.042437634
## EnvironmentSatisfaction              1.00000000 -0.04985696   -0.008277598
## HourlyRate                          -0.04985696  1.00000000    0.042860641
##                             JobLevel JobSatisfaction MonthlyIncome MonthlyRate
## Age                      0.509604228    -0.004891877   0.497854567  0.02805117
## DailyRate                0.002966335     0.030571008   0.007707059 -0.03218160
## DistanceFromHome         0.005302731    -0.003668839  -0.017014445  0.02747286
## Education                0.101588886    -0.011296117   0.094960677 -0.02608420
## EnvironmentSatisfaction  0.001211699    -0.006784353  -0.006259088  0.03759962
## HourlyRate              -0.027853486    -0.071334624  -0.015794304 -0.01529675
##                         NumCompaniesWorked PercentSalaryHike PerformanceRating
## Age                             0.29963476       0.003633585      0.0019038955
## DailyRate                       0.03815343       0.022703677      0.0004732963
## DistanceFromHome               -0.02925080       0.040235377      0.0271096185
## Education                       0.12631656      -0.011110941     -0.0245387912
## EnvironmentSatisfaction         0.01259432      -0.031701195     -0.0295479523
## HourlyRate                      0.02215688      -0.009061986     -0.0021716974
##                         RelationshipSatisfaction StockOptionLevel
## Age                                  0.053534720      0.037509712
## DailyRate                            0.007846031      0.042142796
## DistanceFromHome                     0.006557475      0.044871999
## Education                           -0.009118377      0.018422220
## EnvironmentSatisfaction              0.007665384      0.003432158
## HourlyRate                           0.001330453      0.050263399
##                         TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
## Age                           0.680380536          -0.019620819    -0.021490028
## DailyRate                     0.014514739           0.002452543    -0.037848051
## DistanceFromHome              0.004628426          -0.036942234    -0.026556004
## Education                     0.148279697          -0.025100241     0.009819189
## EnvironmentSatisfaction      -0.002693070          -0.019359308     0.027627295
## HourlyRate                   -0.002333682          -0.008547685    -0.004607234
##                         YearsAtCompany YearsInCurrentRole
## Age                        0.311308770        0.212901056
## DailyRate                 -0.034054768        0.009932015
## DistanceFromHome           0.009507720        0.018844999
## Education                  0.069113696        0.060235554
## EnvironmentSatisfaction    0.001457549        0.018007460
## HourlyRate                -0.019581616       -0.024106220
##                         YearsSinceLastPromotion YearsWithCurrManager
## Age                                  0.21651337          0.202088602
## DailyRate                           -0.03322898         -0.026363178
## DistanceFromHome                     0.01002884          0.014406048
## Education                            0.05425433          0.069065378
## EnvironmentSatisfaction              0.01619361         -0.004998723
## HourlyRate                          -0.02671559         -0.020123200
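
The output above looks like the first rows of a correlation matrix over the numeric columns. A minimal sketch of how such a matrix could be computed and plotted, assuming Pearson correlation and using a small stand-in data frame (`toy`) typed in from the first ten values of the data preview. The plot call is guarded because corrplot may not be installed.

```r
# Stand-in data: first ten values of four columns copied from the data preview.
toy <- data.frame(
  Age               = c(41, 49, 37, 33, 27, 32, 59, 30, 38, 36),
  TotalWorkingYears = c(8, 10, 7, 8, 6, 8, 12, 1, 10, 17),
  YearsAtCompany    = c(6, 10, 0, 8, 2, 7, 1, 1, 9, 7),
  Attrition         = factor(c("Yes", "No", "Yes", "No", "No",
                               "No", "No", "No", "No", "No"))
)

# Keep only numeric columns; cor() does not accept factors.
num_cols <- vapply(toy, is.numeric, logical(1))
cm <- cor(toy[, num_cols], method = "pearson")
print(round(cm, 2))

# Optional plot with corrplot, one of the libraries listed for this section.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cm, method = "circle", type = "upper")
}
```

With only ten rows the coefficients are not comparable to those printed above; the sketch shows the mechanics only.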

Libraries

  • ggcorrplot, corrplot

Advanced exploratory analysis

Libraries

  • GGally

Categorical variables

Libraries

  • gridExtra

Line graphs

Parallel plots

Parallel coordinates - continuous variables

Parallel coordinates - categorical variables

Independence test of categorical variables

##                    
##                      No Yes
##   Non-Travel        138  12
##   Travel_Frequently 208  69
##   Travel_Rarely     887 156

The table above cross-tabulates BusinessTravel (rows) against Attrition (columns). The independence tests show that Attrition is not independent of either OverTime or JobInvolvement. Comparing the expected and observed tables, employees who leave tend to work less overtime and to be less involved in their jobs. Surprisingly, however, they also travel more than we would expect under independence.
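
The comparison of observed and expected counts can be reproduced for the travel table with base R's `chisq.test`. A minimal sketch, assuming the rows of the printed table are BusinessTravel levels and the columns are Attrition (the printed output carries no labels); `observed` and `travel_test` are illustrative names.

```r
# Observed counts copied from the table above:
# rows = BusinessTravel, columns = Attrition (assumed labelling).
observed <- matrix(
  c(138,  12,
    208,  69,
    887, 156),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    BusinessTravel = c("Non-Travel", "Travel_Frequently", "Travel_Rarely"),
    Attrition      = c("No", "Yes")
  )
)

# Pearson chi-squared test of independence.
travel_test <- chisq.test(observed)
print(travel_test)

# Expected counts under independence, and observed minus expected.
print(round(travel_test$expected, 1))
print(round(observed - travel_test$expected, 1))
```

With these counts, 69 frequent travellers left against roughly 44.7 expected under independence, which is the pattern the paragraph above describes for travel.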

Dimensionality reduction

PCA

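
No code for this step survives in the text. A minimal sketch of a PCA with base R's `prcomp`, run on a small stand-in data frame typed in from the data preview rather than on the full data set. The names `toy`, `pca`, `var_explained` and `scores` are illustrative, and the optional factoextra call is an assumption about the plotting package used.

```r
# Stand-in data: first ten values of four numeric columns from the data preview.
toy <- data.frame(
  Age               = c(41, 49, 37, 33, 27, 32, 59, 30, 38, 36),
  TotalWorkingYears = c(8, 10, 7, 8, 6, 8, 12, 1, 10, 17),
  YearsAtCompany    = c(6, 10, 0, 8, 2, 7, 1, 1, 9, 7),
  DistanceFromHome  = c(1, 8, 2, 3, 2, 2, 3, 24, 23, 27)
)

# Centre and scale so that no variable dominates because of its units.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores of the observations on the first two components.
scores <- pca$x[, 1:2]
plot(scores, xlab = "PC1", ylab = "PC2", main = "PCA scores (stand-in data)")

# Optional scree plot, only if factoextra is installed.
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_eig(pca))
}
```

Scaling matters here because the columns are measured in different units (years, distance), so unscaled PCA would be driven by whichever column has the largest variance.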

TSNE

UMAP

Libraries

  • Rtsne, umap, dbscan
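
A minimal sketch of how the three listed libraries could be combined: embed the data in two dimensions with t-SNE and UMAP, then cluster the t-SNE embedding with DBSCAN. The data here are simulated stand-ins, not the attrition data, and the parameter values (`perplexity`, `n_neighbors`, `eps`, `minPts`) are assumptions that would need tuning. Each step is guarded because the packages may not be installed.

```r
set.seed(42)

# Simulated stand-in: two groups of 30 points in 5 dimensions.
x <- rbind(
  matrix(rnorm(30 * 5, mean = 0), ncol = 5),
  matrix(rnorm(30 * 5, mean = 4), ncol = 5)
)
x <- scale(x)

tsne_layout <- NULL
umap_layout <- NULL

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # Rtsne requires 3 * perplexity < nrow(x) - 1.
  tsne <- Rtsne::Rtsne(x, dims = 2, perplexity = 10, check_duplicates = FALSE)
  tsne_layout <- tsne$Y
}

if (requireNamespace("umap", quietly = TRUE)) {
  # n_neighbors overrides the package default for this small sample.
  um <- umap::umap(x, n_neighbors = 10)
  umap_layout <- um$layout
}

if (!is.null(tsne_layout) && requireNamespace("dbscan", quietly = TRUE)) {
  # Density-based clustering on the 2-D embedding; cluster 0 means noise.
  cl <- dbscan::dbscan(tsne_layout, eps = 3, minPts = 5)
  plot(tsne_layout, col = cl$cluster + 1L, pch = 19,
       xlab = "t-SNE 1", ylab = "t-SNE 2", main = "DBSCAN on t-SNE embedding")
}
```

Distances in a t-SNE embedding depend on the perplexity and the random seed, so a suitable `eps` has to be chosen per run rather than reused across embeddings.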