AI in Human Resources

This article, aimed at beginners in machine learning, shows how to use the machine learning (ML) tools in the AI-TOOLKIT to make difficult HR decisions automatically. In this simple example we will train an ML model that predicts whether an employee will leave the company. The same principles could be used to predict the reason for leaving or whether it is worthwhile to offer an employee a promotion. The article also explains and compares some of the ML models available in the AI-TOOLKIT.

You can apply the same principles to any other sector or business case. For example, you could predict whether a client will leave, why they will leave, or whether it is worthwhile to offer a discount.

The Dataset

The dataset contains about 15,000 rows (records) and 10 columns (variables or features). You can download the data at the end of this article. If you are doing this in your own company, first study which variables have the most influence on the specific business case (an HR problem in this example) and select the variables accordingly. We call this step Feature Engineering. Selecting too few variables (not enough knowledge) or too many (unneeded noise) results in a less useful or less accurate ML model. The accuracy of the trained ML model depends mainly on the quantity and quality of the input data, and also on the parameters of the model.

The 10 columns (features) are as follows:

Satisfaction Level (0-1)
Last evaluation (0-1)
Number of projects (integer)
Average monthly hours (integer)
Time spent at the company (integer)
Whether they have had a work accident (0-no, 1-yes)
Whether they have had a promotion in the last 5 years (0-no, 1-yes)
Department name (text)
Salary (text: low, medium, high)
Whether the employee has left (0-no, 1-yes)
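
The two text columns have to be turned into numbers before a numerical model can use them. The following is a minimal sketch of one way to do that, not the procedure used to prepare the downloadable file: the salary codes, the department coding scheme, the helper names and the two example rows are all assumptions made for illustration.

```python
# Assumed ordinal codes for the salary column; the article does not state
# the exact mapping used to produce the numeric data file.
SALARY_CODES = {"low": 1, "medium": 2, "high": 3}


def encode_departments(names):
    """Assign an integer code to each distinct department name (hypothetical scheme)."""
    return {name: code for code, name in enumerate(sorted(set(names)))}


def encode_row(row, department_codes):
    """Return a copy of the row with the two text columns replaced by numbers."""
    encoded = dict(row)
    encoded["department"] = department_codes[row["department"]]
    encoded["salary"] = SALARY_CODES[row["salary"]]
    return encoded


# Illustrative rows only; they are not taken from the dataset.
rows = [
    {"satisfaction_level": 0.38, "department": "sales", "salary": "low"},
    {"satisfaction_level": 0.80, "department": "support", "salary": "medium"},
]
department_codes = encode_departments(r["department"] for r in rows)
encoded_rows = [encode_row(r, department_codes) for r in rows]
print(encoded_rows)
```

Salary has a natural order, so an ordinal code is a reasonable choice. Department names have no order, so integer codes for them are a simplification.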

Depending on which variable (column) you choose as the decision variable, you can train a model for different purposes, for example to predict whether the employee will leave in the future or whether it is worthwhile to offer a promotion.

In this example we will choose the ‘Left’ (whether the employee has left) column as the decision variable in order to predict whether an employee will leave. The first rows of the numerically encoded data are shown below; note that the department column is named ‘sales’ in the data file.

satisfaction level	last evaluation	number of projects	average monthly hours	time spend company	work accident	left	promotion last 5 years	sales	salary
0.38	0.53	2	157	3	0	1	0	7	1
0.8	0.86	5	262	6	0	1	0	7	2
0.11	0.88	7	272	4	0	1	0	7	2
0.72	0.87	5	223	5	0	1	0	7	1
...	...	...	...	...	...	...	...	...	...
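
As a small illustration of how the decision column is located in the sample table above, the following sketch reads the header and the first two rows, finds the zero-based position of ‘left’ and separates it from the other columns. It uses only the Python standard library; the variable names are illustrative and not part of the AI-TOOLKIT.

```python
import csv
import io

# The header and first two rows of the sample table, tab-delimited.
SAMPLE_TSV = (
    "satisfaction level\tlast evaluation\tnumber of projects\t"
    "average monthly hours\ttime spend company\twork accident\tleft\t"
    "promotion last 5 years\tsales\tsalary\n"
    "0.38\t0.53\t2\t157\t3\t0\t1\t0\t7\t1\n"
    "0.8\t0.86\t5\t262\t6\t0\t1\t0\t7\t2\n"
)

reader = csv.reader(io.StringIO(SAMPLE_TSV), delimiter="\t")
header = next(reader)
decision_index = header.index("left")  # zero-based position of the decision column

features, labels = [], []
for row in reader:
    labels.append(int(row[decision_index]))
    features.append([float(v) for i, v in enumerate(row) if i != decision_index])

print(decision_index)  # 6
print(labels)          # [1, 1]
```

The index 6 found here is the same zero-based decision column index that is entered when the data is imported into the AI-TOOLKIT database.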

Training the AI Model

There are different types of machine learning models available in the AI-TOOLKIT. Each model/algorithm has its advantages and disadvantages, and some algorithms are well suited to one type of data but not to another. Neural network based models can be tuned so that they can be applied to all kinds of problems, but at the cost of complexity (often many layers of different types with many nodes) and processing speed (more layers and nodes mean more processing time and more computer resources). Furthermore, neural networks need much more data than other types of machine learning models. It is therefore worthwhile to choose the machine learning model carefully!

Let us choose the SVM model for this example.

Support Vector Machine (SVM) model

You can easily import your numerical delimited data into the AI-TOOLKIT. The SVM model has several parameters, which can be automatically optimized by the built-in parameter optimization module.
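
One of the parameters tuned for an SVM with an RBF kernel is gamma (it appears in the model template further down). The sketch below shows the textbook RBF kernel, exp(-gamma * squared distance), to give a feeling for what gamma controls. It is not the AI-TOOLKIT's internal code, and `rbf_kernel` is a hypothetical helper name.

```python
import math


def rbf_kernel(x, y, gamma):
    """Textbook RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two of the features (satisfaction level, last evaluation) from the first
# two sample rows of the dataset.
employee_a = [0.38, 0.53]
employee_b = [0.80, 0.86]

same = rbf_kernel(employee_a, employee_a, gamma=15.0)
different = rbf_kernel(employee_a, employee_b, gamma=15.0)
print(same)       # 1.0 for identical points
print(different)  # between 0 and 1; a larger gamma makes similarity drop faster
```

Because the kernel is driven by distances, columns with large values (such as average monthly hours) would dominate the result unless the data is scaled. Whether the AI-TOOLKIT scales the input automatically is not stated in this article.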

Follow these steps to train the ML model:

Create a new AI-TOOLKIT project (Open AI-TOOLKIT Editor + New Project).
Insert the SVM model template (Insert ML Template + choose Supervised Learning + Support Vector Machine).
Save the project.
Download the data (at the end of the article) and change the extension to ‘.tsv’. Import the data into a new AI-TOOLKIT database: on the DATABASE tab choose Import Data Into Database and follow the instructions on the screen. It is important to indicate the correct number of (non-numerical) header rows and the zero-based index of the decision column (6 in this example). Use ‘hr_data’ as the table name.
Save the database in the same folder as the project. Use the name ‘hr.sl3’.
Run the SVM parameter optimization module to find the optimal parameters (SVM Parameter Optimizer on the AI-TOOLKIT tab). You may stop the optimization earlier if you see a high enough accuracy or just skip the optimization and use the values shown below.
Adjust the SVM model template as shown below (some of the unneeded parameters and comments are not shown). The optimal parameters are filled in.

model:
   id: 'ID-EFnMmvBNWr'
   type: SVM
   path: 'hr.sl3'
   params:
       - svm_type: C_SVC
       - kernel_type: RBF
       - gamma: 15.0
       - C: 281.8
   training:
       - data_id: 'hr_data'
       - dec_id: 'decision'
   test:
       - data_id: 'hr_data'
       - dec_id: 'decision'
   input:
       - data_id: 'input_data'
       - dec_id: 'decision'
   output:
       - data_id: 'output_data'
       - col_id: 'decision'

Save the project.
Train the AI model (AI-TOOLKIT tab).

When the training has finished you will see the performance evaluation results:

Performance Evaluation Results

Confusion Matrix [predicted x original] (number of classes: 2):

	(0)	(1)
(0)	11427	0
(1)	1	3571

Accuracy	99.99%
Error	0.01%
C.Kappa	99.98%

	(0)	(1)
Precision	100.00%	99.97%
Recall	99.99%	100.00%
FNR	0.01%	0.00%
F1	100.00%	99.99%
TNR	100.00%	99.99%
FPR	0.00%	0.01%

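The accuracy of the trained model is very high (nearly 100%): the model makes only one mistake in the 14,999 evaluated cases. Note that the template above uses the same table (‘hr_data’) for both training and testing, so this figure describes how well the model fits data it has already seen. This simple example does not go into detail about all of the performance measures, nor does it discuss the so-called generalization error (testing with unseen data), because that is not its aim.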
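
For readers who want to see where the reported numbers come from, the following sketch recomputes several of them from the confusion matrix using the standard definitions of accuracy, precision, recall, F1 and Cohen's kappa. It is an independent recalculation for illustration, not the AI-TOOLKIT's own evaluation code.

```python
# Confusion matrix from the results above, indexed [predicted][original].
matrix = [
    [11427, 0],
    [1, 3571],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(2))
accuracy = correct / total

# Precision divides by the row total (everything predicted as the class);
# recall divides by the column total (everything that truly is the class).
precision = [matrix[i][i] / sum(matrix[i]) for i in range(2)]
recall = [matrix[i][i] / sum(row[i] for row in matrix) for i in range(2)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# Cohen's kappa compares observed agreement with agreement expected by chance.
expected = sum(
    sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(2)
) / total**2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy  {accuracy:.2%}")
print(f"kappa     {kappa:.2%}")
print("precision", [f"{p:.2%}" for p in precision])
print("recall   ", [f"{r:.2%}" for r in recall])
```

The recomputed values agree with the reported table when rounded to two decimals.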

DeepAI Educational Neural Network Model

The deep neural network model in DeepAI Educational is based on a semi-automatic multi-layer and multi-node neural network implementation. The software designs the neural network semi-automatically; you only need to define the number of layers and the number of nodes per layer (you can of course adjust some more parameters, but most of the time this is not necessary). DeepAI Educational does not use complex state-of-the-art neural network architectures or extensive model performance evaluation, but it often provides a good result. For real-world problems use the machine learning models and tools in AI-TOOLKIT Professional.

DeepAI uses the SSV data file format (delimited text file). If needed, adjust the settings on the ‘Settings/AI’ tab as follows:

Number of iterations: 10
Learning rate: 0.01
Regularization rate: 0.001
Batch size: 10
Activation Function: TANH
Regularization Function: NONE
Test data %: 10
Treat data as X-Y Classification / Regression
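
The ‘Test data %’ setting reserves part of the records for testing. How DeepAI selects those records is not described in this article, so the following is only a sketch of the general idea, assuming a simple random split; `split_train_test` is a hypothetical helper.

```python
import random


def split_train_test(rows, test_fraction=0.10, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(100))  # stand-in for 100 data records
train_rows, test_rows = split_train_test(rows)
print(len(train_rows), len(test_rows))  # 90 10
```

Records in the held-out part are never used for fitting, so the accuracy measured on them says something about how the model behaves on new data.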

Download the data (at the end of the article). You can load an external training data file with the 'Load Data File (SSV)' command. The data must be in the AI-TOOLKIT SSV data file format (.ssv): tab delimited, without a header row, containing only numbers, and with the decision variable (classes in the case of classification, continuous numbers in the case of regression) in the first column.

Since DeepAI requires the decision variable to be in the first column, open the data file in MS Excel and move the decision column (‘Left’) to the first position. The header row must also be removed! When you are done, save the file in tab-delimited format with the ‘.ssv’ extension.
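
If you prefer a script to manual editing, the same rearrangement can be sketched with the Python standard library. This assumes a tab-delimited input with one header row and the decision column at zero-based index 6, as in the sample data; `tsv_to_ssv` is a hypothetical helper, not an AI-TOOLKIT function.

```python
import csv
import io


def tsv_to_ssv(tsv_text, decision_index, header_rows=1):
    """Drop the header rows and move the decision column to the front."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line_number, row in enumerate(reader):
        if line_number < header_rows or not row:
            continue  # skip header rows and blank lines
        decision = row[decision_index]
        rest = row[:decision_index] + row[decision_index + 1:]
        writer.writerow([decision] + rest)
    return out.getvalue()


sample = (
    "satisfaction level\tlast evaluation\tnumber of projects\t"
    "average monthly hours\ttime spend company\twork accident\tleft\t"
    "promotion last 5 years\tsales\tsalary\n"
    "0.38\t0.53\t2\t157\t3\t0\t1\t0\t7\t1\n"
    "0.8\t0.86\t5\t262\t6\t0\t1\t0\t7\t2\n"
)
ssv_text = tsv_to_ssv(sample, decision_index=6)
print(ssv_text)
```

To produce a file, write the returned text to a path ending in ‘.ssv’ with the built-in `open` function.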

Use the ‘Load Data File’ command to load the data file prepared above. DeepAI will automatically design a neural network for the data file. This network already provides good results, but let us add an extra layer (4 layers in total) and set the number of nodes to 24 on the second layer and 10 on the third layer. The first and last layers have a fixed number of nodes, determined by the input functions and the output (1).

Change the number of iterations to 480 and start the training process with the Run command. After a while (2-3 minutes) the results appear, indicating 98.3% accuracy on the training data. You can fine-tune the model further to obtain a higher accuracy, but this is not a fast or simple process: fine-tuning a neural network is tedious and often time-consuming (adjusting the number of layers, the number of nodes per layer, the learning rate, the activation function, etc.). More layers and nodes are also not guaranteed to give better results; the optimal configuration depends on the other parameters as well.

You can use the trained ML model for making automatic and precise decisions about this HR problem.

References

The Application of Artificial Intelligence, Zoltan Somogyi.
HR Analytics Dataset: Attribution-Share Alike 4.0 International (CC BY-SA 4.0) license, Source: https://www.kaggle.com/ludobenistant/hr-analytics.
You can download the dataset in MS Excel format here: HR_COMMA_SEP_U.XLS

Learn about the application of Artificial Intelligence and Machine Learning from the book "The Application of Artificial Intelligence | Step-by-Step Guide from Beginner to Expert", Springer 2020 (~400 pages) (ISBN 978-3-030-60031-0). It offers a unique, understandable view of machine learning using many practical examples and introduces the AI-TOOLKIT, freely available software that allows the reader to test and study the examples in the book. No programming or scripting skills are needed. The book is suitable for self-study by professionals and is also useful as a supplementary resource for advanced undergraduate and graduate courses on AI. More information can be found on the Springer website.

