Credit screening is needed in several situations at a financial institution (banking, insurance, investment banking, etc.): for example, when a private person wants to borrow money, when a business wants extra credit, or as part of a recruitment process. Credit screening means that the financial institution performs a background check on the applicant in order to decide whether to approve or reject a request, such as a credit request. The check involves collecting a number of attributes that are relevant to the decision, for example the applicant's annual income, cash and property owned, existing loans, and the history of former applications. Depending on the values of these attributes, the financial institution approves or rejects the application.
This very simple case study shows how to use a machine learning (ML) model to make fast and accurate credit screening decisions for credit card applications.
The dataset used in this case study contains data collected at a Japanese bank for 653 credit card applications [2]. Each record in the dataset corresponds to one credit card applicant whose application was either approved (APPROVE) or rejected (REJECT). A part of the dataset can be seen in the image below.
Please note that the names and some of the values of the attributes have been replaced by symbols in order to protect the confidentiality of the bank.
The type and values of the different attributes:
- A1 - Text data type with values: A, B.
- A2 - Number data type with values in the range of: 13.75 - 76.75
- A3 - Number data type with values in the range of: 0 - 28
- A4 - Text data type with values: U, Y, L, T.
- A5 - Text data type with values: G, P, GG.
- A6 - Text data type with values: C, D, CC, I, J, K, M, R, Q, W, X, E, AA, FF.
- A7 - Text data type with values: V, H, BB, J, N, Z, DD, FF, O.
- A8 - Number data type with values in the range of: 0 - 28.5
- A9 - Text data type with values: T, F.
- A10 - Text data type with values: T, F.
- A11 - Number data type with values in the range of: 0 - 67
- A12 - Text data type with values: T, F.
- A13 - Text data type with values: G, P, S.
- A14 - Number data type with values in the range of: 0 - 2000
- A15 - Number data type with values in the range of: 0 - 100000
- A16 - This is the decision variable or class (Text data type) with values: APPROVE, REJECT
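The attribute list above can be written down as a small schema and used to sanity-check records before they are imported. The following is a minimal sketch in Python, not part of the AI-TOOLKIT workflow; the function name `validate_record` and the example record are hypothetical and only serve as an illustration.

```python
# Schema transcribed from the attribute list above.
CATEGORICAL = {
    "A1": {"A", "B"},
    "A4": {"U", "Y", "L", "T"},
    "A5": {"G", "P", "GG"},
    "A6": {"C", "D", "CC", "I", "J", "K", "M", "R", "Q", "W", "X", "E", "AA", "FF"},
    "A7": {"V", "H", "BB", "J", "N", "Z", "DD", "FF", "O"},
    "A9": {"T", "F"},
    "A10": {"T", "F"},
    "A12": {"T", "F"},
    "A13": {"G", "P", "S"},
    "A16": {"APPROVE", "REJECT"},
}
NUMERIC = {
    "A2": (13.75, 76.75),
    "A3": (0, 28),
    "A8": (0, 28.5),
    "A11": (0, 67),
    "A14": (0, 2000),
    "A15": (0, 100000),
}


def validate_record(record):
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    for name, allowed in CATEGORICAL.items():
        value = record.get(name)
        if value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    for name, (low, high) in NUMERIC.items():
        value = record.get(name)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name}: {value!r} outside [{low}, {high}]")
    return problems


# Hypothetical record, made up for illustration (not taken from the dataset).
example = {
    "A1": "A", "A2": 30.5, "A3": 1.5, "A4": "U", "A5": "G", "A6": "W",
    "A7": "V", "A8": 1.25, "A9": "T", "A10": "F", "A11": 0, "A12": "F",
    "A13": "G", "A14": 200, "A15": 0, "A16": "APPROVE",
}
print(validate_record(example))  # prints []
```

A record with a value outside the listed ranges, for example `A2 = 5.0`, would be reported as a single problem for that attribute.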
After collecting the data, training the ML model with the AI-TOOLKIT is a very simple process:
- Download the data and make sure that the file has the ‘.csv’ extension.
- Create a new AI-TOOLKIT project.
- Import the input data into a new AI-TOOLKIT database (DATABASE tab, Import Data). Choose comma as the delimiter and set the index of the decision column to 15 (zero-based index). Click NEW Database and save the database file, then enter the name ‘creditscr’ in the New Table Name field. Select the ‘Automatically Convert Categorical or Text Values’ option (right side) and click OK on the categorical value encoding screen that appears (use the defaults). Close the import data screen once the data has been imported successfully.
- The next step is to choose the machine learning model and insert the chosen template into the project. Let us choose the SVM model (Insert ML Template + Supervised Learning + Support Vector Machine).
- For the SVM model it is important to optimize the input parameters. Use the built-in automatic SVM Parameter Optimizer for this (AI-TOOLKIT tab).
- Adjust the project file as shown below by entering the database file name, the table name and the optimized parameter values:
  model:
      id: 'ID-dNBORxmlJf'
      type: SVM
      path: 'crx.sl3'
      params:
          - svm_type: C_SVC
          - kernel_type: RBF
          - gamma: 15.0
          - C: 1000
      training:
          - data_id: 'creditscr'
          - dec_id: 'decision'
      test:
          - data_id: 'creditscr'
          - dec_id: 'decision'
      input:
          - data_id: 'input_data'
          - dec_id: 'decision'
      output:
          - data_id: 'output_data'
          - col_id: 'decision'
- Train the model.
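For readers who want to see what the import step does conceptually, the sketch below separates the decision column (zero-based index 15) from the features and converts text values to numbers. It is written in plain Python under stated assumptions: the source does not describe the encoding the AI-TOOLKIT applies by default, so a simple integer encoding in order of first appearance is assumed here, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io

DECISION_INDEX = 15  # zero-based index of the decision column (A16)


def load_rows(csv_text):
    """Split each CSV row into (features, decision) using DECISION_INDEX."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        features = row[:DECISION_INDEX] + row[DECISION_INDEX + 1:]
        rows.append((features, row[DECISION_INDEX]))
    return rows


def is_number(text):
    """True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_column(values):
    """Map text categories to integer codes in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def encode_table(feature_rows):
    """Parse numeric columns as floats; integer-encode all other columns."""
    out_cols = []
    for col in zip(*feature_rows):
        if all(is_number(v) for v in col):
            out_cols.append([float(v) for v in col])
        else:
            encoded, _ = encode_column(col)
            out_cols.append([float(c) for c in encoded])
    return [list(r) for r in zip(*out_cols)]


# Hypothetical rows in the A1..A16 layout (assumption: not real dataset records).
SAMPLE = (
    "A,30.5,1.5,U,G,W,V,1.25,T,F,0,F,G,200,0,APPROVE\n"
    "B,22.0,4.0,Y,P,Q,H,0.5,F,F,0,T,S,120,50,REJECT\n"
)

rows = load_rows(SAMPLE)
X = encode_table([features for features, _ in rows])
y = [decision for _, decision in rows]
print(len(X[0]), y)  # 15 feature values per row, plus the decisions
```

Integer codes impose an arbitrary order on categories; a one-hot encoding is a common alternative when that order could mislead a distance-based model such as an RBF-kernel SVM.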
After the model is trained, the performance evaluation results appear:
Performance Evaluation Results
Confusion Matrix [predicted x original] (number of classes: 2):
          +      -
    +   307      0
    -     0    383

Accuracy    100.00%
Error         0.00%
C.Kappa     100.00%

                +          -
Precision   100.00%    100.00%
Recall      100.00%    100.00%
FNR           0.00%      0.00%
F1          100.00%    100.00%
TNR         100.00%    100.00%
FPR           0.00%      0.00%
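The reported measures follow directly from the four cells of the confusion matrix. The sketch below recomputes them in plain Python for the '+' class, reading the matrix as predicted rows by original columns. It assumes that "C.Kappa" refers to Cohen's kappa; the function name `binary_metrics` is hypothetical and not an AI-TOOLKIT function.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for the positive class from a 2x2 confusion matrix.

    Note: degenerate matrices (e.g. no positive predictions) would cause a
    division by zero; this sketch does not handle them.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    tnr = tn / (tn + fp)             # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement corrected for agreement expected by chance.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "fnr": 1 - recall,
        "f1": f1,
        "tnr": tnr,
        "fpr": 1 - tnr,
    }


# Cells taken from the confusion matrix above: 307 and 383 on the diagonal.
metrics = binary_metrics(tp=307, fp=0, fn=0, tn=383)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Swapping the roles of the two classes (treating '-' as positive) gives the second column of the results table.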
The performance evaluation of this simple model is excellent, with 100% accuracy. Note, however, that this is the accuracy on the training data! This example does not go into more detail about the performance measures or discuss the so-called generalization error (testing with unknown data), because that is not the aim of this simple example.
There are of course many other applications in the financial sector where a machine learning model can be used, for example in decision-making processes similar to the credit card application process, in other types of risk analysis, or in making buy/sell decisions on the financial market.
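To estimate the generalization error, part of the data has to be kept aside and used only for testing. One common way to do this is a stratified hold-out split, sketched below in plain Python. This is an illustration under stated assumptions, not the procedure used by the AI-TOOLKIT: the function name `stratified_holdout`, the 20% test fraction and the seed are arbitrary choices, and the class sizes are taken from the confusion matrix above.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes as they appear in the confusion matrix (307 '+' and 383 '-').
labels = ["+"] * 307 + ["-"] * 383
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))
```

A model would then be trained on the records in `train_idx` only and evaluated on those in `test_idx`; the accuracy on the second group is a more honest estimate of how the model behaves on unknown applications.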
References
[1] Zoltan Somogyi, The Application of Artificial Intelligence.
[2] Chiharu Sano, Japanese Credit Screening dataset.
You can download the dataset here: Japanese Credit Screening dataset.