Workforce Optimization for Bank Operation Centers: A Machine Learning Approach

Abstract
Online banking systems have evolved and improved in recent years with the use of mobile and online technologies, and money transfer transactions on these channels can be performed without delay or human interaction. However, commercial customers still tend to transfer money at bank branches due to several concerns. Bank operation centers serve to reduce the operational workload of branches. Centralized management also offers personalized service through appointed expert employees in these centers. Inherently, the workload volume of money transfer transactions changes dramatically from hour to hour. Therefore, the workforce should be planned instantly or in advance to save labor and increase operational efficiency. This paper introduces a hybrid multi-stage approach for workforce planning in bank operation centers by applying supervised and unsupervised learning algorithms. Expected workload is predicted with supervised learning, whereas employees are clustered into different skill groups with unsupervised learning to match transactions with suitable employees. Finally, workforce optimization for the proposed approach is analyzed on production data.

This paper introduces an approach to workforce planning in bank operation centers based on forecasting workload with a machine learning algorithm, and it analyzes the resulting workforce optimization on production data. In addition, a generic workload forecasting system is developed to accommodate different transaction types and business fields. The data used in this work were obtained from Isbank's operation center and cover the period from January 1, 2012 to the present. The historical data include transaction date-time and transaction volume information.

II. Background
With the improvements and findings of the last decades, human experience and intelligence can be replaced by artificial intelligence and expert systems in many areas [10] [11]. Machine learning is a branch of artificial intelligence that is based largely on statistics. This discipline is strong at modeling NP-type problems.
In machine learning problems, a mathematical function is learned from given historical examples, and the learned function forecasts the outputs of future examples whose outputs are unknown. The function is derived from the factors that affect the quantity to be predicted. The choice of these factors determines the complexity of the function, and this is the key to model success. If the function is not complex enough to cover the state space of the problem, underfitting occurs: the function cannot forecast successfully even for the given historical examples, and it must be remodeled to be more complex. In contrast, if the function is too complex, overfitting occurs: the function forecasts the given historical examples successfully but fails to predict unknown examples, at a large computational and time cost.
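The underfitting/overfitting distinction can be seen in a small, self-contained experiment. The sketch below uses k-nearest-neighbour regression on a noisy sine curve as a stand-in model. This is an illustrative choice, not the model used in this paper. Predicting the training mean is too simple and underfits, 1-NN memorizes the training set and overfits, and 5-NN sits in between. The data, seed, and k values are arbitrary assumptions.

```python
import math
import random


def make_data(n, rng):
    """Noisy samples of sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return xs, [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


rng = random.Random(42)
train_x, train_y = make_data(200, rng)
valid_x, valid_y = make_data(200, rng)
mean_y = sum(train_y) / len(train_y)

models = {
    "mean (too simple)": lambda x: mean_y,
    "1-NN (too complex)": lambda x: knn_predict(train_x, train_y, x, 1),
    "5-NN (moderate)": lambda x: knn_predict(train_x, train_y, x, 5),
}
errors = {}
for name, predict in models.items():
    train_err = mse([predict(x) for x in train_x], train_y)
    valid_err = mse([predict(x) for x in valid_x], valid_y)
    errors[name] = (train_err, valid_err)
    print(f"{name:20s} train MSE={train_err:.4f}  validation MSE={valid_err:.4f}")
```

Underfitting shows up as high error on both sets. Overfitting shows up as zero training error (1-NN returns each training point's own target) together with a larger validation error.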
For the adaptation and use of machine learning in the expert system of bank operation centers, several factors affecting transaction count have already been identified. Firstly, by the nature of business operations, day of week is one of the most important factors affecting transaction count (see Fig. 1). Transaction counts peak on Fridays and Mondays because of the weekend. Moreover, if public holidays shift the first or last working day of the week, the transaction counts of the following days are affected dramatically. Therefore, a Boolean first/last working day parameter is additionally included in the input layer. Furthermore, half working days shift every year because they follow the Hijri calendar. Therefore, a Boolean half-day parameter is also added to the input.
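The weekday and Boolean calendar inputs described above could be derived as in the following sketch. It assumes a Monday-to-Friday working week, a caller-supplied set of public holidays, and a caller-supplied set of half working days (for example, the eves of religious holidays, which move with the Hijri calendar). The function names and the holiday in the example are hypothetical.

```python
from datetime import date, timedelta


def working_days_of_week(d, holidays):
    """Working days (Mon-Fri, minus public holidays) in the week containing d."""
    monday = d - timedelta(days=d.weekday())
    week = [monday + timedelta(days=i) for i in range(5)]
    return [day for day in week if day not in holidays]


def calendar_inputs(d, holidays, half_days):
    """Day-of-week one-hot (7 values) + first/last working day + half-day flags."""
    workdays = working_days_of_week(d, holidays)
    is_first = bool(workdays) and d == workdays[0]
    is_last = bool(workdays) and d == workdays[-1]
    weekday_one_hot = [1 if d.weekday() == i else 0 for i in range(7)]
    return weekday_one_hot + [int(is_first), int(is_last), int(d in half_days)]


# A hypothetical public holiday on Monday 2015-09-21 makes Tuesday the first working day.
holidays = {date(2015, 9, 21)}
print(calendar_inputs(date(2015, 9, 22), holidays, half_days=set()))
# -> [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```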
Secondly, hour of day matters. Morning hours have low transaction volume and dinner hours bottom out, whereas there is a clear trend of transaction counts peaking in the evening hours (see Fig. 1). Transaction counts also change with the month of the year, as shown in Fig. 2. The day of the month affects transaction count as well: customers generally tend to transfer money at the beginning, middle, and end of the month (see Fig. 2). Thirdly, yearly deviation is expected to be a useful input for capturing the trend. It is calculated as the difference between the current and the previous year's average transaction counts over a 10-day period. Fourthly, as in time series problems generally, the transaction counts of previous hours should be included in the input to predict future values. Thus, the transaction counts of the previous three hours (h-1, h-2, and h-3) are fed into the network. Finally, the model aims to predict the transaction count of hour h, so the output of the network is transaction count (h).

Correlation indicates the strength of the relationship between two variables and ranges from -1 to +1. The coefficient approaches -1 or +1 for strongly related datasets, its sign gives the direction of the relationship, and a value of 0 means no linear relationship. None of the input variables has a correlation coefficient close to ±1 with transaction count. This means that none of them is directly related to transaction count on its own.
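The lagged inputs and the yearly deviation can be computed as in this sketch. The exact 10-day window is not specified above, so the code assumes the 10 days before the target day, compared with the same calendar window one year earlier. The function names and data layout (a list of hourly counts and a dict of daily totals) are hypothetical.

```python
from datetime import date, timedelta


def lagged_examples(hourly_counts, depth=3):
    """Build (inputs, target) pairs: inputs = [count(h-1), ..., count(h-depth)],
    target = count(h)."""
    return [
        ([hourly_counts[h - k] for k in range(1, depth + 1)], hourly_counts[h])
        for h in range(depth, len(hourly_counts))
    ]


def one_year_earlier(d):
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # Feb 29 has no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


def yearly_deviation(daily_totals, day, window=10):
    """Average daily count over the `window` days before `day`, minus the same
    average one year earlier (assumed window definition)."""
    def window_mean(end):
        return sum(daily_totals[end - timedelta(days=k)] for k in range(1, window + 1)) / window

    return window_mean(day) - window_mean(one_year_earlier(day))


print(lagged_examples([1, 2, 3, 4, 5]))  # [([3, 2, 1], 4), ([4, 3, 2], 5)]
```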

III. Motivation
A basic neural network cell has the ability to learn, remember, and predict. A neuron consists of multiple inputs and one output, as illustrated in Fig. 3. Each input (x) enters the network through its own weight (w), which specifies the strength of that input's effect on the output. Learning is achieved by adjusting the weight values positively or negatively. The assembly (summation) function computes the net input (o) as the sum of the products of the inputs and their weights, and the activation function (commonly the sigmoid function) computes the net output (y). Thus, the output of the neuron is calculated by formulas (1) and (2):

$$ o = \sum_{i=1}^{n} w_i x_i \qquad (1) $$

$$ y = \frac{1}{1 + e^{-o}} \qquad (2) $$

A complex neural network consists of multiple neuron cells and provides a satisfactory way to forecast and predict. Designing the input and output parameters and modeling the network are critically important for obtaining successful results.
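The neuron described above translates directly into code: the net input is the weighted sum of the inputs, and a sigmoid activation produces the output. In the sketch below, the numerically stable branching inside the sigmoid is an implementation detail added here, and the example inputs and weights are arbitrary.

```python
import math


def sigmoid(o: float) -> float:
    """Logistic activation 1 / (1 + e^-o), written to avoid overflow for large |o|."""
    if o >= 0:
        return 1.0 / (1.0 + math.exp(-o))
    e = math.exp(o)
    return e / (1.0 + e)


def neuron_output(x, w):
    o = sum(x_i * w_i for x_i, w_i in zip(x, w))  # net input: weighted sum of inputs
    return sigmoid(o)                             # net output: sigmoid of the net input


print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # o = 0.5 - 0.5 = 0, so y = 0.5
```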
The challenge in this study is predicting transaction volumes. In other words, it is a regression problem. Neural networks can be applied to both regression and classification. Linear regression can also be applied to regression problems, but it is not suitable for non-linear time series. Some statistical methods, such as exponential smoothing, can be applied to non-linear time series, but these models are based on discovering trends and seasonal effects in the data set. Some events, however, recur irregularly (e.g., on a random day of the week) yet strongly affect the result, such as sports events or religious days. In such cases, a statistical approach might fail, whereas a neural network can succeed if the event days are defined as input parameters. Therefore, both neural networks and exponential smoothing methods are applied to the problem in this study.
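Single exponential smoothing is the simplest of the statistical methods mentioned above. A sketch of it follows for comparison. It only tracks a smoothed level, which is why an irregular event day is invisible to it until after it has happened. Double (Holt) and triple (Holt-Winters) smoothing add trend and seasonal components and are not shown. The smoothing factor alpha is an assumed value.

```python
def exponential_smoothing_forecast(series, alpha):
    """Single exponential smoothing: s_0 = y_0, s_t = alpha*y_t + (1 - alpha)*s_{t-1}.
    The final level s_T is the forecast for the next period."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


hourly = [10.0, 20.0, 30.0]
print(exponential_smoothing_forecast(hourly, alpha=0.5))  # 22.5
```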
Network design plays a pivotal role in obtaining successful results. A three-layer network is modeled with 27, 18, and 1 nodes. Firstly, the nodes in the input layer correspond to the variables listed in Table I. Secondly, following Heaton's rule of thumb [12] that the number of hidden neurons should be 2/3 the size of the input layer plus the size of the output layer, the hidden layer is given 18 nodes, i.e., 2/3 of the 27 input nodes. Thirdly, the hidden nodes are connected to the output layer. Finally, the output node calculates the expected transaction count.
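Below is a minimal pure-Python sketch of the 27-18-1 sigmoid network, trained with backpropagation and stochastic gradient descent as described in the next paragraph. Bias terms are omitted so that each node computes only a weighted sum followed by a sigmoid. The weight initialization, learning rate, and scaling of the target into (0, 1) are assumptions, and the sketch does not reproduce the configuration in Table II. Applied literally, Heaton's rule gives 2/3 × 27 + 1 = 19 hidden nodes, while the 18 used here equals the 2/3 × 27 term.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def heaton_hidden_size(n_in: int, n_out: int) -> int:
    # Heaton's rule of thumb: 2/3 of the input layer plus the output layer.
    return round(2 * n_in / 3) + n_out


class ThreeLayerNet:
    """Sketch of an n_in -> n_hidden -> 1 sigmoid network without bias terms."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w_hidden]
        y = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)))
        return hidden, y

    def sgd_step(self, x, target: float, lr: float = 0.5) -> float:
        """One backpropagation + SGD update on a single example. The target must be
        scaled into (0, 1) because the output is a sigmoid. Returns the loss before
        the update."""
        hidden, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)  # dLoss/dz for the output node
        # Hidden-layer deltas use the output weights *before* they are updated.
        delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(self.w_out, hidden)]
        self.w_out = [w - lr * delta_out * h for w, h in zip(self.w_out, hidden)]
        for j, dj in enumerate(delta_hidden):
            self.w_hidden[j] = [w - lr * dj * xi for w, xi in zip(self.w_hidden[j], x)]
        return 0.5 * (y - target) ** 2


net = ThreeLayerNet(n_in=27, n_hidden=18)
rng = random.Random(1)
x = [rng.random() for _ in range(27)]  # one scaled input vector
losses = [net.sgd_step(x, target=0.8) for _ in range(200)]
print(losses[0], losses[-1])  # the loss shrinks as the example is learned
```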
Additionally, the sigmoid function is selected as the activation function, and the backpropagation algorithm with stochastic gradient descent is applied for learning. Some configuration parameters of the network model are listed in Table II. With these parameters and historical data consisting of 7K instances, the neural network model is built in 260 seconds on a machine with a Core i7 CPU, 16 GB RAM, and a 64-bit OS. Forecasting results are shown in Fig. 4 for a single day and in Fig. 5 for the overall period.

The same historical data set is also modeled with triple, double, and single exponential smoothing methods, and the models' success is compared. The neural network model appears to forecast much more successfully than the others. Mean Absolute Error and correlation metrics are calculated to evaluate the performance of the system. Suppose that p is the prediction set and a is the actual set, each with n elements. The performance of the system is then calculated by formulas (3) and (4):

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - a_i \rvert \qquad (3) $$

$$ r = \frac{\sum_{i=1}^{n} (p_i - \bar{p})(a_i - \bar{a})}{\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}\,\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}} \qquad (4) $$
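The Mean Absolute Error and the (Pearson) correlation coefficient between the prediction set p and the actual set a can be computed as below. The sample numbers are made up for illustration and are not results from this study.

```python
import math


def mean_absolute_error(p, a):
    """MAE = (1/n) * sum(|p_i - a_i|)."""
    return sum(abs(pi - ai) for pi, ai in zip(p, a)) / len(a)


def pearson(p, a):
    """Pearson correlation: covariance of p and a divided by the product of their spreads."""
    n = len(a)
    mp, ma = sum(p) / n, sum(a) / n
    cov = sum((pi - mp) * (ai - ma) for pi, ai in zip(p, a))
    sp = math.sqrt(sum((pi - mp) ** 2 for pi in p))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    return cov / (sp * sa)


actual = [120, 95, 140, 80]
predicted = [110, 100, 150, 70]
print(mean_absolute_error(predicted, actual))  # 8.75
print(pearson(predicted, actual))
```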

V. Unsupervised Learning
In supervised learning, the dataset has class labels. Machine learning algorithms create a generalized function from historical data, and this function predicts the class labels of unknown examples. In this case, the predicted classes can be compared with the actual classes, and the performance of the function can be calculated. In contrast, in unsupervised learning the dataset has no class labels. Instead, the dataset can be grouped into clusters, and new examples are assumed to belong to one of the created clusters. However, the performance of the function cannot be calculated against known labels in unsupervised learning problems. Commonly known algorithms for unsupervised learning include k-means and c-means.
Satisfactory results are obtained with neural networks, as shown in the previous section. The next requirement is to match the expected work with the right employees to optimize the problem.
Employees should be evaluated by their skills, and individual performance should be considered when distributing work to obtain more precise results. In this section, employee skills are characterized by the unit performing time of a work item and the average number of work items the employee completes per hour. Two approaches are proposed for workforce planning. Firstly, an aggressive mode is proposed to handle high transaction volumes and reduce the queue immediately. Secondly, a moderate mode is proposed to encourage personal development.
Whether to apply the aggressive or the moderate mode should be decided by the expected transaction volume. Fig. 6 is obtained when the skills of 265 employees are analyzed between 06/27/14 and 08/27/14. Every node represents an employee, and every shape (star, square, and triangle) stands for a different skill group. The unsupervised k-means algorithm is applied to cluster the employees into three sets depending on their skills. Star-shaped employees (Cluster 1) appear to form the highest-performance group: they complete a large amount of work in a short time.
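The clustering step could look like the following sketch. It runs Lloyd's k-means on the two skill features (unit perform time in seconds and completed work count per hour), standardizing them first because the features have very different scales. The farthest-point initialization, the standardization, and the relabelling of clusters so that Cluster 1 has the highest hourly output are assumptions of this sketch. The employee data are invented.

```python
import math
import random


def standardize(rows):
    """Scale each feature to zero mean and unit variance so seconds and counts are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns one label per point."""
    rng = random.Random(seed)
    centroids = [points[rng.randrange(len(points))]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_employees(employees, k=3):
    """employees: list of (unit_perform_time_sec, completed_per_hour).
    Returns cluster ids 1..k, where cluster 1 has the highest mean hourly count."""
    labels = kmeans(standardize(employees), k)

    def mean_rate(j):
        rates = [e[1] for e, lab in zip(employees, labels) if lab == j]
        return sum(rates) / len(rates)

    order = sorted(set(labels), key=mean_rate, reverse=True)
    rank = {old: new + 1 for new, old in enumerate(order)}
    return [rank[lab] for lab in labels]


staff = [(200, 15), (210, 14), (190, 16), (400, 9), (410, 8), (390, 10), (700, 5), (720, 4), (680, 6)]
print(cluster_employees(staff))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```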
In aggressive mode, work distribution should begin with employees in Cluster 1 and continue with square-shaped employees (Cluster 2) and then triangle-shaped employees (Cluster 3). Suppose that the employee array is ordered by cluster priority, then by the unit count column from greatest to smallest, and then by the unit perform time column (in seconds) from smallest to greatest. The proposed employee assignment process is illustrated as pseudocode in Fig. 7, where PN is the expected transaction count in the next hour and PQ is the number of transactions waiting in the queue from previous hours. The algorithm starts by reserving the most powerful employee and repeats this step until the expected work is covered. If the expected work cannot be handled even with all employees, the iteration exits. The needed workforce should be computed from the average completed work count of each employee rather than from the unit perform time, so that delays between work items are not ignored. This approach guarantees that the queue is reduced immediately. However, talented employees handle more transactions in this approach. Alternatively, a fairer moderate mode is proposed for workforce planning.
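The aggressive-mode loop can be sketched as follows. The Employee record and its field names are hypothetical. Capacity is taken from each employee's average completed work count per hour, as recommended above. The function returns the reserved employees together with any work left unhandled once all employees are reserved.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    cluster: int          # 1 = highest-performance cluster
    per_hour: float       # average completed work count per hour
    unit_time_sec: float  # average time to perform one work item


def aggressive_plan(employees, pn, pq):
    """Reserve employees in priority order until the PN + PQ expected work is covered."""
    ordered = sorted(employees, key=lambda e: (e.cluster, -e.per_hour, e.unit_time_sec))
    remaining = pn + pq
    reserved = []
    for e in ordered:
        if remaining <= 0:
            break
        reserved.append(e)
        remaining -= e.per_hour  # hourly completed count, not unit time, so delays count
    return reserved, max(remaining, 0)  # leftover > 0 means even all employees fall short


staff = [Employee("d", 3, 5, 700), Employee("a", 1, 15, 200),
         Employee("c", 2, 9, 400), Employee("b", 1, 14, 210)]
reserved, left = aggressive_plan(staff, pn=20, pq=5)
print([e.name for e in reserved], left)  # ['a', 'b'] 0
```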
In moderate mode, employees are grouped into teams as shown in Table IV. Cluster 1 consists of 53 employees, and Clusters 2 and 3 each consist of 106 employees. Each team consists of one Cluster 1 member, two Cluster 2 members, and two Cluster 3 members. In this way, there are 53 distinct teams of equal talent, each with 5 members. Cluster 1 employees complete almost 15 transactions per hour, whereas Cluster 2 and Cluster 3 members complete 9 and 5 transactions per hour, respectively. Each team can therefore handle on average 15 + 2×9 + 2×5 = 43 transactions per hour. Furthermore, the SLA time for money transfer transactions is 90 minutes, so 60/90 = 2/3 of the expected work should be completed each hour. Consequently, the number of assigned teams should be 2/(3×43) times the expected transaction count, and each transaction is assigned to a team randomly. A generalized algorithm for moderate distribution is illustrated in Fig. 8. This approach appears fairer than the aggressive mode, and it also encourages personal development. However, it does not guarantee that the queue is reduced immediately, so it should be applied when transaction volume is low.
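The moderate-mode arithmetic and team formation can be sketched as follows. The per-cluster rates and the 90-minute SLA come from the text. Rounding the team count up, capping it at the number of available teams, and using a seeded random assignment are assumptions. All names are hypothetical.

```python
import math
import random

RATE_PER_HOUR = {1: 15, 2: 9, 3: 5}  # approximate transactions/hour per cluster member
TEAM_SHAPE = {1: 1, 2: 2, 3: 2}      # members drawn from each cluster per team


def team_capacity():
    return sum(RATE_PER_HOUR[c] * n for c, n in TEAM_SHAPE.items())  # 15 + 18 + 10 = 43


def build_teams(by_cluster):
    """by_cluster: {cluster id: [employee ids]} -> list of equally skilled 5-member teams."""
    n_teams = min(len(by_cluster[c]) // n for c, n in TEAM_SHAPE.items())
    teams = []
    for t in range(n_teams):
        team = []
        for c, n in TEAM_SHAPE.items():
            team.extend(by_cluster[c][t * n:(t + 1) * n])
        teams.append(team)
    return teams


def teams_needed(expected_count, sla_minutes=90):
    """Share of the expected work due within one hour (60/90 = 2/3), divided by team capacity."""
    hourly_work = expected_count * 60 / sla_minutes
    return math.ceil(hourly_work / team_capacity())


def assign_randomly(transaction_ids, n_teams, seed=0):
    rng = random.Random(seed)
    return {tx: rng.randrange(n_teams) for tx in transaction_ids}


clusters = {
    1: [f"c1_{i}" for i in range(53)],
    2: [f"c2_{i}" for i in range(106)],
    3: [f"c3_{i}" for i in range(106)],
}
teams = build_teams(clusters)
active = min(teams_needed(645), len(teams))  # 645 * 2/3 = 430 items / 43 per team -> 10 teams
plan = assign_randomly(range(645), active)
print(len(teams), active)  # 53 10
```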

VI. Workforce Optimization
If the aggressive-mode workforce planning method is applied to the operation center's production data between Sep 01, 2015 and Oct 20, 2015, workforce cost would be reduced by almost 6.5%. Moreover, in aggressive mode the SLA (the promised time to complete a transaction) would decrease from 90 minutes to 60 minutes; that is, work time is improved by (90 - 60)/90 ≈ 33.3%. Fig. 9 shows the overall optimization of the system for the aggressive and moderate modes. Blue lines mark the moments when the proposed operator count is less than the reserved operator count, and red lines mark the moments when the proposed operator count is larger than the reserved operator count. The X-axis corresponds to work-hour instances, and the Y-axis corresponds to the workforce optimization for each work hour. Workforce optimization is calculated by formula (5).
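The SLA arithmetic and the blue/red labelling of Fig. 9 can be reproduced as in this sketch. The per-hour optimization value of formula (5) is not computed here, since that formula is not given in this section. The operator counts in the example are invented.

```python
def percent_reduction(before, after):
    """Relative reduction in percent, e.g. an SLA going from 90 to 60 minutes."""
    return 100.0 * (before - after) / before


def fig9_colour(proposed, reserved):
    """Blue when fewer operators are proposed than reserved, red when more are proposed."""
    if proposed < reserved:
        return "blue"
    if proposed > reserved:
        return "red"
    return None


print(round(percent_reduction(90, 60), 1))  # 33.3
proposed = [12, 15, 9]
reserved = [14, 13, 9]
print([fig9_colour(p, r) for p, r in zip(proposed, reserved)])  # ['blue', 'red', None]
```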
In summary, as shown in Table V, the proposed workforce planning makes it possible to handle more workload with fewer employees.

VII. System Architecture
The proposed hybrid multilevel expert system is designed as illustrated in Fig. 10. Firstly, historical data are regularly retrieved and stored by the ETL module. Secondly, the machine learning module trains the network and makes predictions, and the ETL module also supplies this module with current data. Thirdly, the AI module reserves and assigns the required workforce based on employee skills and the expected workload calculated by the ML module. It also decides whether to apply the moderate or the aggressive mode. Then, the feedback module validates how accurately the machine learning module predicts, so that the network can be retrained if validation fails. These operations are implemented on a service-oriented architecture, in which the service layer communicates with the data layer to collect and store data.

In more detail, the Generic Machine Learning module creates dynamic neural network models for different transaction types, so new business processes or transaction types can be modeled easily in the expert system. Common time-based attributes (e.g., weekday, transaction hour) are defined in a fixed way, whereas custom exceptional days (e.g., a holiday for the USD region, a religious day) can be defined and included in the neural network model, as seen in Fig. 11. Some parameters help handle time-series problems. A "has previous hours" parameter specifies whether previous hours are used as input, and the correlative depth parameter indicates how many previous hours are included in the network as input. The dynamically defined network parameters create a custom neural network model, which provides a generic solution as seen in Fig. 12. Finally, neural network learning is applied according to the pseudocode illustrated in Fig. 13.
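The dynamic model definition could be represented along the lines of the sketch below. The ModelConfig class, its field names, and the feature layout (weekday one-hot, hour, one flag per custom exceptional day, then correlative_depth lagged counts) are hypothetical. The retraining check stands in for the feedback module and uses an assumed MAE threshold.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class ModelConfig:
    """Hypothetical per-transaction-type definition for the generic ML module."""
    transaction_type: str
    exceptional_days: dict = field(default_factory=dict)  # name -> set of dates
    has_previous_hours: bool = True
    correlative_depth: int = 3                            # lagged hours used as input
    max_mae: float = 50.0                                 # assumed feedback threshold

    def input_size(self):
        lags = self.correlative_depth if self.has_previous_hours else 0
        return 7 + 1 + len(self.exceptional_days) + lags  # weekday one-hot + hour + flags + lags

    def features(self, ts, recent_counts):
        """recent_counts: hourly counts before ts, most recent last."""
        row = [1.0 if ts.weekday() == d else 0.0 for d in range(7)] + [float(ts.hour)]
        row += [1.0 if ts.date() in self.exceptional_days[name] else 0.0
                for name in sorted(self.exceptional_days)]
        if self.has_previous_hours:
            row += [float(recent_counts[-k]) for k in range(1, self.correlative_depth + 1)]
        return row

    def needs_retraining(self, predicted, actual):
        """Feedback check: retrain when the mean absolute error exceeds the threshold."""
        mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
        return mae > self.max_mae


cfg = ModelConfig("money_transfer", {"religious_day": {date(2015, 9, 24)}})
row = cfg.features(datetime(2015, 9, 24, 10), recent_counts=[5, 6, 7])
print(cfg.input_size(), row)
```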

VIII. Conclusion
In this paper, a hybrid multi-level, machine-learning-based expert system is introduced for workforce planning in bank operation centers. It applies supervised and unsupervised machine learning algorithms to forecast workload and to cluster the workforce. The chosen supervised algorithm, a neural network, is compared with alternative exponential smoothing algorithms to evaluate how successful its results are. Furthermore, workforce optimization is analyzed on production data. Satisfactory results are obtained for both workload forecasting and workforce optimization.
Although this paper mainly focuses on forecasting the transaction volumes of money transfer transactions, a generic architecture is developed for workload forecasting. Thus, new transaction types or business fields can easily be added to the workload forecasting lifecycle. A similar approach could be adopted in work areas that require turnover and shift planning, such as call centers.