Classification of Arrhythmic ECG Data Using Machine Learning Techniques

Abstract: In this paper we propose an automated Artificial Neural Network (ANN) based classification system for cardiac arrhythmia using multi-channel ECG recordings. We are mainly interested in producing high-confidence arrhythmia classification results that are applicable in diagnostic decision support systems. A neural network model trained with the back-propagation algorithm is used to classify cases into normal and abnormal classes. The network models are trained and tested on the MIT-BIH arrhythmia database. Different ANN structures have been trained on a mixture of arrhythmic and non-arrhythmic patient data. The classification performance is evaluated using the following measures: sensitivity, specificity, classification accuracy, mean squared error (MSE), receiver operating characteristics (ROC) and area under curve (AUC). Our experimental results give 96.77% accuracy on the MIT-BIH database and 96.21% on a database prepared by also including the NSR database.


I. INTRODUCTION
One of the ways to diagnose heart diseases is to use electrocardiogram (ECG) signals. ECG signals are formed of the P wave, the QRS complex, and the T wave, designated by the capital letters P, Q, R, S, and T. In the normal beat phase of a heart, the main parameters inspected include the shape, the duration, and the mutual relationship of the P wave, QRS complex, and T wave components, as well as the R-R interval. Changes in these parameters indicate an illness of the heart, whatever its cause. All irregular beat phases are generally called arrhythmia, and some arrhythmias are very dangerous for the patient. Some automatic ECG interpreting systems are available. Moreover, computer-based interpreter systems are currently being developed to diagnose arrhythmia in time, and various methods are applied in these systems, one of them being Artificial Neural Networks (ANN) [1].
In this study, we performed ECG waveform detection using a multilayered neural network architecture.

II. RELATED RESEARCH WORK
Several methods for automated arrhythmia detection have been developed in the past few decades in an attempt to simplify the monitoring task [4] [5]. These include wavelet transformation, RBF neural networks, self-organizing maps [14] and fuzzy c-means clustering techniques. Multilayer neural networks have also been used to classify arrhythmia QRS complexes and for ischaemia detection. Dayong et al. developed an arrhythmia detection system for ECG signals based on a Bayesian ANN classifier and compared its performance with that of other classifiers, specifically Naive Bayes, Decision Trees, Logistic Regression and RBF Networks. A review of classification methods suitable for ECG signals can be found in [6] [7].

III. METHODOLOGY
In this study, the back-propagation learning algorithm is used since it is the most popular supervised learning algorithm [1].

A. Data preprocessing
In this research, we use the MIT-BIH arrhythmia database from PhysioNet [2]. This database contains 48 recordings from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Each record contains two 30-min ECG lead signals, mostly the MLII lead and one of the leads V1/V2/V4/V5. The ECG data were sampled at 360 Hz. For this research, we use only 2 channels as our source data. The first step of ECG data preprocessing is baseline noise reduction.
The original ECG exhibits irregular distances between peaks, irregular peak forms, and a low-frequency component caused by patient breathing, among other factors. The processing pipeline should therefore contain stages that reduce the influence of those factors.
(Figure: raw ECG data, unfiltered and containing noise that must be removed before further operations.)
We removed the low-frequency component: we applied the direct Fast Fourier Transform (FFT), removed the low frequencies, and restored the ECG with the inverse FFT.
(Figure: data obtained after applying the FFT, removing the low-frequency components, and applying the inverse FFT to recover the straightened ECG.)
After baseline noise reduction, the ECG beats were segmented. In this step, the continuous ECG signals were transformed into individual ECG beats. The width of an individual beat was approximated as 300 samples, and each extracted beat is centred on its R peak [3] [13]. For this purpose we use the annotations provided by the database, taking the R-peak annotation as the pivot point for each beat. For each R peak, we cut the continuous signal from position R-150 to position R+149, which gives a beat that is 300 samples wide.
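The FFT-based baseline removal described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the paper does not state the cutoff frequency, so the 0.5 Hz default and the function name remove_baseline_fft are assumptions chosen for the example.

```python
import numpy as np

def remove_baseline_fft(signal, fs=360.0, cutoff_hz=0.5):
    """Suppress baseline wander by zeroing low-frequency FFT bins.

    fs is the sampling rate (360 Hz for MIT-BIH); cutoff_hz is an
    assumed value, the paper does not give the cutoff it used.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)                      # direct FFT
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)    # bin frequencies in Hz
    spectrum[freqs < cutoff_hz] = 0.0                   # drop DC and slow drift
    return np.fft.irfft(spectrum, n=len(signal))        # inverse FFT

# Synthetic check: a 10 Hz component plus a slow 0.2 Hz drift.
fs = 360.0
t = np.arange(3600) / fs                                # 10 seconds of samples
fast = np.sin(2 * np.pi * 10.0 * t)
drift = 0.8 * np.sin(2 * np.pi * 0.2 * t)
cleaned = remove_baseline_fft(fast + drift, fs=fs)
```

In this synthetic case both components fall on exact FFT bins, so the drift is removed cleanly. On real recordings a hard spectral cut can introduce ringing near the record edges, which is one reason the cutoff should be chosen with care.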

B. Feature Extraction
To handle the multi-channel data, the data from both channels were summed to prepare the input vector for the Artificial Neural Network. This reduces the chance of false positives when exertion creates an abnormal ECG signal in one particular channel [7], and so minimizes the chance of incorrectly identifying arrhythmia.
In this paper one ECG beat corresponds to one sample of 300 inputs, which covers the whole ECG beat. The inputs for the networks were selected considering two important points [13]: a) The inputs must be of a standard size that is neither too small to cover one ECG cycle nor so large that it increases the number of beats required to analyse the signal, and with it the hardware requirements. b) The input must be arranged so that the R peak of the QRS complex is at the centre of the signal cycle under consideration.
The first condition was met by setting an arbitrary window size of 300 samples of MLII lead data from the database, with 150 samples on the left side and 149 samples on the right side of the 151st sample, which is the detected R peak. The same was done for the other lead, and the two sets of samples were added to create an input of 300 samples.
Thus the input becomes a matrix of size 300 x <number of samples>, ready to be used in MATLAB. The same process was repeated to build the inputs for all kinds of beats, that is, normal, fusion and ventricular premature.
The second condition was achieved by aligning the centre of the window with the R-peak position obtained from the database for both channels. For example, if the R peak is at sample number 2250, samples (2250 - 150 =) 2100 to (2250 + 149 =) 2399 form the input data.
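The windowing and channel summation described in this subsection can be sketched as below. This is an illustrative Python/NumPy version, assuming the R-peak positions are already available as integer sample indices; the function name extract_beats and the rule of skipping beats too close to the record edges are assumptions, since the paper does not say how edge beats were handled.

```python
import numpy as np

HALF_LEFT, HALF_RIGHT = 150, 149   # window runs from R-150 to R+149 (300 samples)

def extract_beats(channel_a, channel_b, r_peaks):
    """Return a 300 x N matrix with one column per beat.

    The two channels are summed sample by sample, then a 300-sample
    window is cut around each R-peak index.
    """
    summed = np.asarray(channel_a, dtype=float) + np.asarray(channel_b, dtype=float)
    columns = []
    for r in r_peaks:
        start, stop = r - HALF_LEFT, r + HALF_RIGHT + 1
        if start < 0 or stop > len(summed):
            continue               # assumption: drop beats cut off by the record edges
        columns.append(summed[start:stop])
    if not columns:
        return np.empty((HALF_LEFT + HALF_RIGHT + 1, 0))
    return np.stack(columns, axis=1)

# Toy signals: only the beat at index 2250 fits inside the 3000-sample record.
a = np.arange(3000, dtype=float)
b = np.ones(3000)
X = extract_beats(a, b, r_peaks=[100, 2250, 2900])
```

With zero-based indexing, row 150 of each column holds the R peak, which corresponds to the 151st sample of the window.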

C. Training And Testing
Part of this work is devoted to considering different neural networks in order to determine their accuracy in identifying and separating categories or classes.
Among all neural networks, the feed-forward back-propagation network was chosen, for the reasons and with the settings listed below:

1. It has 2 hidden layers, together with an input layer and an output layer, as shown in Fig. 3.
2. Since the number of neurons in the hidden layers is an effective parameter for improving learning results, the numbers of neurons were chosen based on the output results. The first hidden layer has 3 neurons and the second hidden layer has 5 neurons.
3. The Tansig and Purelin functions, and their combination, were compared as transfer functions of the network neurons, and the most effective one was chosen [13].
4. The BP algorithm with the traingd function was used for training.
5. The lr parameter was set to 1.
6. For training the network, the mean squared error (MSE) goal criterion was used: an error of 0.0001 was the stopping point of training, and the maximum number of repetitions was 1000 [8].
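The configuration listed above can be approximated in a few lines of NumPy. The sketch below is a stand-in for the MATLAB setup, not a reproduction of it: it assumes tanh hidden units (mathematically equivalent to tansig) with a linear output (purelin), since the paper does not say which combination was finally chosen, and it uses plain batch gradient descent in the spirit of traingd. The weight initialisation, the random seed, the default learning rate of 0.01 (the text also lists an lr parameter of 1) and all function names are assumptions.

```python
import numpy as np

def init_network(n_in=300, n_h1=3, n_h2=5, n_out=1, seed=0):
    """Small random weights and zero biases for a 300-3-5-1 network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_h1, n_h2, n_out]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros((m, 1)))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """X is 300 x N. Returns the activations of every layer, input first."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = W @ acts[-1] + b
        is_output = i == len(params) - 1
        acts.append(z if is_output else np.tanh(z))   # linear output, tanh hidden
    return acts

def train(params, X, y, lr=0.01, goal=1e-4, max_epochs=1000):
    """Batch gradient descent on MSE with an error goal and an epoch limit."""
    n = X.shape[1]
    mse = np.inf
    for _ in range(max_epochs):
        acts = forward(params, X)
        err = acts[-1] - y
        mse = float(np.mean(err ** 2))
        if mse <= goal:                               # stopping criterion
            break
        delta = 2.0 * err / n                         # gradient at the linear output
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = delta @ acts[i].T
            grad_b = delta.sum(axis=1, keepdims=True)
            if i > 0:
                # propagate the error back through the tanh of the layer below
                delta = (W.T @ delta) * (1.0 - acts[i] ** 2)
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params, mse
```

A call such as train(init_network(), X, y), with X a 300 x N beat matrix and y a 1 x N row of 0/1 labels, returns the updated parameters and the last measured MSE.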

1) Training of ANN
The ANN structures were trained using data from abnormal and normal patients. If the output of the output-layer node was logic-1, we interpreted this as arrhythmia; if it was logic-0, it was considered normal. If y(i) ≥ 0.5, we accepted the output as logic-1 and used h(i) = |1 - y(i)| in the error calculation. If y(i) < 0.5, it was considered logic-0 and we used h(i) = |0 - y(i)|. The trained ANN architecture is given in Fig. 3. Normal sinus arrhythmia and abnormal patterns were mixed in sequence. The length of the training pattern was 21200 samples (106 sets). The test pattern was built similarly. Both the patterns used for training the ANN and those used for testing the trained ANN were taken from the MIT-BIH ECG database and from the normal sinus rhythm data of the MIT-BIH NSRDB database. The learning rate (ε) was 0.01 and the momentum coefficient (α) was 0.2. The training error was 0.1% after 1000 iterations, whereas the test error was 3.79%.
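The output decision rule and the quantity h(i) used in the error calculation can be written compactly. The following Python/NumPy sketch only restates the rule given above; the function name decide_and_error is an assumption, and the sample outputs are made-up values for illustration.

```python
import numpy as np

def decide_and_error(y):
    """Map network outputs to logic levels and compute h(i).

    y(i) >= 0.5 -> logic-1 (arrhythmia), h(i) = |1 - y(i)|
    y(i) <  0.5 -> logic-0 (normal),     h(i) = |0 - y(i)|
    """
    y = np.asarray(y, dtype=float)
    labels = (y >= 0.5).astype(int)
    h = np.abs(labels - y)
    return labels, h

# Illustrative outputs only, not values from the paper.
labels, h = decide_and_error([0.9, 0.5, 0.49, 0.1])
```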

2) Testing
For testing, we use a mixture of the MIT-BIH Arrhythmia database, the QT database and the NSR database.
The records are randomly mixed. Simulating the trained network on the test dataset gives the results summarized in Table 1. A detection accuracy of 96.21% was obtained.

A. Performance Measures
We have evaluated the performance of the classification algorithms using six measures: sensitivity, specificity, classification accuracy, mean squared error (MSE), receiver operating characteristics (ROC) [9,10] and the Youden index.
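Four of these measures follow directly from the counts of true and false positives and negatives (TP, TN, FP, FN). The sketch below uses the standard textbook definitions, which we assume match the ones intended here; the function name and the example counts are illustrative and are not results from this study.

```python
def classification_measures(tp, tn, fp, fn):
    """Standard measures computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of arrhythmia cases detected
    specificity = tn / (tn + fp)                 # share of healthy cases recognised
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all cases classified correctly
    youden = sensitivity + specificity - 1.0     # Youden index
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden": youden,
    }

# Made-up counts for illustration only.
m = classification_measures(tp=45, tn=40, fp=10, fn=5)
```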
These measures are defined using True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) decisions. A TP decision occurs when an arrhythmia detection by the classifier coincides with the decision of the physician. A TN decision occurs when both the classifier and the physician suggest the absence of arrhythmia. An FP occurs when the system labels a healthy case as an arrhythmia case. Finally, an FN occurs when the system labels an arrhythmia case as healthy.

V. CONCLUSION AND FUTURE SCOPE
This paper proposes an effective automated ANN-based system for multi-class cardiac arrhythmia classification from ECG signal data. Every ANN has been tested and compared with the most common traditional ECG analyzers on appropriate databases. Based on the results, the ANN approach is shown to be capable of dealing with the ambiguous nature of the ECG signal.
The crucial role of data pre-processing and post-processing emerges, either for reducing the input space dimension or for describing the input features more appropriately. The results indicate that the feed-forward ANN with the error back-propagation algorithm operates as an excellent classifier for the given cardiac arrhythmia data set. Our future work will therefore be to further fine-tune the design of the MLP and the pre-processing of the ECG signal data so that classification results for other classes improve, and to design an MLP model that classifies all 16 arrhythmia classes in a single MLP design. We hope that this system can be further developed and fine-tuned for practical application.