Improved Shape Parameter Estimation in Pareto Distributed Clutter with Neural Networks

Abstract: The main problem faced by naval radars is the elimination of clutter, a distorting signal that appears mixed with target reflections. Recently, the Pareto distribution has been related to sea clutter measurements, suggesting that it may provide a better fit than other traditional distributions. The authors propose a new method for estimating the Pareto shape parameter based on artificial neural networks. The solution achieves a precise estimation of the parameter at a low computational cost and outperforms the classic method, which uses Maximum Likelihood Estimates (MLE). The presented scheme contributes to the development of the NATE detector for Pareto clutter, which uses the knowledge of clutter statistics to improve the stability of detection, among other applications.

A radar scans the surrounding area by emitting electromagnetic waves that produce echoes after being reflected by nearby objects. The echoes, which carry information about the objects, are then received back at the transceiver [1]. Depending on the type of application, radars must ignore certain reflections and focus on others [2].
In the specific case of coastal and ocean exploration, the reflective properties of the sea surface result in the generation of unwanted echoes that may reach high magnitudes. The elimination of these echoes, known as sea clutter, is one of the main problems faced by naval radars whose objective is to detect targets like ships or low altitude aircraft [3].
The representation of clutter is one of the topics most discussed in the literature [4][5][6][7][8][9][10]. Clutter modeling, as it's also called, facilitates the simulation of radars' performance before site implementation. As the clutter is a random signal, its modeling falls in the field of probability distributions.
Many distributions have been used in the modeling of sea clutter. The Weibull [11], K [12], Log-Normal [13], WW [14] and KK [15] are among the most commonly employed alternatives. However, in recent years evidence has been provided suggesting that the Pareto distribution may achieve a better modeling of the phenomenon than its traditional counterparts, while having a simple mathematical formulation [16]. Consequently, a significant number of papers including the Pareto distribution in radar-related solutions have been presented in a short period of time [17][18][19][20][21][22].
Among the different components of clutter modeling, the estimation of the distribution parameters has occupied a prominent place in multiple studies [23][24][25][26]. In the specific case of the Pareto distribution, various estimators have been proposed for the shape parameter, which has a remarkable influence on the quality of detection [21,27,28].
Nevertheless, the Maximum Likelihood Estimator (MLE) is commonly regarded as the classical estimator [29].
The Radar Research Team from the Instituto Superior Politécnico Jose Antonio Echeverria (ISPJAE-CUJAE) has developed improved parameter estimation techniques for the Weibull and K distributions using artificial neural networks (ANN) [30,31]. Given the similarity of the above distributions and the Pareto alternative, the authors aimed at creating a new method for estimating the Pareto shape parameter using ANN.
The resulting neural network achieves a precise, low computational cost estimation of the Pareto shape parameter over a wide range of possible values. Its design contributes to the development of the NATE (Neural Adaptive Threshold Estimation) detector for Pareto clutter, which uses the statistical knowledge of the clutter to improve the stability of detection in different scenarios.
Similarly, the precise estimation of the clutter has application in the DRACEC scheme and in identifying anomalous sea surface conditions such as fish gatherings, oil spills or shipwrecks.
The paper is structured as follows. The second section introduces the fundamentals of the Pareto distribution, and the third one presents the method used in the design and training of the neural network. The fourth section, entitled "Results and Discussion", characterizes the performance of the solution and compares it with the classical MLE alternative. Finally, in "Conclusions and Future Research", the contributions of the paper are summarized and recommendations are given for future research lines.

II. Pareto Distribution
The Pareto distribution has been used in modeling the income of a population [32] and in several fields of engineering [27,28,33], including sonar [34] and radar [16,35,36] applications. Particularly in [16], the application of the Pareto distribution in the representation of high-resolution X-band sea clutter, observed at low grazing angles, was examined. The investigation compared the Pareto fit with the popular Log-Normal, Weibull, K, KK and WW intensity models. As a result, it was found that the Pareto distribution achieved a better fit than these traditional models.
It was also reported that the closest competitor to Pareto was the KK distribution. As the Pareto distribution is characterized by a simple PDF (Probability Density Function), the results are very promising. It is suggested that the Pareto distribution will become a natural replacement for the KK which uses between 4 and 5 parameters with a complicated PDF that includes Bessel functions.
The PDF of the Pareto distribution is given below.
$$ f(x) = \frac{\alpha \beta^{\alpha}}{x^{\alpha+1}}, \qquad x \ge \beta $$
where α is the shape parameter and β is the scale parameter [29], also referred to as the location parameter or x-minimum value [37]. The β parameter specifies the region where the distribution has positive values, which always covers the interval [β, ∞), whereas the shape parameter controls how fast the tail of the distribution drops. Figure 1 illustrates the effect of varying the parameters on the Pareto PDF.
Consulting the investigations of [21,22,38], the authors concluded that the interval 2 < α < 10 is the most suitable for Pareto clutter modeling. Additional simulations verified that samples generated with α < 2 produce high magnitude values too often, and that those corresponding to α > 10 have a PDF tail that is too short.
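As a minimal sketch of the behavior described in this section, the following Python snippet evaluates the Pareto density and draws samples by inverting the CDF F(x) = 1 - (β/x)^α. It assumes the standard Pareto type I form with support [β, ∞); the function names `pareto_pdf` and `pareto_sample` are illustrative and do not come from the paper.

```python
import math
import random

def pareto_pdf(x, alpha, beta):
    """Pareto type I density: alpha * beta**alpha / x**(alpha + 1) for x >= beta, else 0."""
    if x < beta:
        return 0.0
    return alpha * beta**alpha / x**(alpha + 1)

def pareto_sample(alpha, beta, rng):
    """Draw one sample by inverse-transform sampling of F(x) = 1 - (beta / x)**alpha."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return beta * (1.0 - u) ** (-1.0 / alpha)

rng = random.Random(0)
# Edge values of the interval discussed in the text (alpha = 2 and alpha = 10).
heavy = [pareto_sample(2.0, 0.001, rng) for _ in range(3000)]
light = [pareto_sample(10.0, 0.001, rng) for _ in range(3000)]

# Largest sample relative to beta: the heavy-tailed case is expected to be far larger.
print(max(heavy) / 0.001, max(light) / 0.001)
```

Comparing the two printed ratios gives a quick feel for why small α values produce frequent high-magnitude spikes while large α values concentrate the samples close to β.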

III. Design and Training of the Neural Network
For the design of the neural network, the authors took as a starting point the solutions given in [26,30,39,40,41] for different situations. Consequently, the initial configuration of the network's internal variables was the one presented in Table 1. For a full understanding of the meaning of these parameters, the reader is referred to the specialized literature [42,43].
The configuration displayed in Table 1 was enough to achieve the desired results. Several parameters were modified in search of better performance, with virtually no gain. The only exception was the training algorithm: the best results were obtained after applying Bayesian Regularization [44] instead of Levenberg-Marquardt.

A. Preparation of the Training Set
An essential element in the design of a neural estimator is the preparation of the training set. In order to execute the supervised training, a set of 16000 groups of 3000 Pareto samples each was computer-generated, changing the value of the shape parameter (α) every 10 groups. Therefore, the first group was generated with α = 2, group number 11 with α = 2.005, group number 21 with α = 2.010, and so on until α = 10. The Pareto scale parameter was maintained at β = 0.001 in all simulations.
The task of the neural network is to estimate the shape parameter for each of the 3000-sample groups. To present samples to the network, a histogram was prepared from each group. The histograms reduced the 3000 intensity values to 50, thereby performing the feature extraction from sea clutter. The number of values in the histograms was chosen following what was applied in [30,31]. Thus, the network had 50 inputs designed to read histograms and one output conceived to estimate the value of the α parameter. The number of neurons in the hidden layer was left to be optimized by successive trials.
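The generation schedule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the inverse-transform sampler and all names (`alpha_for_group`, `training_groups`, the constants) are hypothetical. Under this literal reading of "every 10 groups" with a step of 0.005, the last of the 16000 groups receives α = 9.995, slightly short of the α = 10 endpoint mentioned in the text.

```python
import random

N_GROUPS = 16000        # groups in the training set (from the text)
GROUP_SIZE = 3000       # samples per group (from the text)
GROUPS_PER_ALPHA = 10   # alpha changes every 10 groups (from the text)
ALPHA_START = 2.0       # first shape value (from the text)
ALPHA_STEP = 0.005      # increment between consecutive shape values (from the text)
BETA = 0.001            # fixed scale parameter (from the text)

def alpha_for_group(g):
    """Shape parameter assigned to the 0-based group index g."""
    return ALPHA_START + ALPHA_STEP * (g // GROUPS_PER_ALPHA)

def make_group(alpha, beta, size, rng):
    """One group of Pareto samples via inverse-transform sampling (assumed sampler)."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(size)]

def training_groups(n_groups, rng):
    """Lazily yield (alpha, samples) pairs so the full set never sits in memory."""
    for g in range(n_groups):
        alpha = alpha_for_group(g)
        yield alpha, make_group(alpha, BETA, GROUP_SIZE, rng)

rng = random.Random(42)
# Only the first 21 groups are generated here to keep the demonstration fast.
for alpha, samples in training_groups(21, rng):
    pass
print(round(alpha, 3), len(samples))
```

Passing `N_GROUPS` instead of 21 would walk through the whole schedule; the generator form keeps memory use at one group at a time.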
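The feature extraction step can be illustrated with a short Python sketch. The paper does not state how the 50 histogram bins are placed, so the choices below are assumptions: the histogram is taken over ln(x/β) on a fixed range [0, `LOG_MAX`], values beyond the range fall into the last bin, and counts are normalized to relative frequencies. The names `histogram_features` and `LOG_MAX` are hypothetical.

```python
import math
import random

N_BINS = 50      # number of network inputs (from the text)
LOG_MAX = 4.0    # assumed upper edge for ln(x / beta); not taken from the paper

def histogram_features(samples, beta, n_bins=N_BINS, log_max=LOG_MAX):
    """Relative-frequency histogram of ln(x / beta) over [0, log_max]."""
    counts = [0] * n_bins
    width = log_max / n_bins
    for x in samples:
        z = math.log(x / beta)
        # Clamp the bin index; values above log_max are counted in the last bin.
        idx = min(max(int(z / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]

def pareto_group(alpha, beta, size, rng):
    """Assumed inverse-transform sampler used only to feed the example."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(size)]

rng = random.Random(1)
features_low = histogram_features(pareto_group(2.0, 0.001, 3000, rng), 0.001)
features_high = histogram_features(pareto_group(10.0, 0.001, 3000, rng), 0.001)
print(len(features_low), features_low[0], features_high[0])
```

With a fixed bin layout, groups generated with a larger α concentrate more mass in the first bins, which is the kind of shape difference the network is expected to read from its 50 inputs.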

IV. Results and Discussion
After executing multiple training runs with ANNs whose hidden layers contained between 5 and 50 neurons, it was concluded that the improvement obtained by increasing the number of neurons was very low, as can be seen in Figure 2. In fact, the gain was less than 5% in the mean absolute error when a 5-neuron ANN was replaced by a 50-neuron ANN. The mean absolute error was measured by averaging the absolute magnitude of the deviation of each parameter estimate from the exact value known a priori.
Each value in Figure 2 resulted from choosing the best network after performing 50 training runs with schemes containing the specified number of neurons in the hidden layer. Afterwards, the network performance was measured with a new dataset independent from the one used in the training.
Consequently, the authors selected a network with five neurons as the final proposal. It exhibited a mean absolute error of 0.0897 and a maximum error of 0.5748. These values represent only 1.12% and 7.2%, respectively, of the search interval. Figure 3 presents three graphs corresponding to the ANN's performance. Graph A shows the shape parameter estimation performed by the ANN together with the ideal estimate. Graph B shows the error obtained by subtracting both quantities. As can be seen, the deviation is greater for high values of the shape parameter. This is a result of the saturation of the parameter's influence on the heavy tail property of the Pareto distribution, and it confirms what was observed in [30] for the Weibull distribution. As α increases, the PDF curves become more and more similar to each other, making the accurate estimation of the parameter more difficult.
Finally, graph C of Figure 3 presents a histogram of the error. As shown, the error exhibits a Gaussian-like behavior, which is a positive feature for an estimator.
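The error measures used in this section can be written down in a few lines of Python. The snippet is a sketch of the definition given above (average of the absolute deviations from the known values), plus the maximum error reported later in the section; the function names and the numeric values in the usage lines are hypothetical and are not results from the paper.

```python
def mean_absolute_error(estimates, true_values):
    """Average of |estimate - true value| over all groups."""
    if len(estimates) != len(true_values):
        raise ValueError("estimates and true_values must have the same length")
    return sum(abs(e - t) for e, t in zip(estimates, true_values)) / len(estimates)

def max_absolute_error(estimates, true_values):
    """Largest |estimate - true value| over all groups."""
    if len(estimates) != len(true_values):
        raise ValueError("estimates and true_values must have the same length")
    return max(abs(e - t) for e, t in zip(estimates, true_values))

# Hypothetical estimates against known shape values, for illustration only.
estimated = [2.1, 4.9, 8.0]
known = [2.0, 5.0, 8.5]
print(mean_absolute_error(estimated, known), max_absolute_error(estimated, known))
```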
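To give a sense of how small the selected architecture is (50 histogram inputs, five hidden neurons, one output), the forward pass can be sketched in Python. Table 1 is not reproduced here, so the activation functions are assumptions: a hyperbolic tangent in the hidden layer and a linear output. The weights are random placeholders, not trained values, and all names are hypothetical.

```python
import math
import random

N_INPUTS = 50   # histogram values fed to the network (from the text)
N_HIDDEN = 5    # hidden neurons in the selected network (from the text)

def init_weights(rng):
    """Random placeholder weights; a trained network would supply real values."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2

def forward(features, weights):
    """One forward pass: tanh hidden layer (assumed) followed by a linear output."""
    w1, b1, w2, b2 = weights
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

rng = random.Random(7)
weights = init_weights(rng)
uniform_histogram = [1.0 / N_INPUTS] * N_INPUTS  # dummy input, not real clutter
print(forward(uniform_histogram, weights))
```

Each estimate costs about 250 multiply-accumulate operations in the hidden layer plus five in the output, which is consistent with the low computational cost claimed for the network itself.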

A. Comparison with the MLE Estimator
Generally, the shape parameter of clutter-related distributions such as Weibull, K and Log-Normal is estimated by one of two methods: the Method of Moments (MoM) and the MLE method. In the case of the Pareto distribution, the MoM does not provide good estimates due to the long tails of the distribution and to limitations in the definition of the moments, whose existence requires a sufficiently large value of α [29]. In fact, the use of this estimator is strongly discouraged. Therefore, the most widely employed estimator is the MLE, which uses the following expressions [29,45]:
$$ \hat{\beta} = \min_{1 \le i \le n} x_i, \qquad \hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \ln\left(x_i / \hat{\beta}\right)} $$
where $\hat{\beta}$ is the estimate of the scale parameter, $\hat{\alpha}$ is the estimate of the shape parameter and $n$ is the number of samples in a given set.
The authors compared the performance of the proposed neural estimator with the MLE. After evaluating both schemes with a new set of 16000 groups of 3000 samples, it was concluded that the neural method performs the estimation with a deviation approximately 50% lower than the one exhibited by the MLE, regarding both the mean absolute error and the maximum error. The mean absolute error of the MLE was 0.1689 and its maximum error was 1.1077.
Additionally, it was observed that the behavior of the estimators changed across the different estimation intervals, as shown in Tables 2 and 3. Note that both estimators exhibit a similar performance in the 2 < α < 3 interval, where only a small gain of less than 3% is achieved by replacing the MLE with the neural solution. However, as the magnitude of α increases, the gain becomes significant, reaching a figure of 200% for the 9 < α < 10 region.
Moreover, the speed of both the MLE method and the neural solution was tested using a personal computer with an Intel Core i5-4460 CPU (3.20 GHz) and 4 GB of RAM. The MLE took 5.5425 seconds to complete the estimation on a set containing 16000 groups of 3000 Pareto samples, whereas the ANN took 1.8721 seconds (almost 3 times faster). Nevertheless, the ANN processing time can be reduced even further by placing the solution on an FPGA kit, which will provide parallel processing features. Also, the time spent gathering the histograms (94% of the 1.8721 seconds) can be further reduced by establishing a memory-aware system that only replaces the oldest sample when a new one is received.
In conclusion, it is safe to say that the neural method outperforms the MLE in the region of high magnitudes of the shape parameter, while maintaining an equal or superior performance in the remainder of the estimation interval. The proposed ANN achieves an accurate and low computational cost estimation of the Pareto shape parameter. Therefore, it contributes to the design of radar detectors that guarantee a constant false alarm probability when processing clutter with statistical variations. Indeed, the neural estimation solution presented in [30] for the Weibull distribution led, together with the contribution of [46], to the creation of the W-NATE-CA-CFAR adaptive detector [47]. Thus, the current paper is expected to lead to the creation of the P-NATE-CA-CFAR (Pareto-Neural Adaptive Threshold Estimation-Cell Averaging-Constant False Alarm Rate) detector. At the same time, the new neural method helps improve the identification of anomalous sea surface conditions such as fish gatherings [48], oil spills [49,50] or shipwrecks [51,52]. These conditions cause deviations in the clutter statistics, which may be identified with a precise estimator of the shape parameter such as the one proposed.
Lastly, the presented results contribute to the development of the DRACEC method [53], which proposes an alternative detection scheme based on the moments domain. One of the major disadvantages of DRACEC is the need to accumulate a large number of samples for further processing. The accurate estimation of distribution parameters allows inferences to be made on the properties of the samples, thus reducing the volume of data to be stored.
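For reference, the MLE can be sketched in Python assuming the standard closed-form estimates for the Pareto type I distribution: the scale estimate is the sample minimum and the shape estimate is the number of samples divided by the sum of ln(x_i / scale estimate). The function name `pareto_mle` and the sampler used in the usage lines are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def pareto_mle(samples):
    """Closed-form Pareto type I MLE: returns (alpha_hat, beta_hat)."""
    n = len(samples)
    beta_hat = min(samples)  # scale estimate: smallest observed value
    log_sum = sum(math.log(x / beta_hat) for x in samples)
    if log_sum == 0.0:
        raise ValueError("all samples are equal; the shape estimate is undefined")
    alpha_hat = n / log_sum  # shape estimate
    return alpha_hat, beta_hat

# Usage on one synthetic group (assumed inverse-transform sampler, alpha = 5).
rng = random.Random(3)
group = [0.001 * (1.0 - rng.random()) ** (-1.0 / 5.0) for _ in range(3000)]
alpha_hat, beta_hat = pareto_mle(group)
print(alpha_hat, beta_hat)
```

The estimator needs one pass for the minimum and one logarithm per sample, so its cost grows linearly with the group size.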
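The memory-aware histogram update mentioned above can be sketched as a sliding window in Python: when a new sample arrives, only the bin of the oldest sample is decremented and the bin of the new one is incremented, so the histogram is never rebuilt from scratch. This is a hypothetical illustration; the class name `SlidingHistogram`, the binning over ln(x/β) and the default `log_max` value are assumptions not taken from the paper.

```python
import math
from collections import deque

class SlidingHistogram:
    """Histogram over the most recent `window` samples, updated in O(1) per sample."""

    def __init__(self, window, beta, n_bins=50, log_max=4.0):
        self.window = window
        self.beta = beta
        self.n_bins = n_bins
        self.width = log_max / n_bins   # assumed fixed bin width over ln(x / beta)
        self.bins = deque()             # bin index of each sample still in the window
        self.counts = [0] * n_bins

    def _bin(self, x):
        z = math.log(x / self.beta)
        return min(max(int(z / self.width), 0), self.n_bins - 1)

    def push(self, x):
        """Add a new sample, dropping the oldest one once the window is full."""
        if len(self.bins) == self.window:
            oldest = self.bins.popleft()
            self.counts[oldest] -= 1
        idx = self._bin(x)
        self.bins.append(idx)
        self.counts[idx] += 1

hist = SlidingHistogram(window=3000, beta=0.001)
for value in (0.0011, 0.0015, 0.0040):  # hypothetical intensity samples
    hist.push(value)
print(sum(hist.counts))
```

Storing bin indices rather than raw samples keeps the removal step to a single decrement, at the price of fixing the bin layout in advance.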

V. Conclusions and Future Research
A new estimation technique for the Pareto shape parameter, based on artificial neural networks, was proposed. The neural method proved to be better than the classic alternative based on Maximum Likelihood Estimates, mainly in the region of high magnitudes of the parameter. The neural solution provides an accurate and low computational cost estimation that can be used to improve the stability of radar detectors, the identification of clutter anomalies and detection in the moments domain.
The authors will focus next on the development of the P-NATE-CA-CFAR detector and on the FPGA implementation of the presented solution, in order to profit from the parallel processing advantages of this platform. Additionally, the design of similar solutions applied to clutter distributions such as the KK, WW and Compound Gaussian is recommended.

José Raúl Machado Fernández received his Telecommunications and Electronics Engineering Degree from the Instituto Superior José Antonio Echeverría (ISPJAE-CUJAE) in 2012. He is currently a Ph.D. student at the same institution. His research topics include teledetection, digital signal processing, sea clutter modeling and the application of artificial intelligence for solving diverse engineering problems.