to the Association Rules Visualization for Decision Support: A Combined Use Between Boolean Modeling and the Colored 2D Matrix

In this paper we study visual decision support based on the cellular machine CASI (Cellular Automata for Symbolic Induction). The purpose is to improve the visualization of large sets of association rules in order to support clinical decision-making and reduce physicians' cognitive load. One of the major problems in processing association rules is the exponential growth in the volume of generated rules, which hampers their use by physicians. To address it, many approaches have been suggested for representing a set of association rules visually. In this article we propose to use the CASI cellular machine jointly with colored 2D matrices to improve the visualization of association rules. Our approach is divided into four phases: (1) data preparation, (2) extraction of association rules, (3) Boolean modeling of the rule base, and (4) 2D visualization colored by Boolean inferences.


I. Introduction
Decision support is a wide area. Since the 1990s, decision making has become a major activity that requires effective dedicated systems. Decision support is intended to assist decision makers in understanding the decision situation by giving them the reasons for the selected choices and allowing them to assess the risks of adopting a given strategy [1]. Decision support is encountered in many application domains, such as economics, industry, agriculture and medicine. In medicine, the decision is regarded as the center of the medical act. The medical decision process consists of making a diagnosis and proposing a treatment. A large number of decision support systems (DSS) have therefore been developed in this area to support health workers in their decision making [2]. A clinical decision support system (CDSS) is defined as an information technology tool whose main objective is to help clinicians organize, store, extract, and exploit medical knowledge [3].
We contribute an interactive approach to clinical decision support. The objective of such a CDSS is to provide interactive help to users who face similar medical decision problems.
Research in the field of CDSS has led to new technologies for the storage, processing and analysis of the data and information required by the decision process. Developing such systems requires real knowledge of the application area, which gave rise to the Knowledge Discovery from Databases (KDD) approach [3]. The value for decision making lies in historical data. The purpose of a CDSS based on KDD is to capture medical data that can provide decision support at the time of treatment. With KDD techniques, large databases become potentially rich and reliable sources of information for knowledge generation and validation. Data mining is the central phase of KDD; it applies machine-learning algorithms to the data to extract models [3].
In our work, we focus on association rules, one of the most powerful models in data mining. Association rules are used to discover interesting rules of the form X → Y, where X and Y are sets of items, from large collections of data [4]. They have been successfully used in many areas, such as economic planning support [5], diagnosis assistance and medical research [6]. They are able to detect trends and hidden relationships and to explore correlations in data. However, the large number of rules generated by this method makes them difficult for a human to examine. Given these large volumes, a textual representation is not suitable and interpretation becomes practically impossible. To overcome this problem, various visualization techniques [7], [8], [9], [10], [11], [12], [13] have been proposed to make the extracted knowledge clearer and easier to interpret.
The objective of our work is to improve the visualization of association rules in order to increase the performance of CDSS and reduce the physician's cognitive load, using 2D colored matrices [14] and the Cellular Automaton for Symbolic Induction, CASI [15]. We propose a solution that provides a reasoning capability and, above all, optimizes storage space and execution time.
Our contribution deals mainly with two aspects:
• Firstly, we propose an Intelligent Clinical Decision Support System (ICDSS) for immunization that uses Knowledge Discovery from Databases to provide interactive help to physicians.
• Secondly, we use the CASI cellular machine jointly with the colored 2D matrix to improve the visualization of association rules, in order to increase the performance of the ICDSS and reduce the physician's cognitive load.
This article is structured as follows. Section II gives an overview of CDSS, association rules and techniques for their visualization. Section III describes the proposed approach in detail. Section IV presents the results of the approach and Section V discusses and validates them. Finally, Section VI presents our conclusions and future research perspectives.

II. Related Work
In a patient environment, the medical decision process consists of choosing an investigative mode. Medical decision support is therefore defined as the information management techniques that help the physician, partially or completely, in decision processes [16]. Clinical decision support systems (CDSS) play an increasingly important role in medical practice by helping physicians and other medical professionals make clinical decisions. CDSS have a growing influence on the care process. They are expected to improve the quality of medical care, and their impact should intensify as the capacity for efficient data processing increases [17]. A great number of decision support systems have thus been developed in this domain [18], [19]. These applications are designed to support health services in their decision making [2].
There are three main categories of CDSS. The first category comprises indirect decision support systems, or documentary assistance systems, whose objective is to give rapid access to relevant information; these systems have no reasoning method [20]. The second category comprises automatic reminder systems, which prevent doctors from making mistakes or remind them of important elements to take into account for the decision. The assistance provided is not help with reasoning but rather a reminder that supplies useful information in an easily defined situation [21]. Like the previous ones, these systems do not reason [20]. Finally, the third category comprises consulting systems, which provide the user with reasoned conclusions according to the reasoning methods used. Their design is more satisfactory than that of the preceding categories, and developers are primarily interested in this category [20]. It includes medical expert systems [21] and systems based on data mining [22]. Our contribution belongs to the CDSS based on data mining. Data mining uses a variety of methods to process large amounts of data and information in order to discover knowledge useful for decision making. It is a decision support tool in various sectors, including the health sector, which faces increasing pressure to improve the quality of health care while reducing costs. It is therefore not surprising that health organizations have turned to data mining to improve physician practices, disease management and resource utilization, hence the growing use of data mining in the medical domain.
Throughout our study, we identified different data mining techniques [23], [24], [25], such as association rules [23], [24], neural networks and naive Bayes [24], decision trees and case-based reasoning. We chose association rules for their simplicity: unlike other data analysis methods, they provide simple and easily interpretable results. Since its introduction in [4], the task of mining association rules has received a great deal of attention, and today it is still one of the most popular pattern discovery tasks in KDD. An association rule is an expression R: X → Y, where X and Y are sets of items. X is called the antecedent or left-hand side (LHS) and Y the consequent or right-hand side (RHS). The search for association rules expresses, from data contained in a relational database, implicit trends between the attributes of the antecedent and those of the consequent. To select interesting rules from the set of all possible rules, the best-known indicators are support and confidence [4].
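As a minimal sketch of the two indicators named above, assuming the usual definitions (support is the fraction of records containing an itemset, confidence of X → Y is support(X ∪ Y) / support(X)), the following Python fragment computes both on a toy data set. The records in TOY are invented for illustration only, and the item labels RespVacc=Oui, NbEnf=1 and Occup=FctEtat are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs | rhs) / support(lhs); 0.0 when lhs never occurs."""
    s_lhs = support(lhs, transactions)
    if s_lhs == 0.0:
        return 0.0
    return support(lhs | rhs, transactions) / s_lhs


# Invented toy records (not the study data).
TOY = [
    frozenset({"NbEnf=4", "Occup=FctLib", "RespVacc=Non"}),
    frozenset({"NbEnf=4", "Occup=FctLib", "RespVacc=Oui"}),
    frozenset({"NbEnf=1", "Occup=FctEtat", "RespVacc=Oui"}),
    frozenset({"NbEnf=4", "Occup=FctLib", "RespVacc=Non"}),
]

LHS = frozenset({"NbEnf=4", "Occup=FctLib"})
RHS = frozenset({"RespVacc=Non"})
print(support(LHS | RHS, TOY), confidence(LHS, RHS, TOY))
```

On these four records the antecedent occurs three times and the full rule twice, so the rule has support 2/4 and confidence 2/3.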
One of the recurring problems with association rules is the exponential growth in the volume of generated rules, which hampers their use by physicians. During an evaluation, the expert must be able to get to the essentials as quickly as possible. That is why we are interested in graphical representation techniques for association rules. Some approaches are quite clear for very small data sets, but less so when the quantities increase. In addition, when an expert searches for particular information, the eye is likely to miss other information that is important to the analysis [12].
Different applications have been developed to visualize association rules as tables [9]. Each row corresponds to a rule and each column to a rule characteristic: the first column holds the antecedent, the second the consequent, and the following columns the quality measures of the rule. These applications offer a rule filtering interface in which the user specifies items or sets the support and/or confidence. They also allow the rules to be sorted in ascending or descending order of support or confidence [26]. Their main limitation is the textual presentation, which is not suited to studying large quantities of rules.
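The filtering and sorting offered by such tabular tools can be sketched as follows. This is an illustrative fragment only: the Rule record and the filter_and_sort function are hypothetical names, not the interface of any of the cited applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One table row: antecedent, consequent and two quality measures."""
    lhs: frozenset
    rhs: frozenset
    support: float
    confidence: float


def filter_and_sort(rules, min_support=0.0, min_confidence=0.0,
                    item=None, key="confidence", descending=True):
    """Keep rules above both thresholds (and mentioning `item`, if given),
    then sort them by the chosen measure."""
    if key not in ("support", "confidence"):
        raise ValueError("key must be 'support' or 'confidence'")
    kept = [
        r for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and (item is None or item in r.lhs or item in r.rhs)
    ]
    return sorted(kept, key=lambda r: getattr(r, key), reverse=descending)
```

Even with such filters, the result remains a textual list, which is the limitation noted above.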
Graph visualization [27], on the other hand, is used to represent relationships between items. Graphs are therefore suited to visualizing association rules by connecting the antecedent and consequent with an arc. The graph may be directed or undirected. For the undirected graph, the authors of [28] used two colors to identify the antecedent and the consequent. If an itemset is at the same time the antecedent of one rule and the consequent of another, its node has two colors. Three other colors are used to express rule support: a light color for low supports, a dark color for high supports and black for supports with value 1. The rule confidence is represented by the arc length, a high confidence being represented by a long arc [26]. An association graph can quickly turn into a tangled display, even with relatively few rules. The mosaic plot was introduced in [7] to visualize contingency tables and was adapted to association rules in [29]. Each rule is represented by a rectangle whose width is the support and whose height is the confidence. The main limit of this method is that it can only represent a subset of the association rules that have the same consequent and the same attributes [26].
In [8], the author introduced parallel coordinates as a multidimensional data representation method, and they were adapted in [28] to visualize association rules. Variables are represented by parallel axes on which items are distributed, and each rule is represented by a broken line that cuts the parallel axes at the level of the items it contains [26].
The work of [14] proposes a two-dimensional (2D) matrix. A rule is represented by a cell; the antecedent is displayed in the row and the consequent in the column, or the reverse. Visualization of association rules with 3D matrices was introduced with the MineSet tools [30]. These matrices, known as «item-to-item», are composed of three axes, two of which are used to display the items of the antecedent and the consequent. The third axis is used to represent indicators such as support or confidence. A rule is represented by a bar whose color gradient expresses its support and whose height corresponds to its confidence [26]. This visualization technique has been improved in «rule-to-item» matrices, where each row represents a consequent item and each column an antecedent item. A rule between two items is represented in the intersection box by a 2D/3D object that indicates whether or not the item is present in the rule. The color of these objects indicates whether the item is present in the antecedent or the consequent. The matrix is completed by two lines that indicate support and confidence by the height of three-dimensional bars. The main disadvantage of this representation is that it reaches considerable sizes for large sets of rules and becomes illegible.
The majority of visualization methods are not suited to representing large sets of patterns. They become unusable when the number of patterns is too large, and few methods give an overview of the pattern sets. Finally, no method is able to explain how some rules are deduced or why other rules are present. In our work we are interested in the two-dimensional (2D) matrix because of the simplicity with which it represents rules in two colors, blue for the antecedent and red for the consequent.
Nonetheless, in the presence of a significant number of association rules this representation also becomes illegible, and the rules overlap. Occlusion problems then become inevitable. To resolve this problem, we opted for rule optimization using the Boolean modeling offered by the cellular automaton CASI [15], and for its inference engine to explain the reasoning behind some deductions.
Our approach relies on some aspects of the CASI cellular automaton: its simplicity in expressing knowledge as rules and facts, and its efficiency in optimizing storage space and execution time. CASI is a particular model of dynamic and discrete systems that is able to acquire, represent and process extracted knowledge in Boolean form. Moreover, using the machine as a cellular automaton in the CDSS area is a novel idea, both within our team and in the scientific community. The cellular automaton has proven itself in several research studies in data mining: [31] proposed an approach using a cellular automaton for the regulation and reconfiguration of urban transportation systems; [32] proposed a new approach based on cellular automata to reduce the classification time of the k-nearest neighbors algorithm; and [33] proposed a new mapping approach based on the Boolean modeling of critical domain knowledge and on the use of different data sources via data mining, with the purpose of improving the explicit acquisition of knowledge. The problem addressed by [34] is the mining of biological data on Mycobacterium tuberculosis, the bacterium responsible for tuberculosis. The author proposed a data mining process to generate new knowledge by extracting particular patterns as association rules, which are then modeled with the Boolean principle adopted by the cellular machine CASI (Cellular Automaton for Symbolic Induction). [35] proposed a new text categorization framework using concept lattices and cellular automata. Finally, [36] developed a new approach to the Boolean fusion of ontologies using the CASI cellular machine to reduce the complexity of classical fusion algorithms. The originality of our system lies essentially in proposing an Intelligent Clinical Decision Support System for immunization based on Knowledge Discovery from Databases, and in combining the 2D matrix with the CASI cellular automaton to demonstrate its effectiveness in a new field, namely visualization.

III. Proposed Approach
In order to handle the large volume of rules generated by extraction algorithms, our approach operates as a full knowledge discovery process with a strong focus on the post-processing phase. Our contribution attempts to provide an effective means of analyzing association rules through graphical visualization.
Our objective is to involve visualization techniques in a medical decision-making process guided by data mining and the Boolean modeling of CASI. Our contribution is a CDSS based on data mining named VisuelAR (Visuel Association Rules). It plays an important role in medical practice by using visualization techniques to help physicians and other medical professionals make clinical decisions. Our process is divided into four phases: (1) data preparation, (2) extraction of association rules, (3) Boolean modeling of the rule base, and (4) 2D visualization colored by Boolean inferences (Fig. 1).

A. Preprocessing Module
Preprocessing is a crucial step in Knowledge Discovery from Databases (KDD). The results obtained at the end of the process depend mostly on the quality of the data used. The preprocessing steps access the data to build individual-variable tables that group the observations (explicit data). Depending on the data type (numeric or symbolic), preprocessing methods structure the data, clean them, handle missing data, and select attributes when they are numerous: the most informative attributes are selected, then the data are sampled. This phase is very important because it conditions the quality of the model established during data mining. Finally, these choices are intended to bring out the information contained in the data set [37].

B. Data Mining Module
The data mining module extracts association rules using the Apriori algorithm [4]. The user can modify configuration settings such as the support and confidence thresholds as needed. The output of this extraction algorithm is a set of association rules in the form of simple text lists.
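To make the extraction step concrete, here is a compact Python sketch of level-wise Apriori mining followed by rule generation. It is an illustration under simplifying assumptions (in-memory transactions given as frozensets, thresholds expressed as fractions) and is not the Java implementation used in this work; the function names apriori and generate_rules are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those reaching the threshold.
        level = {}
        for cand in current:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for each rule kept."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules
```

Lowering either threshold increases the number of rules returned, which is the volume problem that the following modules address.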

C. Boolean Modeling Module
CASI [15] is a cellular automaton that simulates the basic functioning of an inference engine within an expert system. It is made of two finite, arbitrarily long layers of identical finite-state machines (cells). The operation of the system is synchronous, and the state of each cell at time t+1 depends only on the state of its neighboring cells and on its own state at time t. The behavior of a knowledge base can be represented by such a two-layer cellular automaton. The first layer, known as CELFACT, is the fact base, and the second layer, known as CELRULE, is the rule base. In each layer, the cell contents determine whether and how the cell participates in each inference step: at every iteration, a cell can be active or passive, and can participate in the inference or not. We suppose that there are l cells in the layer CELFACT and r cells in the layer CELRULE. The cell states are composed of three parts: EF, IF and SF are the input, internal state and output parts of the CELFACT cells, and ER, IR and SR are the input, internal state and output parts of the CELRULE cells. $R_E$ and $R_S$ are the input and the output incidence matrices, respectively. They represent the input/output relation of the facts and are used in forward chaining. One can also use $R_E$ as output relation and $R_S$ as input relation for backward chaining. Finally, since there are l cells in the layer CELFACT, EF, IF and SF are considered as l-dimensional vectors ($EF, IF, SF \in \{0, 1\}^l$). Similarly, since there are r cells in the layer CELRULE, ER, IR and SR are considered as r-dimensional vectors ($ER, IR, SR \in \{0, 1\}^r$).
Fig. 1. VisualAR System Architecture.
The cellular automaton dynamics implements the CIE module as a cycle of an inference engine made up of two local transitions, $\delta_{fact}$ and $\delta_{rules}$, where $\delta_{fact}$ corresponds to the evaluation, selection and filtering phases, and $\delta_{rules}$ corresponds to the execution phase:
$$\delta_{fact}: (EF, IF, SF, ER, IR, SR) \mapsto (EF, IF, EF, ER + (R_E^t \cdot EF), IR, SR) \qquad (1)$$
$$\delta_{rules}: (EF, IF, SF, ER, IR, SR) \mapsto (EF + (R_S \cdot ER), IF, SF, ER, IR, \overline{ER}) \qquad (2)$$
where the matrix $R_E^t$ is the transpose of $R_E$. We consider $G_0$ as the initial cellular automaton configuration and $\Delta = \delta_{fact} \circ \delta_{rules}$ as the global transition function, with $\Delta(G_i) = G_{i+1}$. Let $G = \{G_0, G_1, \ldots, G_q\}$ be the configuration set of our cellular automaton. The automaton evolution in discrete time steps from one generation to the next is defined by the configuration sequence $(G_0, G_1, \ldots, G_q)$, where $G_{i+1} = \Delta(G_i)$.
Integrating an inference engine into the 2D colored matrix makes it possible to justify the presence of some rules and facilitates expert decision-making.
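The inference cycle can be illustrated with a small Python sketch. It assumes the formulation commonly given for CASI, in which δ fact copies EF into SF and sets ER to ER OR (R_E^t · EF), and δ rules sets EF to EF OR (R_S · ER) and SR to NOT ER, all products being Boolean. The function names and the three-fact, two-rule knowledge base are hypothetical.

```python
def bool_matvec(matrix, vector):
    """Boolean product: out[i] = OR over j of (matrix[i][j] AND vector[j])."""
    return [int(any(m and v for m, v in zip(row, vector))) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def delta_fact(state, r_e):
    """Evaluation/selection/filtering: mark rules triggered by current facts."""
    ef, i_f, sf, er, ir, sr = state
    triggered = bool_matvec(transpose(r_e), ef)
    er = [a | b for a, b in zip(er, triggered)]
    return (ef, i_f, list(ef), er, ir, sr)


def delta_rules(state, r_s):
    """Execution: add the consequents of the triggered rules to the facts."""
    ef, i_f, sf, er, ir, sr = state
    derived = bool_matvec(r_s, er)
    ef = [a | b for a, b in zip(ef, derived)]
    return (ef, i_f, sf, er, ir, [1 - x for x in er])


def run(state, r_e, r_s):
    """Apply delta_fact then delta_rules until the configuration is stable."""
    while True:
        nxt = delta_rules(delta_fact(state, r_e), r_s)
        if nxt == state:
            return state
        state = nxt


# Hypothetical base: facts (a, b, c); rules R1: a -> b and R2: b -> c.
# Rows are facts, columns are rules.
R_E = [[1, 0], [0, 1], [0, 0]]   # fact is an antecedent of the rule
R_S = [[0, 0], [1, 0], [0, 1]]   # fact is a consequent of the rule
G0 = ([1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 0], [1, 1], [1, 1])
FINAL = run(G0, R_E, R_S)
print(FINAL[0], FINAL[3])  # established facts, triggered rules
```

Note that with a plain Boolean product a rule is marked as soon as one of its antecedent facts is established; the example uses single-item antecedents so that this reading and a conjunctive one coincide.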

D. Visualization Module
This module uses the «rule-to-item» 2D matrix visualization technique, where each row represents a consequent item and each column an antecedent item. The cell color indicates the presence of the itemset in the R_E or R_S matrix of CASI: presence in the R_E matrix is shown in blue and presence in the R_S matrix in red. This technique reduces the physician's cognitive load and improves the visual perception of the results.
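The coloring rule can be sketched in a few lines of Python. The fragment below is only an illustration: it assumes incidence matrices stored with one row per item and one column per rule, lays the grid out with one row per rule and one column per item (a layout chosen for the sketch), and uses the letters B and R as stand-ins for blue and red cells. The names color_matrix and render are hypothetical.

```python
def color_matrix(r_e, r_s):
    """Build a rules x items grid: 'B' if the item is in R_E for that rule,
    'R' if it is in R_S, '.' otherwise."""
    n_items = len(r_e)
    n_rules = len(r_e[0]) if r_e else 0
    grid = []
    for j in range(n_rules):
        row = []
        for i in range(n_items):
            if r_e[i][j]:
                row.append("B")      # antecedent item: blue
            elif r_s[i][j]:
                row.append("R")      # consequent item: red
            else:
                row.append(".")      # item absent from the rule
        grid.append(row)
    return grid


def render(grid, rule_names):
    """Return the grid as text, one labelled line per rule."""
    return "\n".join(f"{name:>6} {' '.join(row)}"
                     for name, row in zip(rule_names, grid))
```

A graphical front end would map the same grid to colored cells; only the rules retained by the inference engine need to be drawn.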

IV. Results and Discussion
In our approach, we are interested in an area of preventive medicine that concerns more than half of the world, namely vaccination.
Worldwide, and particularly in Africa, immunization has become, through the various Expanded Programs on Immunization, one of the most effective strategies for controlling viral and bacterial diseases. In the late 1970s, it was established as a health program by the World Health Organization (WHO). This program aims to protect children aged 0 to 11 months against the six (6) most deadly diseases: tuberculosis, diphtheria, tetanus, whooping cough, polio and measles. According to WHO, these diseases are responsible for more than two (2) million deaths per year globally. Yet effective ways to protect children against these diseases have been made widely available around the world. Unfortunately, not all children presented for the first immunization one week after their birth have received all of the vaccines after 11 months, for many reasons. They are the lost ones. This is a real problem for the health authorities of the countries concerned, who are always looking to reduce the causes of the lost-ones problem.
The various surveys funded by the WHO in African countries such as Cameroon, Niger, Burkina Faso and Benin [38], [39], [40], [41] show that one of the causes of this low immunization coverage is the poor quality of the service offered. This quality of service can be assessed in terms of staff knowledge, attitudes and practices, resource availability, the organization of services, and the satisfaction and knowledge of the beneficiaries. Several studies have revealed that the quality of immunization services is often poor: it is characterized by the age at which the different vaccines are administered, long waiting times, many missed opportunities, poor reception by providers and the appearance of abscesses after vaccination.
Several studies have shown a correlation between the level of knowledge mothers have about vaccination and the lost ones [41]. In this context, we apply our approach to detecting the lost ones and the causes of abandonment in relation to the mother's socio-economic characteristics.

A. The questionnaire
To collect data on the mothers' socio-economic characteristics, we developed an online questionnaire (https://goo.gl/forms/mqDTqjXjs659AlvG3) on the Google Forms platform, with 9 questions in total about the mothers' socio-economic profile. We opted for a majority of single-answer multichotomous questions (7 of the 9 questions) in order to obtain rapid responses and easily exploitable data. The questionnaire results were saved in an Excel file (.csv).

Study Population Characteristics
Table I presents the socio-economic characteristics of the mothers interviewed. A total of 368 mothers replied to the questionnaire. Mothers aged above 30 were the largest age group, with 64.2% of the total. University level accounted for 61.6% of the education levels, and the main occupation was civil service work, with 42.5%. Of these mothers, 43.7% had only one child. The most common means of travel to the immunization session was personal transportation, in 72% of cases, with a travel time of less than 30 minutes for 62.3% of mothers. 66.4% of the women were dissatisfied with the start time of the vaccination session and 76.1% of the women respected the vaccination dates. To ensure data quality, the following data pretreatment was carried out:
• Data selection
We selected data that already existed in our initial file, retaining the following attributes: age group, occupation, number of children, education level, means of travel, travel time, and respect for vaccination dates.

• Data cleansing and enrichment
Data cleaning starts immediately after data selection [42]. If enrichment is required, a second cleaning step is essential. Cleaning includes the processing of missing data.

• Data transformation
This step consists of transforming an attribute A into another attribute A' [42]. In our case, to simplify the nomenclature, we used indexing to designate the different answers obtained and to prepare the individual/variable tables (Table II).
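The pretreatment steps above can be sketched as a single Python function that reads the questionnaire export, keeps the selected attributes, discards incomplete answers and turns each remaining answer into an item of the form attribute=value. This is a sketch under assumptions: the column names reuse the abbreviations that appear in the rules (NbEnf, Occup, RespVacc), the sample lines are invented, and dropping incomplete rows is only one possible way of processing missing data.

```python
import csv
import io


def to_transactions(csv_text, attributes):
    """Turn CSV answers into a list of item sets (one per respondent)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    transactions = []
    for row in reader:
        # Cleaning: skip respondents with a missing value (assumed policy).
        if any(not (row.get(a) or "").strip() for a in attributes):
            continue
        # Transformation: one item "attribute=value" per selected attribute.
        transactions.append(
            frozenset(f"{a}={row[a].strip()}" for a in attributes))
    return transactions


# Invented sample, not the study data.
SAMPLE = "NbEnf,Occup,RespVacc,Comment\n4,FctLib,Non,x\n1,,Oui,y\n2, FctEtat ,Oui,z\n"
print(to_transactions(SAMPLE, ["NbEnf", "Occup", "RespVacc"]))
```

The resulting item sets are the kind of individual/variable records that a rule extraction algorithm takes as input.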

B. Extraction of Association Rules
We implemented the Apriori algorithm in Java on a Windows platform. We conducted a series of experiments on our database of mothers' socio-economic characteristics to choose the right model (Table III). We fixed the number of rules at 500, which is suitable and sufficient for our experiment. An excerpt from the rules is given in Fig. 2.

C. CASI and the 2D Colored Matrix
Integrating an inference engine into the 2D colored matrix makes it possible to justify the presence of certain rules and facilitates expert decision-making. We use the preceding sample of association rules to explain how the cellular automaton CASI works, especially in backward chaining. The expert can launch CASI on all iterations and visualize all the rules that can be deduced from a certain fact. Because the process is based on formal mathematical reasoning, the resulting rules can be validated automatically. Alternatively, the expert can visualize the rules obtained step by step and stop the system at any time. This shows the interactivity between the expert and the visualization matrix, which is the expected contribution of our approach.
Inference by Purpose
In inference by purpose, the user can interact with the inference engine and stop it at any iteration, following a specific purpose (Fig. 3, Fig. 4, Fig. 5):

Application of δ fact
The 2D matrix displays the applicable rules step by step. The rules applicable in this step are R1, R2 and R4 (Fig. 6). Fig. 7 and Fig. 8 represent the second configuration G2. The visualization of this configuration is given in Fig. 9.

Automatic inference
In automatic inference, the user does not intervene in the inference engine process: iterations are started automatically and continue until no rule is applicable (Fig. 10 and Fig. 11).

V. Discussion and Validation
For this experiment we used the 5th series (Table III) with 500 interesting rules (Fig. 12), which cannot all be interpreted. To retrieve the rules that meet our objective, which is the detection of the causes of non-vaccination, we use Boolean modeling and colored 2D matrix visualization. In our case the fact (RespVacc=Non) is the initial fact to be determined (Fig. 13 and Fig. 14). The incidence matrices R_E (Fig. 15) and R_S (Fig. 16) represent the input/output relation of the facts and are used in forward chaining, from antecedent to consequent. In the transition functions δ fact and δ rules, one can also use R_S as input relation and R_E as output relation for backward chaining, from consequent to antecedent. In our case we choose backward chaining to detect the facts that are responsible for non-vaccination. The CASI cellular automaton allowed us to retrieve 5 important rules from our series of 500 rules (Fig. 19): R378, R429, R430, R433 and R434. These rules were presented to an expert who helped us interpret them.
Rule 378: {NbEnf=4, Occup=FctLib} → {RespVacc=Non} indicates that women who have at least 4 children and exercise a liberal profession (shown in blue) do not respect the vaccination date (shown in red).
Rule 429: {NbEnf=4, NvEtud=Univ, Occup=FctLib} → {RespVacc=Non} indicates that women who have at least 4 children, exercise a liberal profession and have a university education (shown in blue) do not respect the vaccination date (shown in red).
Rule 433: {MoyDep=TransPer, NbEnf=4, Occup=FctLib} → {RespVacc=Non} indicates that women who have personal transportation, have at least 4 children and exercise a liberal profession (shown in blue) do not respect the vaccination date (shown in red).
The remaining rules can be interpreted in the same way. Clearly, this result depends on the population and on the sample size of the study.
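One backward-chaining iteration of the kind described above can be sketched in Python as follows. The sketch assumes that R_S is used as the input relation (to select the rules whose consequent contains the goal fact) and R_E as the output relation (to collect the antecedent facts of those rules). Rules R378 and R429 are taken from the text; the third rule, R_hyp, and the function names are hypothetical and serve only to show that unrelated rules are filtered out.

```python
def build_incidence(rules, items):
    """Incidence matrices with one row per item and one column per rule."""
    r_e = [[int(it in lhs) for (_, lhs, _) in rules] for it in items]
    r_s = [[int(it in rhs) for (_, _, rhs) in rules] for it in items]
    return r_e, r_s


def backward_step(goal, rules, items):
    """Return (names of rules concluding `goal`, their antecedent facts)."""
    r_e, r_s = build_incidence(rules, items)
    ef = [int(it == goal) for it in items]
    # Select rules: Boolean product of the transpose of R_S with EF.
    er = [int(any(r_s[i][j] and ef[i] for i in range(len(items))))
          for j in range(len(rules))]
    # Collect antecedents: Boolean product of R_E with ER.
    facts = [int(any(r_e[i][j] and er[j] for j in range(len(rules))))
             for i in range(len(items))]
    selected = [rules[j][0] for j, bit in enumerate(er) if bit]
    causes = [items[i] for i, bit in enumerate(facts) if bit]
    return selected, causes


RULES = [
    ("R378", {"NbEnf=4", "Occup=FctLib"}, {"RespVacc=Non"}),
    ("R429", {"NbEnf=4", "NvEtud=Univ", "Occup=FctLib"}, {"RespVacc=Non"}),
    ("R_hyp", {"NbEnf=1"}, {"RespVacc=Oui"}),  # hypothetical rule
]
ITEMS = ["NbEnf=1", "NbEnf=4", "NvEtud=Univ", "Occup=FctLib",
         "RespVacc=Non", "RespVacc=Oui"]
print(backward_step("RespVacc=Non", RULES, ITEMS))
```

Only the selected rules and their antecedent items then need to be colored in the 2D matrix, which is how a set of several hundred rules can be narrowed to a handful.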

System validation
The complexity of any system is measured by two factors: a temporal factor (it must run as fast as possible) and a spatial factor (it must consume as little memory as possible).
The incidence matrix R_E facilitates the transformation of rules into Boolean expressions and makes it possible to use elementary Boolean algebra to test different simplifications. Boolean modeling reduces both the amount of storage and the execution time. This is due to the Boolean representation of the R_E and R_S matrices and to the Boolean multiplication used by the transition functions δ fact and δ rules. The two computation-intensive processes in CASI are the storage and the Boolean multiplication of the R_E and R_S incidence matrices.
• Storage of R_E and R_S in memory: being Boolean matrices, they can be stored as two vectors of binary sequences (written in hexadecimal). The amount of memory necessary to store the Boolean matrices is of the order of O(q) when using q sequences of r bits, or O(r) when r sequences of q bits are used. Such matrices can be processed in q×r steps. Moreover, these matrices are sparse at every iteration (they contain many zeros), so it is enough to store only the entries equal to 1.
• Boolean multiplication: the standard algorithm for (R_E^t · EF), for example, used by the transition function δ fact, is a sequential Boolean vector-matrix multiplication executed in time O(rq), where q is the dimension of the vector EF and r×q is the dimension of the matrix R_E^t. The multiplication of R_E^t and EF can also be executed using the Boolean matrix vectorization technique, in a time ≈ �(� log �), where the inner product of a row of R_E^t with the vector EF is reduced to a bitwise AND.
We can therefore conclude that Boolean modeling by cellular automaton can be a powerful tool for exploring search spaces efficiently and effectively. It offers an algorithmic alternative of lower complexity that facilitates scaling.
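The storage and multiplication ideas above can be illustrated with a short Python sketch in which each matrix row is packed into one integer, so that the inner product of a row with EF becomes a single bitwise AND followed by a test for a non-zero result. The helper names are hypothetical, and the sketch makes no claim about the actual complexity achieved by CASI.

```python
def pack(bits):
    """Pack a list of 0/1 values into an integer (bit i holds bits[i])."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def bool_matvec_packed(rows, vector):
    """Boolean product of packed rows with a packed vector.

    Each output bit is 1 when the row and the vector share a set bit."""
    return [int((row & vector) != 0) for row in rows]


def ones_positions(matrix):
    """Sparse storage: keep only the coordinates of entries equal to 1."""
    return {(i, j) for i, row in enumerate(matrix)
            for j, b in enumerate(row) if b}


# Transposed input matrix of a toy base: one packed row per rule.
RE_T = [pack([1, 0, 0]), pack([0, 1, 0])]
EF = pack([1, 0, 0])
print([format(r, "x") for r in RE_T], bool_matvec_packed(RE_T, EF))
```

The hexadecimal strings printed for the rows correspond to the compact binary sequences mentioned in the first point.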

VI. Conclusion
This paper presents a new solution to the complex problem of decision visualization. We are interested in improving the visualization of large sets of association rules to increase system performance while reducing the user's cognitive load. In this article we have tried to highlight the combined use of visualization and data mining techniques in an intelligent clinical decision support system. We proposed an approach that combines a colored 2D matrix with Boolean modeling, which adds a deduction strategy and makes the matrix more expressive. Integrating an inference engine into the 2D colored matrix can justify the presence of certain rules and facilitate the expert's decision-making, as the results obtained illustrate. The expert can choose to run the automaton on all its iterations and see all of the rules it can infer from a certain fact. Because the process uses formal mathematical reasoning, the resulting rules can be validated automatically. Alternatively, the expert can visualize the rules obtained step by step and stop the system at will. This reflects the interactivity between the expert and the visualization matrix, which is the contribution of our approach. However, some improvements can be envisaged to make our approach more complete: applying color gradations based on the confidence and support of the rules, and submitting sets of rules from medical areas to experts to validate our approach. The results presented in this paper provide the basis for future research in several areas. Firstly, we propose to evaluate our approach in other application areas and with other visualization techniques. Secondly, one future direction of our work is to evaluate association rules with different measures [43].

TABLE III. Association Rules Series.
Extract from the Association Rules of the 5th Series.