An Extensive Evaluation of Portfolio Approaches for Constraint Satisfaction Problems

In the context of Constraint Programming, a portfolio approach exploits the complementary strengths of a portfolio of different constraint solvers. The goal is to predict and run the best solver(s) of the portfolio for solving a new, unseen problem. In this work we reproduce, simulate, and evaluate the performance of different portfolio approaches on extensive benchmarks of Constraint Satisfaction Problems. Empirical results clearly show the benefits of portfolio solvers in terms of both solved instances and solving time.


I. Introduction
In the context of Constraint Programming (CP) [40], a portfolio approach [13,17] combines m > 1 different solvers S_1, …, S_m to get a globally better solver, dubbed a portfolio solver. When a new, unseen problem p comes, the portfolio solver seeks to predict and run the best constituent solver(s) S_{i_1}, …, S_{i_k} (with 1 ≤ i_j ≤ m for j = 1, …, k) for solving p. Portfolio approaches can be seen as instances of the Algorithm Selection problem [39] where, as reported by [25], the algorithm selection is performed case-by-case according to the problem to solve.
Portfolio solvers have proven to be very efficient, especially for solving the Boolean satisfiability (SAT) problem. For instance, the SAT portfolio solvers 3S [21] and CSHC [30] won gold medals in the SAT Competition 2011 and 2013 respectively, while SATzilla [53] won the SAT Challenge 2012. Unfortunately, in the CP field fewer portfolio solvers have been proposed. In this regard, worth mentioning are CPHydra [35], which won the International Constraint Solver Competition 2008 [49], and sunny-cp [5], which won the MiniZinc Challenge 2015 [48]. This shows that portfolio approaches can also be effective in the CP domain [6], and that the research in this field is not merely theoretical: many real-life applications might take advantage of portfolio solvers for solving everyday problems such as task scheduling or resource allocation [1,36].
With the aim of deepening the study of portfolio solving in the CP field, in this paper we extend the research initiated by [2] by presenting a more recent and exhaustive evaluation of portfolio approaches for solving Constraint Satisfaction Problems (CSPs). Improvements are manifold: we evaluate more recent solvers and more features, we fully support the MiniZinc language [33], and we use a larger dataset of CSP instances. The obtained results are encouraging and confirm the effectiveness of portfolio solvers in terms of both solved instances and solving time.
Unfortunately, because some approaches originally designed for SAT are difficult to use and to adapt to the CSP domain, most of the portfolio solvers we tested have been reimplemented as faithfully as possible. Hence, the goal of this paper is not to present a (possibly unfair) competition between portfolio solvers. We want instead to shed further light on CSP portfolio approaches by means of empirical evaluations. In this regard, we submitted the data and the results we computed to the Algorithm Selection library [9], an open-access library providing a standardized format for representing, evaluating, and comparing different portfolio approaches without the effort of rebuilding the whole experimental environment.
Paper structure. Section II gives some background notions on CSP portfolio solvers. Section III explains the experimental methodology, while Section IV describes the obtained results. In Section V we report the related literature and the concluding remarks.

II. Background
A Constraint Satisfaction Problem (CSP) is a triple consisting of a set of variables, each of which is associated with a domain of values that it can take, and a set of constraints defining all the admissible assignments of values to variables [27]. The goal is normally to find a solution, i.e., a variable assignment satisfying all the constraints of the problem, by using a suitable constraint solver. In this context, a portfolio solver can be seen as a meta-solver consisting of m > 1 different solvers S_1, …, S_m. When a new, unseen CSP instance p comes, the portfolio solver seeks to predict and run the best constituent solver(s) for solving p. In the rest of the section we give a brief overview of the main ingredients characterizing a CSP portfolio solver, namely: the dataset of CSPs used to make (and test) predictions, the solvers of the portfolio, the features characterizing each CSP, and the selection algorithms used for deciding the solver(s) to run on a given CSP.

A. Dataset, Solvers and Features
In order to build and test a good portfolio approach it is fundamental to gather an adequate dataset of CSPs. The data sample should capture a significant variety of problems encoded in the same language. Although the CP community has not yet agreed on a standard modelling language, MiniZinc [33] is probably the most used and supported language to model CP problems. However, the biggest existing dataset of CSPs we are aware of is the one used in the 2008 International Constraint Solver Competition (ICSC) [49]. These instances are encoded in the XML-based language XCSP [41]. In [2] an empirical evaluation on such a dataset was conducted. Here we take a step forward by exploiting the xcsp2mzn [3] compiler we developed for converting XCSP to MiniZinc. This allowed us to use a bigger benchmark of 8600 CSPs: 6944 instances of ICSC converted by xcsp2mzn, and 1656 native MiniZinc instances coming from the MiniZinc 1.6 benchmarks and the MiniZinc Challenge 2012.
A portfolio solver contains a number of different constituent solvers that clearly should be as effective as possible. However, the individual performance of a solver is not the only key to success: what really matters is the contribution of a solver to the portfolio performance [51]. For this reason, increasing the number of constituent solvers does not necessarily mean increasing the performance of a portfolio. On the contrary, having too many candidate solvers can make the solver prediction inefficient and, above all, inaccurate. In this work we consider (subsets of) a collection of 11 different solvers that took part in the MiniZinc Challenge, namely: BProlog, Fzn2smt, CPX, G12/FD, G12/LazyFD, G12/CBC, Gecode, iZplus, MinisatID, Mistral, and OR-Tools.
Usually portfolio solvers decide the solver(s) to run according to a set of features extracted from the instance to solve. Features are specific attributes characterizing a given problem instance, and are clearly of paramount importance for the success of a portfolio approach [39]. Features can be divided into static (computed off-line according to the problem specification) and dynamic (computed at runtime by monitoring the problem resolution). In this paper we used mzn2feat [3] to extract a set of 155 features (144 static, 11 dynamic) from a MiniZinc instance. For more details about such features we refer the interested reader to [3].

B. Algorithm Selection
There are several ways to select one or more constituent solver(s) for solving a given instance. A primary distinction can be made between the approaches that require training and the so-called lazy approaches [25] that do not need it. For the former, the training phase is usually performed off-line and empirical evidence shows that a good training can lead to very good performance (e.g., see [50,21,31,30]). However, avoiding the training phase can be clearly advantageous in terms of simplicity and flexibility: new information can be used to improve the predictions without rebuilding the prediction model. For this reason some lazy approaches have been proposed in the literature (e.g., see [35,37,34,11,43,4]). A further distinction can be made between algorithms that run just one solver and those that schedule more solvers. The latter may have some practical advantages since they reduce the risk of choosing a wrong solver. Furthermore, scheduling more solvers enables the communication of potentially relevant information such as bounds [6] or nogoods [23].
In this work we considered different selectors, disparate in their nature. We implemented and adapted them to the CSP domain trying to be as faithful as possible to their original concept. In particular, we compared the performance of off-the-shelf Machine Learning (ML for short) classifiers against some well-known portfolio approaches, namely: CPHydra [35], ISAC [22], 3S [21], SATzilla [50], and SUNNY [4]. In the following we provide a brief overview of such approaches.
Off-the-shelf (OTS) selectors rely on off-the-shelf ML classification algorithms to predict the best solver to run for a given instance. Using WEKA [14] we implemented a number of OTS selectors based on well-known classifiers, namely: IBk (k-Nearest Neighbours), J48 (C4.5 decision trees), PART (PART decision lists), RF (Random Forests), and SMO (Support Vector Machines).
CPHydra [35] is the first general CSP portfolio solver proposed in the literature. It uses a k-Nearest Neighbour (k-NN) algorithm for computing a schedule of its constituent solvers according to the runtimes of the k neighbours. The schedule is computed by solving a generalization of a knapsack problem. CPHydra won the ICSC 2008.
ISAC [22] is a configuration tool that aims at optimally configuring a highly parametrized algorithm. In this work we use the ISAC "Pure Solver Portfolio" approach, following what was done by [31] in the SAT field. The training instances are clustered and the solver that solves the most instances in the cluster closest to the instance to be solved is selected.
3S [21] is a SAT solver combining a fixed-time static solver schedule (computed off-line) with the dynamic selection of one long-running solver. This solver is chosen with a k-NN algorithm and is executed after the static schedule. 3S was the best dynamic portfolio in the SAT Competition 2011.
SATzilla [52] is a SAT solver relying on runtime prediction models. Its latest version [51] uses a weighted random forest approach provided with a cost-sensitive loss function for punishing misclassifications in direct proportion to their performance impact. SATzilla won the SAT Challenge 2012.
SUNNY [4] is a lazy algorithm portfolio using a k-NN algorithm for selecting a sub-portfolio of solvers to run. Solvers are scheduled according to their performance in the neighbourhood. sunny-cp [5], a parallel portfolio solver built on top of the SUNNY algorithm [4], won the MiniZinc Challenge 2015.
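To make the lazy k-NN scheduling idea more concrete, the following Python fragment gives a minimal sketch in the spirit of SUNNY. It is not the implementation evaluated in this paper: the function name `sunny_like_schedule`, the data layout, the greedy choice of the sub-portfolio, and the rule that each selected solver receives a time share proportional to the neighbours it solves are all assumptions made for illustration. The backup-solver share and the ordering by solving time in the neighbourhood follow the description given in this paper.

```python
import math


def sunny_like_schedule(query, training, solvers, backup, k=10, timeout=1800.0):
    """Return a list of (solver, seconds) pairs for the feature vector `query`.

    `training` is a list of (feature_vector, runtimes) pairs, where `runtimes`
    maps each solver name to its solving time, or to None if it timed out.
    Hypothetical layout, chosen only for this sketch.
    """
    # k nearest training instances according to the Euclidean distance.
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    runtimes = [times for _, times in neighbours]

    def solved_by(solver):
        return {i for i, times in enumerate(runtimes) if times.get(solver) is not None}

    # Greedy cover of the solvable neighbours (a simplification: an exact
    # minimum-size sub-portfolio would require an exhaustive search).
    solvable = set().union(*(solved_by(s) for s in solvers))
    uncovered, chosen = set(solvable), []
    while uncovered:
        best = max(solvers, key=lambda s: len(solved_by(s) & uncovered))
        chosen.append(best)
        uncovered -= solved_by(best)

    # One time slot per solved neighbour; the backup solver receives one slot
    # for every neighbour that no solver can solve.
    slots = {s: len(solved_by(s)) for s in chosen}
    unsolved = len(runtimes) - len(solvable)
    if unsolved or not slots:
        slots[backup] = slots.get(backup, 0) + unsolved
    total = sum(slots.values())
    if total == 0:  # empty neighbourhood: fall back to the backup solver
        return [(backup, timeout)]

    def mean_time(solver):
        # Unsolved neighbours count as `timeout` seconds.
        return sum(timeout if t.get(solver) is None else t[solver] for t in runtimes) / len(runtimes)

    # Solvers are sorted by increasing solving time in the neighbourhood.
    return [(s, timeout * slots[s] / total) for s in sorted(slots, key=mean_time)]
```

Being lazy, the sketch needs no training phase: adding a new instance to `training` immediately affects later predictions.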

III. Methodology
In this section we explain the methodology used for conducting the experiments. Following what is usually done by most of the approaches, we first removed all the constant features and we scaled all the non-constant ones in the range [-1, 1], ending up with a reduced set of 114 features. Having fixed a timeout of T = 1800 seconds, we then filtered the dataset of the 8600 CSPs mentioned in Section II-A by removing the "easiest" instances (i.e., those solved when computing the dynamic features) and the "hardest" ones (i.e., those for which the feature extraction required more than T/2 = 900 seconds). We discarded the easiest ones since, if an instance is already solved during the feature extraction, then no solver prediction is needed. The hardest ones were instead discarded since, if the extraction takes more than T/2 seconds, then recompiling the MiniZinc model into FlatZinc (a step needed to run the solvers) would take at least another T/2 seconds, therefore consuming all the available time slot. The final dataset ∆ on which we conducted the experiments consisted of 4642 MiniZinc instances (3538 from ICSC, 6 from MiniZinc Challenge 2012, and 1098 from MiniZinc 1.6 benchmarks). We ran all the 11 solvers listed in Section II-A on each of the 4642 instances of ∆, for a total of 51062 solver runs. We ran all of the solvers with their default parameters and their specific FlatZinc redefinitions, keeping track of their performance within the timeout T.
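The feature preprocessing step described above can be sketched as follows. The text only states that constant features are removed and the remaining ones are scaled to [-1, 1]; the linear min-max rescaling used here, as well as the name `preprocess_features`, are assumptions made for illustration.

```python
def preprocess_features(rows):
    """Drop constant features and rescale the others linearly to [-1, 1].

    `rows` is a list of equally long feature vectors, one per instance.
    Returns the indices of the kept features and the rescaled vectors.
    """
    columns = list(zip(*rows))
    bounds = [(min(col), max(col)) for col in columns]
    # A feature is constant when its minimum equals its maximum.
    kept = [j for j, (lo, hi) in enumerate(bounds) if lo != hi]
    scaled = [
        [2.0 * (row[j] - bounds[j][0]) / (bounds[j][1] - bounds[j][0]) - 1.0 for j in kept]
        for row in rows
    ]
    return kept, scaled
```

In a cross-validated setting the minimum and maximum would normally be computed on the training folds only and then reused for the test fold; the sketch ignores this detail.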
We then built ten portfolios Π_m of different size m = 2, …, 11, where Π_m is the portfolio with cardinality m maximizing the number of solved instances in ∆ (we used the average solving time for breaking ties). Unlike other approaches, following [2], we decided to keep in ∆ the 944 CSPs not solvable by any solver. We took this decision since these instances could affect the behavior of a portfolio approach. For example, SUNNY allocates to a predesignated backup solver an amount of time proportional to the number of instances of the k-neighborhood that no solver can solve.
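A possible way to build such portfolios is an exhaustive search over all the subsets of size m, which is feasible with 11 solvers. The sketch below is an illustration under stated assumptions, not the code used for the experiments: the name `best_portfolio` and the data layout are hypothetical, and the tie-breaking average is assumed to be the average of the best solving time per instance, with unsolved instances counted as T seconds.

```python
from itertools import combinations


def best_portfolio(runtimes, m, timeout=1800.0):
    """Return the size-m tuple of solvers solving the most instances.

    `runtimes` maps each solver to a list with one entry per instance:
    the solving time in seconds, or None if the instance was not solved.
    """
    def score(portfolio):
        best_times = []
        for per_instance in zip(*(runtimes[s] for s in portfolio)):
            solved = [t for t in per_instance if t is not None]
            best_times.append(min(solved) if solved else None)
        n_solved = sum(t is not None for t in best_times)
        avg_time = sum(timeout if t is None else t for t in best_times) / len(best_times)
        # More solved instances first, then lower average solving time.
        return (-n_solved, avg_time)

    return min(combinations(sorted(runtimes), m), key=score)
```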
The performance of the single solvers is shown in Fig. 1. The Single Best Solver (SBS) of the portfolio is MinisatID [10] since it solves the greatest number of instances. Each of the portfolio approaches described in Section II-B has been simulated and evaluated using a 5-repeated 5-fold cross validation [8]. We evaluated the performance of each approach in terms of Average Solving Time (AST, see footnote 3) and Percentage of Solved Instances (PSI) within T seconds.
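The two metrics and the SBS can be written down in a few lines. This is a minimal sketch with hypothetical function names; following the convention adopted in this paper, an unsolved instance contributes T seconds to the AST. Breaking ties between equally good solvers by lower AST is an assumption of the sketch.

```python
def psi(times):
    """Percentage of Solved Instances; None marks an unsolved instance."""
    return 100.0 * sum(t is not None for t in times) / len(times)


def ast(times, timeout=1800.0):
    """Average Solving Time, counting unsolved instances as `timeout` seconds."""
    return sum(timeout if t is None else t for t in times) / len(times)


def single_best_solver(runtimes, timeout=1800.0):
    """Solver with the highest PSI (ties broken by lower AST, an assumption)."""
    return max(sorted(runtimes), key=lambda s: (psi(runtimes[s]), -ast(runtimes[s], timeout)))
```

For instance, a solver with times `[10.0, None, 30.0, None]` has a PSI of 50% and an AST of 910 seconds with T = 1800.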

IV. Results
This section presents the obtained results. In addition to the SBS and the portfolio approaches, we add to the evaluation the Virtual Best Solver (VBS) baseline. The VBS is an "oracle" solver always selecting the best solver of the portfolio for any given instance. For all the reimplemented approaches (i.e., ISAC, 3S, and SATzilla) we use the '-like' suffix. For the OTS approaches we tried different techniques like oversampling, parameter tuning, meta-classifiers, and feature selection. The best results were obtained by RF (with 250 decision trees) and SMO (with an RBF kernel and the C, γ parameters set to 2^9 and 2^-8 respectively). In the rest of the section, for better viewing, we report only their performance among all the OTS variants we experimented with. For all the approaches relying on a k-NN algorithm we fixed k = 10 and used the Euclidean distance metric.
Fig. 2a shows the Percentage of Solved Instances for the aforementioned approaches. All of them have good performance. As already observed by [2], 3S-like and SATzilla-like are better than the best OTS approaches, which in turn solve more instances than ISAC-like and CPHydra. We do not notice the performance deterioration observed by [2] when increasing the portfolio size: the addition of a new solver is almost always beneficial, or at least not so harmful. Since the methodology of the experiments is basically the same as that of [2], we deem that such a behavior is due to the different nature of the dataset, the features, and the solvers we used in this evaluation.
Footnote 3. If a (portfolio) solver cannot solve an instance in T seconds, its solving time is set to T. This choice is also adopted in the MiniZinc Challenge, while in other contexts (e.g., SAT competitions) a penalization of 10 × T seconds is given (PAR10 score).
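The VBS baseline and the notion of "gap closed" between the SBS and the VBS can be made concrete with the figures reported in this paper (4642 instances, 944 of which are solved by no solver, an SBS solving 51.62% of the instances, and SUNNY peaking at 77.69%). The formula for the closed gap is an assumption, since it is not spelled out in the text; the function names are hypothetical.

```python
def vbs_psi(n_instances, n_unsolved_by_all):
    """PSI of the Virtual Best Solver: it solves every instance that at
    least one solver of the portfolio can solve."""
    return 100.0 * (n_instances - n_unsolved_by_all) / n_instances


def gap_closed(psi_approach, psi_sbs, psi_vbs):
    """Percentage of the SBS-VBS gap closed by an approach (assumed formula)."""
    return 100.0 * (psi_approach - psi_sbs) / (psi_vbs - psi_sbs)


# Values taken from the text; the VBS refers to the full set of 11 solvers.
vbs = vbs_psi(4642, 944)
sunny_gap = gap_closed(77.69, 51.62, vbs)
```

Under these assumptions the computed value is close to the 92.95% reported for SUNNY; small differences are to be expected since the published percentages are rounded.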
The peak performances are reached by 3S-like (77.23% with 11 solvers) and SUNNY (77.69% with 10 solvers), while in this case SATzilla-like is slightly worse (75.85% with 9 solvers). Fig. 2b depicts the performance of 3S-like and SUNNY only, together with the SBS and the VBS. The performance difference between the best portfolio approaches and the SBS, which solves just 51.62% of the instances, is immediately visible. In particular, SUNNY is able to close up to 92.95% of the gap between the SBS and the VBS.
Fig. 3a shows the Average Solving Time for each approach. As also noted by [2], the AST is highly anti-correlated with the PSI for all the approaches except CPHydra. 3S-like, however, is slower compared to its performance in the work by [2]. A plausible explanation is that CPHydra and 3S-like do not employ any heuristic for sorting the selected solvers. Let us explain this with a simple example. Let us suppose that a solver S_1 solves a given CSP in 10 seconds, while another solver S_2 fails to solve it. Now consider two portfolio approaches P_1 and P_2: P_1 schedules S_1 for the first 900 seconds, and then S_2 for the remaining 900 seconds. Symmetrically, P_2 schedules S_2 for 900 seconds and then S_1 for the remaining time. Although both P_1 and P_2 solve the CSP (so the different schedules do not influence the PSI), the solving time of P_1 will be 10 seconds, while that of P_2 will be 910 seconds. Clearly, this difference might have a great influence on the AST. 3S-like is better than CPHydra since it solves more instances and schedules the solvers in a reduced time window (T/10 = 180 seconds). Conversely, the heuristic used by SUNNY (which sorts the selected solvers by increasing solving time in the k-neighbourhood) is fruitful in this context. SATzilla-like is not far from SUNNY, confirming that it can minimize the AST more than 3S-like, even if it solves fewer instances. Also in this case the difference with the SBS is remarkable (see Fig. 3b). The best AST performance is reached by SUNNY (568.84 seconds) which, by using 10 solvers, is able to close 77.52% of the gap between the SBS and the VBS. The strong anti-correlation between AST and PSI is confirmed by the low Pearson coefficient (about -0.79). There is instead a linear correlation between the PSI and the AST of CPHydra. Nonetheless, its worst performance (884.81 seconds) is still better than that of the SBS. For better viewing, Table 1 and Table 2 report the actual values of PSI and AST respectively.
Reproducibility. The problem of effectively reproducing and comparing different approaches is a well-known issue that also affected this work. Indeed, some of the approaches we tested were not publicly available, or were extremely hard to use and adapt when available. There are several different ways to adapt an approach to CSP, and many other solver selectors exist. Clearly, comparing them all is a daunting task. To address this problem, the Algorithm Selection Library (ASlib) [9] has been recently introduced. ASlib provides a standardized format for representing very heterogeneous portfolio scenarios with the aim of effectively sharing and comparing different approaches. Unfortunately, at the time we conducted the experiments ASlib had not been developed yet. We then submitted to ASlib the data and the results described in this paper, hoping that this will foster the creation of further and better portfolio approaches for the CSP field. Furthermore, the source code we developed for conducting the experiments is available at: http://www.cs.unibo.it/~amadini/csp_portfolio.zip

V. Conclusions
In this paper we presented an empirical analysis of different portfolio approaches for solving Constraint Satisfaction Problems (CSPs). We simulated and evaluated different approaches on extensive benchmarks of CSPs encoded in the MiniZinc language. The obtained results are encouraging and confirm the effectiveness of CSP portfolio solvers in terms of both solved instances and solving time.
Since it was impossible to use the original code, most of the approaches have been reimplemented trying to be as faithful as possible. However, to make our experiments reproducible and comparable, we submitted the evaluation scenario to the Algorithm Selection library [9]. Indeed, in addition to the approaches evaluated in this paper, a plethora of other CSP portfolio approaches have been proposed in the literature [32,12,46,18,2]. For more comprehensive surveys about algorithm selection and runtime prediction we refer the interested reader to [25,45,20]. The possible extensions of this work are manifold. From the CSP point of view, the gap with SAT portfolio solvers is still pronounced. An immediate research direction is therefore to encourage the construction, the experimentation, and the dissemination of effective and portable CSP portfolio solvers by devising new techniques and strategies. Moreover, even if in this work we focused only on sequential approaches, the multi-solver nature of portfolios naturally leads to the parallelization of the solvers' execution [29,24,42,16,5].
A well-known problem concerns the selection of the most informative features for removing redundant information and improving the prediction accuracy [19,26]. Reducing the training times [47] and exploiting incoming knowledge [28] are also promising directions for having more dynamic portfolios. Finally, we remark that portfolio approaches can be successfully applied in the most disparate domains. Besides the SAT and CSP fields, successful portfolio solvers have also been developed for Answer-Set Programming (ASP) [15], Quantified Boolean Formula (QBF) [38], Planning [44], and Constraint Optimization Problems (COPs) [6].

Fig. 1. Total number of solved instances for each solver of the portfolio.
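The two-solver example used in Section IV to explain how the order of the scheduled solvers affects the AST can be replayed with a small simulation. This is only a sketch: the function `run_schedule` and its data layout are hypothetical, and a schedule that fails is charged T seconds, as done for the AST in this paper.

```python
def run_schedule(schedule, runtimes, timeout=1800.0):
    """Simulate a sequential schedule on a single instance.

    `schedule` is a list of (solver, allotted_seconds) pairs and `runtimes`
    maps each solver to its stand-alone solving time (None if it fails).
    Returns the solving time of the schedule, or `timeout` if it fails.
    """
    elapsed = 0.0
    for solver, slot in schedule:
        time_needed = runtimes.get(solver)
        if time_needed is not None and time_needed <= slot:
            return elapsed + time_needed
        elapsed += slot  # the whole slot is consumed without success
    return timeout


runtimes = {"S1": 10.0, "S2": None}       # S1 solves in 10 s, S2 fails
p1 = [("S1", 900.0), ("S2", 900.0)]       # P1 runs S1 first
p2 = [("S2", 900.0), ("S1", 900.0)]       # P2 runs S2 first
time_p1 = run_schedule(p1, runtimes)      # 10 seconds
time_p2 = run_schedule(p2, runtimes)      # 910 seconds
```

Both schedules solve the instance, so the PSI is unaffected, while the solving times differ by 900 seconds.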