Distributed Search Systems with Self-Adaptive Organizational Setups

This paper studies the effects of learning-induced alterations of distributed search systems’ organizations. In particular, scenarios where alterations of the search systems’ organizational setup are based on a form of reinforcement learning are compared to scenarios where the organizational setup is kept constant and to scenarios where the setup is changed randomly. The results indicate that learning-induced alterations may lead to high levels of performance combined with high levels of efficiency in terms of reorganization effort. However, the results also suggest that the complexity of the underlying search problem together with the aspiration level (which drives positive or negative reinforcement) considerably shapes the effects of learning.

The organizational setup of distributed search systems is a topic that is investigated in many disciplines, such as control theory, complex systems science, and computational organization theory (for extensive reviews cf. [1], [2], [3]). The coherence of, and the coordination within, distributed search systems are among the predominant issues in this line of research: the former is defined in terms of some of the system's properties (e.g., solution quality), whereas the latter is concerned with the actions and interactions of agents collaborating in a distributed search system [4], [5]. Thus, the key topics concerning the organizational setup of distributed search systems are the appropriate segmentation of the overall search problem into sub-tasks, the way sub-tasks are assigned to agents, and the mechanisms that consolidate the (partial) solutions to sub-tasks into an overall solution. This overall solution should be as satisfactory as possible, with its quality assessed on the basis of coherence metrics (e.g., [4], [5], [6]).
Hence, feasible consensus mechanisms, performant algorithms for search and optimization, and the appropriate assignment of tasks are of particular interest in this line of research [1], since they contribute to improving results with respect to the relevant coherence metrics. However, this line of research (mostly implicitly) assumes that the designer of a distributed search system decides which of these mechanisms, algorithms, and ways of assignment are employed in the organizational setup of the search system. This paper follows an approach that can be regarded as complementary to the aforementioned line of research: rather than the designer (exogenously) deciding on the system's organizational setup, the setup evolves endogenously. In particular, we allow for self-adaptation of the search system's organizational setup, i.e., while searching for better solutions to the overall search problem (during "run-time"), the search system is allowed to change its organization, where changes are based on feedback [7].
The idea of self-adaptive distributed search systems builds on prior studies which provide evidence that distributed search processes can benefit remarkably, with respect to the solution quality obtained, from inducing organizational dynamics while searching for better solutions, be it in the organizational setup of collaborating robots or "swarms" of unmanned aerial vehicles, or in the organizational design of a firm where managers search for higher levels of firm performance [8], [9], [10], [11]. Apparently, organizational change per se tends to enhance the performance of a search system by inducing a shift towards more exploration, i.e., discovery of new solutions, and less exploitation, i.e., stepwise improvement. However, it is worth emphasizing that these studies employ merely random-driven organizational change in the sense that the search systems do not learn which organizational setups are more successful than others.
By investigating the effects of learning-based organizational dynamics, this paper goes a step beyond studies that employ random-driven organizational change. In particular, it studies the effects of endowing distributed search systems with some capability to learn about their organization's performance and to adapt the organizational setup accordingly. This paper is an extended version of [12], which was presented at the 13th International Conference on Distributed Computing and Artificial Intelligence (DCAI). The extensions predominantly relate to the dimensionality of the search problems under investigation, the time horizon of the simulations, and a sensitivity analysis with respect to the number of search agents.
It appears to be of particular interest to investigate whether search systems which employ learning-based organizational change outperform systems which change their organizational setup randomly or systems which do not change their setup at all. This study intends to provide findings on the relative potential benefits of learning-based organizational dynamics. Since it is well known that the task environment (in terms of task complexity) tends to affect the performance of search, this paper particularly controls for the complexity of the search problems by employing an agent-based simulation model based on the framework of fitness landscapes [13], [14]. The next section introduces the key elements of the simulation model. Section III gives an overview of the simulation experiments performed. The results are presented in Section IV where, first, an in-depth analysis of baseline scenarios of organizational change modes for different levels of complexity is provided. Second, a sensitivity analysis is presented which puts particular emphasis on the need for coordination within the search system, a need that is considerably affected by the number of search agents who carry out sub-tasks.

II. Outline of the Simulation Model
The study employs an agent-based simulation model which captures two intertwined adaptive processes: (1) In the short term, search agents seek to find superior solutions for the search problem, where the quality of a solution is measured by the system's overall performance level achieved. We model search agents to operate on NK fitness landscapes [13], [14]. (2) In the mid term, the search systems are allowed to adapt major features of their organizational setup. Changes are driven by reinforcement learning, which is based on the performance enhancements achieved. A schematic flow-chart of key features of the simulation model is displayed in Figure 1.

A. Short-Term Adaptive Search for Higher Levels of Performance
The study employs the framework of NK-fitness landscapes, which were originally introduced in the domain of evolutionary biology [13]. An advantage of NK fitness landscapes is that they easily allow for controlling the complexity of the underlying search problem [15].
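The NK construction referred to above can be illustrated with a minimal Python sketch. It assumes the canonical form of the model rather than this paper's own equations, which are not reproduced here: each decision $d_i$ contributes a value drawn uniformly from [0, 1) for every combination of $d_i$ and its K interacting decisions, and overall performance is the mean of the N contributions. The names `make_nk_landscape` and `interactions` are hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, interactions, seed=0):
    """Build a fitness function for a canonical NK landscape (sketch).

    interactions[i] lists the other decisions whose values affect the
    contribution of decision i, so K = len(interactions[i]).
    """
    rng = random.Random(seed)
    # One lookup table per decision: a uniform [0, 1) contribution for every
    # combination of d_i and its K interacting decisions.
    tables = [
        {bits: rng.random()
         for bits in itertools.product((0, 1), repeat=len(interactions[i]) + 1)}
        for i in range(n)
    ]

    def fitness(d):
        # Overall performance V is the mean of the N contributions.
        total = 0.0
        for i in range(n):
            key = (d[i],) + tuple(d[j] for j in interactions[i])
            total += tables[i][key]
        return total / n

    return fitness


# Usage: N = 12 decisions in four blocks of three (K = 2 within each block).
N = 12
interactions = [[j for j in range(N) if j != i and j // 3 == i // 3] for i in range(N)]
V = make_nk_landscape(N, interactions, seed=1)
configs = list(itertools.product((0, 1), repeat=N))
global_max = max(configs, key=V)  # exhaustive search is feasible for N = 12
print(round(V(global_max), 3))
```

Raising K (more entries per `interactions[i]`) makes the landscape more rugged, which is how the framework controls the complexity of the search problem.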

1) Search Problem
In each time step t of the observation period T, the search systems face an N-dimensional binary search problem, i.e., they seek for

2) Agents and their Choices
The search for higher levels of performance $V_t$ is collaboratively performed by M search agents. In particular, the N-dimensional search problem is partitioned into M disjoint partial problems, and each of these sub-problems is exclusively delegated to one search agent r. In each time step t, a search agent seeks to identify the best configuration for its "own" choices $d_t^r$, assuming that the other agents do not alter the choices they made in t-1. Each agent r randomly discovers two alternatives in addition to the status quo choice and evaluates these options on the basis of imperfect information about their performance. This imperfection may not only be an unintentional shortcoming of, e.g., the agents' information processing capabilities but may also be intentionally induced: some evidence suggests that imperfect information on the fitness (performance) of options can increase the effectiveness of search processes (e.g., [16], [17]). Previous research shows that false-positive evaluations of options increase the diversity of search. As a consequence, there is a chance to end situations of inertia induced by sticking to a local peak and to reach the basins of attraction of higher levels of fitness. Hence, intentionally or not, our agents may be endowed with slightly distorted information about the performance of options. Distortions are captured by adding error terms, as shown exemplarily in Eq. (4). For the sake of simplicity, distortions are modeled as relative errors added to the true performance (for other functions see [16]), are stable over time (if not altered by self-adaptation as described subsequently), and are independent of each other.
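The effect of distorted evaluations on an agent's choice can be sketched as follows. The sketch assumes relative errors that are drawn once per option (and hence stable over time) from a normal distribution with a hypothetical standard deviation `sigma`; the paper's Eq. (4) and its error distribution are not reproduced here, and the function names are hypothetical.

```python
import random


def draw_errors(options, sigma, rng):
    # One fixed relative error per option (assumption: Gaussian with mean 0).
    return {o: rng.gauss(0.0, sigma) for o in options}


def perceived(true_value, error):
    # Relative error: the distortion is proportional to the true performance.
    return true_value + true_value * error


def choose_option(options, true_perf, errors):
    """Return the option with the highest *perceived* performance.

    options: the status quo plus the discovered alternatives;
    true_perf: maps an option to its true performance contribution;
    errors: maps an option to its fixed relative error term.
    """
    return max(options, key=lambda o: perceived(true_perf(o), errors[o]))


# Usage: a false-positive evaluation makes a worse alternative look best.
true_values = {(0, 0, 0): 0.60, (1, 0, 0): 0.55, (1, 1, 0): 0.50}
errors = {(0, 0, 0): 0.0, (1, 0, 0): 0.15, (1, 1, 0): -0.05}
print(choose_option(list(true_values), true_values.get, errors))  # (1, 0, 0)
```

Here the alternative (1, 0, 0) is perceived at 0.6325 and beats the status quo at 0.60, although it is truly worse; this is the kind of false-positive evaluation that increases the diversity of search.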
Apart from the search agents, the model captures a kind of "central agent" whose role is twofold: (1) in the short-term adaptive search, the central agent may, depending on the particular mode of coordination, intervene in the selection of choices; (2) in the mid term, the central agent assesses performance enhancements and "learns" about successful organizational setups by reinforcement. The next section provides more details on the central agent's roles.

B. Mid-Term Adaptation of the Organizational Setup based on Reinforcement Learning
The very core of this study is learning about the performance contributions of a search system's organization and, eventually, altering the organizational setup accordingly. The following two subsections describe the modelled mode of reinforcement learning and the dimensions of the organizational setup which may be subject to organizational change.

1) Mode of Reinforcement Learning
In each T*-th time step, the central agent faces an L-dimensional decision problem related to the L dimensions of the organizational setup which can be altered. In particular, the central agent chooses a setup $Ö_t$, i.e., one option in each of the L dimensions. The model employs a simple mode of reinforcement learning (for overviews see [18], [19]) based on statistical learning, i.e., a generalized form of the Bush-Mosteller model [20], [21]: in every T*-th period, the propensities of choices are updated based on the stimuli resulting from the evaluation of the outcome (performance effects) achieved under the regime of the previously chosen organizational setup.
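A minimal sketch of one generalized Bush-Mosteller step for a single organizational dimension is given below. It follows a common formulation of the rule, in which a positive stimulus moves the chosen option's probability towards 1 and a negative one towards 0, scaled by the learning strength (written `lam` here). Rescaling the non-chosen options proportionally so that the probabilities still sum to 1 is our assumption, and the paper's Eq. 7 may differ in detail.

```python
def bush_mosteller_update(probs, chosen, stimulus, lam):
    """One generalized Bush-Mosteller step for one organizational dimension.

    probs: choice probabilities of the options (summing to 1);
    chosen: index of the option used during the last T* periods;
    stimulus: tau in [-1, 1]; lam: learning strength in [0, 1].
    """
    p_old = probs[chosen]
    if stimulus >= 0:
        # Reinforcement: move the chosen option towards probability 1.
        p_new = p_old + (1.0 - p_old) * lam * stimulus
    else:
        # Inhibition: move the chosen option towards probability 0.
        p_new = p_old + p_old * lam * stimulus
    others = [i for i in range(len(probs)) if i != chosen]
    rest_old, rest_new = 1.0 - p_old, 1.0 - p_new
    updated = list(probs)
    updated[chosen] = p_new
    for i in others:
        # Rescale the remaining mass proportionally (uniformly if it was 0).
        updated[i] = (probs[i] * rest_new / rest_old if rest_old > 0
                      else rest_new / len(others))
    return updated


# Usage: three options, initially uniform, positive stimulus for option 0.
print(bush_mosteller_update([1 / 3, 1 / 3, 1 / 3], chosen=0, stimulus=1.0, lam=0.5))
```

With `lam = 0` the probabilities never move, so the setup keeps being drawn from the initial uniform distribution; this corresponds to organizational change without learning.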
The outcome ω of configuration $Ö_t$ is given by the maximal relative performance enhancement achieved within the last T* periods of the adaptive walk. The evaluation of this outcome is regarded as positive if ω reaches the aspiration level v and as negative otherwise, which determines the stimulus $\tau(t)$. After the probabilities are updated according to Eq. 7, the organizational setup $Ö_t$ to be implemented from time step $t+1$ to $t+T^*$ is determined randomly based on the updated probabilities.
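The evaluation step can be sketched in Python under explicit assumptions: the relative enhancement is measured against the performance at the start of the T*-period interval, and the stimulus is taken to be the hypothetical quantity ω − v clipped to [−1, 1], which is non-negative exactly when the aspiration level v is reached. The paper's own equations are not reproduced here, and all function names are hypothetical.

```python
import random


def outcome(perf, t, tstar):
    """Maximal relative performance enhancement within the last T* periods.

    Assumption: enhancements are measured relative to V at time t - T*.
    """
    base = perf[t - tstar]
    return max((perf[s] - base) / base for s in range(t - tstar + 1, t + 1))


def stimulus(omega, aspiration):
    # Hypothetical stimulus: non-negative iff omega reaches the aspiration level.
    return max(-1.0, min(1.0, omega - aspiration))


def sample_setup(prob_vectors, rng):
    # Draw one option per organizational dimension from the updated probabilities.
    return [rng.choices(range(len(p)), weights=p)[0] for p in prob_vectors]


# Usage: performance history V_0..V_3 and an interval of T* = 3 periods.
history = [0.60, 0.63, 0.66, 0.64]
omega = outcome(history, t=3, tstar=3)
print(round(omega, 3), stimulus(omega, 0.05) >= 0, stimulus(omega, 0.2) >= 0)
print(sample_setup([[1 / 3] * 3] * 3, random.Random(42)))
```

Because ω records the best enhancement within the interval, a later decline (here from 0.66 to 0.64) does not reduce the outcome credited to the setup.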

2) Organizational Setup
The vector of the organizational setup Ö is modelled to be three-dimensional, i.e., L = 3. Within each dimension, three options are available. These dimensions relate to (see also Table I): 1. The objective of the search agents as controlled by parameter

III. Simulation Experiments and Parameter Settings
In the simulation experiments, after a performance landscape is generated, the initial organizational setup (i.e., vector $Ö_{t=0}$) of a search system is determined randomly with uniform probabilities $p(a_l, t=0)$ out of the options in each dimension l, as introduced above and summarized in Table I. Next, the search systems are placed randomly in the performance landscape. Then, over an observation time T of 500 periods, the search systems are observed while searching for higher levels of performance. In each T*-th period, probabilities are updated and organizational configurations are (possibly) altered (cf. Section II.B). Fig. 1 displays the key events during a simulation experiment capturing learning-based adaptation of the organizational setup. In order to contrast search systems with learning capabilities (i.e., with λ > 0) with non-learning systems employing organizational change, simulations for λ = 0 are conducted. Moreover, search systems which do not alter their organization within the observation time T (i.e., with T* > T) are simulated.
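The experimental schedule described above can be sketched as follows. The observation time T = 500 is taken from the text; the interval `tstar = 100` in the usage lines is a hypothetical value for illustration only (the actual setting is listed in Table I), and the function names are hypothetical.

```python
def initial_probabilities(num_dimensions=3, num_options=3):
    # Uniform initial probabilities p(a_l, t=0) in each dimension l.
    return [[1.0 / num_options] * num_options for _ in range(num_dimensions)]


def reorganization_steps(T, tstar):
    """Time steps at which probabilities are updated and the setup may change.

    Updates happen in each T*-th period; choosing T* > T yields the
    "no change" mode because no update falls within the observation time.
    """
    return [t for t in range(1, T + 1) if t % tstar == 0]


T = 500
print(initial_probabilities()[0])
print(reorganization_steps(T, tstar=100))         # hypothetical T*: [100, 200, 300, 400, 500]
print(len(reorganization_steps(T, tstar=T + 1)))  # "no change" mode: 0
```

The "change, no learning" mode would use the same schedule with learning strength 0, so that each new setup is drawn from the unchanged uniform probabilities.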
In order to capture the complexity of the underlying search problem, simulations are performed for two interaction structures which, in a way, represent two extreme scenarios [22]: in the block-diagonal structure, the overall search problem can be segmented into disjoint parts with maximally intense interactions within each sub-problem but no cross-sub-problem interactions (K* = 0). An example is given in Figure 2.a with K = 2 and K* = 0, where each of the four sub-problems is assigned to one search agent. In this setup, one agent's decisions do not affect the performance contributions of the other agents' choices.
The second case is characterized by full interdependence, i.e., all single options $d_i$ affect the performance contributions of all other choices $d_{j \neq i}$, and the search problem's complexity is raised to its maximum, i.e., the intensity of interactions K as well as the cross-sub-problem interactions K* are maximal (see Figure 2.b for an example with K = 11 and K* = 9).
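The K and K* values quoted for Figures 2.a and 2.b can be checked with a short sketch that builds both interaction structures for N = 12 decisions assigned to four agents with three decisions each. The contiguous assignment of decisions to agents is an assumption consistent with the block-diagonal example, and the function names are hypothetical.

```python
def block_diagonal(n, block):
    # Decision i interacts with every other decision in its own block.
    return [[j for j in range(n) if j != i and j // block == i // block]
            for i in range(n)]


def full_interdependence(n):
    # Decision i interacts with every other decision.
    return [[j for j in range(n) if j != i] for i in range(n)]


def k_and_kstar(interactions, owner):
    """Set of (K, K*) pairs over all decisions.

    owner[i] is the agent responsible for decision i; K* counts the
    interactions of decision i with decisions owned by other agents.
    """
    return {(len(partners), sum(owner[j] != owner[i] for j in partners))
            for i, partners in enumerate(interactions)}


N = 12
owner = [i // 3 for i in range(N)]  # four agents, three decisions each
print(k_and_kstar(block_diagonal(N, 3), owner))     # {(2, 0)}
print(k_and_kstar(full_interdependence(N), owner))  # {(11, 9)}
```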

IV. Results
The simulation experiments are conducted for two baseline scenarios of complexity (see Figures 2.a and 2.b) and for four modes of organizational adaptation: (I) no change, (II) change without learning, (III) learning-based change with a low aspiration level, and (IV) learning-based change with a high aspiration level. These baseline scenarios are then modified in the number of search agents: in the modified scenarios, two or six search agents are employed instead of four. Table II displays the results for the baseline scenarios. Figure 3 depicts the averaged adaptive walks for the different modes of change in the block-diagonal structure of interactions, and Figure 4 reports on the fully interdependent structure correspondingly. In particular, Figures 3 and 4 show the performance differences of the "change, no learning" mode and of the two modes employing learning against the "no change" mode. In Table II, each data row shows the results of 2,500 adaptive walks (10 walks on each of 250 distinct landscapes); for parameter settings see Table I.

A. Baseline Scenarios
In the following, three aspects of the presented results are discussed in detail: (1) performance differences between scenarios in which the organizational setup is changed and scenarios in which it is kept constant, (2) the effects of learning-based adaptation compared to purely random adaptations of the organizational setup, and (3) the intensity of organizational change (which is captured by the average number of altered dimensions). Concerning the first aspect, Table II as well as Figures 3 and 4 indicate that, with one exception, performance levels of scenarios employing change persistently exceed the level of performance achieved without change. This behavior can be observed after approximately 40 periods. These results confirm findings of research which indicate that altering the organizational setup in the course of distributed search processes may be favorable [8], [10], [11]: it has been argued that this is driven by the increased diversity of search, which reduces the peril of sticking to local peaks. This is broadly confirmed by the ratio of alterations of configurations d and by the frequency with which the global maximum is found (cf. Table II).
However, results also suggest that learning by reinforcement with high aspiration levels is not universally beneficial. Apparently, the complexity of the search problem together with the aspiration level subtly affects the benefits of learning. In case of the block-diagonal structure, employing learning-based change with a high aspiration level leads to performance levels that are remarkably below those achieved without change from about time step 75 to 500 of the adaptive walk, and the final performance $V_{t=500}$ is about 4 percentage points below the "no change" case. An explanation of why, in the block-diagonal interaction structure, a high aspiration level induces such rather poor performance may lie in the specific selective effects induced in this scenario: with an increasing aspiration level, it becomes less likely that a positive stimulus $\tau(t)$ is achieved under the regime of a certain organizational setup, even if the setup had brought some (lower than v) performance enhancements in the last T* periods. Hence, even potentially appropriate organizational setups are likely to receive low probabilities of being re-chosen for the next T* periods. In the block-diagonal structure with its fairly low level of interactions (K = 2), it is rather likely that the global maximum is found [22]: of course, no further performance enhancement is possible in these cases, and the aspiration level is not reached. Whenever the global maximum is found (with aspiration level v > 0), the organizational setup is likely to be modified. An altered organizational setup also induces a modified evaluation of the current configuration d [11]. As a result, a move away from the global maximum in the performance landscape becomes likely.
Fig. 3. Performance differences of adaptive search processes employing organizational change against search processes without alterations of the organizational setup in case of the block-diagonal interaction structure. Each curve represents the difference between mean performances, each averaged over 2,500 adaptive walks, i.e., 250 distinct performance landscapes with 10 adaptive walks on each. For parameter settings see Table I.
Fig. 4. Performance differences of adaptive search processes employing organizational change against search processes without alterations of the organizational setup in case of the fully interdependent interaction structure. Each curve represents the difference between mean performances, each averaged over 2,500 adaptive walks, i.e., 250 distinct performance landscapes with 10 adaptive walks on each. For parameter settings see Table I.
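The selective effect of a high aspiration level at the global maximum, described above, can be made explicit with a short derivation. It assumes the common generalized Bush-Mosteller rule for a negative stimulus, $p \leftarrow p + \lambda \tau p$, with learning strength $\lambda$, and a hypothetical stimulus $\tau = \omega - v$ with $0 < \lambda v \le 1$; neither is necessarily the exact form of the paper's Eq. 7. Once the global maximum has been found, no enhancement is possible, so $\omega = 0$ in every subsequent interval:

```latex
\begin{aligned}
\tau &= \omega - v = -v < 0, \\
p_{k+1} &= p_k + \lambda \tau \, p_k = (1 - \lambda v)\, p_k, \\
p_k &= (1 - \lambda v)^k \, p_0 \;\longrightarrow\; 0 \quad (k \to \infty).
\end{aligned}
```

Here $p_k$ denotes the probability of re-choosing the current setup after it has been evaluated in k further intervals at the global maximum. Under these assumptions, the higher the aspiration level v, the faster this probability decays, whereas v = 0 gives $\tau = 0$ and leaves it unchanged.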
The second aspect to be discussed in detail relates to the performance effects of learning-based adaptation compared to purely random-driven alterations. The results suggest that learning-based change is not universally more beneficial than purely random-driven organizational change. Rather, the aspiration level v appears to be of remarkable relevance: in both interaction structures, learning-based adaptation employing a high aspiration level leads to a final performance that is inferior to the performance achieved under purely random-driven change. Employing a low aspiration level performs best in the block-diagonal structure, whereas it leads to a medium performance in the case of high complexity.
As argued above, a high aspiration level induces more organizational alterations, which lead to more diversity of search, i.e., more alterations of d, as compared to the low aspiration level. For highly complex interaction structures, a particular peril is that the search processes may stick to a local optimum; hence, increasing the diversity of search "per se" may be beneficial. This might explain the good performance of the "change, no learning" mode. However, a high aspiration level, in a way, "penalizes" particularly those search processes which have reached a good solution from which further improvements are hard to achieve: as argued above, the block-diagonal interaction structure is particularly prone to this effect; however, the rather low performance in the fully interdependent structure (Figure 4) might also be caused by this effect.
The third aspect to be discussed in detail, the intensity of organizational change (right-most column in Table II), brings the efficiency of the modes of change and learning into play. The average number of organizational dimensions in which alterations occur during the adaptive search may be regarded as an indicator of the effort ("costs"), if any, of organizational dynamics.
Obviously, the context of the search organization determines whether, and if so in what form, costs of organizational change occur. For example, in case of a network of unmanned aerial vehicles collaboratively serving a certain service area, switching from one coordination mode to another might not cause any costs (apart from activating another of the already available coordination mechanisms); however, in case of firm managers collaboratively searching for better configurations of key performance drivers, reorganizations are rather costly, including, for example, the costs of learning new organizational procedures or of adjusting incentive schemes. Hence, the average number of dimensions changed may be rather critical for the efficiency of inducing organizational dynamics of search.
Results suggest that, in both interaction structures under investigation, learning with a low aspiration level yields good performance and a high level of efficiency as compared to the other scenarios: in case of the block-diagonal interaction structure, the final performance achieved with a low aspiration level exceeds the performance reached via purely random-driven change by about 7 percentage points, while the average number of organizational alterations is remarkably lower (6.9 altered dimensions on average in case of learning with a low aspiration level versus 38.2 in case of purely randomized change). If the complexity of the search problem is high, the performance of the "change, no learning" scenario exceeds the performance of learning-based adaptation with a low aspiration level; however, this comes along with, on average, 37.9 organizational alterations compared to 6.2 alterations in the latter case.
In sum, it appears that learning with a low aspiration level may provide rather high performance levels combined with few organizational alterations. Thus, whenever organizational alterations are costly, learning with a low aspiration level appears to be particularly interesting with respect to the efficiency of search.
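The efficiency gains reported above can be quantified directly from the average numbers of altered dimensions given in the text, as the relative reduction in alterations achieved by learning with a low aspiration level compared to purely random-driven change, together with the corresponding ratio:

```latex
\begin{aligned}
\text{block-diagonal:}\quad & 1 - \tfrac{6.9}{38.2} \approx 0.819, \qquad \tfrac{38.2}{6.9} \approx 5.5, \\
\text{full interdependence:}\quad & 1 - \tfrac{6.2}{37.9} \approx 0.836, \qquad \tfrac{37.9}{6.2} \approx 6.1.
\end{aligned}
```

That is, learning with a low aspiration level requires roughly 82% to 84% fewer organizational alterations, i.e., about one fifth to one sixth as many as purely random-driven change.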

B. Sensitivity Analysis
In the sensitivity analysis, the baseline scenarios are modified with respect to the number of search agents: the simulations are additionally conducted for systems with two and with six search agents. In particular, the interactions among decisions remain unchanged, but the assignment of decisions is modified. Figures 5.a and 5.b show the assignment for the case of two agents and six agents, respectively, in the block-diagonal structure, as compared to Figure 2.a.
In the simulation model, as the number of search agents increases (decreases), the diversity of search increases (decreases): in each time step, each search agent discovers two alternatives to the status quo of its own partial sub-problem (Section II.A), one alternative where one bit is flipped and another with two bits flipped. Thus, in case of two search agents, at most four bits of the entire configuration d could be flipped in time step t; in contrast, with six agents, at most 12 bits could be flipped. Thus, with an increasing number of agents the need for coordination increases too, and vice versa. Figures 6 and 7 plot the final performance $V_{t=500}$ achieved in the block-diagonal and the fully interdependent interaction structure, respectively, with two, four and six search agents.
Fig. 6. Sensitivity of final performance to number of search agents in the block-diagonal interaction structure. Each mark represents the average of 2,500 adaptive walks, i.e., 250 distinct performance landscapes with 10 adaptive walks on each. For parameter settings see Table I.
The results suggest that the "change, no learning" mode and learning with a low aspiration level are least sensitive to the number of agents. In contrast, the final performance obtained by learning with a high aspiration level varies considerably with the number of agents. However, compared to the modes employing organizational change, the "no change" mode appears most sensitive to an increase in the number M of search agents.
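The discovery step and the resulting upper bound on flipped bits can be sketched as follows. The text states that one alternative flips one bit and the other flips two bits of the agent's partial configuration; choosing those bits uniformly at random is our assumption, and the function names are hypothetical.

```python
import random


def discover_alternatives(status_quo, rng):
    """Status quo plus two alternatives for one agent's partial configuration:
    one with a single bit flipped and one with two distinct bits flipped."""
    n = len(status_quo)

    def flip(positions):
        return tuple(1 - b if i in positions else b for i, b in enumerate(status_quo))

    one_bit = flip({rng.randrange(n)})
    two_bits = flip(set(rng.sample(range(n), 2)))
    return [tuple(status_quo), one_bit, two_bits]


def max_bits_flipped(num_agents):
    # Each agent adopts at most its two-bit alternative in a time step.
    return 2 * num_agents


options = discover_alternatives((0, 1, 0), random.Random(7))
print([sum(a != b for a, b in zip(options[0], o)) for o in options])  # [0, 1, 2]
print(max_bits_flipped(2), max_bits_flipped(6))  # 4 12
```

Since every agent may adopt a two-bit change simultaneously, the bound grows linearly with M, which is why more agents imply more diversity of search and a greater need for coordination.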
With the transition from two to four agents, the final performance shows rather slight decreases or increases, depending on the mode of change and the interaction structure. However, with the transition from four to six agents, the final performance obtained decreases remarkably in both interaction structures.
An interesting question is what might cause these effects. A reason might lie in the relation between the assignment of decisions to search agents and the interactions among agents' decisions. For example, with six search agents in the block-diagonal structure (Figure 5.b), cross-agent interactions among search agents' choices occur, whereas for two and four agents no cross-agent interactions show up (Figures 5.a and 2.a). Hence, in this interaction structure the need for coordination among agents' choices ranges from no need at all (i.e., K* = 0 for M = 2 and M = 4) to some coordination need, as captured by K* = 1 or K* = 2 (see Figure 5.b).
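The claim about the coordination need can be checked with a short sketch for the block-diagonal structure (N = 12, blocks of three decisions), assuming that decisions are assigned to agents in contiguous, equally sized groups, as the figures suggest. The function name is hypothetical.

```python
def kstar_range(n, block, num_agents):
    """Smallest and largest K* over all decisions in a block-diagonal structure.

    Decisions interact within blocks of size `block`; they are assigned to
    agents in contiguous groups of n // num_agents decisions (assumption).
    """
    per_agent = n // num_agents
    owner = [i // per_agent for i in range(n)]
    kstars = []
    for i in range(n):
        partners = [j for j in range(n) if j != i and j // block == i // block]
        kstars.append(sum(owner[j] != owner[i] for j in partners))
    return min(kstars), max(kstars)


for m in (2, 4, 6):
    print(m, kstar_range(12, 3, m))
# 2 (0, 0)
# 4 (0, 0)
# 6 (1, 2)
```

With six agents, each agent owns two decisions, so every block of three decisions is split across two agents; this is what creates the cross-agent interactions (K* = 1 or K* = 2).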

V. Conclusion
The major finding of this study is that employing self-adaptation of the organizational setup of distributed search systems via reinforcement learning potentially leads to high levels of performance, in particular with a rather high level of efficiency in terms of the extent of reorganization. These findings are particularly interesting when reorganizing the search system incurs costs, be it due to the agents' learning of new organizational procedures or due to adjustments required in institutional arrangements.
However, the results also suggest that the complexity of the search problem together with the aspiration level considerably shapes the effects of reinforcement learning, which, at worst, may even be harmful compared to refraining from any organizational alterations. These findings may alert designers of distributed search systems who employ learning by reinforcement that the level of aspired performance enhancements should not be overstretched in order to avoid "hyper-actively and ineffectively" reorganizing search systems. Moreover, the sensitivity analysis suggests that learning with a high aspiration level is particularly sensitive to the need for coordination among search agents.
These findings emphasize the need for further research efforts. An obvious next step is to test the key idea of inducing learning-based organizational change in more practical settings than the one presented here. Though some preliminary results obtained for learning-based selection of the coordination mode in terms of the job scheduling policy employed by a swarm of unmanned aerial vehicles [23] provide some support for the ideas presented in this paper, further applications are definitely of interest.
Moreover, further studies should perform in-depth analyses of the role of the aspiration level and of other parameters, like the interval between organizational alterations or the learning strength, which were fixed in the simulation experiments presented in this paper. Furthermore, the basic search problem captured in this study is rather unstructured in terms of randomized performance contributions (apart from the structure of interactions); hence, in further research, learning-based organizational adjustments of the search system may turn out to be even more beneficial in case of more structured search problems.