IEEE CIS Task Force on advanced representation in biological and medical search and optimization

Members of the Task Force:

Chair: Daniele M. Papetti (University of Milano-Bicocca, Milan, Italy)
Co-Chair: Andrea Tangherloni (Bocconi University, Milan, Italy)
Honorary member: Daniel A. Ashlock (University of Guelph, ON, Canada)
Wendy Ashlock (Ashlock and McGuinness Consulting, Canada)
Daniela Besozzi (University of Milano-Bicocca, Italy)
Laurens Bliek (Eindhoven University of Technology, The Netherlands)
Joseph A. Brown (Innopolis University, Russia)
Paolo Cazzaniga (University of Bergamo, Italy)
Michael Dubé (University of Guelph, ON, Canada)
Caro Fuchs (Eindhoven University of Technology, The Netherlands)
James Hughes (St. Francis Xavier University, NS, Canada)
Sheridan Houghten (Brock University, ON, Canada)
Uzay Kaymak (Eindhoven University of Technology, The Netherlands)
Luca Manzoni (University of Trieste, Italy)
Giancarlo Mauri (University of Milano-Bicocca, Italy)
Marco S. Nobile (Ca’ Foscari University of Venice, Italy)
Daniele M. Papetti (University of Milano-Bicocca, Italy)
Gonzalo Ruz (Universidad Adolfo Ibanez, Chile)
Simone Spolaor (University of Milano-Bicocca, Italy)
Andrea Tangherloni (University of Bergamo, Italy)

Our mission

Our aim is to promote research on representation of candidate solutions (and the dual perspective of fitness landscape manipulation) in the biomedical field.

Motivation

Many complex computational tasks in the bio-medical disciplines (e.g., Bioinformatics, Bioengineering, Systems Biology, Synthetic Biology, and Computational Biology) can be reformulated in terms of optimization problems, that is, the identification of the global minima (or maxima) of a given function inside a (possibly bounded) multi-dimensional search space.
Due to the noisy, multi-modal, non-convex, non-separable nature of these problems, classic “local” approaches like hill climbing or gradient descent cannot be successfully employed. On the contrary, the community of Computational Intelligence (CI) introduced multiple population-based “global” search algorithms for the effective exploration of the (often huge) search space.

CI methods for global optimization can be categorized in two main classes: Evolutionary Computation (EC) and Swarm Intelligence (SI). The techniques based on EC rely on the idea of a population of candidate solutions undergoing Darwinian processes. Iteration after iteration, the population adapts to the pressure of a user-defined fitness function and converges to an optimal solution to the problem. The techniques based on SI exploit the cooperation between simple agents, which move in the search space by mimicking the behavior of super-organisms in nature. EC and SI techniques are continuously extended, improved with additional special capabilities, and hybridized with other local or global search heuristics, in order to create novel algorithms characterized by better optimization performances, in terms of convergence speed, quality of the solutions found, exploration
capability, and stagnation reduction. Although this well established research line is very prolific—paving the way to the design of efficient algorithms even for large-scale problems—there is another promising direction that could be investigated, consisting in the modification of the search space, that is, space transformation able to dilate, shrink, stretch, collapse, or remap the fitness landscape, leading to simplified formulations of the optimization problem. Moreover, in discrete domains, the simplification of the problem can be obtained by embedding implicit or explicit assumptions into the structure of candidate solutions, so that the feasible search space can be explored by genetic operators in a “smarter” way, reducing the overall computational effort. Biological problems, where nature often provides natural constraints, are an especially rich domain for search space manipulation.

This novel perspective speaks to the need for the advanced representation task force. Arguably, researchers always focused on the primary research line (i.e., improving the algorithms) but a few algorithms were designed and proposed for the dual problem (i.e., designing and implementing non-trivial representations). It is often the case that problem-specific strategies have been used, but there is a general lack of universal theories and results for the optimal representation of candidate solutions. Both theoretical and applied research can be performed in this novel field of research, driving new discoveries, novel perspectives in the context of fitness landscapes analysis, improving the performances of existing algorithms. We envision that this research will ultimately introduce new variation operators and even new classes of global optimization meta-heuristics, which might spend a part of their budget for fitness evaluations to automatically derive an optimal representation.

We thus propose to organise through a task force the efforts towards the investigation, development, and dissemination of new methods for candidate solutions representation. No central repository nor workshop to discuss these methods currently exists. In the following, we outline how the new task force will accomplish this in the future.

Novel representations of candidate solutions

In global optimization techniques applied in bio-medical disciplines, it is often the case that the data structures used to represent a candidate solution can be directly decoded and interpreted; hence, the best fitting individual immediately provides an explicit and human-readable description of the optimal solution. However, this simplicity is often related to indirect problems and implicit limitations:
non-closed variation operators: many optimization approaches require a “correction step” for each new putative solution that is generated. This circumstance is due to the fact that the variation operators (e.g., crossover, mutation, velocity updates) applied to individuals encoded using a naïve representation can yield non-valid solutions (e.g., the inference of Bayesian networks or Petri nets);
sub-optimal exploration: in some domains, the classic variation operators are not able to fully explore the search space due to intrinsic characteristics of the fitness landscape. Non-conventional representations could strongly improve the performances;
failure to incorporate special biological knowledge to restrict or reduce the search space. Most biological problems have numerous intrinsic limits on the possible form of their solutions. Modifying the search space of conventional CI algorithms to respect and incorporate these limitations has potentially huge payoffs at the price of computational scientists needing to better understand the biological domain, either through study or collaboration.
implicit/relative representation: for some optimization problems, it is often the case that some “directions” in the search space could be discarded a priori because they will lead to worse, or infeasible, solutions. This problem can be solved by adopting a representation that considers the “local” characteristics of the candidate solution;
generative or developmental representations that evolve directions for the construction of a solution, rather than directly encoding the solution. This technique is ideal for incorporating domain knowledge into the representation and paring away not only large parts of a search space, but sometimes the majority of points in the space;
self-adaptive representations, parameterized manifolds of representations, state-conditioned representations, procedural representations, generative automata are all examples of non-trivial solutions to represent individuals that are not directly understandable (i.e., they might require a processing step to yield an actually interpretable solution) but can strongly improve the performances of algorithms.

The new task force seeks to improve the state-of-the-art in candidate solutions representation as much as possible, with the eventual goal of providing new insights and directions in the field of global optimization applied to biological and medical disciplines.

Goals

In the short-term, the task force will create awareness of the existing lack of scrutiny regarding novel, alternative, non-intuitive representations of complex optimization and search problems in biological and medical disciplines. We envision to provide the community with a new dual perspective to global optimization solving, more focused on special representation of the problem instead of pushing on the algorithmic side. Our goal is to encourage more researchers in academia and industry to participate in this discussion. This will be achieved via: special sessions at conferences; special issues of appropriate journals; workshops focused on this topic.

In the long term, the task force will:

create an archive of candidate solution non-trivial representation methods, along with data regarding their feasibility, strengths and weaknesses. Creating a taxonomy and survey of these methods is a first step in this direction;
investigate the theoretical foundations of candidate solutions representation and any indirect fitness landscapes manipulation;
identify approaches for representation that have not yet been fully explored and encourage research in this direction;
develop new general representation methods and approaches that can realistically be used in real-world bio-medical applications;
develop a network across different bio-medical communities that use and are otherwise interested in novel representations, as well as introduce connections with the bio-medical industries.

For additional information

Prof. Marco S. Nobile
marco.nobile AT unive.it Department of Environmental Sciences, Informatics and Statistics Ca’ Foscari University of Venice, Italy