Policy learning in resource-constrained optimization

Richard Allmendinger, Joshua Knowles

    Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

    Abstract

    We consider an optimization scenario in which resources are required in the evaluation process of candidate solutions. The challenge we are focussing on is that certain resources have to be committed to for some period of time whenever they are used by an optimizer. This has the effect that certain solutions may be temporarily non-evaluable during the optimization. Previous analysis revealed that evolutionary algorithms (EAs) can be effective against this resourcing issue when augmented with static strategies for dealing with non-evaluable solutions, such as repairing, waiting, or penalty methods. Moreover, it is possible to select a suitable strategy for resource-constrained problems offline if the resourcing issue is known in advance. In this paper we demonstrate that an EA that uses a reinforcement learning (RL) agent, here Sarsa(λ), to learn offline when to switch between static strategies, can be more effective than any of the static strategies themselves. We also show that learning the same task as the RL agent but online using an adaptive strategy selection method, here D-MAB, is not as effective; nevertheless, online learning is an alternative to static strategies. Copyright 2011 ACM.
    Original language: English
    Title of host publication: Genetic and Evolutionary Computation Conference, GECCO'11 | Genet. Evol. Comput. Conf., GECCO
    Publisher: Association for Computing Machinery
    Pages: 1971-1978
    Number of pages: 7
    ISBN (Print): 9781450305570
    Publication status: Published - 2011
    Event: 13th Annual Genetic and Evolutionary Computation Conference, GECCO'11 - Dublin
    Duration: 1 Jul 2011 → …

    Keywords

    • Bandit algorithms
    • Closed-loop optimization
    • Dynamic optimization
    • Evolutionary computation
    • Reinforcement learning
