1. Yu H, Zhao X, Dong D, Chen C. Hamiltonian Identification via Quantum Ensemble Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11261-11275. [PMID: 37030784] [DOI: 10.1109/tnnls.2023.3258622]
Abstract
Identifying the Hamiltonian of an unknown quantum system is a critical task in quantum information. In this article, we propose a systematic Hamiltonian identification approach via quantum ensemble multiclass classification (HI-QEMC). The approach is implemented as a three-step iterative refinement process: parameter interval guess, verification, and judgment. In the guess step, the parameter interval is divided into several sub-intervals and the true Hamiltonian parameter is guessed to lie in one of them. In the verification step, cross verification is applied to check the accuracy of the guess. In the judgment step, an adaptive interval judgment (AIJ) algorithm determines the sub-interval containing the true Hamiltonian parameter. Numerical results on two typical quantum systems, i.e., two-level and three-level quantum systems, demonstrate the effectiveness and superior performance of the proposed approach for quantum Hamiltonian identification.
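The guess/verify/judge loop amounts to an adaptive interval search. A minimal sketch, with a hypothetical classifier oracle standing in for the quantum ensemble classifier (the oracle, sub-interval count, and tolerance are our illustrative assumptions, not the paper's implementation):

```python
# Sketch of the interval-refinement loop: "classify" plays the role of the
# quantum ensemble classifier, voting for the sub-interval most consistent
# with the measurement data. Here it is simulated with the true parameter.

def refine_interval(lo, hi, classify, n_sub=4, tol=1e-6, max_iter=100):
    """Shrink [lo, hi] around the parameter the classifier keeps selecting."""
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        width = (hi - lo) / n_sub
        # Guess: split into sub-intervals; Judge: the classifier picks one.
        subs = [(lo + i * width, lo + (i + 1) * width) for i in range(n_sub)]
        k = classify(subs)
        lo, hi = subs[k]
    return lo, hi

# Toy stand-in oracle: picks the sub-interval whose midpoint is nearest
# the (normally unknown) true parameter.
true_param = 0.7312
oracle = lambda subs: min(range(len(subs)),
                          key=lambda i: abs((subs[i][0] + subs[i][1]) / 2 - true_param))
lo, hi = refine_interval(0.0, 1.0, oracle)
```

With four sub-intervals the interval shrinks by a factor of four per iteration, so ten iterations already localize the parameter below the 1e-6 tolerance.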
2. Reuer K, Landgraf J, Fösel T, O'Sullivan J, Beltrán L, Akin A, Norris GJ, Remm A, Kerschbaum M, Besse JC, Marquardt F, Wallraff A, Eichler C. Realizing a deep reinforcement learning agent for real-time quantum feedback. Nat Commun 2023; 14:7138. [PMID: 37932251] [PMCID: PMC10628214] [DOI: 10.1038/s41467-023-42901-3]
Abstract
Realizing the full potential of quantum technologies requires precise real-time control on time scales much shorter than the coherence time. Model-free reinforcement learning promises to discover efficient feedback strategies from scratch without relying on a description of the quantum system. However, developing and training a reinforcement learning agent able to operate in real-time using feedback has been an open challenge. Here, we have implemented such an agent for a single qubit as a sub-microsecond-latency neural network on a field-programmable gate array (FPGA). We demonstrate its use to efficiently initialize a superconducting qubit and train the agent based solely on measurements. Our work is a first step towards the adoption of reinforcement learning for the control of quantum devices and, more generally, of any physical device requiring low-latency feedback.
Affiliation(s)
- Kevin Reuer
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Jonas Landgraf
  - Max Planck Institute for the Science of Light, Staudtstraße 2, 91058, Erlangen, Germany
  - Physics Department, University of Erlangen-Nuremberg, Staudtstraße 5, 91058, Erlangen, Germany
- Thomas Fösel
  - Max Planck Institute for the Science of Light, Staudtstraße 2, 91058, Erlangen, Germany
  - Physics Department, University of Erlangen-Nuremberg, Staudtstraße 5, 91058, Erlangen, Germany
- James O'Sullivan
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Liberto Beltrán
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Abdulkadir Akin
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Graham J Norris
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Ants Remm
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Michael Kerschbaum
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Jean-Claude Besse
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Florian Marquardt
  - Max Planck Institute for the Science of Light, Staudtstraße 2, 91058, Erlangen, Germany
  - Physics Department, University of Erlangen-Nuremberg, Staudtstraße 5, 91058, Erlangen, Germany
- Andreas Wallraff
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Quantum Center, ETH Zurich, CH-8093, Zurich, Switzerland
- Christopher Eichler
  - Department of Physics, ETH Zurich, CH-8093, Zurich, Switzerland
  - Physics Department, University of Erlangen-Nuremberg, Staudtstraße 5, 91058, Erlangen, Germany
3. Ma H, Dong D, Ding SX, Chen C. Curriculum-Based Deep Reinforcement Learning for Quantum Control. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8852-8865. [PMID: 35263262] [DOI: 10.1109/tnnls.2022.3153502]
Abstract
Deep reinforcement learning (DRL) has been recognized as an efficient technique for designing optimal strategies for complex systems without prior knowledge of the control landscape. To achieve fast and precise control of quantum systems, we propose a novel DRL approach that constructs a curriculum consisting of a set of intermediate tasks defined by fidelity thresholds, where the tasks in a curriculum can be statically determined before the learning process or dynamically generated during it. By transferring knowledge between successive tasks and sequencing tasks according to their difficulty, the proposed curriculum-based DRL (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical comparison with traditional methods [the gradient method (GD), the genetic algorithm (GA), and several other DRL methods] demonstrates that CDRL achieves improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with few control pulses.
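The static-curriculum idea can be sketched as a schedule of increasing fidelity thresholds plus knowledge transfer between tasks. The threshold schedule and the `train_task` interface below are our illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of a fidelity-threshold curriculum: the agent trains
# against easier intermediate fidelity targets before attempting the final one.

def make_static_curriculum(final_fidelity=0.999, n_tasks=4):
    """Intermediate fidelity thresholds, easiest first.

    Interpolates geometrically in the infidelity 1 - F, so early tasks
    are much easier than the final one.
    """
    infid = 1.0 - final_fidelity
    return [1.0 - infid ** ((i + 1) / n_tasks) for i in range(n_tasks)]

def train_with_curriculum(train_task, thresholds):
    """Solve each intermediate task, transferring the policy forward."""
    policy = None
    for f_min in thresholds:
        # Each stage warm-starts from the previous stage's policy.
        policy = train_task(f_min, warm_start=policy)
    return policy
```

For a 0.999 final target this yields thresholds near 0.82, 0.97, 0.994, 0.999, so the agent always faces a reachable next goal.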
4. Konar D, Bhattacharyya S, Panigrahi BK, Behrman EC. Qutrit-Inspired Fully Self-Supervised Shallow Quantum Learning Network for Brain Tumor Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6331-6345. [PMID: 33983887] [DOI: 10.1109/tnnls.2021.3077188]
Abstract
Classical self-supervised networks suffer from convergence problems and reduced segmentation accuracy due to forceful termination. Quantum neural network models are often described by qubits, or bi-level quantum bits. In this article, a novel self-supervised shallow learning network exploiting a sophisticated three-level, qutrit-inspired quantum information system, referred to as the quantum fully self-supervised neural network (QFS-Net), is presented for automated segmentation of brain magnetic resonance (MR) images. The QFS-Net model comprises three layers of qutrits interconnected through parametric Hadamard gates using an eight-connected second-order neighborhood-based topology. The nonlinear transformation of the qutrit states allows the underlying quantum neural network to encode the quantum states, enabling faster self-organized counter-propagation of these states between the layers without supervision. The proposed QFS-Net model is extensively validated on The Cancer Imaging Archive (TCIA) dataset collected from the Nature repository. The experimental results are compared with state-of-the-art supervised models (U-Net and URes-Net architectures) and with the self-supervised QIS-Net model and its classical counterpart. The results show promising segmentation outcomes for tumor detection in terms of dice similarity and accuracy, with minimal human intervention and computational resources. QFS-Net is also evaluated on natural gray-scale images from the Berkeley segmentation dataset, where it likewise yields promising segmentation outcomes, demonstrating its robustness.
5. Gao Y, Wang X, Yu N, Wong BM. Harnessing deep reinforcement learning to construct time-dependent optimal fields for quantum control dynamics. Phys Chem Chem Phys 2022; 24:24012-24020. [PMID: 36128792] [DOI: 10.1039/d2cp02495k]
Abstract
We present an efficient deep reinforcement learning (DRL) approach to automatically construct time-dependent optimal control fields that enable desired transitions in dynamical chemical systems. Our DRL approach gives impressive performance in constructing optimal control fields, even for cases that are difficult to converge with existing gradient-based approaches. We provide a detailed description of the algorithms and hyperparameters as well as performance metrics for our DRL-based approach. Our results demonstrate that DRL can be employed as an effective artificial intelligence approach to efficiently and autonomously design control fields in quantum dynamical chemical systems.
Affiliation(s)
- Yuanqi Gao
  - Department of Electrical and Computer Engineering, University of California-Riverside, Riverside, CA, USA
- Xian Wang
  - Department of Physics and Astronomy, University of California-Riverside, Riverside, CA, USA
- Nanpeng Yu
  - Department of Electrical and Computer Engineering, University of California-Riverside, Riverside, CA, USA
- Bryan M Wong
  - Department of Chemical and Environmental Engineering, Materials Science and Engineering Program, Department of Chemistry, and Department of Physics and Astronomy, University of California-Riverside, Riverside, CA, USA
6. Wei Q, Ma H, Chen C, Dong D. Deep Reinforcement Learning With Quantum-Inspired Experience Replay. IEEE Transactions on Cybernetics 2022; 52:9326-9338. [PMID: 33600343] [DOI: 10.1109/tcyb.2021.3053414]
Abstract
In this article, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed DRL with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the number of replays of each experience (also called a transition), to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations, and then a preparation operation and a depreciation operation are performed on them. The preparation operation reflects the relationship between the temporal-difference errors (TD-errors) and the importance of the experiences, while the depreciation operation ensures the diversity of the transitions. Experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency, and that it is also applicable to memory-based DRL approaches such as double-network and dueling-network architectures.
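A simplified rendering of the sampling mechanism, with our own stand-ins for the preparation and depreciation operations (the amplitude formula and decay factor are illustrative, not the paper's operators): each transition carries an amplitude that grows with its TD-error and shrinks each time it is replayed, and sampling follows the squared amplitudes, Born-rule style.

```python
import math
import random

class QERBuffer:
    """Replay buffer where sampling probability follows a squared 'amplitude'."""

    def __init__(self, decay=0.9):
        self.data = []      # entries: [transition, |TD-error|, replay count]
        self.decay = decay

    def add(self, transition, td_error):
        self.data.append([transition, abs(td_error), 0])

    def _amplitude(self, td, replays):
        # Preparation: amplitude grows with |TD-error| (importance).
        # Depreciation: each replay shrinks it, keeping the buffer diverse.
        return math.sqrt(td + 1e-3) * self.decay ** replays

    def sample(self):
        weights = [self._amplitude(td, n) ** 2 for _, td, n in self.data]
        i = random.choices(range(len(self.data)), weights=weights)[0]
        self.data[i][2] += 1    # record the replay for depreciation
        return self.data[i][0]
```

High-TD transitions dominate early sampling, but depreciation steadily hands probability mass back to rarely replayed ones.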
7. A quantum system control method based on enhanced reinforcement learning. Soft Comput 2022. [DOI: 10.1007/s00500-022-07179-5]
8. Konar D, Bhattacharyya S, Dey S, Panigrahi BK. Optimized activation for quantum-inspired self-supervised neural network based fully automated brain lesion segmentation. Appl Intell 2022. [DOI: 10.1007/s10489-021-03108-5]
9. Optimizing quantum annealing schedules with Monte Carlo tree search enhanced with neural networks. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00446-y]
10. Bolens A, Heyl M. Reinforcement Learning for Digital Quantum Simulation. Physical Review Letters 2021; 127:110502. [PMID: 34558930] [DOI: 10.1103/physrevlett.127.110502]
Abstract
Digital quantum simulation on quantum computers provides the potential to simulate the unitary evolution of any many-body Hamiltonian with bounded spectrum by discretizing the time evolution operator through a sequence of elementary quantum gates. A fundamental challenge in this context originates from experimental imperfections, which critically limits the number of attainable gates within a reasonable accuracy and therefore the achievable system sizes and simulation times. In this work, we introduce a reinforcement learning algorithm to systematically build optimized quantum circuits for digital quantum simulation upon imposing a strong constraint on the number of quantum gates. With this we consistently obtain quantum circuits that reproduce physical observables with as little as three entangling gates for long times and large system sizes up to 16 qubits. As concrete examples we apply our formalism to a long-range Ising chain and the lattice Schwinger model. Our method demonstrates that digital quantum simulation on noisy intermediate scale quantum devices can be pushed to much larger scale within the current experimental technology by a suitable engineering of quantum circuits using reinforcement learning.
Affiliation(s)
- Adrien Bolens
  - Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Straße 38, 01187 Dresden, Germany
- Markus Heyl
  - Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Straße 38, 01187 Dresden, Germany
11. Haug T, Mok WK, You JB, Zhang W, Eng Png C, Kwek LC. Classifying global state preparation via deep reinforcement learning. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abc81f]
Abstract
Quantum information processing often requires the preparation of arbitrary quantum states, such as all the states on the Bloch sphere for two-level systems. While numerical optimization can prepare individual target states, it lacks the ability to find general control protocols that can generate many different target states. Here, we demonstrate global quantum control by preparing a continuous set of states with deep reinforcement learning. The protocols are represented by neural networks, which automatically group them into similar types; this could be useful for finding classes of protocols and extracting physical insights. As an application, we generate arbitrary superposition states for the electron spin in complex multi-level nitrogen-vacancy centers, revealing classes of protocols characterized by specific preparation timescales. Our method could help improve the control of near-term quantum computers, quantum sensing devices, and quantum simulations.
12. Konar D, Bhattacharyya S, Gandhi TK, Panigrahi BK. A Quantum-Inspired Self-Supervised Network model for automatic segmentation of brain MR images. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106348]
13. Wang Z, Li HX, Chen C. Reinforcement Learning-Based Optimal Sensor Placement for Spatiotemporal Modeling. IEEE Transactions on Cybernetics 2020; 50:2861-2871. [PMID: 30892267] [DOI: 10.1109/tcyb.2019.2901897]
Abstract
A reinforcement learning-based method is proposed for optimal sensor placement in the spatial domain for modeling distributed parameter systems (DPSs). First, a low-dimensional subspace, derived by Karhunen-Loève decomposition, is identified to capture the dominant dynamic features of the DPS. Second, a spatial objective function is proposed for the sensor placement. This function is defined in the obtained low-dimensional subspace by exploiting the time-space separation property of distributed processes, and in turn aims at minimizing the modeling error over the entire time and space domain. Third, the sensor placement configuration is mathematically formulated as a Markov decision process (MDP) with specified elements. Finally, the sensor locations are optimized by learning the optimal policies of the MDP according to the spatial objective function. Experimental results on a simulated catalytic rod and a real snap curing oven system demonstrate the feasibility and efficiency of the proposed method in solving combinatorial optimization problems such as optimal sensor placement.
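The placement-as-decision-process framing can be illustrated with a hypothetical rendering (our simplification, not the paper's MDP or learned policy): the state is the set of chosen locations, an action adds one candidate location, and the reward is the drop in a user-supplied modeling-error functional. A greedy one-step-lookahead policy then reads:

```python
# Illustrative sensor placement as sequential decisions; "model_error" is an
# assumed black-box functional standing in for the paper's spatial objective.

def place_sensors(candidates, n_sensors, model_error):
    """Greedy one-step-lookahead policy over the placement decision process."""
    chosen = []
    for _ in range(n_sensors):
        remaining = [c for c in candidates if c not in chosen]
        # Reward of action c = error(chosen) - error(chosen + [c]).
        best = max(remaining,
                   key=lambda c: model_error(chosen) - model_error(chosen + [c]))
        chosen.append(best)
    return chosen

# Toy error functional: how far target points on a line sit from their
# nearest chosen sensor (1.0 penalty when no sensor is placed yet).
targets = [0.1, 0.5, 0.9]
err = lambda S: sum(min((abs(t - s) for s in S), default=1.0) for t in targets)
placed = place_sensors([i / 10 for i in range(11)], 2, err)
```

A learned policy would replace the greedy choice, but the state, action, and reward structure is the same.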
14. Wang Z, Li HX, Chen C. Incremental Reinforcement Learning in Continuous Spaces via Policy Relaxation and Importance Weighting. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1870-1883. [PMID: 31395556] [DOI: 10.1109/tnnls.2019.2927320]
Abstract
In this paper, a systematic incremental learning method is presented for reinforcement learning in continuous spaces where the learning environment is dynamic. The goal is to incrementally adjust the previously learned policy whenever the environment changes. To improve adaptability to an ever-changing environment, we propose a two-step solution incorporated into the incremental learning procedure: policy relaxation and importance weighting. First, the behavior policy is relaxed to a random one in the initial learning episodes to encourage proper exploration in the new environment; this alleviates the conflict between new information and existing knowledge for better long-term adaptation. Second, we observe that episodes receiving higher returns are more in line with the new environment and hence contain more new information. During parameter updating, we therefore assign higher importance weights to episodes that contain more new information, encouraging the previous optimal policy to adapt faster to one that fits the new environment. Empirical studies on continuous control tasks with varying configurations verify that the proposed method adapts significantly faster to various dynamic environments than the baselines.
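The importance-weighting step can be sketched as follows; the softmax-over-returns weighting and scalar-parameter update are our illustration of the idea, not the paper's exact scheme:

```python
import math

def episode_weights(returns, temperature=1.0):
    """Softmax weights: higher-return episodes (more in line with the new
    environment) contribute more to the update."""
    m = max(returns)                      # shift for numerical stability
    exps = [math.exp((r - m) / temperature) for r in returns]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_update(theta, gradients, returns, lr=0.1):
    """Gradient step where each episode's gradient is importance-weighted."""
    ws = episode_weights(returns)
    return theta + lr * sum(w * g for w, g in zip(ws, gradients))
```

Episodes with equal returns receive equal weight, and the temperature controls how sharply high-return episodes dominate the update.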
15. Al-Dabooni S, Wunsch DC. Online Model-Free n-Step HDP With Stability Analysis. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1255-1269. [PMID: 31251198] [DOI: 10.1109/tnnls.2019.2919614]
Abstract
Building on the powerful temporal-difference learning method with trace-decay parameter λ, TD(λ), this paper presents a novel n-step adaptive dynamic programming (ADP) architecture that combines TD(λ) with regular TD learning to solve optimal control problems in fewer iterations. In contrast with the backward-view learning of TD(λ), which requires an extra eligibility-trace parameter updated at the end of each episode (offline training), the new design uses forward-view learning, updated at each time step (online training), without the eligibility-trace parameter, and applies to various problems without mathematical models. The new design is therefore called the online model-free n-step action-dependent (AD) heuristic dynamic programming architecture, NSHDP(λ). NSHDP(λ) has three neural networks: a critic network (CN) with regular one-step TD [TD(0)], a CN with n-step TD [or TD(λ)] learning, and an actor network (AN). Because forward-view learning does not require eligibility traces associated with each state, the NSHDP(λ) architecture has low computational cost and is memory efficient. Furthermore, stability is proven for NSHDP(λ) under certain conditions using Lyapunov analysis to obtain the uniformly ultimately bounded (UUB) property. We compare the results with the performance of HDP and traditional action-dependent HDP(λ) [ADHDP(λ)] for different λ values. A complex nonlinear system and a 2-D maze problem are the two simulation benchmarks in this paper; a third, an inverted-pendulum benchmark, is presented in the supplementary material. NSHDP(λ) performance is examined and compared with other ADP methods.
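The forward-view target that removes the eligibility-trace bookkeeping can be written as a λ-weighted mixture of n-step returns. The sketch below is a generic textbook form of that target, not the paper's full dual-critic NSHDP(λ) architecture:

```python
# Forward-view n-step and λ-returns. "values" holds V(s_0..s_T), one entry
# more than "rewards"; gamma is the discount and lam the trace-decay weight.

def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n} = r_t + γ r_{t+1} + ... + γ^n V(s_{t+n})."""
    horizon = min(n, len(rewards) - t)
    g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
    if t + horizon < len(values):
        g += gamma ** horizon * values[t + horizon]   # bootstrapped tail
    return g

def lambda_return(rewards, values, t, gamma, lam, n_max):
    """Forward-view λ-return: (1-λ) Σ_{n<n_max} λ^{n-1} G_{t:t+n},
    with the remaining weight λ^{n_max-1} on the longest return."""
    total = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
                for n in range(1, n_max))
    total += lam ** (n_max - 1) * n_step_return(rewards, values, t, n_max, gamma)
    return total
```

Setting λ = 0 recovers the one-step TD(0) target and λ = 1 the full n_max-step return, which is the mixture both critic networks draw on.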
16. Al-Dabooni S, Wunsch DC. An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1155-1169. [PMID: 31247567] [DOI: 10.1109/tnnls.2019.2919338]
Abstract
In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally to perform well. It was recently extended by an approach called value gradient learning (VGL), inspired by a version of temporal-difference (TD) learning that uses eligibility traces. Eligibility traces apply an exponential decay to older observations with a decay parameter λ. This approach is known as TD(λ), and its DHP extension is known as VGL(λ), where VGL(0) is identical to DHP. VGL has demonstrated convergence and other desirable properties, but it is primarily useful for batch learning. Online learning requires an eligibility-trace workspace matrix, which the batch version of VGL does not. Since online learning is desirable for many applications, it is important to remove this computational and memory impediment. This paper introduces a dual-critic version of VGL, called n-step VGL (NSVGL), that does not need the eligibility-trace workspace matrix, thereby allowing online learning; moreover, the combination of critic networks allows the NSVGL algorithm to learn faster. The first critic is similar to DHP and is adapted based on TD(0) learning, while the second critic is adapted based on a gradient of n-step TD(λ) learning. Both networks are combined to train an actor network. The combination of feedback signals from both critic networks reaches an optimal decision faster than traditional adaptive dynamic programming (ADP) by mixing current information with event history. Convergence proofs are provided: gradients of one- and n-step value functions are monotonically nondecreasing and converge to the optimum. Two simulation case studies demonstrate the superior performance of NSVGL.
17. Sommer C, Asjad M, Genes C. Prospects of reinforcement learning for the simultaneous damping of many mechanical modes. Sci Rep 2020; 10:2623. [PMID: 32060483] [PMCID: PMC7021687] [DOI: 10.1038/s41598-020-59435-z]
Abstract
We apply adaptive feedback for the partial refrigeration of a mechanical resonator, i.e., with the aim of simultaneously cooling the classical thermal motion of more than one vibrational degree of freedom. The feedback is obtained from a neural-network-parametrized policy trained via a reinforcement learning strategy to choose the correct sequence of actions from a finite set in order to simultaneously reduce the energy of many modes of vibration. The actions are realized either as optical modulations of the spring constants in the so-called quadratic optomechanical coupling regime or as radiation-pressure-induced momentum kicks in the linear coupling regime. As a proof of principle, we numerically illustrate efficient simultaneous cooling of four independent modes with an overall strong reduction of the total system temperature.
Affiliation(s)
- Christian Sommer
  - Max Planck Institute for the Science of Light, Staudtstraße 2, D-91058, Erlangen, Germany
- Muhammad Asjad
  - Max Planck Institute for the Science of Light, Staudtstraße 2, D-91058, Erlangen, Germany
- Claudiu Genes
  - Max Planck Institute for the Science of Light, Staudtstraße 2, D-91058, Erlangen, Germany
  - Department of Physics, University of Erlangen-Nuremberg, Staudtstraße 2, D-91058, Erlangen, Germany
18. Fu Q, Yang Z, Lu Y, Wu H, Hu F, Chen J. Variational Bayesian Exploration-Based Active Sarsa Algorithm. Int J Pattern Recogn 2019. [DOI: 10.1142/s0218001419510054]
Abstract
We propose an improved variational Bayesian exploration-based active Sarsa (VBE-ASAR) algorithm that balances the exploration-exploitation dilemma and accelerates convergence. First, during learning, a variational Bayesian method measures the information gain, which is used as an exploration factor to construct an internal reward function for heuristic exploration. Second, before learning, transfer learning is used to initialize the value function in order to improve exploration performance, where a bisimulation metric measures the distance between states from the source and target MDPs. Finally, we apply the proposed algorithm to the cliff-walking problem and compare it with the Sarsa, Q-learning, VFT-Sarsa, and Bayesian Sarsa (BS) algorithms. Experimental results show that VBE-ASAR learns faster.
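The internal-reward construction can be sketched as Sarsa with an intrinsic exploration bonus. The count-based bonus below is our simple surrogate for the variational-Bayesian information gain, which likewise shrinks as a state-action pair becomes familiar:

```python
import math
from collections import defaultdict

class BonusSarsa:
    """Tabular Sarsa whose target adds an intrinsic exploration bonus."""

    def __init__(self, alpha=0.5, gamma=0.9, beta=1.0):
        self.Q = defaultdict(float)       # action values Q(s, a)
        self.counts = defaultdict(int)    # visit counts per (s, a)
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def intrinsic(self, s, a):
        # Surrogate "information gain": decays as (s, a) is visited more.
        self.counts[(s, a)] += 1
        return self.beta / math.sqrt(self.counts[(s, a)])

    def update(self, s, a, r, s2, a2):
        # Internal reward = external reward + exploration bonus.
        target = r + self.intrinsic(s, a) + self.gamma * self.Q[(s2, a2)]
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
```

Early on the bonus dominates and pulls the agent toward rarely tried actions; as counts grow the update reduces to plain Sarsa on the external reward.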
Affiliation(s)
- Qiming Fu, Zhengxia Yang, You Lu, Hongjie Wu, Fuyuan Hu, Jianping Chen (all authors)
  - Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, P. R. China
  - Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, P. R. China
  - Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, P. R. China
19. Al-Dabooni S, Wunsch D. The Boundedness Conditions for Model-Free HDP(λ). IEEE Transactions on Neural Networks and Learning Systems 2019; 30:1928-1942. [PMID: 30418923] [DOI: 10.1109/tnnls.2018.2875870]
Abstract
This paper provides a stability analysis for a model-free action-dependent heuristic dynamic programming (HDP) approach with an eligibility-trace long-term prediction parameter λ. HDP(λ) learns from more than one future reward. Eligibility traces have long been popular in Q-learning; this paper proves and demonstrates that they are also worthwhile with HDP. We prove the uniformly ultimately bounded (UUB) property under certain conditions. Previous work presents a UUB proof for traditional HDP [HDP(λ = 0)]; we extend the proof to nonzero λ. Using Lyapunov stability, we demonstrate the boundedness of the estimation errors of the critic and actor neural networks as well as of the learning-rate parameters. Three case studies demonstrate the effectiveness of HDP(λ). The first considers the trajectories of a nonlinear system with an internal reinforcement signal; we compare the results with the performance of HDP and traditional temporal difference [TD(λ)] for different λ values. The second is a single-link inverted pendulum, where we compare HDP(λ) with regular HDP at different noise levels. The third is a 3-D maze-navigation benchmark, compared with state-action-reward-state-action (SARSA), Q(λ), HDP, and HDP(λ). All simulation results illustrate that HDP(λ) performs competitively; the contribution is thus not only UUB but also useful in comparison with traditional HDP.
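The eligibility-trace mechanism that HDP(λ) borrows is easiest to see in its tabular form. The sketch below is a generic textbook TD(λ) update with accumulating traces (the paper applies the same decay idea to critic networks, not a table):

```python
from collections import defaultdict

def td_lambda_episode(episode, V, alpha=0.1, gamma=0.95, lam=0.8):
    """Backward-view TD(λ) over one episode.

    episode: list of (state, reward, next_state) transitions.
    V: mapping from state to value estimate (mutated in place).
    """
    e = defaultdict(float)                      # eligibility traces
    for s, r, s2 in episode:
        delta = r + gamma * V[s2] - V[s]        # one-step TD error
        e[s] += 1.0                             # accumulate trace for s
        for x in list(e):
            V[x] += alpha * delta * e[x]        # credit recently visited states
            e[x] *= gamma * lam                 # decay all traces
    return V
```

A reward at the end of the episode thus propagates credit to earlier states in a single pass, which is exactly why HDP(λ) can learn from more than one future reward.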
|
20
|
Mehta P, Wang CH, Day AGR, Richardson C, Bukov M, Fisher CK, Schwab DJ. A high-bias, low-variance introduction to Machine Learning for physicists. PHYSICS REPORTS 2019; 810:1-124. [PMID: 31404441 PMCID: PMC6688775 DOI: 10.1016/j.physrep.2019.03.001] [Citation(s) in RCA: 203] [Impact Index Per Article: 40.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python Jupyter notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists may be able to contribute.
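As a companion to the review's coverage of gradient descent (an illustrative sketch, not material from the review; the quadratic loss is an assumed example):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the loss gradient.
    `grad` is the gradient function of the loss being minimized."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```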
Affiliation(s)
- Pankaj Mehta
- Department of Physics, Boston University, Boston, MA 02215, USA
- Ching-Hao Wang
- Department of Physics, Boston University, Boston, MA 02215, USA
- Marin Bukov
- Department of Physics, University of California, Berkeley, CA 94720, USA
- David J Schwab
- Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Ave., New York, NY 10016
|
21
|
Day AGR, Bukov M, Weinberg P, Mehta P, Sels D. Glassy Phase of Optimal Quantum Control. PHYSICAL REVIEW LETTERS 2019; 122:020601. [PMID: 30720331 DOI: 10.1103/physrevlett.122.020601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/27/2018] [Indexed: 06/09/2023]
Abstract
We study the problem of preparing a quantum many-body system from an initial to a target state by optimizing the fidelity over the family of bang-bang protocols. We present compelling numerical evidence for a universal spin-glasslike transition controlled by the protocol time duration. The glassy critical point is marked by a proliferation of protocols with close-to-optimal fidelity and with a true optimum that appears exponentially difficult to locate. Using a machine learning (ML) inspired framework based on the manifold learning algorithm t-distributed stochastic neighbor embedding, we are able to visualize the geometry of the high-dimensional control landscape in an effective low-dimensional representation. Across the transition, the control landscape features an exponential number of clusters separated by extensive barriers, which bears a strong resemblance to replica symmetry breaking in spin glasses and random satisfiability problems. We further show that the quantum control landscape maps onto a disorder-free classical Ising model with frustrated nonlocal, multibody interactions. Our work highlights an intricate but unexpected connection between optimal quantum control and spin glass physics, and shows how tools from ML can be used to visualize and understand glassy optimization landscapes.
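As an illustration only (a toy single-qubit stand-in, not the many-body system studied in the Letter), a sketch of scoring bang-bang protocols by fidelity; the field values, step count, and initial/target states are assumptions:

```python
import itertools
import math

def step_unitary(hx, hz, dt):
    """Closed-form 2x2 unitary exp(-i (hx*sigma_x + hz*sigma_z) dt)."""
    a = math.sqrt(hx * hx + hz * hz)
    c, s = math.cos(a * dt), math.sin(a * dt)
    nx, nz = hx / a, hz / a
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]

def apply(U, psi):
    return [U[0][0] * psi[0] + U[0][1] * psi[1],
            U[1][0] * psi[0] + U[1][1] * psi[1]]

def fidelity(protocol, dt=0.5):
    """Fidelity |<1|psi(T)>|^2 for a bang-bang protocol hx(t) in {-4, +4}
    with a fixed hz = 1, starting from |0>."""
    psi = [1.0 + 0j, 0.0 + 0j]
    for bang in protocol:
        psi = apply(step_unitary(4.0 * bang, 1.0, dt), psi)
    return abs(psi[1]) ** 2

# Exhaustively score all 2^8 bang-bang protocols of 8 steps; for many-body
# systems this enumeration is infeasible, which is where the glassy
# landscape analysis of the Letter comes in.
scores = {p: fidelity(p) for p in itertools.product((-1, 1), repeat=8)}
best = max(scores.values())
```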
Affiliation(s)
- Alexandre G R Day
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Marin Bukov
- Department of Physics, University of California, Berkeley, California 94720, USA
- Phillip Weinberg
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Pankaj Mehta
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Dries Sels
- Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Department of Physics, Harvard University, 17 Oxford Street, Cambridge, Massachusetts 02138, USA
- Theory of Quantum and Complex Systems, Universiteit Antwerpen, B-2610 Antwerpen, Belgium
|
22
|
Dunjko V, Briegel HJ. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2018; 81:074001. [PMID: 29504942 DOI: 10.1088/1361-6633/aab406] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Quantum information technologies, on the one hand, and intelligent learning systems, on the other, are both emergent technologies that are likely to have a transformative impact on our society in the future. The respective underlying fields of basic research-quantum information versus machine learning (ML) and artificial intelligence (AI)-have their own specific questions and challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question of the extent to which these fields can indeed learn and benefit from each other. Quantum ML explores the interaction between quantum computing and ML, investigating how results and techniques from one field can be used to solve the problems of the other. Recently we have witnessed significant breakthroughs in both directions of influence. For instance, quantum computing is finding a vital application in providing speed-ups for ML problems, critical in our 'big data' world. Conversely, ML already permeates many cutting-edge technologies and may become instrumental in advanced quantum technologies. Aside from quantum speed-up in data analysis, or classical ML optimization used in quantum experiments, quantum enhancements have also been (theoretically) demonstrated for interactive learning tasks, highlighting the potential of quantum-enhanced learning agents. Finally, works exploring the use of AI for the very design of quantum experiments and for performing parts of genuine research autonomously have reported their first successes. Beyond the topics of mutual enhancement-exploring what ML/AI can do for quantum physics and vice versa-researchers have also broached the fundamental issue of quantum generalizations of learning and AI concepts. This deals with questions of the very meaning of learning and intelligence in a world that is fully described by quantum mechanics.
In this review, we describe the main ideas, recent developments and progress in a broad spectrum of research investigating ML and AI in the quantum domain.
Collapse
Affiliation(s)
- Vedran Dunjko
- Institute for Theoretical Physics, University of Innsbruck, Innsbruck 6020, Austria
- Max Planck Institute of Quantum Optics, Garching 85748, Germany
|
23
|
Masuyama N, Loo CK, Seera M, Kubota N. Quantum-Inspired Multidirectional Associative Memory With a Self-Convergent Iterative Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1058-1068. [PMID: 28182559 DOI: 10.1109/tnnls.2017.2653114] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Quantum-inspired computing is an emerging research area that has significantly improved the capabilities of conventional algorithms. In particular, quantum-inspired Hopfield associative memory (QHAM) has demonstrated quantum information processing in neural structures. This has resulted in an exponential increase in storage capacity while explaining the extensive memory, and it has the potential to illustrate the dynamics of neurons in the human brain from a quantum mechanics perspective, although the application of QHAM is limited to autoassociation. In this paper, we introduce a quantum-inspired multidirectional associative memory (QMAM) with a one-shot learning model, and a QMAM with a self-convergent iterative learning model (IQMAM), both based on QHAM. The self-convergent iterative learning enables the network to progressively develop a resonance state from inputs to outputs. Simulation experiments demonstrate the advantages of QMAM and IQMAM, especially their stability and recall reliability.
|
24
|
Zhang P, Shen H, Zhai H. Machine Learning Topological Invariants with Neural Networks. PHYSICAL REVIEW LETTERS 2018; 120:066401. [PMID: 29481246 DOI: 10.1103/physrevlett.120.066401] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2017] [Revised: 12/04/2017] [Indexed: 06/08/2023]
Abstract
In this Letter we train neural networks in a supervised manner to distinguish different topological phases in the context of topological band insulators. After training with Hamiltonians of one-dimensional insulators with chiral symmetry, the neural network can predict their topological winding numbers with nearly 100% accuracy, even for Hamiltonians with larger winding numbers that are not included in the training data. These results show, remarkably, that the neural network can capture the global and nonlinear topological features of quantum phases from local inputs. By opening up the neural network, we confirm that the network does learn the discrete version of the winding number formula. We also make a couple of remarks regarding the role of the symmetry and the opposite effect of regularization techniques when applying machine learning to physical systems.
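For reference, a minimal implementation of the discrete winding number formula the abstract refers to: accumulate the wrapped angle increments of the Hamiltonian vector (h_x(k), h_y(k)) around the Brillouin zone. The SSH-like test Hamiltonian and parameter values below are illustrative assumptions, not taken from the Letter:

```python
import math

def winding_number(hx, hy):
    """Discrete winding number of (hx[i], hy[i]) sampled around the
    Brillouin zone: sum the angle increments, each wrapped to (-pi, pi],
    and divide by 2*pi."""
    n = len(hx)
    total = 0.0
    for i in range(n):
        a1 = math.atan2(hy[i], hx[i])
        a2 = math.atan2(hy[(i + 1) % n], hx[(i + 1) % n])
        d = a2 - a1
        while d <= -math.pi:      # wrap the increment into (-pi, pi]
            d += 2 * math.pi
        while d > math.pi:
            d -= 2 * math.pi
        total += d
    return round(total / (2 * math.pi))

# SSH-like chiral model h(k) = (t1 + cos k, sin k): winding 1 when t1 < 1.
ks = [2 * math.pi * i / 200 for i in range(200)]
w_topo = winding_number([0.5 + math.cos(k) for k in ks], [math.sin(k) for k in ks])
w_triv = winding_number([1.5 + math.cos(k) for k in ks], [math.sin(k) for k in ks])
```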
Affiliation(s)
- Pengfei Zhang
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Huitao Shen
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Hui Zhai
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Collaborative Innovation Center of Quantum Matter, Beijing 100084, China
|
25
|
Palittapongarnpim P, Wittek P, Zahedinejad E, Vedaie S, Sanders BC. Learning in quantum control: High-dimensional global optimization for noisy quantum dynamics. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.12.087] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
26
|
Wu C, Qi B, Chen C, Dong D. Robust Learning Control Design for Quantum Unitary Transformations. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:4405-4417. [PMID: 27705875 DOI: 10.1109/tcyb.2016.2610979] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Robust control design for quantum unitary transformations has been recognized as a fundamental and challenging task in the development of quantum information processing due to unavoidable decoherence or operational errors in the experimental implementation of quantum operations. In this paper, we extend the systematic methodology of the sampling-based learning control (SLC) approach with a gradient flow algorithm for the design of robust quantum unitary transformations. The SLC approach first uses a "training" process to find an optimal control strategy robust against certain ranges of uncertainties. Then a number of randomly selected samples are tested and the performance is evaluated according to their average fidelity. The approach is applied to three typical examples of robust quantum transformation problems: robust quantum transformations in a three-level quantum system, in a superconducting quantum circuit, and in a spin chain system. Numerical results demonstrate the effectiveness of the SLC approach and show its potential applications in various implementations of quantum unitary transformations.
|
27
|
Iwata K. Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1865-1877. [PMID: 27187974 DOI: 10.1109/tnnls.2016.2558295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
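For reference, a minimal sketch of temperature-controlled softmax (Boltzmann) action selection as described above; the Q-values and temperature are illustrative, and this is the basic one-parameter method the paper refines rather than the paper's extended setting:

```python
import math
import random

def softmax_select(q_values, tau, rng=random):
    """Sample an action with probability proportional to exp(Q/tau).
    tau is the single tunable parameter: high tau gives near-uniform
    exploration, low tau gives near-greedy exploitation."""
    m = max(q_values)                        # subtract max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    r, acc = rng.random(), 0.0
    for action, p in enumerate(probs):       # inverse-CDF sampling
        acc += p
        if r < acc:
            return action
    return len(q_values) - 1                 # guard against rounding

random.seed(0)
picks = [softmax_select([1.0, 2.0, 0.5], tau=0.5) for _ in range(2000)]
```

With these Q-values and tau = 0.5, the middle action dominates but the others are still sampled, which is the exploration/exploitation trade-off the temperature controls.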
|
28
|
Chen C, Dong D, Qi B, Petersen IR, Rabitz H. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1345-1359. [PMID: 28113872 DOI: 10.1109/tnnls.2016.2540719] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
|
29
|
Wei Q, Song R, Yan P. Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:444-458. [PMID: 26292346 DOI: 10.1109/tnnls.2015.2464080] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. According to the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst case of the disturbance. Three single-layer neural networks, including one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, for facilitating the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state will converge to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks are also convergent to finite neighborhoods of their optimal ones. Finally, simulation results show the effectiveness of the developed data-driven ADP method.
|
30
|
Application of emotion affected associative memory based on mood congruency effects for a humanoid. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-2102-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|