Classical active flow control (AFC) methods based on solving the Navier–Stokes equations are laborious and computationally intensive even with the use of reduced-order models. Data-driven methods offer a promising alternative for AFC, and they have been applied successfully to reduce the drag of two-dimensional bluff bodies, such as a circular cylinder, using deep reinforcement-learning (DRL) paradigms. However, due to the onset of weak turbulence in the wake, the standard DRL method tends to result in large fluctuations in the unsteady forces acting on the cylinder as the Reynolds number increases. In this study, a Markov decision process (MDP) with time delays is introduced to model and quantify the action delays in the environment in a DRL process due to the time difference between control actuation and flow response along with the use of a first-order autoregressive policy (ARP). This hybrid DRL method is applied to control the vortex-shedding process from a two-dimensional circular cylinder using four synthetic jet actuators at a freestream Reynolds number of 400. This method has yielded a stable and coherent control, which results in a steadier and more elongated vortex formation zone behind the cylinder, hence, a much weaker vortex-shedding process and less fluctuating lift and drag forces. Compared to the standard DRL method, this method utilizes the historical samples without additional sampling in training, and it is capable of reducing the magnitude of drag and lift fluctuations by approximately 90% while achieving a similar level of drag reduction in the deterministic control at the same actuation frequency. This study demonstrates the necessity of including a physics-informed delay and regressive nature in the MDP and the benefits of introducing ARPs to achieve a robust and temporal-coherent control of unsteady forces in active flow control.