Simulated Annealing (SA) is widely used in FPGA placement, either as a standalone algorithm or as a refinement step after an initial analytical placement. SA-based placers achieve high-quality results, but at the cost of long runtimes. In this paper, we improve SA-based placement using directed moves and Reinforcement Learning (RL). The proposed directed moves explore the solution space more efficiently than traditional random moves and target both wirelength and timing optimization. The RL agent further improves efficiency by dynamically selecting the most effective move types as the optimization progresses. Taken together, these enhancements allow more efficient exploration of the large solution space than traditional annealing. Experimental results on the VTR benchmark suite show that our technique outperforms the widely used VTR 8 placer across a wide range of runtime-quality trade-off points, achieving 5-11% lower wirelength and comparable or shorter critical path delays for a given runtime, or 33-50% shorter runtimes for a target quality point.
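To make the idea concrete, the following is a minimal sketch of an annealer that picks between move types with a learned preference. It is an illustration under stated assumptions, not the method evaluated in this paper: the abstract does not specify the RL formulation, so an epsilon-greedy bandit is used as a stand-in; the reward (cost reduction of accepted moves), the move names `random_swap` and `directed_swap`, and the toy grid are all hypothetical; and only half-perimeter wirelength is optimized, with no timing term.

```python
import math
import random

def hpwl(pos, nets):
    """Half-perimeter wirelength summed over all nets."""
    total = 0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def random_swap(pos, nets, rng):
    """Undirected move: swap two blocks chosen uniformly at random."""
    a, b = rng.sample(range(len(pos)), 2)
    return a, b

def directed_swap(pos, nets, rng):
    """Directed move (hypothetical): swap a block with the block sitting
    closest to the median location of the blocks it shares nets with."""
    a = rng.randrange(len(pos))
    nbrs = [b for net in nets if a in net for b in net if b != a]
    if not nbrs:
        return random_swap(pos, nets, rng)
    xs = sorted(pos[b][0] for b in nbrs)
    ys = sorted(pos[b][1] for b in nbrs)
    tx, ty = xs[len(xs) // 2], ys[len(ys) // 2]
    b = min((i for i in range(len(pos)) if i != a),
            key=lambda i: abs(pos[i][0] - tx) + abs(pos[i][1] - ty))
    return a, b

def anneal(pos, nets, moves, steps=4000, t0=2.0, alpha=0.999, eps=0.1, seed=0):
    """SA loop in which an epsilon-greedy agent chooses the move type."""
    rng = random.Random(seed)
    pos = list(pos)
    cost = hpwl(pos, nets)
    best_pos, best_cost = list(pos), cost
    q = [0.0] * len(moves)  # running mean reward per move type
    n = [0] * len(moves)    # times each move type was tried
    t = t0
    for _ in range(steps):
        # Explore with probability eps, otherwise exploit the best estimate.
        if rng.random() < eps:
            k = rng.randrange(len(moves))
        else:
            k = max(range(len(moves)), key=q.__getitem__)
        a, b = moves[k](pos, nets, rng)
        pos[a], pos[b] = pos[b], pos[a]
        delta = hpwl(pos, nets) - cost
        # Standard Metropolis acceptance test.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta
            reward = -delta  # assumed reward: cost reduction if accepted
            if cost < best_cost:
                best_pos, best_cost = list(pos), cost
        else:
            pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
            reward = 0.0
        n[k] += 1
        q[k] += (reward - q[k]) / n[k]  # incremental mean update
        t *= alpha  # geometric cooling
    return best_pos, best_cost, n

# Toy usage: 16 blocks on a fully occupied 4x4 grid, connected in a chain.
sites = [(x, y) for x in range(4) for y in range(4)]
start = sites[:]
random.Random(1).shuffle(start)
chain = [(i, i + 1) for i in range(15)]
placed, final_cost, tries = anneal(start, chain, [random_swap, directed_swap])
print(hpwl(start, chain), final_cost, tries)
```

The per-move counts returned by `anneal` show how often the agent chose each move type; in this sketch the preference is learned over the whole run, whereas a scheme that adapts as the optimization progresses would likely need a recency-weighted or temperature-aware estimate.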