Reinforcement Learning for Optimal Quantum State Discrimination
Model-free calibration of quantum receivers through trial and error.
This repository provides a comprehensive framework for implementing reinforcement learning (RL) techniques to achieve optimal quantum state discrimination over unknown channels. The codebase enables real-time calibration and optimization of quantum devices—particularly coherent-state receivers—without requiring prior knowledge of system parameters.
Quantum devices are notoriously challenging to operate: their functionality relies on precisely tuned parameters, yet environmental conditions constantly drift, detuning the device. Traditional approaches require detailed models of the environment, which are often computationally prohibitive, while direct parameter measurements introduce extra noise.
We frame quantum receiver calibration as a reinforcement learning problem, where an agent learns optimal discrimination strategies through trial and error—without any prior knowledge of experimental details.
Q-Learning Framework
- ε-greedy exploration with configurable parameters
- Adaptive learning rates (1/N decay or fixed; sketched below)
- Real-time Q-value updates for optimal policy discovery
- Support for change-point detection scenarios
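A minimal sketch of the two learning-rate schedules the experiments compare (the function name and call pattern are illustrative, not the package's API):

```python
def learning_rate(visit_count, mode="1/N", fixed_lr=0.05):
    """Step size for a Q-update on a given (state, action) pair.

    '1/N' turns the Q-value into a running mean of past rewards
    (convergent, but increasingly rigid); 'fixed' keeps a constant
    step and exponentially forgets old data.
    """
    if mode == "1/N":
        return 1.0 / max(visit_count, 1)
    return fixed_lr
```

The counters returned by `define_q` (see the API tables below) would naturally supply `visit_count` for the 1/N schedule.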
Quantum Physics Engine
- Born rule probability calculations
- Coherent state displacement operations
- Kennedy receiver simulation (sketched below)
- Variable-loss optical channel modeling
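A rough sketch of how these pieces fit together; the transmissivity argument `eta` and the name `sample_outcome` are illustrative assumptions standing in for the package's `give_outcome`:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_outcome(phase, beta, alpha=0.4, eta=1.0):
    """On/off detection behind a Kennedy-style receiver.

    The loss channel attenuates |±α⟩ to |±√η·α⟩, the receiver displaces
    the field by β, and the detector clicks (n = 1) with probability
    1 − exp(−|s·√η·α + β|²), per the Born rule below.
    """
    sign = 1 if phase == 0 else -1                 # hidden phase s = ±1
    amplitude = sign * np.sqrt(eta) * alpha + beta
    p_click = 1.0 - np.exp(-np.abs(amplitude) ** 2)
    return int(rng.random() < p_click)
```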
Analysis & Visualization
- Learning curve generation (sketched below)
- Q-function landscape plotting
- Noise sensitivity analysis
- Comparative performance metrics
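A minimal learning-curve plot; `rewards` and `p_star` are placeholders for a reward history and the model-aware benchmark produced during training:

```python
import numpy as np
import matplotlib.pyplot as plt

rewards = np.random.binomial(1, 0.8, size=10_000)  # placeholder reward history
p_star = 0.84                                      # placeholder model-aware optimum

episodes = np.arange(1, len(rewards) + 1)
plt.plot(episodes, np.cumsum(rewards) / episodes, label="running success frequency")
plt.axhline(p_star, linestyle="--", color="k", label="model-aware optimum")
plt.xscale("log")
plt.xlabel("episode")
plt.ylabel("success probability")
plt.legend()
plt.show()
```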
Dynamic Calibration
- Model-free control loops (sketched below)
- Continuous parameter re-calibration
- Adaptation to environmental drift
- Optimal β displacement learning
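Putting the pieces together, the calibration loop is plain two-stage Q-learning. A sketch using the utilities documented in the API tables below, with a fixed learning rate; the two update lines mirror the equations in the Q-learning section:

```python
import numpy as np
from qrec.utils import define_q, ep_greedy, give_outcome, give_reward

rng = np.random.default_rng()
betas_grid, [q0, q1, n0, n1] = define_q(nbetas=25)
lr, eps, alpha = 0.05, 0.01, 0.4

for episode in range(100_000):
    phase = int(rng.integers(2))                              # nature draws ±α
    indb, beta = ep_greedy(q0, betas_grid, ep=eps)            # choose displacement β
    n = give_outcome(phase, beta, alpha=alpha)                # photon detection
    indg, guess = ep_greedy(q1[indb, n, :], [0, 1], ep=eps)   # guess the phase
    r = give_reward(guess, phase)                             # 1 on success, 0 otherwise
    q1[indb, n, indg] += lr * (r - q1[indb, n, indg])         # second-stage update
    q0[indb] += lr * (q1[indb, n, :].max() - q0[indb])        # first-stage update
```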
```
qrec/
├── qrec/ # Core module
│ └── utils.py # Q-learning utilities, physics functions
├── experiments/ # Experimental configurations
│ ├── 0/ # Full exploration (ε=1.0)
│ ├── 1/ # Low exploration, 1/N learning rate
│ ├── 2/ # Change-point: α = 1.5 → 0.25
│ ├── 3/ # Fixed lr = 0.005
│ ├── 4/ # Fixed lr = 0.05
│ ├── 5/ # Change-point with fixed lr (best)
│ └── 6/ # Noise inspection
├── paper/ # Publication figures
├── basic_inspection.py # Error landscape visualization
├── index_experiments # Experiment documentation
└── requirements.txt # Dependencies
```
Related Repository: matibilkis/marek
```
marek/
├── main_programs/ # Core RL algorithms
├── dynamic_programming/ # DP optimization modules
├── bounds_optimals_and_limits/ # Theoretical bounds computation
├── plotting_programs/ # Visualization tools
├── appendix_A/ # Supplementary materials
├── tests/ # Validation suite
├── agent.py # RL agent implementation
├── environment.py # Quantum channel simulation
├── training.py # Training loop
└── basics.py # Core physics functions
```
The goal is to discriminate between coherent states |±α⟩ using a Kennedy-like receiver with displacement β:
P(n|α) = exp(-|α|²) · δₙ₀ + (1 - exp(-|α|²)) · δₙ₁
The success probability for a given displacement β:
Pₛ(β) = ½ Σₙ max_{s∈{-1,+1}} P(n | sα + β)
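In code these two formulas reduce to a few lines. A sketch consistent with the `p` and `Perr` helpers documented in the API tables (the implementations and the β grid here are illustrative):

```python
import numpy as np

def p(alpha, n):
    """Born rule for on/off detection: P(0|α) = e^{−|α|²}, P(1|α) = 1 − e^{−|α|²}."""
    p0 = np.exp(-np.abs(alpha) ** 2)
    return p0 if n == 0 else 1.0 - p0

def Ps(beta, alpha=0.4):
    """Success probability Pₛ(β) = ½ Σₙ max_{s∈{±1}} P(n | s·α + β)."""
    return 0.5 * sum(max(p(s * alpha + beta, n) for s in (-1, 1)) for n in (0, 1))

# The model-aware optimum is the grid point minimizing Perr(β) = 1 − Pₛ(β)
betas_grid = np.linspace(-2.0, 0.0, 25)
beta_star = betas_grid[np.argmax([Ps(b) for b in betas_grid])]
```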
Action-value updates (here α is the learning rate, not the coherent-state amplitude):
Q₁(β, n, g) ← Q₁(β, n, g) + α · [r - Q₁(β, n, g)]
Q₀(β) ← Q₀(β) + α · [max_g Q₁(β, n, g) - Q₀(β)]
Policy (ε-greedy):
π(β) = { random β,         with probability ε
       { argmax_β Q₀(β),   with probability 1 − ε
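A behavioral sketch of the selection rule; the repository's `ep_greedy` may differ in detail, but as the API tables note, greedy ties are broken at random:

```python
import numpy as np

def ep_greedy_sketch(qvals, actions, ep=0.01, rng=None):
    """Return (index, action): explore uniformly w.p. ε, else argmax with random tie-breaking."""
    rng = np.random.default_rng() if rng is None else rng
    qvals = np.asarray(qvals)
    if rng.random() < ep:
        idx = int(rng.integers(len(actions)))
    else:
        idx = int(rng.choice(np.flatnonzero(qvals == qvals.max())))
    return idx, actions[idx]
```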
```bash
git clone https://github.com/matibilkis/qrec.git
cd qrec
pip install -r requirements.txt
```

```python
import numpy as np
from qrec.utils import *

# Initialize Q-tables with 25 discretized β values
betas_grid, [q0, q1, n0, n1] = define_q(nbetas=25)

# Find the model-aware optimum (for comparison)
mmin, p_star, beta_star = model_aware_optimal(betas_grid, alpha=0.4)

# Run a Q-learning episode
hidden_phase = np.random.choice([0, 1])                   # Nature chooses ±α
indb, beta = ep_greedy(q0, betas_grid, ep=0.01)           # Agent chooses β
n = give_outcome(hidden_phase, beta, alpha=0.4)           # Photon detection
indg, guess = ep_greedy(q1[indb, n, :], [0, 1], ep=0.01)  # Agent guesses the phase
reward = give_reward(guess, hidden_phase)                 # Success/failure
```

To reproduce the change-point experiment:

```bash
cd experiments/5
python change_point.py
```

The RL agent successfully learns near-optimal receiver configurations:
| Experiment | Configuration | Key Finding |
|---|---|---|
| 0 | ε = 1.0 (full exploration) | Baseline uniform sampling |
| 1 | ε = 0.01, lr = 1/N | Convergent but slow adaptation |
| 2 | Change-point, lr = 1/N | Cannot adapt to α changes |
| 3 | ε = 0.01, lr = 0.005 | Stable but slow learning |
| 4 | ε = 0.01, lr = 0.05 | Good balance |
| 5 | Change-point, lr = 0.05 | Successful re-calibration |
*Our agents in action: learning curves for sensor calibration (figures in paper/).*
Key Insight: Fixed learning rates enable adaptation to changing channel conditions, while decaying (1/N) rates lock the agent into its initial configuration: with a fixed rate λ, the influence of pre-change data decays as (1 − λ)ᵏ after k updates, whereas under a 1/N schedule the step size vanishes just when the channel changes.
| Function | Description |
|---|---|
| `p(alpha, n)` | Born rule probability P(n\|α) |
| `Perr(beta, alpha)` | Error probability for displacement β |
| `give_outcome(phase, beta, alpha)` | Sample a photon-detection outcome |
| `model_aware_optimal(betas, alpha)` | Compute the theoretical optimum |
| Function | Description |
|---|---|
| `define_q(nbetas)` | Initialize Q-tables and counters |
| `ep_greedy(qvals, actions, ep)` | ε-greedy action selection |
| `greedy(arr)` | Greedy selection (ties broken randomly) |
| `give_reward(guess, phase)` | Binary reward function |
| `Psq(q0, q1, betas, alpha)` | Evaluate the current policy |
- numpy
- matplotlib
- scipy
- tqdm
- numba
Contributions are welcome. Please open an issue first to discuss proposed changes.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License. See LICENSE for details.
This framework underpins three publications in quantum machine learning:
- T. Crosta, L. Rebón, F. Vilariño, J. M. Matera, M. Bilkis, arXiv:2404.10726 (2024). Model-free RL framework for continuous recalibration of quantum device parameters, demonstrated on Kennedy-receiver-based long-distance quantum communication.
- M. Bilkis, M. Fraas, A. Acín, G. Sentís, arXiv:2203.09807 (2022). Calibration of quantum receivers for optical coherent states over channels with variable transmissivity using reinforcement learning.
- M. Bilkis, M. Rosati, R. Muñoz-Tapia, J. Calsamiglia, Phys. Rev. Research 2, 033295 (2020). Foundational work: RL agents learn near-optimal coherent-state receivers through real-time trial-and-error experimentation.
If you use this code in your research, please cite:
```bibtex
@article{crosta2024automatic,
title={Automatic re-calibration of quantum devices by reinforcement learning},
author={Crosta, T. and Reb{\'o}n, L. and Vilari{\~n}o, F. and Matera, J. M. and Bilkis, M.},
journal={arXiv preprint arXiv:2404.10726},
year={2024}
}
@article{bilkis2022reinforcement,
title={Reinforcement-learning calibration of coherent-state receivers on variable-loss optical channels},
author={Bilkis, M. and Fraas, M. and Ac{\'i}n, A. and Sent{\'i}s, G.},
journal={arXiv preprint arXiv:2203.09807},
year={2022}
}
@article{bilkis2020realtime,
title={Real-time calibration of coherent-state receivers: Learning by trial and error},
author={Bilkis, M. and Rosati, M. and Mu{\~n}oz-Tapia, R. and Calsamiglia, J.},
journal={Physical Review Research},
volume={2},
pages={033295},
year={2020}
}
```
