TY  - THES
ID  - 135043844
TI  - Reward shaping to boost the performance of deep reinforcement learning for the lost sales inventory problem
AU  - Borgmans, Floor
AU  - Boute, Robert
AU  - De Moor, Bram
AU  - KU Leuven. Faculteit Economie en Bedrijfswetenschappen. Opleiding Master Handelsingenieur
PY  - 2021
PB  - Leuven : KU Leuven. Faculteit Economie en Bedrijfswetenschappen
DB  - UniCat
UR  - https://www.unicat.be/uniCat?func=search&query=sysid:135043844
AB  - This thesis studies a periodic-review, single-item lost sales inventory model with positive lead times. A deep Q-network (DQN), a deep reinforcement learning (DRL) algorithm, is constructed, and domain knowledge is added via potential-based reward shaping to boost its performance. The domain knowledge is provided by existing heuristics, namely the base-stock, restricted base-stock, and constant-order policies. The performance of the DQN algorithm without domain knowledge is evaluated against DQN algorithms with added domain knowledge, the optimal policy, and the heuristics themselves (the constant-order, base-stock, and restricted base-stock policies). Compared with the DQN algorithm without added knowledge, reward shaping with the base-stock or restricted base-stock policy as a teacher improves performance in all six experiments; in one experiment, using the restricted base-stock policy as a teacher improves the optimality gap by as much as 21.37%. Comparing the reward-shaped DQN agents with the heuristic policies themselves, in four out of six experiments the DQN agent with a base-stock teacher outperforms the base-stock policy, and in all experiments the agent with a constant-order teacher outperforms the constant-order policy. These results demonstrate the potential of reward shaping to boost the performance of DRL in a lost sales inventory management environment.
ER  - 
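The core technique named in the abstract, potential-based reward shaping with a heuristic teacher, can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the potential function, discount factor, and base-stock level below are all assumptions chosen only to show the shaping form F(s, a, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy of the underlying MDP.

```python
# Illustrative sketch of potential-based reward shaping with a
# base-stock-style teacher. GAMMA and BASE_STOCK_LEVEL are assumed
# values for illustration only.

GAMMA = 0.99           # discount factor (assumed)
BASE_STOCK_LEVEL = 10  # order-up-to level of the teacher heuristic (assumed)

def potential(inventory_position: int) -> float:
    """Potential Phi(s): penalize deviation from the base-stock level,
    so states the heuristic favors receive higher potential."""
    return -abs(inventory_position - BASE_STOCK_LEVEL)

def shaped_reward(reward: float, ip_before: int, ip_after: int) -> float:
    """Return reward plus the shaping term F = gamma*Phi(s') - Phi(s).
    Because F is potential-based, the optimal policy is unchanged."""
    return reward + GAMMA * potential(ip_after) - potential(ip_before)
```

For example, a transition that moves the inventory position from 4 to the target level 10 earns a positive shaping bonus: `shaped_reward(-5.0, 4, 10)` returns 1.0, since Φ(10) = 0 and Φ(4) = −6.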