This thesis investigates the learnability of Markov Decision Processes (MDPs) with added complexity along multiple dimensions. A theoretical investigation into the implications of these complexity dimensions is performed, and it is discussed how existing algorithms deal with them. Three complexity dimensions are defined: temporal dependence, partial observability, and dynamism. Temporal dependence and dynamism are closely related, but are distinguished because there is a subtle difference regarding the learnability of MDPs with these complexities. In the Reinforcement Learning literature, frameworks exist for solving Decision Processes with non-Markovian Rewards (NMRDPs), based on the use of a Mealy Reward Machine (MRM), and for Partially Observable MDPs (POMDPs), based on the use of belief states. Both methods transform the original problem into a process that behaves in a Markovian way, allowing traditional model-solving techniques to be used. The limitations of these techniques for dynamic PO-NMRDPs, which combine the three complexity dimensions, are investigated, and extensions that fit dynamic PO-NMRDPs are proposed. Accordingly, two methods are presented to handle PO-NMRDPs, one based on an MRM and the other on the use of belief states. An important distinction is made regarding the policies these methods yield: the former produces a deterministic policy, while the latter results in a stochastic action plan.
Following the theoretical discussion of complex MDPs, a practical framework is proposed that simultaneously learns an MRM to represent the reward structure and derives an optimal deterministic policy to maximize rewards in dynamic PO-NMRDPs. This framework extends the ARM framework, which was developed to deal with standard NMRDPs. The issues that arise when learning dynamic PO-NMRDPs with the standard ARM framework are discussed, and it is explained in detail how they are resolved. The learning phase is based on Angluin's L* algorithm, which constructs an MRM; this MRM is then combined with the original state space to create a synchronized product, which behaves in a Markovian manner. Traditional model-checking techniques are employed on this synchronized product to derive an optimal deterministic, reactive policy.
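The two constructions the abstract relies on, the synchronized product of an MDP with a Mealy Reward Machine and the belief-state update for POMDPs, are standard in the literature. The thesis itself publishes no code; the following is a minimal Python sketch of both ideas, in which every name (MealyRewardMachine, synchronized_product, belief_update, and their arguments) is a hypothetical illustration, not the author's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MealyRewardMachine:
    """A Mealy machine over observation labels: each transition consumes a
    label and emits a reward, so a history-dependent (non-Markovian) reward
    becomes Markovian in the product state (mdp_state, machine_state)."""
    states: frozenset
    initial: object
    delta: dict  # (machine_state, label) -> next machine_state
    rho: dict    # (machine_state, label) -> emitted reward

def synchronized_product(mdp_trans, labeling, mrm):
    """Build the synchronized-product MDP over (s, q) pairs.

    mdp_trans: (s, a) -> list of (s_next, prob)
    labeling:  s_next -> label fed to the reward machine
    Returns product transitions and rewards; a standard MDP solver
    (e.g. value iteration) on the product then yields a deterministic,
    reactive policy for the original non-Markovian problem.
    """
    prod_trans, prod_reward = {}, {}
    for (s, a), successors in mdp_trans.items():
        for q in mrm.states:
            branch = []
            for s_next, prob in successors:
                label = labeling[s_next]
                q_next = mrm.delta[(q, label)]
                branch.append(((s_next, q_next), prob))
                prod_reward[((s, q), a, (s_next, q_next))] = mrm.rho[(q, label)]
            prod_trans[((s, q), a)] = branch
    return prod_trans, prod_reward

def belief_update(belief, a, o, T, Z):
    """Bayesian belief-state update for a POMDP:
    b'(s') is proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s),
    with T: (s, a, s_next) -> prob, Z: (s_next, a, o) -> prob,
    belief: s -> prob. As the abstract notes, planning over beliefs
    yields a stochastic action plan, in contrast to the MRM route."""
    b_next = {}
    for s_next in {sn for (_, act, sn) in T if act == a}:
        b_next[s_next] = Z.get((s_next, a, o), 0.0) * sum(
            T.get((s, a, s_next), 0.0) * p for s, p in belief.items())
    norm = sum(b_next.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / norm for s, p in b_next.items()}
```

The sketch makes the contrast in the abstract concrete: the product construction keeps a finite state space on which deterministic policies can be computed, while the belief route replaces states with probability distributions over them.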
This paper examines the antecedents and performance consequences of downsizing in a Europe-wide context, from a Belgian perspective. Belgium is characterized by relatively high labor costs compared to other European markets. How these costs affect the likelihood of downsizing is tested using data from across Europe between 2010 and 2018. The results provide evidence for a positive relationship between labor costs, at the country level, and firms' likelihood of downsizing. However, a firm's capital intensity is found to mitigate this positive effect in high-wage countries. Furthermore, across all of Europe, a firm's capital intensity is negatively related to its propensity to downsize, indicating some complementarity between labor and capital in capital-intensive environments, a complementarity that is stronger in high-wage countries. No effect of downsizing on financial performance is found in Belgium, while downsizing negatively affects financial performance in other EU countries. However, contrary to these results, firms report improved productivity following employee layoffs, both in Belgium and in the rest of the EU.