Narrow your search

Library

KU Leuven (32)

UGent (2)

UAntwerpen (1)

Vlerick Business School (1)


Resource type

dissertation (33)

book (3)


Language

English (30)

Dutch (3)

Undetermined (3)


Year

2024 (1)

2022 (1)

2021 (1)

2020 (4)

2019 (6)

Listing 1 - 10 of 36 (page 1 of 4)

Book
New robust methods in multivariate statistics and actuarial sciences
Author:
Year: 2009 Publisher: Antwerpen : Universiteit Antwerpen, Faculteit Wetenschappen, Departement Wiskunde-Informatica




Dissertation
Synthesis of ZrO2 Nanoparticle Embedded Bi2Te3 Thermoelectric Composites



Dissertation
Team-based Leren en Zelfstudie : Met Toepassingen in de Statistiek
Authors: --- ---
Year: 2013 Publisher: Leuven : K.U. Leuven. Faculteit Wetenschappen


Abstract

Learning objectives are important for both students and lecturers. Students use the objectives when studying the course material independently and when preparing for the exam, while lecturers use them to test students' prior knowledge and to support communication with other lecturers. Objectives can be formulated with the think-forward and/or the think-backward method; the best result is obtained when lecturers use both methods and then compare the objectives obtained with the two approaches. Our survey showed that the majority of students use these objectives when working through the course material on their own and that all students use them when preparing for the exam.

Team-based learning (TBL) is a strategy centred on working in cooperative groups. Using cooperative groups leads to better performance, greater effort, better mental health, and so on. Four elements are crucial when using team-based learning: groups, accountability, feedback, and team assignments. The survey we conducted shows that, with TBL, learning objectives are mainly important for students' independent study and exam preparation. From our own TBL implementation we found that students were largely unprepared, so it is better to let TBL count towards the grade and/or to schedule it after the lectures. Moreover, an increase of 14% was observed between the results on the first individual test and the team test. It was also observed that both weak and strong students learn from the TBL process, which was shown by administering the same individual test again after the team test. This second individual test is therefore important, but students find it tedious, so it is recommended that this second test ...



Dissertation
Statistical Tools for Anomaly Detection and Fraud Analytics


Abstract

Data is one of the most valuable resources businesses have today. Companies and institutions increasingly invest in tools and platforms to collect and store data about every event that impacts their business, such as their customers, transactions, products, and the market in which they operate. Although the costs of maintaining the huge, expanding volume of data are often considerable, companies are willing to make the investment as it serves their true ambition of extracting valuable information from their large quantity of data. As a result, companies increasingly rely on data-driven techniques for developing powerful predictive models to aid them in their decision process. These models, however, are often not well aligned with the core business objective of maximizing profit or minimizing financial losses, in the sense that the models fail to take into account the costs and benefits associated with their predictions. In this thesis, we propose new methods for developing models that incorporate costs and gains directly into the construction process of the model.

The first method, called ProfTree (Höppner et al., 2018), builds a profit-driven decision tree for predicting customer churn. The recently developed expected maximum profit measure for customer churn (EMPC) was proposed in order to select the most profitable churn model (Verbraken et al., 2013). ProfTree integrates the EMPC metric directly into the model construction and uses an evolutionary algorithm for learning profit-driven decision trees.

The second and third methods, called cslogit and csboost, are approaches for learning a model when the costs due to misclassification vary between instances. An instance-dependent threshold is derived, based on the instance-dependent cost matrix for transfer fraud detection, that allows for making the optimal cost-based decision for each transaction. The two novel classifiers, cslogit and csboost, are based on lasso-regularized logistic regression and gradient tree boosting, respectively, and directly minimize the proposed instance-dependent cost measure when learning a classification model.

A major challenge when trying to detect fraud is that the fraudulent activities form a minority class which makes up a very small proportion of the data set, often less than 0.5%. Detecting fraud in such a highly imbalanced data set typically leads to predictions that favor the majority group, causing fraud to remain undetected. The third contribution in this thesis is an oversampling technique, called robROSE, that addresses the problem of imbalanced data by creating synthetic samples that mimic the minority class while ignoring anomalies that could distort the detection algorithm and spoil the resulting analysis.

Besides using methods for making data-driven decisions, businesses often take advantage of statistical techniques to detect anomalies in their data with the goal of discovering new insights. However, the mere detection of an anomalous case does not always answer all questions associated with that data point. In particular, once an outlier is detected, the scientific question of why the case has been flagged as an outlier becomes of interest. In this thesis, we propose a fast and efficient method, called SPADIMO (Debruyne et al., 2019), to detect the variables that contribute most to an outlier's abnormal behavior. Thereby, the method helps to understand in which ways an outlier deviates.

The SPADIMO algorithm allows us to introduce the cellwise robust M regression estimator (Filzmoser et al., 2020) as the first linear regression estimator of its kind that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against outliers. As a by-product, the method yields a weighted and imputed data set that contains estimates of what the values in cellwise outliers would need to be if they had fit the model.

All introduced algorithms are implemented in R and are included in their respective R packages together with supporting functions and documentation on the usage of the algorithms. These R packages are publicly available on CRAN and at github.com/SebastiaanHoppner.
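A minimal R sketch of a generic instance-dependent cost-based decision rule of the kind described above (this is not the cslogit/csboost code; the fixed investigation cost c_fixed and the use of the transferred amount as the misclassification cost are illustrative assumptions):

# Flag a transaction when the expected loss of ignoring it (p_fraud * amount)
# exceeds the fixed cost of investigating it (c_fixed), i.e. when p_fraud
# exceeds the instance-dependent threshold c_fixed / amount.
cost_based_flag <- function(p_fraud, amount, c_fixed = 10) {
  threshold <- pmin(1, c_fixed / amount)   # cap at 1 for very small transfers
  p_fraud > threshold
}

# The same fraud score is ignored for a small transfer but acted on for a large one.
cost_based_flag(p_fraud = 0.05, amount = c(50, 5000))   # FALSE TRUE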



Dissertation
Telematics Feature Extraction and Predictive Modelling
Authors: --- --- ---
Year: 2019 Publisher: Leuven KU Leuven. Faculteit Wetenschappen


Abstract

Nowadays, the premium for motor third party liability insurance is often based on a priori risk factors such as age and type of car. Once some history of a policyholder is known, this premium can be adapted. The most famous example is the a posteriori bonus-malus system, which takes into account the number of claims reported in the past. However, taking individual driving behaviour into account leads in a natural way to a fairer price. Telematics data can include many features about driver behaviour and is most often collected via black boxes or mobile applications. One way to extract useful information from this data is via so-called heatmaps. Much of my work is therefore based on Wüthrich (2016) and Gao & Wüthrich (2017), who introduced these heatmaps for the first time. In essence, these heatmaps show where a certain driver is most active in the speed-acceleration plane. I implemented and coded everything myself in R, after which I added possible extensions. In a first part, very elementary telematics data is analysed. Speed-acceleration heatmaps are constructed and several methods are discussed to reduce the dimension of these heatmaps. Machine learning techniques such as neural networks are included. In a second part, a real dataset from Allianz is analysed. This dataset contains many features, including a binary accident indicator. The same techniques as in part I are applied, after which a small number of heatmap features can be used in a supervised prediction of the accidents. The main research question is whether the heatmap features can increase the predictive performance for accidents. Of all the classification methods discussed, boosted trees turn out to have the best performance. Moreover, there is clear evidence that the heatmap features have significant added value in the classification problem, as they increase an appropriate measure of performance.
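A minimal base-R sketch of such a speed-acceleration heatmap (the simulated trip and the bin edges are illustrative assumptions, not the thesis's data or code):

# Second-by-second speed (km/h) of one simulated trip; acceleration as first difference.
set.seed(1)
speed <- pmax(0, 30 + cumsum(rnorm(3600, 0, 1)))
accel <- c(0, diff(speed))

# Bin the speed-acceleration plane and compute the relative time spent in each cell.
speed_bins <- seq(0, 120, by = 10)
accel_bins <- seq(-5, 5, by = 1)
heat <- table(cut(speed, speed_bins), cut(accel, accel_bins))
heat <- matrix(as.numeric(heat), nrow = length(speed_bins) - 1)
heat <- heat / sum(heat)

# Visualise where this driver is most active in the speed-acceleration plane.
image(speed_bins, accel_bins, heat,
      xlab = "speed (km/h)", ylab = "acceleration (km/h per second)")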



Dissertation
Extreme Value Theory in Finance and Insurance
Authors: --- --- --- ---
Year: 2017 Publisher: Leuven KU Leuven. Faculty of Science


Abstract

When modelling high-dimensional data, dimension reduction techniques such as principal component analysis (PCA) are often used. In the first part of this thesis we focus on two drawbacks of classical PCA. First, interpretation of classical PCA is often challenging because most of the loadings are neither very small nor very large in absolute value. Second, classical PCA can be heavily distorted by outliers since it is based on the classical covariance matrix. In order to resolve both problems, we present a new PCA algorithm that is robust against outliers and yields sparse PCs, i.e. PCs with many zero loadings. The approach is based on the ROBPCA algorithm, which generates robust but non-sparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present.

Stock market crashes such as Black Monday in 1987 and catastrophes such as earthquakes are examples of extreme events in finance and insurance, respectively. They are large events with a considerable impact that occur seldom. Extreme value theory (EVT) provides a theoretical framework to model extreme values such that, e.g., risk measures can be estimated based on available data. In the second part of this PhD thesis we focus on applications of EVT that are of interest to finance and insurance.

A Black Swan is an improbable event with massive consequences. We propose a way to investigate whether the 2007-2008 financial crisis was a Black Swan event for a given bank, based on weekly log-returns. This is done by comparing the tail behaviour of the negative log-returns before and after the crisis using techniques from extreme value methodology. We illustrate this approach with Barclays and Credit Suisse data, and then link the differences in tail risk behaviour between these banks with economic indicators.

The earthquake engineering community, disaster management agencies and the insurance industry need models for earthquake magnitudes to predict possible damage by earthquakes. A crucial element in these models is the area-characteristic maximum possible earthquake magnitude. The Gutenberg-Richter distribution, which is a (doubly) truncated exponential distribution, is widely used to model earthquake magnitudes. Recently, Aban et al. (2006) and Beirlant et al. (2016) discussed tail fitting for truncated Pareto-type distributions. However, as is the case for the Gutenberg-Richter distribution, in some applications the underlying distribution appears to have a lighter tail than the Pareto distribution. We generalise the classical peaks-over-threshold (POT) approach to allow for truncation effects, which enables a unified treatment of extreme value analysis for truncated heavy and light tails. We use a pseudo maximum likelihood approach to estimate the model parameters and consider extreme quantile estimation. The new approach is illustrated on examples from hydrology and geophysics. Moreover, we perform simulations to illustrate the potential of the method on truncated heavy and light tails.

The new approach can then be used to estimate the maximum possible earthquake magnitude. We also look at two other EVT-based endpoint estimators and at endpoint estimators used in the geophysical literature. To quantify the uncertainty of the point estimates for the endpoint, upper confidence bounds are also considered. We apply the techniques to provide estimates, and upper confidence bounds, for the maximum possible earthquake magnitude in Groningen, where earthquakes are induced by gas extraction. Furthermore, we compare the methods from extreme value theory and the geophysical literature through simulations.

In risk analysis, a global fit that appropriately captures the body and the tail of the distribution of losses is essential. Modelling the whole range of the losses with a standard distribution is usually very hard and often impossible due to the specific characteristics of the body and the tail of the loss distribution. A possible solution is to combine two distributions in a splicing model: a light-tailed distribution for the body, which covers light and moderate losses, and a heavy-tailed distribution for the tail, to capture large losses. We propose a splicing model with the flexible mixed Erlang distribution for the body and a Pareto distribution for the tail. Motivated by examples in financial risk analysis, we extend our splicing approach to censored and/or truncated data. We illustrate the flexibility of this splicing model using practical examples from reinsurance.
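A minimal base-R sketch of the classical (untruncated) peaks-over-threshold step that the thesis generalises, fitting a generalized Pareto distribution to threshold exceedances by maximum likelihood (the simulated losses and the threshold choice are illustrative assumptions):

# Simulated losses and a high empirical threshold.
set.seed(2)
losses <- rexp(2000)^2
u <- quantile(losses, 0.95)
y <- losses[losses > u] - u          # exceedances over the threshold

# Negative log-likelihood of the generalized Pareto distribution GPD(gamma, sigma).
gpd_nll <- function(par, y) {
  gamma <- par[1]; sigma <- par[2]
  if (sigma <= 0 || any(1 + gamma * y / sigma <= 0)) return(Inf)
  n <- length(y)
  if (abs(gamma) < 1e-8) return(n * log(sigma) + sum(y) / sigma)  # exponential limit
  n * log(sigma) + (1 / gamma + 1) * sum(log(1 + gamma * y / sigma))
}

# Maximum likelihood fit of the exceedance distribution.
fit <- optim(c(0.1, mean(y)), gpd_nll, y = y)
fit$par   # estimated extreme value index (gamma) and scale (sigma)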



Dissertation
Robust Sparse Principal Component Analysis
Authors: --- ---
Year: 2013 Publisher: Leuven : K.U. Leuven. Faculteit Wetenschappen


Abstract

When researchers study a phenomenon, they collect a great deal of data on a number of individuals (or objects) from a given population. Per individual the amount of data can be rather large, which makes a thorough analysis difficult. An example can be found in genetics: researchers examine information on thousands of genes for a number of subjects to see which genes have the greatest influence on a particular phenomenon. Statisticians have developed several techniques that try to reduce the amount of data drastically without losing too much information. One of the most widely used techniques for this reduction is Principal Component Analysis (PCA). It considers combinations of the variables that contain as much information as possible. These combinations are then sorted by the amount of information they describe. To ease interpretation, one typically uses a small number of these combinations, provided they contain enough information. For the genetics example this means that we merge the information from several genes into an artificial gene that consists of a certain amount of one gene, a certain amount of another gene, and so on. We then work with a number of these artificial genes instead of all the real genes. When applying PCA we would like these combinations to consist of only a small number of the original variables. This makes it easier to see what the new variables represent. Because the combinations are sorted by information, it also becomes easier to determine which of the original variables have the most influence (and thus appear in the first few combinations). For that reason several sparse PCA methods have been developed that ensure the combinations consist of only a small number of the original variables. A general problem with PCA is that it is very sensitive to outliers. ...
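A minimal R sketch of the classical PCA step described above, showing the loadings (the combinations of the original variables); sparse and robust variants such as ROSPCA additionally force many loadings to exactly zero and protect against outliers (USArrests is only an illustrative dataset):

# Classical PCA on a small standardised dataset.
pca <- prcomp(USArrests, scale. = TRUE)
round(pca$rotation, 2)   # loadings: weight of each original variable in each component
summary(pca)             # proportion of variance (information) captured per component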



Dissertation
Digitizing and Reconstructing Kaplan Meier Survival Data
Authors: --- ---
Year: 2019 Publisher: Leuven KU Leuven. Faculteit Wetenschappen


Abstract

When raw data is unavailable in a published paper because of confidentiality, sometimes the most accurate approximation of that data can be found in the paper's chart. The aim of the R package "survThief" is to reconstruct otherwise confidential individual patient data from plotted Kaplan-Meier (KM) stepwise survival curves as closely as possible. Survival data record either the death time or the censoring time; the latter records when a patient is no longer monitored in the study but no evidence of death is found (e.g., when the patient leaves the country and exits the study). The primary aim of this thesis is to address the two following functionalities that are not yet implemented in previous research:

- DIGITIZE: digitize the curve with one click, at the maximal level of granularity, with a walking algorithm along the curve.
- RECONSTRUCT: recognize censoring mark positions with an image filter and reconstruct the survival data accordingly. If no censoring marks are available, the package implements the method of Guyot, Ades, Ouwens, & Welton (2012).

The thesis starts with a review of the available methods, which require a separate digitizing software and an R function. Then the proposed package survThief and its essential functions are described, followed by validation on simulated KM curves: a significant improvement over the previous method is measured. At the end, a show-case meta-analysis on heart arch repair is presented: several charts found in different papers on the same subject are subjected to the KM digitization and reconstruction, before a meta-analysis is performed with a proposed framework. The outcome of the analysis favours traditional open heart surgery over newer hybrid variations. Several elements of contribution emerged from the development of the program: a walking algorithm to detect colored lines, the simulation of poorly behaved survival data, and the modelling of meta survival data to fit censoring and treatment effects given few available covariates.
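A minimal R sketch of the Kaplan-Meier estimate that reconstructed individual patient data would feed into, using the standard survival package rather than survThief itself (the toy follow-up times and censoring indicators are assumptions):

# Toy reconstructed individual patient data: follow-up time and event indicator.
library(survival)
time   <- c(2, 5, 7, 7, 9, 12, 15, 16, 20, 24)
status <- c(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)   # 1 = death observed, 0 = censored

fit <- survfit(Surv(time, status) ~ 1)
summary(fit)                  # the stepwise survival probabilities drawn in a published KM curve
plot(fit, mark.time = TRUE)   # censoring marks like those survThief tries to recognise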



Dissertation
Dealing with outliers in the chain-ladder method
Authors: --- ---
Year: 2008 Publisher: Leuven K.U.Leuven. Faculteit Economie en Bedrijfswetenschappen


Abstract

In this thesis it is shown that the estimates of the chain-ladder method are heavily influenced by outlying data. Therefore, a robust approach is presented. Besides obtaining the correct reserve estimate, we also focus on the estimation of the stand...
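A minimal R sketch of the classical (non-robust) chain-ladder step whose development factors a single outlying claim can distort (the cumulative run-off triangle is a toy example, not data from the thesis):

# Cumulative run-off triangle: rows = accident years, columns = development years.
triangle <- matrix(c(100, 150, 175, 180,
                     110, 168, 192,  NA,
                     120, 185,  NA,  NA,
                     130,  NA,  NA,  NA),
                   nrow = 4, byrow = TRUE)
n <- ncol(triangle)

# Age-to-age development factors; a single outlying cell shifts these ratios,
# which is what a robust chain-ladder approach guards against.
f <- sapply(1:(n - 1), function(j) {
  keep <- !is.na(triangle[, j + 1])
  sum(triangle[keep, j + 1]) / sum(triangle[keep, j])
})

# Project every accident year to ultimate and derive the reserve estimate.
latest   <- apply(triangle, 1, function(row) row[max(which(!is.na(row)))])
last_dev <- apply(triangle, 1, function(row) max(which(!is.na(row))))
ultimate <- sapply(1:nrow(triangle), function(i) latest[i] * prod(f[seq_len(n - 1) >= last_dev[i]]))
reserve  <- ultimate - latest
reserve   # estimated outstanding claims per accident year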


Listing 1 - 10 of 36 (page 1 of 4)