Billions of dollars are lost every year to fraudulent credit card transactions. Efficient machine learning techniques could help detect fraud and thereby reduce these losses. Each time a credit card is used, transactional data composed of various attributes (e.g. credit card identifier, transaction date, transaction amount, country where the transaction took place) is recorded. Automatic fraud detection systems are now essential, because human analysts usually cannot spot fraudulent behaviour given the large number of transactions and variables involved. The ultimate goal of a fraud detection algorithm is to label new transactions as legitimate or fraudulent. Two kinds of techniques can be used for this purpose: supervised techniques, which make use of labelled transactions, and unsupervised techniques, which do not use any labels. Supervised techniques assume that reliable class labels of past transactions are available; these labelled observations are used to predict the class labels of new transactions. Unsupervised techniques, on the other hand, make no use of the transaction classes. Instead, they try to find fraudulent behaviour by grouping transactions together or by finding rare observations that deviate from the usual behaviour of the majority of the data. The latter is also called anomaly detection. Both supervised and unsupervised methods rely strongly on a clear notion of similarity, since the goal is to learn from instances based on how similar they are. Unsupervised techniques group similar objects together, while supervised techniques use nearby objects to decide on the label of a new instance. Many machine learning systems require numerical input attributes in order to measure similarity appropriately.
However, these techniques often ignore categorical attributes or fail to handle them properly. The aim of this thesis is to use heterogeneous distance functions in various machine learning techniques for credit card fraud detection. These distance functions are designed to handle data with numerical attributes, categorical attributes, or both. Adding nominal attributes and using a heterogeneous distance function are shown to be useful, leading to improved versions of the existing (numerical) techniques.
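
To make the idea of a heterogeneous distance function concrete, the sketch below implements one well-known example, the Heterogeneous Euclidean-Overlap Metric (HEOM). The abstract does not say which distance functions the thesis uses, so HEOM is an assumption chosen for illustration. The attribute names, example transactions, and value ranges are likewise hypothetical.

```python
import math


def heom(a, b, categorical, ranges):
    """Heterogeneous Euclidean-Overlap Metric between two records.

    a, b        : dicts mapping attribute name -> value (None = missing)
    categorical : set of attribute names treated as nominal
    ranges      : dict mapping numerical attribute name -> (max - min)
    """
    total = 0.0
    for key in a:
        x, y = a[key], b[key]
        if x is None or y is None:
            d = 1.0  # a missing value counts as maximally distant
        elif key in categorical:
            d = 0.0 if x == y else 1.0  # overlap metric for nominal values
        else:
            r = ranges[key]
            d = abs(x - y) / r if r > 0 else 0.0  # range-normalised difference
        total += d * d
    return math.sqrt(total)


# Hypothetical transactions, for illustration only.
categorical = {"country"}
ranges = {"amount": 1000.0, "hour": 23.0}

t1 = {"amount": 20.0, "hour": 10, "country": "BE"}
t2 = {"amount": 120.0, "hour": 14, "country": "BE"}
t3 = {"amount": 20.0, "hour": 10, "country": "US"}

d_same_country = heom(t1, t2, categorical, ranges)
d_other_country = heom(t1, t3, categorical, ranges)
```

Here `t1` and `t3` have identical numerical attributes, so a purely numerical distance would treat them as the same point; the overlap term on `country` is what separates them. A distance of this kind could be plugged into a nearest-neighbour classifier or a distance-based anomaly detector in place of the plain Euclidean distance.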