Listing 1 - 10 of 67 | << page >> |
The first concepts behind what is now called deep learning are not new. They appeared in the fifties, when researchers tried to mimic the behaviour of the human brain. Since then, the history of what were called neural networks has experienced many ups and downs. Recent breakthroughs in their design, combined with greater computing power and the availability of large data sets, have made it possible to use deep learning algorithms to solve difficult problems in domains such as computer vision or language processing with far better accuracy than existing methods. The purpose of this master's thesis is to study the true potential of deep learning applications in the real world, with a special focus on the impact they can have on the world of work. While much research is still ongoing and many applications will not be ready for years, it appears that companies can already use deep learning in many ways, although it does not always lead to the same result. Some companies are engaged in a race involving heavy investment to become the leaders of a revolution. Others try to benefit from the known advantages of the technology to offer new services, or alternatives to existing ones, without having such a revolutionary impact on the world. Massive job losses are to be expected with the deployment of some of these applications, and the others will definitely have an impact on the daily work of many people. Nevertheless, in the short and medium term, this impact will most probably consist of helping workers with new tools and freeing them from the most repetitive tasks, while leaving them the orchestration and the important decisions. It will therefore result in a better quality of services and products in many domains.
This master's thesis aims to explore different techniques (architecture design, pruning as an architecture search, knowledge distillation, quantization) to improve the inference time of convolutional neural networks performing image classification on an embedded device.
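As an illustration of one of these techniques, unstructured magnitude pruning can be sketched in a few lines: the smallest-magnitude weights of a layer are zeroed out, after which the sparse layer can be fine-tuned or used to guide an architecture search. The layer shape and sparsity level below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Example: prune 50% of a small (3x3) convolution kernel
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3))
w_pruned = magnitude_prune(w, 0.5)
```

In practice the pruned network is usually fine-tuned afterwards to recover any lost accuracy, and structured variants (removing whole filters) tend to yield larger speed-ups on embedded hardware than unstructured sparsity.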
"Artificial intelligence is no match for natural stupidity" is a famous quote often attributed to Albert Einstein. Born in the fifties, artificial intelligence has been a source of fascination from the start. It has undergone a spectacular evolution in recent years and continues to develop alongside advances in technology and computing. Nowadays, artificial intelligence plays a fundamental role in research and is ubiquitous in our daily lives. Since the detection of objects in a video stream with deep learning has become accessible and efficient thanks to increasingly powerful processors, this principle is adapted here to serve as a door-opening sensor. This thesis was conducted at BEA, a sensor company and leading manufacturer of sensing solutions for automatic door systems. The company was founded in 1965 and its headquarters are located in Liège, Belgium. As artificial intelligence is making great strides in our technologies, the company has decided to experiment with this approach. The detection and classification of objects in a video stream through deep learning is the main subject of this work. Using it in real time is a big challenge, because it was necessary to handle a neural network accelerator while ensuring that the model used was not too large. Despite this constraint, the precision and accuracy still had to be sufficient for the application. In order to be independent of the position and orientation of the camera for future applications, a ground-projection algorithm was implemented. Then, to improve the robustness of the process, a Kalman filter was applied to the detected objects and a track was assigned to each of them. Subsequently, door-opening decision-making and people counting were implemented, and everything was adapted to the Raspberry Pi 3 B+ embedded system.
Finally, various evaluation tests were carried out to demonstrate the system's fidelity and its promising potential.
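As a rough illustration of the tracking step, a minimal linear Kalman filter with a constant-velocity motion model can be sketched as follows; the state layout, noise covariances and unit time step are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict the next state and its covariance
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the new measurement z
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity model in 1-D: state = [position, velocity], dt = 1
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])             # only position is measured
Q = 0.01 * np.eye(2)                   # process noise
R = np.array([[0.5]])                  # measurement noise

x = np.array([0.0, 0.0])
P = np.eye(2)
for t in range(1, 10):                 # object moving at unit speed
    z = np.array([float(t)])
    x, P = kalman_step(x, P, z, F, H, Q, R)
```

In a tracker, one such filter is attached to each detected object, the predict step bridges frames with missed detections, and data association matches new detections to existing tracks.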
A multi-modal neural network exploits information from different channels and of different kinds (e.g., images, text, sounds, sensor measurements) in the hope that the information carried by each mode is complementary, in order to improve the predictions of the neural network. Nevertheless, in realistic situations, varying levels of perturbation can occur in the data of the modes, which may decrease the quality of the inference process. An additional difficulty is that these perturbations vary between the modes and on a per-sample basis. This work presents a solution to this problem. The three main contributions are described below. First, a novel attention module is designed, analysed and implemented. This attention module is constructed to help multi-modal networks handle modes with perturbations. Secondly, two new regularizers are developed to improve the generalization of the robustness gain to modes failing more severely (relative to the training set). Lastly, a unified multi-modal attention module is presented, combining the main types of attention mechanisms in the deep learning literature with our module. We suggest that this unified module could be coupled with a prediction model to enable the latter to face unexpected situations and improve the extraction of the relevant information from the data.
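The core idea of down-weighting perturbed modes can be illustrated with a simple softmax-based fusion of per-modality feature vectors; the scoring values and feature dimensions below are illustrative assumptions, not the module proposed in the thesis.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def attention_fusion(features, scores):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: list of (d,) arrays, one per modality
    scores:   (n_modes,) relevance scores (e.g. from a learned scorer)
    """
    weights = softmax(scores)
    stacked = np.stack(features)   # (n_modes, d)
    return weights @ stacked       # weighted sum, shape (d,)

# Example: a modality judged unreliable gets a low score and is down-weighted
image_feat = np.array([1.0, 0.0, 0.0])
audio_feat = np.array([0.0, 1.0, 0.0])
fused = attention_fusion([image_feat, audio_feat], np.array([2.0, -2.0]))
```

In a trained network the scores would themselves be produced by a small learned sub-network, so that a perturbed mode receives a low weight at inference time on a per-sample basis.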
In this thesis, we tackle the problem of restoring an audio frame given the preceding and subsequent ones, i.e. audio inpainting, and extend our proposed solution to the prediction of an audio frame given the previous one. We consider frames of 64 and 128 milliseconds. The proposed solution combines a signal-processing pipeline with a generative adversarial network (GAN). Using as input the absolute value of the STFT of the surrounding frames, the network is able to retrieve the STFT magnitude corresponding to the gap frame. By applying the Griffin-Lim algorithm, we can then also estimate the STFT phase and finally reconstruct the missing audio frame through the inverse STFT. We compare our method against a linear predictive coding (LPC) baseline. The proposed solution shows encouraging results with respect to the baseline, both for inpainting and for prediction. It outperforms the baseline in terms of signal-to-noise ratio (SNR) on the magnitude spectrum and performs equally well or better in terms of the objective difference grade (ODG), a measure used to assess perceived audio quality. Since the phase of the STFT can only be approximately reconstructed by the Griffin-Lim algorithm, the baseline shows better performance in terms of audio SNR. We further show the model's generalization ability by training and testing on two different types of music datasets.
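A minimal sketch of the Griffin-Lim phase-estimation step is shown below, using SciPy's STFT routines: starting from a random phase, the signal is repeatedly resynthesized and re-analysed, keeping the target magnitude and only the updated phase at each iteration. The window length, iteration count and test signal are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=50, nperseg=256):
    """Estimate a time signal whose STFT magnitude matches `magnitude`."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        _, x = istft(magnitude * phase, nperseg=nperseg)      # resynthesize
        _, _, Z = stft(x, nperseg=nperseg)                    # re-analyse
        phase = np.exp(1j * np.angle(Z))                      # keep new phase
    _, x = istft(magnitude * phase, nperseg=nperseg)
    return x

# Example: recover a sine tone from its STFT magnitude alone
t = np.arange(4096)
signal = np.sin(2 * np.pi * 0.05 * t)
_, _, Z = stft(signal, nperseg=256)
reconstructed = griffin_lim(np.abs(Z))
```

Because the recovered phase is only consistent, not exact, the waveform may differ from the original by more than the magnitude error suggests, which matches the SNR behaviour reported above.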
Today, for Billy and many accounting firms, invoice information is usually encoded manually by the accountant or by low-performance software, so a lot of time is wasted on encoding rather than on advice. Indeed, during the year 2021, Billy's accountants spent 37% of their time on encoding. Consequently, the recognition of fields in semi-structured documents of variable layout (i.e. invoices) is a growing need for accountants, and especially for Billy, whose number of new customers increases every month. Meanwhile, the text and image pre-training strategies of Transformer-based models have proven to be efficient in the field of document understanding. Several OCR tools were therefore tested, and the Azure OCR tool, which gave the best performance, was selected to extract text from the invoice images of Billy's customers in order to create datasets. This allowed the elaboration of four partially annotated datasets, named BTT, BTT Star, BTT QV, and BTT QV Date, created from scanned purchase and sales documents and their accounting encoding in the Horus accounting software. Then, the pre-trained multi-modal models LayoutLMv2_BASE, LayoutLMv2_LARGE, and LayoutXLM_BASE were fine-tuned. In contrast to the previous models of the LayoutLM family, these models include, in addition to the text and layout information, information provided by the document image. Thanks to spatial-aware self-attention mechanisms integrated into the Transformer architecture, the model is able to interpret relations across different bounding boxes. According to Billy's accountants, invoice information is recognized by Horus in 70% of the cases. During the experiments conducted in this master's thesis, it was shown that on token-classification tasks, high results were obtained for the different datasets in terms of F1-score: BTT (0.9420), BTT Star (0.9553), BTT QV (0.9413) and BTT QV Date (0.9472).
In addition, similar state-of-the-art results were obtained on the open-source CORD dataset, with an F1-score of 0.9354. Moreover, the impact of a model pre-trained on a dataset composed only of English documents (LayoutLMv2_BASE) was studied in comparison with a model pre-trained on a multi-lingual dataset (LayoutXLM_BASE) for classifying tokens on documents mostly in French. The results show that the choice of pre-trained model does not have a great impact on the final result for this type of task: BTT (0.9323 -> 0.9338), BTT Star (0.9354 -> 0.9468), BTT QV (0.9229 <- 0.8955), and BTT QV Date (0.9411 <- 0.9328). To conclude, although the results of the four datasets were close to each other, the dataset BTT Star produced the best results. This dataset has the largest number of labels, the most widely distributed over the documents, leading to the hypothesis that a more widely distributed set of labels provides better results. Finally, to make this work concrete, a web application was developed in parallel so that both accountants and customers can use this tool in everyday life.
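For reference, the F1-scores reported above combine precision and recall; a minimal token-level computation for a single field type can be sketched as follows (the label names and token sequences are purely illustrative, not drawn from the BTT datasets).

```python
def f1_score(true_labels, pred_labels, positive):
    """Token-level F1 for one target label (e.g. one invoice field type)."""
    tp = sum(t == positive and p == positive for t, p in zip(true_labels, pred_labels))
    fp = sum(t != positive and p == positive for t, p in zip(true_labels, pred_labels))
    fn = sum(t == positive and p != positive for t, p in zip(true_labels, pred_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: one token wrongly predicted as TOTAL lowers precision but not recall
true_labels = ["DATE", "O", "TOTAL", "O"]
pred_labels = ["DATE", "TOTAL", "TOTAL", "O"]
score = f1_score(true_labels, pred_labels, "TOTAL")
```

Reported dataset-level F1-scores are typically an average of such per-label scores (micro or macro), which is why label distribution across documents can influence the final figure.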
The objective is to start from existing techniques, test those techniques on fast-moving sports images, and propose, implement, and test extensions where needed.
Interpolation --- Image --- Deep Learning --- Machine Learning --- Video --- Frame --- Convolution --- EVS --- Engineering, computing & technology > Computer science
Being able to automatically personalize football content according to each viewer's preferences is the future of the broadcasting world. One of the required steps is classifying the current game event in a football video stream. During the game, the main camera films a panoramic view of the football action. This master's thesis introduces a three-stage framework in which global features such as the field, the players and the lines are extracted from the video stream of the main camera and processed into second-stage features that are used inside a decision mechanism to predict the current game event. Computer vision and machine learning techniques such as deep learning networks and semantic segmentation are used to extract the main features. From these, second-stage features such as the main circle or the mean position of the players on the field are computed and handed to a decision tree that classifies the current game event. A playout application developed in this master's thesis allows the user to visualize some of the features computed by the algorithms and the predicted game event. The performance of the feature extraction developed in this work is impressive, especially for the line-extraction problem, for which we obtain a global accuracy of 95% at the pixel level. While the interpretation of game events remains challenging and unsolved, our framework achieves an accuracy of more than 90% for the classification between attack, defense and midfield play.
The goal of this research work is to evaluate and improve segmentation algorithms for the automatic detection and contouring of tumours in high-resolution tissue images. This work will allow the student to deepen their understanding, use and adaptation of methods based on tree ensembles or deep networks (deep learning) on large quantities of images related to concrete problems in the biomedical domain.
image segmentation --- machine learning --- u-net --- deep learning --- Engineering, computing & technology > Computer science
Variable and feature selection have become the focus of much research, especially in bioinformatics, where there are many applications. Machine learning is a powerful tool for selecting features; however, not all machine learning algorithms are on an equal footing when it comes to feature selection. Many methods have been proposed to carry out feature selection with random forests, which makes them the current go-to model in bioinformatics. On the other hand, thanks to so-called deep learning, neural networks have benefited from a huge resurgence of interest in the past few years. However, neural networks are black-box models, and very few attempts have been made to analyse their underlying process. Indeed, quite a few articles can be found about feature extraction with neural networks (for which the underlying input-output process does not need to be understood), while very few tackle feature selection. In this document, we propose new algorithms to carry out feature selection with deep neural networks. To assess our results, we generate regression and classification problems which allow us to compare each algorithm on multiple fronts: performance, computation time and constraints. The results obtained are very promising, since we manage to achieve our goal by surpassing (or equaling) the performance of random forests in every case (random forests were set as our "state-of-the-art" comparison). Given the promising results obtained on artificial datasets, we also tackle the DREAM4 challenge. Due to the very small number of samples available in its datasets, this challenge is supposedly an ill-suited problem for neural networks. We were nevertheless able to achieve near-state-of-the-art results. Finally, extensions are given for most of our methods. The algorithms discussed are highly modular and can be adapted to the problem faced.
For example, we explain how one of our algorithms can be adapted to prune neural networks without losing accuracy.
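As a toy illustration of selecting features from a trained model's weights, the sketch below trains a linear model by gradient descent and ranks input features by the magnitude of their learned weights, a crude stand-in for inspecting the first layer of a deep network. The data and hyperparameters are illustrative assumptions, not one of the algorithms proposed in the thesis.

```python
import numpy as np

def rank_features(X, y, epochs=500, lr=0.1):
    """Rank features by |weight| of a linear model fit with gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / n   # gradient of mean squared error
        w -= lr * grad
    return np.argsort(-np.abs(w))      # most important feature first

# Example: only feature 2 actually drives the target
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)
order = rank_features(X, y)
```

A deep-network analogue must contend with correlated features and non-linear interactions, which is precisely why dedicated selection algorithms, rather than raw weight inspection, are needed.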