Listing 1 - 4 of 4
Random forests are a statistical learning method that is now one of the core tools of statisticians and other data scientists. Introduced by Leo Breiman in 2001, they have since been used intensively in many fields of application (for example ecology, pollution forecasting, and health), owing to the algorithm's very good predictive performance, but also to their generality, as they impose very few restrictions on the nature of the data. Indeed, they are suited to supervised classification problems as well as regression problems. In addition, they can handle a mix of qualitative and quantitative explanatory variables. Finally, they can process standard data, for which the number of observations exceeds the number of variables, but they also behave very well on high-dimensional data, where the number of variables is very large. This book is a statistical presentation of random forests, oriented toward applications. It is therefore aimed primarily at students in programs that include statistics courses, but also, of course, at practitioners in the field. To give a sense of the expected level, a scientific bachelor's degree (licence) is entirely sufficient to benefit from the concepts, methods, and tools introduced. On the computing side, the prerequisites are modest, but an introduction to the R language is useful for fully mastering the use of random forests.
Decision trees --- Mathematical statistics --- Data processing
This book offers an application-oriented guide to random forests: a statistical learning method extensively used in many fields of application, thanks to its excellent predictive performance, but also to its flexibility, which places few restrictions on the nature of the data used. Indeed, random forests can be adapted to both supervised classification problems and regression problems. In addition, they allow us to consider qualitative and quantitative explanatory variables together, without pre-processing. Moreover, they can be used to process standard data for which the number of observations is higher than the number of variables, while also performing very well in the high-dimensional case, where the number of variables is quite large in comparison to the number of observations. Consequently, they are now among the preferred methods in the toolbox of statisticians and data scientists. The book is primarily intended for students in programs that include statistics courses, but also for practitioners in statistics and machine learning. A scientific undergraduate degree is quite sufficient to take full advantage of the concepts, methods, and tools discussed. In terms of computer science skills, little background knowledge is required, though an introduction to the R language is recommended. Random forests are part of the family of tree-based methods; accordingly, after an introductory chapter, Chapter 2 presents CART trees. The next three chapters are devoted to random forests. They focus on their presentation (Chapter 3), on the variable importance tool (Chapter 4), and on the variable selection problem (Chapter 5), respectively. After discussing the concepts and methods, we illustrate their implementation on a running example. Various complements are then provided before additional examples are examined. Throughout the book, each result is given together with the code (in R) that can be used to reproduce it.
Thus, the book offers readers essential information and concepts, together with examples and the software tools needed to analyse data using random forests.
Statistics. --- Big data. --- Bioinformatics. --- Statistical Theory and Methods. --- Big Data. --- Statistics for Life Sciences, Medicine, Health Sciences. --- Statistics for Social Sciences, Humanities, Law. --- Bio-informatics --- Biological informatics --- Biology --- Information science --- Computational biology --- Systems biology --- Data sets, Large --- Large data sets --- Data sets --- Statistical analysis --- Statistical data --- Statistical methods --- Statistical science --- Mathematics --- Econometrics --- Data processing --- Mathematical statistics. --- Statistical inference --- Statistics, Mathematical --- Statistics --- Probabilities --- Sampling (Statistics) --- R (Computer program language). --- GNU-S (Computer program language) --- Domain-specific programming languages --- R (Computer program language)
This book offers an application-oriented guide to random forests: a statistical learning method extensively used in many fields of application, thanks to its excellent predictive performance, but also to its flexibility, which places few restrictions on the nature of the data used. Indeed, random forests can be adapted to both supervised classification problems and regression problems. In addition, they allow us to consider qualitative and quantitative explanatory variables together, without pre-processing. Moreover, they can be used to process standard data for which the number of observations is higher than the number of variables, while also performing very well in the high-dimensional case, where the number of variables is quite large in comparison to the number of observations. Consequently, they are now among the preferred methods in the toolbox of statisticians and data scientists. The book is primarily intended for students in programs that include statistics courses, but also for practitioners in statistics and machine learning. A scientific undergraduate degree is quite sufficient to take full advantage of the concepts, methods, and tools discussed. In terms of computer science skills, little background knowledge is required, though an introduction to the R language is recommended. Random forests are part of the family of tree-based methods; accordingly, after an introductory chapter, Chapter 2 presents CART trees. The next three chapters are devoted to random forests. They focus on their presentation (Chapter 3), on the variable importance tool (Chapter 4), and on the variable selection problem (Chapter 5), respectively. After discussing the concepts and methods, we illustrate their implementation on a running example. Various complements are then provided before additional examples are examined. Throughout the book, each result is given together with the code (in R) that can be used to reproduce it.
Thus, the book offers readers essential information and concepts, together with examples and the software tools needed to analyse data using random forests.
Statistical science --- Biomathematics. Biometry. Biostatistics --- Information systems --- bioinformatics --- statistics --- data analysis
Statistical science --- Biomathematics. Biometry. Biostatistics --- Information systems --- bioinformatics --- statistics --- data analysis