TY - THES
ID - 146384115
TI - Master thesis : Large-scale gene regulatory network inference from single-cell RNA seq data
AU - Paquot, Sarah
AU - Geurts, Pierre
AU - Wehenkel, Louis
AU - Meyer, Patrick
AU - Huynh-Thu, Vân Anh
PY - 2018
PB - Liège Université de Liège (ULiège)
DB - UniCat
KW - machine learning
KW - XGBoost
KW - GRN inference
KW - clustering
KW - single-cell
KW - Engineering, computing & technology > Computer science
UR - https://www.unicat.be/uniCat?func=search&query=sysid:146384115
AB - Uncovering and modeling gene regulatory networks (GRNs) is one of the long-standing challenges in systems biology. It requires computationally predicting, from gene expression data, the direct regulatory interactions between transcription factors and their target genes. Together, these predicted interactions form a GRN. Several techniques have been tested to address this problem. Among them, GENIE3 is one of the top-performing methods. Its main drawback, however, is that it is slow. With traditional sequencing methods, only the mean of the gene expression values over a mix of millions of cells could be obtained. New techniques now produce single-cell RNA-seq data, which contain the expression level in every individual cell. This raises two main challenges. The first is computational, since single-cell data yield much larger expression matrices than traditional methods. The second is that different cell types are now visible in the data; they could not be distinguished before, when only means of expression values over different cells were available. One strategy is to cluster the data so that each cluster corresponds to a cell type present in the data. Our first contribution in this context is a variant of GENIE3 that uses boosting to make it faster and applicable to single-cell datasets. The results are very promising: the variant turns GENIE3 from a very slow method into a very fast one, with the same, and sometimes better, performance. The boosting method has the drawback, however, of depending on many parameters. Our second contribution is to propose three regulatory network-based methods for clustering cells from single-cell data. The results were not as good as expected, but they call for further investigation in this direction. Better results could probably be obtained by further analyzing some parameters.
ER -