Listing 1 - 5 of 5 |
Gene expression profile comparison is a rapidly evolving topic that holds great potential for drug research and for science as a whole. Protein interaction networks have already proven useful in different research areas. Here we propose combining the information embedded in protein interaction networks with gene expression profile comparison. For this purpose, a dataset was constructed consisting of 8 drug and 45 disease gene expression profiles acquired with three different Affymetrix Human Genome platforms. Gene expression profiles were compared using different interaction networks, and different implementations were considered in order to identify the best conditions for incorporating protein interaction networks. The Connectivity Map was taken as a guiding example for gene expression profile comparison. Protein interaction network information was infused into the gene expression profiles through matrix multiplication. Of the methodologies tested, incorporating the STRING network gave the best results, with an improvement of 20% compared to the original approach. This suggests that protein interaction information can indeed be a valuable asset in gene expression profile comparison. Furthermore, we have shown that cosine similarity, a newly introduced measure in this context, performs as well as the original approaches, such as Pearson correlation.
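The abstract does not specify the exact matrix used to infuse network information into the profiles. The sketch below assumes one simple form, x' = (I + αÂ)x, where Â is the row-normalized adjacency matrix of the interaction network, and then compares two profiles with cosine similarity and Pearson correlation. The function names, the value of α, and the toy network and profiles are all hypothetical.

```python
import numpy as np


def infuse_network(profile, adjacency, alpha=1.0):
    """Propagate expression values over an interaction network (assumed form).

    Computes x' = (I + alpha * A_norm) @ x, where A_norm is the row-normalised
    adjacency matrix: each gene keeps its own value and gains alpha times the
    mean value of its interaction partners. Genes without partners are unchanged.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1, keepdims=True)
    A_norm = np.divide(A, degree, out=np.zeros_like(A), where=degree > 0)
    W = np.eye(A.shape[0]) + alpha * A_norm
    return W @ np.asarray(profile, dtype=float)


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


# Toy example: 4 genes on a path network 0-1-2-3 (hypothetical data).
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
drug = np.array([2.0, -1.0, 0.5, -2.0])      # drug signature (log fold changes)
disease = np.array([-1.5, 0.8, -0.2, 1.9])   # disease signature

for name, (d, s) in {"raw": (drug, disease),
                     "network": (infuse_network(drug, adjacency),
                                 infuse_network(disease, adjacency))}.items():
    # A negative score suggests the drug reverses the disease signature,
    # the intuition behind Connectivity Map-style comparisons.
    print(name, round(cosine_similarity(d, s), 3), round(pearson(d, s), 3))
```

Cosine similarity equals Pearson correlation when both profiles are mean-centered, which is one plausible reason the two measures perform similarly on fold-change profiles that are already roughly centered.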
This thesis project focuses on differential gene expression in two human embryonic stem cell lines, H1 and H9, with particular attention to genes related to the Activin/Nodal signaling pathway. Affymetrix microarray data from undifferentiated H1 and H9 cells were analyzed using several tools, including R. The analysis was carried out both genome-wide and specifically for the Activin/Nodal pathway, to determine whether these genes are differentially expressed between H1 and H9 cells. The data reveal no significant enrichment of differentially expressed genes in this pathway. Because high variability in the expression of both Activin A and Nodal was observed among samples during these analyses, cell culture methods were also investigated. The analysis shows that cells grown on a layer of feeder cells differ from those in feeder-free cultures.
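The abstract does not state which enrichment test was used. One common choice is a one-sided hypergeometric test, which asks whether the differentially expressed genes contain more pathway genes than expected by chance. The sketch below implements it in plain Python (the thesis itself worked in R); the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_pathway: int, n_de: int, k: int) -> float:
    """One-sided p-value P(X >= k) that k or more of the n_de differentially
    expressed genes fall in a pathway of n_pathway genes, out of n_genes
    measured genes, if DE genes were drawn at random (hypergeometric model)."""
    total = comb(n_genes, n_de)
    upper = min(n_pathway, n_de)
    tail = sum(comb(n_pathway, i) * comb(n_genes - n_pathway, n_de - i)
               for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic until this final division


# Hypothetical counts: 20,000 genes on the array, 40 in the pathway,
# 1,000 called differentially expressed, 3 of them in the pathway.
p = hypergeom_enrichment_p(20_000, 40, 1_000, 3)
print(f"expected in pathway: {40 * 1_000 / 20_000:.1f}, p = {p:.3f}")
```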
The viral genome is simultaneously subjected to positive selection pressure, exerted by the host immune system and by treatment, and to functional and structural constraints that favor site conservation. This project aims to map the influence of these conflicting forces along the hepatitis C virus (HCV) genome; understanding them can aid future drug design. Nucleotide variability at each site will be calculated, RNA structures predicted, and existing information on protein structure gathered; correlations between these factors will be noted and conclusions drawn.
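Per-site nucleotide variability can be quantified in several ways, and the abstract does not say which measure is used. The sketch below assumes Shannon entropy over each column of a multiple sequence alignment, ignoring gaps and ambiguous `N` bases. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def site_entropy(column: str, ignore: str = "-N") -> float:
    """Shannon entropy (bits) of one alignment column, skipping gaps/ambiguous bases."""
    counts = Counter(base for base in column.upper() if base not in ignore)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column contains only gaps or Ns
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def per_site_variability(alignment: list[str]) -> list[float]:
    """Entropy at every position of a multiple sequence alignment (equal-length strings)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [site_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment of four HCV fragments.
alignment = ["ACGTTA", "ACGATA", "ACTCTA", "ACAGTG"]
print(per_site_variability(alignment))  # 0.0 = fully conserved, 2.0 = maximally variable
```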
Next-generation sequencing (NGS) technologies have made it convenient for researchers to use whole-genome sequencing to answer their questions. However, the resulting flood of sequencing data requires substantial computing power to process. Many researchers use scale-out networks in place of a supercomputer. There are many use cases of Apache Hadoop for coordinating distributed computation and of HBase as a storage platform. Apart from sequence read assembly, however, scale-out networks are rarely used to handle gene variation data from NGS. In this study, we propose an integration of Apache Hadoop, HBase and Hive to efficiently analyze NGS output such as VCF files, a commonly used file format for gene variation data. We also developed an application named 'H3 VCF' that lets researchers with an intermediate level of IT skills use the integration conveniently. We ran repeated tests comparing the performance of our proposal with that of a traditional solution for VCF files. The tests show that the proposed integration, running in cluster mode, performs much better than the traditional solution when a file is extremely large. Its performance is also acceptable when several large and small files are processed together. We therefore conclude that the proposed integration could be a good alternative solution for VCF file management, and that the newly developed application, H3 VCF, can help researchers work with the integration easily.
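Before VCF data can be loaded into Hive tables or HBase, each data line has to be split into its fields. The sketch below parses the eight fixed columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and builds a row key. It does not use H3 VCF, Hive or HBase APIs; the record type and the row-key layout are hypothetical illustrations, not the design described in the thesis.

```python
from typing import Iterable, Iterator, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    vcf_id: str
    ref: str
    alt: list[str]
    qual: str
    filter_status: str
    info: dict[str, str]


def parse_info(field: str) -> dict[str, str]:
    """Split an INFO field like 'DP=14;DB' into a dict; flags get the value 'true'."""
    if field == ".":
        return {}
    info = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else "true"
    return info


def parse_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield one record per data line, using only the 8 fixed VCF columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip '##' meta-information lines and the '#CHROM' header
        chrom, pos, vcf_id, ref, alt, qual, flt, info = line.split("\t")[:8]
        yield VcfRecord(chrom, int(pos), vcf_id, ref, alt.split(","), qual, flt,
                        parse_info(info))


def row_key(rec: VcfRecord) -> str:
    """Hypothetical HBase-style row key: chromosome, zero-padded position, allele change."""
    return f"{rec.chrom}:{rec.pos:010d}:{rec.ref}>{','.join(rec.alt)}"


sample = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=14;DB\n"
)
for rec in parse_vcf(sample.splitlines()):
    print(row_key(rec), rec.info)
```

Zero-padding the position makes the lexicographic order of row keys agree with numeric order within a chromosome, which matters because HBase stores rows sorted by key.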
Human genetic variation is both obvious and hidden. Humans share a large proportion of their genetic information, but the differences are the more interesting part. Our differences in physical appearance are straightforward evidence of the genetic diversity in the human species, but genetic diversity goes deeper than the color of our eyes or the length of our toes. DNA molecules encode all of our unique genetic information, part of which is associated with physical traits and part of which is not. Human genetic variation is structured in space as a result of the cumulative action of various processes: natural selection, gene migration and random chance. One of the key sources of this structure is that gene migration is spatially limited. We do not choose our partners entirely at random from the whole population, but restrict our choice to the people we meet. As a result, people whose origins are geographically distant are genetically less similar. The spatially restricted dispersal of genetic information does not happen abruptly at a political border or at the boundary of some other classification, but gradually. Consequently, gradual patterns in genetic variation can be observed, especially within continents in the absence of major barriers to gene migration. Within the European population, it is widely agreed that genetic variation is structured as a gradient. Traditional principal component analysis has been the standard method for summarizing a large number of genetic variants in just a few main dimensions and for describing and visualizing this pattern. It is a conceptually straightforward method that does not require any modelling of population genetic processes. However, it is not without problems. The main distorting factor is precisely what it attempts to represent: the genetic information observed in one individual is not independent of the genetic information observed in other individuals. In other words, there is correlation through space.
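Traditional PCA of genetic variants can be sketched as a singular value decomposition of the centered genotype matrix. The sketch below assumes genotypes coded as 0/1/2 allele counts in an individuals-by-variants array and uses random toy data; many studies additionally scale each variant or prune closely linked variants first, which is omitted here. The function name is hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x variants genotype matrix coded 0/1/2 (assumed coding)."""
    G = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: they carry no information about population structure.
    G = G[:, G.std(axis=0) > 0]
    # Center each variant (column) on its mean allele count.
    X = G - G.mean(axis=0)
    # Thin SVD: X = U diag(S) Vt; principal component scores are U * S.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(50, 200))   # 50 individuals, 200 variants (toy data)
pcs, var_ratio = genotype_pca(geno, n_components=2)
print(pcs.shape, np.round(var_ratio, 3))
```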
Two potentially valuable methods that explicitly take spatial information into account are spatial principal component analysis and spatial factor analysis. Although both methods have their own drawbacks, they may offer interesting new perspectives. Regardless of the method, the results and derived conclusions may be strongly affected by choices in the study design, particularly in sampling design and data preprocessing. A strong correlation between genetics and geography has been confirmed using traditional principal component analysis and Procrustes analysis. The first axis of genetic variation is oriented northwest-southeast but is strongly correlated with latitude. The second axis of genetic variation is oriented northeast-southwest and is strongly correlated with longitude. This conclusion is robust to at least one of the data preprocessing criteria: the level of association (linkage disequilibrium) allowed between genetic variants located close to each other on the genome.
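Procrustes analysis finds the translation, rotation and uniform scaling that best map one configuration of points onto another; applied here, it would map individuals' positions on the first two principal components onto their geographic coordinates. The sketch below is a generic least-squares (orthogonal) Procrustes fit with scaling, not necessarily the exact variant used in the thesis; the function name is hypothetical and the data are synthetic.

```python
import numpy as np


def procrustes_fit(pcs, coords):
    """Least-squares fit of coords ~ s * (pcs - mean) @ R + mean(coords).

    R is orthogonal (it may include a reflection) and s is a uniform scale.
    Returns the fitted coordinates, R and s.
    """
    X = np.asarray(pcs, dtype=float)
    Y = np.asarray(coords, dtype=float)
    Xc = X - X.mean(axis=0)          # remove translation
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt                       # optimal orthogonal transformation
    s = S.sum() / np.sum(Xc ** 2)    # optimal uniform scaling
    fitted = s * Xc @ R + Y.mean(axis=0)
    return fitted, R, s


# Synthetic check: coordinates that are an exact rotated, scaled, shifted copy of the PCs.
rng = np.random.default_rng(1)
pcs = rng.normal(size=(30, 2))
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
coords = 2.0 * pcs @ rot + np.array([1.0, -3.0])
fitted, R, s = procrustes_fit(pcs, coords)
print(np.round(R, 3), round(s, 3))
```

The matrix R indicates how the genetic axes are oriented relative to the geographic ones; because the unconstrained orthogonal solution may include a reflection, some analyses restrict R to a proper rotation.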