Hess

Kathrin Hess – (PI) – Laboratory for topology and neuroscience – PhD project 5

Topology is the field of mathematics devoted to the study of shape. Topological data analysis (TDA) provides sophisticated, topology-based methods for dimension reduction, feature extraction, and pattern recognition in large datasets[1],[2]. Global gene expression data samples, for instance, can be considered as point clouds in a high-dimensional space. TDA overcomes shortcomings of existing statistical methods in detecting biologically important gene expression changes in highly variable biological samples[3],[4]. It should enable us to detect hormonally controlled genes that play an important role in the development of breast and skin cancer, based on their cyclic expression during the menstrual cycle, as a function of individual variation and of genetic or hormonal/pharmacological manipulation of cells. We offer one PhD project, the primary aim of which is to realize this goal. To ensure that insights gained from breast studies benefit the projects on skin cancers and vice versa, we will develop TDA methods for skin cancer development as well.

Project 5 – Detection of cyclic changes in gene expression by topological data analysis. 

Hypothesis: Topological data analysis allows us to detect hormonally controlled genes that play an important role in the development of breast and skin cancer, based on their cyclic expression during the menstrual cycle, as a function of individual variation and of genetic or hormonal/pharmacological manipulation of cells.

Background and Significance: Topology is the field of mathematics devoted to the study of shape. Topological data analysis (TDA) is used to reduce dimensions and to recognize patterns[1],[2]. Global gene expression data samples, for instance, can be considered as point clouds in a high-dimensional space. TDA overcomes shortcomings of existing statistical methods in detecting biologically important gene expression changes in highly variable biological samples[3],[4]. The Brisken lab has a Biobank comprising >340 mammoplasty specimens, together with blood samples collected at the time of surgery and a questionnaire with information on the patients’ reproductive status. The Dotto lab has global transcriptomic and epigenetic analyses of skin-derived primary keratinocytes and dermal fibroblasts from >100 individuals of different ancestries, as well as profiles of multiple strains with or without genetic suppression of androgen receptor (AR) expression or hormonal/pharmacological modulation of AR activity.

Objectives: (1) Develop a topological data analysis tool that detects cyclic behaviour in time series of point cloud data; (2) Develop software implementing these methods; (3) Apply the software to RNAseq data from >300 normal breast tissue specimens from premenopausal women in the Biobank of the Brisken laboratory and identify target genes of estrogen receptor (ER), progesterone receptor (PR), and AR signaling in the human breast based on their cyclic expression during the menstrual cycle; (4) Apply the software to RNAseq data from primary keratinocytes and dermal fibroblasts from >100 individuals of different ancestries versus multiple strains of the same cells at various times of increased and suppressed AR activity.
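To give a rough sense of what "detecting cyclic behaviour in point cloud data" can mean, the sketch below counts independent loops (the first Betti number over GF(2)) in a Vietoris-Rips complex built at one fixed distance scale. It is only an illustration under stated assumptions, not the method the project will develop: the function names `gf2_rank` and `rips_betti_1`, the three-gene toy data, and the hand-picked scale of 0.6 are all hypothetical.

```python
import math
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are Python-int bitmasks."""
    pivots = {}  # highest set bit -> stored row with that leading bit
    rank = 0
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                rank += 1
                break
            row ^= pivots[high]  # clear the leading bit and keep reducing
    return rank


def rips_betti_1(points, scale):
    """First Betti number (GF(2)) of the Vietoris-Rips complex at one scale."""
    n = len(points)
    # Edges join samples whose distance is at most `scale`.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= scale]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles are filled in whenever all three of their edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_index and (i, k) in edge_index
                 and (j, k) in edge_index]
    # Boundary of an edge: its two vertices.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges.
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)])
          | (1 << edge_index[(j, k)]) for i, j, k in triangles]
    # b1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Toy data (hypothetical): 12 samples, 3 "genes" per sample.
n_samples = 12
phases = [2 * math.pi * k / n_samples for k in range(n_samples)]
# Two genes oscillate out of phase over one cycle, the third is constant.
cyclic = [(1 + math.cos(t), 1 + math.sin(t), 2.0) for t in phases]
# One gene drifts steadily, the others are constant: no cycle.
linear = [(0.5 * k, 1.0, 2.0) for k in range(n_samples)]

print(rips_betti_1(cyclic, scale=0.6))  # 1: one loop in the point cloud
print(rips_betti_1(linear, scale=0.6))  # 0: no loop
```

In this toy setting the answer depends on the chosen scale: at a much larger scale every pair of samples is connected and the loop is filled in. Persistent homology, the standard TDA tool for this situation, avoids a single hand-picked scale by tracking such features across all scales.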

Related readings

[1] G. Carlsson. Topology and data. Bulletin of the American Mathematical Society 46, 255-308 (2009).

[2] Y. Lee et al. Quantifying similarity of pore-geometry in nanoporous materials. Nat Commun. (2017), DOI: 10.1038/ncomms15396.

[3] M. Nicolau et al. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. PNAS 108, 7265 (2011).

[4] R. Jeitziner et al. Two-Tier Mapper: a user-independent clustering method for global gene expression analysis based on topology. Bioinformatics (2019).