Shi Pan

Research Fellow
Email
s.panobfuscate@ucl.ac.uk

I joined Dr. Secrier's laboratory at University College London (UCL) in 2021 as a research fellow. My current research interests include multimodal foundation models, large language models for multiomics data, computer vision, and their applications in computational pathology tasks. My research vision is to develop more effective AI tools that provide insights distinct from those of human researchers, thereby enhancing our understanding and ultimately enabling us to modify specific biological processes for our purposes.

I received my B.Sc. in Computer Science from Zhengzhou University (Zhengzhou, China) in 2013, my M.Sc. in Advanced Computer Science from the University of Manchester (Manchester, UK) in 2014, and my PhD from the University of Kent (Canterbury, UK) in 2019. Following my PhD, I worked on learned image compression as a Research Associate at Imperial College London and as a Research Scientist at DeepRender.

My recent work follows two main directions: analysing and predicting molecular-level information from histopathology whole slide images using AI models, and leveraging large language models (LLMs) to process and learn from genomic and transcriptomic data in order to understand specific biological processes.

In computational pathology, I focus on predicting gene expression levels and other molecular-level biomarkers from H&E slides. To facilitate this, I developed HistoMIL, a toolkit designed to accelerate the preprocessing of H&E slides and the training of related models. HistoMIL also makes it straightforward to train and apply multiple instance learning algorithms at scale. Additionally, I employ graph neural networks (GNNs) to study and model the tumour microenvironment at the cellular level.

For genomic and transcriptomic data, one of my projects uses GNNs to analyse the relationship between specific gene lists and DNA damage markers, reordering the original gene list and generating a shorter list with equivalent performance. Recently, incorporating language models has enhanced my models' capability to understand and model specific biological processes at the genomic and transcriptomic scales.