>about
I'm Jacob. I am interested in researching gene regulation using machine learning methods, and I enjoy spending time in the wet lab to create tailored datasets for this. I'm now a postdoctoral scientist in the Velten Lab at the CRG in Barcelona.
I did my PhD in the Parts Lab (Wellcome Sanger Institute, Hinxton / Cambridge University) and the Stegle Lab. Before that, I worked in the Hemberg Lab, also at the Sanger (the lab is now at HMS & Brigham and Women's Hospital).
>research
>peer-reviewed
- J Hepkema, NK Lee, BJ Stewart, S Ruangroengkulrith, V Charoensawan, MR Clatworthy, M Hemberg. Predicting the impact of sequence motifs on gene regulation using single-cell data. Genome Biology 24, 189 (2023). https://doi.org/10.1186/s13059-023-03021-9
- In this paper, we developed a shallow convolutional neural network for the prediction of pseudobulked scRNA-seq data from the promoter sequence. The network performs simultaneous de novo inference of sequence motifs, as well as their influence on expression. The analysis tools include ways to summarise the found patterns per cell type category, as well as ways to relate the pseudobulk-specific influence scores to the expression of the relevant TFs in the pseudobulks.
- We additionally show that the network can be trained to predict pseudobulked scATAC-seq data from the peak sequence; what's nice is that if you have a multi-ome dataset, you can relate the influence scores back to the expression patterns of the relevant TFs in the same pseudobulks.
- Also see the associated computational tool, scover: https://github.com/jacobhepkema/scover.
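To make the idea of a shallow convolutional network with per-pseudobulk influence scores more concrete, here is a minimal pure-Python sketch of one forward pass: one-hot encoding, a motif filter slid along the sequence, a ReLU with global max pooling, and a linear layer. This is not the scover code; the function names (`one_hot`, `conv_max_pool`, `predict`), the hand-written TATA filter, and the weights are all hypothetical and chosen for illustration only. In a trained network the filters and influence weights would be learned from data.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T).
    Any other character (e.g. N) becomes an all-zero row."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv_max_pool(encoded, motif):
    """Slide one motif filter along the sequence and return the largest
    activation. Starting from 0.0 acts as a ReLU floor."""
    width = len(motif)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * motif[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, score)
    return best

def predict(seq, motifs, influence, bias):
    """Linear layer on top of the pooled motif activations.
    influence[p][m] is the (hypothetical) weight of motif m in pseudobulk p."""
    encoded = one_hot(seq)
    activations = [conv_max_pool(encoded, m) for m in motifs]
    return [
        b + sum(w * a for w, a in zip(row, activations))
        for row, b in zip(influence, bias)
    ]

# Hypothetical filter that scores 1 per matching base of "TATA".
tata = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
]
influence = [[0.5], [-0.25]]  # two pseudobulks, one motif
bias = [0.0, 1.0]

print(predict("GGTATAGG", [tata], influence, bias))  # [2.0, 0.0]
```

In this toy setup, the sign and size of each entry in `influence` plays the role of a pseudobulk-specific influence score: the same motif raises the prediction in the first pseudobulk and lowers it in the second.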
More cool research on the way :-)
>pre-prints
I've had some part in the following projects:
- J Koeppel, P Murat, G Girling, EM Peets, M Gouley, V Rebernig, A Maheshwari, J Hepkema, J Weller, JH Johnkingsly Jebaraj, R Crawford, FG Liberante, L Parts. Resolution of a human super-enhancer by targeted genome randomisation. BioRxiv (2025). https://doi.org/10.1101/2025.01.14.632548
- For this paper, I benchmarked whether computational models such as ABC, rE2G, and Borzoi can predict the importance of enhancers in a super-enhancer cluster (or the effect of their deletions) on the expression of OTX2 (see supplementary figure 14 in this PDF).
- K Tomberg, L Antunes, Y Pan, J Hepkema, DA Garyfallos, A Mahfouz, A Bradley. Intronization enhances expression of S-protein and other transgenes challenged by cryptic splicing. BioRxiv (2021). https://doi.org/10.1101/2021.09.15.46045
- For this paper, I wrote processing scripts that identify short deletions (splicing events) in long-read direct RNA-seq data by parsing the CIGAR strings of the aligned reads.
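As a rough illustration of the CIGAR-based approach, the sketch below walks through a CIGAR string and reports the reference intervals covered by gap operations. It is not the script used in the paper: the function name `find_reference_gaps`, its parameters, and the choice to treat both `D` (deletion) and `N` (skipped region) as candidate gaps are assumptions made for this example. Which operation an aligner emits for a splice depends on the aligner and its settings.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")

def find_reference_gaps(cigar, ref_start, min_len=1, max_len=None, ops="DN"):
    """Return (start, end, op) tuples for gap operations in a CIGAR string.

    Coordinates are 0-based, half-open reference positions. ref_start is the
    0-based leftmost aligned reference position of the read.
    """
    gaps = []
    pos = ref_start
    for length, op in CIGAR_TOKEN.findall(cigar):
        length = int(length)
        long_enough = length >= min_len
        short_enough = max_len is None or length <= max_len
        if op in ops and long_enough and short_enough:
            gaps.append((pos, pos + length, op))
        # Only some operations advance the position on the reference.
        if op in REF_CONSUMING:
            pos += length
    return gaps

# Keep only gaps of at most 50 bases, so the 100-base skip is ignored.
print(find_reference_gaps("10M5D20M100N15M", ref_start=1000, max_len=50))
# [(1010, 1015, 'D')]
```

Collecting these intervals across all reads and counting how often each one recurs is one way to separate consistent splicing events from sporadic alignment noise.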