Research on the Identification of Potential Bifunctional Genes

My major project, which was also my thesis project, focused on identifying potential bifunctional genes in human colorectal cancer cells, with the aim of clarifying how gene function extends beyond the traditional divide between coding and non-coding genes.

The paradigmatic view in molecular biology holds that protein-coding genes are transcribed into messenger RNAs (mRNAs) that serve as blueprints for protein synthesis, whereas non-coding genes are widely believed to produce only non-coding RNAs, which perform diverse functional roles but have negligible coding potential. However, accumulating evidence suggests that the boundary between protein-coding and non-coding genes is far less strict than this view implies.

In the case of TP53, a well-known tumor suppressor gene, one study revealed that TP53 mRNA has functional roles beyond its canonical function as a protein blueprint. Moreover, multiple studies have indicated that LINC00665, a long non-coding RNA (lncRNA) gene, can encode short peptides while also acting as a functional RNA molecule. Although individual genes with this dual coding and non-coding capacity, aptly referred to as "bifunctional genes", have been studied, few attempts have been made to detect them genome-wide.

To identify such genes genome-wide, analysis at the isoform level is essential, because a single gene can give rise to both coding and non-coding transcripts. Short-read RNA-seq is limited here because each read covers only a fragment of a transcript, which makes full-length isoforms difficult to resolve, whereas long-read RNA-seq (LR) captures entire transcripts. Therefore, to classify each transcript as coding or non-coding, I primarily used LR data from the Oxford Nanopore Technologies (ONT) platform. I generated these datasets by ONT PCR-cDNA sequencing and gained extensive experience troubleshooting sequencing runs on the MinION and PromethION devices. I also developed and optimized an analysis pipeline to process and analyze ONT data accurately.

The data were generated from polysome-profiled normal colonic cells and colorectal cancer cells, in which transcripts were separated by their interaction with ribosomes into three fractions: the heavy polysome fraction, bound by more than three ribosomes; the light polysome fraction, bound by monosomes or disomes; and the unbound fraction, with no ribosomes. With these data, I first performed isoform-level classification to distinguish coding from non-coding transcripts. I built gold-standard datasets from GENCODE annotation, in silico coding-probability prediction, and Ribo-seq, devised a classification algorithm, and validated its calls against these gold standards.
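The classification algorithm itself is not described here, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. It assumes each transcript has a normalized abundance in the heavy, light, and unbound fractions, calls a transcript coding when most of its abundance is ribosome-bound (the `polysome_ratio` score and the 0.5 threshold are hypothetical choices, not the thesis method), and scores those calls against gold-standard labels with precision, recall, and F1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record: normalized abundance of one isoform in each
    # polysome-profiling fraction (e.g. counts per million).
    transcript_id: str
    heavy: float
    light: float
    unbound: float


def polysome_ratio(t: Transcript) -> float:
    """Share of a transcript's abundance found in the ribosome-bound fractions."""
    total = t.heavy + t.light + t.unbound
    if total == 0:
        return 0.0
    return (t.heavy + t.light) / total


def classify(t: Transcript, threshold: float = 0.5) -> str:
    # Hypothetical rule and threshold: "coding" if at least half of the
    # abundance is ribosome-associated, otherwise "noncoding".
    return "coding" if polysome_ratio(t) >= threshold else "noncoding"


def evaluate(predicted: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Precision, recall and F1 for the "coding" class on transcripts in both sets."""
    shared = predicted.keys() & gold.keys()
    tp = sum(1 for k in shared if predicted[k] == "coding" and gold[k] == "coding")
    fp = sum(1 for k in shared if predicted[k] == "coding" and gold[k] == "noncoding")
    fn = sum(1 for k in shared if predicted[k] == "noncoding" and gold[k] == "coding")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    transcripts = [
        Transcript("tx1", heavy=8.0, light=4.0, unbound=2.0),
        Transcript("tx2", heavy=0.5, light=1.0, unbound=6.0),
        Transcript("tx3", heavy=3.0, light=2.0, unbound=4.0),
    ]
    predicted = {t.transcript_id: classify(t) for t in transcripts}
    gold = {"tx1": "coding", "tx2": "noncoding", "tx3": "noncoding"}  # toy labels
    print(predicted)
    print(evaluate(predicted, gold))
```

A single ribosome-bound share is a deliberately simple score; a fuller classifier could, for example, weigh the heavy and light fractions separately or account for replicate variability.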

I then performed gene-level classification, defining genes that produce both coding and non-coding transcripts as "bifunctional". Interestingly, I also identified several non-canonical lncRNAs that produce only coding transcripts. In addition, I performed Gene Ontology analysis to determine the biological functions associated with these putative bifunctional genes.
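The gene-level rule described above can be sketched as follows. The input format (a transcript ID mapped to a gene ID and a "coding"/"noncoding" call) and the function names are hypothetical; the `gene_biotype` lookup assumes GENCODE-style biotype strings such as "lncRNA" and is only one illustrative way to flag annotated lncRNA genes whose transcripts were all called coding.

```python
from __future__ import annotations

from collections import defaultdict


def classify_genes(transcript_calls: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Collapse transcript-level calls into one label per gene.

    transcript_calls maps transcript_id -> (gene_id, call), where call is
    "coding" or "noncoding" (hypothetical input format).
    """
    calls_by_gene: dict[str, set[str]] = defaultdict(set)
    for gene_id, call in transcript_calls.values():
        calls_by_gene[gene_id].add(call)

    labels = {}
    for gene_id, calls in calls_by_gene.items():
        if calls == {"coding", "noncoding"}:
            labels[gene_id] = "bifunctional"  # both kinds of transcript observed
        elif calls == {"coding"}:
            labels[gene_id] = "coding_only"
        else:
            labels[gene_id] = "noncoding_only"
    return labels


def flag_coding_only_lncrnas(labels: dict[str, str], gene_biotype: dict[str, str]) -> list[str]:
    """Genes annotated as lncRNA whose transcripts were all called coding."""
    return sorted(
        gene for gene, label in labels.items()
        if label == "coding_only" and gene_biotype.get(gene) == "lncRNA"
    )


if __name__ == "__main__":
    calls = {
        "t1": ("GENE_A", "coding"),
        "t2": ("GENE_A", "noncoding"),
        "t3": ("GENE_B", "coding"),
        "t4": ("GENE_C", "noncoding"),
    }
    labels = classify_genes(calls)
    print(labels)
    print(flag_coding_only_lncrnas(labels, {"GENE_B": "lncRNA"}))
```

Working from sets of calls per gene keeps the rule order-independent: a gene is bifunctional as soon as at least one transcript of each kind is observed, regardless of how many isoforms it has.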

Further, I conducted survival analysis on colorectal cancer patient data from The Cancer Genome Atlas (TCGA-COADREAD), which placed my findings in a more comprehensive and clinically relevant context. Through this approach, I identified a gene with promising clinical potential that might otherwise have been overlooked, underscoring the importance of studying bifunctional genes. This finding could deepen our understanding of colorectal cancer and point to potential therapeutic targets.
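The specific survival method is not stated here. A common approach for a single gene is to split patients at the median expression and compare the two groups with Kaplan-Meier curves and a log-rank test; the sketch below implements that in plain Python, assuming per-patient follow-up times, event indicators (1 = death observed, 0 = censored), and expression values have already been extracted from TCGA-COADREAD. It illustrates the technique only and is not the exact analysis performed in this project.

```python
from __future__ import annotations

import math
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in pooled if s >= t and g == 1)
        n = at_risk_a + at_risk_b
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_a += d_a
        expected_a += d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance if variance else 0.0
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


if __name__ == "__main__":
    # Toy data: follow-up in months, death indicator, and expression values.
    months = [5, 8, 12, 20, 24, 30, 36, 40]
    died = [1, 1, 1, 0, 1, 0, 0, 0]
    expr = [9.1, 8.7, 7.9, 3.2, 6.5, 2.1, 1.8, 2.5]
    groups = split_by_median(expr)
    high = [i for i, g in enumerate(groups) if g == "high"]
    low = [i for i, g in enumerate(groups) if g == "low"]
    print(kaplan_meier([months[i] for i in high], [died[i] for i in high]))
    print(logrank_test([months[i] for i in high], [died[i] for i in high],
                       [months[i] for i in low], [died[i] for i in low]))
```

The log-rank loop is quadratic in the number of patients, which is fine for a sketch; in practice a dedicated library such as lifelines (`KaplanMeierFitter`, `lifelines.statistics.logrank_test`) would normally be used, and a median split is only one of several possible cutoff choices.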