Maurice Ling's Professional Portfolio - Research Portfolio R & D Project Summaries
As mentioned in my research interest, I am interested in the management and meta-analysis of available biological data. Hence, my projects are divided into three broad areas: management of data, analysis of data, and supporting tools that enable management and analysis. Functionally, my projects are linked to each other as described below.
Muscorian is my primary research work. It focuses on analysing and mining publicly available biomedical literature for protein-protein interactions arising from high-throughput genomics and proteomics analyses. While most scientists are interested in differentially expressed genes, that is, genes that are up-regulated or down-regulated under particular nutritional, developmental or disease conditions, I am interested in genes whose expression does not change, as they may reveal the fundamental operations of tissues and cells. I hypothesize that some of these invariant genes are crucial to life and may therefore have a long evolutionary history. By looking at bacterial systems, we may be able to understand some of life's fundamental processes and ultimately answer the question: why is life the way it is?
In the course of developing Muscorian, I found that some data, such as protein networks, cannot be represented in tabular form. Gene and protein names pose another problem, because they can change over time. For example, a hormone receptor (receptorA) may be found, say in 1995, to have different variants. If I then search for receptorA in papers published after 1995, I may miss relevant papers, because the receptor is referred to as receptorA1 and receptorA2 instead. I believe that a hypergraph data model may be more suitable for representing such data. This started my interest in developing a database based on the hypergraph data model: HygDAS.
All of this research work requires the management of large quantities of data from diverse sources. OpenDWS (Open Data Warehousing Suite) was developed as a data warehousing tool to manage these resources.
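As a minimal sketch of why a hypergraph suits the receptorA example, the Python code below records each renaming as a directed hyperedge from a *set* of old names to a *set* of new names, so a single edge captures the split of receptorA into receptorA1 and receptorA2 (or a merge of several names into one). The class `NameHypergraph` and its methods are hypothetical illustrations, not the HygDAS API or its actual data model.

```python
from collections import defaultdict


class NameHypergraph:
    """Directed hypergraph of gene/protein names (hypothetical sketch).

    Each hyperedge links a set of old names to a set of new names, so one
    edge can record a split (receptorA -> receptorA1, receptorA2) or a
    merge (several names -> one name) as a single relationship.
    """

    def __init__(self):
        self.edges = []                # (year, sources, targets) triples
        self._out = defaultdict(list)  # name -> edges in which it is a source

    def add_edge(self, year, sources, targets):
        edge = (year, frozenset(sources), frozenset(targets))
        self.edges.append(edge)
        for name in edge[1]:
            self._out[name].append(edge)

    def search_terms(self, name):
        """Names a literature search for `name` should include: the name
        itself plus every name it was later split or renamed into."""
        terms, stack = {name}, [name]
        while stack:
            for _, _, targets in self._out[stack.pop()]:
                for t in targets - terms:
                    terms.add(t)
                    stack.append(t)
        return terms

    def names_in_use(self, name, year):
        """Names by which `name` is expected to be known in a given year."""
        current, stack, seen = set(), [name], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            # Follow only renamings that had already happened by `year`.
            successors = [t for y, _, targets in self._out[n] if y <= year
                          for t in targets]
            if successors:
                stack.extend(successors)
            else:
                current.add(n)
        return current


g = NameHypergraph()
g.add_edge(1995, ["receptorA"], ["receptorA1", "receptorA2"])
print(sorted(g.search_terms("receptorA")))        # ['receptorA', 'receptorA1', 'receptorA2']
print(sorted(g.names_in_use("receptorA", 1990)))  # ['receptorA']
print(sorted(g.names_in_use("receptorA", 2000)))  # ['receptorA1', 'receptorA2']
```

In a plain table of (old name, new name) pairs, the split would become two unrelated rows; keeping both new names in one hyperedge preserves the fact that they arose from the same event.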
Backing up these data is also important. One main method of backing up live databases is database replication, in which several synchronized installations of the database are maintained in different geographical locations.
Throughout this research and development work, a number of algorithms and data structures are developed that can be re-used in other projects. These are collected as COPADS (Collection of Python Algorithms and Data Structures).
Being trained in experimental biology, I appreciate the rationale and the need to document all our research activities and results. However, in the era of bioinformatics, this is difficult given the large quantities of data. CyNote (Cyber Laboratory Notebook for Biologists and Bioinformaticists) has been developed as the electronic counterpart of the traditional paper-bound laboratory notebook. Furthermore, Muscorian and OpenDWS (as used in my research context) do not present a user interface that other biologists and/or bioinformaticists could access. I believe that CyNote can fill this need, as access to Muscorian and OpenDWS can be developed as CyNote plugins.
In addition, all of these tools and systems (CyNote, COPADS, Muscorian, OpenDWS, and HygDAS) are expected to be perpetually under development, enhancement and re-factoring. These developments are not expected to be carried out by one person (that is, a single maintainer for each system), and they are also expected to undergo phases of reduced activity. Hence, I need a way to describe or specify the behaviour of each component in these systems unambiguously, so that I can return to it after a lapse or pass it on to an interested person. Formal methods, which are based on mathematical theories, are more likely than the English language to provide unambiguous descriptions. However, the method has to be simple enough to pick up with a high-school mathematics background and must fit in with my development routine.
To this end, BeSSY is a formal method developed to specify the behaviour of software components while relying only on high-school mathematics.
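BeSSY's own notation is not reproduced here. As a hedged illustration of the kind of behavioural specification that needs only high-school mathematics, the behaviour of a hypothetical `register` operation can be stated with sets alone: the precondition is that the name is not yet in the registry R, and the postcondition is that the new registry R' equals R ∪ {name} and has exactly one more element. The Python sketch below checks that set-based specification at run time against a sorted-list implementation; the function, the registry, and the run-time checking approach are assumptions for illustration, not BeSSY itself.

```python
import bisect


def register(registry: list, name: str) -> list:
    """Hypothetical component: return a new sorted list with `name` added.

    Behaviour specified with sets only (R = names before, R' = names after):
      pre:  name is not in R
      post: R' = R ∪ {name}  and  |R'| = |R| + 1
    """
    # Precondition from the specification.
    if name in registry:
        raise ValueError(f"precondition violated: {name!r} is already registered")

    # Implementation detail: copy the list and keep it sorted.
    result = list(registry)
    bisect.insort(result, name)

    # Postconditions, checked against the set-based specification
    # (note: `assert` statements are skipped when Python runs with -O).
    assert set(result) == set(registry) | {name}
    assert len(result) == len(registry) + 1
    return result


names = register(["receptorA1"], "receptorA2")
print(names)  # ['receptorA1', 'receptorA2']
```

Keeping the specification in terms of sets, separate from the list-based implementation, means the implementation can be re-factored freely as long as the stated pre- and postconditions still hold.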