Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Why Should We Switch to LLMs?
Published:
Why We Should Embrace the Shift to Large Language Models
Portfolio
Publications
Providing FAQ lists based on ontology
Published in 24th IEEE Iranian Conference on Electrical Engineering (ICEE), 2016
Researchers have been fascinated by FAQ (Frequently Asked Questions) management systems in recent years…
Recommended citation: M. Pourreza Shahri, M. Kahani, and H. Ekbia, "Providing FAQ lists based on ontology," 24th IEEE Iranian Conference on Electrical Engineering (ICEE), Shiraz, Iran, 2016.
Download Paper
Extracting Co-mention Features from Biomedical Literature for Automated Protein Phenotype Prediction using PHENOstruct
Published in 10th International Conference on Bioinformatics and Computational Biology (BiCOB), 2018
Human Phenotype Ontology (HPO) is a recently introduced standard vocabulary for describing disease-related phenotypic abnormalities in humans. Since experimental determination of HPO categories for human proteins is a highly resource-consuming task, developing automated tools that can accurately predict HPO categories has recently gained interest. In our previous work, we developed PHENOstruct, an automated phenotype prediction tool that uses input features generated from heterogeneous data sources, including standard bag-of-words features extracted from biomedical literature. In this work, we introduce novel co-mention features, which are based on co-occurrences of protein names and HPO terms within a specified span of text. Our experimental results indicate that utilizing co-mentions significantly improves overall performance and that the most effective span is the paragraph level. This is the first study that uses a knowledge-based approach to generate literature features for the task of automated protein phenotype prediction. These findings have implications for practitioners interested in developing automated biocuration pipelines for phenotypes.
Recommended citation: M. Pourreza Shahri and I. Kahanda, "Extracting Co-mention Features from Biomedical Literature for Automated Protein Phenotype Prediction using PHENOstruct," 10th International Conference on Bioinformatics and Computational Biology (BiCOB), Las Vegas, NV, USA, 2018.
Download Paper
Quality assurance of bioinformatics software: a case study of testing a biomedical text processing tool using metamorphic testing
Published in Proceedings of the 3rd International Workshop on Metamorphic Testing, 2018
Bioinformatics software plays a very important role in making critical decisions in many areas, including medicine and health care. However, most research effort is directed toward developing tools, and little time and effort is spent on testing the software to assure its quality. A test oracle determines whether a test passes or fails, and unfortunately, for much bioinformatics software, the exact expected outcomes are not well defined. Thus, the main challenge in conducting systematic testing of bioinformatics software is the oracle problem.
Recommended citation: M. Srinivasan, M. Pourreza Shahri, U. Kanewala, and I. Kahanda, "Quality Assurance of Bioinformatics Software: A Case Study of Testing a Biomedical Text Processing Tool Using Metamorphic Testing," Proceedings of the 3rd International Workshop on Metamorphic Testing, ACM, Gothenburg, Sweden, 2018.
Download Paper
Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools
Published in 2019 IEEE International Conference on Artificial Intelligence Testing (AITest), 2019
Proteins are the workhorses of life, and gaining insight into their functions is of paramount importance for applications such as drug design. However, experimentally validating protein functions is highly resource-consuming. Therefore, automated protein function prediction (AFP) using machine learning has recently gained significant interest. Many of these AFP tools are based on supervised learning models trained on existing gold-standard functional annotations, which are known to be incomplete. The main challenge in conducting systematic testing of AFP software is the lack of a test oracle, which determines whether a test case passes or fails; unfortunately, due to the incompleteness of gold-standard data, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. In this work, we use metamorphic testing (MT) to test nine state-of-the-art AFP…
Recommended citation: M. Pourreza Shahri, M. Srinivasan, G. Reynolds, D. Bimczok, I. Kahanda, and U. Kanewala, "Metamorphic Testing for Quality Assurance of Protein Function Prediction Tools," The IEEE International Conference on Artificial Intelligence Testing, San Francisco, CA, USA, 2019.
Download Paper
PPPred: Classifying protein-phenotype co-mentions extracted from biomedical literature
Published in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019
The MEDLINE database provides an extensive source of scientific articles and heterogeneous biomedical information in the form of unstructured text. Among the most important knowledge contained in these articles are the relations between human proteins and their phenotypes, which can remain hidden due to the exponential growth of publications. This has created a range of opportunities for developing computational methods that extract these biomedical relations from the articles. However, no such method currently exists for the automated extraction of relations involving human proteins and Human Phenotype Ontology (HPO) terms. In our previous work, we developed a comprehensive database composed of all co-mentions of proteins and phenotypes. In this study, we present a supervised machine learning approach called PPPred (Protein-Phenotype Predictor) for classifying the validity of a given…
Recommended citation: M. Pourreza Shahri, G. Reynolds, M. M. Roe, and I. Kahanda, "PPPred: Classifying Protein-phenotype Co-mentions Extracted from Biomedical Literature," Proceedings of the 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), Niagara Falls, NY, USA, 2019.
Download Paper
ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature
Published in 2020 IEEE 14th International Conference on Semantic Computing (ICSC), 2020
Identifying protein-phenotype relations is of paramount importance for biomedical applications such as uncovering rare and complex diseases. One of the best resources that capture protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno 1.0, a comprehensive online dataset composed of human protein/phenotype mentions extracted from the complete corpora of Medline and PubMed Central Open Access. Moreover, it includes co-occurrences of protein-phenotype pairs within different spans of text, such as sentences and paragraphs. We use ProPheno to completely characterize the human protein-phenotype landscape in biomedical literature. The ProPheno dataset, the reported findings, and the gained insight have implications for (1) biocurators for expediting their curation efforts, (2) researchers for quickly finding relevant articles, and (3) text mining tool developers for…
Recommended citation: M. Pourreza Shahri and I. Kahanda, "ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature," 14th IEEE International Conference on Semantic Computing, 2020.
Download Paper
Deep Semi-supervised Ensemble Method for Classifying Co-mentions of Human Proteins and Phenotypes
Published in Intelligent Systems for Molecular Biology (ISMB), 2020
Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. Human Phenotype Ontology (HPO) is a recently introduced standard vocabulary for describing disease-related phenotypic abnormalities in humans. Since the experimental determination of HPO categories for human proteins is a highly resource-consuming task, developing automated tools that can accurately predict HPO categories has recently gained interest.
Recommended citation: M. Pourreza Shahri and I. Kahanda, "Deep Semi-supervised Ensemble Method for Classifying Co-mentions of Human Proteins and Phenotypes," Intelligent Systems for Molecular Biology (ISMB), 2020.
Download Paper
DeepPPPred: An Ensemble of BERT, CNN, and RNN for Classifying Co-mentions of Proteins and Phenotypes
Published in bioRxiv Preprints, 2020
The biomedical literature provides an extensive source of information in the form of unstructured text. One of the most important types of information in the biomedical literature is the set of relations between human proteins and their phenotypes, which can remain hidden due to the exponential growth of publications. This provides a range of opportunities for developing computational methods to extract biomedical relations from unstructured text. In our previous work, we developed a supervised machine learning approach, called PPPred, for classifying the validity of a given sentence-level human protein-phenotype co-mention. In this work, we propose DeepPPPred, an ensemble classifier composed of PPPred and three deep neural network models: RNN, CNN, and BERT. Using an expanded gold-standard co-mention dataset, we demonstrate that the proposed ensemble method significantly outperforms its constituent components and achieves new state-of-the-art performance in classifying co-mentions of human proteins and phenotype terms.
Recommended citation: M. Pourreza Shahri, K. Lyon, J. Schearer, and I. Kahanda, "DeepPPPred: An Ensemble of BERT, CNN, and RNN for Classifying Co-mentions of Proteins and Phenotypes," bioRxiv, 2020.
Download Paper
An Ensemble Approach for Automatic Structuring of Radiology Reports
Published in arXiv Preprints, 2020
Recommended citation: M. Pourreza Shahri, A. Tahmasebi, B. Ye, H. Zhu, J. Aslam, and T. Ferris, "An Ensemble Approach for Automatic Structuring of Radiology Reports," arXiv, 2020.
Download Paper