WDAM-2017: Papers with Abstracts

Abstract. This paper investigates complex temporal relations between some rare disorders. It proposes an interval graph approach combined with data mining for mining patterns in patient histories. The processed data are enriched with context information: text mining tools extract entities from free text and deliver additional attributes beyond the structured information about the patients. The test corpora contain pseudonymised reimbursement requests submitted to the Bulgarian National Health Insurance Fund in 2010-2015 for more than 5 million citizens yearly. Experiments were run on 2 data collections. Findings in these two collections are discussed on the basis of a comparison between patients with and without rare disorders. Exploration of complex relations in rare-disease data can support analyses of small patient pools and assist clinical decision making.
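To illustrate the interval graph idea mentioned in this abstract, here is a minimal sketch in Python. It is not the authors' method: the `build_interval_graph` function, the disorder labels, and the day offsets are all hypothetical. Each episode in a patient history is treated as a closed interval, and two episodes are connected when their intervals overlap.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical episode format: (label, start_day, end_day), both ends inclusive.
Interval = Tuple[str, int, int]


def build_interval_graph(intervals: List[Interval]) -> Dict[str, Set[str]]:
    """Return an adjacency map linking episodes whose time intervals overlap."""
    graph: Dict[str, Set[str]] = {label: set() for label, _, _ in intervals}
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(intervals, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if a_start <= b_end and b_start <= a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Made-up patient history, for illustration only.
history = [
    ("disorder_A", 0, 120),
    ("disorder_B", 90, 200),
    ("disorder_C", 300, 350),
]
graph = build_interval_graph(history)
```

In this made-up history, `disorder_A` and `disorder_B` overlap and become neighbours, while `disorder_C` stays isolated. Pattern mining would then operate on such graphs across many patients.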
Abstract. Medical data analysis is developing rapidly. Applying innovative analysis solutions requires large volumes of uniform and verified data. This idea was the foundation for the Unified Radiological Information Service (URIS), launched in Moscow. Currently, 75 clinics are connected to the URIS. In 2016 we developed a remote quality assurance system and a discrepancy detection module (DDM). The software is designed to review studies, provide feedback and accumulate “big data”. We compared the number of discrepancies before and after DDM implementation (4473 anonymized CT and MRI studies). Within 12 months the number of discrepancies decreased by more than half.
Abstract. This paper is devoted to mathematical modelling of breast cancer progression with regard to its stages. Given the relation between the primary tumor (PT) and metastases (MTS), the problem of studying the breast cancer (BC) process appears to be twofold: firstly, it is important to describe the whole natural history of BC to understand the process as a whole; secondly, it is necessary to predict the period of clinical MTS manifestation. In order to understand the growth processes of BC at each stage, CoMBreC was proposed as a new research tool. CoMBreC is threefold: CoMPaS (stages I-II), CoM-III (stage III) and CoM-IV (stage IV). The new model rests on an exponential growth model and complementing formulas. For the first time, it allows us to calculate different growth periods of PT and MTS in patients with/without lymph node MTS: 1) non-visible period for PT; 2) non-visible period for MTS; 3) visible period for MTS. Calculations via CoMBreC correspond to survival data by stage of BC. The original mathematical model, referred to as CoMBreC, and the corresponding software may help to improve the accuracy of predicting the BC process. Consequently, the thesis concentrates on: 1) modelling the whole natural history of PT and MTS in patients with/without lymph node MTS; 2) developing an adequate and precise CoMBreC that reflects the relations between PT and MTS; 3) analysing the CoMBreC scope of application. CoMBreC was implemented as an iOS application, a new predictive tool that: 1) is a solid foundation for future studies of BC models; 2) does not require any expensive diagnostic tests; 3) is the first predictor of survival in breast cancer that makes a forecast using only current patient data.
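The abstract states that the model rests on an exponential growth model but does not give its formulas. As a rough illustration only, the following Python sketch shows how a growth period can be derived under plain exponential growth, assuming a spherical tumor and a constant volume doubling time. The function names and all numeric values are hypothetical and are not taken from CoMBreC.

```python
import math


def doublings_between(d_start_mm: float, d_end_mm: float) -> float:
    """Volume doublings needed for a sphere to grow from d_start_mm to d_end_mm.

    Volume scales with the cube of the diameter, so the number of doublings
    is log2((d_end / d_start) ** 3) = 3 * log2(d_end / d_start).
    """
    return 3.0 * math.log2(d_end_mm / d_start_mm)


def growth_period_days(d_start_mm: float, d_end_mm: float,
                       doubling_time_days: float) -> float:
    """Time to grow between two diameters at a constant volume doubling time."""
    return doublings_between(d_start_mm, d_end_mm) * doubling_time_days


# Hypothetical values for illustration: growth from 1 mm to 2 mm in diameter
# with an assumed volume doubling time of 100 days.
period = growth_period_days(1.0, 2.0, 100.0)
```

Doubling the diameter corresponds to three volume doublings, so with the assumed 100-day doubling time the period in this example is 300 days. A stage-specific model would add further formulas on top of such a baseline.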
Abstract. Clinical informatics has been undergoing a radical transformation. What are the causes and drivers of this transformation? Which tasks can be solved well, and which cannot? How should we implement data analysis in clinical informatics projects in the new reality? What is the importance of interpretability (comprehensibility) and explanation of data analysis methods in clinical informatics? At the workshop, we will try to answer some of these questions and set up a framework for later discussion.
Abstract. Modern medicine aspires to improve the effectiveness of treatment for some diseases through so-called personalized medicine. However, fully personalized medicine, or even personalized treatment of a single disease, is a very ambitious goal. Subgroup analysis of patients is a preliminary step towards full personalization. Several completely different views exist on the principles and usefulness of subgroup analysis for treatment personalization. This paper is limited to data-driven subgroup discovery, in which collected data are analyzed post hoc for significant treatment-biomarker interactions, and presents a brief overview of key methods for this type of subgroup analysis.
Abstract. The medical laboratory "Gemotest" is a modern high-tech research center that performs tens of thousands of medical analyses daily for patients all over Russia. Regular modernization of its technical base and the introduction of fundamentally new research methods and equipment allow "Gemotest" to perform the widest range of analyses, from clinical blood tests to the detection of genetic pathologies. One of the most important aspects of conducting these analyses is the instant detection of abnormal results and their verification and validation.
Abstract. Big data and deep learning technologies play an important role in the modern scientific world. The tendency to work with huge data sets is now conquering the medical area. In this article, based on the experience of the Department of medical cybernetics and informatics of the RNRMU Medical and biological Faculty, we explain the main issues that researchers deal with in the collection and processing of medical data. We explain that problems may relate to data sources, semantic interoperability, data relevance, multidimensionality, completeness, and comparability. Modern digital health records and related services such as EHRs cannot yet provide the necessary “Big Data” information. The healthcare system makes it impossible to collect relevant big data sets in a short period. Further issues are a certain irresponsibility of doctors and patients, their truthfulness about what actually happened, and the difference between those facts and what is written in the medical record. This often leads to incorrect and incomplete data sets in medical information systems. We conclude that “Big Data” in medicine today cannot be “Big” as in other scientific areas. Researchers should try to collect relevant, truthful, and complete information in a manageable amount and time frame and perform their studies on that basis.
Abstract. In the domain of modern information technologies there is an evident trend towards wide use of intelligent information systems (IS), in which the processing of incoming data is based on multiple semantic-oriented transformations. The Model Driven Engineering approach is widely used for intelligent IS development. The effectiveness of this approach depends on developers having a sufficient number of domain-oriented models that describe the classes of solutions and suggest effective tools for model transformation. This paper describes a model-driven approach to the development of intelligent IS. The main idea of the suggested approach is the implementation of semantic-oriented transformations. The authors have used this approach to build real IS for different subject domains. An example of practical use of the suggested approach for medical IS development is described.
Abstract. Mathematical models predicting final height (FH) and its standard deviation score (SDS) for children with growth hormone deficiency are an important tool for clinicians managing the treatment process. Previously developed models are not accurate enough or not suitable for practical use. We used 5 binary and 7 continuous predictors available at the time of diagnosis and start of therapy and developed multiple linear regression (MLR) models and artificial neural networks (ANN). The sample included 121 patients of the Endocrinology Research Center (Moscow, Russia) who were under observation in 1978-2016 and reached their final height. All of them received growth hormone replacement therapy for at least 3 years. The MLR models had poor quality. The best ANN for predicting FH uses 10 predictors, has an RMSE of 4.8 cm, and explains 71.3% of the variance. The best ANN for predicting FH SDS uses 12 predictors, has an RMSE of 0.749 SDS, and explains 50% of the variance. It seems promising to increase the sample size and improve the ANN models.
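For readers unfamiliar with the two quality measures quoted in this abstract, the following Python sketch shows how RMSE and the share of explained variance (here taken to be the coefficient of determination, which is an assumption about the authors' definition) are commonly computed. The function names and the height values are made up for illustration and are not the study's data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def explained_variance_share(y_true: Sequence[float],
                             y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up final heights in cm, for illustration only.
observed = [160.0, 170.0, 180.0]
predicted = [162.0, 170.0, 178.0]

error_cm = rmse(observed, predicted)
share = explained_variance_share(observed, predicted)
```

An RMSE expressed in centimetres can be read directly as a typical prediction error in height, while the explained share indicates how much better the model is than always predicting the sample mean.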