This lecture explores challenges in analyzing medical data, using the MIMIC database as an example. Inconsistencies arise from a change of hospital information systems (CareVue to MetaVision), from data entry errors, and from de-identification artifacts (e.g., patients older than 89 appearing with ages of about 300). The speaker details various data types (demographics, vital signs, medications, lab tests, notes, imaging) and their inherent complexities, including inconsistencies across different coding systems (e.g., NDC, MedDRA, CPT). Analyzing this data reveals biases and the difficulty of predicting mortality from demographic features alone. The lecture concludes by emphasizing the importance of understanding data limitations and the ongoing need for standardization in medical record-keeping.

Sections: Types of Medical Data; The MIMIC Database; Data Standards and Harmonization; Data Analysis Examples; Data Quality Issues; Takeaways

Types of Medical Data

These are standard vital signs, and so we have lots of those recorded. Medications: prescription medications, over-the-counter drugs, illegal drugs, if you're willing not to lie to your healthcare provider. Alcohol: again, in one of my earliest days I was hanging out with a cardiologist at Tufts Medical Center, and we see this elderly lady who looks kind of terrible, and we're talking to her (well, the doctor is talking to her, I'm trying to stay out of the way), and he says, "So, you know, do you drink alcohol?" and she says, "Oh no, never touch the stuff."

[10:26] And then we talked some more, and we go out of the patient's room, and the doctor turns to me, out of earshot of the patient, and says, "Oh, she's a chronic drunk." I said, "Well, how do you know?" And he says, "Well, from lab tests, from the appearance of her skin, from her general demeanor, from, you know, various sort of ineffable factors."

[10:52] And so patients lie, okay? They really do, because they don't want to tell you things. Okay, medications, by the way, are a big deal.
So there's this whole field called med rec, medication reconciliation, which is the hospital's or the doctor's office's attempt to figure out what medications you're actually taking.

[11:18] So I'm a member of the MIT health plan, and if I sign into my health plan account, it tells me that I'm taking some pills that I got 12 years ago as part of a laboratory test. I took two pills, which were supposed to have some physiological effect, and then they measured that, and I've never gotten another pill and never taken one since then.

[11:46] Nor would it be particularly good for me. But it's still on my record, and there's no notice of it ever having been discontinued, and that's a real problem, because if you're taking care of a patient, you'd like to understand what drugs they're actually taking.

[12:03] It's hard to know. Okay, then lab tests. So these are the things that you imagine we do a lot of, and these are components of the blood and the urine mainly, but also of the stool, saliva, spinal fluid, fluid taken off the belly, joint fluid, bone marrow, stuff coming out of your lungs.

[12:28] It's anything, right? Any place where you can produce some specimen, they can send it to a lab and measure things in it. And they measure lots and lots of different kinds of things, and these are often useful. Pathology: qualitative and quantitative examination of any body tissue, for example biopsy samples or surgical scraps.

[12:52] You know, if they do an operation and cut something out of you, that typically winds up on a pathologist's bench, who then tries to figure out what its characteristics are. And that's, again, useful information. Okay, microbiology: ever since Pasteur, we know that organisms cause disease.

Data Analysis Examples

Age Distribution: ICU patients in the MIMIC database skew older, with comparatively few young people.
Insurance Type: The age distribution varies significantly by insurance type, with a higher proportion of Medicare and Medicaid patients over 65.
Ethnicity: African Americans and Hispanics tend to be admitted to the ICU at younger ages than white patients, highlighting potential biases in healthcare access.
Marital Status: Single individuals are more likely to be admitted to the ICU, possibly due to a lack of support at home.
In-Hospital Mortality Prediction: A logistic regression model using demographic features showed that age and ethnicity are significant predictors of in-hospital mortality, although demographics alone are not enough to predict mortality well.
Diurnal Variation in Lab Tests: A study found that lab test results tend to be more abnormal at night, possibly because the patients who get tested at those hours are sicker.

Data Standards and Harmonization

NDC (National Drug Code): A 10-digit, three-segment code used by the FDA to identify a drug's labeler (manufacturer or distributor), product formulation, and package size.
MedDRA (Medical Dictionary for Regulatory Activities): An international medical terminology used in regulatory activities, such as reporting adverse events for drugs and medical devices.
CPT (Current Procedural Terminology): A coding system for medical procedures, including medication administration.
HCPCS (Healthcare Common Procedure Coding System): A coding system for medical services and procedures, including medication administration.
GSN (Generic Sequence Number): A commercial coding system for drugs, used by Beth Israel Deaconess Medical Center.
LOINC (Logical Observation Identifiers Names and Codes): A coding system for laboratory tests and other clinical observations.
SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms): A comprehensive medical terminology system.
HL7 (Health Level Seven): A standards organization for electronic health information exchange.
FHIR (Fast Healthcare Interoperability Resources): A modern HL7 standard for exchanging health information, designed to improve interoperability between different systems.
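
One small illustration of why harmonizing drug codes takes work: NDCs published by the FDA are 10 digits in one of three segment layouts (4-4-2, 5-3-2, or 5-4-1), while billing systems commonly expect an 11-digit 5-4-2 form, obtained by zero-padding whichever segment is short. The sketch below is a minimal normalizer under those assumptions. The function name `ndc_to_11` is hypothetical, and the code strings in the usage note are synthetic placeholders, not real products and not taken from the lecture.

```python
# Accepted (labeler, product, package) segment lengths for a hyphenated 10-digit NDC.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc_to_11(ndc):
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 form.

    Hypothetical helper for illustration. It refuses input without hyphens,
    because the short segment cannot be identified once the hyphens are gone.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected three hyphen-separated digit groups: %r" % ndc)

    layout = tuple(len(p) for p in parts)
    if layout not in VALID_LAYOUTS:
        raise ValueError("unsupported NDC layout %r in %r" % (layout, ndc))

    labeler, product, package = parts
    # Left-pad each segment with zeros to its 5-4-2 width.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
```

For example, `ndc_to_11("1234-5678-90")` gives `"01234567890"` and `ndc_to_11("12345-678-90")` gives `"12345067890"`. Two sources that store the same drug in different layouts will not join on the raw string, which is the kind of mismatch harmonization has to resolve.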

Data Quality Issues

Data Incompatibility: Different information systems used across hospitals, or even within the same hospital, can lead to inconsistencies in data collection and interpretation.
Entry Errors: Mistakes in data entry, such as recording an incorrect age, can significantly distort an analysis.
HIPAA Regulations: Restrictions on disclosing sensitive information, such as the exact age of patients older than 89, lead to data artifacts (in MIMIC, these patients appear with ages of about 300).
Missing Data: Data may be missing for various reasons, such as incomplete documentation or failure to collect specific information.

The MIMIC Database

MIMIC (Medical Information Mart for Intensive Care) is a database containing intensive care data from approximately 60,000 admissions at the Beth Israel Deaconess Medical Center over a 12-year period.
CareVue (the old information system) and MetaVision (the new information system) were both used during this period, resulting in data incompatibility issues.
Example: Examining the distribution of heart rates revealed two peaks, which is unusual for physiological data. This was traced back to the different systems used for data collection in the NICU (neonatal intensive care unit) and the adult ICU.

Takeaways

Know Your Data: Understanding the limitations and artifacts of your data is crucial for accurate analysis and for avoiding false conclusions.
Harmonization is Difficult: Standardizing medical data is complex and time-consuming, and until it is done, inconsistencies make analysis harder.
Data Cleaning is Essential: A significant portion of the time in medical data analysis is spent cleaning and preparing the data.
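
Two of the checks described above can be sketched in a few lines of standard-library Python: treating implausibly large ages as censored rather than literal, and summarizing a measurement per care unit before pooling it. This is a minimal sketch, not the lecturer's code. The row layout, the field names (`unit`, `heart_rate`, `age`), the helper names, and the toy values are all invented for illustration, and replacing a censored age with 90 is one possible convention (a lower bound), not the only one.

```python
from collections import defaultdict
from statistics import median

# Ages above this value are withheld under HIPAA de-identification rules,
# so any larger number in the data is an artifact, not a real age.
AGE_CENSOR_THRESHOLD = 89


def clean_age(raw_age):
    """Return (age, is_censored); censored ages are replaced by the lower bound 90."""
    if raw_age is None:
        return None, False
    if raw_age > AGE_CENSOR_THRESHOLD:
        return 90, True
    return raw_age, False


def median_by_group(rows, group_key, value_key):
    """Median of value_key per group, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_key)
        if value is not None:
            groups[row[group_key]].append(value)
    return {group: median(values) for group, values in groups.items()}


# Invented toy rows, only to show the shape of the check.
rows = [
    {"unit": "NICU", "heart_rate": 150, "age": 0},
    {"unit": "NICU", "heart_rate": 142, "age": 0},
    {"unit": "adult ICU", "heart_rate": 82, "age": 67},
    {"unit": "adult ICU", "heart_rate": 90, "age": 300},
    {"unit": "adult ICU", "heart_rate": None, "age": 54},
]

per_unit = median_by_group(rows, "unit", "heart_rate")
ages = [clean_age(row["age"]) for row in rows]
```

If the per-unit medians differ sharply, a pooled histogram will show more than one peak, and the groups should be analyzed separately. Keeping the `is_censored` flag next to the age lets later steps decide whether to exclude those patients or model them as "90 or older".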