Machine learning models predict total charges and drivers of cost for transcatheter aortic valve replacement
Original Article

Agam Bansal1, Chandan Garg2, Essa Hariri1, Nicholas Kassis1, Amgad Mentias1, Amar Krishnaswamy1, Samir R. Kapadia1

1Department of Cardiovascular Medicine, Heart and Vascular Institute, Cleveland Clinic, Cleveland, OH, USA; 2Department of Statistics, Columbia University, New York, NY, USA

Contributions: (I) Conception and design: A Bansal, SR Kapadia; (II) Administrative support: A Krishnaswamy, S Kapadia; (III) Provision of study materials or patients: A Bansal; (IV) Collection and assembly of data: A Bansal; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Samir R. Kapadia, MD, FACC, FAHA. Chair, Department of Cardiovascular Medicine Heart and Vascular Institute, Cleveland Clinic, 9500 Euclid Avenue, J2-3, Cleveland, OH 44195, USA. Email: kapadis@ccf.org.

Background: Given the increasing healthcare costs, there is an interest in developing machine learning (ML) prediction models for estimating hospitalization charges. We use ML algorithms to predict hospitalization charges for patients undergoing transfemoral transcatheter aortic valve replacement (TF-TAVR) utilizing the National Inpatient Sample (NIS) database.

Methods: Patients who underwent TF-TAVR from 2012 to 2016 were included in the study. The primary outcome was total hospitalization charges. The study dataset was divided into 80% training and 20% testing sets. We used the following ML regression algorithms: random forest, gradient boosting, k-nearest neighbors (KNN), multi-layer perceptron and linear regression. ML algorithms were built for 3 stages: Stage 1, including variables that were known pre-procedurally (prior to TF-TAVR); Stage 2, including variables that were known post-procedurally; Stage 3, including length of stay (LOS) in addition to the stage 2 variables.

Results: A total of 18,793 hospitalizations for TF-TAVR were analyzed. The mean and median adjusted hospitalization charges were $220,725.2 ($137,675.1) and $187,212.0 ($137,971.0–264,824.8) respectively. The random forest regression algorithm outperformed the other ML algorithms at all stages, with a higher R2 score and lower mean absolute error (MAE), root mean squared error (RMSE) and root mean squared logarithmic error (RMSLE) (Stage 1: MAE 79,979.11, R2 0.157; Stage 2: MAE 76,200.09, R2 0.256; Stage 3: MAE 69,350.09, R2 0.453). LOS was the most important predictor of hospitalization charges.

Conclusions: We built ML algorithms that predict hospitalization charges with good accuracy in patients undergoing TF-TAVR at different stages of hospitalization and that can be used by healthcare providers to better understand the drivers of charges.

Keywords: Transcatheter aortic valve replacement (TAVR); hospitalization charges; machine learning (ML)


Submitted Nov 11, 2021. Accepted for publication Jun 30, 2022.

doi: 10.21037/cdt-21-717


Introduction

Transcatheter aortic valve replacement (TAVR) has revolutionized the treatment of severe aortic stenosis and has become the gold standard treatment for patients with severe symptomatic aortic stenosis as approved by the US Food and Drug Administration (FDA) (1). Annual TAVR volume in the United States has increased steadily with more than 500% growth rate from approximately 5,000 in 2012 to almost 250,000 in 2019 (2). More recent trials (3-6) have expanded indications for TAVR to include patients with intermediate surgical risk, and results of the PARTNER 3 trial imply that TAVR will soon be the treatment of choice for low-risk candidates (7). With an aging population, increased prevalence of aortic stenosis, and expansion of TAVR to low-risk and younger patients, demand will rise, and adequate resource utilization is therefore of paramount importance (8). There is an increased emphasis on improving healthcare quality in the United States. Healthcare reimbursement models are shifting from “payment for volume” to “payment for value.” In this scenario, hospital systems are increasingly motivated to curb hospitalization costs.

Given the increasing healthcare costs, there is an interest in developing machine learning (ML) prediction models for estimating hospitalization charges. ML models for colorectal (9) and gastric cancer (10) have been used to predict hospitalization charges. Similarly, Muhlestein et al. (11) developed ensemble ML models for estimating charges following trans-sphenoidal surgery for pituitary tumors. However, to the best of our knowledge, there are no cost prediction models for cardiovascular procedures including TAVR. Herein, we use ML algorithms to predict hospitalization charges for patients undergoing transfemoral TAVR (TF-TAVR) utilizing the National Inpatient Sample (NIS) database. We present the following article in accordance with the TRIPOD reporting checklist (available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/rc).


Methods

Data source and study population

We used the Agency for Healthcare Research and Quality’s NIS, the largest all-payer database of hospitalized patients in the United States. Patients aged ≥18 years with a discharge diagnosis of aortic valve stenosis who underwent TF-TAVR [International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure code 35.05 or 35.06 and International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) procedure codes 02RF37H, 02RF37Z, 02RF38H, 02RF38Z, 02RF3JH, 02RF3JZ, 02RF3KH, 02RF3KZ] from 2012 to 2016 were included in the study. Because the study used de-identified data, it was exempted from Institutional Review Board (IRB) approval. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Candidate variables and outcomes

Fifty-nine variables, including patient and hospital characteristics, were collected for each hospitalization (the variables are described in the supplementary table). Patient comorbidities were identified using the Elixhauser Comorbidity Software administered by AHRQ.

The primary outcome was total hospitalization charges, calculated in US dollars. All the charges were adjusted for inflation.

Data pre-processing

Missing values were imputed using the k-nearest neighbors (KNN) algorithm. This algorithm uses ‘feature similarity’ to predict missing values: it finds the k closest neighbors of the observation with missing data and imputes the missing entries from the non-missing values in that neighborhood. The data were imputed after the training/testing data split.
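The split-then-impute order described above can be sketched with scikit-learn's `KNNImputer`. This is a minimal illustration rather than the study's code: the toy arrays and the choice `n_neighbors=2` are assumptions, since the paper does not report the value of k or other imputation settings. The imputer is fitted on the training rows only, so no information from the testing set leaks into the imputed values.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrices (hypothetical values); np.nan marks a missing entry.
X_train = np.array([
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 20.0],
    [5.0, 6.0, 30.0],
    [7.0, 8.0, 40.0],
    [np.nan, 5.0, 25.0],
])
X_test = np.array([[np.nan, 4.1, 21.0]])

# n_neighbors=2 is an assumed value for illustration.
imputer = KNNImputer(n_neighbors=2)

# Fit on the training split only, then reuse the fitted imputer on the test split.
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

print(X_train_imp)
print(X_test_imp)
```

Each missing entry is replaced by the average of that feature over the nearest training rows that have it observed, with distance measured on the features both rows share.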

ML model development and validation

The study dataset was divided into 80% training and 20% testing sets for the development and validation of ML algorithms respectively. In our study, we used the following ML regression algorithms: random forest, gradient boosting, KNN, multi-layer perceptron and linear regression. The important features were selected using the random forest algorithm. A grid search strategy based on cross-validation was used to identify the best combination of hyperparameters for the listed ML algorithms. The searched parameters included max_depth (range from 4 to 8), max_features (auto, sqrt, log2), and n_estimators (range from 10 to 200). The optimal values for the random forest (RF) model were max_depth = 8, max_features = sqrt, and n_estimators = 100. In our study, we built ML algorithms for 3 stages: Stage 1, including variables that were known pre-procedurally (prior to TF-TAVR) at the time of admission; Stage 2, including variables that were known post-procedurally; Stage 3, including length of stay (LOS) in addition to the stage 2 variables.
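The 80/20 split and the cross-validated grid search can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (`make_regression`), the grid is a reduced subset of the reported ranges so that it runs quickly, and the scoring metric and `random_state` values are arbitrary choices. The `auto` option for `max_features` is left out because recent scikit-learn releases no longer accept it.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the feature matrix and charges (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# 80% training / 20% testing split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Reduced version of the reported search space, kept small for speed.
param_grid = {
    "max_depth": [4, 6, 8],
    "max_features": ["sqrt", "log2"],
    "n_estimators": [10, 50],
}

# 5-fold cross-validation on the training set; the scoring metric is an assumption.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# The best configuration is refitted on the full training set by default.
best_model = search.best_estimator_
print(search.best_params_)
print("held-out R2:", round(best_model.score(X_test, y_test), 3))
```

For the staged models, the same search would be repeated three times with the stage 1, stage 2, and stage 3 column subsets.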

Statistical analysis

Model performance

The performance of the ML models was compared to average and median models using four evaluation metrics: R2 score, mean absolute error (MAE), root mean squared error (RMSE), and root mean squared logarithmic error (RMSLE). A higher R2 score and lower MAE, RMSE and RMSLE signify better model performance.
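The four metrics and the two constant baselines can be written out in a few lines of plain Python. The sketch below is illustrative only: the charge values are hypothetical, the function name `regression_metrics` is not from the paper, and RMSLE is computed on log(1 + y), which is a common convention that the paper does not spell out.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return MAE, R2, RMSE and RMSLE for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    log_sq = sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    )
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "RMSLE": math.sqrt(log_sq / n),
    }


# Hypothetical charges in US dollars, for illustration only.
y_train = [150_000, 180_000, 210_000, 400_000]
y_test = [160_000, 200_000, 240_000]

# Baselines predict one training-set constant for every test patient.
mean_baseline = regression_metrics(y_test, [mean(y_train)] * len(y_test))
median_baseline = regression_metrics(y_test, [median(y_train)] * len(y_test))
print(mean_baseline)
print(median_baseline)
```

A constant prediction cannot achieve an R2 above zero on the data it is scored against, because its squared error is at least the total sum of squares around the test-set mean. An R2 of zero or slightly below is therefore the expected result for mean and median baselines.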

Lift charts were generated in order to visualize how accurately the ensemble ML model predicts the LOS and hospitalization charges in the validation cohort. To generate these charts, we ranked and divided the best performing ML model predictions into 10 ‘bins’ and calculated the average LOS and hospitalization charges for each bin. We then calculated the average actual LOS and hospitalization charges respectively for each decile and then plotted the average predicted values against the average actual values.
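The binning step behind a lift chart can be sketched as below. The function name `lift_table` and the example values are hypothetical, and the plotting itself is omitted. The sketch only produces the per-bin averages that would be plotted against each other.

```python
from statistics import mean


def lift_table(y_true, y_pred, n_bins=10):
    """Rank rows by prediction, split them into n_bins groups, average each group."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    n = len(order)
    rows = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:  # skip empty bins when there are fewer rows than bins
            rows.append({
                "bin": b,
                "mean_predicted": mean([y_pred[i] for i in idx]),
                "mean_actual": mean([y_true[i] for i in idx]),
            })
    return rows


# Hypothetical predictions and observed charges (illustration only).
predicted = [float(v) for v in range(100_000, 300_000, 10_000)]  # 20 values
actual = [p * 1.05 for p in predicted]

for row in lift_table(actual, predicted):
    print(row)
```

When the model ranks patients well, the mean actual value rises from the lowest to the highest bin and stays close to the mean predicted value in each bin.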

Partial dependence plots

Partial dependence plots allow one to visualize how a model reacts to changes in a single variable. Predictions are made using the test values and the mean value of the predictions calculated. The mean prediction is plotted over the test values to generate a visual representation of the model’s response to changes in the variable.
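A one-variable partial dependence curve can be computed by hand as in the following sketch. The `toy_model` function is a hypothetical stand-in for a fitted regressor, and its coefficients are invented for illustration. Only the averaging logic reflects the procedure described above.

```python
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row, leaving the others untouched.
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:] for row in rows
        ]
        curve.append((value, mean([predict(r) for r in modified])))
    return curve


def toy_model(row):
    # Hypothetical model: columns are [los_days, ventilated]; coefficients are made up.
    los_days, ventilated = row
    return 50_000 + 20_000 * los_days + 150_000 * ventilated


test_rows = [[2, 0], [4, 1], [7, 0]]
for los, avg_charge in partial_dependence(toy_model, test_rows, 0, [1, 3, 5]):
    print(los, avg_charge)
```

A two-way version evaluates the same average over a grid of value pairs for two features, for example LOS and mechanical ventilation.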

Additional statistical analysis was performed to describe patient characteristics. Continuous variables were compared using the 2-tailed Student’s t-test, whereas chi-square or Fisher exact tests were used for categorical data as appropriate. The analysis was conducted using Python 3.6.9. The Python libraries used for this project were SciPy, scikit-learn and NumPy.


Results

Patient characteristics

A total of 18,793 individuals aged >18 years undergoing transfemoral TAVR from 2012 to 2016 were reviewed for the analysis. The mean age of the study population was 81.48 years and 46.6% were female. The mean adjusted hospitalization cost was $220,725.2 ($137,675.1) and the median adjusted hospitalization cost was $187,212.0 ($137,971.0–264,824.8). The distribution of adjusted hospitalization charges is shown in Figure 1. In our study, around 14.2% of patients had acute renal failure post-TAVR. About 2.45% (n=461) of patients had cardiogenic shock and 1.78% required the use of a mechanical circulatory support device. Table 1 shows the patient and hospital characteristics along with the mean (SD) adjusted hospitalization charges. The baseline characteristics and in-hospital outcomes of the study cohort are listed in Table S1.

Figure 1 Distribution of adjusted hospitalization charges in patients hospitalized for undergoing TF-TAVR. TF-TAVR, transfemoral transcatheter aortic valve replacement

Table 1

Patient characteristics and adjusted hospitalization charges

Patient characteristics/co-morbidities and complications Adjusted hospitalization charges, mean (SD) P value
Stage 1 variables
   Year 0.051
    2012 224,867.89 (174,978.61)
    2013 215,298.70 (125,052.43)
    2014 224,481.68 (133,795.22)
    2015 223,339.75 (147,694.11)
    2016 209,838.74 (114,120.60)
   Sex 0.72
    Male 216,846.08 (139,030.20)
    Female 218,375.34 (125,693.11)
   Race 0.001
    Caucasian 216,681.73 (131,141.37)
    African American 220,776.41 (143,384.43)
    Hispanic 279,291.35 (167,575.43)
   Hospital region 0.001
    North-east 236,153.86 (138,075.82)
    Mid-west 182,217.13 (98,036.83)
    South 204,133.57 (107,372.72)
    West 263,406.45 (183,016.34)
   Hospital bed size 0.001
    Small 219,672.79 (140,947.99)
    Medium 228,039.07 (142,308.90)
    Large 215,148.13 (130,246.14)
   Hospital location 0.001
    Rural 133,955.97 (51,450.89)
    Urban non-teaching 196,018.42 (124,189.82)
    Urban teaching 220,734.69 (134,003.58)
   Elective admission 0.001
    Yes 202,184.10 (112,270.59)
    No 254,383.26 (167,198.82)
   PCI 0.001
    Yes 300,271.4 (224,738.88)
    No 214,475.13 (127,325.95)
   Fluid and electrolyte disorder 0.001
    Yes 266,090.25 (182,707.36)
    No 205,479.37 (114,267.54)
   Malnutrition disorder 0.001
    Yes 389,909.56 (309,285.66)
    No 213,328.60 (122,684.83)
   Congestive heart failure 0.001
    Yes 223,463.49 (141,387.76)
    No 199,535.96 (101,171.75)
   Coronary artery disease 0.0015
    Yes 213,065.13 (121,492.59)
    No 228,005.54 (156,055.49)
   Carotid artery disease 0.001
    Yes 205,282.37 (113,821.03)
    No 218,393.50 (134,167.09)
   Peripheral vascular disease 0.90
    Yes 217,109.05 (119,996.06)
    No 217,707.69 (137,121.41)
   Cardiac arrhythmias 0.001
    Yes 227,709.27 (144,361.28)
    No 201,005.29 (110,046.60)
   Atrial fibrillation 0.0014
    Yes 225,601.66 (148,330.24)
    No 211,607.15 (120,078.82)
   Conduction disorder 0.007
    Yes 228,566.34 (135,369.89)
    No 214,465.36 (132,168.17)
   DM controlled 0.13
    Yes 211,977.57 (131,621.56)
    No 219,488.61 (133,425.01)
   DM uncontrolled 0.40
    Yes 222,887.88 (146,086.61)
    No 216,942.75 (131,404.63)
   HTN controlled 0.001
    Yes 231,814.23 (154,060.68)
    No 201,310.42 (101,520.16)
   HTN uncontrolled 0.0058
    Yes 225,023.68 (137,513.07)
    No 212,766.25 (129,804.20)
   Chronic lung disease 0.001
    Yes 228,425.16 (142,551.68)
    No 210,898.06 (126,341.21)
   Coagulopathy 0.001
    Yes 243,984.7 (153,000.23)
    No 211,927.91 (127,644.4)
   Anemia 0.001
    Yes 246,190.87 (168,560.13)
    No 206,935.95 (115,319.60)
   Liver cirrhosis 0.001
    Yes 260,175.69 (152,713.28)
    No 217,109.58 (132,715.22)
   Dementia 0.68
    Yes 221,249.86 (100,538.88)
    No 217,342.25 (134,636.06)
   Smoking 0.14
    Yes 210,113.43 (105,665.59)
    No 218,906.05 (137,330.77)
   Obesity 0.63
    Yes 215,083.23 (119,540.02)
    No 217,992.06 (135,229.29)
   Solid tumor without metastasis 0.56
    Yes 225,695.88 (126,797.43)
    No 217,368.09 (133,136.86)
   Metastatic cancer 0.0015
    Yes 311,428.4 (208,590.93)
    No 217,054.27 (132,330.92)
   Lymphoma 0.31
    Yes 240,161.57 (138,531.16)
    No 217,343.94 (132,934.42)
   ESRD requiring dialysis 0.001
    Yes 265,055.68 (144,392.49)
    No 215,595.90 (132,149.81)
   CKD stage 5 0.75
    Yes 202,629.25 (98,134.53)
    No 217,588.23 (133,058.95)
   CKD stage 4 0.003
    Yes 255,881.67 (192,066.29)
    No 216,007.69 (129,836.85)
   CKD stage 3 0.68
    Yes 215,600.73 (119,676.91)
    No 217,953.16 (135,541.62)
   CKD stage 1–2 0.89
    Yes 215,416.35 (113,650.63)
    No 217,617.29 (133,508.59)
Stage 2 variables
   STEMI 0.001
    Yes 501,699 (550,217.80)
    No 216,798.48 (129,574.86)
   NSTEMI 0.001
    Yes 303,898.16 (183,007.20)
    No 215,099.61 (130,485.55)
   Cardiogenic shock 0.001
    Yes 404,308.31 (355,471.88)
    No 212,766.58 (118,381.33)
   Mechanical circulatory support device 0.001
    Yes 389,175.93 (294,305.99)
    No 214,489.27 (126,256.49)
   Mechanical ventilation 0.001
    Yes 504,642.89 (336,264.50)
    No 210,434.02 (115,121.91)
   Acute renal failure 0.001
    Yes 314,855.94 (219,052.69)
    No 201,970.45 (105,287.01)
   New Pacemaker Insertion 0.001
    Yes 262,910.94 (124,743.62)
    No 212,606.74 (132,939.22)
   In hospital sepsis 0.001
    Yes 284,121.28 (220,394.64)
    No 215,076.75 (127,971.09)
   Mortality 0.001
    Yes 364,549.94 (255,226.28)
    No 213,621.58 (125,851.16)
   Vascular complications 0.001
    Yes 290,258.70 (210,534.51)
    No 214,513.70 (127,874.78)
   Blood transfusion 0.001
    Yes 273,505.70 (172,527.39)
    No 208,734.09 (123,359.28)
   Acute stroke 0.001
    Yes 285,180.84 (196,567.23)
    No 216,478.06 (131,482.09)
   Cardiac tamponade 0.001
    Yes 333,938.64 (243,989.46)
    No 216,588.63 (131,300.13)

PCI, percutaneous coronary intervention; HTN, high blood pressure; ESRD, end-stage renal disease; CKD, chronic kidney disease; STEMI, ST segment elevation myocardial infarction; NSTEMI, non-STEMI; DM, diabetes mellitus.

ML regression algorithms’ predictive performance for adjusted hospitalization charges

Table 2 shows the predictive performance of various ML regression algorithms in comparison to mean and median models for estimating the adjusted hospitalization charges in patients undergoing TF-TAVR. All the ML algorithms performed significantly better than the mean or median models. Random forest regression algorithm outperformed other ML algorithms at all stages with higher R2 score and lower MAE, RMSE and RMSLE (Stage 1: MAE 79,979.11, R2 0.157; Stage 2: MAE 76,200.09, R2 0.256; Stage 3: MAE 69,350.09, R2 0.453). Apart from random forest regression, gradient boosting regression for stage 1 variables and KNN regression for stage 3 variables performed better than linear regression algorithm. As expected, there was an increase in the predictive performance from Stage 1 to Stage 3 given the addition of variables.

Table 2

Predictive performance of machine learning regression algorithms, mean and median models in predicting hospitalization costs in patients undergoing TAVR

Algorithm MAE R2 score RMSE RMSLE
Stage 1
   Random forest regression 79,979.11 0.157 122,091.48 0.499
   Gradient boosting regression 81,544.21 0.114 125,124.47 0.509
   KNN regression 81,822.38 0.053 129,392.32 0.51
   MLP regression 87,292.51 0.015 131,955.43 0.53
   Linear regression 83,567.2 0.10 122,091.4 0.51
Stage 2
   Random forest regression 76,200.09 0.256 114,665.82 0.480
   Gradient boosting regression 80,541.18 0.146 125,124.47 0.502
   KNN regression 78,316.34 0.125 124,354.55 0.488
   MLP regression 84,443.79 0.082 127,426.43 0.522
   Linear regression 79,194.55 0.213 117,967.61 0.495
Stage 3
   Random forest regression 69,350.09 0.453 98,307.96 0.444
   Gradient boosting regression 74,903.48 0.27 114,208.19 0.463
   KNN regression 69,679.54 0.409 102,186.88 0.442
   MLP regression 71,833.24 0.388 104,002.02 0.457
   Linear regression 71,160.20 0.405 102,547.22 0.453
Median 83,452.40 −0.052 136,387.27 0.518
Mean 87,879.67 −0.00057 133,006.56 0.540

TAVR, transcatheter aortic valve replacement; MAE, mean absolute error; RMSE, root mean squared error; RMSLE, root mean squared logarithmic error; KNN, k-nearest neighbors; MLP, multilayer perceptron.

Predictors of hospitalization charges and partial dependence plots

Features selected for building ML algorithms at each stage, in order of their importance, are depicted in Figure 2. The top features were ranked by variable importance from the random forest regression algorithm. At the time of admission (pre-procedurally), hospital region, fluid and electrolyte disorders, age, race and elective admission were the most significant predictors of hospitalization charges. Hospitalizations for TAVR in the west region [$263,406.45 ($183,016.34)] were more expensive than hospitalizations in the north-east region [$236,153.86 ($138,075.82)], followed by the south [$204,133.57 ($107,372.72)] and mid-west [$182,217.13 ($98,036.83)] regions. Hospitalization charges were higher amongst the Hispanic [$279,291.35 ($167,575.43)] and African-American populations [$220,776.41 ($143,384.43)] compared to Caucasians. Patients undergoing elective TF-TAVR had lower hospitalization charges [$202,184.10 ($112,270.59) vs. $254,383.26 ($167,198.82)]. There was a negative correlation of age with adjusted hospitalization charges (Pearson correlation coefficient −0.0559). Individuals aged 60–75 years had higher hospitalization charges compared to those aged ≥75 years ($227,806.93 vs. $219,196.70).
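For readers less familiar with the statistic, the Pearson coefficient quoted for age can be computed as in the sketch below. The ages and charges are invented for illustration and do not come from the NIS data. A coefficient of −0.0559 corresponds to an r² of about 0.003, so age on its own accounts for well under 1% of the linear variation in charges.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical ages (years) and charges (US$), for illustration only.
ages = [62, 70, 78, 84, 90]
charges = [240_000, 210_000, 230_000, 205_000, 215_000]

r = pearson_r(ages, charges)
print(round(r, 3), round(r * r, 3))
```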

Figure 2 Top features for predicting hospitalization charges in patients undergoing TF-TAVR, stage wise. TAVR, transcatheter aortic valve replacement; TF-TAVR, transfemoral TAVR.

Amongst the stage 2 variables, mechanical ventilation, acute renal failure, cardiogenic shock, use of mechanical circulatory support devices, new pacemaker insertion and in-hospital sepsis were prominent predictors of increased hospitalization charges. Use of mechanical ventilation was associated with an around 2.4-fold increase in mean adjusted hospitalization charges [$504,642.89 ($336,264.50) vs. $210,434.02 ($115,121.91)]. Cardiogenic shock [$404,308.31 ($355,471.88) vs. $212,766.58 ($118,381.33)] and mechanical circulatory support device use [$389,175.93 ($294,305.99) vs. $214,489.27 ($126,256.49)] were each associated with a nearly 2-fold increase, and acute renal failure with an around 1.5-fold increase [$314,855.94 ($219,052.69) vs. $201,970.45 ($105,287.01)].
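The fold increases quoted above are ratios of the mean adjusted charges with and without each complication. As a quick arithmetic check, the sketch below recomputes them from the Table 1 means. The dictionary and variable names are illustrative, and these are unadjusted ratios of group means, not model estimates.

```python
# Mean adjusted charges (US$) with vs. without each complication, from Table 1.
table1_means = {
    "Mechanical ventilation": (504_642.89, 210_434.02),
    "Cardiogenic shock": (404_308.31, 212_766.58),
    "Mechanical circulatory support device": (389_175.93, 214_489.27),
    "Acute renal failure": (314_855.94, 201_970.45),
}

# Ratio of the "Yes" mean to the "No" mean, rounded to two decimals.
fold_change = {
    name: round(with_c / without_c, 2)
    for name, (with_c, without_c) in table1_means.items()
}

for name, ratio in fold_change.items():
    print(f"{name}: {ratio:.2f}-fold")
```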

For stage 3, LOS was the most important predictor of hospitalization charges. Figure 2 depicts the two-way partial dependence interaction between LOS and the second important variable (i.e., mechanical ventilation).

The actual and stage-wise predicted hospitalization charges for the first 20 patients are shown in Figure 3. The stage-wise lift charts for the testing (validation) cohort are depicted in Figure 4. Decile-wise actual and stage-wise predicted hospitalization costs in patients undergoing TF-TAVR can be seen in Table 3. The accuracy of various ML algorithms for predicted versus measured hospitalization charges is shown in the supplement.

Figure 3 Comparison of actual and stage wise predicted hospitalization charges in patients undergoing TF-TAVR for the first 20 patients. TF-TAVR, transfemoral transcatheter aortic valve replacement.
Figure 4 Lift charts for the validation (testing) cohort for stages 1, 2, and 3.

Table 3

Decile wise actual and stage wise predicted hospitalization costs in patients undergoing TF-TAVR

Deciles Actual hospitalization cost ($) Stage 1 predicted hospitalization cost ($) Stage 2 predicted hospitalization cost ($) Stage 3 predicted hospitalization cost ($)
0 225,526.82 221,053.83 223,444.37 222,345.92
1 222,735.92 223,061.71 221,752.52 220,706.20
2 219,101.85 219,483.58 218,695.32 217,788.57
3 218,321.29 221,001.81 218,306.04 217,550.80
4 219,647.94 221,441.96 221,086.45 221,266.57
5 220,886.87 220,847.21 221,587.54 221,807.53
6 220,671.16 221,302.26 222,426.94 221,743.26
7 222,169.46 220,758.07 222,452.45 223,556.07
8 219,366.78 221,767.46 222,100.56 220,122.74
9 226,741.90 222,864.56 222,896.01 225,652.89

TF-TAVR, transfemoral transcatheter aortic valve replacement


Discussion

Hospitalization charges are an important indicator of resource utilization (11), and understanding the predictors of higher hospitalization charges gives practitioners an opportunity to address potentially reversible drivers of charges. In our study, we found that ML regression algorithms performed significantly better than mean and median models for predicting adjusted hospitalization charges in patients undergoing TF-TAVR. Amongst the ML algorithms, random forest regression outperformed others at all stages. LOS was the most significant predictor of adjusted hospitalization charges.

LOS

We found that LOS is by far the strongest predictor of adjusted hospitalization charges. The influence of this variable is so strong that the relative impact of all other variables is nearly negligible. Decreasing LOS is thus of paramount importance for any cost-reduction strategy in this patient population. LOS is an indicator of the cumulative effects of multiple factors including the patient’s baseline characteristics, co-morbidities, clinical presentation, urgency of procedure, post-procedure complications, and individual hospital protocols. The adjusted costs for next-day discharge (NDD) following TAVR are nearly $7,500 lower compared with non-NDD (12).

Post-procedural complications

Mechanical ventilation, acute renal failure, cardiogenic shock, use of mechanical circulatory support device, in-hospital sepsis, and new pacemaker insertion were all associated with increased hospitalization costs. Acute kidney injury (AKI) occurs frequently following TAVR and has been associated with worse outcomes (13). AKI is expensive and consumes a considerable amount of health care resources. Even the most conservative estimates attribute approximately $1,700 in excess costs to each episode of AKI and $11,000 in excess costs to each episode of dialysis-requiring AKI (14). Since it is not possible to determine from the NIS database whether cardiogenic shock or use of a mechanical circulatory support device occurred before or after the procedure, we included these variables in stage 2 in our study.

Other predictors of hospitalization charges

In our study cohort, there was a negative correlation of age with adjusted hospitalization charges. Higher costs in the younger population are likely because younger patients undergoing TAVR are relatively sicker and thus at an increased risk of post-procedure complications.

The hospitalization charges for patients undergoing TF-TAVR varied across regions, being highest in the west followed by the north-east, south and mid-west regions. Healthcare expenditure in general varies widely by hospital region in the United States. It is thus important to understand the regional differences in practice and attempt to reduce wasteful charges for TAVR.

Patients undergoing elective TAVR incurred lower hospitalization charges [$202,184.10 ($112,270.59)] than those undergoing urgent/emergent TAVR [$254,383.26 ($167,198.82)]. This is because of increased complications, including AKI or dialysis requirement, and increased mortality in individuals undergoing urgent/emergent TAVR (13).

It is well established that racial disparities exist in the healthcare system. In our study population, around 90% of patients were Caucasians and 4% were African-Americans and Hispanics. There is increased racial disparity in the utilization of structural heart disease interventions (15). In our study population, adjusted hospitalization charges were significantly higher in Hispanics and African-Americans than in Caucasians. It is possible that decreased access to healthcare results in minority race patients presenting with advanced disease, driving up hospitalization charges.

Fluid & electrolyte disorders were another significant predictor of increased hospitalization charges. In patients undergoing TAVR, fluid & electrolyte disorders have been shown to be an independent predictor of mortality (16). Fluid & electrolyte disorders could be a modifiable predictor of adjusted hospitalization charges in patients undergoing TAVR, and efforts should be geared towards reducing their occurrence.

Strengths and limitations

First, to the best of the authors’ knowledge, this is the first attempt to develop an ML prediction model for estimating hospitalization charges in patients undergoing TAVR. Second, we evaluated the performance of ML algorithms at various stages, from the time of admission until procedure completion and finally taking LOS into account. Third, a robust performance assessment was done for various ML algorithms using multiple evaluation metrics to identify which algorithm most accurately predicts our outcome of interest, i.e., hospitalization charges in patients undergoing TF-TAVR. Fourth, ML algorithms were compared not only amongst themselves but also with the mean and median models.

ML models are often criticized for overfitting. To overcome this, we validated our ML regression algorithms internally using a rigorous 5-fold cross-validation technique. However, the models developed have not been externally validated on a separate cohort. We used the NIS database, which inherently has certain limitations that have been described before. Certain variables were not available for analysis, including type of valve (balloon-expandable or self-expandable), echocardiographic characteristics, acuity of condition, degree of pre-procedural shock, and others.

In conclusion, we built ML algorithms that predict hospitalization charges with good accuracy in patients undergoing TF-TAVR at different stages of hospitalization and that can be used by healthcare providers to better understand the drivers of charges. LOS was the strongest predictor of hospitalization charges. Other predictors of hospitalization charges were post-procedure complications (need for mechanical ventilation, acute renal failure, cardiogenic shock, use of mechanical support devices, in-hospital mortality, and need for pacemaker insertion), fluid and electrolyte disorders, age, hospital region and race.


Acknowledgments

Funding: None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Smith CR, Leon MB, Mack MJ, et al. Transcatheter versus surgical aortic-valve replacement in high-risk patients. N Engl J Med 2011;364:2187-98.
  2. TVT Registry Datamart Data. TAVR Update: New Insights and Perspectives from the U.S.: National STS/ACC TVT Registry: National Cardiovascular Data Registry; 2019. Available online: https://www.sts.org/sites/default/files/102419%201645.%20Bavaria.%20TVT.pdf
  3. Leon MB, Smith CR, Mack MJ, et al. Transcatheter or Surgical Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med 2016;374:1609-20.
  4. Adams DH, Popma JJ, Reardon MJ, et al. Transcatheter aortic-valve replacement with a self-expanding prosthesis. N Engl J Med 2014;370:1790-8.
  5. Leon MB, Smith CR, Mack M, et al. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med 2010;363:1597-607.
  6. Reardon MJ, Van Mieghem NM, Popma JJ, et al. Surgical or Transcatheter Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med 2017;376:1321-31.
  7. Mack MJ, Leon MB, Thourani VH, et al. Transcatheter Aortic-Valve Replacement with a Balloon-Expandable Valve in Low-Risk Patients. N Engl J Med 2019;380:1695-705.
  8. Osnabrugge RL, Mylotte D, Head SJ, et al. Aortic stenosis in the elderly: disease prevalence and number of candidates for transcatheter aortic valve replacement: a meta-analysis and modeling study. J Am Coll Cardiol 2013;62:1002-12.
  9. Lee SM, Kang JO, Suh YM. Comparison of hospital charge prediction models for colorectal cancer patients: neural network vs. decision tree models. J Korean Med Sci 2004;19:677-81.
  10. Wang J, Li M, Hu YT, et al. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models. BMC Health Serv Res 2009;9:161.
  11. Muhlestein WE, Akagi DS, McManus AR, et al. Machine learning ensemble models predict total charges and drivers of cost for transsphenoidal surgery for pituitary tumor. J Neurosurg 2018;131:507-16.
  12. Lauck SB, Baron SJ, Sathananthan J, et al. Exploring the Reduction in Hospitalization Costs Associated with Next-Day Discharge following Transfemoral Transcatheter Aortic Valve Replacement in the United States. Structural Heart 2019;3:423-30.
  13. Kolte D, Khera S, Vemulapalli S, et al. Outcomes Following Urgent/Emergent Transcatheter Aortic Valve Replacement: Insights From the STS/ACC TVT Registry. JACC Cardiovasc Interv 2018;11:1175-85.
  14. Silver SA, Chertow GM. The Economic Consequences of Acute Kidney Injury. Nephron 2017;137:297-301.
  15. Alkhouli M, Alqahtani F, Holmes DR, et al. Racial Disparities in the Utilization and Outcomes of Structural Heart Disease Interventions in the United States. J Am Heart Assoc 2019;8:e012125.
  16. Akinseye OA, Shahreyar M, Nwagbara CC, et al. Modifiable Predictors of In-Hospital Mortality in Patients Undergoing Transcatheter Aortic Valve Replacement. Am J Med Sci 2018;356:135-40.
Cite this article as: Bansal A, Garg C, Hariri E, Kassis N, Mentias A, Krishnaswamy A, Kapadia SR. Machine learning models predict total charges and drivers of cost for transcatheter aortic valve replacement. Cardiovasc Diagn Ther 2022;12(4):464-474. doi: 10.21037/cdt-21-717
