Machine learning models predict total charges and drivers of cost for transcatheter aortic valve replacement

doi:10.21037/cdt-21-717

Original Article

Agam Bansal¹, Chandan Garg², Essa Hariri¹, Nicholas Kassis¹, Amgad Mentias¹, Amar Krishnaswamy¹, Samir R. Kapadia¹

¹Department of Cardiovascular Medicine, Heart and Vascular Institute, Cleveland Clinic, Cleveland, OH, USA; ²Department of Statistics, Columbia University, New York, NY, USA

Contributions: (I) Conception and design: A Bansal, SR Kapadia; (II) Administrative support: A Krishnaswamy, S Kapadia; (III) Provision of study materials or patients: A Bansal; (IV) Collection and assembly of data: A Bansal; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Samir R. Kapadia, MD, FACC, FAHA. Chair, Department of Cardiovascular Medicine, Heart and Vascular Institute, Cleveland Clinic, 9500 Euclid Avenue, J2-3, Cleveland, OH 44195, USA. Email: kapadis@ccf.org.

Background: Given the increasing healthcare costs, there is an interest in developing machine learning (ML) prediction models for estimating hospitalization charges. We use ML algorithms to predict hospitalization charges for patients undergoing transfemoral transcatheter aortic valve replacement (TF-TAVR) utilizing the National Inpatient Sample (NIS) database.

Methods: Patients who underwent TF-TAVR from 2012 to 2016 were included in the study. The primary outcome was total hospitalization charges. The study dataset was divided into 80% training and 20% testing sets. We used the following ML regression algorithms: random forest, gradient boosting, k-nearest neighbors (KNN), multi-layer perceptron and linear regression. ML algorithms were built for 3 stages: Stage 1, including variables that were known pre-procedurally (prior to TF-TAVR); Stage 2, including variables that were known post-procedurally; Stage 3, including length of stay (LOS) in addition to the stage 2 variables.

Results: A total of 18,793 hospitalizations for TF-TAVR were analyzed. The mean [standard deviation (SD)] and median [interquartile range (IQR)] adjusted hospitalization charges were $220,725.2 ($137,675.1) and $187,212.0 ($137,971.0–264,824.8), respectively. The random forest regression algorithm outperformed the other ML algorithms at all stages, with a higher R² score and lower mean absolute error (MAE), root mean squared error (RMSE) and root mean squared logarithmic error (RMSLE) (Stage 1: MAE 79,979.11, R² 0.157; Stage 2: MAE 76,200.09, R² 0.256; Stage 3: MAE 69,350.09, R² 0.453). LOS was the most important predictor of hospitalization charges.

Conclusions: We built ML algorithms that predict hospitalization charges with good accuracy in patients undergoing TF-TAVR at different stages of hospitalization and that can be used by healthcare providers to better understand the drivers of charges.

Keywords: Transcatheter aortic valve replacement (TAVR); hospitalization charges; machine learning (ML)

Submitted Nov 11, 2021. Accepted for publication Jun 30, 2022.

Introduction

Transcatheter aortic valve replacement (TAVR) has revolutionized the treatment of severe aortic stenosis and has become the gold standard treatment for patients with severe symptomatic aortic stenosis as approved by the US Food and Drug Administration (FDA) (1). Annual TAVR volume in the United States has increased steadily, with a growth rate of more than 500% from approximately 5,000 in 2012 to almost 250,000 in 2019 (2). More recent trials (3-6) have expanded indications for TAVR to include patients with intermediate surgical risk, and results of the PARTNER 3 trial imply that TAVR will soon be the treatment of choice for low-risk candidates (7). With an aging population, an increased prevalence of aortic stenosis, and the expansion of TAVR to low-risk and younger patients, demand is expected to rise, and adequate resource utilization is therefore of paramount importance (8). There is also an increased emphasis on improving healthcare quality in the United States. Healthcare reimbursement models are shifting from “payment for volume” to “payment for value.” In this scenario, hospital systems are increasingly motivated to curb hospitalization costs.

Given the increasing healthcare costs, there is an interest in developing machine learning (ML) prediction models for estimating hospitalization charges. ML models have been used to predict hospitalization charges for colorectal (9) and gastric cancer (10). Similarly, Muhlestein et al. (11) developed ensemble ML models for estimating charges following trans-sphenoidal surgery for pituitary tumors. However, to the best of our knowledge, there are no cost prediction models for cardiovascular procedures including TAVR. Herein, we use ML algorithms to predict hospitalization charges for patients undergoing transfemoral TAVR (TF-TAVR) utilizing the National Inpatient Sample (NIS) database. We present the following article in accordance with the TRIPOD reporting checklist (available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/rc).

Methods

Data source and study population

We used the Agency for Healthcare Research and Quality’s (AHRQ) NIS, the largest all-payer database of hospitalized patients in the United States. Patients aged ≥18 years with a discharge diagnosis of aortic valve stenosis who underwent TF-TAVR [International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure code 35.05 or 35.06 and International Classification of Diseases, Tenth Revision, Procedure Coding System (ICD-10-PCS) procedure codes 02RF37H, 02RF37Z, 02RF38H, 02RF38Z, 02RF3JH, 02RF3JZ, 02RF3KH, 02RF3KZ] from 2012 to 2016 were included in the study. Because the study used de-identified data, it was exempted from Institutional Review Board (IRB) approval. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Candidate variables and outcomes

Fifty-nine variables, including patient and hospital characteristics, were collected for each hospitalization (the variables are described in the supplementary table). Patient comorbidities were identified using the Elixhauser Comorbidity Software administered by AHRQ.

The primary outcome was total hospitalization charges, calculated in US dollars. All the charges were adjusted for inflation.

Data pre-processing

Missing values were imputed using the k-nearest neighbors (KNN) algorithm. This algorithm uses ‘feature similarity’ to predict missing values: it finds the k closest neighbors of the observation with missing data and imputes the missing entries from the non-missing values in that neighborhood. The data were imputed after the training/testing data split.
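The imputation step can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn's KNNImputer; the choice of k = 5, the array shapes, and fitting the imputer on the training set only are assumptions made for the example, not details reported in the study.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the candidate variables and the charges outcome.
X = rng.normal(size=(200, 4))
y = rng.normal(size=200)

# Blank out roughly 10% of the entries to mimic missing values.
missing_mask = rng.random(X.shape) < 0.1
X[missing_mask] = np.nan

# Split first, then impute, so the test set does not inform the imputer.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

imputer = KNNImputer(n_neighbors=5)          # k = 5 is an assumed value
X_train_imp = imputer.fit_transform(X_train)  # learn neighbors from training rows
X_test_imp = imputer.transform(X_test)        # reuse the training neighbors
```

Fitting on the training rows and only transforming the test rows is one common way to keep the held-out set from leaking into pre-processing.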

ML model development and validation

The study dataset was divided into 80% training and 20% testing sets for the development and validation of ML algorithms, respectively. We used the following ML regression algorithms: random forest, gradient boosting, KNN, multi-layer perceptron and linear regression. The important features were selected using the random forest algorithm. A grid search strategy based on cross-validation was used to identify the best combination of hyperparameters for the listed ML algorithms. The searched parameters included max_depth (range from 4 to 8), max_features (auto, sqrt, log2), and n_estimators (range from 10 to 200). The optimal values for the random forest (RF) model were max_depth = 8, max_features = sqrt, and n_estimators = 100. We built ML algorithms for 3 stages: Stage 1, including variables that were known pre-procedurally (prior to TF-TAVR) at the time of admission; Stage 2, including variables that were known post-procedurally; Stage 3, including length of stay (LOS) in addition to the stage 2 variables.
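A hedged sketch of such a grid search for the random forest regressor is shown below, using scikit-learn's GridSearchCV on synthetic data. The grid is a coarse subset of the ranges quoted above; the scoring metric, the random seeds, and the synthetic data are assumptions. The value 'auto' for max_features is omitted because recent scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic regression problem standing in for the charges data.
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

# 80% training / 20% testing split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Coarse subset of the quoted search ranges, kept small so the example runs fast.
param_grid = {
    "max_depth": [4, 6, 8],
    "max_features": ["sqrt", "log2"],
    "n_estimators": [10, 100],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                               # 5-fold cross-validation
    scoring="neg_mean_absolute_error",  # assumed selection metric
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.best_estimator_.score(X_test, y_test)  # R² on held-out data
print(best_params, round(test_r2, 3))
```

The held-out 20% is touched only once, after the hyperparameters have been chosen by cross-validation on the training portion.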

Statistical analysis

Model performance

The performance of the ML models was compared to average and median models using four evaluation metrics: R² score, mean absolute error (MAE), root mean squared error (RMSE), and root mean squared logarithmic error (RMSLE). A higher R² score and lower MAE, RMSE and RMSLE signify better model performance.
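The four metrics and the two baseline models can be written out in a few lines. The sketch below uses only the Python standard library and made-up numbers; the RMSLE uses log(1 + y), which matches the common scikit-learn definition but is an assumption about the study's exact formula. It also shows why a constant baseline fitted on the training set cannot reach a positive R² on the test set.

```python
import math
from statistics import mean, median

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, RMSLE and R² for paired lists of non-negative values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {"mae": mae, "rmse": rmse, "rmsle": rmsle, "r2": 1.0 - ss_res / ss_tot}

# Made-up charges (USD), for illustration only.
train_charges = [100_000.0, 150_000.0, 200_000.0, 650_000.0]
test_charges = [120_000.0, 180_000.0, 210_000.0, 400_000.0]

# "Mean" and "median" models predict one constant learned from the training set.
mean_model = regression_metrics(test_charges, [mean(train_charges)] * len(test_charges))
median_model = regression_metrics(test_charges, [median(train_charges)] * len(test_charges))

print(mean_model["r2"], median_model["r2"])
```

Because any constant prediction has a residual sum of squares at least as large as the total sum of squares around the test-set mean, both baselines score R² ≤ 0, which is the reference point against which the ML models are judged.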

Lift charts were generated to visualize how accurately the ensemble ML model predicts the LOS and hospitalization charges in the validation cohort. To generate these charts, we ranked the predictions of the best-performing ML model, divided them into 10 ‘bins’ (deciles), and calculated the average predicted LOS and hospitalization charges for each bin. We then calculated the average actual LOS and hospitalization charges for each decile and plotted the average predicted values against the average actual values.
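The binning behind a lift chart can be sketched as below for a single outcome such as charges. The function name, the equal-sized bins, and the toy data are illustrative assumptions; the plotting step is left out.

```python
def decile_lift_table(y_true, y_pred, n_bins=10):
    """Rank cases by predicted value, split into bins, and average each bin.

    Returns a list of (bin_index, mean_predicted, mean_actual) tuples,
    ordered from the lowest-predicted bin to the highest.
    """
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    table = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        mean_pred = sum(y_pred[i] for i in idx) / len(idx)
        mean_true = sum(y_true[i] for i in idx) / len(idx)
        table.append((b, mean_pred, mean_true))
    return table

# Toy data: 100 predictions and "actual" values that run 10% higher.
predicted = [1000.0 * i for i in range(1, 101)]
actual = [1.1 * p for p in predicted]

lift_table = decile_lift_table(actual, predicted)
for bin_index, mean_pred, mean_true in lift_table:
    print(bin_index, round(mean_pred, 1), round(mean_true, 1))
```

Plotting the second column against the third, bin by bin, gives the lift chart: a well-calibrated model tracks the actual averages closely across all bins.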

Partial dependence plots

Partial dependence plots allow one to visualize how a model reacts to changes in a single variable. The variable of interest is set to each test value in turn, predictions are made for all observations, and the mean value of the predictions is calculated. The mean prediction is plotted over the test values to generate a visual representation of the model’s response to changes in the variable.

Additional statistical analysis was performed to describe patient characteristics. Continuous variables were compared using the 2-tailed Student’s t-test, whereas chi-square or Fisher exact tests were used for categorical data as appropriate. The analysis was conducted using Python 3.6.9. The Python libraries used for this project were SciPy, scikit-learn and NumPy.
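The averaging procedure can be sketched in plain Python as follows. The toy_model function and its coefficients are hypothetical stand-ins for a fitted regressor, and the rows are invented; only the procedure (overwrite one variable, predict for every row, average) is the point.

```python
def partial_dependence_1d(predict, rows, feature, grid):
    """Average model prediction as `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Overwrite the feature for every row, leaving other variables untouched.
        modified = [{**row, feature: value} for row in rows]
        predictions = [predict(row) for row in modified]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

def toy_model(row):
    # Hypothetical model, NOT the fitted model from the study.
    return 150_000.0 + 12_000.0 * row["los"] + 100_000.0 * row["ventilated"]

# Invented observations with two variables each.
rows = [
    {"los": 2, "ventilated": 0},
    {"los": 5, "ventilated": 0},
    {"los": 9, "ventilated": 1},
    {"los": 3, "ventilated": 0},
]

pd_curve = partial_dependence_1d(toy_model, rows, "los", grid=[1, 3, 5, 7])
for los_value, mean_prediction in pd_curve:
    print(los_value, mean_prediction)
```

For a two-way plot, the same loop would run over pairs of values for two variables instead of a single grid.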

Results

Patient characteristics

A total of 18,793 individuals aged ≥18 years undergoing transfemoral TAVR from 2012 to 2016 were reviewed for the analysis. The mean age of the study population was 81.48 years and 46.6% were female. The mean (SD) adjusted hospitalization cost was $220,725.2 ($137,675.1) and the median (IQR) adjusted hospitalization cost was $187,212.0 ($137,971.0–264,824.8). The distribution of adjusted hospitalization charges is shown in Figure 1. Around 14.2% of patients had acute renal failure post-TAVR. About 2.45% (n=461) of patients had cardiogenic shock and 1.78% required the use of a mechanical circulatory support device. Table 1 shows the patient and hospital characteristics along with the mean (SD) adjusted hospitalization charges. The baseline characteristics and in-hospital outcomes of the study cohort are listed in Table S1.

Figure 1 Distribution of adjusted hospitalization charges in patients hospitalized for TF-TAVR. TF-TAVR, transfemoral transcatheter aortic valve replacement.

Table 1

Patient characteristics and adjusted hospitalization charges

Patient characteristics/co-morbidities and complications	Adjusted hospitalization charges, mean (SD)	P value
Stage 1 variables
Year		0.051
2012	224,867.89 (174,978.61)
2013	215,298.70 (125,052.43)
2014	224,481.68 (133,795.22)
2015	223,339.75 (147,694.11)
2016	209,838.74 (114,120.60)
Sex		0.72
Male	216,846.08 (139,030.20)
Female	218,375.34 (125,693.11)
Race		0.001
Caucasian	216,681.73 (131,141.37)
African American	220,776.41 (143,384.43)
Hispanic	279,291.35 (167,575.43)
Hospital region		0.001
North-east	236,153.86 (138,075.82)
Mid-west	182,217.13 (98,036.83)
South	204,133.57 (107,372.72)
West	263,406.45 (183,016.34)
Hospital bed size		0.001
Small	219,672.79 (140,947.99)
Medium	228,039.07 (142,308.90)
Large	215,148.13 (130,246.14)
Hospital location		0.001
Rural	133,955.97 (51,450.89)
Urban non-teaching	196,018.42 (124,189.82)
Urban teaching	220,734.69 (134,003.58)
Elective admission		0.001
Yes	202,184.10 (112,270.59)
No	254,383.26 (167,198.82)
PCI		0.001
Yes	300,271.4 (224,738.88)
No	214,475.13 (127,325.95)
Fluid and electrolyte disorder		0.001
Yes	266,090.25 (182,707.36)
No	205,479.37 (114,267.54)
Malnutrition disorder		0.001
Yes	389,909.56 (309,285.66)
No	213,328.60 (122,684.83)
Congestive heart failure		0.001
Yes	223,463.49 (141,387.76)
No	199,535.96 (101,171.75)
Coronary artery disease		0.0015
Yes	213,065.13 (121,492.59)
No	228,005.54 (156,055.49)
Carotid artery disease		0.001
Yes	205,282.37 (113,821.03)
No	218,393.50 (134,167.09)
Peripheral vascular disease		0.90
Yes	217,109.05 (119,996.06)
No	217,707.69 (137,121.41)
Cardiac arrhythmias		0.001
Yes	227,709.27 (144,361.28)
No	201,005.29 (110,046.60)
Atrial fibrillation		0.0014
Yes	225,601.66 (148,330.24)
No	211,607.15 (120,078.82)
Conduction disorder		0.007
Yes	228,566.34 (135,369.89)
No	214,465.36 (132,168.17)
DM controlled		0.13
Yes	211,977.57 (131,621.56)
No	219,488.61 (133,425.01)
DM uncontrolled		0.40
Yes	222,887.88 (146,086.61)
No	216,942.75 (131,404.63)
HTN controlled		0.001
Yes	231,814.23 (154,060.68)
No	201,310.42 (101,520.16)
HTN uncontrolled		0.0058
Yes	225,023.68 (137,513.07)
No	212,766.25 (129,804.20)
Chronic lung disease		0.001
Yes	228,425.16 (142,551.68)
No	210,898.06 (126,341.21)
Coagulopathy		0.001
Yes	243,984.7 (153,000.23)
No	211,927.91 (127,644.4)
Anemia		0.001
Yes	246,190.87 (168,560.13)
No	206,935.95 (115,319.60)
Liver cirrhosis		0.001
Yes	260,175.69 (152,713.28)
No	217,109.58 (132,715.22)
Dementia		0.68
Yes	221,249.86 (100,538.88)
No	217,342.25 (134,636.06)
Smoking		0.14
Yes	210,113.43 (105,665.59)
No	218,906.05 (137,330.77)
Obesity		0.63
Yes	215,083.23 (119,540.02)
No	217,992.06 (135,229.29)
Solid tumor without metastasis		0.56
Yes	225,695.88 (126,797.43)
No	217,368.09 (133,136.86)
Metastatic cancer		0.0015
Yes	311,428.4 (208,590.93)
No	217,054.27 (132,330.92)
Lymphoma		0.31
Yes	240,161.57 (138,531.16)
No	217,343.94 (132,934.42)
ESRD requiring dialysis		0.001
Yes	265,055.68 (144,392.49)
No	215,595.90 (132,149.81)
CKD stage 5		0.75
Yes	202,629.25 (98,134.53)
No	217,588.23 (133,058.95)
CKD stage 4		0.003
Yes	255,881.67 (192,066.29)
No	216,007.69 (129,836.85)
CKD stage 3		0.68
Yes	215,600.73 (119,676.91)
No	217,953.16 (135,541.62)
CKD stage 1–2		0.89
Yes	215,416.35 (113,650.63)
No	217,617.29 (133,508.59)
Stage 2 variables
STEMI		0.001
Yes	501,699 (550,217.80)
No	216,798.48 (129,574.86)
NSTEMI		0.001
Yes	303,898.16 (183,007.20)
No	215,099.61 (130,485.55)
Cardiogenic shock		0.001
Yes	404,308.31 (355,471.88)
No	212,766.58 (118,381.33)
Mechanical circulatory support device		0.001
Yes	389,175.93 (294,305.99)
No	214,489.27 (126,256.49)
Mechanical ventilation		0.001
Yes	504,642.89 (336,264.50)
No	210,434.02 (115,121.91)
Acute renal failure		0.001
Yes	314,855.94 (219,052.69)
No	201,970.45 (105,287.01)
New Pacemaker Insertion		0.001
Yes	262,910.94 (124,743.62)
No	212,606.74 (132,939.22)
In hospital sepsis		0.001
Yes	284,121.28 (220,394.64)
No	215,076.75 (127,971.09)
Mortality		0.001
Yes	364,549.94 (255,226.28)
No	213,621.58 (125,851.16)
Vascular complications		0.001
Yes	290,258.70 (210,534.51)
No	214,513.70 (127,874.78)
Blood transfusion		0.001
Yes	273,505.70 (172,527.39)
No	208,734.09 (123,359.28)
Acute stroke		0.001
Yes	285,180.84 (196,567.23)
No	216,478.06 (131,482.09)
Cardiac tamponade		0.001
Yes	333,938.64 (243,989.46)
No	216,588.63 (131,300.13)

PCI, percutaneous coronary intervention; HTN, hypertension; ESRD, end-stage renal disease; CKD, chronic kidney disease; STEMI, ST segment elevation myocardial infarction; NSTEMI, non-STEMI; DM, diabetes mellitus; SD, standard deviation.

ML regression algorithms’ predictive performance for adjusted hospitalization charges

Table 2 shows the predictive performance of various ML regression algorithms in comparison to mean and median models for estimating the adjusted hospitalization charges in patients undergoing TF-TAVR. All the ML algorithms performed significantly better than the mean or median models. Random forest regression algorithm outperformed other ML algorithms at all stages with higher R² score and lower MAE, RMSE and RMSLE (Stage 1: MAE 79,979.11, R² 0.157; Stage 2: MAE 76,200.09, R² 0.256; Stage 3: MAE 69,350.09, R² 0.453). Apart from random forest regression, gradient boosting regression for stage 1 variables and KNN regression for stage 3 variables performed better than linear regression algorithm. As expected, there was an increase in the predictive performance from Stage 1 to Stage 3 given the addition of variables.

Table 2

Predictive performance of machine learning regression algorithms, mean and median models in predicting hospitalization costs in patients undergoing TAVR

	MAE	R² score	RMSE	RMSLE
Stage 1
Random forest regression	79,979.11	0.157	122,091.48	0.499
Gradient boosting regression	81,544.21	0.114	125,124.47	0.509
KNN regression	81,822.38	0.053	129,392.32	0.51
MLP regression	87,292.51	0.015	131,955.43	0.53
Linear regression	83,567.2	0.10	122,091.4	0.51
Stage 2
Random forest regression	76,200.09	0.256	114,665.82	0.480
Gradient boosting regression	80,541.18	0.146	125,124.47	0.502
KNN regression	78,316.34	0.125	124,354.55	0.488
MLP regression	84,443.79	0.082	127,426.43	0.522
Linear regression	79,194.55	0.213	117,967.61	0.495
Stage 3
Random forest regression	69,350.09	0.453	98,307.96	0.444
Gradient boosting regression	74,903.48	0.27	114,208.19	0.463
KNN regression	69,679.54	0.409	102,186.88	0.442
MLP regression	71,833.24	0.388	104,002.02	0.457
Linear regression	71,160.20	0.405	102,547.22	0.453
Median	83,452.40	−0.052	136,387.27	0.518
Mean	87,879.67	−0.00057	133,006.56	0.540

TAVR, transcatheter aortic valve replacement; MAE, mean absolute error; RMSE, root mean squared error; RMSLE, root mean squared logarithmic error; KNN, k-nearest neighbors; MLP, multilayer perceptron.

Predictors of hospitalization charges and partial dependence plots

Features selected for building ML algorithms at each stage, in order of their importance, are depicted in Figure 2. The top features were ranked by variable importance from the random forest regression algorithm. At the time of admission (pre-procedurally), hospital region, fluid and electrolyte disorders, age, race and elective admission were the most significant predictors of hospitalization charges. Hospitalizations for TAVR in the west region [$263,406.45 ($183,016.34)] were more expensive than hospitalizations in the north-east region [$236,153.86 ($138,075.82)], followed by the south [$204,133.57 ($107,372.72)] and mid-west [$182,217.13 ($98,036.83)] regions. Hospitalization charges were higher among the Hispanic [$279,291.35 ($167,575.43)] and African-American [$220,776.41 ($143,384.43)] populations than among Caucasians [$216,681.73 ($131,141.37)]. Patients undergoing elective TF-TAVR had lower hospitalization charges than those admitted non-electively [$202,184.10 ($112,270.59) vs. $254,383.26 ($167,198.82)]. There was a weak negative correlation of age with adjusted hospitalization charges (Pearson correlation coefficient −0.0559). Individuals aged 60–75 years had higher hospitalization charges than those aged ≥75 years ($227,806.93 vs. $219,196.70).

Figure 2 Top features for predicting hospitalization charges in patients undergoing TF-TAVR, stage wise. TAVR, transcatheter aortic valve replacement; TF-TAVR, transfemoral TAVR.

Amongst the stage 2 variables, mechanical ventilation, acute renal failure, cardiogenic shock, use of mechanical circulatory support devices, new pacemaker insertion and in-hospital sepsis were prominent predictors of increased hospitalization charges. Use of mechanical ventilation was associated with an approximately 2.5-fold increase in mean adjusted hospitalization charges [$504,642.89 ($336,264.50) vs. $210,434.02 ($115,121.91)]; cardiogenic shock [$404,308.31 ($355,471.88) vs. $212,766.58 ($118,381.33)] and mechanical circulatory support device use [$389,175.93 ($294,305.99) vs. $214,489.27 ($126,256.49)] with an approximately 2-fold increase; and acute renal failure with an approximately 1.5-fold increase [$314,855.94 ($219,052.69) vs. $201,970.45 ($105,287.01)].

For stage 3, LOS was the most important predictor of hospitalization charges. Figure 2 depicts the two-way partial dependence interaction between LOS and the second most important variable (i.e., mechanical ventilation).

The actual and stage-wise predicted hospitalization charges for the first 20 patients are shown in Figure 3. The stage-wise lift charts for the testing (validation) cohort are depicted in Figure 4. Decile-wise actual and stage-wise predicted hospitalization costs in patients undergoing TF-TAVR can be seen in Table 3. The accuracy of various ML algorithms for predicted versus measured hospitalization charges is shown in the supplement.
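The fold increases quoted above are simple ratios of the mean adjusted charges in Table 1. The short check below recomputes them from those table values; the dictionary and variable names are illustrative.

```python
# Mean adjusted charges (USD) with vs. without each event, copied from Table 1.
table1_means = {
    "Mechanical ventilation": (504_642.89, 210_434.02),
    "Cardiogenic shock": (404_308.31, 212_766.58),
    "Mechanical circulatory support device": (389_175.93, 214_489.27),
    "Acute renal failure": (314_855.94, 201_970.45),
}

# Ratio of mean charges: patients with the event / patients without it.
fold_change = {
    name: with_event / without_event
    for name, (with_event, without_event) in table1_means.items()
}

for name, ratio in fold_change.items():
    print(f"{name}: {ratio:.2f}-fold")
```

These are unadjusted ratios of group means, so they describe association in the cohort rather than the incremental cost attributable to each complication.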

Figure 3 Comparison of actual and stage wise predicted hospitalization charges in patients undergoing TF-TAVR for the first 20 patients. TF-TAVR, transfemoral transcatheter aortic valve replacement.

Figure 4 Lift charts for the validation (testing) cohort for stages 1, 2, and 3.

Table 3

Decile wise actual and stage wise predicted hospitalization costs in patients undergoing TF-TAVR

Deciles	Actual hospitalization cost ($)	Stage 1 predicted hospitalization cost ($)	Stage 2 predicted hospitalization cost ($)	Stage 3 predicted hospitalization cost ($)
0	225,526.82	221,053.83	223,444.37	222,345.92
1	222,735.92	223,061.71	221,752.52	220,706.20
2	219,101.85	219,483.58	218,695.32	217,788.57
3	218,321.29	221,001.81	218,306.04	217,550.80
4	219,647.94	221,441.96	221,086.45	221,266.57
5	220,886.87	220,847.21	221,587.54	221,807.53
6	220,671.16	221,302.26	222,426.94	221,743.26
7	222,169.46	220,758.07	222,452.45	223,556.07
8	219,366.78	221,767.46	222,100.56	220,122.74
9	226,741.90	222,864.56	222,896.01	225,652.89

TF-TAVR, transfemoral transcatheter aortic valve replacement.

Discussion

Hospitalization charges are an important indicator of resource utilization (11), and understanding the predictors of higher hospitalization charges gives practitioners an opportunity to address potentially reversible drivers of charges. In our study, we found that ML regression algorithms performed significantly better than mean and median models for predicting adjusted hospitalization charges in patients undergoing TF-TAVR. Amongst the ML algorithms, random forest regression outperformed the others at all stages. LOS was the most significant predictor of adjusted hospitalization charges.

LOS

We found that LOS is by far the strongest predictor of adjusted hospitalization charges. The influence of this variable is so strong that the relative impact of all other variables is nearly negligible. Decreasing LOS is thus of paramount importance for any cost-reduction strategy in this patient population. LOS is an indicator of the cumulative effects of multiple factors, including the patient’s baseline characteristics, co-morbidities, clinical presentation, urgency of procedure, post-procedure complications, and individual hospital protocols. The adjusted costs for next-day discharge (NDD) following TAVR are nearly $7,500 lower than for non-NDD (12).

Post-procedural complications

Mechanical ventilation, acute renal failure, cardiogenic shock, use of a mechanical circulatory support device, in-hospital sepsis, and new pacemaker insertion were all associated with increased hospitalization costs. Acute kidney injury (AKI) occurs frequently following TAVR and has been associated with worse outcomes (13). AKI is expensive and consumes a considerable amount of health care resources. Even the most conservative estimates attribute approximately $1,700 in excess costs to each episode of AKI and $11,000 in excess costs to each episode of dialysis-requiring AKI (14). Because the NIS database does not indicate whether cardiogenic shock or use of a mechanical circulatory support device occurred before or after the procedure, we included these variables in stage 2 in our study.

Other predictors of hospitalization charges

In our study cohort, there was a negative correlation of age with adjusted hospitalization charges. Higher costs in the younger population are likely because younger patients undergoing TAVR are relatively sicker and thus at an increased risk of post-procedure complications.

The hospitalization charges for patients undergoing TF-TAVR varied across regions and were highest in the west, followed by the north-east, south and mid-west regions. Healthcare expenditure in general varies widely by hospital region in the United States. It is thus important to understand the regional differences in practice and attempt to reduce wasteful charges for TAVR.

Patients undergoing elective TAVR incurred lower hospitalization charges [$202,184.10 ($112,270.59)] than those undergoing urgent/emergent TAVR [$254,383.26 ($167,198.82)]. This is likely because of increased complications, including AKI or dialysis requirement, and increased mortality in individuals undergoing urgent/emergent TAVR (13).

It is well established that racial disparities exist in the healthcare system. In our study population, around 90% of patients were Caucasians and 4% were African-Americans and Hispanics. Racial disparities have also been reported in the utilization of structural heart disease interventions (15). In our study population, there were significant differences in adjusted hospitalization charges, which were higher in Hispanics and African-Americans than in Caucasians. It is possible that decreased access to healthcare results in minority race patients presenting with advanced disease, driving up hospitalization charges.

Fluid & electrolyte disorders were another significant predictor of increased hospitalization charges. In patients undergoing TAVR, fluid & electrolyte disorders have been shown to be an independent predictor of mortality (16). Fluid & electrolyte disorders could be a modifiable predictor of adjusted hospitalization charges in patients undergoing TAVR, and efforts should be geared towards reducing their occurrence.

Strengths and limitations

To the best of the authors’ knowledge, this is the first attempt to develop an ML prediction model for estimating hospitalization charges in patients undergoing TAVR. Second, we assessed the performance of ML algorithms at various stages, from the time of admission until procedure completion and finally taking LOS into account. Third, a robust performance assessment was done for various ML algorithms using multiple evaluation metrics to identify which algorithm most accurately predicts our outcome of interest, i.e., hospitalization charges in patients undergoing TF-TAVR. Fourth, ML algorithms were compared not only amongst themselves but also with the mean and median models.

ML models are often criticized for overfitting. To overcome this, we validated our ML regression algorithms internally using a rigorous 5-fold cross-validation technique. However, the models developed have not been externally validated on a separate cohort. We used the NIS database, which inherently has certain limitations that have been described before. Certain variables were not available for analysis, including type of valve (balloon-expandable or self-expandable), echocardiographic characteristics, acuity of condition, degree of pre-procedural shock, and others.

In conclusion, we built ML algorithms that predict hospitalization charges with good accuracy in patients undergoing TF-TAVR at different stages of hospitalization and that can be used by healthcare providers to better understand the drivers of charges. LOS was the strongest predictor of hospitalization charges. Other predictors of hospitalization charges were post-procedure complications (need for mechanical ventilation, acute renal failure, cardiogenic shock, use of mechanical support devices, in-hospital mortality, and need for pacemaker insertion), fluid and electrolyte disorders, age, hospital region and race.

Acknowledgments

Funding: None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://cdt.amegroups.com/article/view/10.21037/cdt-21-717/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Smith CR, Leon MB, Mack MJ, et al. Transcatheter versus surgical aortic-valve replacement in high-risk patients. N Engl J Med 2011;364:2187-98.
2. TVT Registry Datamart Data. TAVR Update: New Insights and Perspectives from the U.S.: National STS/ACC TVT Registry: National Cardiovascular Data Registry; 2019. Available online: https://www.sts.org/sites/default/files/102419%201645.%20Bavaria.%20TVT.pdf
3. Leon MB, Smith CR, Mack MJ, et al. Transcatheter or Surgical Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med 2016;374:1609-20.
4. Adams DH, Popma JJ, Reardon MJ, et al. Transcatheter aortic-valve replacement with a self-expanding prosthesis. N Engl J Med 2014;370:1790-8.
5. Leon MB, Smith CR, Mack M, et al. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med 2010;363:1597-607.
6. Reardon MJ, Van Mieghem NM, Popma JJ, et al. Surgical or Transcatheter Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med 2017;376:1321-31.
7. Mack MJ, Leon MB, Thourani VH, et al. Transcatheter Aortic-Valve Replacement with a Balloon-Expandable Valve in Low-Risk Patients. N Engl J Med 2019;380:1695-705.
8. Osnabrugge RL, Mylotte D, Head SJ, et al. Aortic stenosis in the elderly: disease prevalence and number of candidates for transcatheter aortic valve replacement: a meta-analysis and modeling study. J Am Coll Cardiol 2013;62:1002-12.
9. Lee SM, Kang JO, Suh YM. Comparison of hospital charge prediction models for colorectal cancer patients: neural network vs. decision tree models. J Korean Med Sci 2004;19:677-81.
10. Wang J, Li M, Hu YT, et al. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models. BMC Health Serv Res 2009;9:161.
11. Muhlestein WE, Akagi DS, McManus AR, et al. Machine learning ensemble models predict total charges and drivers of cost for transsphenoidal surgery for pituitary tumor. J Neurosurg 2018;131:507-16.
12. Lauck SB, Baron SJ, Sathananthan J, et al. Exploring the Reduction in Hospitalization Costs Associated with Next-Day Discharge following Transfemoral Transcatheter Aortic Valve Replacement in the United States. Structural Heart 2019;3:423-30.
13. Kolte D, Khera S, Vemulapalli S, et al. Outcomes Following Urgent/Emergent Transcatheter Aortic Valve Replacement: Insights From the STS/ACC TVT Registry. JACC Cardiovasc Interv 2018;11:1175-85.
14. Silver SA, Chertow GM. The Economic Consequences of Acute Kidney Injury. Nephron 2017;137:297-301.
15. Alkhouli M, Alqahtani F, Holmes DR, et al. Racial Disparities in the Utilization and Outcomes of Structural Heart Disease Interventions in the United States. J Am Heart Assoc 2019;8:e012125.
16. Akinseye OA, Shahreyar M, Nwagbara CC, et al. Modifiable Predictors of In-Hospital Mortality in Patients Undergoing Transcatheter Aortic Valve Replacement. Am J Med Sci 2018;356:135-40.

Cite this article as: Bansal A, Garg C, Hariri E, Kassis N, Mentias A, Krishnaswamy A, Kapadia SR. Machine learning models predict total charges and drivers of cost for transcatheter aortic valve replacement. Cardiovasc Diagn Ther 2022;12(4):464-474. doi: 10.21037/cdt-21-717
