Preprocessing of Inconsistent Creep Data Collected from a Literature Survey to Provide Reliable and Consistent Creep Life Prediction

Article information

Korean J. Met. Mater. 2025;63(3):231-241
Publication date (electronic) : 2025 March 5
doi : https://doi.org/10.3365/KJMM.2025.63.3.231
1School of Materials Science and Engineering, Pusan National University, Busan, 46241, Republic of Korea
2Korea Institute of Materials Science, Changwon-si, Gyeongsangnam-do, 51508, Republic of Korea

- Tae-Ju Lee: Master's student, Chang-Seok Oh: Researcher, Yoon Suk Choi: Professor

*Corresponding Author: Yoon Suk Choi Tel: +82-51-510-2382, E-mail: choiys@pusan.ac.kr
Received 2024 December 17; Accepted 2025 January 23.

Abstract

Numerous studies have reported machine learning-based creep life prediction using creep data collected from a variety of existing experimental sources. Since the prediction of creep life is heavily influenced by the quality and integrity of the collected creep data, data preprocessing is required to eliminate physically inconsistent creep datapoints, known as outliers. In the present study, a machine learning-based data screening methodology was developed to detect and eliminate outliers from creep data collected through a survey of the literature. The methodology consists of selecting appropriate machine learning models for the collected creep data through an assessment of their validity in creep physics, evaluating the prediction accuracy and variability of the collected datapoints through bagging of the selected machine learning models, and identifying inconsistent datapoints by ranking their residuals and prediction variabilities. The proposed methodology for detecting and eliminating outliers was successfully applied to the multi-source collected creep data of the Ni-base single crystal superalloy CMSX-4 and led to improved accuracy and consistency in creep life prediction. In addition, the proposed methodology was validated by predicting the creep life of a newly generated creep dataset, which was not exposed to any model training, using a machine learning model trained and optimized on the outlier-eliminated creep data.

1. INTRODUCTION

Creep is a primary deformation and failure mechanism in high-temperature structural materials such as nickel-based superalloys and heat-resistant steels, and the duration until creep rupture failure is referred to as the creep life. The creep life varies depending on operational conditions but typically ranges from 10² to 10⁵ hours. Conducting creep tests, which involve maintaining high stress and temperature conditions over a long period, is costly. Accelerated creep tests are thus primarily employed to measure creep properties due to their cost-effectiveness[1,2].

With the data obtained from accelerated creep tests, various physics-based models or phenomenological models have been developed and utilized to predict creep life under different test temperatures and stresses[3-8]. Recently, various data analytics-based approaches have been employed to predict creep life[9-13]. These approaches are summarized in Table 1.

Summary of previous studies on data analytics approaches for the creep data

C. Wang et al. proposed a creep-resistant alloy design system based on a high-throughput design module combined with creep life prediction results after machine learning of data consisting of compositions, heat treatment conditions, and creep test conditions[9]. J. Wang et al. predicted the creep life and creep life-related parameters of Cr-Mo based steels using machine learning algorithms[10]. Y. Liu et al. proposed a divide-and-conquer self-adaptive approach to predict the creep life of Ni-based superalloys by taking chemical composition, test conditions, heat treatment, and microstructural parameters as input features[11]. D. Shin et al. applied machine learning approaches to the creep data of alumina-forming austenitic steels after augmenting the original creep data with calculated thermodynamic properties, such as phase fractions and transformation temperatures, and showed that machine learning with the calculated thermodynamic properties improved the predictability of the creep property[12]. The authors of Ref. [13] suggested that only four input features, temperature (T), stress (σ), and the two physics-informed features (ln σ and -1/T), are sufficient to train machine learning algorithms for consistent and reliable prediction of the creep life for each creep dataset of heat-resistant steels. They also proposed a methodology that determines the optimum Larson-Miller (LM) constant (CLM) using the trained machine learning model, and showed that better predictability of the creep life can be obtained by combining the machine learning model with CLM optimization[13].

Although numerous data analytics-based studies have been reported for the creep data of various high-temperature structural materials, these studies lack an important element: raw data screening. Many of them used creep data collected from extensive literature surveys, yet provided no detailed discussion of how the collected raw data were screened or preprocessed, particularly with respect to eliminating outliers. Such inconsistency is expected to become a key issue as creep data are collected from diverse sources, even for the same material and the same test conditions. This can be a serious problem when collecting and handling the creep data of popular high-temperature structural materials, such as heat-resistant steels for power plant applications and Ni-base superalloys for aero-turbine applications, since the creep behaviors of those materials have been investigated in numerous studies and a relatively large amount of creep data is readily available from various sources[14-18].

In the present study, the authors have developed a data preprocessing methodology that eliminates outliers through a machine learning-based assessment of the consistency of the creep data collected from various literature surveys. Here, CMSX-4, a second-generation Ni-base single crystal superalloy (for the turbine blade application), was chosen as the target material for the collection of the creep data, since its creep data are relatively readily available from a variety of previous studies. After eliminating outliers, a variety of machine learning algorithms were assembled to train the screened creep data for reliable and consistent creep life prediction. Finally, the proposed outlier screening methodology was validated by comparing the creep life predictability of the models, trained with and without outlier-screened creep data, for the newly generated creep data, which were not exposed to any model training.

2. DATA COLLECTION, PREPROCESSING, AND MACHINE LEARNING APPROACHES

In the present study, CMSX-4, a 2nd generation nickel-base single crystal superalloy primarily used for turbine blades, was chosen as the target material for the collection of the creep data since it is one of the most extensively studied high-temperature alloys with regard to creep properties. Numerous literature sources provide a wide range of creep data of CMSX-4 with diverse creep test conditions. A total of 144 creep datapoints, including creep test stress, temperature, and creep life, were obtained from 20 literature sources, as summarized in Table 2.

List of references from which the creep data of CMSX-4 were collected

The collected creep data consist of stress (σ) ranging from 91.43 to 846.8 MPa, temperature (T) ranging from 650 to 1150°C, and creep rupture life (tR) ranging from 9.88 to 40,592.9 hours. Fig 1 shows the stress-creep rupture life plot for the initial 144 creep datapoints collected in the present study. Even though the creep data were collected from different literature sources by carefully checking the consistency in chemical composition, heat treatment, and test conditions, the resulting creep properties may still vary with the institution that actually performed the creep tests. Therefore, it is necessary to preprocess the creep data to eliminate outliers, which show inconsistent creep responses compared to the majority of the collected data.

Fig. 1.

Initial 144 creep datapoints plotted in the stress-rupture life coordinate in log scales for different test temperatures

A methodology for outlier detection from the multi-source collected creep data was developed in the present study, as schematically illustrated in Fig 2. In the first step, five elementary machine learning (ML) algorithms (linear regression (LR), ridge regression (RR), support vector regression (SVR), random forest (RF) regression, and extreme gradient boosting (XGB)) were trained using four features (T, σ, lnσ, and -1/T) and log tR as the target. The physical validity of the trained models was then assessed by plotting predicted σ-tR curves for a given T along with the original datapoints, as schematically illustrated in step 2 of Fig 2. Here, optimum models were selected as those that capture the inverse proportionality between the stress and creep rupture life (viz., shorter creep life at higher stress and longer creep life at lower stress). In the following step, bagging (bootstrap aggregation) with the selected optimum models was conducted by performing 100 iterations of training and prediction with a train-test split ratio of 4-to-1. The resulting prediction after bagging was visualized in a parity plot of measured and predicted creep rupture lives, as schematically illustrated in step 3 of Fig 2. Here, each datapoint was ranked by its variance in prediction and its absolute residual (the difference between the actual and predicted creep rupture life), and finally, datapoints that were consistently predicted beyond the range of acceptable variance and absolute residual were considered outliers and eliminated.
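The first step above can be sketched in a few lines of scikit-learn. The snippet below builds the four features (T, σ, ln σ, -1/T) from synthetic stand-in data and fits several of the elementary regressors; the data, hyperparameters, and the omission of XGB (which requires the separate xgboost package) are all assumptions for illustration, not the study's actual setup.

```python
# Sketch of step 1 with synthetic stand-in data (assumed, not the real dataset).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
T = rng.uniform(923, 1423, 144)        # temperature, K (toy values)
sigma = rng.uniform(90, 850, 144)      # stress, MPa
# synthetic Larson-Miller-like target: log10(tR) falls with stress and temperature
log_tR = 25000 / T - 5 * np.log10(sigma) + rng.normal(0, 0.1, 144)

# the four features used in the study: T, sigma, ln(sigma), -1/T
X = np.column_stack([T, sigma, np.log(sigma), -1.0 / T])
X_tr, X_te, y_tr, y_te = train_test_split(X, log_tR, test_size=0.2, random_state=1)

models = {
    "LR": LinearRegression(),
    "RR": Ridge(alpha=1.0),
    "SVR": SVR(kernel="rbf", C=10.0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=1),
    # XGB omitted: it lives in the separate xgboost package
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```

Because -1/T and ln σ exactly span the synthetic target here, the linear models recover it almost perfectly; on real multi-source data the scores would of course be lower.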

Fig. 2.

Schematic illustration of the outlier detection procedure for the multi-source collected creep data

The creep data after the outlier elimination were further used to build an ML-based ensemble model for reliable and consistent creep life prediction. For this purpose, an optimum ensemble out of nine ML algorithms (linear regression (LR), ridge regression (RR), support vector regression (SVR), random forest (RF) regression, K-nearest neighbor (KNN) regression, gradient boosting (GB), extreme gradient boosting (XGB), AdaBoost (ADA), and artificial neural network (ANN)) was determined by the best subset selection method. In this method, the creep life predictability of different ML ensembles, consisting of one to nine ML algorithms, was assessed, and the ensemble showing the best predictability was chosen as the optimum ensemble. Finally, the proposed outlier detection methodology was validated by comparing the creep life predictability of ensemble models trained with the creep data before and after the outlier elimination, respectively. Here, a creep dataset, which was newly generated and unexposed to ensemble model training, was used for the validation.
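Best subset selection over ensembles can be sketched as below: every non-empty subset of base regressors is scored by the test R² of its averaged prediction, and the best subset is kept. To stay compact, the sketch uses three hypothetical stand-in models (2³ - 1 = 7 subsets) rather than the study's nine algorithms (2⁹ - 1 = 511 subsets); data and model choices are illustrative assumptions.

```python
# Sketch of best-subset ensemble selection (toy data and stand-in models).
from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
T = rng.uniform(923, 1423, 131)
s = rng.uniform(90, 850, 131)
y = 25000 / T - 5 * np.log10(s) + rng.normal(0, 0.1, 131)
X = np.column_stack([T, s, np.log(s), -1.0 / T])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=1)

base = {"LR": LinearRegression(), "RR": Ridge(), "KNN": KNeighborsRegressor(5)}
preds = {k: m.fit(Xtr, ytr).predict(Xte) for k, m in base.items()}

# score every non-empty subset by the R2 of its averaged prediction
best = max(
    (sub for r in range(1, len(base) + 1) for sub in combinations(base, r)),
    key=lambda sub: r2_score(yte, np.mean([preds[k] for k in sub], axis=0)),
)
print(best)
```

Since the winning subset is the maximum over all candidates, its test R² is by construction at least that of any single base model.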

3. RESULTS AND DISCUSSION

As the first step of the outlier detection (Fig 2) for the multi-source collected creep data, the dataset of all 144 creep datapoints (Fig 1) was used to train the five ML algorithms (LR, RR, SVR, RF, and XGB) with a train-test split ratio of 4-to-1, and the resulting predictability was compared among the models using R2 values for the train and test data, as shown in Fig 3(a). The R2 values range between 0.78 and 0.96 for the five ML models, which appears to be a reasonable range for assessing the physical validity of each model. To check whether each of the trained ML models follows creep physics (the second step in Fig 2), predicted σ-tR curves for a given test temperature were plotted together with the original creep datapoints, as shown in Figs 3(b) to (f). Here, 750°C was chosen as the creep test temperature for the demonstration. In Figs 3(b) to (d), LR, RR, and SVR tend to follow the creep physics, showing that the predicted creep life decreases gradually with increasing applied stress. However, for RF and XGB in Figs 3(e) and (f), the predicted creep life remains unchanged outside the stress range of the train data, which appears to be an intrinsic limitation of tree-based ML algorithms. Such an insensitive σ-tR relation outside the stress domain of the creep data violates the creep physics, and RF and XGB were therefore not included in the subsequent outlier detection step.
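The physical-validity screen can be approximated programmatically: at a fixed temperature, the predicted log tR should decrease monotonically over a stress grid extending beyond the training window. The sketch below uses synthetic data and a strict-monotonicity test, which are assumptions rather than the paper's exact criterion, but it illustrates how a tree model that flat-lines outside its training stress range fails the check.

```python
# Sketch of the physics check: predicted log tR must fall with stress on an
# extended grid. Synthetic data and the threshold are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
T = rng.uniform(923, 1423, 144)
sigma = rng.uniform(200, 700, 144)          # training stress window
log_tR = 25000 / T - 5 * np.log10(sigma) + rng.normal(0, 0.1, 144)
X = np.column_stack([T, sigma, np.log(sigma), -1.0 / T])

def follows_creep_physics(model, T_fix=1023.0, s_lo=100.0, s_hi=850.0, n=50):
    """True if predicted log tR strictly decreases over an extended stress grid."""
    s = np.linspace(s_lo, s_hi, n)
    grid = np.column_stack([np.full(n, T_fix), s, np.log(s), np.full(n, -1.0 / T_fix)])
    return bool(np.all(np.diff(model.predict(grid)) < 0))

lr = LinearRegression().fit(X, log_tR)
rf = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, log_tR)
print(follows_creep_physics(lr), follows_creep_physics(rf))
```

The random forest returns constant predictions below 200 MPa and above 700 MPa (zero differences on the grid), so the strict-decrease test rejects it, mirroring the behavior seen for RF and XGB in Figs 3(e) and (f).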

Fig. 3.

(a) Predictability comparison among the five ML algorithms trained by the entire 144 creep datapoints, and stress-creep rupture life curves at 750°C predicted by (b) linear regression, (c) ridge regression, (d) support vector regression, (e) random forest regression, and (f) XGBoost regression.

As the third step for outlier detection in Fig 2, bagging was performed 100 times for LR, RR, and SVR, and the resulting measured-predicted creep rupture lives are plotted in Fig 4(a) for comparison.

Fig. 4.

(a) Measured and predicted creep rupture lives after bagging 100 times with linear regression, ridge regression, and support vector regression, (b) the standard deviation of the prediction with the variation of the absolute error for all creep datapoints.

Here, creep datapoints tend to show different ranges of predictions, as represented by error bars, depending on the ML models. Also, some datapoints are located away from the diagonal parity line, depending on the ML models, in Fig 4(a). In the present study, the identification of outliers started by simultaneously measuring the variation of the predicted creep rupture life (quantified by the standard deviation (STD) of predicted creep rupture lives for each datapoint) and how far the predicted creep rupture life was from the parity line (quantified by the absolute error (AE) between the measured and predicted creep rupture lives). Fig 4(b) shows the resulting STD-AE distribution of all creep datapoints predicted by LR, RR, and SVR. In Fig 4(b), there appear to be a number of datapoints with both large STD and AE values across all three models, indicating that inconsistent predictions with relatively low accuracy are expected for those datapoints. Such an abnormality in STD and AE was considered to be a primary measure for identifying outliers and could be quantified by the Euclidean distance (d) in the STD-AE coordinate using the following equation:

(1) d = √((10 × STD)² + (AE)²)

The abnormality of each datapoint was ranked by d of equation (1) with the highest rank being the largest d value. Here, in order to set the optimum number of abnormal datapoints, viz. outliers, that should be eliminated, the root mean square error (RMSE) and R2 values for each of LR, RR, and SVR were calculated for both training and test data by gradually eliminating datapoints from the highest to lowest abnormality ranking. The resulting predictability variation is plotted in Fig 5 for LR, RR, and SVR.
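The ranking by equation (1) reduces to a few lines of array arithmetic. The toy numbers below (20 datapoints, 100 bagging iterations, one planted inconsistent point) are purely illustrative:

```python
# Sketch of the abnormality ranking from bagging predictions (toy numbers).
import numpy as np

rng = np.random.default_rng(0)
n_points, n_bags = 20, 100
y_true = rng.uniform(1, 4, n_points)                 # measured log tR (toy values)
# predictions from 100 bagging iterations (rows) for each datapoint (columns)
preds = y_true + rng.normal(0, 0.1, (n_bags, n_points))
preds[:, 3] += rng.normal(1.0, 0.5, n_bags)          # plant one inconsistent point

STD = preds.std(axis=0)                              # prediction variability
AE = np.abs(y_true - preds.mean(axis=0))             # residual from parity line
d = np.sqrt((10 * STD) ** 2 + AE ** 2)               # equation (1)
ranking = np.argsort(d)[::-1]                        # most abnormal first
print(ranking[:3])
```

The planted point combines a large prediction spread with a large residual, so it lands at the top of the ranking, as intended for an outlier candidate.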

Fig. 5.

Variation of the root-mean-square error and R2 values for linear regression ((a) and (b)), ridge regression ((c) and (d)) and support vector regression ((e) and (f)) as abnormal datapoints are gradually eliminated from the highest to lowest rank for each of the ML models.

Specifically, in the LR model, as shown in Figs 5(a) and (b), the RMSE and R² values converge after the removal of 30 datapoints. However, when an excessive number of datapoints are removed, overfitting occurs, as indicated by a significant disparity between the training and test errors, along with an increase in the variance. Similar analyses were conducted for the RR and SVR models, where it was observed that the prediction performance of both models stabilized after the removal of 20 datapoints. Therefore, the optimal numbers of datapoints to be removed for LR, RR, and SVR were determined to be 30, 20, and 20, respectively. Finally, 13 datapoints, which were detected in common across all three models, were classified as outliers and eliminated. By eliminating those 13 high-abnormality-ranked common datapoints, the predictability is expected to improve for all three models while keeping the predictability difference between the train and test data unchanged, indicating a minimized risk of overfitting.
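The elimination-curve analysis can be sketched as follows: refit after removing the top-k ranked datapoints and record the test RMSE and R² versus k. Synthetic data with 15 planted inconsistent points stand in for the real dataset, and the abnormality ranking is assumed rather than recomputed:

```python
# Sketch of the elimination curve (toy data; ranking assumed, not recomputed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
n = 144
T = rng.uniform(923, 1423, n)
s = rng.uniform(90, 850, n)
y = 25000 / T - 5 * np.log10(s) + rng.normal(0, 0.05, n)
y[:15] += rng.normal(0, 2.0, 15)               # 15 planted inconsistent points
X = np.column_stack([T, s, np.log(s), -1.0 / T])

ranking = np.arange(n)                         # assumed abnormality ranking:
                                               # planted points happen to rank first
curve = []                                     # (k, test RMSE, test R2)
for k in range(0, 40, 5):
    keep = ranking[k:]
    Xtr, Xte, ytr, yte = train_test_split(X[keep], y[keep],
                                          test_size=0.2, random_state=1)
    pred = LinearRegression().fit(Xtr, ytr).predict(Xte)
    curve.append((k, mean_squared_error(yte, pred) ** 0.5, r2_score(yte, pred)))
```

Plotting `curve` reproduces the qualitative shape of Fig 5: the error drops steeply while true outliers are being removed and then levels off, which marks the cutoff k.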

The remaining 131 creep datapoints after eliminating the 13 outliers are plotted in Fig 6. The predictability of the three ML models trained with the creep data before and after the outlier elimination is compared in Fig 7.

Fig. 6.

131 creep datapoints plotted in the stress-rupture life coordinate in log scales for different test temperatures after eliminating 13 outliers (marked by “x”) by following the procedure illustrated in Fig. 2

Fig. 7.

Comparison of the predictability of ML models trained with the creep data (a) before and (b) after the outlier elimination proposed in the present study.

It is clear that both the prediction accuracy and consistency were significantly improved for all three models by applying the outlier elimination proposed in the present study. Additionally, the effectiveness of the proposed outlier elimination methodology was assessed by predicting the creep life using the Larson-Miller parameter (LMP) model, which is one of the most widely used physics-based creep life prediction models. Here, the LMP value was calculated as T × (log tR + 20) for a given stress. Fig 8(a) plots LMP values as a function of the applied stress. In Fig 8(a), square marks indicate outliers detected by the proposed methodology, and the red and blue curves are fitting curves for the LMP values with and without the outlier elimination, respectively. Here, a third-order polynomial equation was used to fit the LMP values[39]. The creep life was predicted using the fitted LMP curves with and without outlier elimination, and the resulting predictability of the LMP model is compared in Fig 8(b). As confirmed in Fig 8(b), the predictability of the LMP model was significantly improved after the outlier elimination, by 47.6% in terms of the R2 value (from 0.317 to 0.468) and by 19.2% in terms of the RMSE value (from 0.650 to 0.525). This implies that the proposed outlier elimination methodology works effectively for physics-based creep life prediction as well.
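A minimal sketch of the LMP check, assuming synthetic data generated from a hypothetical stress-dependent LMP law: compute LMP = T(log tR + 20), fit a third-order polynomial in stress, and invert the fitted curve to predict life.

```python
# Sketch of the LMP fit and life prediction (toy data, assumed LMP law).
import numpy as np

rng = np.random.default_rng(0)
T = rng.uniform(923, 1423, 100)                 # absolute temperature (toy values)
stress = rng.uniform(90, 850, 100)              # MPa
LMP_true = 28000.0 - 8.0 * stress               # assumed stress-dependent LMP law
log_tR = LMP_true / T - 20 + rng.normal(0, 0.1, 100)

LMP = T * (log_tR + 20)                         # LMP from measured life
coef = np.polyfit(stress, LMP, 3)               # third-order polynomial fit

# invert the fitted curve to predict creep life from stress and temperature
log_tR_hat = np.polyval(coef, stress) / T - 20
rmse = float(np.sqrt(np.mean((log_tR - log_tR_hat) ** 2)))
print(rmse)
```

On real multi-source data, outliers distort the fitted LMP curve; dropping them before the polynomial fit is what drives the R² and RMSE gains reported for Fig 8(b).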

Fig. 8.

(a) LMP values plotted as a function of the applied stress, (b) comparison of creep life predictability of the LMP models with and without outlier elimination. Here, square marks indicate LMP data points detected as outliers.

For further improvement and generalization of the creep life predictability, ensemble learning with nine ML algorithms was adopted. Fig 9 shows the R2 values of the training and test data for the nine ML algorithms trained with the outlier-eliminated creep data.

Fig. 9.

Comparison of predictability among the nine ML algorithms trained with the outlier-eliminated creep data.

The predictability of all possible combinations, viz. ensembles, of these ML algorithms was assessed to determine the best subset of ML algorithms showing the highest predictability. Fig 10 shows the results of the best subset selection. The R2 value for the test data was improved up to 0.89 for the ensemble of four ML algorithms, SVR, KNN, XGB and ANN, as seen in Fig 10.

Fig. 10.

Comparison of predictability for a total of 511 ensemble models. The best ensemble was a combination of four ML algorithms, SVR, KNN, XGB, and ANN.

Fig 11(a) shows the creep lives predicted by the best ensemble model, compared with the measured creep lives in a parity plot for the training and test data. The prediction accuracy quantified by R2 values was also compared among LR, RR, SVR, and ensemble models in Fig 11(b). It is clear from Fig 11 that the creep life prediction was further improved by ensemble learning.

Fig. 11.

(a) Creep lives, predicted by the best ensemble model from Fig. 10, plotted with measured creep lives, (b) comparison of the prediction accuracy among LR, RR, SVR, and ensemble models.

The results from Figs 7 and 10 indicate that the proposed outlier-elimination methodology for the multi-source collected creep data improves the accuracy and consistency of the ML models for creep life prediction. The validity of the proposed outlier-elimination methodology was then further evaluated by predicting creep lives, using ML models trained with creep data before and after outlier screening, for a new creep dataset that was not exposed to the proposed methodology, shown in Fig 12(a). To generate a reliable validation creep dataset, test specimens were fabricated and processed by following the globally standardized procedures for manufacturing the second-generation Ni-base single crystal superalloy CMSX-4. The composition of the specimens was carefully checked, as shown in Table 3, to ensure that it falls within the allowable compositional domain for CMSX-4. The single crystal casting followed the Bridgman method such that the rod specimen has a <100> crystallographic orientation along the axial direction with a tolerable orientation deviation. The three-stage standard heat treatment was applied to the cast samples through solution heat treatment at 1320°C for two hours followed by two-step aging at 1140°C for two hours and at 871°C for 20 hours. Creep tests were conducted by following ASTM E139 using rod-shaped specimens with a diameter of 6 mm. Each test specimen was heated to the target temperature and held for one hour, and subsequently the load was applied.

Fig. 12.

(a) A new creep dataset that was not exposed to data preprocessing or model training for validation of the proposed outlier-elimination methodology, (b) creep lives predicted by ensemble models with and without the outlier-eliminated creep data, compared to measured creep lives for the new creep dataset.

Composition of CMSX-4 specimens used for the generation of new validation creep dataset.

The ensemble models of Fig 11, trained with and without the outlier-eliminated creep data, were used to predict the creep lives of the new, unexposed creep dataset, and the results are shown in Fig 12(b). The RMSE decreased from 0.368 to 0.266 by eliminating outliers. Although some residuals slightly increased after the outlier elimination, Fig 12(b) clearly validates that the outlier-elimination methodology proposed in the present study improved the overall prediction accuracy and consistency.

Summarizing the results from Figs 2, 7, 11, and 12, the proposed data preprocessing methodology effectively screened outliers of the creep data collected from various sources. Multi-source collected creep data tend to intrinsically contain uncertainties and inconsistencies. Here, uncertainty indicates variability in the creep life, while inconsistency implies variability in the stress-temperature-creep life relationship even for the same material. The outlier detection methodology proposed in Fig 2 adopted different ML algorithms to assess the physical validity of the collected creep datapoints, and it successfully detected outliers from the collected creep data. The main contribution of the present study is that the proposed outlier detection framework, consisting of the identification of physically acceptable ML algorithms and bagging-based quantification (ranking) of the prediction accuracy and consistency for each collected datapoint, can be expanded to screening anomalies or outliers in other types of collected data. However, a systematic assessment of the influence of the number of collected datapoints and the variability of the collected datapoints on the validity of the proposed outlier elimination methodology was not performed. In the present study, 144 creep datapoints were collected from 73 (stress, temperature) test conditions, which appeared to be statistically sufficient for the application of the proposed methodology. The proposed outlier elimination methodology is believed to be inapplicable to much smaller datasets, for example fewer than 30 datapoints (five stress conditions for each of six temperature conditions, which are typical test conditions for high-temperature metallic materials). Moreover, it is important to check whether the collected datapoints are inhomogeneously populated in certain test conditions, which will bias preliminary ML results and adversely affect the outlier detection.

4. CONCLUSIONS

A machine learning-based data preprocessing methodology was proposed to detect and eliminate unusual datapoints (outliers) for creep data collected from a survey of various studies in the literature. The multi-source collected CMSX-4 creep data were chosen to demonstrate and validate the proposed methodology, and the following key conclusions were drawn.

1. The proposed outlier detection and elimination methodology consisted of selecting reasonable machine learning models for the collected original creep data through a physical validity assessment, bagging of the selected machine learning models to assess the prediction accuracy and variability of each datapoint, and ranking the datapoints by quantifying how accurately and consistently they were predicted by the selected machine learning models.

2. The creep life predictability of the selected machine learning models was improved from R2 = 0.701 to R2 = 0.867 after outlier elimination. The variability in prediction for each datapoint was also reduced by 70% after eliminating outliers. The ensemble model with a combination of support vector regression (SVR), K-nearest neighbor (KNN), extra gradient boost (XGB), and artificial neural network (ANN) further improved the predictability for the creep data after outlier elimination.

3. A newly generated creep dataset, which was not exposed to outlier elimination or machine learning, was used to check the robustness of the machine learning model built from creep data screened by the proposed outlier detection methodology. The ensemble model trained by the outlier-screened creep data showed improved creep life prediction accuracy and consistency for the newly generated creep dataset.

Notes

ACKNOWLEDGEMENT

The present work was supported in part by the Nano & Material Technology Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (RS-2024-00451579) and in part by the Agency for Defense Development Grant funded by the Korean Government (UD220004JD). Also, the newly generated creep dataset of Fig. 12(a) was kindly provided by Dr. Baig Gyu Choi of the Korea Institute of Materials Science.

References

1. Kim H., Hong S., Kim J., Lee Y.. Korean J. Met. Mater 61:301. 2023;
2. Won Y., Gu J., Lee J.. Korean J. Met. Mater 62:766. 2024;
3. Monkman F. C., Grant N. J.. Proc. ASTM 56:593. 1956;
4. Kassner M., Pérez-Prado M.-T.. Prog. Mater. Sci 45:1. 2000;
5. Wilshire B., Battenbough A.. Mater. Sci. Eng. A 443:156. 2007;
6. Larson F. R., Miller J.. Trans. ASME 74:765. 1952;
7. Wilshire B., Scharning P.. Scr. Mater 56:701. 2007;
8. Williams S., Bache M., Wilshire B.. Mater. Sci. Technol 26:1332. 2010;
9. Wang C., Wei X., Ren D., Wang X., Xu W.. Mater. Des 213:110326. 2022;
10. Wang J., Fa Y., Tian Y., Yu X.. JMR&T 13:635. 2021;
11. Liu Y., Wu J., Wang Z., Lu X.-G., Avdeev M., Shi S., Wang C., Yu T.. Acta Mater 195:454. 2020;
12. Shin D., Yamamoto Y., Brady M. P., Lee S., Haynes J. A.. Acta Mater 168:321. 2019;
13. Lee C., Lee T., Choi Y. S.. Met. Mater. Int 29:3149. 2023;
14. Dong S., Wang Y., Li J., Li Y., Wang L., Zhang J.. Met. Mater. Int 30:593. 2024;
15. Kong B. O., Kim M. S., Kim B. H.. J. Alloys Compd 923:166123. 2022;
16. Murakami T., Harada H., Kobayashi T.. Metall. Mater. Trans. A 30:1143. 1999;
17. Jiang L., Harada H., Kobayashi T.. Acta Mater 51:3871. 2003;
18. Evans R. W.. Metall. Mater. Trans. A 19:2187. 1988;
19. Kitaguchi H., Enomoto M., Ohnuma K.. J. Mater. Sci 55:15314. 2020;
20. Miller M. K.. Atom Probe Tomography: Analysis at the Atomic Level p. 1–237. Kluwer Academic/Plenum Publishers. New York: 2000.
21. Maruyama T., Oikawa K., Ito K., Hasegawa M.. Scr. Mater 194:113699. 2021;
22. Wang Y., Wang L., Zhang H., Sun B.. J. Mater. Sci. Technol 77:1. 2021;
23. Laughlin D. E., Hono K.. Physical Metallurgy p. 1–2837. Elsevier. Oxford: 2014.
24. Morinaga M.. Intermetallics p. 1–159. Elsevier. Oxford: 2010.
25. Davis J. R.. Nickel, Cobalt, and Their Alloys p. 1–422. ASM International, Materials Park. Ohio: 2000.
26. Nitta H., Hirata M., Kanemura T., Ishizawa S.. Intermetallics 148:107797. 2023;
27. Padilha A. F., Rios P. R.. Mater. Res 6:159. 2003;
28. Kim H. Y., Lee S. W., Lee Y. H.. Scr. Mater 164:145. 2019;
29. Allen S. M., Cahn J. W., Carter W. C.. Acta Metall. Mater 38:1937. 1990;
30. Miller M. K., Russell K. F., Danoix G.. Ultramicroscopy 111:469. 2011;
31. Wang J., Chen M., Yang L., Sun W., Zhu S., Wang F.. Corros. Commun 1:58. 2021;
32. Rösler J., Näth O., Jäger S., Schmitz F., Mukherji D.. Acta Mater 53:1397. 2005;
33. Kassner M. E.. Fundamentals of Creep in Metals and Alloys p. 1–338. Butterworth-Heinemann. England: 2015.
34. Reed R., Matan N., Cox D., Rist M., Rae C.. Acta Mater 47:3367. 1999;
35. Rösler J., Näth O.. Acta Mater 58:1815. 2010;
36. Coakley J., Reed R. C., Warwick J. L., Rahman K. M., Dye D.. Acta Mater 60:2729. 2012;
37. Rae C., Reed R.. Acta Mater 55:1067. 2007;
38. Prasad S. C., Rajagopal K., Rao I.. Acta Mater 54:1487. 2006;
39. Zhu X., Cheng H., Shen M., Pan J.. Adv. Mater. Res. 791-793:374. 2013;


Table 1.

Summary of previous studies on data analytics approaches for the creep data

Study | Dataset | Features | Model | Target | Accuracy
C. Wang et al. [9] | Low-alloy steels, 1770 data points | Compositions, heat treatment conditions, creep test conditions | RFa, GBRb, XGBc, MLPd, SVRe, LRf, DTg | Creep life | 0.9677 (R2 value)
J. Wang et al. [10] | Cr-Mo steel, 2066 data points | Compositions, heat treatment conditions, creep test conditions | LR, SGDh, DT, SVR, RF, MLP, KNNi, KRj | Creep life, LMPk, MHPl, MSPm | 0.8481, 0.9677, 0.9669, 0.9695 (R2 value)
Y. Liu et al. [11] | Ni based superalloy, 266 data points | Compositions, heat treatment conditions, microstructural factors, creep test conditions | RF, SVR, GPRn, LR, RRo, DCSAp learning | Creep life | 0.9176 (R2 value)
D. Shin et al. [12] | AFAq stainless alloys, 82502 data points | Compositions, creep test conditions, heat treatment conditions, computational thermodynamic features | RF, LR, NNr, KR, BRs | LMP | 0.94 (PCC value)
C.H. Lee et al. [13] | Cr steel, 974 data points | Creep test conditions | LR, C-LMP | Creep life | 0.9705 (R2 value)
a Random forest; b Gradient boosting regression; c Extreme gradient boosting; d Multilayer perceptron; e Support vector regression; f Linear regression; g Decision tree; h Stochastic gradient descent; i K-nearest neighbour algorithm; j Kernel ridge regression; k Larson-Miller parameter; l Manson-Haferd parameter; m Manson-Succop parameter; n Gaussian process regression; o Ridge regression; p Divide-and-conquer self-adaptive; q Alumina-forming austenitic; r Nearest neighbor; s Bayesian ridge

Table 2.

List of references from which the creep data of CMSX-4 were collected

Reference | # of data | Reference | # of data
W. Xuan et al. [19] | 1 | Q. M. Yu et al. [29] | 2
P. J. Henderson et al. [20] | 6 | H. V. Atkinson et al. [30] | 37
J. Komenda et al. [21] | 7 | J. Wang et al. [31] | 1
L. M. Bortoluci Ormastroni et al. [22] | 5 | J. Rösler et al. [32] | 1
D. Bürger et al. [23] | 6 | M. E. Kassner et al. [33] | 3
K. Cheng et al. [24] | 5 | R. C. Reed et al. [34] | 3
M. Kamaraj et al. [25] | 2 | J. Rösler et al. [35] | 1
S. P. Jeffs et al. [26] | 27 | J. Coakley et al. [36] | 3
J. Svoboda et al. [27] | 5 | C. M. F. Rae et al. [37] | 6
K. Kakehi et al. [28] | 1 | S. C. Prasad et al. [38] | 22

Table 3.

Composition of CMSX-4 specimens used for the generation of new validation creep dataset.

Element Ni Cr Co W Mo Ta Re Al Ti Hf
wt.% Bal. 6.5 9 6 0.6 6.5 3 5.6 1.0 0.1