DECISION TREE CLASSIFIERS FOR CLASSIFICATION OF BREAST CANCER
DOI:
https://doi.org/10.22159/ijcpr.2017v9i1.17377
Keywords:
Classification, J48, REP Tree, Random Forest, Random Tree, Accuracy, RMSE, Confusion matrix
Abstract
Objective: Breast cancer is one of the dangerous cancers affecting women over 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin milk ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs either in the lobules or in the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues.
Survey: According to the medical survey, about 125.0 new cases of breast cancer per 100,000 women are diagnosed each year in the United States, and 21.5 per 100,000 women die of the disease. Also, 246,660 new cases in women were estimated for the year 2016.
Methods: Early diagnosis of breast cancer is a key factor in the long-term survival of cancer patients. Classification is one of the vital techniques researchers use to analyze and classify medical data.
Results: This paper analyzes different decision tree classifier algorithms on the SEER breast cancer dataset using WEKA software. The performance of the classifiers is evaluated against parameters such as accuracy, Kappa statistic, entropy, RMSE, TP rate, FP rate, precision, recall, F-measure, ROC, specificity, and sensitivity.
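As an illustration of how several of the listed measures relate to a confusion matrix, the following minimal Python sketch derives them for a two-class problem. It is not the WEKA implementation used in the paper. The function names `confusion_metrics` and `rmse` and the counts in the usage example are hypothetical, and the RMSE shown assumes the error is measured between the predicted probability of the positive class and the 0/1 true label.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Derive common evaluation measures from a 2x2 confusion matrix.

    tp, fn, fp, tn are counts of true positives, false negatives,
    false positives and true negatives. The sketch assumes every
    denominator below is non-zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # also called TP rate or sensitivity
    specificity = tn / (tn + fp)
    fp_rate = fp / (fp + tn)       # equals 1 - specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)

    # Kappa statistic: agreement beyond what the row and column
    # totals alone would produce by chance.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "sensitivity": recall,
        "specificity": specificity,
        "kappa": kappa,
    }


def rmse(probabilities, labels):
    """Root mean squared error between predicted positive-class
    probabilities and 0/1 true labels (an assumed convention)."""
    squared = [(p - y) ** 2 for p, y in zip(probabilities, labels)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
error = rmse([0.9, 0.2, 0.6, 0.4], [1, 0, 1, 0])
```

With these made-up counts, 85 of 100 instances are classified correctly and chance agreement is 0.5, so the Kappa statistic is (0.85 - 0.5) / (1 - 0.5) = 0.7.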
Conclusion: The simulation results show that the REPTree classifier classifies the data with 93.63% accuracy and a minimum RMSE of 0.1628. The REPTree algorithm also takes less time to build the model, with ROC and PRC values of 0.929 and 0.959, respectively. Comparing the classification results, we conclude that the REPTree algorithm performs better than the other classification algorithms on the SEER dataset.