AI Versus Human Graders: Assessing the Role of Large Language Models in Higher Education

Mahlatse Ragolane

Authors

Mahlatse Ragolane Regent Business School

Keywords:

Artificial Intelligence, LLMs, ChatGPT, Higher Education, Assessment, AI Grading

Abstract

also forced to adapt and function together with AI, especially in assessment grading. Human grading, on the other hand, has long been the cornerstone of educational assessment. Traditionally, educators have assessed student work based on established criteria, providing feedback intended to support learning and development. While human grading offers nuanced understanding and personalized feedback, it is also subject to limitations such as grading inconsistencies, biases, and significant time demands. This paper explores the role of large language models (LLMs), such as ChatGPT-3.5 and ChatGPT-4, in grading processes in higher education and compares their effectiveness with that of traditional human grading methods. The study uses both qualitative and quantitative methodologies, and the research extends across multiple academic programs and modules, providing a comprehensive assessment of how AI can complement or replace human graders. In Study 1, we focused on 195 scripts (n=195) across three modules (n=3) and compared GPT-3.5, GPT-4, and human graders. Manually marked scripts exhibited an average mark difference of 24%. Subsequently, 20 scripts (n=20) were assessed using GPT-4, which yielded a more precise evaluation, with a total average difference of 4% in results. There were individual instances where marks were higher, but this could not naturally be a marker judgment. In Study 2, the results from the first study highlighted the need for a comprehensive memorandum; thus, we identified 4341 scripts (n=4341), of which 3508 (n=3508) were used. The study found that AI remains efficient when the memorandum is well-structured. It was also found that while AI excels in scalability, human graders excel in interpreting complex answers, evaluating creativity, and detecting plagiarism. In Study 3, we evaluated formative assessments using GPT-4 (Statistics n=602, Business Statistics n=859, and Logistics Management n=522).
The third study demonstrated that AI marking tools can effectively manage the demands of formative assessments, particularly in modules where the questions are objective and structured, such as Statistics and Logistics Management. The initial error in Statistics 102 highlighted the importance of a well-designed memorandum. The study concludes that AI tools can effectively reduce the burden on educators but should be integrated into a hybrid model in which human markers and AI systems work in tandem to achieve fairness, accuracy, and quality in assessments. This paper contributes to ongoing debates about the future of AI in education by emphasizing the importance of a well-structured memorandum and human discretion in achieving balanced and effective grading solutions.
