Abstract
Bloom's taxonomy is widely used to characterize student cognitive performance, but manual coding does not scale to AI-tutored dialogue. We evaluate prompt-engineering strategies that let an LLM automatically classify student responses across Bloom levels. We find that the task is feasible and that prompt design has a substantial effect on accuracy.
Problem & Motivation
Bloom's taxonomy (remember, understand, apply, analyze, evaluate, create) is a widely used framework for characterizing the cognitive level of student responses, but manually coding responses against these six levels does not scale. As AI-mediated learning conversations proliferate, assessing student cognitive performance automatically and reliably becomes a central methodological problem.
Method
We explored prompt-engineering strategies that allow a large language model to automatically evaluate cognitive performance in educational contexts. Carefully designed prompts guide the LLM to classify each student response against Bloom's taxonomy.
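To make the idea concrete, the following is a minimal sketch of prompt-based Bloom classification. It is not the prompt or pipeline used in the paper: the prompt wording, the function names (`build_prompt`, `parse_level`, `classify`), and the `llm` callable are all assumptions made for illustration. The `llm` argument stands in for whatever model client is available; it only needs to take a prompt string and return the model's text output.

```python
from typing import Callable, Optional

# The six Bloom levels named in the text, ordered from lowest to highest.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def build_prompt(question: str, response: str) -> str:
    """Assemble a hypothetical zero-shot classification prompt (wording is assumed)."""
    levels = ", ".join(BLOOM_LEVELS)
    return (
        "You are an educational assessment assistant.\n"
        f"Classify the student's response into exactly one Bloom level: {levels}.\n"
        "Answer with the level name only.\n\n"
        f"Question: {question}\n"
        f"Student response: {response}\n"
        "Bloom level:"
    )


def parse_level(raw: str) -> Optional[str]:
    """Return the Bloom level that appears earliest in the model output, or None."""
    text = raw.lower()
    hits = [(text.find(level), level) for level in BLOOM_LEVELS if level in text]
    return min(hits)[1] if hits else None


def classify(question: str, response: str, llm: Callable[[str], str]) -> Optional[str]:
    """Build the prompt, query the (assumed) LLM callable, and parse its answer."""
    return parse_level(llm(build_prompt(question, response)))
```

Passing the model in as a plain callable keeps the sketch independent of any particular vendor API, and returning `None` for unparseable output lets a caller flag those responses for manual coding rather than guessing a level.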
Findings
- Prompt-engineered LLMs are effective at evaluating student cognitive level.
- Prompt design choices materially affect classification accuracy.
- Automated cognitive assessment offers a tractable path for large-scale learning analytics.
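One way to examine how prompt design affects accuracy is to score each prompt variant against the same human-coded labels. The sketch below assumes such gold labels exist and that each variant's predictions have already been collected; the function names, the variant names, and the toy data are invented for illustration and are not results from the paper.

```python
from typing import Dict, List


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of responses where the predicted Bloom level matches the human label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def compare_prompts(predictions_by_prompt: Dict[str, List[str]], gold: List[str]) -> Dict[str, float]:
    """Score every prompt variant against the same gold labels."""
    return {name: accuracy(preds, gold) for name, preds in predictions_by_prompt.items()}


# Made-up labels for illustration only; not data from the paper.
gold_labels = ["remember", "apply", "analyze", "create"]
toy_predictions = {
    "variant_a": ["remember", "apply", "apply", "create"],
    "variant_b": ["understand", "apply", "apply", "evaluate"],
}
scores = compare_prompts(toy_predictions, gold_labels)
```

Scoring all variants on an identical labeled set keeps the comparison fair, since any difference in accuracy then reflects the prompt rather than the sample.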
Implications
Automated cognitive-level assessment lets instructors see beyond raw correctness and look at how students think. This is particularly relevant for cultivating higher-order thinking, where instruction must respond not only to wrong answers but to shallow patterns of reasoning.
Citation
BibTeX
@inproceedings{furqon2025cognitive_eval,
author = {Elvin Nur Furqon and Chia-Kai Chang},
title = {Evaluating Cognitive Performance Through Prompt-Based Methods Using {LLM} in Education},
booktitle = {Proc. IEEE Int. Conf. on Advanced Learning Technologies (ICALT)},
pages = {209--211},
year = {2025},
month = jul,
doi = {10.1109/ICALT64023.2025.00065},
}