Abstract
Generative-AI teaching assistants are spreading rapidly across universities, but most evidence comes from single-course studies. We analyze student–AI interactions across hundreds of cross-disciplinary courses on the Uedu platform, automatically classify each message by Bloom level using GPT-5-mini (validated against human raters), and probe how assessment design shapes cognitive engagement trajectories.
Problem & Motivation
Generative-AI teaching assistants are rapidly spreading across universities, but prior work largely focuses on single courses or single disciplines. When AI TAs deploy across hundreds of cross-disciplinary courses, how do adoption patterns differ across disciplines? Do student–AI interactions truly elicit higher-order cognitive engagement, or do they remain at surface-level information lookup? How does course-level assessment design shape the cognitive depth of student engagement?
Method
We analyze student–AI TA interaction data from the multi-university deployment of the Uedu platform, spanning hundreds of courses. GPT-5-mini automatically assigns each student message a Bloom level (remember / understand / apply / analyze / evaluate / create), and its labels are validated against human raters, reaching a Cohen's kappa comparable to the agreement between human raters. In a programming-course sub-sample (n = 87), we further analyze how different assessment designs (pretest vs midterm) shift cognitive engagement patterns.
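As a rough illustration of the validation step, the sketch below computes Cohen's kappa between two sets of Bloom labels for the same messages. It is a minimal example, not the authors' pipeline: the function name `cohens_kappa` and the sample labels are hypothetical, and the real study may use a different tool or a weighted variant.

```python
from collections import Counter

# The six Bloom levels named in the Method section.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labeling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy labels (not data from the study): human rater vs. model.
human = ["remember", "remember", "apply", "apply"]
model = ["remember", "remember", "apply", "remember"]
kappa = cohens_kappa(human, model)
```

Kappa corrects raw agreement for agreement expected by chance, which matters here because Bloom labels are unlikely to be evenly distributed across the six levels.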
Findings
- Discipline matters: adoption and interaction modes vary substantially across disciplines; the temporal trajectory of cognitive engagement is dominated by syllabus structure rather than the AI tool itself.
- Practice volume is decisive: AI-TA interaction frequency positively predicts performance — and practice volume predicts test scores better than cognitive depth alone.
- Assessment design shapes cognition: project-based assessment elicits more higher-order interactions (analyze / evaluate / create); traditional tests cluster engagement at remember / understand.
- Automated Bloom classification is viable: LLM coding reaches accuracy comparable to inter-human agreement, enabling analytics at scale.
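One way to quantify the assessment-design comparison above is the share of messages per group that fall in the higher-order levels (analyze / evaluate / create). The sketch below assumes messages arrive as `(group, bloom_level)` pairs; the function name, the group names, and the sample input are all hypothetical and do not come from the study.

```python
from collections import defaultdict

# Higher-order Bloom levels as listed in the findings.
HIGHER_ORDER = {"analyze", "evaluate", "create"}


def higher_order_share(messages):
    """Return {group: fraction of messages at a higher-order Bloom level}.

    `messages` is an iterable of (group, bloom_level) pairs. Hypothetical
    helper for illustration; not taken from the paper.
    """
    totals = defaultdict(int)
    higher = defaultdict(int)
    for group, level in messages:
        totals[group] += 1
        if level in HIGHER_ORDER:
            higher[group] += 1
    return {group: higher[group] / totals[group] for group in totals}


# Invented toy input (not data from the study).
sample = [
    ("project", "analyze"),
    ("project", "create"),
    ("project", "remember"),
    ("project", "evaluate"),
    ("test", "remember"),
    ("test", "understand"),
]
shares = higher_order_share(sample)
```

Comparing these shares across assessment types, or tracking them week by week, gives a simple view of the engagement trajectories the findings describe.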
Implications
AI TAs are not one-size-fits-all: designers should tune the TA's role and guidance to each discipline. Assessment design is the most controllable lever on cognitive depth: if higher-order thinking is the goal, courses must design open-ended, project-based tasks rather than rely on conventional testing. Encouraging more frequent student–TA interaction is itself an effective lever for learning outcomes.
Citation
BibTeX
@inproceedings{chang2026ai_ta_at_scale,
author = {Chia-Kai Chang and Kuei-Hao Li},
title = {{AI} Teaching Assistants at Scale: Cross-Disciplinary Patterns of Adoption and Cognitive Engagement Across Hundreds of University Courses},
booktitle = {Proc. ACM Conf. on Learning at Scale (L@S '26)},
year = {2026},
month = jul,
doi = {10.1145/3774398.3811596},
note = {Full Paper},
}