AI Teaching Assistants at Scale: Cross-Disciplinary Patterns of Adoption and Cognitive Engagement Across Hundreds of University Courses

Abstract

Generative-AI teaching assistants are spreading rapidly across universities, but most evidence comes from single-course studies. We analyze student–AI interactions across hundreds of courses spanning multiple disciplines on the Uedu platform, automatically classify each student message by Bloom's taxonomy level using GPT-5-mini (validated against human raters), and probe how assessment design shapes the trajectory of cognitive engagement.

Problem & Motivation

Generative-AI teaching assistants are spreading rapidly across universities, but prior work has largely focused on single courses or single disciplines. When AI TAs are deployed across hundreds of courses in many disciplines, how do adoption patterns differ by discipline? Do student–AI interactions actually elicit higher-order cognitive engagement, or do they remain at the level of surface information lookup? How does course-level assessment design shape the cognitive depth of student engagement?

Method

We analyze student–AI TA interaction data from hundreds of courses in the multi-university deployment of the Uedu platform. GPT-5-mini automatically codes each student message by Bloom level (remember / understand / apply / analyze / evaluate / create), and these codes are validated against human raters, with Cohen's kappa comparable to the agreement between human raters. In a programming-course sub-sample (n = 87), we further analyze how different assessment designs (pretest vs. midterm) shift cognitive engagement patterns.
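The agreement check described in the Method can be illustrated with Cohen's kappa. The sketch below is a minimal pure-Python version, assuming every message receives exactly one of the six Bloom labels from each rater; the function name `cohens_kappa`, the label set constant, and the toy labels are hypothetical and are not the authors' validation code or data.

```python
from collections import Counter

# The six Bloom's taxonomy levels used to code student messages.
BLOOM_LEVELS = {"remember", "understand", "apply", "analyze", "evaluate", "create"}


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who labelled the same messages."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of messages")
    unknown = (set(rater_a) | set(rater_b)) - BLOOM_LEVELS
    if unknown:
        raise ValueError(f"unknown Bloom labels: {sorted(unknown)}")
    n = len(rater_a)
    # Observed agreement: share of messages on which both raters chose the same level.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[level] * freq_b[level] for level in BLOOM_LEVELS) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels for six messages (illustrative only, not data from the paper).
human = ["remember", "understand", "apply", "analyze", "remember", "create"]
llm = ["remember", "understand", "apply", "evaluate", "remember", "create"]
print(round(cohens_kappa(human, llm), 3))  # 23/29, printed as 0.793
```

Running the same function on two human raters' labels for the same messages gives a human–human baseline; the Method's validation claim amounts to the LLM–human value being of similar size. This sketch uses the unweighted form, since the source names only Cohen's kappa.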

Findings

Discipline matters: adoption and interaction modes vary substantially across disciplines, and the temporal trajectory of cognitive engagement is shaped mainly by syllabus structure rather than by the AI tool itself.
Practice volume is decisive: AI-TA interaction frequency positively predicts performance, and practice volume predicts test scores better than cognitive depth alone.
Assessment design shapes cognition: project-based assessment elicits more higher-order interactions (analyze / evaluate / create), whereas traditional tests concentrate engagement at remember / understand.
Automated Bloom classification is viable: LLM coding reaches agreement with human raters comparable to inter-human agreement, enabling analytics at scale.

Implications

AI TAs are not one-size-fits-all: designers should tailor the TA's role and guidance to the specifics of each discipline. Assessment design is the most controllable lever on cognitive depth; if higher-order thinking is the goal, courses must design open-ended, project-based tasks rather than rely on conventional testing. Encouraging students to interact with the TA more frequently is itself an effective lever for improving learning outcomes.

Citation

C.-K. Chang and K.-H. Li, “AI Teaching Assistants at Scale: Cross-Disciplinary Patterns of Adoption and Cognitive Engagement Across Hundreds of University Courses,” in ACM L@S 2026 (Full Paper), 2026. doi: 10.1145/3774398.3811596.

BibTeX

@inproceedings{chang2026ai_ta_at_scale,
  author    = {Chia-Kai Chang and Kuei-Hao Li},
  title     = {{AI} Teaching Assistants at Scale: Cross-Disciplinary Patterns of Adoption and Cognitive Engagement Across Hundreds of University Courses},
  booktitle = {Proc. ACM Conf. on Learning at Scale (L@S '26)},
  year      = {2026},
  month     = jul,
  doi       = {10.1145/3774398.3811596},
  note      = {Full Paper},
}

Agentic AI as a Dual-Role Lecturer and Teaching Assistant: Effects on Learner Autonomy, Self-Regulated Learning, and Intrinsic Motivation in University-Level EFL Education

PALM: Scaling Physiologically-Aware AI Tutoring Through Consumer Wearables and Large Language Models