Validation of Natural Language–Based Educational Digital Twins through Embedding Geometry in Python Courses

Abstract

Can learner-produced natural language form a stable, interpretable digital-twin state representation? For N = 162 students in a Python programming course, we aggregate each student's question utterances, encode them with a fine-tuned Transformer encoder using attention-aware pooling and L2 normalization, and analyze the resulting embedding geometry. Three-level grading preserves geometric structure (separation ratio 1.107), whereas five-level grading collapses the middle bands. For a digital twin to be valid, label granularity must match the embedding geometry.

Problem & Motivation

Educational digital twins aim to go beyond score-centered evaluation toward fuller learner-state representations. Whether learner-produced natural language can yield a stable, interpretable twin-state space is still an open question, as is whether traditional bell-curve grading remains valid inside a semantic embedding space.

Method

In a Python programming course at a university in northern Taiwan (N = 162), we collected the questions students submitted through the course-integrated GPT system as a language-behavior signal. Each student's questions were aggregated into a single language-behavior unit and encoded by a fine-tuned Transformer with attention-aware pooling and L2-normalized output embeddings. The embeddings were then analyzed for structural consistency and separability using PCA and distance statistics.
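The pooling and normalization step can be illustrated with a minimal sketch. It assumes that "attention-aware pooling" means a mean over token vectors weighted by the attention mask, which is one common reading and may differ from the pooling actually used in the study. The function names `masked_mean_pool` and `l2_normalize` and the random toy inputs are hypothetical and stand in for real encoder outputs.

```python
import numpy as np


def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions (mask == 0).

    Assumption: mask-weighted mean pooling; the study's pooling may differ.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for fully padded sequences.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)


# Toy stand-in for encoder output: 2 students, 4 tokens each, 8 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]])

student_vectors = l2_normalize(masked_mean_pool(tokens, mask))
```

After normalization every student vector lies on the unit sphere, so Euclidean distance between two students depends only on the angle between their vectors.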

Findings

A five-level bell-curve grading scheme collapses the middle classes and produces severely overlapping embedding geometry, leaving no usable discriminative structure.
A three-level grading scheme produces clearer organization with a separation ratio of 1.107, and is structurally stable on held-out data.
Learner natural language can form a stable digital-twin state space, but only when label granularity matches the latent semantic geometry.
Educational digital-twin feasibility depends on aligning representation geometry with label design.
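The summary above does not define the separation ratio, so the following is only a sketch under an assumed definition: the mean distance between class centroids divided by the mean distance from each point to its own class centroid. Under this assumption, values well above 1 indicate classes that sit farther apart than they are spread out. The function name `separation_ratio` and the toy points are hypothetical and are not the study's data.

```python
import numpy as np


def separation_ratio(embeddings, labels):
    """Mean inter-centroid distance divided by mean intra-class spread.

    Assumed definition for illustration; the study's statistic may differ.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # One centroid per grade band.
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

    # Average distance of each point to its own class centroid, averaged over classes.
    intra = np.mean([
        np.linalg.norm(embeddings[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])

    # Average distance over all unordered pairs of centroids.
    i_idx, j_idx = np.triu_indices(len(classes), k=1)
    inter = np.linalg.norm(centroids[i_idx] - centroids[j_idx], axis=1).mean()

    return inter / intra


# Toy points: the same four vectors under two different labelings.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
aligned = separation_ratio(points, [0, 0, 1, 1])      # labels follow the clusters
misaligned = separation_ratio(points, [0, 1, 0, 1])   # labels cut across the clusters
```

In the toy data the points never move; only the labeling changes. A labeling that follows the clusters gives a high ratio, while one that cuts across them gives a ratio below 1, which mirrors the point that label design has to fit the geometry.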

Implications

When defining digital-twin states, defaulting to traditional bell-curve granularity is unsafe: state granularity must be aligned with representation geometry to preserve interpretability and stability for downstream pedagogical decisions. Future work will incorporate multimodal data (response trajectories, behavioral signals, time-series physiological streams) to construct more robust twin states.

Citation

Y.-C. Chien, C.-C. Yen, and C.-K. Chang, “Validation of Natural Language–Based Educational Digital Twins through Embedding Geometry in Python Courses,” in Proc. IEEE Int. Conf. on Advanced Learning Technologies (ICALT), 2026.

BibTeX

@inproceedings{chien2026edu_digital_twin,
  author    = {Yu-Chen Chien and Chia-Chien Yen and Chia-Kai Chang},
  title     = {Validation of Natural Language--Based Educational Digital Twins through Embedding Geometry in {Python} Courses},
  booktitle = {Proc. IEEE Int. Conf. on Advanced Learning Technologies (ICALT)},
  year      = {2026},
  month     = jul,
}
