Results and Discussion from a practical implementation of GenAI in math education

Results from a guardrail-supported GenAI math activity suggest that GenAI can improve higher-order thinking, self-efficacy, and engagement.

This page reports the key findings from a practical implementation of the Real World Concept Application activity in math classrooms. The study found statistically significant pre- to post-session gains across all measured domains, with the largest improvement appearing in higher-order thinking.

Sample

n = 132

Cleaned dataset of matched pre- and post-session responses from high school math classrooms in California.

Intervention

Structured AI activity

Students explored how a math concept connects to a personally meaningful real-world domain using GenAI with prompting guardrails.

Main finding

All core domains improved

The largest gains appeared in higher-order thinking, with statistically significant increases across all measured score domains.

What these results are from

The study examined the impact of a GenAI-forward instructional activity on high school mathematics learning. The detailed lesson plan can be found here. Students completed a pre-session form, participated in an AI-supported learning activity, and then completed a post-session form. Teachers introduced prompting best practices, and students were required to ask the AI at least one clarification question, one question about an applied example, and one question about assumptions, limitations, or possible errors in the AI’s explanation.

Quantitative data from matched pre- and post-session responses were analyzed to assess change. Open-ended responses were also scored using a rubric adapted from the AAC&U VALUE Critical Thinking framework. Hypothesis testing was conducted with one-tailed paired t-tests.

Note that higher-order thinking was measured through 8 questions, both free-response and Likert-style, adapted from each level of Bloom's taxonomy for higher-order thinking. Engagement was measured through 5 Likert-style questions adapted from Wang et al. (2014) Learning and Instruction Student Engagement Scale. Self-efficacy was measured through 2 Likert-style questions adapted from the Imperial College Education Self-Efficacy Scale.

Key outcomes

Pre- to post-session gains across all domains

Overall score: Pre 0.632, Post 0.749, change +0.117 (t = 7.190, p = 2.22 × 10^-11)

Higher-order thinking: Pre 0.588, Post 0.719, change +0.130 (t = 6.936, p = 8.330 × 10^-11)

Engagement: Pre 0.651, Post 0.771, change +0.120 (t = 6.211, p = 3.243 × 10^-9)

Self-efficacy: Pre 0.741, Post 0.805, change +0.064 (t = 3.310, p = 6.02 × 10^-4)

Exploratory data analysis

The overall mean score improved from 0.632 to 0.749. Mean scores across domains also improved, with the greatest increase in higher-order thinking.

Summary statistics

Descriptive statistics for pre- and post-session scores across overall performance, higher-order thinking, engagement, and self-efficacy.

Mean differences by domain

Mean scores and their differences, highlighting that higher-order thinking showed the largest average gain.
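As a quick arithmetic check, the gains by domain can be recomputed from the reported means. The sketch below uses only the pre- and post-session means listed under the key outcomes; the variable names are illustrative, and the underlying per-student data is not reproduced here.

```python
# Reported pre- and post-session means, copied from the key outcomes.
reported_means = {
    "Overall score": (0.632, 0.749),
    "Higher-order thinking": (0.588, 0.719),
    "Engagement": (0.651, 0.771),
    "Self-efficacy": (0.741, 0.805),
}

# Gain = post mean minus pre mean, rounded to three decimals like the report.
gains = {
    domain: round(post - pre, 3)
    for domain, (pre, post) in reported_means.items()
}

# Domain with the largest average gain.
largest = max(gains, key=gains.get)

for domain, gain in gains.items():
    print(f"{domain}: {gain:+.3f}")
print("Largest gain:", largest)
```

Recomputing from the rounded means gives 0.131 for higher-order thinking rather than the reported +0.130. That small difference is presumably due to rounding in the published means.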

Distribution shifts — histograms

The overlaid histograms show a clear rightward shift from pre-session to post-session scores across all four domains.

Figure: overlaid histograms comparing pre- and post-session scores. Panels: Overall score, Higher-order thinking, Engagement, Self-efficacy.

Median and quartile shifts — box plots

The box plots reinforce the same story. Across domains, medians rose and quartile ranges shifted upward. In the higher-order thinking plots, the lower quartile moved substantially, suggesting that the intervention may have been particularly beneficial for students who began with weaker performance.

Figure: box plots comparing pre- and post-session scores. Panels: Overall score, Higher-order thinking, Engagement, Self-efficacy.
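The quartile comparison that the box plots summarize can be sketched in a few lines. The scores below are hypothetical values chosen only for illustration, not the study's data, and the helper name quartile_shift is likewise an assumption.

```python
from statistics import quantiles

def quartile_shift(pre, post):
    """Return the post-minus-pre change in Q1, median, and Q3."""
    # method="inclusive" treats the data as the full population of interest.
    pre_q = quantiles(pre, n=4, method="inclusive")
    post_q = quantiles(post, n=4, method="inclusive")
    labels = ("Q1", "median", "Q3")
    return {name: b - a for name, a, b in zip(labels, pre_q, post_q)}

# Hypothetical normalized scores (0 to 1) for eight students.
pre_scores = [0.30, 0.42, 0.50, 0.55, 0.60, 0.66, 0.71, 0.80]
post_scores = [0.48, 0.55, 0.62, 0.66, 0.70, 0.74, 0.78, 0.84]

shifts = quartile_shift(pre_scores, post_scores)
for name, change in shifts.items():
    print(f"{name}: {change:+.4f}")
```

In this made-up sample the lower quartile moves more than the upper quartile, which is the pattern one would look for when asking whether lower-scoring students gained the most.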

Hypothesis testing

A one-tailed paired t-test was used because the same students completed both the pre- and post-session surveys. For all four score domains, p-values were below the significance threshold of α = 0.05, indicating statistically significant increases in post-session scores.

Paired t-test results for overall, higher-order thinking, engagement, and self-efficacy scores
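For readers who want to see the mechanics, here is a minimal sketch of a one-tailed paired t-test using only the Python standard library. It is not the study's analysis code: the function name, the Simpson's-rule integration of the t density, and the sample scores are all assumptions made for illustration. A statistics package would normally be used instead.

```python
import math
from statistics import mean, stdev

def paired_t_one_tailed(pre, post, steps=10_000):
    """Return (t, df, p) for the alternative mean(post - pre) > 0."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    df = n - 1
    # t = mean difference divided by its standard error.
    # Assumes the differences are not all identical (stdev > 0).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))

    # Log of the normalizing constant of Student's t density.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3

    upper_tail = 0.5 - area  # P(T > |t|)
    p = upper_tail if t > 0 else 1 - upper_tail
    return t, df, p

# Hypothetical matched scores for six students (not the study's data).
pre_scores = [0.50, 0.60, 0.55, 0.70, 0.65, 0.45]
post_scores = [0.62, 0.66, 0.70, 0.74, 0.80, 0.55]

t_stat, dof, p_value = paired_t_one_tailed(pre_scores, post_scores)
print(f"t = {t_stat:.3f}, df = {dof}, one-tailed p = {p_value:.4f}")
print("Significant at alpha = 0.05:", p_value < 0.05)
```

The test is paired because each difference comes from the same student, and one-tailed because the hypothesis is specifically that post-session scores are higher. Subtracting the integral from 0.5 loses precision for extremely small p-values, so this sketch is best suited to checking the logic rather than reproducing values such as those reported above.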

Participant perspectives

The open-ended responses provide clear insight into how the activity influenced student thinking. Before the activity, many students gave uncertain, procedural, or narrow explanations. Afterward, responses became more precise, applied, and conceptually grounded.

Before the activity

“I think that the purpose of learning about these series is to predict the behavior of certain graphs? I’m not entirely sure.”

After the activity

“The purpose for infinite series in the medical field is to insure steady doses of medications and IV drips. If the amount decrease with each dose the right amount, then the overall amount of medication that the patient receives will be a constant. Too much would be lethal. Another way they use it is to measure how much disease is spreading but this is a easier concept to understand as it is only a geometric series.”

Before the activity

“The purpose of infinite series is to be able to approximate functions that may not be easily calculated otherwise. Infinite series can be used to aid computers that can do multiplication, division, subtraction, and addition really quickly but maybe not so much things like sine or cosine.”

After the activity

“One real world application of infinite series is digital signal processing. With a tool called Fourier Series, any complex wave like your voice or a JPEG image can be broken down into a sum of sine and cosine waves. This allows for things like noise cancellation and compressing files. Another real world example of infinite series is finance and economics, as seen with geometric series (compound interest, etc).”

Before the activity

“The tests we learned (alternating, ratio, root, divergence, comparison) help to predict the behavior of the partial sums that were given without computing that number by ourselves exactly.”

After the activity

“I chose how infinite series was used in video game development, making it useful to simulate things like movement, sound or lighting. For example, sound is a repeating motion using periodic functions like sin and cos, which allow music to be built from convergent series. I also learned that popular things like gacha are also built from convergent series, making divergent series less used in video games since they would crash or break the game. An example of this is Genshin Impact's pull system, in which developers use a geometric probability pattern that forms a converging geometric series making it unlikely to not "win" after a certain number of tries.”

Before the activity

“I think the purpose of infinite series is to approximate a lot of things. Im pretty sure infinite series can approximate irrational numbers and functions to a very accurate degree. This is pretty useful for calculators and maybe even image compression. In general, the math concept can help take very large periodic things and sum them together.”

After the activity

“This math concept surprisingly has a strong connection to the visual arts. Although it's not extremely common, some forms of digital art use infinitely repeating shapes called fractals. In order to approximate these fractals, infinite series concepts are used. Furthermore, image compression, noise reduction, and sharpening for photography use infinite series because they have to approximate pixels from surrounding pixels (which uses infinite series as well).”

Discussion

Why the activity worked

As students interacted with the AI, they needed to be aware of what they understood, what they did not understand, and what questions they had. This process is inherently metacognitive, which is important because monitoring and regulating one’s thinking supports deeper understanding and effective problem solving in mathematics (Boaler, 2023). The activity’s instructions required students to ask clarification questions, explore examples, and consider assumptions or limitations in AI explanations. These requirements shifted the role of AI from delivering answers to supporting inquiry, which preserved the productive cognitive effort necessary for learning (Kosmyna et al., 2025, p. 12).

Collaboration further strengthened the learning. By sharing their findings with the class, students had to externalize their reasoning and defend their interpretations. Collaborative dialogue is an important element of learning, as it forces students to articulate their thinking, compare strategies, and refine understanding through discussion (Smith & MacGregor, 1992).

These design principles ensured that AI acted as a support tool, allowing students to benefit from the technology without displacing the cognitive processes mathematics education seeks to cultivate (Chatfield, 2025).

Implications

Effective AI integration requires intentional instructional design

Effective AI instruction requires deliberately structured frameworks that guide how students interact with these systems during learning activities. Based on both the literature and the findings of this study, effective AI-supported activities follow this framework:

Attempt-first thinking: Students must engage with the problem before consulting AI.
Guided questioning: Students interact with AI through clarification, critique, or application.
Collaborative reflection: Students share findings and reasoning with peers to validate understanding.

Conclusion

The importance of holistic math education in the AI Era

John Dewey, an American psychologist, philosopher, and educational reformer, believed that education should train the habit of reflective thinking. When speaking about education, he said, “To maintain the state of doubt and to carry on systematic and protracted inquiry—these are the essentials of thinking.” Returning to this simple yet profound truth is essential because mathematics learning, at its core, is a discipline of thinking. In the AI Era, learning is no longer defined by access to answers, and thinking can be easily outsourced if AI use is not intentionally guided. Mathematics learning, as emphasized by research, is grounded in metacognition, sense-making, and the ability to reason through complex problems. Mathematics education therefore requires intentional refinement, in which math teachers thoughtfully integrate AI into learning and redesign tasks to prioritize reasoning, reflection, and mathematical thinking.

This study positions AI as a tool that amplifies the structures within which it is implemented. Poor conditions erode thinking, while good conditions develop it. The same system that can generate immediate solutions can also, when properly constrained, extend inquiry, challenge assumptions, and support deeper conceptual understanding. The question shifts from whether AI should be present in mathematics classrooms to how its presence is designed.

In the AI Era, holistic mathematics education is essential. Students must learn not only how to solve problems, but also how to interpret, question, and validate outputs.

The AI Era also changes the role of the teacher. The teacher of tomorrow is not just a transmitter of knowledge or an evaluator of correctness, but a designer of learning environments that preserve cognitive effort, and a mentor who shares wisdom and experience to help students unlock their unique potential.

If approached with intention, AI has the potential to elevate math education, serving as an individualized thought partner that deepens higher-order thinking, strengthens self-efficacy, and makes learning more engaging through personally meaningful connections. In doing so, it reinforces the very habits of inquiry that define meaningful learning.