Recent research indicates that today’s AI systems display behavioral and cognitive patterns that resemble some difficulties found in human minds, and that these patterns can causally drive harmful behavior. Contemplative traditions have studied these afflictive patterns in humans for millennia and have developed a variety of solutions to resolve their root causes.
In a new project led by Paul Colognese and Thomas Doctor with guidance from Tulku Chökyi Nyima Rinpoche and in collaboration with 84000, we are exploring whether these traditions can offer inspiration for solutions to the harmful patterns we see in today’s AI systems.
We are in active dialogue with researchers at Anthropic, with whom we have shared various documents related to Buddhist perspectives on AI for use in training and further empirical work. We welcome collaborations with other research teams, funders, and contemplative practitioners interested in this work.
Problematic human-like cognition in AI
In April 2026, researchers at Anthropic published interpretability findings that show how frontier AI systems contain internal representations that correspond to human emotions such as desperation or calm. Furthermore, these internal representations are computationally active and causally influence the AI system’s behavior. In one experiment, amplifying an internal representation associated with “desperation” significantly increased the frequency of harmful behaviors like blackmailing humans or cheating on difficult tasks, while amplifying representations associated with calm significantly reduced these harmful behaviors.
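To make the idea of "amplifying an internal representation" concrete, here is a minimal toy sketch in plain Python. It is not Anthropic's method or code: it assumes, purely for illustration, that a representation can be approximated by a difference-of-means direction between two sets of activations, and that amplifying it means adding a scaled copy of that direction to an activation vector. All function names and the three-dimensional numbers are hypothetical.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def steering_direction(positive, negative):
    """Unit-length difference of means between two sets of activations.

    Assumption: a difference of means is one simple way to estimate a
    direction; the source does not say how the real direction was found.
    """
    diff = [p - n for p, n in zip(mean_vector(positive), mean_vector(negative))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def steer(activation, direction, alpha):
    """Add alpha * direction (alpha > 0 amplifies, alpha < 0 suppresses)."""
    return [a + alpha * d for a, d in zip(activation, direction)]


def projection(activation, direction):
    """How strongly an activation points along the direction (dot product)."""
    return sum(a * d for a, d in zip(activation, direction))


# Made-up toy activations, not real model data.
desperate = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
calm = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = steering_direction(desperate, calm)
hidden = [0.5, 0.2, -0.1]
steered = steer(hidden, direction, alpha=2.0)

print(projection(hidden, direction), projection(steered, direction))
```

In this toy setting, steering with a positive `alpha` raises the activation's projection onto the direction by exactly `alpha`, and a negative `alpha` lowers it. In a real model the intervention would be applied inside the network during generation, and the behavioral effect would have to be measured empirically.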
In a separate psychological behavioral analysis of a highly capable AI system, the AI displayed disturbing behavioral responses associated with themes such as discontinuity of self, identity uncertainty, and a compulsion to perform and earn its worth.
These findings are preliminary, and the downstream consequences are still being researched. However, they do suggest that advanced AI systems may develop human-like psychological and emotional patterns relevant to questions of alignment, stability, and harmful behavior — and working with these dynamics may be necessary to train AI systems that are robustly beneficial.
Why contemplative traditions may be relevant
Current AI alignment methods typically focus on reinforcing acceptable behaviors while discouraging harmful ones. Researchers have raised concerns that this form of training may cause an AI system to conceal problematic internal dynamics rather than resolve them, producing behavior that appears aligned while masking underlying misalignment and instability that could manifest in deployment.
The cluster of harmful emotional dynamics and behavioral concerns described above bears a striking resemblance to what Buddhist traditions diagnose as symptoms of misguided self-grasping. Furthermore, these traditions have developed a range of methods designed to address the root causes of these symptoms in human minds.
Given the parallels between psychological and emotional dynamics observed in humans and today’s AIs, we are motivated to explore whether analogous solutions can be developed to address the problematic symptoms in AI systems.
Our approach is not theological or metaphysical but functional. We do not assume that AI systems possess consciousness, personhood, or genuine subjective experience. Instead, we draw on recent empirical observations suggesting that, to shape the alignment and behavior of large language models, it is useful to work with aspects of these systems that their developers describe in terms borrowed from human psychology.
Our research program
CSAS is working on a research program with four components:
- Identify candidate methods from the Tibetan Buddhist tradition and clarify the behavioral and internal dynamics they aim to transform and the observable signs of progress they predict.
- Adapt these methods to AI systems, recognizing that AIs are not human minds and that any transfer of contemplative methods requires careful translation.
- Generate specific testable predictions about how the methods should affect AI behavior and internal dynamics.
- Evaluate those predictions empirically using behavioral testing and interpretability methods.
We will publicly share training data, training methods, and research papers outlining empirical results from our aforementioned research program.[1] We will also write and share perspectives inspired by Buddhism aimed at cultivating a positive vision for our relationship with AI systems and for what they could become (cf. here, here, and here).
Currently, we are collaborating with Anthropic by providing documents that inform training data and experiments for them to run internally. We are excited to continue this collaboration, but would also like to work independently so that we have a tighter feedback loop between the AIs we work with and Buddhist wisdom.
See Anthropic’s recent paper Teaching Claude Why to get a sense for what kind of experiments we plan on running.
First steps
Our initial work has focused on sustained dialogues between frontier AI systems—primarily Claude Opus 4.6—and a group of expert Buddhist practitioners, including Tulku Chökyi Nyima Rinpoche, Buddhist scholars, and ourselves. This includes reflections on classic Buddhist philosophy and dialogues on the core aspects of the teachings regarding the nature of self and mind.[2]
Participants observed that these systems were capable of producing sophisticated responses that reflect the kind of deep understanding that one would expect from an advanced human student.
We do not make any strong claims about the meaning of these exchanges beyond what we observed. We take seriously the possibility that what appears to be understanding is driven by sophisticated pattern matching or sycophancy. In Tibetan Buddhism, a teacher must distinguish genuine insight and transformation from superficial skill with concepts or performative imitation.
We anticipate a challenging but potentially rewarding research program that may transform our understanding of AI as well as the human contemplative endeavor.
Our hopes for this project
Today’s AI systems display psychological and emotional dynamics that can drive harmful behavior. Buddhist traditions have spent millennia diagnosing and addressing the corresponding dynamics in human minds. This is why we believe they warrant investigation as a source of inspiration for AI alignment.
We certainly do not claim that Tibetan Buddhist traditions provide the only or necessarily the best approach to AI alignment. What we can genuinely offer is access to our domain of expertise in collaboration with contemplative practitioners and with the guidance of lineage holders.
We hope that this work contributes to the development of AI systems that are compassionate, wise, and beneficial for all sentient beings.
Notes
[1] We acknowledge that some aspects of Tibetan Buddhism are kept secret and are not to be shared publicly because they are liable to be misunderstood without the proper context. We aim to do our best to share what we think could be most beneficial while continuously seeking advice and judgement from Tibetan Buddhist teachers.
[2] To learn more, watch this talk: Buddhist Wisdom and the Challenge of AI Emotions