Aligning DMT Entities: Shards, Shoggoths, and Waluigis

We have recently seen some incredible “rogue AI behavior” in Microsoft’s Bing.

While reading some of these outputs I was reminded of… rogue DMT entities. Indeed, sometimes people have DMT experiences and encounter beautiful angelic beings that want to help and heal you (and sometimes do so!), but other times people encounter demonic beings that want to harm and hurt you (and sometimes do so!).

Just as it is unwise to roll out a technology like Bing that is full of potentially misaligned subagents, I also reckon that it’s unwise to deliver DMT therapy to the masses *before* fixing this bug. While I think that responsible consenting adults *should* be allowed to experiment with DMT as they wish, the bar for “safety and effectiveness” should be much higher when we think of it as a possible mental health intervention.

Ok, so both Bing and DMT experiences can create insane rogue subagents. How are these two things more than merely superficially connected?

Someone I talked to recently was actually worried that DMT entities are perhaps controlling these AI technologies to infiltrate our world. I don’t think that’s a very likely explanation. Rather, I think there is a much more parsimonious explanation for this similarity. Namely, that both involve having a predictive system spin up misaligned agents in order to fit narratives that appear in the training data. Let’s dig in!

When in Rome

Last year, when playing with GPT and after talking to Connor from Conjecture, who introduced me to Janus’ Simulator Theory[1] (see also: Janus’ Simulators by Scott Alexander), it became clear that there is a similarity between DMT entities and the quasi-agentic simulated characters GPT-like systems spin up in order to predict the next token in text. If this is true, it suggests that there might be interesting transpositions between the strategies and concerns discussed in the AI Alignment world and the findings from psychedelic phenomenology about how to have a good time with the beings that you encounter in far-out places. Let me explain.

A very large fraction of our nervous system is dedicated to minimizing surprise (cf. free energy principle, predictive processing). Now, I don’t think that this is all that the nervous system is doing, nor do I think it is a theory of consciousness. But it is a very important piece of the puzzle nonetheless.

QRI has championed a set of integrative models that situate the free energy principle within the larger context of consciousness research in order to explain psychedelic phenomenology. Most recently we have been discussing the frame of “Psychedelic Thermodynamics”, which brings together Neural Annealing, Non-Linear Wave Computing, Johnson’s Symmetry Theory of Valence, and Topological Approaches to Binding.

The bit that is relevant from Psychedelic Thermodynamics here is that there is a process by which psychedelics intensify the background noise that, together with sensory stimuli, stimulates internal representations (via a process of stochastic resonance). Importantly, internal representations function as energy sinks from the point of view of the background noise, whereas they are energy sources from the point of view of other representations.

The two key features that work as energy sinks of this background energy are symmetry and “recognition”. This was first discussed in The Hyperbolic Geometry of DMT Experiences, but it also shows up elsewhere[2]. In particular, when you can interpret an ambiguous input as “expected given the context” then that sucks energy out of the background noise in order to energize a gestalt that binds together low-level features into a coherent high-level percept (e.g. Necker Cube). When this “clicks” it will radiate out its excess energy to the rest of the field, and also *constrain* the shape of the field such that it functions as new context that changes the probability for other ambiguous sensations to collapse into representations consistent with the new gestalt. In other words, on DMT you can go from what feels like “pure undifferentiated non-dual consciousness” to “this specific carnival with harlequins doing acrobatics” by collapsing how you interpret slight imperfections in the field, which then snowball into instantiating an entire realm of experience where each shape resonates with every other shape (a “vibe lock”, as we call it).

Now, once you interpret a sufficient number of features as high-level gestalts, then they will start interacting with each other and further constraining the possible interpretations of the rest of the field. This, I believe, is somewhat similar to GPT, except on a full spatiotemporal context rather than a sequence of tokens context (cf. probabilistic graphical models).
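To make the snowballing concrete, here is a minimal toy sketch of it. This is not QRI’s model and not a claim about how the nervous system works: the function names, the `coupling` parameter, and the theme labels are all hypothetical, chosen only for illustration. Each ambiguous feature is read as one of a few themes, and the chance of a given reading grows with the number of features already read that way, so earlier interpretations act as context for later ones.

```python
import math
import random
from collections import Counter


def collapse_field(n_features, themes, coupling, seed=0):
    """Interpret ambiguous features one at a time.

    Each feature is assigned a theme with probability proportional to
    exp(coupling * number of features already read as that theme), so
    earlier interpretations act as context for later ones.
    """
    rng = random.Random(seed)
    counts = Counter({theme: 0 for theme in themes})
    for _ in range(n_features):
        scores = [coupling * counts[theme] for theme in themes]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        chosen = rng.choices(themes, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


def dominant_share(counts):
    """Fraction of the field claimed by the most common theme."""
    return max(counts.values()) / sum(counts.values())


# Compare a field with no contextual coupling to a strongly coupled one.
themes = ["carnival", "temple", "machine"]
for coupling in (0.0, 2.0):
    counts = collapse_field(200, themes, coupling)
    print(coupling, dict(counts), round(dominant_share(counts), 2))
```

With `coupling` at zero every feature is read independently, so no theme takes over. With a large `coupling`, whichever theme happens to get an early lead tends to claim nearly the whole field, which is the toy analogue of a “vibe lock”.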

If this model is correct, as soon as you start collapsing the energized field into interpretations, then a particular narrative structure may start dominating and “making sense” of what is happening. This can indeed snowball into getting into tricky and sometimes really unpleasant situations.

Slides from: Healing Trauma With Neural Annealing

Meet the Meeseeks

In parallel, it’s important to briefly mention the role that subagents typically have in us. Namely, what Romeo Stevens calls the “Mr. Meeseeks interpretation of subagents”. The subagents are created to achieve a goal, they don’t really like existing, but will continue to hang in there until they’re convinced the goal has been met. The subagents are spun up in order to accomplish goals that would normally require you to spend a lot of attention but that cannot be simply offloaded to muscle memory (the way driving a car can). Typical examples are things like the response one may have to living in an environment with very negative people (say, dark triad personalities) where you need to spin up subagents that behave like them so that you can predict their next move. In cases of PTSD, it may be that part of the problem is that one created a lot of rather negative subagents (of people, situations, dynamics, actual physical hazards, etc.) and that as a collective they reinforce each other.

Hi I’m Mr. Meeseeks! I see your grandmother is emotionally abusive. I’ll pretend to be her inside your mind so that you can predict what she will do next and thus avoid getting harmed. Let me know when she’s gone so I can go *PUFF*.

The Return to Goodness

Here loving-kindness meditation can be enormously helpful. I refer you to Anders & Maggie’s meditative exercise to heal negative internal subagents (see Letter XI: Douglas Adams). Essentially what you do is visualize a container of very positive benevolent and high-valence feelings (call it unconditional love, God, primordial goodness, Buddhamind, etc. – or whatever really resonates in your inner world simulation). You then tell the story that subagents come out of that container and once they achieve their goals they go back to it in order to “merge with love” once again. You can even explain this to the subagents, and they can feel the sense of relief that comes when they finally achieve union with this primordial love. Gently guide them towards it. And if you do this over and over, you will in fact be cleaning up a lot of subagents lingering implicit in the field, until you achieve a smooth field with high-valence and a non-dual feel.

Ok, so taking stock: our field of experience can “collapse” into familiar representations when they start predicting each other, sub-agents cease to exist once they have achieved their goal, and loving-kindness exercises can help you steer lost and lingering subagents towards their re-unification with primordial love (or, again, whatever resonates with you!). Moreover, these subagents are embedded in the predictive processing hierarchy and will try to do exactly what you find them most likely to do. So given these conditions, how do you align DMT entities?

Aligning DMT Entities

Here are some suggestions:

* First, the simplest and most straightforward intervention is to simply get good and prosocial training data. This is highlighted by the Waluigi Effect, in which Bing sort of turns nasty *because* character trait inversion is a *trope* in human stories, and there are plenty of such stories online. This could in principle be fixed by having an AI that classifies tropes and narrative structures and filters texts that contain any hint of Waluigi tropes or character trait switching narrative structures before feeding them as training data to GPT. Similarly, in the case of DMT entities, you can go to an environment with vetted inputs that are always really wholesome. Recall: the influence that the last couple of weeks have on what comes up in a psychedelic experience is vastly larger than what you experienced a year or a decade ago. The recent inputs matter a lot, so don’t worry about the fact that you’ve seen horror movies in the past. If you’ve been consuming really wholesome media for the last three months, that will matter enormously more.

* Second, add really highly-weighted good training data that makes it so that aligned outcomes are always the most likely. In our case, this would indeed be things like exercising the “gently guide subagents to the pool of love” move so that it’s a very likely outcome and they predict that that’s what’s going to happen. Train on visualizing the Buddha with a hand up saying “don’t fear”. Internalize that “love is always stronger than fear” (which is something I actually believe in, based on many incredible experiences). And so on.

Don’t Fear

* Third, use good vibes as the base. Essentially, negative entities feed off of negatively valenced patterns. Literally, feeling somatic sensations of pinching, pressure, twisting, etc. can become the building blocks of gestalts that end up becoming negative entities. Starting out with a very positive and smooth field reduces the fuel that negative entities have to construct themselves in resonance with patterns of dissonance. We’ve heard about good outcomes from Wim Hof and chanting metta meditation before trips (YMMV!).

* Fourth, More Dakka on equanimity. Remember the teachings of Rob Burbea (“what you resist persists”) and Shinzen Young (“suffering equals pain times resistance”). Essentially, resisting negative energies makes them stronger. This is doubly so on psychedelic states of consciousness. Instead, remember that high enough equanimity, where you don’t let positive or negative vibes “move you”, maximizes the rate of stress dissipation within your nervous system, and this accelerates the rate at which negative vibes flow through you and exit your system via some kind of radiative cooling process currently not understood by science. Practice taking cold showers without stalling or flinching, or eating relatively hot peppers without resisting or letting the pain get to you. At least for DMT realms up to Magic Eye-level the physical discomfort of the state is not stronger than a cold shower… that is, if you don’t resist it! If you do resist it, the discomfort can be drastically amplified, and you can turn some waves in a glass of water into a storm.

* Fifth, going back to the Waluigi Effect: the article explains why Reinforcement Learning from Human Feedback (RLHF) doesn’t really work for it (it encourages Waluigis to hide and pretend, rather than really getting rid of them). So instead of simply “rewarding good behavior” I suggest you reward “clean subagentic structures”. There is a “vibe” to the “intentions” of subagents. And you will soon realize that Waluigis have an “ambiguous intention” vibe. Use metta to reward sub-agents that have collapsed and clean intentions instead. Importantly, this takes priority over rewarding subagents that are really good at flattering you, for example, because you’ve been fed enough narratives where flattery turns to betrayal that flattery is no guarantee of alignment.

* Sixth, I think the principles of Shard Theory might be really useful here. In particular, really notice that not only can you reward sub-agents with your attention and your top-down vibes, but once they are sufficiently “alive” they can actually start to *reward each other*. This, I believe, is how you get things like “egregore possessions” and other uncanny related phenomena. More on this below. You want to have a clean and smooth field of awareness so that subagent conspiracies can be easily spotted and addressed before they snowball.
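The data-filtering idea in the first suggestion above can be sketched roughly as follows. This is only a minimal illustration under loud assumptions: the cue phrases, the function names, and the threshold are all hypothetical, and a keyword match is a crude stand-in for the trope classifier the suggestion imagines, which would have to be a trained model.

```python
# Hypothetical cue phrases; a real system would use a trained classifier.
INVERSION_CUES = (
    "secretly evil",
    "true colors",
    "was the villain all along",
)


def inversion_score(text):
    """Toy stand-in for a trope classifier: fraction of cue phrases present."""
    lowered = text.lower()
    hits = sum(cue in lowered for cue in INVERSION_CUES)
    return hits / len(INVERSION_CUES)


def filter_corpus(texts, score=inversion_score, threshold=0.0):
    """Keep only texts whose trope score does not exceed the threshold."""
    return [text for text in texts if score(text) <= threshold]


# Example: only the first story would be passed on as training data.
stories = [
    "The helper stayed kind to the end.",
    "The butler was the villain all along.",
]
print(filter_corpus(stories))
```

The default threshold of zero mirrors the “any hint” wording of the suggestion: a single matched cue is enough to drop a text. Passing a different `score` function is where a real classifier would plug in.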

Example Entities

Finally, let me ground this with some of the common categories of DMT entities:

Shoggoths: These are entities that seem to emerge out of the resonance of interpersonal representations of preferences at the cultural level. The things that you can “recognize the field as” in this case are “people doing what they want” where what they want may be different than what you want. If you have an adversarial relationship with a particular culture or subculture and you resist these wants, they will get reinforced by you disliking them and in some cases can start to locally bind with each other until you get what some psychonauts call “an amalgam” of cultural preferences. This is also what I think people are talking about when they say they have met an “egregore” of a culture or ideology on DMT. These are hard or perhaps impossible to align: cultures are in fact self-contradictory. So the amalgam will typically hold a lot of internal contradictions, which it will then externalize. The way to deal with a Shoggoth involves re-annealing, in addition to the suggestions above. DMT Shoggoths are sort of a symptom of failed clean annealing, in that they “coagulate” rather than “click”, and are amalgams of lots of incompatible preferences loosely held by a political coalition. This could perhaps be predicted from first principles with non-linear wave computing and Shard Theory, so the fact that it does happen to people makes it a salient case for this field of study.

Demons: these are sub-agents that come up in “hell realms” which are states of consciousness where you believe that you are a bad person and deserve some kind of punishment. The demons here are just, in my opinion, doing exactly what you expect them to do, namely, punishing you. I think that in addition to equanimity and metta, these entities also respond to boosting narratives of redemption that are wholesome in nature. For example, there is this spiritual belief that demons ultimately are all on the path towards God… they are just in a more extreme version of the Parable of the Prodigal Son, and they might take thousands of years to redeem themselves. But they will do so. In this case, you sending them metta and telling them that they are actually intrinsically good, can slowly, but surely, help them unwind their dissonant configurations.

Harlequins: these are entities from what feels like some kind of “clown dimension”. They are extremely common on DMT. Because we have so many tropes of negative clowns, this can often turn ugly. I suggest you reinforce the narrative of “harlequins as tricksters who are child-like in their curiosity about consciousness”. In fact, prompted properly and softened with enough metta, harlequins can be extremely helpful for consciousness research. You can play positive sum games with them in which you give them a really good time, and in exchange they help you explore the most surprising features of the space you are inhabiting. They can become “consciousness research assistants” with a flair for the weird and wondrous.

There is of course a zoo of possible entities, and in fact many possible entities currently exist merely in potential. As we imagine new healthy and wholesome tropes in our sense-making attempts for DMT realms, I predict that we will “unlock” new and more helpful DMT beings. In particular, I think that Team Consciousness tropes can give us a really good aesthetic to use as the primary energy sink for “recognizing” entities in this space. If you ever meet Rainbow God, say hi for me. It *always* gives you a mind-blowing revelation about reality and consciousness that enriches your life for the better 😉

How does this help AI Alignment?

I will conclude by saying that studying DMT entities might actually be a way to make headway in AI alignment in two ways. First, because they genuinely can be really smart entities you can interact with, on a bounded timeframe, and who seem to share a lot of features with AI technologies. They are human-level or higher in their intelligence (because they have access to new geometries of phenomenal space and hence to novel qualia computing, and because they lack the ego defenses that make you incapable of having certain thoughts!). And second, because all of the above may actually also transpose to discussions in AI alignment. In particular, I think the above suggestions are helpful for researchers. AI alignment can expose you to a lot of mental health risks (from the belief that “we’re doomed”, to creating strong tulpas that don’t align with your own values!). The recommendations I provide above may transpose to that domain: realize that even AI alignment research makes you spin up subagents inside you! The tools I shared may be helpful to increase the mental health of anyone studying this field who is now suffering from an infestation of negative subagents. Bring them back to Love!

[1] Not to be confused with simulationism (the belief that we actually live in a computer simulation) or indirect realism about perception (the philosophical realization that all we ever have access to are the features of an internal world simulation and we don’t perceive the world “directly”).

[2] Lehar’s Harmonic Gestalt argues that this emerges naturally out of the hill-climbing towards higher harmony between internal representations. Also discussed in Healing Trauma with Neural Annealing.