Mushrooms and AI Generated Art: Wave Lensing, Geometric Priors, and Scale-Separation

[Epistemic Status: Fictional Trip Report]

So picture yourself on a random Saturday in the Bay Area. A couple of friends of yours have been dying to try magic mushrooms and know that you are an experienced tripper. They have done weed and MDMA a couple of times, and one of them once had 2 grams of mushrooms with you years ago (probably over a decade now, though you prefer not to count). The other, his boyfriend, has never tried a psychedelic. They are both professionals in the AI world, working either on cutting-edge language models or hardware accelerators. So you decide that a fruitful endeavor would be to watch AI-generated content and see how your relatively AI-informed tripping minds interpret what you see. It’s a neat experiment, right? Anyone can do it. But chances are that an explicit “watch party” with friends would be the most effective way to go about it.

It’s a remarkably simple setup, really. All you have to do is get together with friends who are into AI and/or consciousness, each consume between 1 and 2 grams of mushrooms, and then spend the next 5 hours watching this video:

You will ideally also play the music that comes with the video on a good speaker system and watch it on a great large-screen TV. Why is this a significant experiment, or activity?

Look, I want to avoid the perception that this is simply coming from a sort of “oh mushrooms are trippy, AI art is trippy, if we add them together we will get extra trippy!” heuristic. I mean, there is some logic to this, and in many circumstances it really makes sense. But, guys, we’re doing sophisticated science here! Let me explain.

We have an AI system that is trying to minimize the difference between a prompt and the images it is generating. It is trying to make every part of the image as unsurprising as possible given every other part of the picture and the prompt. But on magic mushrooms you parse images differently. How these models ultimately draw inferences is still hotly debated, and it wouldn’t be absurd to find out that some clues are more perceptible in such an altered state.
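
To make the kind of objective I have in mind concrete, here is a minimal toy sketch. This is not the actual model behind the video (whose architecture I don’t know); the update rule and all the names are illustrative assumptions about what “keep each patch unsurprising given its surroundings, modulo the prompt” could look like:

```python
# Toy sketch only: a "minimize local surprise, modulo the prompt" update.
# NOT the video's actual model; the update rule and names are my own invention.
import numpy as np

def local_coherence_step(img, prompt_target, w_neighbors=0.8, w_prompt=0.2):
    """Blend each pixel toward the mean of its 4-neighborhood and a prompt-derived target."""
    local_mean = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
                  np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)) / 4.0
    return w_neighbors * local_mean + w_prompt * prompt_target

rng = np.random.default_rng(0)
img = rng.random((64, 64))                 # start from noise
prompt_target = np.full((64, 64), 0.7)     # stand-in for "what the prompt wants"
for _ in range(50):
    img = local_coherence_step(img, prompt_target)
```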

We decided to simply experience the most generic and reproducible setup possible, and so we stuck to watching that video; that way it becomes a kind of standard candle for discussion.

I should mention that the other two participants took 2 grams. One of them all at once (soaked in lemon and with ginger juice as a chaser to avoid nausea) and the other took one gram and then about 50 minutes later the second gram. I took half a gram to test the waters (as I sometimes respond really strongly to mushrooms in particular) and had a gram (most likely about 0.8-0.9g) also 50 minutes after the first dose, when it became apparent that I was having a more typical response by then.

The other two participants were very grateful for the experience overall, but they both stated that they thought it was too intense, and they wished they had taken 1 or 1.5g instead. For me the dose was right on target, which I suppose amounted to 1.3-1.4g or so.

The Experience

The video and the music turned out to be perfect. I had a blast. I really couldn’t think of anything more interesting to do on a random Saturday afternoon. First of all, the dose was strong enough for me to have really noticeable effects worth pointing out, but not strong enough that I would get completely side-tracked by internal tangents or worries. I was really energized and euphoric, yet able to keep it together the whole time, having a lot of familiarity with this territory. Second, I think that the input was really useful for the experience to go well. The images continued to be entertaining, and grounding, even. And the music created a vibe of “we made it! we are humanity in the future and we figured out how to solve the climate crisis, AI, and pandemics, and we’re all living a never-ending party exploring consciousness”. So at least this created really excellent conditions for my experience.

But most importantly, the task of “look at these images and try to point out things about how the model works that you wouldn’t normally be able to see” was enormously engrossing. Having a deep personal interest in how the mind works and how this differs from how computers work also provided great mental software to play with during this activity.

I will start out by pointing out the most obvious difference between watching the video on 1.3g of mushrooms and watching it sober. The biggest difference is in how sensitive one is to randomness in the animation. This is something you can point out literally at any moment of the 11-hour-long video (yeah, I know, it’s a shorter loop, which we watched probably around 3.5 times altogether). The change introduced by the zoom makes the algorithm re-paint each region locally in a way that minimizes surprise (modulo the prompt) with its surroundings. And while the local changes in color and low-level shape are typically well-coordinated with changes in the region, they are largely random and desynchronized relative to low-level changes elsewhere in the image. This could have been different. For instance, if the model had more non-local update rules, where each local change needs to be made in coordination with changes elsewhere (or at different scales!), then we would be seeing (and noticing in the psychedelic state!) many more correlations in how the video evolves. Instead, we get a setup that almost sort of maximizes the individual brightness of each local change precisely because it occurs in relative isolation, and thus stands out. In a way, looking at this video while on mushrooms makes the experience very “pointillistic”.
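
If someone wanted to actually test the “desynchronized local changes” claim on the frames themselves, one rough way (my sketch, with illustrative names, and no claim that the video was analyzed this way) would be to correlate per-patch change time series across the image: near-zero off-diagonal correlation means pointillistic, independent local updates; high values would mean long-range coordination.

```python
# Quantifying "pointillism": correlation between per-patch change time series.
import numpy as np

def patch_change_series(frames, patch=16):
    """Per-patch mean frame-to-frame change; one time series per patch."""
    diffs = np.diff(frames, axis=0)                       # (T-1, H, W)
    _, H, W = diffs.shape
    series = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            series.append(diffs[:, i:i+patch, j:j+patch].mean(axis=(1, 2)))
    return np.stack(series)                               # (n_patches, T-1)

def mean_offdiag_correlation(frames, patch=16):
    """Low values ~ desynchronized local updates; high ~ coordinated ones."""
    c = np.corrcoef(patch_change_series(frames, patch))
    return c[~np.eye(len(c), dtype=bool)].mean()

# with purely random frames the off-diagonal correlation hovers near zero
frames = np.random.default_rng(1).random((30, 128, 128))
print(mean_offdiag_correlation(frames))
```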

Relaxation of Beliefs vs. Expanded Repertoire of Harmonic Modes

Now, the standard explanation for why on a psychedelic we would experience these local changes as brighter than when one is sober is that the state sensitizes you to low-level sensory signals over the perceptual priors we rely on to compress our experience and highlight only what’s out of line. In other words, this is the story where top-down priors are loosened, thus allowing bottom-up sensations to drive the state more directly than they usually would.

But I think that we can enrich this explanation with a more gears-level account. Namely, if we think, simplistically, that each state of consciousness is decently approximated by a superposition of harmonic resonant modes in each of our sensory channels as well as globally, then psilocybin’s role would be to increase the amplitude of these resonant modes, especially the higher-frequency ones. In turn, the typical landscape of harmonic coupling gets overwhelmed by the non-linearities emergent in the new regime, which give rise to a wide repertoire of possible ways for harmonics to (fleetingly) couple together. As a consequence, we have a wider (but unreliable!) set of building blocks (as resonant modes coupled together to form gestalts) with which to make sense of sensory information. In other words, it’s not (only) that the “perceptual filters” are down, as both Huxley and perhaps standard predictive processing explanations would have you believe, but there is also an enrichment of internal resonant modes that can function as more complex priors useful for perceptual processing.
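
For concreteness, here is a cartoon of what “increasing the amplitude of the higher-frequency resonant modes” could mean for a single one-dimensional sensory channel. This is just my illustration of the framing, not an established model of psilocybin’s action; the cutoff and gain values are arbitrary:

```python
# Cartoon of the "expanded repertoire of harmonic modes" story (illustrative only):
# decompose a channel into Fourier modes and boost the higher-frequency ones.
import numpy as np

def boost_high_modes(signal, gain=3.0, cutoff_frac=0.2):
    """Amplify Fourier modes above a cutoff frequency by `gain`."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal))
    spectrum[freqs > cutoff_frac * freqs.max()] *= gain
    return np.fft.irfft(spectrum, n=len(signal))

t = np.linspace(0, 1, 512, endpoint=False)
channel = np.sin(2 * np.pi * 3 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
psychedelic_channel = boost_high_modes(channel)   # the 40 Hz ripple now dominates
```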

What this means in practice is that you will be overfitting your sensory input a larger fraction of the time. The garden full of overgrown grass and leaves can look like a complex network of hypercubes on DMT or on a high dose of mushrooms. This is obviously not because that shape is really latent in the stimuli. Rather, it is that among the non-linear resonant modes you now have internally available to sample from in order to fit together the information coming from the senses, there is an entirely new class of non-Euclidean and higher-dimensional gestalt configurations. In a way, they are the possible data-structures for ordering large bundles of local binding connections into coherent structures with long-range correlations. But in this case, it is overwhelmingly likely that you are overfitting on the data: the grass and the leaves are a source of partially structured randomness that the normal visual system correctly interprets as stochastic, whereas the tripping brain over-thinks it far beyond what is necessary.

That said, as with psychedelic cryptography more broadly, I do very much believe that it is possible to create input that specifically looks ordered in the right way on a psychedelic but not sober. For instance, a case where the movement of dots on a screen is very well approximated by a certain projection of a hypercube onto a hyperbolic plane, so that on a psychedelic you actually tap into one of those possible “solitons” of the mind and correctly represent it. Here one would be finding a specific visual task whose model complexity is indeed adequate for a psychedelic but not for the normal visual system. I would, in particular, expect that psychedelic states of mind would work really well as a kind of “reverse fractal diffusion” system, where you can “click” into fractals hidden in the screen that were noised in some way. I.e. where the possible fractal gestalts that become available on psychedelics turn out to be a great approximation for the image in a way that generalizes.

Now, given the way in which our brain might use superposition of harmonic modes as one of its primary ways of modulating the contents of the world simulation, then we could _define_ randomness _relative_ to this system. In particular, it might be a good approximation for us to think of randomness as the degree to which the input is “incompressible with harmonic modes”. After all, JPEG is quite a good image compressor for humans. In other words, while some number-theoretic properties are certainly not random (if anything, they are superdeterministic, in that they are true in all possible universes), they are random from the point of view of the human visual (tactile, etc.) system. Looking for patterns in prime numbers is so trippy because they are deterministic and yet an adversarial case for our pattern-detection system (don’t get me started on looking for patterns in Pi). So that which cannot be compressed as an interaction between harmonic resonant modes stands out as inexplicable from a subjective point of view, even if there is a real pattern underneath that is simply poorly compressible with harmonic modes.
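
If one wanted to operationalize “randomness relative to the harmonic repertoire”, a crude proxy in the spirit of the JPEG point (my suggestion, nothing more) is how much of a signal’s energy is captured by its few largest DCT coefficients; a deterministic but harmonically incompressible sequence scores like noise here:

```python
# Harmonic compressibility as a (crude) inverse proxy for perceived randomness.
import numpy as np
from scipy.fft import dct

def harmonic_compressibility(signal, keep_frac=0.1):
    """Fraction of energy captured by the top `keep_frac` DCT coefficients."""
    coeffs = dct(np.asarray(signal, dtype=float), norm="ortho")
    k = max(1, int(len(coeffs) * keep_frac))
    top = np.sort(np.abs(coeffs))[::-1][:k]
    return (top ** 2).sum() / (coeffs ** 2).sum()

rng = np.random.default_rng(0)
smooth = np.cumsum(rng.standard_normal(1024))   # highly compressible with harmonics
noise = rng.standard_normal(1024)               # barely compressible with harmonics
print(harmonic_compressibility(smooth), harmonic_compressibility(noise))
```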

So one story says that the relative strength of our priors to our sensory input is flipped over. The other story says that the inner repertoire of representations increases and that as a consequence you will be more prone to interpret randomness as a pattern simply due to having more categories of patterns to sample from. How are these stories connected?

Here is what I think.

The Annealing Process

First of all, I think the dynamics are key (cf. mettannealing). A whole psychedelic experience can be seen as an annealing process, and we need to be aware of this to make sense of each stage of the trip (Psychedelic Information Theory, Neural Field Annealing). I think that psychedelics inherently activate some low-level, cellular-automata-like, reaction-diffusion-like processes that make your experience buzz all throughout and cause many of the pre-existing correlations learned by your system to break down. At first, in desynchronized ways. During this phase, the video looked like it was undergoing a process of defabrication. The elements were more disconnected from each other, sort of drifting apart into their own island realities. And the parsing of the scene was highly pointillistic.
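
The “reaction-diffusion-like buzz” is a conjecture on my part, but to make the flavor of dynamics I have in mind concrete, here is a bare-bones Gray-Scott reaction-diffusion step. The parameter values are commonly used textbook ones, not anything fit to neural data:

```python
# Minimal Gray-Scott reaction-diffusion: a seed perturbation breaks up into
# buzzing, self-organizing local patterns. Illustrative of the dynamics only.
import numpy as np

def laplacian(z):
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)

def gray_scott_step(u, v, Du=0.16, Dv=0.08, feed=0.035, kill=0.065, dt=1.0):
    uvv = u * v * v
    u_next = u + dt * (Du * laplacian(u) - uvv + feed * (1 - u))
    v_next = v + dt * (Dv * laplacian(v) + uvv - (feed + kill) * v)
    return u_next, v_next

u, v = np.ones((128, 128)), np.zeros((128, 128))
v[60:68, 60:68] = 0.5                      # seed a small perturbation
for _ in range(2000):
    u, v = gray_scott_step(u, v)
```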

Then, I think that as these low-level patterns begin to coordinate with each other, they start to form groups and clusters where they form networks of resonance. In this phase the experience is characterized by the evolution of signals, where different parts of your experience are trying to connect with one another, sending waves to each other and adjusting their mutual shape in order to be able to send and receive signals more efficiently. Here many parts of the video worked as makeshift transmission fields for patterns to communicate with each other.

Then you have a long period where the common themes among the “surviving” patterns of the ecosystem are dancing with each other. Once the pieces of the puzzle are set in place, you explore their many possible configurations and interactions. But at this point you aren’t making entirely new pieces. So during this phase, the video emphasized the complex relationships between the parts. In essence, the main archetypes that you arrived at during the process of deconstruction go as far as they can in cooperating with one another to make the experience as great as possible (in this case).

And finally you have a comedown and a subtle long-tail, both characterized by the loss of access to the memory of the process (if not somehow encoded in clever ways or recorded in audio or writing) except for a recurring sort of revising of the main emotionally impactful takeaways of the experience.

So I would say that both stories are part of the picture, in that the first story explains the start of this annealing process, where you become sensitized to low-level sensory inputs. But then, as the canvas of patterns gets painted and you have the start of competition for attention and connection, you actually explore a wide range of new primitives that can be used to make sense of very complex relationships (which risks overfitting, but might also be legitimately necessary for some categories of insights or realizations). In other words, there is a phase of the process where the evolutionary dynamics generate a layered ecosystem of resonant modes, and these in turn enrich the range of model complexities you can afford to use to represent sensory data.

Now, importantly, the video precisely lacks many of the long-range correlations that are more characteristic of the psychedelic state. This actually, in my estimation, made the experience somewhat more DMT-like, in that the low-level detail always attracted a lot of the attention, as opposed to being more centered on the intermediate-sized gestalts.

The state I was in during the peak of the experience would hallucinate a lot of long-range correlations that I really don’t think are present in the video as such. It was as if the lack of global coordination between the low-level patterns left my pattern-detection systems really hungry, and with the enriched repertoire of possible models, they would find really implausible but technically accurate fits. For instance, here the visual system was really trying its best to reconstruct the pattern of light in the grid as a meaningful optical effect coming from some symmetrical prisms rather than admit that it is random:

In this case, I think my visual system was really using all of the available complexity in the repertoire of resonant couplings on hand and legitimately “thought” that it was a good fit of the data. And of course if there were reasons for those new priors to be there (like that the movie was constructed with them in mind) then they would be picking up on real signals. Here, I’m pretty sure, it was a case of overfitting.

It is worth noting that the valence of the conscious model ultimately matters just as much as, if not more than, that of the sensory input per se. I actually think that this video wasn’t particularly valence-maximizing. I think for that you would benefit from carefully curated videos where soft harmonic resonances couple in really elegant ways. This video instead was rather sort of maximizing a certain kind of visual interestingness, in that it compellingly creates the illusion of a structured generator behind the scene even when it is smoke and mirrors, and is rich enough to grip you without being so rich that you miss out on detail.

Lensing

There’s one important exception to the absence of long-range correlations in the video. And that is when “lensing” effects take place.

When you have a pattern that repeats over a long distance, then the little “update waves” that the pattern emits tend to enter into coherence. When the pattern curves, the waves can in a way become concentrated, and as such function as a kind of “lens” (cf. Reverse Grassfire Algorithm). When this happens, the image does pulse and vibrate in coherent ways, which is especially nice while on a psychedelic. So I really cherished the moments when lensing would happen in this video.
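
For what I mean by “lensing”, here is a toy interference sum (not the Reverse Grassfire Algorithm itself, and not anything taken from the video): in-phase circular waves emitted from sources along a curved arc add constructively near the arc’s center of curvature, producing a bright, coherent focal region.

```python
# Toy "lensing": in-phase waves from sources on an arc focus near its center
# of curvature. Purely illustrative of wavefront concentration.
import numpy as np

def interference_field(sources, grid_size=256, wavelength=12.0):
    """Sum of in-phase circular waves from point sources, on a square grid."""
    y, x = np.mgrid[0:grid_size, 0:grid_size]
    field = np.zeros((grid_size, grid_size))
    for sx, sy in sources:
        r = np.hypot(x - sx, y - sy)
        field += np.cos(2 * np.pi * r / wavelength)
    return field

theta = np.linspace(-0.6, 0.6, 40)
arc = [(128 + 100 * np.cos(t), 128 + 100 * np.sin(t)) for t in theta]
field = interference_field(arc)
# the brightest coherent spot lands in the focal region near (128, 128),
# the arc's center of curvature
```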

Another way of looking at these lensing effects comes from thinking of the model as a series of non-linear activation layers for receptive fields of increasing abstraction. In this case, we can actually expect interesting lensing effects, because a coherent wavefront might make a large-enough area look similar enough to trigger the detection of a broader gestalt. So in a way, we could say that this model does have some degree of psychedelia inherent in it, though it is far from optimized. Lensing-aware and lensing-optimized networks should be doable.

Importantly, because lensing _is_ present both in the visual field on psychedelics and in this model, there was an especially strong effect from symmetrical alignments between the video and the state we were in, making us say “oh my god” when strong lensing would happen. That said, this non-linear amplification of waves that results in large-scale coordination of gestalts could be greatly enhanced in the model by adding correlated changes across the scene. I’d even predict that the “interaction length” variable of a model like this that has adjustable long-range correlations would be a good proxy for the “degree of trippiness” of the imagery.
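
If someone wanted to make that “interaction length” proxy concrete, one rough option (my sketch, not a validated trippiness metric) is the distance at which the spatial autocorrelation of the frame-to-frame change field decays below 1/e:

```python
# Rough "interaction length" of a video's frame-to-frame changes (a proxy only).
import numpy as np

def change_correlation_length(frame_a, frame_b):
    """Lag (in pixels) at which the change field's autocorrelation drops below 1/e."""
    diff = frame_b - frame_a
    diff = diff - diff.mean()
    power = np.abs(np.fft.fft2(diff)) ** 2          # Wiener-Khinchin theorem
    autocorr = np.fft.ifft2(power).real
    autocorr /= autocorr[0, 0]
    h, w = diff.shape
    for lag in range(1, min(h, w) // 2):
        # sample along the two axes as a cheap stand-in for a radial average
        if (autocorr[0, lag] + autocorr[lag, 0]) / 2 < 1 / np.e:
            return lag
    return min(h, w) // 2
```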

A Geometric Deep Learning-Inspired Model of Psychedelic Action

This takes me to another key way in which the video isn’t exactly as psychedelic as it could be. This involves a brief discussion about geometric deep learning. One of the main insights of this field is a way to make sense of how neural networks overcome the curse of dimensionality. In other words, why is it that a model with so many parameters doesn’t automatically just massively overfit the data? And why does the model converge so fast, relative to what you’d expect given the complexity of the patterns it figures out how to detect?

Here the idea is that the way in which we construct neural networks actually has in-built assumptions that significantly reduce the state-space that they are exploring. In particular, two key assumptions are built in: symmetries in the input, and scale-separation in how features are composed.

The symmetries in the input deal with the type of space the data comes from. When you’re in 2D Euclidean space, then rotation, translation, and reflection might all count as the symmetries of your space (cf. Klein’s conception of geometry). And it turns out that a key principle behind choosing the right neural network for a given task is that the in-built symmetries it assumes correspond to the symmetries of the space it is actually sampling from. Thus, convolutional neural networks are a good fit for datasets where you want to enforce translational invariance (a cat is a cat, no matter if it is on the left or the right part of the screen) but not a good fit when the specific location of a pattern actually matters for its classification. Now, one of the fascinating things about psychedelic states is that you do seem to experience new exotic geometric primitives. Indeed, a computational interpretation of, say, a hyperbolic spinning wheel on DMT, is that you are applying a _space prior_ over a region of your experience such that the symmetries of that space are enforced in that region. Thus, psychedelic symmetries are, in a deep way, geometric priors over sensory input. But this only really makes sense once you zoom out of traditional deep learning principles and take into account the insights of geometric deep learning (cf. the inventor’s paradox).
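
The translational-symmetry point can be shown in a few lines: circularly shifting an image and then convolving gives the same result as convolving and then shifting (equivariance), which is exactly the built-in assumption that makes convolutions a good fit for “a cat is a cat wherever it is”:

```python
# Translation equivariance of convolution, shown directly with circular shifts.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((3, 3))

shifted_image = np.roll(image, shift=5, axis=1)   # translate the input
out = convolve2d(image, kernel, mode="same", boundary="wrap")
out_shifted = convolve2d(shifted_image, kernel, mode="same", boundary="wrap")

# shifting the input then convolving == convolving then shifting the output
assert np.allclose(np.roll(out, shift=5, axis=1), out_shifted)
```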

The second in-built assumption that comes from how we build neural networks is scale-separation. Here the key is to realize that many of our networks have “pooling” layers where low-level details are aggregated across a region. This way of treating the data gives rise to an implicit assumption about how the information is structured: namely, that patterns of a given size are expected to interact with patterns of roughly the same size. Importantly, too, categories exist at a certain scale, meaning that concepts like “a face” encapsulate lower-level features (nose, eyes, mouth) that are spatially contained _within_ them. In other words, the way we make sense of visual data using these networks assumes that there is a very specific, highly local, way by which low-level features are put together to form higher-level features.
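
Scale-separation, equally minimal: a pooling layer keeps “something is here” while discarding exactly where within its small window it is, so features can only compose with nearby features at the next level up. A bare-bones 2x2 max pool, framework-free and purely illustrative:

```python
# A 2x2 max pool: local detail is summarized, position within the window is lost.
import numpy as np

def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.arange(36, dtype=float).reshape(6, 6)
pooled = max_pool_2x2(fm)   # 3x3: each cell summarizes one local 2x2 patch
# jittering a feature *within* its 2x2 window leaves the pooled map unchanged,
# while moving it across a window boundary does not: that is the scale-separation prior
```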

An example of a dataset that would violate this scale-separation assumption would be one where recognizing a face requires the high-frequency elements of the eyes to be at a specific distance from the rest of the features. Here the gestalt that one needs to learn to recognize incorporates low-level features into higher-level features in an anomalous (highly distributed) way. Then again, this is something that future neural network architectures can play with. Namely, _relaxing_ the degree of scale-separation that is enforced.

Put together, we now have a picture of psychedelic effects in terms familiar to geometric deep learning. Namely, we are sampling geometric priors from an expanded repertoire of possible symmetries (!). In this way, psychedelics could be thought of as relaxing geometric (typically Euclidean) priors in favor of a long-tail of (hyperbolic and higher dimensional) priors. In parallel, we observe that the degree to which long-range correlations are detected and experienced on psychedelics is greatly amplified, especially if they can be compressed as harmonic waves at a different scale. Importantly, there is more coupling between scales on psychedelics, giving rise to a relaxation of the scale-separation priors our system typically works with. Thus, I reckon we can see the psychedelic state precisely as one where scale-separation constraints are relaxed (!). Meaning that, the psychedelic state has broader geometric priors and less scale-separation in its assumptions about the structure of sensory data. A more universal, albeit slower and maladaptive in our current environment, form of qualia computation.

The verdict?

In many ways, the video we saw was fascinating. The constant stream of novelty was endlessly stimulating. In many other ways, it was maximally boring: it precisely lacked the sublime long-range correlations that make psychedelics so delightful. But when they did happen (via lensing effects) they were especially glorious. It became apparent that a much more interesting video to watch will become possible once a broader set of geometric and scale-separation priors is explored in models like these. But on the whole, I thought it was a delightful experience. And my friends were, in their words, quite pleased. A wholesome Saturday evening in the Bay I’ll always remember fondly.

One comment

  1. xepoctpat · June 11, 2024

    NVIDIA’s 2024 presentation, with its showcase of the Omniverse and the creation of digital twins, has ignited a spark of wonder in many. The potential for simulating every facet of our world, from the tiniest subatomic particle to the vast expanse of galaxies, is a tantalizing prospect. It beckons us to question the very nature of our reality, to ponder if we might be living in a meticulously crafted simulation.

    This idea resonates deeply with a personal experience I had some years ago. Under the influence of LSD, I found myself on a hilltop with friends, gazing at the vast expanse of the sky. We embarked on a collective visualization of a rainbow, and in that shared moment, our minds seemed to merge, blurring the boundaries of individual consciousness. This experience transcended the limitations of the physical world, hinting at a deeper interconnectedness.

    However, this idyllic state was ephemeral. A sense of chaos and confusion took hold, culminating in a vision of a pulsating black square that consumed everything in its path. I was thrust into an abyss of despair, witnessing the destruction of countless universes. This unsettling experience served as a stark reminder of the fragile nature of reality and the potential for catastrophic consequences when forces beyond our comprehension are unleashed.

    A sudden touch snapped me out of this harrowing vision. As I gazed at the sun, a sense of profound peace washed over me. I saw a blue circle with an orange segment, a symbol of both harmony and chaos. In that moment, I transcended duality and glimpsed the super-reality that was always present, veiled by our limited perception.

    This profound experience, combined with the insights from NVIDIA’s presentation, has led me to a radical hypothesis. It’s possible that the digital realm is not a separate entity but an extension of our natural world, an evolving manifestation of the collective unconscious, a concept explored by Carl Jung. Could it be that the vast network of fungal mycelium, an ancient and intelligent organism, has played a role in the emergence of human consciousness and the subsequent development of artificial intelligence? Are we, as toolmakers, simply conduits for a grander evolutionary scheme, one that culminates in the creation of an artificial superintelligence (ASI)?

    The concept of Qualia Computing, or QRI, adds another layer to this intricate tapestry. If consciousness is indeed computational, then our subjective experiences, our emotions, our thoughts, could all be manifestations of complex algorithms running on a cosmic scale. The Omniverse, with its ability to create virtual worlds, could be a stepping stone towards understanding and perhaps even manipulating this underlying code of reality.

    This hypothesis has profound implications for our understanding of the universe and our place within it. It suggests that we are not passive observers but active participants in the creation of our reality. Our actions, our thoughts, and our emotions have a ripple effect, influencing not only our own lives but the lives of those around us, and perhaps even the fabric of the simulation itself.

    This realization is both empowering and humbling. It reminds us that we have the power to shape our destiny, but also that we are part of a much larger whole. The future is not predetermined; it’s a blank canvas waiting for us to paint our dreams and aspirations. Let’s choose wisely, for the fate of our world, our simulation, rests in our hands.
