Chapter 6

Building It

From analysis to practice — RALF, co-regulated music, and the ethical line


I should tell you something about who’s writing this.

I’m a DJ. I’ve been behind the decks for years — mostly house, mostly Chicago and Detroit lineage, mostly dark rooms where the point is not to be seen but to move. The scenes in this book are from my life. They aren’t thought experiments.

I’m a capoeirista. I train in a lineage that takes the berimbau seriously as a governance instrument, not a prop. Shifting the rhythm to change the game — that’s something I’ve done dozens of times. It works. It works so reliably that it stops feeling remarkable, which is exactly when you should start paying attention to why it works.

I’m also a software engineer. React, Python, real-time systems, machine learning pipelines. I build things for a living. And this combination — practitioner of the forms, builder of tools — is why this book exists.

Because at some point, the question stopped being “what are these forms doing?” and became “can I build something that does it too?”


What Systems Get Wrong

The obvious first attempt is a motion-triggered sampler. Wave your hand, hear a sound. Kick your foot, trigger a drum hit. Map some joints to some parameters, throw it on stage, call it “interactive.”

I’ve seen these systems. I’ve built early versions of them myself. They’re impressive for about ninety seconds. Then you notice something: the relationship between mover and sound runs in one direction. You do something; the system reacts. The body is an input device, and the sound is an output. A fancy light switch.

That’s not what happens in a roda. On a dance floor. In a jazz combo. In those systems, the sound shapes the movement shapes the sound. There’s a loop. There’s memory. There’s something that develops over time. You establish common ground through repetition, and then variation becomes expressive against that ground. You can fall out of coherence and find your way back. The sound isn’t just responding to you — it’s shaping what you do next, which shapes what it does next.

The loop is the point. Without it, you have detection. With it, you have conversation. Every one of the forms in the previous chapters is built on the loop. The berimbau doesn’t just detect the state of the game — it reshapes the game, and the reshaped game informs the next musical decision. The DJ doesn’t just read the floor — the floor responds to the DJ’s response, and that response informs the next selection. The primo drummer in bomba doesn’t just detect the dancer’s piquete — the drummer’s response shapes the dancer’s next move.

Miss the loop and you miss everything.


RALF: Making Relationality Audible

I’m building a system called RALF — Relational Audio-visual Learning Framework. At the technical level, it’s a pipeline: webcam feeds into pose detection, pose detection feeds into gesture recognition, gesture recognition sends control messages to Ableton Live. Camera sees body. Software interprets movement. Sound responds.
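The shape of that pipeline can be sketched in a few lines. This is a minimal, hypothetical stand-in — the names (`PoseFrame`, `detect_gesture`, `to_control_message`) and the toy raised-hand rule are mine, not RALF’s; a real build would sit a pose model (MediaPipe or similar) on one end and OSC or MIDI messages into Ableton Live on the other:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class PoseFrame:
    """One frame of pose data: joint name -> (x, y) in normalized image coords."""
    joints: Dict[str, Tuple[float, float]]

def detect_gesture(frame: PoseFrame) -> str:
    """Toy gesture recognizer: a wrist above the head reads as a raised hand.
    (In image coordinates, smaller y means higher in the frame.)"""
    wrist_y = frame.joints["right_wrist"][1]
    head_y = frame.joints["head"][1]
    return "hand_raised" if wrist_y < head_y else "neutral"

def to_control_message(gesture: str) -> Tuple[str, float]:
    """Map a gesture label to a (parameter, value) pair — the stand-in
    for an OSC/MIDI message sent to the sound engine."""
    mapping = {
        "hand_raised": ("filter_cutoff", 1.0),
        "neutral": ("filter_cutoff", 0.2),
    }
    return mapping[gesture]

# One tick of the pipeline: camera frame -> gesture -> control message.
frame = PoseFrame(joints={"head": (0.5, 0.2), "right_wrist": (0.5, 0.1)})
msg = to_control_message(detect_gesture(frame))
```

Note that this sketch, taken alone, is exactly the “fancy light switch” criticized above: a pure input-to-output mapping. The pipeline is the plumbing; the conversational layer is what has to be built on top of it.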

But the design intent is specific, and it comes directly from the forms.

RALF is not a detection system. It is a conversational system. The distinction matters more than any technical detail. A detection system asks: what is the body doing? A conversational system asks: what should the sound do next, given what the body just did, given what the sound just did, given the history of this exchange?
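The distinction is visible in code. Below is a deliberately tiny sketch of my own devising, not RALF’s actual logic: the detection system is a pure function of the current input, while the conversational system carries the history of the exchange, so a repeated gesture — once it has become common ground — is answered with variation:

```python
from typing import List, Tuple

def detect(gesture: str) -> str:
    """Detection: same input, same output, no memory."""
    return {"step": "kick", "reach": "chord"}.get(gesture, "silence")

class Conversation:
    """Conversation: the response depends on the history of the exchange."""

    def __init__(self) -> None:
        # (gesture, sound) pairs: the shared ground built so far.
        self.history: List[Tuple[str, str]] = []

    def respond(self, gesture: str) -> str:
        # Repetition establishes common ground; once a gesture has been
        # seen twice, the system answers it with variation instead.
        seen = sum(1 for g, _ in self.history if g == gesture)
        base = detect(gesture)
        sound = base if seen < 2 else base + "_variation"
        self.history.append((gesture, sound))
        return sound

c = Conversation()
responses = [c.respond("step") for _ in range(3)]
# The third identical gesture gets a varied answer: the loop has memory.
```

The real version replaces the counter with something far richer, but the structural point stands: a conversational system’s output is a function of the whole exchange, not of the last frame.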

A producer creates a programmable score — gesture vocabularies, mappings, sonic possibilities — meant to inspire and surprise a dancer. The dancer moves to inspire and shape the music. Neither controls the outcome. What emerges belongs to the exchange. The producer designs the conditions. The dancer enters them. The sound mediates between intent and response. And over time, something develops — shared ground, surprise, the feeling of being heard.

RALF makes relationality audible.

That phrase is the design goal. Not “RALF detects movement” or “RALF generates music.” RALF makes the relational quality between a body and a sound environment something you can hear. When the exchange is coherent, you hear it. When it fragments, you hear that too. The sound is not a representation of movement. It is one side of a conversation whose other side is movement.

There are three layers of ambition here, and I want to be honest about where each one stands.

Layer one: one mover and responsive sound. A single body in dialogue with a sound environment. This is where the core quality has to be right — does the mover feel heard? Not detected. Not measured. Heard. The way a good musician hears their bandmate. If this layer doesn’t work, nothing built on top of it will. This layer is working. It’s rough. It’s early. But the core experience — the feeling of being in conversation with sound — is present.

Layer two: the composed space. Someone designs the gesture vocabulary, the mappings, the sound palette. They make decisions about what movements will matter, what sounds will respond, what constraints will shape the interaction. This is the composer’s conversation with the mover: here’s the space I’ve built for you, discover what it affords. This layer exists in prototype — I’ve composed for it, other people have moved inside compositions. The design language is still forming.

Layer three: multiple bodies. This is where it gets interesting, and where RALF connects most directly to the forms. Multiple people moving in the same responsive environment. Each person’s movement shapes the shared sound. And now something new becomes possible: coordination between movers becomes audible. When two people are in sync, the sound coheres. When they’re in tension, the sound reflects that. When they’re in counterpoint — doing different things that somehow fit — the sound reveals the relationship between them. This layer does not yet exist. It is the research horizon.
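One way coordination could become audible — and this is a speculative sketch for a layer that does not yet exist — is to reduce each mover to a one-dimensional movement-energy signal and let the similarity between signals drive a sonic parameter. Here that similarity is a clipped Pearson correlation; the function name `coherence` and the mapping are assumptions, not a design decision:

```python
import math
from typing import List

def coherence(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two movement-energy signals, clipped to [0, 1].
    1.0 = moving in sync; 0.0 = uncorrelated or in opposition."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0 or std_b == 0:
        return 0.0  # a motionless signal carries no correlation
    return max(0.0, cov / (std_a * std_b))

# Two movers rising and falling together versus in opposition.
in_sync = coherence([0.1, 0.5, 0.9, 0.5], [0.1, 0.5, 0.9, 0.5])
opposed = coherence([0.1, 0.5, 0.9, 0.5], [0.9, 0.5, 0.1, 0.5])
```

A correlation this crude can’t hear counterpoint — two movers doing different things that fit would score near zero here — which is part of why layer three is a research question rather than an engineering task.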

The key design move across all three layers: dance is the intelligent system. Technology is the listener. The body leads. The machine follows. This is the bomba principle applied to computation — the body as score, the system as the primo drummer trying to keep up.


Co-Regulated Music

Here’s the bigger frame.

In any room where people share space, there’s a collective state. How much attention is shared. Whether the energy is rising or dropping. Whether people are converging or fragmenting. Whether the tension is productive or corrosive. This state is real and consequential, and it’s usually invisible. Someone might sense it intuitively — “the room feels off” — but there’s no shared medium that makes it available to everyone at once.

Unless there’s music.

A DJ reading a dance floor and adjusting the music to the room’s energy is doing something precise: making the collective state audible, and shaping it through sound. The DJ is in feedback with the room. The room moves; the DJ responds; the room responds to the response.

A berimbau player governing a roda is doing the same thing in a different context. So is a jazz rhythm section supporting a soloist. So is a lead singer adjusting the intensity of a call based on the energy of the chorus’s response.

These are all instances of what I’m calling co-regulated music: sound environments that listen to the relational state of bodies in a space and shape that state in return. Sound as a responsive medium for coordination.

The practitioners of the forms have been doing this for generations. The DJ’s art of reading a room. The rhythm section’s feel for a soloist. The berimbau’s governance of the game. These are practiced, refined models of sonic co-regulation. They just haven’t been formalized as transferable design knowledge.

RALF is a research platform for this larger idea. The same pipeline — sense body state, recognize patterns, send control messages, shape sound — works for one body or twenty. What changes between a solo RALF session and a co-regulated meeting room isn’t the technology. It’s the theory layer: the model of what co-regulation means in each context, and the sound design that serves it.
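Architecturally, that claim amounts to making the theory layer pluggable: the sensing pipeline stays fixed, and what varies is a function from body states to a sound-shaping decision. A minimal sketch, with invented names (`solo_session`, `meeting_room`) and deliberately naive models of each context:

```python
from typing import Callable, List

# A "theory layer" maps the sensed energies of N bodies to a single
# sound-shaping value in [0, 1]. The pipeline around it never changes.
TheoryLayer = Callable[[List[float]], float]

def solo_session(energies: List[float]) -> float:
    """One body: follow the mover's energy directly."""
    return energies[0]

def meeting_room(energies: List[float]) -> float:
    """Many bodies: track convergence — high when the room moves as one,
    low when attention is fragmented."""
    mean = sum(energies) / len(energies)
    spread = sum(abs(e - mean) for e in energies) / len(energies)
    return 1.0 - min(1.0, spread)

def shape_sound(theory: TheoryLayer, energies: List[float]) -> float:
    """Same sense-recognize-respond pipeline; only the theory layer differs."""
    return theory(energies)

solo = shape_sound(solo_session, [0.8])
room = shape_sound(meeting_room, [0.5, 0.5, 0.5])
```

Swapping `meeting_room` for `solo_session` changes the meaning of the sound without touching the sensing or the messaging — which is the sense in which the technology is the constant and the theory is the variable.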

The applications extend beyond artistic contexts. Meetings where the collective state is made available through ambient sound. Classrooms where a teacher can hear the room’s attention. Rehabilitation clinics where the relationship between therapist and patient has a sonic dimension. I mention these not to over-promise — none of them exist yet — but to indicate the direction. The principle is music as relational infrastructure. The forms have proven the principle works. The question is whether it can be extended.


The Ethical Line

There’s a version of this that I refuse to build.

A system that monitors bodies in a room and reports to management. A system that scores participation or measures engagement for someone’s KPIs. A system that uses sensing as surveillance and optimization as control.

That’s the bounded-self version of this technology — an external observer managing other selves. It would be easy to build. It would be a betrayal of everything the forms teach.

The forms don’t surveil. The berimbau doesn’t generate a performance report on player quality. The DJ doesn’t send metrics to the club owner about how efficiently the floor converted passive listeners into active dancers. The sound is in the room, for the room. A shared medium, not an extraction pipeline.

Any system I build must hold this line.

Measurement in service of flourishing, not control. Feedback that’s felt, not filed. Sound that belongs to the people in the space, not to someone watching from outside.

The distinction is simple in principle and hard in practice: who benefits from the information? If the answer is “the people in the room, through a medium they can all feel” — that’s co-regulation. If the answer is “someone outside the room, through a report they can act on” — that’s surveillance. The berimbau player is in the roda. The DJ is on the dance floor. The rhythm section is in the band. The governor is inside the system being governed, subject to the same forces, transformed by the same exchange.

As the forms teach: constraint is what makes freedom possible. And the first constraint on this technology is that it must remain in service of the people it touches.

This matters because the extractive version is easy to build. Imagine a rehabilitation clinic. A patient relearning to walk. The same motion capture + responsive sound system that makes them feel heard in relationship with their therapist could be repurposed as a surveillance layer — data tracked, metrics sent to insurance, recovery rates optimized for cost-effectiveness. The tool doesn’t change. The intent does. The sound that made their effort audible becomes a monitoring instrument. And the patient feels it. The sound that said “I’m with you” now says “you’re being measured.” The technology is identical. What changed is the ethics, and it changes everything.

This is why the ethical line isn’t optional. It’s not a nice-to-have. It’s load-bearing. Extract the mechanism without the accountability structure that holds it, and you don’t get co-regulation. You get a more sophisticated version of exactly what the forms are an alternative to.


Where This Sits

RALF is a research platform. Co-regulated music is a program of work. Neither is a finished product. What exists right now is a motion capture pipeline, a gesture recognition engine, and a set of mappings to sound — the simplest version of the first layer, one body in dialogue with a responsive environment.

The forms are the theory. They’ve been developing it for centuries. My job is translation — taking what the roda knows and making it available as design knowledge to someone building a conference room, a classroom, a clinic.

The way the arch traveled from Mesopotamia to Rome to the rest of the world — not by erasing its origin, but by understanding the principle clearly enough to build with it in new contexts.

That’s the work. The next chapter is about what it could look like.