You found my old blog. Thanks for visiting! For my new writing, visit mikesententia.com.
From years of experience, I know how to test new models. But I’ve never expressed it in words. Here’s my first attempt.
Ananael left some great comments on my last post, which made me realize I hadn't fully developed my ideas on testing magick. It's not simply that I explained myself poorly. It's that I had never explained my intuitions, even to myself.
So I spent this week developing those intuitions into something I can explain. It’s hard, but that’s how you grow as a thinker, and it’s one of the best parts of blogging.
In most posts, I’m explaining, teaching, and giving exercises. Here, I’m figuring out new ideas, and they aren’t quite a cohesive story yet. So think of each section as its own post: A separate idea about the common theme. The order isn’t terribly important, so feel free to skip around.
When you evaluate a new model (of anything: magick, physics, biology, whatever), you want to evaluate that model’s predictions. Some models will predict a refinement to a procedure right away. If that’s the model you’ve got, then A/B testing* makes sense.
*A/B testing = Comparing two procedures to determine which has a better outcome. Most useful when the difference is not obvious — a 10% improvement, rather than 10x better.
But many models — most of them, in my experience — don’t predict refinements right away. Instead, they predict something interesting but useless, and only become useful when you develop them further or combine them with other models. If your standard is “Models must produce a refinement of a procedure I use,” you’ll discard a lot of accurate models before they ever grow into usefulness.
Developing, Not Convincing
My focus here is on developing a model of magick, not on convincing others that it's useful. If you want to show someone that something's useful, you'd better compare it to their current solution on a problem they care about solving. That's A/B testing. But those results come from mature models. When you're still developing the model that will eventually produce that better technique, A/B testing is premature. This post is about explaining why.
Also premature: Trying to convince people that a model is useful before it makes useful predictions. So, in the larger series about my model of magick, I'll also show you useful techniques that have either passed A/B testing or clearly would. (I'll clarify "clearly would pass" later in this post).
1. The Catapult-Maker
My thinking started with this allegory:
Imagine you build catapults. That’s your job. You live in ancient Rome, and you’ve never seen a coil spring. Your model of the world is “Only flexible wood can throw rocks.” Then Mr. Spring comes along with a metal coil spring. He explains it to you, and you decide to test it.
Sure, it’s a bit contrived, but work with me here.
You have two options. One is to run A/B tests on flinging rocks into walls. (You’re a catapult-maker, so that’s your ultimate use). You’ll find that the metal spring isn’t as good, and discard it.
The other is to look at what each model predicts. Your wood-only model predicts the non-wood spring can't throw a rock at all. Mr. Spring's model predicts the spring will throw a rock a short distance. Both models are predicting what will happen if you load a rock into a spring, so there's no need to run situations A and B; you just run that one situation and see which model's prediction is accurate. Then you see the spring throw the rock, and you learn something about the world.
Is the spring useful on its own? Probably not, at least for knocking down castle walls. But maybe you can use it to make a spring-assisted catapult. Or maybe your machinist friend can use it to make a wristwatch, or a toaster, or a door that closes behind you. (Maybe not in ancient Rome, but you get the idea).
Often, a new model won't immediately improve your solution to a problem, especially if it's a problem you've been solving for a long time and your current solution is optimized with A/B tests. (That is, optimized not by following a model, but by guessing at possible improvements, trying them, and keeping what works. That kind of trial-and-error refinement can push a procedure further than your model could predict on its own).
2. Quantum Physics Doesn’t Optimize*
The single-photon double-slit experiment (in 1909) was one of quantum physics' first experiments. It goes roughly like this:
Procedure: Fire single photons, one at a time, at a wall with two small slits. Use a photon detector on the other side of the wall to record where each photon lands.
Quantum physics prediction: Each photon behaves like a wave that passes through both slits and interferes with itself, so firing photons one at a time still builds up the same wave-pattern a steady stream of photons would.
Newtonian physics prediction: A lone photon has nothing to interfere with, so there will be no wave pattern.
Don't worry if the physics details didn't make sense; you won't need them.
There aren't two situations. Just two predictions on one scenario. It's an experiment, but not an A/B test. It didn't optimize anything, and it didn't produce anything directly useful. It's 100 years later, and aside from physics tests, we have precisely zero machines that rely on shooting a single photon through two slits.
The early predictions of a model often cover essentially useless corner-cases. The reason no one had tested how single photons move through two slits is that no useful problem depended on it.
As far as I know, quantum physics didn't refine any technique that Newtonian physics already offered. Once it matured, it gave us new, useful things, like semiconductors for computer chips. But it didn't do anything that lends itself to A/B testing, and certainly not in the first stages of the model.
If we insist on a procedural refinement before exploring a model, we’ll throw out anything complicated that takes time to develop into maturity.
*The title is probably inaccurate. I expect that quantum physics optimizes some things. But "Quantum physics sometimes optimizes, but not in these examples" isn't very snappy. And it doesn't affect my larger point.
3. Stages of Science
The classic model of science is: Observe, hypothesize, test. But that's really a simplification.
Most of the time, I find this pattern: Observe, basic model, non-useful hypothesis, test (to verify the model), develop the model further, useful hypothesis (like a new technique), test.
That’s really what we saw in the quantum physics example: The first predictions of a model aren’t techniques, they’re just validations that you’re on the right track. The useful techniques that lend themselves to A/B testing don’t come until a model matures. That’s probably the source of my feeling that A/B testing isn’t the right tool when you’re initially making a model.
4. Controls in Medical Tests
What about control conditions in medical tests, with control and treatment groups? Aren’t those comparing two procedures?
Well, there are two conditions, but it’s not really what I think of as A/B testing. When I talk about A/B testing, I don’t simply mean any comparison of two conditions. I mean a situation where you’re genuinely unsure which procedure is better, so you test them to find out.
That's not how medical trials work. Doctors don't simply go out and try a treatment. They don't even take a good model, see what it predicts, and run the tests. There's a 10- to 20-year process of animal testing, review boards, tests on other species of animals, and finally a series of tests on humans, culminating in a full randomized controlled trial (RCT). If you ran A/B tests as you developed your model of medicine, you'd get thrown out of the profession. The RCT is the last step of developing a technique for mass consumption, not something you'd do early on.
Even if the thing you’re testing can’t hurt people, an RCT is a very slow test. It’s great for verifying a treatment, particularly when you’re getting ready to sell it on a large scale. I’d love to be at that point, but I’m not.
When you’re making your model, though, you need fast tests that are pretty good. Case studies and small trials, with reasonable effort put in to avoid coincidence and placebo. That’s really all one person can do.
5. Local vs Global Optimization
Those terms come from computer science, but they apply any time you’re trying to find the best answer.
Local optimization means small refinements. If you ride horses, then better horseshoes, a padded saddle, and maybe a rubber saddle for a better grip are local optimizations. You're still using a horse.
Going from a horse to a car is a global optimization. It’s a new way of solving the problem, and you don’t get there by improving a horse.
Two things to notice so far:
- You can propose local optimizations without understanding how the underlying solution works. You could easily try several options for horseshoes without being a vet.
- No matter how much you improve your horse, you don’t get a car.
The first cars were awful. They’d stall, couldn’t brake well or handle rough terrain, and were wildly unsafe. They would have lost A/B tests with horses. In fact, they did: People kept using horses for decades, until the product matured*.
*Also until prices dropped. Which maybe suggests another thing models need to become popular: An easy entry point. I’ll write about that later.
If you insist on a successful A/B test at the start of a model, you’ll wind up with incremental improvements, not a new global optimization. Improved horses, not cars.
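To make the local/global distinction concrete, here is a minimal Python sketch of greedy hill climbing, a textbook local optimizer: it only ever takes a small step if that step scores better. The landscape function `horse_or_car`, its two peaks, and the step size are hypothetical stand-ins chosen for illustration, with a low peak near x = 1 (the horse) and a taller peak near x = 6 (the car).

```python
import math


def horse_or_car(x: float) -> float:
    """Hypothetical landscape: a low peak near x=1 ("horse") and a tall peak near x=6 ("car")."""
    return 1.0 * math.exp(-(x - 1) ** 2) + 3.0 * math.exp(-(x - 6) ** 2)


def hill_climb(f, x: float, step: float = 0.1, max_iters: int = 10_000) -> float:
    """Greedy local search: move to a neighbouring point only if it scores higher."""
    for _ in range(max_iters):
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # no neighbour improves the score: a local optimum
        x = best
    return x


if __name__ == "__main__":
    start = 0.0  # begin on the "horse" side of the landscape
    peak = hill_climb(horse_or_car, start)
    print(f"stopped at x={peak:.2f}, score={horse_or_car(peak):.2f}")
    print(f"the taller peak scores {horse_or_car(6.0):.2f}")
```

Starting on the horse side, every small improvement leads back up the low peak, and the search stops there because any single step away makes things worse. Only a start that's already near the tall peak finds it. That's the "improved horses, not cars" trap in miniature.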
6. Downhill at First
Let's shift to the specifics of modelling magick. Most styles of magick operate by sending instructions to a system (my term for the forces mages channel). When you do a ritual, a system turns those symbolic actions into changes in the world. You can think of the system as the implementer of those natural laws that mages reference.
Like a calculator for arithmetic, systems make magick faster and easier. They insulate you from needing to understand the details of magick's implementation. But that insulation also keeps you from exploring. So most of my magick happens without systems*, even though that makes it harder. And my first goal isn't to improve on the system's results; it's to understand the system's procedure, even if the results are (at first) less good.
*Remember, "system" = the forces mages channel. My magick definitely uses procedures, patterns, and systems-in-the-normal-English-sense.
Imagine you’re learning to program computers. One of your exercises might be programming a web browser, just to learn how they work. You wouldn’t expect it to out-perform Firefox. That’s the wrong metric. But that learning will eventually lead to programs that solve problems in new, better ways.
That’s one reason why I don’t advocate abandoning systems until you’re already experienced with direct magick. Continue to use what you’re good at when you need reliable results, and only use direct magick when you’re exploring magick’s implementation. I expect most ritual mages to use direct magick for problems their ritual style doesn’t address, not as a replacement for rituals.
7. Solving New Problems
Most of the time, I don’t explore a new part of magick to simply improve a technique. I do it to solve a new problem, one that doesn’t have a solution yet.
Let me give you an example. When I first started learning to awaken and strengthen my mental muscles, I could wake them up, but they’d draw so much power from the rest of my mind that I’d get very tired for several hours to a day afterward. I’d nap in the afternoon, then go to sleep at 7PM, and couldn’t do any creative work on the days I was activating. We’ll call that the “Do Nothing” condition.
Then I developed a better model of the power flow, which led to better techniques, with steps I never would have thought of without that better model. Using those techniques to activate the same amount of mental muscles, I'd get tired for a few minutes at most.
Minutes vs hours, with results that happen every time. You don’t need side-by-side tests to know which one works better. It’s a big enough effect that it’s obvious.
I’ve replicated these results with 2 other mages, so it’s not something unique to me. (I wasn’t just being mean and not teaching them the better technique. You need to practice the basic version before learning the advanced one).
If a technique produces an improvement that could be mistaken for “Do Nothing,” I consider that a failure. A 5% improvement, or anything you’d need a suite of tests and statistics to detect, just doesn’t cut it for me. I’ll have more examples in the upcoming series on my model of magick and the results that make me confident in it.
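The intuition behind that threshold can be sketched with a standard sample-size estimate for comparing two conditions. The Python below (standard library only) uses the common normal-approximation formula for a two-sided, two-sample comparison of means. The effect size d is Cohen's d: the difference between the two conditions' average results, divided by how much results vary from trial to trial. The effect sizes in the example are hypothetical and are not measurements from my practice.

```python
from math import ceil
from statistics import NormalDist


def trials_per_condition(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate trials needed in EACH condition to reliably detect a difference.

    effect_size: Cohen's d, the difference in mean outcomes divided by the
    trial-to-trial standard deviation. Uses the normal approximation
    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2 for a two-sided test.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for power = 0.8
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, from subtle to overwhelming.
    for d in (0.15, 0.25, 0.5, 5.0):
        print(f"d = {d}: about {trials_per_condition(d)} trials per condition")
```

A subtle effect (d = 0.25) needs roughly 250 trials in each condition before statistics can separate it from noise. An effect on the scale of hours versus minutes drives the estimate down to a single trial, which mostly says that sample size is no longer the hard part; you'd still repeat it, as I did with other mages, but nobody needs a statistics suite to see it.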
8. Achieving Confidence
Predicting one new technique wouldn’t give me a lot of confidence in a model as a whole. Maybe it just got lucky.
But this year's models build on last year's. So when I create a technique this year, it doesn't just support this year's models, it supports all the models those build on. Almost everything I talk about on this blog is a few years old, so each model has at least a few techniques supporting it.
Thanks for reading all that. It covered a lot of ideas on testing, in a lot of topics: The limits of A/B testing, how different types of testing work better for different points in modelling, and how to become reasonably confident even without control conditions. Not everything was 100% developed, but hopefully it sparked some new ideas for you. If it did, please share in the comments. If you liked this post, consider visiting my current blog at mikesententia.com.