Working Blind

April 18, 2007

I am incapable of providing timely commentary on things … this has been stewing for a couple of weeks now.

The MiniBooNE collaboration recently released initial results searching for muon neutrinos turning into electron neutrinos. A very nice and detailed discussion of the physics and experiment is here, and I won’t repeat it; instead I’m going to talk a bit about the kind of analysis they did — a “blind analysis.”

“Blind trials,” famous from medical studies, attempt to eliminate biases by keeping the test subjects from knowing whether they’re getting Coke or Pepsi (less frivolously, a treatment or a placebo). Double-blind trials hide this information from the experimenters as well. If the subjects in a taste test know whether they’re getting Coke or Pepsi, their reactions may have nothing to do with how the drinks actually taste; if doctors treating a patient know whether the patient is actually getting a sugar pill, they could (even just subconsciously) affect the patient by behaving differently. Double-blind experiments are critical for good results when living subjects are involved.

Things are a little different for particle physics. Quantum mechanics doesn’t care what subconscious signals we give out; the first aspect of a double-blind trial, keeping the subject in the dark, is automatically satisfied. However, experimenters can influence how they collect and analyze their data, and so can introduce biases that way: the second aspect, keeping the experimenter in the dark, is generally not considered.

Experimenter bias has historically been dealt with by consciously trying to be unbiased, but it’s not hard for good intentions to go wrong. Suppose you’re making a measurement of a well-known physical quantity. You find that you are way off. You root around and you find a problem with what you’ve done, which when fixed brings your result into agreement with the “correct” value. Do you stop and declare victory? Well, you shouldn’t necessarily — things that are less obvious, or which compensate each other, may still be incorrect. However, most of us will, in fact, stop. The net effect is a bias towards previous results, which may themselves be wrong (or at least not known to high enough precision), and it’s hard to avoid this.

A related issue arises in searches for new phenomena. If you see a small discrepancy between what you observe and what you expect at some particular value of the Higgs mass, the temptation is to focus on it, see whether the events are special, and so on; but one doesn’t do the same amount of work when the observation and expectation appear to agree. It’s hard to tell how significant such discrepancies are, because there were lots of places you might have seen bumps but didn’t.
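To put a rough number on this: if a fluctuation of a given size has probability p of showing up in any one place, and there are n places to look, the chance of seeing at least one somewhere is 1 - (1 - p)^n. Here is a minimal Python sketch, assuming the places are independent (real mass bins overlap and are correlated, so treat this as a guide only). The function name and the example numbers are illustrative, not taken from any experiment.

```python
def global_p_value(local_p: float, n_places: int) -> float:
    """Chance of at least one fluctuation of probability local_p
    among n_places independent places to look (independence is assumed)."""
    return 1.0 - (1.0 - local_p) ** n_places


# Hypothetical numbers: a 1% local fluctuation, 50 places it could have appeared.
print(global_p_value(0.01, 50))
```

With these made-up inputs, a bump that looks like a one-in-a-hundred fluke locally turns out to have roughly a 40% chance of appearing somewhere.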

To work around these issues, people use “blind analyses.” I first became aware of these when KTeV used one for its analysis of CP violation in kaons. The main idea is to prevent the experimentalist from seeing “the answer,” in whatever form it might take, until the very end. The act of unblinding is supposed to be the last thing you do, and you are stuck with the result you get: if you change it you’ve negated the whole point of doing it blind!

I’ve heard of blinding being done in a couple of ways. If you’re trying to measure a quantity precisely, one way to do it is to arrange for an offset (unknown to you) to be combined with the result you see while you’re preparing your analysis. You can then proceed as you would normally, except that your ability to tweak the real result to agree with previous knowledge is gone.
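A minimal Python sketch of the hidden-offset idea, under some assumptions: the offset is derived from a secret phrase that someone else chooses, it is drawn uniformly within a chosen scale, and it is simply added to the result. The class name, the phrase, and the scale are all hypothetical; real experiments have their own schemes.

```python
import hashlib
import random


class HiddenOffset:
    """Adds a reproducible offset, unknown to the analyst, to a measured value."""

    def __init__(self, secret_phrase: str, scale: float):
        # Turn the phrase into a seed so the same offset is used on every run.
        digest = hashlib.sha256(secret_phrase.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:8], "big")
        # The offset is never printed or returned on its own.
        self._offset = random.Random(seed).uniform(-scale, scale)

    def blind(self, true_value: float) -> float:
        """What the analyst sees while the analysis is being prepared."""
        return true_value + self._offset

    def unblind(self, blinded_value: float) -> float:
        """Called once, at the very end."""
        return blinded_value - self._offset


# Hypothetical usage: the phrase would be chosen by a colleague, not by the analyst.
blinder = HiddenOffset("chosen by someone else", scale=1.0)
seen = blinder.blind(3.25)
print(blinder.unblind(seen))
```

Deriving the offset from a phrase rather than a fresh random number keeps the blinded result stable from run to run, so changes in the analysis are still visible even though the true value is not.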

Alternatively, you can hide the data of interest from yourself — this was the MiniBooNE approach. You choose a class of events that would contain the signal you seek. These selection criteria create a “box” that you keep “closed”: you arrange not to see any passing events. You calibrate your understanding of the detector with data sharing some commonality with what you’re interested in — the same particles detected in a different configuration, for example. You select these “sideband” regions as well as you can to test for all the effects you can imagine would give you the wrong answer. Once you think you understand all the physics going on “around” the box, you have some confidence that you understand what to expect in the box, and then you can “open” it.
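The closed box can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Event` fields, the energy window, and the selection itself are invented and are not MiniBooNE's actual cuts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    energy: float           # hypothetical reconstructed energy, arbitrary units
    is_electron_like: bool  # hypothetical particle-identification flag


def in_box(event: Event) -> bool:
    """Selection that would contain the signal (made-up criteria)."""
    return event.is_electron_like and 0.3 <= event.energy <= 1.5


def visible_events(events: List[Event], box_open: bool = False) -> List[Event]:
    """Return only sideband events until the box is deliberately opened."""
    return [e for e in events if box_open or not in_box(e)]


events = [
    Event(0.8, True),   # inside the box: hidden while blind
    Event(0.8, False),  # sideband: same energy, different particle type
    Event(2.5, True),   # sideband: electron-like, but outside the energy window
]
print(len(visible_events(events)))                 # sidebands only
print(len(visible_events(events, box_open=True)))  # everything
```

The point of routing all access through one function is that opening the box becomes a single, explicit decision rather than something that can happen by accident.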

You may already have seen the danger with the second kind of blinding: aberrations that show up in the box may not be visible anywhere else. For example, one of the first LIGO results searched for short, bursty gravitational waves; when they opened the box, they found an event, which was almost immediately attributed to an airplane flying over a detector. Following the blinding protocol, though, they couldn’t remove it from their data sample.

MiniBooNE went through a very involved unblinding procedure to obtain their result. They performed the neutrino oscillation fit, and had the software tell them how consistent the fit was with the observed data (still in the closed box!) without revealing the fit parameters. In short, this told them if they could give a reasonable description of the box, without actually revealing what the description was. In fact, this revealed a problem, and did so before they had committed themselves to a full box-opening. They were able to tighten their selection before going any further. It turns out there is an excess of low energy events (the origin of which, as far as I know, is still unknown, but doesn’t seem to be oscillations), which would have seriously mucked up their result if they hadn’t been able to remove it from their fit. MiniBooNE illustrates the benefits of closed-box analysis (they might have spent a lot of time trying to get the excess to go away and stopped when it looked like a no-oscillation result), the dangers (they didn’t fully understand the box), and an interesting approach to trying to detect such problems beforehand (a sort of non-invasive box examination).
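The "non-invasive" check can be illustrated with a toy Python fit that reports how well it describes the data but keeps the fitted parameters to itself. This is only a sketch under stated assumptions: a straight-line model fitted by weighted least squares stands in for the real oscillation fit, and the function name and data are invented.

```python
from typing import Dict, Sequence


def blind_line_fit(xs: Sequence[float], ys: Sequence[float],
                   sigmas: Sequence[float]) -> Dict[str, float]:
    """Fit y = a + b*x by weighted least squares and return only
    the goodness of fit, never the fitted a and b."""
    w = [1.0 / s ** 2 for s in sigmas]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det  # intercept, kept hidden
    b = (sw * sxy - sx * sy) / det   # slope, kept hidden
    chi2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return {"chi2": chi2, "ndf": len(xs) - 2}


# Toy data: the last point is far off the line the others follow.
result = blind_line_fit([0, 1, 2, 3], [1, 3, 5, 20], [1, 1, 1, 1])
print(result["chi2"] / result["ndf"])
```

A χ² per degree of freedom far above 1 says the model does not describe the data, which is a reason to go back and look at the selection, all without learning what the fit actually preferred.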

What about me? Strict blind analyses are painfully time- and labor-intensive, and I’ve never actually done one. My current work is sort of blind, in that it is next to impossible to figure out the final answer without running a specific program, and I can avoid doing that (“obscurity through laziness”). However, I don’t have protocols that forbid me from fixing an obvious mistake after the program has been run. (I could certainly implement a “χ² consistency check” before I let myself see the fit values, though. I’ll think about it.)


Truth in Advertising

April 13, 2007

While I’m not necessarily looking forward to seeing Inland Empire (working it tomorrow, as the “early spring snowstorm” moves in) I must admit to liking the trailers for it — if only because I suspect that a long sequence of disconnected images featuring Laura Dern growing progressively more hysterical is probably an excellent summary of the film, and such honesty is rare these days.

21 Apr update: in fact, the Laura Dern hysteria level remains remarkably constant throughout the film. I don’t think anyone who watches it will think quite the same way about screwdrivers, rabbits, or Polish clowns ever again.

Particle Physics 2.0?

April 3, 2007

Elsewhere, Tommaso Dorigo makes a plea for particle physics experiments to enter the (free) blogging world. I find the argument intriguing but problematic.

A couple of issues are raised. One is whether the data should be made available to the public (in ASCII four-vectors or whatever); after all, the taxpayers fund us, so shouldn’t they get their money’s worth? I certainly agree that this is desirable, although extremely complicated. Our experimental architectures have not been designed to enable this in a simple manner (it can take literally months for a new collaboration member to learn to access data!), but if this were specified as a requirement from the beginning, as I believe it is for NASA projects, it could probably be done at the expense of a lot of physicist-years. However, what is in question is not the data but the analyses that follow, and even projects that release their data allow that what you extract from the data is your work.

Another is how collaborations communicate their results to a wider audience. Communication can almost always be better, and it’s a fair point that analysis web pages rarely go much beyond a brief technical summary and some plots. At one point, I know CDF was trying to post “plain English” summaries of physics results on the web, though for some reason all the links seem broken now. Regrettably but understandably, in the pressure to get results approved, papers written, and talks put together, the plain English version is seen as a low priority. Blogging might improve this.

[Aside: actually, we tend to communicate directly to specialists, ignoring even other particle physicists. We don’t like to admit it, but we, too, get lost and bored very easily by work we don’t immediately grasp. Try keeping an average Tevatron physicist’s attention through a 30 minute talk about measuring angular distributions in χc radiative decays. I dare you.]

Finally, should experiments somehow embrace freer discussion of the data they collect and give up their roles as “sole authoritative commenters of the results they produce”? Are they? The speed of rumor is extremely high in the theoretical community, and (as an example) theorists seem more openly dismissive of the LSND results (because they don’t fit nicely into models) than experimentalists are (they can’t find anything obviously wrong with the experiment). It’s hard to imagine anyone being more authoritative than those who produced a result.

Oh, you mean should experimentalists be free to go “off message,” as they would say? There’s a tricky one. Why shouldn’t a collaboration micromanage how an important result is presented? If I spend a couple of years working on a result, and someone completely unrelated to it is the one who gets to reveal the numbers at a ski resort or be quoted in a popular publication, is it silly to want the right to sign off on what they say? To change this would, I believe, require a fundamental rethink of what a “collaboration” is. In the current model, you are either “in,” in which case you’re generally expected to speak for the collaboration when on record (because you have privileged access to the details and your name is on the papers), or you’re “out,” where you can say whatever you like at the expense of having your opinion discounted. If we atomize into having authors be solely responsible for their analyses, then we will all be able to say what we like, and will lose the ability to report on “our results.”

Incidentally, I’m not sure how an “experiment-approved blog” would address the last point. I think the sticking point is “experiment-approved,” not “blog.” Imagine approving blog entries at physics group meetings, spending hours discussing word choices …