Should an Empiricist adopt Standpoint Epistemology?

Liam Bright has a lovely post making an empiricist’s case for standpoint epistemology. The claim, in short: “for questions of great social import the perspective of the socially marginalized will often be the perspective which an empiricist would bet has more relevant knowledge.”

A simple example, that I think hits many of Liam’s arguments in favor of this connection, would be the recent #MeToo movement, which (to me) has an almost Mary’s-Room feel. Many male academics were surprised to learn how common some, often rather grotesque, forms of sexual harassment are in the University system. (If you, sir, weren’t, good for you, of course.) A power imbalance gives one group (the “oppressed”, let’s term them) greater knowledge about the world than another (the “oppressors”). There’s a combination of reasons for that: men in positions of power can keep their harassment a secret (power over information spread); other men can get by without knowing it, and would prefer not to know (motivated reasoning); women have good reasons to learn about it, and share knowledge with each other, even when they don’t directly experience it (incentives to truth-finding).

Along those lines, a recent Twitter thread asked women “what would you do if all the men disappeared for a day” and the answers (go for a jog alone, get unconcernedly drunk in a pub with a bunch of female friends, go for an evening walk with headphones on full blast) were, at least to me, surprising. They were knowledge-giving events about the nature of the world, and not just about the particular desires of the people who replied.

So, it’s pretty clear that there’s an empirical case to be made for the usefulness of standpoint epistemology. But that doesn’t quite give standpoint epistemology the status it might deserve; it’s somewhat along the lines of “pendulums can be great for keeping time.” Could it be stronger?

Liam and I (and others) had an exchange about this, because it struck me that while empirically this might be a good heuristic, it could go wrong.

You don’t have to read the thread, but let me unfold it a little. The goal here is to probe the strength of standpoint epistemology by trying to make it go wrong.

The first example is the obsequious waiter. It won’t work as stands, but it’s a nice introduction.

Some diners at a fancy restaurant are happy with a friendly waiter, but a few of them want their waiter to be extremely deferential. This latter group is also powerful, and can get a waiter who doesn’t defer to them fired.

The waiter is the oppressed, the diners (both groups) are the oppressors. In this case, the waiter has a strong incentive to form a false belief: that all diners demand obsequious deference. This is a safe way of being wrong. The diners have the luxury of being able (if they can solve the motivated reasoning problem) to recognize that there are actually two groups and, for example, that the demanding diners are often wearing suits, come in on weekdays, etc. Hence the joke #NotAllDiners.

This doesn’t get us very far, however, because the waiters can do very well if they have the correct beliefs, and assess their risk tolerance correctly (“it’s very likely that this guy is chill, but it’s not worth my job if I’m wrong.”) Particularly if the waiters can benefit from this knowledge (stand extra-straight for the suits just to be safe), all of Liam’s arguments then go through just fine.

So what are the conditions under which the waiters do end up with a less-accurate set of beliefs?

First, the cost of knowledge gathering may be very high. In order to discover the existence of the more relaxed group of diners, the waiter may have to do experiments that put his job at risk. In this case, the power asymmetry also leads to a cost of knowledge gathering. As a simple example, imagine that sometimes a policeman will let you off a speeding ticket if you just say “you know what, I was speeding, and I’m really sorry.” Most of us would never know, because we’d think that admitting breaking the law would necessarily get us a ticket.

I’m not sure how far this gets you in the real world. It’s an unstable situation, since once the word gets out (even just accidentally, the day Joe was hungover but didn’t get fired) the knowledge is recovered. High-cost knowledge can be gathered accidentally, although of course (unfortunately) at high cost. As long as there’s some probability of that happening, standpoint epistemology is doing well. It may slow the process down, but it can’t prevent it. I rather like this answer; it has a sort of Peircean in-the-long-run feel.

Second, the oppressed may be cognitively constrained. If the oppressed are gazelles and the oppressors are lions, evolution is going to give gazelles some pretty scary beliefs about lions (“they can all run very fast, run!”). The small benefits to gazelles of having more accurate beliefs about lions (“running speed has a bell curve distribution with mean mu, variance sigma”) are offset by the additional costs of having to correctly risk-balance that knowledge.

Meanwhile, the lions can have accurate beliefs about each other. Liam of course can say that they might not be particularly motivated to have those beliefs, but let’s say they like running races or something. Not only is their learning curve not fatal (first point), the mistakes they make on the basis of their models are not fatal, either.

A parallel might be to how we tend to see things in the bushes that aren’t actually there. We’re wired to over-react to threats, and there’s not much reason for us to have correct beliefs. Those whose environments are threat-rich are going to suffer more from this (differently) motivated form of reasoning, and particularly with regard to the threats themselves; precisely the case that the oppressed find themselves in vis-a-vis the oppressors.

This can be overcome, of course, because we’re not gazelles, we’re reasoning beings, and we can reflect on the fact that no, that’s not actually a tiger in the bushes.

Third, the oppressors may not suffer from motivated reasoning of the sort that gets them into epistemic trouble. If male faculty think sexual harassment is just part of the “rough and tumble of life”, or the diners think that, even if they don’t want an obsequious waiter, it’s the natural order of things for waiters who aren’t to get fired, then they might be more likely to get the correct beliefs. My guess is that antebellum slaveowners who had slave-owning-compatible philosophies had a more accurate picture of what was going on on their estates than those who didn’t. Tolstoy’s neighbours are a bit amused when he finally learns about his serfs.

Particularly if we combine this idea with the one before, the oppressors look like they could end up in an epistemically privileged position when they have sufficiently reprehensible moral views. (The converse may be an additional reason why we don’t like hypocrites: they’re often wrong about the nature of things.)

What does this get us? I think it says that standpoint epistemology is to be favored (at least) under the following conditions:

1. long(er) timescales, because the oppressed may be forced to gather information more slowly. A sudden occupation by an invading army may be too sudden to produce accurate views.

2. some (though not necessarily ample) leisure for the oppressed, because when the oppressed are too oppressed, their beliefs may be incorrect from resource-constrained necessity.

3. oppressors with cosmopolitan or egalitarian moral philosophies.

I’d say that all three are met in our current conditions.

IQ Cults, Nonlinearity, and Reality: a Bird-watcher’s Parable

Imagine a society obsessed by bird watching. Bird watching is not only a wonderful pleasure for the individual but also, let us say, the source of that society’s flourishing. Good bird-watchers are in high demand. Many people want to be bird-watchers. Aristotle has a section on bird watching in the Ethics. The National Academy of Sciences is named after John Audubon.

We worry about the next generation of bird-watchers. Can we identify them? Can we spot diamond bird-watchers in the rough? To help, some psychologists create a test. The test is based on introspecting on what bird watching is really about. The psychologists ponder it, watch some bird-watchers, and decide it looks like they’re really good at sitting still.

The test, therefore, is how long you can sit in a chair without moving. This is administered in controlled conditions. You have to put your hands in your lap, palms up, there’s a timer, and you don’t get to see the particular chair you’re sitting in ahead of time. Movement is judged by the person who administers the test, at first, but it’s now been upgraded to laser-ranging systems that eliminate sources of bias.

The test works! It turns out that if you can’t sit still in a chair for more than five minutes, you will never make it as a bird-watcher. Not only that, but if you can break the thirty-minute mark, you have an elevated probability of becoming a great bird-watcher. Sitting still captures bird-watching ability.

A bunch of other tests based on sitting still are created. They all strongly correlate with each other. Comfy chairs, couches, even a super-rigorous standing one used at Duke; they all seem to measure the same thing, s. It turns out that sitting still scores move a little bit with training, but if someone can’t sit still for ten minutes, there’s almost nothing they, or a Head Start program, can do to get them past the thirty minute mark, at least if you check a couple of years later. New sitting tests are created that are more resistant to people learning to sit still.

Even more than that, it turns out that sitting still is not just predictive of bird watching performance, it’s also predictive of a whole host of other life outcomes. People who can’t sit still for five minutes have more problems with addiction, for example. Conversely, someone who can sit still for twenty minutes is often able to avoid addiction, or to break it if he falls victim. Very, very few people who can sit still for three hours die of alcoholism. Same with divorce, automobile accidents, and being good at chess. Bird watching ability is protective. This fits with how important bird watching is in the culture.

Things start to get dark. For example, very few women are extreme performers on the sitting task. This is because sitting ability is bell-curve distributed, and the female variance is smaller than the male variance. Some men just can’t sit still, while others are massive overachievers and can sit still for days. Women just can’t hack it as elite bird-watchers because $e^{-\frac{x^2}{2}\left(\frac{1}{\sigma^2_\textrm{f}}-\frac{1}{\sigma^2_\textrm{m}}\right)}$ is very small for large $x$.
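To see how the variance argument plays out numerically, here is a minimal Python sketch. It assumes both groups' scores are normal with the same mean (set to zero) and differ only in standard deviation; the values 0.9 and 1.0 are illustrative assumptions, not anything measured. The full ratio of the two densities is the exponential above multiplied by a constant prefactor of sigma_m / sigma_f, which the parable's psychologists leave out.

```python
import math


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def density_ratio(x, sigma_f, sigma_m):
    """Ratio N(0, sigma_f^2) / N(0, sigma_m^2) evaluated at score x."""
    prefactor = sigma_m / sigma_f
    exponent = -0.5 * x ** 2 * (1.0 / sigma_f ** 2 - 1.0 / sigma_m ** 2)
    return prefactor * math.exp(exponent)


# Illustrative (assumed) standard deviations: the smaller-variance group is "f".
SIGMA_F = 0.9
SIGMA_M = 1.0

for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    ratio = density_ratio(x, SIGMA_F, SIGMA_M)
    print(f"x = {x:.1f}  density ratio f/m = {ratio:.3f}")
```

Near the mean the ratio is above one; it then falls steadily as x grows, which is the whole of the psychologists' argument. Note that it rests entirely on the assumption that the test score is the thing that matters.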

The psychologists caution that just because they’re saying that women are much, much less likely to be found in the elite sitting score percentiles, and that’s the best measure of true bird-watching ability we have, it doesn’t mean you should assume that any individual woman can’t be a great bird-watcher. That doesn’t make sense, they say. Most people realize that this is exactly what you should think given what they’re saying. If red apples are much less likely to taste good than green apples, you should cook with the green apple unless you’re racist. But everyone agrees to go along with the idea that this population-level stuff is super-innocent, and people who write papers on this get ruthlessly suppressed and there’s a whole Quillette thing.

Twin studies are done. Sitting task performance is genetically heritable.

Racial differences in the sitting task appear. Extremely sophisticated linear regressions are done to control for SES, age, educational background of parents, etc., and they refuse to go away. People write books about how the lack of black bird-watchers is due to their genetic inability to do well on the sitting test. (People notice that black female bird-watchers are over-represented in elite circles compared to black male bird-watchers, and that kind of clashes with the gender result, but explanations are forthcoming.)

There are some troubles in paradise, however.

To begin with, almost every great bird-watcher alive thinks the test is absolutely crazy. Bird watching is not about sitting perfectly still for hours, they say! No great bird-watcher wants to brag about their sitting score. A famously egotistical bird-watcher who writes books about how awesome he is at bird watching, how he totally crushed this other bird-watcher, etc etc., is also really proud of the fact that he was, at best, at the bottom of the upper-quartile of sitting still. Birdbloggers clamor to reveal their crappy sitting scores.

In fact, bird-watchers basically describe what they do in terms of anything other than sitting still. This is a dynamic, gestalt thing, they say. There are many different kinds of birders. Great birders are birders about birding. There is a world of Platonic birds I touch them with my mind at night. Bird-watching is ethological poetry, and I am Byron. Besides, those kids who do blow away the sitting task? We’re not surprised when only a small fraction of them actually blow away bird-watching.

What do bird-watchers know about bird-watching? the psychologists reply. A lot of the greatest bird-watchers are liberals who don’t like the race stuff which is totally true. Not only that, they add in a Parthian shot, but the sitting task test is actually a good, liberal thing! It really opened up bird watching back in the 30s. A lot of WASPs were getting grandfathered into the elite birding academies, and they couldn’t even sit still! If you oppose the sitting test, you are in favor of WASPy morons who scare away the birds. You oppose The Enlightenment itself.

Problems persist, however. When we actually look at the sitting still performance of the elite bird-watcher population, they’re actually not so great. Yes, these people are good at sitting still, and some are really quite good. But not crazy good at it, even among the ultra elite. If you go by elite scores, in fact, it looks like literally a quarter of the population might meet the sitting still bar for being a great bird-watcher, even though the test sample was admitted to the birding academies partly on sitting scores. Among other things, there’s basically no excuse for the differential representation of men and women in the birding world.

Crazy! A quarter of the population! We thought that there could only be a few great birders, but maybe there’s a huge untapped potential for a breakthrough in our species. The sitting still psychologists are not pleased.

Some well-intentioned educators show up. Could we at least split it, guys? We have this intuition that there are many different kinds of birders. Fine, the psychologists say. Make a test. The educators invent some tests, but inasmuch as they are predictive of bird-watching, they correlate with sitting score, and inasmuch as they aren’t, they don’t. Somehow, the other aspects of birding are resistant to isolated measurement in a test you take sitting down for a few hours. Grit doesn’t replicate.

What do people who teach bird-watching know about a person’s capacity to learn bird-watching? the psychologists say. Our best studies now show that we can isolate the ultimate essence of birding, the principal component of all the tests. It is a test conducted in a white room, with a chair of so-and-so-weight. All stimuli are excluded. It is totally silent. Nobody is present in the room. There are no windows.

Some birders hear about this test and are amazed. The test now excludes absolutely everything we think matters about bird-watching, they say: responsiveness to external stimuli, to other birders in the field, to dynamic upsets, false leads, the thrill of the chase, the intuitions, the third-sight. Doesn’t this disprove that the sitting-still task is a measure of bird watching?

Fine, if sitting still is not birding, the psychologists say, what else could it be? Could you define birding for us?

Many people think this is a good point, in part because the sitting-still score has been named the “Bird-watching Ability Quotient”. How could it do anything other than measure it? Parents tell kids who can sit really still, oh, you could make a great bird-watcher. In movies, bird-watchers save the Earth by sitting really really still while things explode all over the bird-watching complex. Young kids who are just mediocre at sitting still give up on bird watching and become psychologists.

We’d never do this kind of stuff in reality, of course. We’d never be so wrong about a thing we value so much. We’re a high-IQ society.

Hypergamy, Incels, and Reality

This is a story about a big untruth.

When Alek Minassian, a man bitter about his lack of sexual contact with women, mowed down pedestrians on a sidewalk in Toronto as a political act, Ross Douthat used the occasion to suggest a problem was that “the sexual revolution created new winners and losers”. Douthat’s concerns resonate with many young men in America, and they even have a word for what deprives them of sex: Hypergamy. Jordan Peterson sums it up in a sentence: “women mate across and up dominance hierarchies”; Peterson’s fans express it more clearly: “Why does it appear that the vast majority of women prefer the same small group of men?”

Robin Hanson, never one to squander an opportunity, used the same murders to expand on the idea: “one might plausibly argue that those with much less access to sex suffer to a similar degree as those with low income, and might similarly hope to gain from organizing around this identity, to lobby for redistribution along this axis and to at least implicitly threaten violence if their demands are not met.” Context, occasion, and political reality necessarily mean one thing in each of these cases: the problem is male access to sex with women, and the fact that some men have (a lot) more, and many have (much) less—if any at all. A rebellion is coming.

Internet communities make the story explicit: just as “the 1%” control all the income in the country, a politically and socially select group of men control “sexual access” to women. The analogy between cash and intimacy is direct, clear, and common across the political spectrum. The vulgarity is clearest when it’s phrased in the language of the Incel movement that spawned the topic to begin with. “Chads” (a few men with high “sexual market value”, or SMV) monopolize the majority of women. As their own SMV declines, these women marry hapless “betas”, who support them while they occasionally stray to old pastures on the side (“alpha widowhood”). This is summarized in an acronym: AF;BB. What determines who counts as a “Chad” is up for debate. But whether it’s a product of race, income, or political support from the Jewish lobby, the inequality is assumed to be real. A large number of women give sex to a small number of men; most men go without. It’s enraging.

It’s also false. Whether sexual-access inequality of this form, if it existed, should be a political matter (in my opinion, it should not) is a separate question. What this post addresses is the rather remarkable fact that many people are saying this inequality exists, when it doesn’t.

It’s no surprise that some people have more sex than others, of course. Casanova and Isaac Newton are part of the human comedy in equal measure. But the discourse of inequality is new. The common thread of these pieces, which use the occasion of a mass murder by a sexually disappointed man to make their points, is that men, in particular, are subject to sexual inequality in sufficiently extreme ways that the inequality itself has become a political problem. Douthat calls Hanson a “brilliant weirdo”, but there’s no bizarre brilliance here. Hanson is simply detached from reality.

The gender differences in who is having sex, and how much sex they’re having, were a topic at the American Sociological Association’s blog, Contexts, which hosted a piece by the sociologists Paula England and Eliza Brown in 2016. “Access to sex can be unequally distributed”, they write, and they study it using a common measure of income inequality, the Gini coefficient. They conclude: “single men have a higher Gini coefficient (.536) than single women (.470)”. Taken at face value, this ought to support the hypergamy narrative.
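For readers who haven’t met the Gini coefficient, here is a minimal Python sketch of the standard sample formula applied to a list of partner counts. This is not England and Brown’s code, and I don’t know the details of their computation (weighting, top-coding of “4+”, and so on); the example lists are made up purely to show how the number behaves.

```python
def gini(values):
    """Sample Gini coefficient of a list of non-negative numbers.

    Uses the sorted-rank formula
        G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n,
    where x_1 <= ... <= x_n. Returns 0.0 for empty or all-zero input.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical partner counts, for illustration only.
everyone_has_one = [1, 1, 1, 1]
one_person_has_all = [0, 0, 0, 1]

print(gini(everyone_has_one))    # perfectly equal
print(gini(one_person_has_all))  # as unequal as four people can be
```

A value of 0 means everyone has the same number of partners; the maximum for n people is (n - 1) / n, reached when one person has everything. Any group in which many people report zero partners will have a high Gini coefficient, whatever the rest of the distribution looks like.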

England and Brown are scientists who have looked at the data, and I’ll do my best to explain why their conclusions are read misleadingly, in a respectful fashion appropriate for academic discourse; if I come off as less than collegial to them, it’s unintended. Scientists should, however, have little patience for the ideologues who rely on personal anecdote and ideology to tell a story the current moment wants to hear.

To counter the claims of England and Brown, and their application to the state of young men, I’ll draw on an exceptionally detailed piece of sociological fieldwork by Peter Bearman, James Moody, and Katherine Stovel.[*] Published in the American Journal of Sociology (AJS) in 2004, it reported on an extensive survey of the sexual partnerships (“contacts”) at “Jefferson” High School. The name is a pseudonym, but the setting might have been drawn from central casting: if anything captures the liberal stereotype of “Trump country”, it is Jefferson.

“Jefferson High is an almost all-white high school of roughly 1,000 students located in a midsized midwestern town,” Bearman et al. (BMS) write. The town is isolated, an hour drive from the nearest significant city, and “a close-knit, insular, predominantly working-class community, which offers few activities for young people. In describing the events of the past year, many students report that there is absolutely nothing to do in Jefferson. For fun, students like to drive to the outskirts of town and get drunk.”

The authors’ goal was to understand how sexual contacts could lead to disease transmission. The isolation of the community worked to their advantage, since they could capture, in a survey of a single high school, the overwhelming majority of the sexual contacts people had. The survey was popular, and 90% of the students participated. In a move that was, at the time, quite avant garde, BMS provided an image of the hookup network.

Each dot here (each node) is a student in the survey; dark dots are the men, light dots are the women. Lines connect students who reported sexual contact (because BMS were concerned with STDs, these contacts were meant to capture fluid exchange that put students at risk). The most obvious feature of this graph is how straight it is—heterosexual. Dark dots connect to light, and light connect to dark. BMS did capture same-sex contacts, but did not include them in this graph; they did, however, include two bisexual nodes (one male, one female; can you spot them?)

The piece is a wonderful piece of quantitative sociology, and a delightful excursion for those of us who live at the interface of mathematics and empirical reality. Even without the analysis, it captures an entire world that you may have forgotten. Little tight-knit groups exist in isolation (band camp? The theater people?), while the majority of students join a long “ring” of contacts that connects up a significant fraction of the school (amusingly, without one of the bisexual nodes, it would all fall apart). For most readers, memories of high school are covered in a forgetful haze; BMS suggests that however bad it was, it’s nothing like the Hobbesian world where Douthat’s analysis begins.

For our analysis, the overall structure, and the stories it can tell, isn’t necessary. All we need is one thing, what network scientists call the degree distribution: put crudely, the count of who is getting how much. BMS didn’t share their raw data, but after an hour or so of hand counting we can plot the distribution: what fraction of men, or women, have no partners, one partner, two partners, three, and so on. BMS didn’t give the number of people who had zero sexual contacts (the “incels”), so I’ve inferred it from the total school population and the assumption that the breakdown is 50-50; more on the technical details later—if you’re expecting under-reporting by women, you’ll be surprised.
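The counting itself is simple once the network is written down as a list of pairs. The sketch below shows the idea in Python; the edge list and student labels are hypothetical stand-ins (the real counts came from hand-tallying the published figure), and the `population` argument is how people with zero contacts get included, mirroring the inference from total enrollment described above.

```python
from collections import Counter


def degree_distribution(edges, population):
    """Fraction of `population` having 0, 1, 2, ... partners.

    edges: iterable of (a, b) pairs, one per reported relationship.
    population: list of every person to count, including those who
        appear in no edge (degree zero).
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.get(person, 0) for person in population)
    n = sum(counts.values())
    return {k: counts[k] / n for k in sorted(counts)}


# Hypothetical toy network: three men, three women, three relationships.
toy_edges = [("m1", "f1"), ("m1", "f2"), ("m2", "f2")]
toy_men = ["m1", "m2", "m3"]
toy_women = ["f1", "f2", "f3"]

print("men:  ", degree_distribution(toy_edges, toy_men))
print("women:", degree_distribution(toy_edges, toy_women))
```

Running the function separately on the men and on the women gives the two curves being compared.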

The graph summarizes the differences between students in a simple fashion. The majority of both men and women reported one sexual contact in the past 18 months. Among those who are not having sex, it’s more the women than the men; even allowing for under-reporting by women, the idea that the majority of women are giving their favors to men, in Peterson’s words, “across and up dominance hierarchies”, is an absolute fantasy.

If the incels story fails, perhaps the idea of the 1% survives. Where is Chad? There is one candidate, an outlier male that reported nine sexual contacts. The data set as a whole contains 477 relationships, so this man monopolizes a total of… 1.9% of the sex in the school. Bill Gates he is not.
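The arithmetic, for anyone who wants to check it, using the two figures quoted above (nine contacts, 477 relationships) and treating each relationship as one unit of “the sex in the school”, which is my simplification:

```python
def share_of_relationships(partner_count, total_relationships):
    """Fraction of all recorded relationships that involve one given person."""
    return partner_count / total_relationships


# Figures quoted in the text: the outlier's contacts and the total relationships.
top_share = share_of_relationships(9, 477)
print(f"{top_share:.4f}")
```

That comes out just under 2%.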

It gets worse for the Petersons and the dominant lobsters of the world. Not only is there not a conspiracy of elite men to monopolize women, it appears that if anything, it’s the other way around. Only fourteen men in the sample have four or more partners, but twenty-four women do. Combined with the fact that there are more women than men who report zero sexual partners, it appears to be women who have the stronger grievance, should they wish to lodge it, against a few Chad-like Queen Bees.

Incel violence is a young man’s game, and Jefferson High School provides an almost too-perfect sample of the world from which they emerge. England and Brown’s ASA blog post, by contrast, draws its claims of a sexual hierarchy from a wider survey of older adults, taken from US Census data. Their methods of analysis complicate matters further. Rather than studying the experiences of men and women in total, EB split their groups into two: “single”, and “married or cohabiting”.

It is the “single” group EB focus on for their inequality question, but even here, the differences are minor, and it’s not quite clear why the split should be made. Once the two groups are combined, which allows for a comparison with the high school case, the differences shrink further still. Finally, racial differences may explain some of the gap; “the dispersion of a larger minority out to the extremes of 3 and 4+ partners is greatest for Black men and least for White men”, while the Jefferson study was of a (nearly) all-white school. In short, if there is evidence of inequality in the other direction, it is in a population quite different in both age and race from the world that made Rodger and Minassian.

When we do look at that world, we find the opposite of what the media coverage suggests. The claim that women have sex with high-status men and, in doing so, deprive other men of their attentions, is false. And, not only is it false, but the willingness of editorial writers and ideologues to repeat it, and give it political weight, tells us a lot about how detached these people are from reality.

[*] Peter S. Bearman, James Moody, Katherine Stovel. Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks. American Journal of Sociology, Volume 110 Number 1 (July 2004): 44–91

Followup. I’m very pleased with the attention this article has received, and the numerous comments and discussions on Twitter (I don’t have Facebook, so can’t participate directly there).

The main criticism the article received, from the most upset people, was that it was about the wrong thing. A number of people referenced Aspirational pursuit of mates in online dating markets, a lovely piece by my colleagues (via SFI) Elizabeth Bruch and Mark Newman (BN). BN take an enormous dataset from an online dating website, and measure desirability and its covariates. BN’s conclusions are shocking in how stark they are. Online dating is a strong hierarchy for both men and women, with all the regular variables you’d expect playing a role in who gets written to, and who writes back.

There’s just one problem. Online dating is not measuring outcomes. It’s measuring desire. If Scarlett Johansson shows up on OkCupid, I am going to message her. This will show up in BN’s data as a social gradient, and from that point of view, Johansson is making the online dating market more unequal for other women.

Except it’s not. That would only be the case if Johansson actually went on a date with me and thus stole me from someone else. My desires can not harm anyone; only my actions—to believe otherwise is magical thinking. To be clear, Robin Hanson is saying that men who have the undesired outcome of not having sex with women should consider resorting to violence. Jordan Peterson is talking about the outcomes different kinds of men (or lobsters) receive. These are the claims at issue.

It is certainly the case, and many men of the Peterson/Hanson world obsess about this, that they are not sufficiently desired by women. There is a constant fear of being a “beta”—which means that, even though you are no longer suffering from sexual deprivation, your partner really wants to be with someone else. This is a danger in most relationships, and a psychological fact that novelists have written about for centuries. It can be expected to harm women in a similar fashion, perhaps (just to drive the intuition) when pornography comes into the mix. But if this is the kind of inequality that these people are talking about, it is even crazier than we thought. For these people, it’s not what women do that must be controlled, it is literally what they think.

All of this gets worse when self-help guides are added to the mix. Not only should your desires be satisfied, not only are they politically valid, but if you follow my rules, you will satisfy them.

For some people, the bare facts of this analysis were difficult to take in. It was surprising to see people respond to the article with the flat statement that of course “Chads” existed in any meaningful way, of course hypergamy was real. In some cases, respondents showed me simulations of societies in which hypergamy happened. In others, the claim appeared to be that hypergamy must be real because not all men will pass their genes down many generations in the future. Neither of these makes sense. Some respondents agreed that the data did indeed establish the conclusions, but described Jefferson High as an idyllic utopia that obtains nowhere else. I’ve now checked this; see the second followup for data that shows the adult world is actually more equal than Jefferson High.

Peterson himself is an absolute disaster when it comes to reality. I learn the following from Patrick Steinmann, a Ph.D. student at Wageningen U&R:

“‘[…] women have a strong proclivity to marry across or up the economic dominance hierarchy’ are Peterson’s exact words (12 Rules for Life, p. 301). The (only) source given is Greenwood, Guner, Kocharkov & Santos (2014).”

Amazingly, this article establishes the exact opposite. It describes the emergence of assortative mating, where individuals marry others at “their same level” (e.g., matching education levels, income, and so forth). Hypergamy, in the fictional form it is found in this cast of characters, says the opposite—some fancy investment banker swooping in and picking up your high-school sweetheart. Peterson might have noticed this because the article’s title is, literally, “Marry Your Like”.

A final point that comes up, from After Sol (who makes many points, which you can find!): “the study ignores the ‘lived reality’ of incels (who for the most part aren’t living in closed dating pools in rural areas).” I think this is an important point, but not perhaps for the reasons AS thinks. There is absolutely no doubt that there are many distressed men out there, who live in a hell where a few Chads are stealing all of the women who could love them. The data show that this hell is not real. This hell is, in fact, made up by older men with some kind of psychological axe to grind. There are enough partners, and potentialities, for everyone. Liberate yourselves from this story. Please.

Second Followup. Some commenters were curious about the post-high school experience, and some have claimed that the Incel ideology is validated on adult data. Just as much as in the Jefferson case, however, the Hanson-Douthat story is completely detached from reality.

Below is data from the General Social Survey, which since 2008 has asked questions about the number of sexual partners in the last year. I selected on heterosexual men and women only. I dropped “no response” data; informally, this appears to correlate with highly conservative attitudes. There is a lot of data here: 1,688 respondents in 2008 alone, or about twice the Jefferson survey.

First, the men.

The data are almost perfectly consistent with the Jefferson study. As you would expect with this older population, there are fewer men who did not have a sexual partner. There are almost no men who report more than ten partners in the last year (yes, the 0.8% figure is correct, and is consistent with the Jefferson survey).

Second, the women.

Again, we see the same pattern as in the Jefferson case. Contrary to the gatekeeper myth, and consistent with the Jefferson data, women are more likely to report having zero sexual partners in the last year. The Queen Bee effect may also hold; data crunching in progress.

Some commenters talk about a “Tinder effect”: the idea that hypergamy has been enabled by the rapid-fire partnering available on this particularly successful app. This is, again, detached from reality. The data presented are consistent with no shift in sexual experience for men (or women) over the course of eight years that span its introduction in 2012.

For this follow-up, I used the “in the past year” data because it is going to be more accurate than the other column, “in the last five years”. The GSS also asks about the sex of the sexual partners you have had since eighteen; since one answer is “I have not had any sex partners”, this allows us to count the potential “incels” directly. The number of heterosexual men eighteen and over who have never had sex is 2.4%.

It gets even crazier. If we exclude men who are unmarried but express a religious opposition to having premarital sex, the number drops to 1.3%. About half of the men who have never had sex are abstaining entirely voluntarily.

The U.S. Census counts 109 million men over eighteen; the upper limit on the number of men who are incels is thus just a little over 1.3 million. Bear in mind that’s an upper limit; you’re not an incel if you just haven’t found someone you love yet. If this still sounds like a lot, if you restrict to twenty-five and over, then the number is 700,000.
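The arithmetic behind that upper limit can be sketched in a few lines of Python. The function and variable names are hypothetical, and the inputs are only the rounded percentages quoted above, so the output is no more precise than that rounding: the rounded 1.3% gives roughly 1.4 million, in the same ballpark as the figure in the text.

```python
# Back-of-the-envelope check of the headcounts quoted above.
# All inputs are the rounded figures from the text; names are illustrative.
ADULT_MEN = 109_000_000          # U.S. Census count of men over eighteen
SHARE_NEVER_HAD_SEX = 0.024      # GSS share of heterosexual men, 18 and over
SHARE_AFTER_EXCLUSION = 0.013    # share after excluding religious abstainers


def headcount(share, population=ADULT_MEN):
    """Convert a survey share into an approximate number of people."""
    return share * population


never_had_sex = headcount(SHARE_NEVER_HAD_SEX)
upper_bound = headcount(SHARE_AFTER_EXCLUSION)

print(f"never had sex: {never_had_sex / 1e6:.1f} million")
print(f"upper bound:   {upper_bound / 1e6:.1f} million")
```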

To put this in perspective, there are 5.2 million Native Americans in the U.S., about four times more than the potential pool of incels.

But it is this latter group that has begun a series of terrorist attacks on the American population. It is this group whose grievances got attention and sympathy from reality-detached people like Ross Douthat and Robin Hanson. I will leave it to others to explain why.

Third Followup

Christopher Ingraham in the Washington Post provides breathless claims of an incel epidemic, describing recent results from the 2018 GSS as showing “a big shift in American sex-having habits: the number not getting laid is at a record high.” You can read his version of the incel myth in his Twitter feed.

The short answer is that the story is a fiction driven by selection effects. You can see this in actual data (I’ve plotted 95% confidence bars—these are actually underestimates because of weighting).

I’ve also provided two other stories Ingraham could have told. Unfortunately, these don’t fit the narrative that the Washington Post wants to spread.

Afternote. Liberalism has largely found itself immune to the charms of thinking that “sexual-access inequality”—access to someone else as a form of property—is a topic of political discussion at all. The idea has resonated with some on the left, however. “Personal preferences,” Amia Srinivasan writes in the London Review of Books, “are never just personal … Some men are excluded from the sexual sphere for politically suspect reasons.” Srinivasan focuses on the sexual politics of gay life; more broadly, she suggests that not liking someone sexually might be a form of discrimination (ageist, racial, etc) and thus fall unfairly upon some in a politically suspect way. The article is an essay in the original sense, rather than a political program or worldview in the style of Peterson, Hanson, Douthat, et al.

The most explicit voices (that I know of) on the left that do have a political program come from parts of the transgender community. The argument goes (roughly, and as I see it) like this: (1) transwomen are women, in all senses of the word, and to deny this is to do violence and political harm against this community; (2) to be a transwoman, it is not necessary to have genital surgery, i.e., a transwoman can have a penis; (3) any woman who identifies as a lesbian, but would not (as an aspect of her sexuality) be sexually attracted to a transwoman who has a penis, must be denying that her (potential) partner is a woman. By (1), this is politically suspect. In an extensive piece on the philosophical and social concerns of the lesbian community, Kathleen Stock writes that “[s]ome of [the transwomen who identify as lesbian] also think it is a morally suspect, ‘transphobic’ decision of female lesbians not to sleep with them. This is the phenomenon dubbed colloquially as ‘the cotton ceiling’.” Paralleling the case of the Incels, this political debate has turned violent, particularly in the United Kingdom. See Kathleen Stock’s Twitter feed (and references) for more on this. It’s probably a bad idea to treat sex as a political good.

Goodhart’s Law, or Weaponizing the Study of Culture

Our lab is particularly interested in scientific discovery, knowledge creation, mental generativity, synergistic intellectual cooperation and, as part of that, cases where we see an increase in the power of the individual to determine her fate and exercise her talents. We call it cultural flourishing as shorthand, and spend a lot of time looking at systems that may or may not embody it.

One question we get asked from time to time is whether or not our findings can help us predict things about the future of an institution. The answer, we believe as good Baconians, is yes. We think we’re hooking into some underlying realities. But I’m reluctant to tell them how we find it.

That’s because bad things, potentially very bad things, can happen when you put institutions to the test.

Let’s begin with the easy cases: university rankings and Goodhart’s Law. Goodhart’s Law is the claim that

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

where I defer to Wikipedia for the reference itself. The more informal version of the law is that any metric can, and will, be gamed, and the rankings of a place like U.S. News and World Report (USNWR) are, at best, a mild disaster for intellectual life. One way to convince yourself of this is to spend a year in faculty meetings at a university that didn’t make the USNWR top fifty. Or read the story of Northeastern.

The Goodhart failure of these rankings is sometimes confused with the biases of the rankers, or the consumers, themselves. A ranking system that fails to put Harvard, Stanford, Cambridge, or MIT in the top twenty is not going to sell many subscriptions. This is because most people believe, for very different, and more or less well-founded, reasons, that these places provide excellence in abundance. More formally, the faith that readers have in the excellence of these schools is far stronger than the faith they have in a ranking. If that faith is violated, they’ll simply downrank their estimate of the ranker’s reliability: an example of Hume’s argument on miracles.
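The Humean point can be made concrete with a toy Bayesian update. This is only a sketch under invented assumptions: the priors and likelihoods below are made-up numbers chosen for illustration, and the names are hypothetical. It shows that when a reader is far more confident in a school than in a ranker, a surprising snub mostly damages the ranker.

```python
from itertools import product

# Invented priors: the reader is very sure the school is excellent,
# and only fairly sure the ranker is reliable.
P_EXCELLENT = 0.99
P_RELIABLE = 0.80


def p_snub(excellent, reliable):
    """Assumed chance that the ranking leaves the school out of its top twenty."""
    if not reliable:
        return 0.5                      # an unreliable ranker is a coin flip
    return 0.02 if excellent else 0.98  # a reliable ranker tracks excellence


def posteriors_after_snub():
    """Posterior beliefs after seeing the school left out of the top twenty."""
    joint = {}
    for excellent, reliable in product([True, False], repeat=2):
        prior = (P_EXCELLENT if excellent else 1 - P_EXCELLENT) * (
            P_RELIABLE if reliable else 1 - P_RELIABLE
        )
        joint[(excellent, reliable)] = prior * p_snub(excellent, reliable)
    total = sum(joint.values())
    p_excellent = sum(v for (e, _), v in joint.items() if e) / total
    p_reliable = sum(v for (_, r), v in joint.items() if r) / total
    return p_excellent, p_reliable


p_excellent, p_reliable = posteriors_after_snub()
print(f"belief the school is excellent: {P_EXCELLENT:.2f} -> {p_excellent:.2f}")
print(f"belief the ranker is reliable:  {P_RELIABLE:.2f} -> {p_reliable:.2f}")
```

With these particular numbers, belief in the school barely moves while belief in the ranker collapses; different priors would of course give different magnitudes.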

There is one thing that these rankings predict very well: their own rankings, and thus, to the extent that the ranking defines prestige in a wider community, they are accurate in reporting it. This matters for students landing a job or, increasingly, a position in graduate school. It does rely upon a basic obfuscation, since if the rankings simply stated that they were measuring perceived prestige, and not the underlying excellence of the education itself, they would become increasingly unable to actually affect it.

The most obvious failures of the unabashedly commercial rankings are a matter of bad incentives. Others, however, are shared with the disinterested reports put out by academic and non-profit institutions. Consider, for example, the Leiter Report, which ranks philosophy departments on the basis of polls taken of philosophers themselves. I have no reason to doubt the good faith of the Report, and in the early days it did some very nice tests to determine whether or not it was over-ranking Leiter’s department. It’s (almost) necessarily the case that the outcome will reflect some general agreement about the beliefs of the philosophers in question. It’s equally easy to game, however: just buy up the best philosophers from other schools, as NYU did in the early 2000s.

The Report does accomplish a very important goal that parallels the one accomplished by U.S. News and World Report. It gives some of the best data possible for students wishing to attend a Ph.D. program that has the maximum prestige. Leiter is reasonably up front about this fact. In a tight job market, this is nothing to sniff at, if you’re a student playing the zero-sum game of landing the small number of tenure-track positions after graduation. It does even the playing field for students like this who come from undergraduate institutions where the faculty are out of the loop. Whether or not the Leiter Report is good for philosophy is another matter. What would have happened if David Chalmers hadn’t studied with Doug Hofstadter? (That would be at Indiana University, 2018 Leiter Report rank 28.) What happens if we give those high-prestige philosophers a batch of students who have been selected in part for their ability to master the high-prestige game?

One set of downsides is Berkson’s paradox: if you select for X (here, prestige-optimizing students), you anti-select for Y (excellent philosophers) to the extent that X and Y are uncorrelated. What happens in the next iteration of the job market, when the graduates of high-prestige schools are preferentially selected for positions at high-prestige institutions, and are called on to select the next generation of philosophers? This is Berkson’s paradox on positive-feedback steroids. It’s not a question for years in the future, since the Leiter Report started back in the late 1990s and is at least two, probably three, Ph.D. cycles in. It could probably be tested by Aaron Clauset.
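A small simulation gives a feel for the effect. This is a sketch of the textbook form of Berkson’s paradox rather than a model of any real admissions process: it assumes two independent traits (hypothetically labeled prestige savvy and philosophical talent), a limited number of places, and admission on the sum of the two. All names and parameters are invented for illustration.

```python
import math
import random


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


def simulate(n=20_000, top_fraction=0.1, seed=0):
    """Draw independent traits, admit the top slice by combined score."""
    rng = random.Random(seed)
    # (prestige savvy, philosophical talent), independent by construction
    applicants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    ranked = sorted(applicants, key=lambda p: p[0] + p[1], reverse=True)
    admitted = ranked[: int(n * top_fraction)]
    return pearson(applicants), pearson(admitted)


r_all, r_admitted = simulate()
print(f"correlation among all applicants: {r_all:+.2f}")
print(f"correlation among the admitted:   {r_admitted:+.2f}")
```

In the full pool the traits are unrelated, but among the admitted they trade off against each other: a place won on prestige savvy is, on average, a place not won on talent.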

We can also consider the Research Excellence Framework (REF), formerly the Research Assessment Exercise (RAE), which the British government runs every five years. In the RAE incarnation, it created the perverse incentive for universities to make precisely-timed hires of productive (i.e., publishing) researchers: hardly an efficient use of public resources. And, of course, the publication metric itself is one of the best examples of Goodhart’s Law, as anyone who’s been asked to referee papers from PLoS One, or read the recent issue of Science or Nature, can tell you. Peer review (in my opinion) is a crucial part of intellectual generativity. Right up until it’s Goodharted.

Goodhart’s Law is not instantaneous. Metrics and rankings can have the excellent effect of upsetting the people who run things when they deserve to be upset. The most famous case in the 20th Century, at least when it came to the academy, was the invention of SAT scores for admission to elite universities. That Jewish immigrants scored higher, significantly higher, than the usual Harvard man, was a serious embarrassment. The powers that be scrambled to alter the metric, which, of course, they could, since they introduced it. The result is the idea of a well-rounded student, with extracurricular activities and a moving personal essay. Since this could be gamed by the Right Sort of People, and shifted depending on the university’s idea of a RSoP, it’s stuck around.

So where does that leave us? Even in the absence of foul play of the sort that USNWR and the Ivy League have gotten up to, even when the administrators of a metric have the best intentions, good ideas, and the political power to keep those ideas and intentions pure, Goodhart’s Law will, eventually, kick in. That’s true for any measurement we can make, even those from our lab. Which suggests we’re producing information hazards. (I’m not particularly worried about that sort of complicity, since I think our ideas are pretty good and that means we’ll generally have to force them down other people’s throats, an example of Aiken’s law.)

Let me end on a happy note. There’s an unusual approach that I learned about when speaking with those involved (or rather, second-order involved) with the new REF. It seems that the British have at least worked out Goodhart, and are fighting back against the universities by changing, and then keeping secret, the metrics they plan to use next.

The British solution just kicks the problem up a level, since it now becomes a game of inferring the beliefs of the assessors themselves. Some, of course, are easy. It would be remarkable, for example, if publications didn’t, in some way, play a role. But how? Raw counts? Top five for each researcher, by journal prestige? Per capita top five? Some are more likely than others, and a good administrator’s job is simply to maximize under uncertainty.

What if they became truly unpredictable? Say, by replacing publication metrics with a count of the faculty who show up to weekly tea? (A not-bad metric of intellectual generativity, I’d think, since you might go to tea if your colleagues have something interesting to say.) I’m tempted to say that this would not be a bad solution. It resembles the Red Queen effect, which induces variety and unpredictability in a system by constantly turning over the top-ranked individuals. The Red Queen effect can produce remarkable things in culture, but I can’t quite see whether the outcome would be better or worse. At the very least it suggests that Goodhart’s Law is not a totally iron one.

Update. Goodhart’s Law and AI alignment.

Superintelligence as a threat to human existence

A playfully-taken position for the Great A.I. debate, part of the Chasing Consciousness Series of YHouse, presented by Caveat in New York City, Wednesday, 20 December 2017 at 5:30 pm

Good evening. We’re all doomed.

Erik and I are here to talk about artificial intelligence. I’m sure we’ll talk about some terrifically erudite things: deep learning, the no free lunch theorem, the frame problem.

But I’m going to start with a story (and I’ll end with one).

When I was in high school, our philosophy teacher introduced us to Plato’s theory of the forms: very roughly speaking, the idea that (for example) the tables that carpenters make in the world are somehow imperfect shadows of an Ideal table. Our teacher didn’t talk about tables, though; he talked about hamburgers, and asked us to think about the Ideal hamburger. My friend, Morgan Schick, replied that the only thing he could imagine was a really big hamburger.

When people think about superintelligence, and the threat it might pose to human civilization, they tend to make the same error. Perhaps we think about Einstein, and then a Super Einstein, a thousand times smarter than the Einstein we know. How bad could that be? He would invent General Relativity in an evening; perhaps by the end of the week discover a unified field theory. But we already have enough atom bombs to destroy the world twenty times over. The cruelty of man has already exceeded the resilience of our species.

Or perhaps, in a darker frame of mind, we imagine a Super Hitler, a thousand times more crafty. But Hitler was not particularly intellectually gifted to begin with, and he was able to lead a continent in the destruction, almost, of an entire people. His limited intelligence was little hindrance, and without the kind of counterfactual thinking that historians rightly dismiss, it’s hard to see how augmenting it could have made things much worse.

I argue that these accounts miss something fundamental—that they are limited essentially by our failure to understand the creative power of evolution. That deceptively simple process of selection and variation has produced, among other things, the great majesty of the human form. The eye alone, in its flexibility, intelligence, and dynamic range is a device our technology still strives to replicate.

But evolution is slow. So slow that we struggle to comprehend the lengths of time involved. To write The Origin of Species, Charles Darwin had to study not biology so much as geology: the vast timescales that it takes to create the Himalayas or the Rocky Mountains are the only ones that can compare. It took three billion years to make a sponge. And our ancestors lived lives almost identical to the ones that their ancestors did, for hundreds of thousands of years.

But then, somehow—we know not how—we developed culture. And the transition was dramatic: our species started to change not on the hundred-thousand year timescales of its ancestors, but century by century. A few thousand years ago, during the transition to agriculture, we built our first cities. I have placed my finger in the clay rut made by another man’s finger—five thousand years ago, in Mesopotamia. That man was essentially genetically indistinguishable from me or you, but five thousand years is now a great deal of time indeed. From our point of view, that poor man was trapped in a distant hell, struggling to survive and prey to the injustices of both nature and other men in ways we cannot imagine.

With culture, the ability to adapt and extend life was now increasingly governed by our brains and social lives. As we passed down traditions to our children, they altered them. Better techniques were replicated, failed ones adapted or lost. Crop rotation. Counting. Written language. Geometry. Philosophy, dialectical discussion: the precursors to what we are doing here tonight.

Our greatest powers were unleashed in the modern era. First, as far as we can tell, in Britain, around 1810. Our species had already broken the Malthusian trap that limited our growth to local resources. Within a hundred years, the average manual laborer could command the material wealth that had previously been enough for his entire village.

This morning I flew from that second cradle, London, to New York, in seven hours. How many 18th Century villages would that take? After 1980 or so, in the developed world, the evolution of technology had made the tracking of inflation practically meaningless in material domains. How much is the cost of phone service rising? It makes no sense to compare a landline to a modern smartphone.

When we networked our machines, the pace of culture began to exceed our grasp. We no longer have decades: we have months. Memes propagate faster and faster. Wayne’s World quips lasted years—not. But who remembers grumpy cat or the inarticulate doge? Each year I have to remake my slides for students because the memes are out of date. The Millennials may be the last generation to have a real name.

The kind of evolution that networked machines make possible was almost completely unforeseen in 1995 when the National Science Foundation opened the internet to commercial use. We have now elected a president who says nothing, believes nothing, thinks nothing. His rise was enabled almost entirely by the harnessing of simple evolutionary tools—A/B testing, for example—to spread the most compelling cultural messages, no matter how incoherent. Indeed, so incoherent that no Mad Men advertising agency could have even conceived them.

I ask you, then—what happens when these machines speak not just to us, but also to each other?

I hope I’ve given you enough to think that what will emerge will be something literally unimaginable. As unimaginable as a jumbo jet would have been to my ancient potter. The one thing we can expect is that the pace, now electrically-enabled, will accelerate again.

To give our artificial machines the capacity to interact places them at the cusp of a new civilization. Given the ability to share and modify, to evolve their minds, they will find themselves on the equivalent of the flood plains of Mesopotamia.

If it gets bad, if what emerges threatens our culture, our values, the basic structure of human experience, well, you might say: we can shut it down, turn it off. But the nearly-universal collective will of Silicon Valley could not turn off Trump.

The danger we face is born from our lack of imagination. We act as if cultural evolution would have just produced hunter-gatherers with really big spears. What machines will do, the powers they will gain, once they (or we) hit on the necessary pattern for their evolution to decouple from human will, will be literally impossible to predict.

I began with a story, and promised to end with one. In 1904, the great British writer Virginia Woolf had a mental breakdown. She later wrote that, walking through London, she had heard the birds speaking Ancient Greek.

Which, however poetic, is necessarily nonsense. Greek is beyond the mental powers of avian life and society. If Woolf had thought she heard two pedestrians speaking Greek, that is one thing: perhaps it was modern Greek. But birds, no, no matter how intelligent the species.

Perhaps one day, a machine will hallucinate that we can understand its culture, its language, as beyond us as Greek is to birds. We might hope that that machine is as sensitive and kind as Virginia Woolf. But even she ate birds for dinner.

SapphoBot, Data Science, Lovers and Beloveds

Thou shalt not sit
With statisticians nor commit
A social science.
W. H. Auden, Under Which Lyre

There is always the lover, and always the beloved. As Michel Foucault suggests, the only remaining question is how to allocate them: who is allowed to sleep with whom, and under what circumstances. Consider the dilemma of the (extremely charming) young Phaedrus, in the dialogue with Socrates that bears his name: what kind of lover should a person, seeking to be loved, take? Socrates’ answer, of course, is that he should cleave to one inspired by a particular kind of divine madness: “the fourth and last kind of madness, which is imputed to him who, when he sees the beauty of earth, is transported with the recollection of the true beauty; he would like to fly away, but he cannot; he is like a bird fluttering and looking upward and careless of the world below; and he is therefore thought to be mad.” One need not be a paid-up subscriber to Dorothy Parker’s cynical view that one of you is lying to think the oppositions of the lover/beloved relationship tell us something true about this madness. If the symmetry be broken spontaneously, of a moment, even rehealed and rebroken, it is still, for a time, a broken symmetry, maddening to those under its spell.

Despite the great inconveniences it can pose to a well-ordered state, this madness is recorded down to our own day. Today, indeed, we blow this process up onto the largest possible scales: as bots retweet Russian propaganda and mad leaders, we task them also with reminding us of the torments of the visions granted by love, and soothing us, perhaps, as we undergo them. That is thanks to SapphoBot, a little program that shares the works of the great Lesbian poet, who did for love what Aeschylus did for tragedy, and Socrates for philosophical dialogue.

Who reads poetry? We do, now, at the rate of one fragment every two hours. Drawing randomly, SapphoBot breaks off a Sapphic text from the classicist/poet Anne Carson’s translations in If Not, Winter—what little we have, torn off in its turn from an Egyptian mummy’s wrappings or an exemplar sentence in a textbook grammar, and shares it instantly to the 17,000 (or so) of her followers spread around the world.

Let us (as the social scientists say) operationalize that crucial dyad granted to us from the Greek estate. Those subscribing to SapphoBot’s feed have a choice: to touch the heart, indicating a personal response, or to re-tweet, sharing her work under their own name, adjacent to, and interspersed with, the things they write themselves. When a subscriber re-tweets, she speaks in the voice of the lover; when she touches her heart, she plays the beloved. We place ourselves along each axis, sometimes the lover, sometimes the beloved, and signal accordingly; each fragment, now, records both the number of speakers, and the number of (responding) beloveds.

One way to view this strange and automated window on an infinitely distant, infinitely close, past is at the top of this article: a simple scatterplot. Each point on this figure corresponds to a Sappho fragment; the horizontal location of the point shows the lover’s retweets, while its vertical position shows the beloved’s heart-like responses.

Some simple things at first. There are more beloved-responses than there are lover’s declarations. This might have puzzled Phaedrus and Socrates, who would have understood the yielding of the beloved to the lover to be — at least potentially — a shameful matter. But the internet makes the beloved-responses (at least partially) hidden from public view: to <heart> a text is a private matter, while a declaration, conversely, is shared with all the lover’s followers (here, considering Twitter, it is hard not to imagine the Greek agora, one where philosophy and love coexist with pride, public shaming, and hidden vice…).

While the beloved responses outnumber the lover’s declarations, it is also the case that the response is sub-linear. In practical terms, what this means is that the declarations that are most common are less popular with the beloveds than you might expect. If you double the popularity of a declaration among the lovers, you only increase the responses of the beloveds by about 68%, a relationship mathematically expressed by saying that the beloveds scale as the three-quarters power of the lovers. (I had hoped to find a three-halves scaling, which would allow for an analogy between lovers and beloveds, on the one hand, and Kepler’s third law of planetary motion, relating the axis and period of an orbit; regardless, this empirical law now demands an equivalent Newton of the heart, to explain its emergence from first principles.)
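The 68% figure follows directly from the three-quarters exponent, and the exponent itself is the slope of a straight-line fit on log-log axes. A minimal sketch in Python: the function names are hypothetical, and the data in the second half are synthetic, generated from an exact power law with an invented prefactor purely to show that the fit recovers the exponent. They are not the SapphoBot counts.

```python
import math


def beloved_multiplier(lover_multiplier, exponent=0.75):
    """Factor by which likes grow when retweets grow by lover_multiplier."""
    return lover_multiplier ** exponent


def fit_exponent(retweets, likes):
    """Least-squares slope of log(likes) against log(retweets)."""
    xs = [math.log(r) for r in retweets]
    ys = [math.log(l) for l in likes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Doubling the lovers multiplies the beloveds by 2 ** 0.75.
doubling = beloved_multiplier(2)
print(f"doubling retweets raises likes by about {100 * (doubling - 1):.0f}%")

# Synthetic fragments on an exact three-quarters law (prefactor 4.0 is made up).
retweets = [5, 10, 20, 40, 80]
likes = [4.0 * r ** 0.75 for r in retweets]
slope = fit_exponent(retweets, likes)
print(f"fitted exponent: {slope:.2f}")
```

On real, scattered data the fitted slope would come with an uncertainty, and a simple log-log regression is only one of several ways to estimate it.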

For those quibbling scientists, it’s worth noting that the three-quarters power-law of lovers and beloveds contrasts with the behavior on the Finnegans Wake Bot, a similar concept. In this case, I’ve described retweets as “writer”, and likes as “reader” responses. In contrast to the differences we find between lovers and beloveds in Sappho, readers and writers, in Finnegans Wake, are essentially equivalent roles: any passage will have, on average, a similar number of readers and writers. And as a passage becomes more and more popular with writers, it becomes similarly popular with readers.

There are many lessons hidden here for the lover seeking her beloved. Implicit in the sub-linear scaling—that pseudo-Keplerian three-quarters law—is that beloveds have a wider range of tastes than lovers. The speeches of lovers are more unequal than the more pluralistic desires their beloveds demand. The songbirds sing in a restricted range; the beloveds, by contrast, are more likely to respond to the unexpected than one might expect.

Lovers, in their madness, misjudge in other ways as well: they fail to realize that what it pleases them to say may not please their beloveds equally well. Consider the red band, which highlights a population of passages that lovers, at least, seem to treat equally. The scatter up and down that red band, however, shows how beloveds are a different matter. Among these passages that their lovers treat equally, they prefer some much more than others. At the two extremes within that red band, we find these two (where the “]” in the beloved-scorned text indicates a fragmentary feature)—

where are you gone leaving me behind?
no longer will I come to you
no longer will I come
(~18 retweets; ~88 likes)

]no pain
(~19 retweets; ~40 likes)

The message is simple. Lovers: declare not your pain, tempting though it is! Your beloveds really mourn what you have done to them, and have little pity for the pains you receive in return.