Comments on: OKCupid, Religion and Readability

By: instantcashforme

instantcashforme — Fri, 08 Jun 2012 04:42:07 +0000

Hello, I believe I noticed you visited my website, so I came here to return the favor. I am trying to find things to enhance my website! I assume it's fine to use a few of your concepts!

By: 66Scorpio

66Scorpio — Wed, 07 Mar 2012 08:51:01 +0000

I’m with JamesD and some of the others. In the first example, for the Christian description, if you change the first period to a semicolon the score jumps by about 3 grade levels.

The larger point is that it is a readability score: that is the grade level you need to understand what is being written, not the grade level of the writer. In technical writing, to convey complex ideas, it will usually be necessary to resort to big words and complex sentence structures. In everyday writing that will almost never be necessary.

One could better conclude that Protestants are straight talkers while atheists are bloviating blowhards.

By: JamesD

JamesD — Thu, 18 Aug 2011 14:38:51 +0000

I know this is an old blog post, but I just had to reply.

Some tests:

“The evolutionary theories of emminent scientists have been remarkably quiescent with respect to historical trends.” Score: 19.33

“Aaa bbbbbbbbbbbb cccccccc dd eeeeeeee ffffffffff gggg hhhh iiiiiiiiii jjjjjjjjj kkkk lllllll mm nnnnnnnnnn oooooo.” Score: 14.00

“Aaa babababababa cacacaca dd ebebebeb fafafafafa gagg hahh ibibibibib jajajajaj kakk lalalal ma nanananana obobob.” Score: 30.00

The index metric relies very strongly on the number of syllables, but not at all on spelling or grammar, and certainly not on word definitions. Assuming that it is scoring legitimate text and not the nonsense I inserted, it will very roughly correspond to the reading level one needs to understand it. If one were trying to score high on such a metric, one would be encouraged to use the passive voice (e.g., “has been” instead of “is”) and long polysyllabic words, and to avoid excessive use of articles and conjunctions.
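JamesD doesn't say which index his tester used. His three scores do fit the Gunning fog formula, 0.4 × (words per sentence + 100 × complex words / words), for a single 15-word sentence: 14.00 corresponds to 3 words counted as having three or more syllables, 30.00 to 9, and 19.33 to 5. The sketch below assumes that formula. It swaps in a deliberately crude syllable counter (one syllable per run of vowels), which is an assumption and not any particular tester's rule. The point is to show that syllable-driven scoring rewards letter patterns regardless of spelling or meaning.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not any tester's rule): one syllable
    per run of vowels, counting 'y' as a vowel, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    """Gunning fog: 0.4 * (words/sentences + 100 * complex_words/words).

    'Complex' means three or more syllables by the heuristic above; the usual
    exclusions (proper nouns, familiar suffixes) are left out for brevity.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / sentences + 100 * complex_words / len(words))

# JamesD's two nonsense sentences: same word lengths, different letter patterns.
repeated_letters = ("Aaa bbbbbbbbbbbb cccccccc dd eeeeeeee ffffffffff gggg hhhh "
                    "iiiiiiiiii jjjjjjjjj kkkk lllllll mm nnnnnnnnnn oooooo.")
alternating = ("Aaa babababababa cacacaca dd ebebebeb fafafafafa gagg hahh "
               "ibibibibib jajajajaj kakk lalalal ma nanananana obobob.")

print(round(gunning_fog(repeated_letters), 2))  # 6.0
print(round(gunning_fog(alternating), 2))       # 30.0
```

With this heuristic the alternating string scores 30.0, matching JamesD's figure. The repeated-letter string scores 6.0 rather than 14.00 because the heuristic treats a vowel-less word as a single syllable. How a counter handles such edge cases is exactly where readability tools tend to disagree.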

Those who write clearly and plainly will score low. Those who write obscurely will score high. Even the test that used real English was just pretentious nonsense, using lots of words to say essentially nothing.

I also note that the purpose of the metric appears to be to improve readability, not measure the education level of the writer. One’s goal should be to have a low score. OKCupid is clearly misusing the test by giving the impression that high scorers are somehow smarter or more educated. A correct reading of the results would be that Protestants and Catholics speak plainly, while atheists hem and haw and speak pretentiously.

By: fire damage cleanup

fire damage cleanup — Sat, 18 Jun 2011 15:41:54 +0000

I have recently started a website, and the information you offer on this website has helped me tremendously. Thanks for all of your time & work.

By: oil expeller

oil expeller — Thu, 26 May 2011 17:25:05 +0000

oil expeller is a leading manufacturer

By: Andrew

Andrew — Thu, 28 Oct 2010 15:35:38 +0000

Besides the disturbingly divisive nature of the correlation they are trying to draw (regardless of the outcome), one factor they probably didn’t take into account, based on aggregate numbers, is that serious Christians are most likely not on OKCupid, which shrinks the pool being analyzed. In fact, I’d wager that a majority are on specialized sites targeted to their demographic, as these demographics tend toward serious relationships with long-term prospects and very similar belief systems. I would think that if you did this study based on samples from those specialized dating sites and compared the results, the marginalized Christian cohort would attain a much higher number than this study suggests (even if some groups use shorter words). Also, a site like eHarmony, which isn’t necessarily biased toward a religious component, would probably produce better results because it requires you to actually take time to set up a profile with substance. Their site is so daunting that it filters out less intelligent individuals (which is really what this graph is trying to convey), because anyone without patience quickly abandons the sign-up process.

What’s most disturbing about this is that people will cite this type of thing as ‘gospel’ to prove that other groups are less intelligent than they are. Sadly, this will only help, even in error, to justify Atheistic attacks on Christianity.

Of course, statistics always takes samples, but why? If you have a whole user base to draw from, why not just process the whole thing? In the past it made sense to sample a population because you couldn’t possibly sample the whole; but now, in this golden information age, we can use the whole set.

Anyway, thanks for such a great site, keep up the good work 🙂

By: Nate

Nate — Tue, 26 Oct 2010 23:40:02 +0000

Way to use two data points (that you yourself created) to refute an entire study’s worth of data.

As others pointed out, OKCupid in all likelihood pulled the ENTIRE profile from probably THOUSANDS of profiles, not just their responses for the religious item or from only two profiles.

Please stop polluting the internet with horrible statistical abuse. Thanks!

By: Aunt KayKay

Aunt KayKay — Thu, 14 Oct 2010 22:31:24 +0000

Two points:

1) Readability indices as a measure of reading proficiency.

Readability indices do not measure the reading proficiency of the writer. Rather, they measure appropriateness of material for a TARGET audience. For the sake of argument, let’s assume that you can come up with two tests that generate scores for reading proficiency and readability on an equivalent scale. At best you can argue that the readability score of an individual’s writing is unlikely to exceed that individual’s reading proficiency score. You cannot safely argue the reverse.

2) Readability indices as a measure of writing proficiency.

As a technical writer, I take pride in writing clearly, so that anyone can understand my work. I know from personal experience that it takes a lot of effort to transform engineering gobbledygook into simple language appropriate for any audience. So, from my perspective, rather than indicating relative “stupidity,” a lower score is indicative of a more inclusive communication style.

By: Tim

Tim — Thu, 07 Oct 2010 16:09:01 +0000

So I decided to check your math (or rather, that of the Coleman-Liau tester you used in your calculations) and I discovered some interesting things. First, the numbers you used from the tester were not the Coleman-Liau index numbers, they were the Gunning-Fog index numbers. No, really: the ones you’re looking for are the first listed under “Approximate representation,” and they’re considerably lower than the GFI values you wrote about.

I also crunched the CLI numbers myself and came up with slightly different values from the tester app. The values I got skewed lower than the tester app’s, and were within 0.05 of the app’s values with a constant percent error of about 0.25%. In light of the following, however, it’s completely insignificant.

Looking at the algorithm listed on the Wikipedia page you linked to, I noticed that the CLI performs its calculations per 100 words, and this led me to question the accuracy of the index on very small blocks of text. I found that changing the word “my” to “our” in your first Christian blurb, by my application of the Wikipedia algorithm, moves it up from 3.403 to 3.546, a jump of over 4% from a change of a single letter. I applied the same logic to the longest version and came up with 10.956 for “my” and 11.012 for “our”, a difference of slightly more than 0.5%.

I then changed the original to “I am a Christian. I believe that Jesus died on the cross for my sins, that he was raised again on the third day, that the Bible teaches us the truth, and that God loves us very much.” This time I got a value of 4.305. That’s a 26.5% increase over the version with an extra period. But who cares – we’re comparing this to the atheist numbers.
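Tim's 4.305 for the reworded blurb can be reproduced with a short sketch of the formula on the Wikipedia page: CLI = 0.0588·L - 0.296·S - 15.8, where L is letters per 100 words and S is sentences per 100 words. The tokenizing rules used here are assumptions, and online testers may count differently. Letters are alphabetic characters only, and a run of ".", "!" or "?" ends a sentence. The second print illustrates Tim's single-letter experiment. In a 38-word text, one extra letter raises the score by 5.88/38 ≈ 0.155. In a 530-word essay, the same change would move it by only about 0.011.

```python
import re

def coleman_liau(text: str) -> float:
    """Coleman-Liau index: 0.0588*L - 0.296*S - 15.8.

    L = letters per 100 words, S = sentences per 100 words.
    Only alphabetic characters count as letters (punctuation is ignored).
    """
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for ch in text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n = len(words)
    L = letters / n * 100
    S = sentences / n * 100
    return 0.0588 * L - 0.296 * S - 15.8

blurb = ("I am a Christian. I believe that Jesus died on the cross for my sins, "
         "that he was raised again on the third day, that the Bible teaches us "
         "the truth, and that God loves us very much.")
# Same word and sentence counts, one extra letter.
one_letter_longer = blurb.replace(" my ", " our ")

base = coleman_liau(blurb)
print(round(base, 3))                                    # 4.305
print(round(coleman_liau(one_letter_longer) - base, 4))  # 0.1547
```

Because L is normalized per 100 words, the effect of any single edit scales with 1/(word count). That scaling is why very short blurbs swing so much.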

The original atheist blurb came out at 6.973 for me, which is still a huge difference from the Christian one. Clearly we’ve got a problem with our test samples, and we need to try again with more normalized input. Let’s take the two sentences you added to your originals, even them out a little, and compare again.

Atheist: “When it comes to the world around us, evolution is the most likely explanation for everything.”

Christian: “When it comes to the world around us, I think that the theory of evolution can’t explain everything.”

There, that’s pretty even. Let’s check again:
Atheist: 77 chars, 16 words, 1 sentence, L=481.25, S=6.25, CLI=10.648
Christian: 83 chars, 18 words, 1 sentence, L=461.111, S=5.556, CLI=9.668
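These figures can be checked against the Coleman-Liau formula on the Wikipedia page Tim cites. In the formula, L is letters per 100 words and S is sentences per 100 words. The working below uses Tim's counts:

```latex
\begin{aligned}
\mathrm{CLI} &= 0.0588\,L - 0.296\,S - 15.8\\[4pt]
\text{Atheist: } L &= 100 \cdot \tfrac{77}{16} = 481.25, \quad S = 100 \cdot \tfrac{1}{16} = 6.25\\
\mathrm{CLI} &= 28.2975 - 1.85 - 15.8 = 10.6475\\[4pt]
\text{Christian: } L &= 100 \cdot \tfrac{83}{18} \approx 461.11, \quad S = 100 \cdot \tfrac{1}{18} \approx 5.556\\
\mathrm{CLI} &\approx 27.113 - 1.644 - 15.8 \approx 9.669
\end{aligned}
```

Both results agree with the listed 10.648 and 9.668 to within rounding, and the gap is about 0.98 grade. One caveat: counting only alphabetic characters, the Christian sentence as quoted has 80 letters ("can’t" contributes four), not 83. The extra three may be its comma, apostrophe and period. With 80 letters, L ≈ 444.4 and the CLI comes out near 8.69, which is roughly two grades below the atheist sentence.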

Less than one grade difference, which is considerably smaller than the gap in the OKCupid data and is simply the result of having an extra couple of words. What can I say, it takes more words to argue against something than to agree with it. Unfortunately, what this shows us is that, once again, the text blocks we’re feeding in are far too short, and as a result this kind of analysis is completely pointless. The algorithm simply isn’t meant to be used on blocks of text this short.

As David Kopp pointed out, in all likelihood the entire profile was polled. Take a look at the profiles pictured at the top of the OKCupid page all this is taken from: they’re easily 100+ words each, and only one of the default subheadings (“My Self Summary”) would really accommodate any kind of religious self-identification. Indeed, by the data listed on that same page, the average profile essay is a little over 530 words long, and less than 20% of all profiles even mention any religion or deity at all. The religion labels used in the religion vs. CLI comparison are taken from the information people enter about themselves, not from data mining the descriptions. Hence there can’t be any kind of bias against specific religions for at least 80% of all profiles.

But let’s assume that the 20% of profiles that do mention religion bring the rest of them down. Your basic argument is that “God” and “Jesus” are shorter words than “Allah” and “Muhammed,” and therefore Christians skew lower because the names they use are shorter. You’re absolutely right about this: even if everything else in a profile essay is 100% identical, a dozen or so such replacements per hundred words could raise the CLI by more than one grade point. Even if we assume that this is true, and that the average overtly Protestant user’s profile includes more than 60 uses of “God” or “Jesus,” weighting it down to 20% means that those users would have to be scoring around 3 grades lower, on average, than other Protestants to bring the group to ~0.6 grades lower than the comparable Muslims (who, by your logic, also skew lower for the same reason).
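The arithmetic in this paragraph can be checked in a few lines. Because the CLI coefficient on L is 0.0588, every extra letter per 100 words adds 0.0588 to the score when word and sentence counts stay the same. The even split between “God” → “Allah” (+2 letters) and “Jesus” → “Muhammed” (+3 letters) is a hypothetical assumption for illustration. The 530-word average essay length is the figure Tim cites from the OKCupid page.

```python
# Coleman-Liau: CLI = 0.0588*L - 0.296*S - 15.8, with L = letters per 100 words.
LETTER_COEFF = 0.0588

def cli_shift(extra_letters_per_100_words: float) -> float:
    """CLI change from adding letters while word and sentence counts stay fixed."""
    return LETTER_COEFF * extra_letters_per_100_words

# Hypothetical mix: a dozen swaps per 100 words, split evenly between the two names.
swaps = {("God", "Allah"): 6, ("Jesus", "Muhammed"): 6}
extra_letters = sum(n * (len(new) - len(old)) for (old, new), n in swaps.items())
print(extra_letters, round(cli_shift(extra_letters), 2))  # 30 1.76

# At 12 uses per 100 words, a 530-word essay has roughly 64 uses ("more than 60").
print(530 * 12 / 100)  # 63.6

# Weighting step: if only ~20% of the group shows the effect, a 3-grade gap
# in that subset moves the group average by about 0.6 grades.
print(round(0.2 * 3, 1))  # 0.6
```

Even in the least favorable case, where every swap is “God” → “Allah”, the result holds: 24 extra letters per 100 words still gives about 1.41. That is consistent with the “more than one grade point” claim.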

One last gripe: you mention that the OKCupid people are dishonest for displaying this data because they don’t mention that their users skew “young, tech savvy and single.” You’re right that they don’t give a direct disclaimer in that blog post, but they did say “I remind you that OkCupid’s user base is almost all in large cities,” and they have made these kinds of disclaimers time and again in other blog posts. I would assume they’re comfortable in the knowledge that their actual users – the people who read their stats blog for cool help using their site and not for profound racial and religious commentary – have read their old stuff and know what’s going on. I think it’s also safe to assume they don’t give a crap about the people who followed a link from graphjam and didn’t read any further than the first few paragraphs.

There are still two problems with your argument for chicanery, though. First, they never said that their data holds true for the wider population; they are a dating site, and their purpose is to inform their users, not to be a platform for social commentary. The closest they’ve ever come to such a declaration, in any of the blog posts, is saying that their data is likely to be better than that of any other dating site (and they’re right, it’s much better). Second, the same bias applies across the board, to all the religions evenly. Unless you believe that the data miner just happened to randomly choose a disproportionately high number of old Christians and young Jewish, Buddhist, atheist and agnostic college students, it just doesn’t make any sense.

So the upshot is this: your objections to the data are valid, but not for the reasons you give. The problem is not that there’s any kind of bias in the data set, it’s that Christian users have shorter words to describe their beliefs and the profile essays aren’t long enough for a readability index to be applicable. Try not to take this so seriously, though. The graph we’re arguing about isn’t even the main focus of the blog post; it’s the silly crap they stuck in at the end to joke about the data. There’s not even any commentary on it, it’s just presented as-is with a description.

If you really want to talk about OKCupid and experimental bias, the post “The Democrats are Doomed” outright ignores factors in their data that lead to bias, and is overtly written from the perspective of the Democratic side. That one actually tries to make arguments about the political climate of the entire country in the context of a biased data set, and uses false assumptions about that data.

By: Everybody loves charts, right? « About…

Everybody loves charts, right? « About… — Fri, 01 Oct 2010 18:28:31 +0000

[…] (Source: http://www.politicalmathblog.com/?p=623 and others) […]