Category Archives: math misused

Debunking the “Republican Congress Creates Jobs” Chart Or “How To Make Numbers Say Anything You Want”

This is a companion piece to the previous post, so please read both of them. Here I’m going to lay out the script I had written for debunking the chart I created that asked the question “Does a Republican Congress Create More Jobs?” and then implied with a chart that this was indeed the case. I’ll walk through the process for creating charts and then talk about why I would create a chart that I was just going to debunk.

I apologize for the similarity to the post where I debunk the Obama stimulus chart. These two scripts were meant to be together.

<Start Script>

How to Make Numbers Say Anything You Want

Do you want to convince people that your side is right with only the flimsiest proof? Does the idea of tricking people with numbers make you all happy inside? Then come join us as we walk through “How To Use Charts To Say Anything”

Step 1: Massaging the Data

The first step is to grab the data that makes your point the best. Let’s use it to prove that a Democratic Congress is bad for jobs.

“How can we do such a thing” you ask?

In the first case, the raw jobs data looks like this

but the final chart looks like this.

How did they do that? Was it magic?

Nope, we simply smoothed the data. The raw data is a little too chaotic and has too many data points to tell the straightforward story that we want. So instead, we’ll average the monthly data so that we have quarterly data. There… now we have some nice, smooth, straightforward data.
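
The smoothing step is simple enough to sketch in a few lines. This is only an illustration of averaging monthly values into quarterly ones; the function name and the sample numbers are made up for the example and are not the real jobs data.

```python
from statistics import mean


def quarterly_average(monthly):
    """Average each run of three monthly values into one quarterly value."""
    # Only full quarters are kept; a trailing partial quarter is dropped.
    full = len(monthly) - len(monthly) % 3
    return [mean(monthly[i:i + 3]) for i in range(0, full, 3)]


# Hypothetical monthly job changes (thousands), invented for illustration only.
monthly_change = [120, -40, 200, 90, 150, -30]
print(quarterly_average(monthly_change))
```

Six noisy points become two calm ones, which is exactly why the trick works: the bumps that would invite questions are averaged away.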

Step 2: Pick colors that make you look good

Next, we pick some colors. Let’s make the Democrats’ blue dark and bold, to give it a bit of an angry feel. This is our way of getting the audience to look at the Democrats in a harsh way. We could try to soften up the Republicans more, but too soft a red would look pink and we don’t want that.

Let’s compare our colors to the Excel defaults:

Step 3: Do NOT give any context!

Finally, and this is the most important part, only give information that is helpful.

Let everyone know that we saw 8 million jobs added to the economy while the Republicans were in charge and make a point to show that we lost 8 million jobs while the Democrats were in charge. But don’t mention that the Republicans took Congress only a year after 9/11 at a time when the job market was particularly low. Otherwise people will think it’s a “Well, they can’t fall off the floor” thing.

And make sure you don’t mention anything about the real estate market and how the bubble drove the labor market in a way that was clearly unsustainable. We don’t want the viewers to be confused with all these relevant details. We want them to say “Republicans good, Democrats bad”.

<End Script>

Everyone here was incredibly kind to put up with my bullshit chart for as long as I left it up without explanation. I’d like to say unequivocally: My chart is propaganda… just like the Obama administration’s chart. I was trying to use my chart as a visual talking point that said:

If you have no ethical qualms, data visualizations can be manipulated to say exactly what you want them to say.

My chart implies that the Republicans were responsible for the jobs growth between 2003 and 2007 and that Democrats were responsible for the drastic decline from 2007 to the present. Let me state plainly, I do not think that is the case.

But if we just play around with the data the right way, we get what seems to be a clear picture that portrays a correlation and gets on its hands and knees and begs us to draw causation from it. Most people will do exactly that.

I can spend hours walking patiently through what is wrong with the Obama administration’s chart. Let me recap the high points here:

  • If you look at the data with the context of what President Obama’s team was hoping the stimulus would do, the power of the chart disappears.
  • If you look at the data with the understanding that they’re charting a first derivative, you realize that we haven’t gained jobs, we’re just losing them more slowly and the power of the chart disappears.
  • If you look at the data with the understanding that they didn’t even start spending the stimulus until the job loss had started slowing down, the power of the chart disappears.
  • If you look at the data in the context of other recessions, you’ll realize that, far from showing a drastic improvement, the numbers represent a devastatingly slow jobs recovery compared to other recoveries and the power of the chart disappears.
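
The first-derivative point in the list above is easy to check with a running total. The sketch below uses invented monthly changes (not the real figures) to show that bars can shrink month after month while the total number of jobs keeps falling.

```python
from itertools import accumulate

# Hypothetical monthly job changes in thousands, invented for illustration.
monthly_change = [-700, -600, -450, -300, -200]

# Running total of jobs relative to the starting month.
cumulative = list(accumulate(monthly_change))

for change, level in zip(monthly_change, cumulative):
    print(f"change {change:>5}  level vs. start {level:>6}")
```

Every bar is "better" than the one before it, and every month still ends with fewer jobs than the last.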

But this kind of explanatory rebuttal would only interest those already convinced. The chart I made had a power that a calm explanatory video wouldn’t have. Quite frankly, I hate that this is the case. Like President Obama’s chart, my chart doesn’t teach people anything about economics or lead people to learn important things about unemployment.

The only valuable thing my chart teaches is that charts can portray accurate data and still be manipulated to coach people along to poor conclusions. The only reason I even put my chart up is because it is the graphical equivalent of drawing out the Obama administration’s argument to its logical conclusion. My chart works with the same data, the same assumptions, and the same implications. And it leads to a completely different conclusion.

I’ve heard people describe President Obama’s chart as “powerful” and “brilliant”. The popular information visualization blog Flowing Data even tossed it up for public discussion among info viz professionals.

My point here is that it isn’t brilliant. It’s juvenile. It’s the chart equivalent of a crass political cartoon with a Snidely Whiplash mustache drawn on the bad guys. It’s a design trick imagined by cynical, self-congratulatory children fresh out of graduate school who pat themselves on the back for their ability to fool people who they think are too stupid to know the difference. They think they are special because they can get powerful people to flatter them for their ability to lie.

But they aren’t special. I can play that same childish game in my free time. The difference is that I want people to know that it’s a trick. They would rather see people fooled.

Why Take Math? So Your Ignorance Isn't Broadcast Nationwide on the AP Wire

This is pretty funny. Or horrifying. Depends on how you want to look at it.

Several days ago, I noted on Twitter that there were a lot of “saved” jobs that weren’t saved at all but actually cost of living increases. About 24 hours after I noted this, there was an Associated Press article about that very phenomenon.

Coincidence? Almost certainly. But I’ll flatter myself anyway.

But the laugh riot comes several paragraphs into the article as they look into why Southwest Georgia Community Action Council was able to save 935 jobs with a cost of living increase for only 508 people. The director of the action council said:

“she followed the guidelines the Obama administration provided. She said she multiplied the 508 employees by 1.84 — the percentage pay raise they received — and came up with 935 jobs saved.

“I would say it’s confusing at best,” she said. “But we followed the instructions we were given.”

“Confusing at best”? The multiplication of percentages is “confusing at best”? It seems obvious to me she should have multiplied 508 people by the amount of the increase (.0184) and gotten 9.3. But she forgot that you have to divide the percentage by 100 before you multiply.

The fact that she had “saved” more jobs than there were people in the organization should have been a tip-off. But this is a pretty common problem with people who don’t have a very good grasp on mathematics… they don’t recognize obvious mathematical errors, they just plug in the numbers and go with whatever comes out.

And this, children, is why you pay attention at school. So you don’t get in the national news for doing something really stupid and then blame it on the instruction manual.
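
For anyone who wants to see the arithmetic, here is a minimal sketch using the two numbers from the article. The helper name is mine, and the final check is just one way to encode the "more jobs than people" tip-off.

```python
def percent_of(count, percent):
    """Return `percent` percent of `count` (pass 1.84 to mean 1.84%)."""
    return count * percent / 100


employees = 508
raise_percent = 1.84

wrong = employees * raise_percent              # treats 1.84% as the multiplier 1.84
right = percent_of(employees, raise_percent)   # divides by 100 first

print(round(wrong), round(right, 1))

# The tip-off: a count of "saved" jobs can't exceed the number of employees.
assert right <= employees
```

The first number is the 935 that made the news; the second is the 9.3 it should have been.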

Jumping Into Visualization Without the Math

I found this link from Instapundit, so credit where it is due.

You may have seen this visual of job loss across the country. It maps the job gains and losses in major metro areas across the country and, on the surface it seems pretty cool. Here’s October 2008.

JobLossOctober2008

As someone who really loves information visualization, I applaud the effort. But it’s wrong.

Let’s take a quick look at the legend. See if you can spot the problem.

JobLossScale

Keen readers will notice the problem… whoever created this visual scaled only the diameter of the circle. The problem with this is what we can see below.

JobLossScaleProblem

Here I took the “10,000” circle and duplicated it over 50 times within the “100,000” circle. If this visual were an accurate one, ten copies of the 10,000 circle would add up to the 100,000 circle. That’s just the way these things should work.

Math Time! (skip if you don’t care)

The area of a circle is calculated with the equation:

A = πr²

Which means that when they increase the height of the circle by a factor of 10, they increase its area by a factor of 100. This means that instead of the numbers increasing the way they should, the small numbers end up looking REALLY small and the big ones look absurdly huge.
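
Here is a small sketch of the fix: scale the radius by the square root of the value so that the area, not the diameter, tracks the number. The reference radius is an arbitrary assumption for the example, not something measured from the original graphic.

```python
import math


def radius_for_value(value, ref_value, ref_radius):
    """Radius that makes the circle's AREA proportional to `value`."""
    return ref_radius * math.sqrt(value / ref_value)


ref_radius = 5.0  # hypothetical radius (in pixels) of the 10,000 circle

# What the legend does: scale the radius (and so the diameter) directly.
diameter_scaled = ref_radius * (100_000 / 10_000)
# What it should do: scale the area.
area_scaled = radius_for_value(100_000, 10_000, ref_radius)

area_ratio_wrong = (diameter_scaled / ref_radius) ** 2
area_ratio_right = (area_scaled / ref_radius) ** 2
print(area_ratio_wrong, round(area_ratio_right, 6))
```

The first ratio is the factor-of-100 exaggeration; the second is the factor of 10 the data actually calls for.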

End of Math Time

I’m not trying to be an a**hole here. The idea behind the visual was a good one. But these things really do need to be accurate. Most people don’t know how to tell when a visual is in error and they end up with an incorrect impression from a poorly built infographic.

Space Junk And Visual Lies

A little while back, due to a collision between a dead Russian military satellite and a US commercial satellite, there was some noise about space junk because of the potential danger it posed to the International Space Station and the Shuttle. The image of space junk that became the icon of the problem is this one:

SpaceJunkImage

I hate this image. Passionately.

The reason I hate this image is because it is probably the biggest visual lie I’ve ever seen. In his book The Visual Display of Quantitative Information, Edward Tufte has a concept called the “Lie Factor”. The “Lie Factor” judges the extent to which the data and the visual are out of sync.

Nothing could be more out of sync with reality than this image. While it imagines the appropriate number of objects circling the earth, it completely misrepresents the scale of those objects.

Space is unimaginably huge. While there are thousands of objects circling the earth, they range in size from a volleyball to a small school bus. If you do the calculations, the objects in this image range in size from Delaware to Tennessee.

Math Time! (skip if you don’t care)

In this image the diameter of the earth is about 1950 pixels. The real diameter of the earth is about 8000 miles. That means that every pixel is a shade over 4 miles.

The smallest piece of space junk in this image is about 10 pixels wide and 18 pixels tall and the largest one is about 24 pixels wide and 104 pixels tall. That gives the small objects an area of about 3000 square miles (about 30% larger than Delaware) and the large ones an area of 41,000 square miles (a shade smaller than Tennessee).
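
If you want to rerun the numbers, this sketch uses only the pixel measurements quoted above. It treats each object as a rectangle, which is a simplification of mine, and it lands close to the figures in the text; the small differences come from how much you round the miles-per-pixel value.

```python
EARTH_DIAMETER_MILES = 8000
EARTH_DIAMETER_PIXELS = 1950

# A shade over 4 miles per pixel.
miles_per_pixel = EARTH_DIAMETER_MILES / EARTH_DIAMETER_PIXELS


def implied_area_sq_miles(width_px, height_px):
    """Area an object would cover if the image were drawn to scale."""
    return (width_px * miles_per_pixel) * (height_px * miles_per_pixel)


small = implied_area_sq_miles(10, 18)
large = implied_area_sq_miles(24, 104)
print(f"{miles_per_pixel:.2f} mi/px, small {small:,.0f} sq mi, large {large:,.0f} sq mi")
```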

End of Math Time

To give an example of this exaggeration, let’s look at Angelina Jolie. (How’s that for a non sequitur?) Jolie has a freckle (beauty mark, mole, whatever) above her right eye.

JolieFreckle

Let’s say we’re concerned about people getting skin cancer, so we want to make a shocking graphic that we hope will help people remember to monitor skin markings for signs of melanoma. If we lied visually as much as the space junk photo, we would change a picture of Angelina Jolie from:

JolieNormal

to

JolieFreckleExaggerated

Imagine the Photoshop is done a shade better than I can do. The intention to do good and get people to realize the severity of melanoma is all well and good, but it doesn’t justify lying to people.

Granted, the space junk image holds the disclaimer that it is “an artist’s impression”. But that isn’t how people read these kinds of things and anyone who believes otherwise is, quite frankly, lying to themselves about the realities of human perception and belief. People see these images and they expect that they match reality in some way. Do a search for “space junk” to find out how many otherwise intelligent people have accepted this image as reality without a breath to admit how inaccurate it is.

This is not to say space junk isn’t a problem. I would have “solved” the problem of visual representation by portraying the space junk as a dot. A single pixel that can clearly indicate position instead of pretending to be a representation of size. Then, I would explain that, even though these objects are very tiny compared to the size of the space they’re in, this junk moves at thousands of miles an hour… making very small objects insanely dangerous.

You could effectively compare it to shooting a bullet into the air. A tiny piece of metal in a huge space can be really dangerous. People get that. There is no reason to portray the bullet as a 747.

I’m worried that even scientific people either didn’t recognize this problem or didn’t feel the need to speak up about it. Even people experienced in infographics didn’t say anything (see here, and here). (Side Note: I take particular pleasure in smacking down Wired magazine for putting up this graphic without even mentioning that it is an “artist rendering”. As a whole, they tend to be smug and irritating in the extent to which they dismiss anyone without technical or scientific expertise. Here they reveal that they are just as susceptible to junk science as the average Joe.)

There is an extent to which many people in scientific and technical journalism are content to give people the appropriate impression (“Space junk is a dangerous problem”) without providing them with the appropriate information. Or, to put the problem simply, they think the end justifies the means.

I take the view that truth in data is the highest importance. I’m frustrated in how lonely it is out here on my high ground.

"Real Unemployment" at 16%? Color Me Skeptical

You may have seen the recent headline “Real US unemployment rate at 16 pct: Fed official”. A snippet:

“If one considers the people who would like a job but have stopped looking — so-called discouraged workers — and those who are working fewer hours than they want, the unemployment rate would move from the official 9.4 percent to 16 percent, said Atlanta Fed chief Dennis Lockhart.

UPDATE: Commenter Tom M. takes note that Mr. Lockhart is probably referring to the U6 numbers and this fact was simply not reported appropriately. He says:

When economists, such as myself, talk about the “real” unemployment rate, we are usually referring to the U6 unemployment figure, which is the U3 rate (the published/official rate) plus people that are “part time for economic reasons” among other groups.

If that is the case, it makes most of the rest of what I have to say pretty much void, but I’ll leave it up anyway. Thanks Tom!

A little while back, I called “discouraged workers” the “despair numbers” (basically, they say they want a job, but they aren’t looking for one).

My conclusion was that we’ve always had despair or discouraged workers, so suddenly adding them in now seems like a dishonest tactic to artificially inflate unemployment to some scary level. In good times, we saw unemployment at about 4-5%, so we’re used to thinking about that range as being good. But if you add the “discouraged workers” in those good times, you’re looking at a “good” unemployment rate of about 7-8%.

As for the “wants to work more hours” crowd, I’m open to considering that group in some way, shape or form, but I don’t know how to add them in a way that is honest. Frankly, as a small business owner and contractor, I don’t work as many hours as I would like. But I don’t go around calling myself “unemployed” or even “underemployed”.

If you look at the Bureau of Labor’s stats on part time workers, you can see that the number has jumped about 3 million in the past year. If we add those workers plus the increase in the “discouraged workers” (about 1 million), we get a rate a little over 12%.

But the problem in my mind is that you can’t simply add part time workers to the “unemployed” list to get any kind of meaningful data. Maybe, for the sake of argumentation, you could cast an involuntary part time worker as half a worker. Then the unemployment rate is a shade over 11%. This is, I think, a not-unreasonable number to use, given that it shaves off the standard number of “discouraged workers” and uses a dampening variable to account for the fact that part-time workers aren’t really “unemployed”, but “underemployed”.
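
Here is a rough sketch of that weighting idea. The 9.4% official rate and the 3 million and 1 million headcounts come from the text above; the labor force size is NOT stated in this post, so the 154 million used here is an assumption, and the function name is just for illustration.

```python
def adjusted_rate(official_rate, labor_force, extra_groups):
    """Add weighted extra groups to the official unemployment rate.

    extra_groups is a list of (headcount, weight) pairs, where weight is
    how much of an "unemployed person" each member counts as.
    """
    extra = sum(count * weight for count, weight in extra_groups)
    return official_rate + extra / labor_force


LABOR_FORCE = 154e6  # ASSUMPTION: rough labor force size, not from this post
OFFICIAL = 0.094     # the official 9.4% rate quoted above

# Count every new involuntary part-timer as fully unemployed.
full = adjusted_rate(OFFICIAL, LABOR_FORCE, [(3e6, 1.0), (1e6, 1.0)])
# Count each new involuntary part-timer as half a worker.
half = adjusted_rate(OFFICIAL, LABOR_FORCE, [(3e6, 0.5), (1e6, 1.0)])

print(f"full weight: {full:.1%}, half weight: {half:.1%}")
```

Under that assumed labor force, the two results land near the roughly 12% and 11% figures above. Change the weight and you change the headline, which is the whole problem.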

But I could be easily convinced that crunching the numbers in a new and interesting way is basically statistical cheating and we should just use the standard definitions.

Overall, I’m really uncomfortable with the whole “let’s crunch the numbers so the situation looks really terrible” methodology because all it does is try to cast the current situation in a bad light by changing the metric. But you can’t use one metric in the good times and another metric in the bad times.

As such, I think the 16% number is really more of a scare tactic than anything else.

How Big is $9 Trillion? – Willful Omissions From Paul Krugman

You may have seen the Paul Krugman post “How Big is $9 Trillion” in which he attempts to defend the Obama administration’s recent announcement that they expect that their policies will increase the national debt by $9 trillion. His tack is to “explain” that $9 trillion isn’t really all that much when you understand it in context.

it’s being treated as an inconceivable sum, far beyond anything that could possibly be handled. And it isn’t.

What you have to bear in mind is that the economy — and hence the federal tax base — is enormous, too. Right now GDP is around $14 trillion. If economic growth averages 2.5% a year, which has been the norm, and inflation is 2% a year, which is the target (and which the bond market seems to believe), GDP will be around $22 trillion a decade from now. So we’re talking about adding debt that’s equal to around 40% of GDP.

Right now, federal debt is about 50% of GDP. So even if we do run these deficits, federal debt as a share of GDP will be substantially less than it was at the end of World War II.
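
The arithmetic in that quote checks out, for what it’s worth. Here is a quick sketch that compounds the quoted growth and inflation rates over ten years; multiplying the two rates together each year is my assumption about how the projection was done.

```python
def project_gdp(gdp_now, real_growth, inflation, years):
    """Nominal GDP after compounding real growth and inflation each year."""
    return gdp_now * ((1 + real_growth) * (1 + inflation)) ** years


# Figures from the quote: $14 trillion today, 2.5% growth, 2% inflation.
gdp_future = project_gdp(14.0, 0.025, 0.02, 10)
added_debt_share = 9.0 / gdp_future

print(f"GDP in a decade: ${gdp_future:.1f} trillion")
print(f"$9 trillion as a share of that GDP: {added_debt_share:.0%}")
```

So the numbers themselves are fine. The problem is everything they leave out.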

I defer to Paul Krugman on a lot of things because he is transparently smarter than I am. But it is precisely because of this fact that I know he is conscious of the obvious reasons his analysis is hogwash.

First of all, the national debt in WWII was initiated by an existential threat to the very continuation of our country. Mr. Krugman does not even attempt to make the case that we have a similar crisis that justifies this kind of debt.

Second, implicit in his observation is the concept that since we did fine after WWII, we’ll do fine now. But the years after WWII saw drastic reductions in the inflation-adjusted debt driven by drastic reductions in spending. Mr. Krugman points to no similar possibility in the post-Obama world.

Third, we have something now that we didn’t have in the 1940’s. Back in 1945, at the height of the spending that saw our national debt rise so dramatically, entitlement spending and interest on the national debt made up a meager 5% of our total budget.

By the end of President Obama’s term (if he runs two terms) we’ll be looking at a federal budget that is 70% mandatory spending. (I assume for the purposes of consistency that mandatory spending includes interest on the national debt because we don’t really have a choice in not paying it.)

Here’s a quick visual of the difference in the budgets in 1945 and 2016. (Ugly, because I did it fast… I’m on vacation.)

1945 vs 2016

If you look at the 1945 budget with the single question “How are we going to reduce our debt?” you can identify the major problem. It’s the defense budget, which is almost 90% of the budget. Interestingly, reducing the defense budget is exactly what we did in order to reduce the debt, cutting it over 80% in 3 years (it helped that we won the war).

As a contrast, President Obama’s solution to reducing overall spending is… well, I don’t think he really has a plan. His projected budget in 2016 has reduced the defense budget as a percentage of the overall budget from 20% to 14%, but military spending isn’t what is killing us. The president has no plans to reduce mandatory spending whatsoever. In fact, his only change to entitlement spending is to increase it.

My problem with Mr. Krugman’s “How big is $9 trillion?” is that he is aware of all the problems I pointed out. He didn’t explain how much $9 trillion is; he obfuscated it. By comparing the debt load in the heart of a world-shaking war to a debt load that was accumulated in (relative) peacetime, he has misled his readers to the real significance of the data.

(By the way… if you would like to blame the debt load on the Iraq war, you should know that those costs have raised our debt by 5% of the GDP. Comparing this to WWII, which raised our debt by 70% of the GDP, is a pretty weak argument.)

President Obama, I Fixed Your Graph For You

I’ve been pretty quiet recently because 1) I’m on vacation and 2) I’m trying to wrap my head around the health care issue before I talk about it at length.

But today I saw something on healthreform.gov that bothered me:
Combined PPO

Here’s the thing, Mr. President. There is such a thing as visual lying. That is when you show a graph and you show the numbers but the two things are not in any way related to one another.

That is the problem here. If someone looks at this graph, they see that the sky is falling because the bars have increased so dramatically. On the left, your team has represented a 30% increase with a graphic that shows a 966% increase. On the right, your team has represented a 63% increase with a graphic that shows a 308% increase.
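
Tufte’s “Lie Factor” puts a number on this: the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal sketch using the four percentages above (the function name is mine):

```python
def lie_factor(shown_change, data_change):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data."""
    return shown_change / data_change


left = lie_factor(966, 30)    # left pair of bars
right = lie_factor(308, 63)   # right pair of bars

print(f"left: {left:.1f}, right: {right:.1f}")
```

An honest graphic scores about 1. These bars come out at roughly 32 and roughly 5.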

And are the two sets of bars related in any way? You might think so, given that they show up next to each other and are supposed to measure the same thing. But from a data perspective, they are not even remotely close to being right.

It is possible to use graphs and numbers in such a way that is honest. That’s an important part of transparency. So, I fixed your graph for you.

You’re welcome.

UPDATE: In the comments section, James quickly identified the problem… the graph starts the y-axis at 1000 instead of 0. I double checked and it looks like he is spot on. Thanks!

With that in mind, the graph is more of a rookie mistake than a conscious attempt to deceive. I’ve edited my post to reflect that (I left my original comments in so everyone can see what a smart-ass I tend to be).
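
To see how much damage a y-axis that starts at 1000 can do, here is a small sketch. The two data values are hypothetical, chosen only so that the real increase is 30%; they are not read off the original graph.

```python
def apparent_increase(old, new, axis_start=0.0):
    """Fractional growth in drawn bar height when the axis starts at axis_start."""
    return (new - axis_start) / (old - axis_start) - 1


# Hypothetical values with a true increase of 30%.
old, new = 1100.0, 1430.0

honest = apparent_increase(old, new)                       # axis starts at 0
truncated = apparent_increase(old, new, axis_start=1000.0)  # axis starts at 1000

print(f"axis at 0: {honest:.0%}, axis at 1000: {truncated:.0%}")
```

Same data, same bars, and a 30% increase turns into one that looks like 330% just by moving where the axis starts.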

AP Apparently Dislikes Accurately Representing Abortion Violence

I love the way Freddie introduced a post on abortion late last year. He titled it “you know what we don’t talk about enough? Abortion“. 

I kind of feel the same way about the level of discussion going on with it. I probably would never have mentioned it at all on this blog if it hadn’t been for the incident on Sunday in which a man shot and killed Dr. George Tiller “one of the nation’s few providers of late-term abortions”.

In the fourth paragraph of the AP article, I came across this line:

“But the doctor’s violent death was the latest in a string of shootings and bombings over two decades directed against abortion clinics doctors and staff.” 

After reading that, I decided to look into the statistics of abortion violence with a view toward perhaps creating a visualization about it.

Sadly, there are few things more skewed than abortion violence statistics. I found this pdf on “Abortion Violence and Disruption Statistics” done by the National Abortion Federation and it is mainly propaganda dressed in numbers. But it looks like their numbers on shootings and bombings are verified by legal authorities, so I assume they are pretty accurate. 

Let’s use those statistics to deal with the “string of shootings and bombings over two decades” that the AP talks about. (In order to give the AP the benefit of the doubt, let’s assume that all the “Attempted Murders” of abortion clinic staff involved shooting of some kind.)

According to the NAF document above, this is what the “string of shootings and bombings” looked like over the last 15 years:

AbortionStats4

Did you know that this is the first abortion related murder since 1998?

I didn’t.

I was under the impression from the AP that abortion killings were like school shootings… the kind of thing that we tragically see on an ongoing basis. (I thought about a graph comparing school violence to abortion violence, but it seemed kind of apples-to-oranges to compare sociopathic, psychotic and suicidal teenagers to politically motivated terrorists.)

Given the actual data, the characterization of this incident as “the latest in a string of shootings and bombings” is deeply dishonest. It embeds into people’s minds the idea that this is a very common tragedy, like school shootings, hurricanes or gang-related violence. In fact, until I looked at the data very recently, I was under exactly that impression. 

It would be much more accurate to say something along the lines of:

This incident has shattered an eight year lull in anti-abortion related shootings, an activity that spiked to record levels in the 90’s.

UPDATE: Upon re-reading my post I realized that it sounded very dry and unfeeling… very matter-of-fact… when I talked about the recent murder. I hope no one got the impression that I’m wholly unfazed by this crime. Nothing could be further from the truth. I hope that the fact that I referred to crimes of violence against abortion clinics and their staff as acts of terrorism would indicate how I feel about the topic.