Will Big Data Kill Creativity?

Analytics has turned its attention toward ebook readers. Companies can now track the way you read, which may have implications for the way authors write. From the NY Times:

The longer a mystery novel is, the more likely readers are to jump to the end to see who done it. People are more likely to finish biographies than business titles, but a chapter of a yoga book is all they need. They speed through romances faster than religious titles, and erotica fastest of all…

At Oyster, a top book is “What Women Want,” promoted as a work that “brings you inside a woman’s head so you can learn how to blow her mind.” Everyone who starts it finishes it. On the other hand, Arthur M. Schlesinger Jr.’s “The Cycles of American History” blows no minds: fewer than 1 percent of the readers who start it get to the end…

He contrasted two romance novels. One had few Amazon reviews and little promotion, but Scribd’s data showed 6 out of 10 readers were finishing it — above average for the genre. Another romance had hundreds of reviews on Amazon, but only about 4 out of 10 readers bothered to finish it. They began closing the book, the data showed, when the writer plunged deeper into fantasy. Maybe this was not a good idea.

Some writers, of course, might not be receptive to hearing this.

“If you aren’t careful, you could narrow your creativity. You won’t take risks,” said Ms. Loftis, the young adult novelist. “But the bigger risk is not giving the reader what she wants. I’ll take all the data I can get.”


Inequality in America – Alternative Measures of Economic Well-Being: Consumption

Although income differences are the most commonly discussed type of economic inequality, there are two other important measures: consumption and wealth. Each type of measure – income, consumption, and wealth – has both advantages and disadvantages relative to the others.

The disadvantages of using income as a measure arise from a phenomenon known as “income smoothing,” a concept that is familiar to all of us, if more commonly under the pseudonym of “budgeting.” Families at different stages of their lives use debt and savings to provide a relatively stable level of economic well-being, thus “smoothing” their total lifetime income over the course of their entire period of employment (and retirement).

Therefore, as the CBO states:

[A] household’s consumption might be a better measure of its economic well-being than its income is. For households whose spending tracks their annual income, the distinction does not matter. But a young family may spend more than its current income, relying on borrowing to finance current consumption, while an older family may also spend more than its current income, drawing down assets in retirement. In contrast, a household in its middle years may spend less than its current income while saving for future needs.
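To make the CBO's point concrete, here is a minimal Python sketch with invented numbers. The three income figures and the "spend the same amount every stage" rule are my own assumptions for illustration, not CBO data. It follows one household through three life stages and shows that its annual income swings far more than its annual consumption.

```python
# Hypothetical annual incomes for one household at three life stages.
# These figures are illustrative assumptions, not CBO data.
incomes = {"young": 30_000, "middle": 90_000, "retired": 30_000}

# Perfect smoothing (an assumption): spend the same amount at every stage,
# so that lifetime spending equals lifetime income.
smoothed = sum(incomes.values()) / len(incomes)


def smoothing_table(incomes, consumption):
    """Return (stage, income, consumption, saving) rows.

    Negative saving means borrowing or drawing down assets.
    """
    return [(stage, income, consumption, income - consumption)
            for stage, income in incomes.items()]


rows = smoothing_table(incomes, smoothed)
for stage, income, consumption, saving in rows:
    print(f"{stage:>8}: income {income:>7,.0f}  "
          f"consumption {consumption:>7,.0f}  saving {saving:>+8,.0f}")

# Ratio of the best to the worst year, by each measure.
income_ratio = max(incomes.values()) / min(incomes.values())
consumption_ratio = 1.0  # identical spending in every stage by construction
print(f"income ratio {income_ratio:.1f}, consumption ratio {consumption_ratio:.1f}")
```

In this toy case the young and retired stages show negative saving and the middle stage shows positive saving, which is the pattern the CBO passage describes. A snapshot of annual income would make the same household look poor in two stages and well-off in one, even though its standard of living never changes.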

Consumption data are also flawed: surveys that measure consumption, such as the Consumer Expenditure Survey (CEX), are not as widespread as those that cover income, and there is poor coverage of consumption patterns at the top of the distribution. Nevertheless, it is clear that inequality looks much smaller under consumption-based measures of well-being.

For instance, two economists, Michael Cox and Richard Alm – using data from 2006 – show that although income inequality between the top and bottom fifth stood at a ratio of 15-to-1, consumption inequality was only 4-to-1. (You can read their op-ed in the NY Times along with critiques from Paul Krugman and Mark Thoma here).

In a separate study James Sullivan and Bruce Meyer reconstructed consumption poverty rates from 1963 through 2009 using different types of measures. The dark blue line shows the official poverty rate using the standard income measure, while the dark brown line shows consumption data including health insurance. (This latter measure also uses a different deflator, but I will save the details of measuring inflation for another post.) The difference between the two measures is not trivial: poverty drops from 14.5% to around 8.5%.

A separate but related way to measure inequality is by the number of modern conveniences in the average poor household. Using US Census data, some economists have constructed tables showing the percentage of US households that own various common appliances and household technologies. Mark Perry, for instance, constructed the chart below, which shows that the poor in 2005 seem to be significantly better off in basic material terms than the average American in 1971.

And Cox and Alm constructed the graph below of the rate of penetration of modern technology over time, which appeared in the NY Times (larger image here):

A new paper from May of 2012 by Mark Bliss uses more advanced statistical techniques to account for the shortcomings in survey data and estimate the change in consumption inequality since 1980. While, in absolute terms, consumption inequality is still much lower than income inequality, the increase in consumption inequality tracks that of income inequality:

Our estimates suggest that consumption inequality increased by close to 30 percent between 1980 and 2010, nearly as much as the change in income inequality, and nearly three times that estimated based on directly examining relative household expenditures in the CE [Consumer Expenditure Survey].

This paper will certainly not be the last word on the consumption versus income debate over inequality.

A more anecdotal piece of evidence comes from Don Boudreaux, who revisited his 1975 Fall/Winter Sears catalog and compared its items to the equivalent products today (as a measure he uses the number of hours the average American had to work to be able to afford the items in 1975 and in 2006). You can read his posts here and here.

Other than the style differences, the fact most noticeable from the contents of this catalog’s 1,491 pages is what the catalog doesn’t contain. The Sears customer in 1975 found no CD players for either home or car; no DVD or VHS players; no cell phones; no televisions with remote controls or flat-screens; no personal computers or video games; no food processors; no digital cameras or camcorders; no spandex clothing; no down comforters (only comforters filled with polyester).

Sears’ lowest-priced 10-inch table saw: 52.35 hours of work required in 1975; 7.34 hours of work required in 2006.

Sears’ lowest-priced gasoline-powered lawn mower: 13.14 hours of work required in 1975 (to buy a lawn-mower that cuts a 20-inch swathe); 8.56 hours of work required in 2006 (to buy a lawn-mower that cuts a 22-inch swathe. Sears no longer sells a power mower that cuts a swathe smaller than 22 inches.)

Sears Best freezer: 79 hours of work required in 1975 (to buy a freezer with 22.3 cubic feet of storage capacity); 39.77 hours of work required in 2006 (to buy a freezer with 24.9 cubic feet of storage capacity; this size freezer is the closest size available today to that of Sears Best in 1975.)

Sears Best side-by-side fridge-freezer: 139.62 hours of work required in 1975 (to buy a fridge with 22.1 cubic feet of storage capacity); 79.56 hours of work required in 2006 (to buy a comparable fridge with 22.0 cubic feet of storage capacity.)

Sears’ lowest-priced answering machine: 20.43 hours of work required in 1975; 1.1 hours of work required in 2006.

A ½-horsepower garbage disposer: 20.52 hours of work required in 1975; 4.59 hours of work required in 2006.

Sears lowest-priced garage-door opener: 20.1 hours of work required in 1975 (to buy a ¼-horsepower opener); 8.57 hours of work required in 2006 (to buy a ½-horsepower opener; Sears no longer sells garage-door openers with less than ½-horsepower.)

Sears highest-priced work boots: 11.49 hours of work required in 1975; 8.26 hours of work required in 2006.

One gallon of Sears Best interior latex paint: 2.4 hours of work required in 1975; 1.84 hours of work required in 2006. (Actually, Sears sells no paint on-line, so the price I got for a premium gallon of interior latex paint is from Restoration Hardware.)

Sears Best automobile tire (with specs 165/13, and a treadlife warranty of 40,000 miles): 8.37 hours of work required in 1975; 2.92 hours of work required in 2006 – although, the price here is of a Bridgestone tire that I found at another on-line merchant. Judging from its website, Sears no longer sells tires with specs 165/13 and a 40,000 mile warranty.
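For readers who want to see the size of these changes side by side, here is a short Python sketch that takes the hours-of-work figures quoted above and computes the percentage drop for each item. The `hours_of_work` helper is my own guess at the form of the measure (price divided by an average hourly wage). The quoted list does not give the prices or the wages used, so that helper is shown only to make the definition explicit and is not applied to any real price.

```python
# (hours of work in 1975, hours of work in 2006), copied from the list above.
hours = {
    "table saw": (52.35, 7.34),
    "lawn mower": (13.14, 8.56),
    "freezer": (79.0, 39.77),
    "fridge-freezer": (139.62, 79.56),
    "answering machine": (20.43, 1.1),
    "garbage disposer": (20.52, 4.59),
    "garage-door opener": (20.1, 8.57),
    "work boots": (11.49, 8.26),
    "interior paint": (2.4, 1.84),
    "tire": (8.37, 2.92),
}


def hours_of_work(price, hourly_wage):
    """Hours needed to earn `price` at `hourly_wage` (assumed form of the measure)."""
    return price / hourly_wage


def percent_drop(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100 * (before - after) / before


drops = {item: percent_drop(h75, h06) for item, (h75, h06) in hours.items()}

# Largest improvement first.
for item, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:<20} {drop:5.1f}% fewer hours of work")
```

Every item on the list takes fewer hours of work in 2006 than in 1975, but the spread is wide: electronics and power tools fall the most, while paint and work boots fall the least. Several of the 2006 items are also larger or more powerful than their 1975 counterparts, so the raw percentages do not capture quality changes.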

Improvements in the standard of living of the average American, of course, do not directly address the issue of inequality. However, those who prefer these types of real-world examples point out that they speak to the drastic increase in the material well-being of all Americans in the past thirty-five years, improvements that are obscured if we focus only on growing inequality. They also argue that, compared to nearly any previous time in the history of the world, current equality is quite high. As Don himself has pointed out, there is a lower-cost version of nearly every amenity the rich enjoy that is available to all Americans. For most of human history this wasn’t the case. Critics retort that the standard of living required to participate fully in society in the twenty-first century has itself increased and thus such comparisons are futile. Instead, we should compare what poorer Americans today have relative to what they need to lead happy, productive, and integrated lives that allow for full democratic participation. I believe both views have their merits.

In my next post I’ll examine the second of the two alternative inequality measures: wealth.

Poker: Skill or Luck?

I’m just getting around to reading this 2011 paper by Steven Levitt and Thomas Miles.

In the paper, Levitt and Miles use data from the 2010 World Series of Poker (WSOP), a poker “tour” that comprises dozens of separate tournament events every year (there were 57 events in 2010). The authors first created a list of highly skilled players using a variety of sources such as previous years’ top WSOP money and tournament winners, and published lists of highly accomplished players (from BLUFF and Card Player magazines and PokerPages.com).

This investigation gave the authors a list of 720 players who were identified as highly skilled going into the 2010 WSOP season. How did they do relative to an average unranked player competing in those same tournaments? Quite well, thank you. The return on investment of those 720 players was 30.5%, or $1,200 per player per event. That’s an annual income of $68,400 if you’re wondering. What about the rest of the players? They averaged a negative return of 15.6%. In other words, they lost over $400 per event on average. In practical terms this means that in most cases these players did not win back enough money to cover their entry fees.
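The arithmetic behind that $68,400 figure is simple enough to check in a few lines of Python. This is a rough sketch, not something from the paper: it assumes a player enters all 57 events, and the `implied_stake` function is my own back-of-the-envelope step that treats return on investment as profit divided by the amount put at risk.

```python
EVENTS_2010 = 57          # WSOP events in 2010, as stated above
SKILLED_PROFIT = 1_200    # average profit per skilled player per event, in dollars
SKILLED_ROI = 0.305       # 30.5% return on investment


def annual_profit(profit_per_event, events):
    """Total profit for a player who enters every event (an assumption)."""
    return profit_per_event * events


def implied_stake(profit_per_event, roi):
    """Average amount risked per event, assuming profit = roi * stake."""
    return profit_per_event / roi


yearly = annual_profit(SKILLED_PROFIT, EVENTS_2010)
stake = implied_stake(SKILLED_PROFIT, SKILLED_ROI)

print(f"profit over {EVENTS_2010} events: ${yearly:,}")
print(f"implied average stake per event: about ${stake:,.0f}")
```

The first number reproduces the $68,400 quoted above. The second suggests that a skilled player was risking a little under $4,000 per event on average, though that is only approximate because the $1,200 figure is itself rounded.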

The paper is interesting throughout. For example:

  • “In total, over 32,000 people competed in at least one WSOP event in 2010.”
  • “Roughly 90 percent of the players in any given tournament receive no prize money, and thus suffer a net loss equal to their entry fee.”
  • The players who were not identified as highly skilled lost a total of $26 million over the course of the year.
  • Altogether the 720 skilled players, by contrast, made a profit of $11 million. Remember, this profit is winnings minus entry fees. And these fees can be enormous. According to the paper, “one individual spent more than $260,000 on entry fees.”

Levitt and Miles also use what they term a “crude” analogy to baseball to demonstrate that poker is indeed a game of skill. I found the analogy quite instructive. Here the authors use the WSOP data to construct the winning percentage of a skilled player versus an unskilled player in a head-to-head matchup. They find that the skilled player wins 54.9% of the time. Comparing this to baseball, since 2007 skilled teams (those that made the playoffs the previous year) defeated unskilled teams (those that did not) 55.7% of the time. So the skill involved in being a good poker player is roughly analogous (in a relative sense) to being a playoff-quality team in baseball.

Lastly, I enjoyed this chart the authors included that shows the cash payout structure of a typical WSOP event.

Data, Shmata

On my bad days, when I’m wavering in my faith that humans have any air of rationality, I often begin to have a creeping fear that we have become allergic to data and evidence. And I don’t just mean that we avoid science. I mean in our everyday lives. Here’s an example. As I was logging into my WordPress account today I noticed on the homepage a blog post about Jonathan Franzen’s top 10 writing tips. I read the post, smiled, and reflected for a few moments. Noticing the blog’s author had written a related article about Franzen’s The Corrections, and having thought about picking up the book for several months now, I clicked on the link.

Several paragraphs into the post was this claim:

“Second, reviews on The Corrections from non book critics (read: normal people like you and me) are mixed. Take one look at The Corrections on Amazon and you’ll see what I mean. It’s a three-star novel with more than 1,000 reviews. Why is it three stars? People either really love it or really hate it.”

I hear these sorts of declarations made quite often. “Either you’ll love his sense of humor, or you’ll hate it.” “Either you love her beef stroganoff, or you hate it.” “Either you’ll read this book in a single night, or you’ll throw it down in disgust after the first five pages.” I’ve never been one to buy into such extreme pronouncements, so I decided to mosey on over to Amazon and take a look at this book’s customer reviews.

If we assign “really love it” a review of 5 stars and a “really hate it” a review of 1 star and do a little math we’ll learn that 55% of people either love this book or hate it, while 45% have some ambivalence toward it. Far from the claim that The Corrections is either loved or hated, only slightly more than half of readers harbor such strong convictions regarding the novel.
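The "little math" is just a matter of counting. Here is a minimal Python sketch of the calculation. The review counts are hypothetical: I did not record Amazon's actual breakdown, so I picked round numbers that reproduce the 55/45 split described above. Only the method is the point.

```python
# Hypothetical review counts by star rating. These are NOT Amazon's actual
# numbers; they are chosen so the 1- and 5-star share comes out to 55%.
reviews = {5: 330, 4: 180, 3: 130, 2: 140, 1: 220}


def share(counts, stars):
    """Fraction of all reviews that gave one of the listed star ratings."""
    total = sum(counts.values())
    return sum(counts[s] for s in stars) / total


def mean_rating(counts):
    """Average star rating, weighted by the number of reviews."""
    total = sum(counts.values())
    return sum(stars * n for stars, n in counts.items()) / total


polarized = share(reviews, [1, 5])      # "really love it" or "really hate it"
ambivalent = share(reviews, [2, 3, 4])  # everything in between

print(f"love it or hate it: {polarized:.0%}")
print(f"somewhere in between: {ambivalent:.0%}")
print(f"mean rating: {mean_rating(reviews):.2f}")
```

With these made-up counts the average lands near three stars, yet close to half of the reviewers sit in the middle. An average of three stars is consistent with a polarized audience, but it does not by itself show one.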

I don’t mean to pick on what seems like a well-intentioned blog (the author is reading all 100 of Time Magazine’s greatest English-language novels since 1923, a glorious and praiseworthy effort); and perhaps a few of you will write my complaint off as pedantic. But I still think it matters.

The example above is but one case of what has become (or maybe always was) an aversion to hard evidence. That we don’t calculate the percentage of people who feel a particular way toward a novel before writing a blog post is of no import. The larger point is that, far from staying neutral on topics we know little about, we transform into raconteurs—waxing lyrical, compelled to have an opinion one way or the other on every topic at hand—all the while ignoring the solid terra firma of the measurable and real as it sinks further and further away beneath our feet.

The question, “How do people on Amazon feel about The Corrections?”, like many other questions in life, has an answer (at least in part). Many of the riddles that confront our everyday lives do not. To treat that which is fact as merely a matter of opinion—or worse, to treat that which is unknowable as something that can be made real simply by will of conviction—is to infuse into the genuine a spurious nonsense. It is to give credence to intentions, hopes, and desires while discounting outcomes, history, and evidence.  It is to bake a pie made of lies—an American Lie Pie—and try to force others to eat it.

The more we make such unwarranted claims in our daily discourse, the lazier our brains become, the more susceptible we become to specious professions, and the more we view data and evidence as the banal details that should be relegated to science labs and courtrooms. I also believe this habit is partly at fault for our blind adherence to our own ideology.

We are creatures of habit, running in the same circles of friends week after week, watching the same news programs, reading the same websites. The claims we make and hear from others get batted around unchallenged, slipping into conversation as easily as laughter or talk of the weather. True or not, our allegations to one another become reality. So much so that we are jarred when we are confronted with anything different. So enraptured in what we know to be “true,” we respond with vitriol and indignation; only then does our scientific mind suddenly jolt to life, demanding from those who have challenged us every scrap of data and evidence on the subject at hand. And even then we are likely not to believe.

We may not be able to pause for mathematical calculations or deep research with every lackadaisical comment we make. But we can stop and think, the next time we say something, “I wonder if that is true.” If there is a computer nearby maybe we can look up the answer. We can choose not to speak about the many things about which we have no knowledge, or when we do, we can state them as a matter of opinion, noting that we could easily be wrong. We can understand that on many issues there are multiple sources of data, often conflicting, and in these cases we can talk about the relative merits of each rather than discounting completely the side which contradicts our sensibilities. On matters that are settled, we can follow the evidence, even if it disagrees with what we want to believe. Or at the very least we can say, “Sorry, I understand where you’re coming from and understand your evidence, but I’m biased on the subject. Even if you’re right my heart won’t let me agree.”