Chewing Pixels: 'Video Game Review Scores: Pointless or Pertinent?'
['Chewing Pixels' is a regular GameSetWatch column written by British games journalist and producer, Simon Parkin. This latest instalment deals with video game reviews and scoring - the good, the bad, and the ugly.]
Last month a British games journalist reviewed Xbox Live Arcade’s Penny Arcade Adventures for two different publications. In one of the magazines the game scored 4/10, while in the other it was awarded 68%. While it’s a discrepancy that caused some to raise their eyebrows, most commentators acknowledge that the difference simply reflects each publication’s own particular use of the numerical review scale.
Two weeks later Microsoft announced their plans to remove games with an average Metacritic score of 65% or lower from their XBLA service. If the decision on whether to keep Penny Arcade Adventures on the service were to be based solely on the judgement of this reviewer, its fate would swing on which review was looked at.
While a game’s Metacritic or GameRankings average score has often been used to dictate the size of a development staff’s bonuses, Microsoft’s decision to use numerical scores as the criterion for dictating whether games can be sold on its service or not has elevated the numbers issue to a whole new level of consequence.
Some argue that scores represent different things to different publications, one title’s 4/10 being another’s 68%. Others question why, when scores rarely tally with a game’s commercial success, we should use them to make commercial decisions. Always, the question behind the question is: do review scores actually matter and, if so, what do they even mean?
At a glance, review scores seem to be the most harmless of things. While good critics will bemoan having to reduce a 1000-word piece of incisive criticism to a number on a 10 point scale (or, um, 19 point scale if you’re GameSpot), to the average consumer they offer a useful shorthand reference point with which to compare different titles and inform buying decisions.
But to fully understand the confusing tangle review scores have landed reviewers, consumers and the wider industry in, it’s important to understand their origins. Review scores are a system imported from those publications that review and rate consumer products like televisions and toasters. For example, look at this review of the Canon EOS400D camera. It’s 25 pages long and is as objective a dissection of this model of camera as it is possible to create.
Every aspect of the product is pulled apart, rated and weighed with statistical graphs and comparative data. By the end of the review you know every single detail about the camera and how it empirically compares to its rivals.
It’s a huge exercise in absolute objectivity and, at the end of the gigantic review, the author sums up the good points and the bad points and there is no shadow of a doubt that everything said is ‘factually correct’.
Additionally, there is a place on a defined scale of quality upon which the product sits at that moment in time. It compares to other cameras on the market in defined ways, despite being a complex product. Using the review data it would be possible to arrange all of the digital cameras into a ‘truth’ line of quality, with the ‘best’ camera sitting at 100 and the ‘worst’ at 1, and to place this camera somewhere along that line, thus communicating to a consumer its relative and inherent qualities in a single representative number.
It seems sensible then to believe that such an exercise could be applied to video games to construct a similar scale of quality. Indeed, this is exactly what many video game consumers want from their reviews.
The average reader (even if they don’t know it) is after a complete objective, scientific comparison between game x and game y with data and statistics and, finally, a numerical point on a linear scale by which they can compare, for example, Mass Effect with Rock Band and see which one is empirically better.
Except, of course, video games don’t work in the same way as toasters or digital cameras. Sure, they have mathematical elements and measurable mechanics and it’s possible to compare the number of polygons between this one and that and spin out ten thousand graphs detailing how two specimens compare. But, unlike with the Canon EOS400D, I would have no idea at the end of those 25 pages which game was better or where they would sit on the ‘true’ scale of quality.
Games are experiential and it is impossible to be wholly empirical or objective about them. Game reviewers instead present their experience of the game with, hopefully, lots of reference points and their weight of knowledge behind them. They might make empirical comparisons between game x and game y’s framerates but they will also argue whether they think this in any way affects the experience for better (in the case of bullet hell shooters such as DoDonPachi) or for worse. They have to argue their points because there isn’t data on the overall, indefinable quality of a game.
In the early days of magazine publishing, video game reviewers would often break a game down into all of its constituent parts (graphics, sound, ‘lastability’ etc), score each on a comparative line of quality and then present the average of those scores as the game’s overall measure of quality.
However, this approach presumes that it’s possible to put each of a game’s constituent parts on a definable scale of quality. The truth is that gauging a game’s graphical appeal is a subjective pursuit in the same way that trying to comparatively score a Monet against a Picasso would be. Call of Duty 4’s competent stab at sunset-drenched realism has a certain appeal, but then so does the 8-bit elegance of a Chuckie Egg or Geometry Wars.
Secondly, games are more than the sum of their parts. You could have a visually astounding video game with a gut-wrenching soundtrack and astute, nuanced voice acting and it could still be terrible to play, and vice versa. Aggregating scores from extrapolated game elements tells you nothing anyone would actually want to know about a video game.
At this point, defenders of the review score will offer: ‘Why not just review the game on how fun it is, then?’
The problem with wanting a purely objective ‘review’ of a video game is made doubly complicated by the fact that a video game’s purpose is never so narrow nor so easily defined. Consumer goods have a very clearly defined job to do. A digital camera is there to take the best possible photographs, a toaster is there to make toast to whatever specification the consumer requires in the shortest and most efficient timescale. And because their purpose is tight and the measure of the product’s success easily calculable, they lend themselves to ‘review’ and ‘score’ testing.
In contrast, the purpose of a video game is much less narrowly defined. Most game ‘reviewers’ would say that the purpose of a game is to be fun and to entertain. But actually pinning down such abstract concepts is tricky as there are as many criteria and understandings of what is entertaining and fun as there are humans. Thus, reviewing a video game in the same way as you’d review a digital camera or other similar consumer product is inappropriate or, at the very least, misleading.
All this is not to say that review scores are entirely meaningless or misleading. In fact, they do have a very clearly defined purpose; it’s just that it’s a different purpose to the one that’s widely understood.
Scores have come to represent whether a game over-achieves or under-achieves on the preview hype that was generated by the publication ahead of its release. As previews in the average video game magazine are so heavily influenced by advertisers (after all, a preview is offering no judgment on the quality of a game, so a magazine/website can print riotously positive spin and maintain a clear conscience), this weighting of preview coverage sets imbalanced expectations in readers.
Rather than focusing on the most interesting, promising or innovative games coming out, readers are made to get excited about those whose publishers pay the most, be it directly through advertising or indirectly through the general marketing promotion of a title.
This is why when a game like Koei’s Bladestorm gets 8/10 in some publications, readerships become incredulous. Their expectations for the game haven’t been set that high because they were being fed hype of a different flavour.
Then, conversely, when Metal Gear Solid 4 scored an 8/10 on Eurogamer last week, the readership revolted the other way - because that was far below their expectations. Remember: in both cases nobody but the reviewer had played the game at the point the reviews came out - why then were people so quick to damn each respective score (for opposing reasons) if they had no hands-on experience?
Scores then become a reference to a game’s preceding hype. An 8/10 for a game that was hugely hyped to hobbyist gamers is a punch in the stomach for excited fans (see the anguish exhibited in the MGS4 comments thread). Conversely, an 8/10 for a game nobody cares about is viewed as gross over-generosity.
And that is why video game review scores are pointless: they often answer a pertinent question that nobody realised they were asking.

Comments
If numerical scores are so useless for mass consumption media, then why are so many publications perfectly capable of ending book, television, album, and film reviews with a score?
Posted by: Daedalus | June 5, 2008 5:20 PM
Oh, if only they used the scores as "appropriately" as film reviewers do.
If only we didn't have so many previews is perhaps a better suggestion though :)
Interesting column, and a good theory on what the scores actually mean. The reviews seem very beholden to hype even when it is obvious after the game is out the scores sometimes vastly differ from the actual game quality/fun/whatever, on anyone's scale.
(Also; shouldn't that be "100 point scale"? I thought several places used a % system, not a 10-star system. Maybe I've just not checked in ages, oh well).
Posted by: Andrew Armstrong | June 5, 2008 5:23 PM
Not to mention that numerical scores are now being taken into account by publishers, like with the stuff being pulled off Live Arcade based on Metacritic scores.
Posted by: Matthew Boyd | June 5, 2008 5:29 PM
Great article Simon, although I'd argue that those who tend towards clamouring for "objective" review scores tend to usually mean "the review and score I have concocted in my head" and little more. That is, at least from my experience as someone who's been on the sharp end of the "you should be objective" stick on numerous occasions.
Whilst an occasionally nice at-a-glance overview when flitting through a magazine, I'd gladly see the back of review scores in favour of more critical / enjoyable / readable content that allows the reviewers' opinions to shine through without being eclipsed by a magic number at the end.
Posted by: Oddbob | June 5, 2008 5:37 PM
Reviewers hate scores, and there have been a few efforts to ditch them in favour of pure discussion pieces (EGM's "roundtable" seems like yet another attempt down this road.) In every case, it's been the readership who have demanded the scores return.
Posted by: Merus | June 5, 2008 6:14 PM
I would agree that games are not as amenable to objective review as many other products, but that doesn't completely negate their usefulness. For one, if you can find a reviewer you like it might be useful to see which games he rated the highest without having to skim the text of every single review. Concrete scores also allow for 'wisdom of crowds' type aggregation, even though particular implementations -- ahem, metacritic -- may be flawed.
Review scores aren't going away, and game reviewers should spend some time thinking about better metrics. See here for more discussion:
http://taipeigamer.blogspot.com/2008/06/reviewing-and-scoring-videogames.html
Posted by: Jon | June 5, 2008 8:32 PM
Review scores aren't bad, rather it is the 'vocal minority' that makes them so, as the most vocal supporters seemingly cannot read above a grade-school level.
Demonstrably incapable of reading and comprehending a review's text, those people make it evident when bashing a review score without context for why - incapable of having an informed opinion as they did not read the text.
An example is a gamer might buy a hardcore tactics game because it received a 9/10 from a publication, only to find they hate it because they don't like tactics games. This is revealed after the fact by the gamer saying, "How could GAME X get a 9? I hate it!" while failing to reveal either that they don't like tactics games or that they failed to read the review and instead made a purchase based only on a score.
Had they read the review text they would know the game was not for them in spite of the score and have saved themselves a purchase.
Yet as illiteracy seems to plague many, so too does the apparently 'high-concept' that a 10/10 does not mean 'perfect'.
That this ever happens at all is too often.
Posted by: Nathan | June 5, 2008 10:43 PM
Why are objective reviews a good thing? Why can't games be reviewed like films?
Posted by: Dylan | June 6, 2008 5:12 AM
The only problem with ratings is how people are using and abusing them. The numbers themselves are useful shorthand for "here's what I think of something." But the numbers alone are useless; they lack the context the text provides.
And that's where Metacritic breaks. It doesn't give context for all of those ratings; only the individual ratings do. And when you add up a bunch of arbitrary, context-less numbers, you end up with an incredibly arbitrary result.
So, Metacritic is the problem, not ratings (especially when they seem to do fine in every other entertainment field, with much less hand-wringing over their use). Using Metacritic for anything but a link farm for individual reviews is incredibly problematic for everyone involved in making videogames.
Since most of us are geeks, there's some part of our rat brains that views everything as a math problem. Games are math problems; we crunch the numbers in our brain to figure out the fastest way to get from A to B, or to "win."
So I think it logically follows that using math to determine quality is a completely viable process. You just need to improve the formula, right? And the more input data, the more "right" it'll be, right?
If you have to compile an overall rating number, Rotten Tomatoes' distillation of everything into "Like/Dislike" is a superior way of normalizing disparate rating systems. It ignores whether one site's ratings artificially boost mediocre products; they either like it or they don't.
But gamers like the granularity. They think there's a meaningful difference between a game that gets an 83% and an 87%. In math terms, it is meaningful; in coming up with an arbitrary number, it's just a little bit differently arbitrary.
The example of a reviewer giving a different rating for different publications is one reason why I asked reviewers not to review the same title for other publications. That's one specific case where side-by-side analysis of ratings makes sense.
Posted by: steve | June 6, 2008 10:26 AM
Simon,
Do you have any hard data to back up this statement?
"The average reader (even if they don’t know it) is after a complete objective, scientific comparison between game x and game y with data and statistics and, finally, a numerical point on a linear scale by which they can compare, for example, Mass Effect with Rock Band and see which one is empirically better."
It's just that I don't think there's such a thing as a) an average reader, and b) I certainly wouldn't want to guess their unconscious motivations.
Posted by: Tim E | June 8, 2008 3:45 PM
Hi there Tim,
Yeah, you're exactly right: it was dumb to talk about an 'average' reader and second guessing a non-existent entity's motivation is meaningless.
I guess the general assumption comes from a few years reading through online review comment threads that both I and others have written.
By way of anecdotal example, here are two readers talking about their score expectations (taken from the GameSpot news story in which the publication announced they were moving from a 100 point scale to something much more complex):
"Wow Gamespot…You took the one thing that made your reviews better than every one elses, how intricate and specific they were, and dumbed it down to a system I would only expect from some 16 year olds freewebs site. This is horrible. Now I won’t know how much better a 9.5 game is from another 9.5 game."
"Now if i want to know how good a games graphics were compared to another game, I’ll have to read 7 paragraphs of text instead of looking at a simple, easy to understand interface that creates a well weighted average gamescore."
Posted by: Simon Parkin | June 9, 2008 3:36 AM
I'm not a fan of movie or game reviews or anything for that matter that is purely subjective when done by anyone I do not know personally. The reason is simple- I do not know the person doing the review and have no clue how their tastes match up to mine so it is silly then for me to care what they think.
A perfect example of this is Rainbow Six Vegas 2. It is a fairly low reviewed game online but it is my 2nd favourite 360 game next to GTA4. On the flip side, Call of Duty 4 is a game I own and enjoy but it is quite a ways below R6:Vegas2 in my mind yet reviews say otherwise. Why then would I put faith in what strangers say?
I'd much rather have reviews of games focus more on objective things like technical issues, game length, etc. which sadly seems to be declining as time goes by and reviews are becoming more subjective each passing day. The same is true for blu-ray/dvd type reviews where I'd rather people stick to reviewing the disc quality instead of giving me their opinion of the movie because again, I do not care if they like the film or not as I have zero basis for their tastes and what they consider fun. Many people think it is fun to go out drinking while I think it is stupid and boring. So if a person who thinks that is fun is reviewing a game, chances are they have a different overall belief of what fun is than I do.
Having said all that, I will check out various game reviews but I try to take them with a grain of salt and get a general idea of how a game may be rather than treat them as pure fact of whether a game (or movie, etc) is great or not.
Now if I know the person then I value their opinion of a game/movie, etc as I know their likes/dislikes and they know mine so they can give me a far better gauge of whether the game/movie is something I will enjoy.
Posted by: Rob | June 9, 2008 10:54 AM
I agree completely. Scores in game reviews tell me little of value about a game. I have to read all those paragraphs if I want to find out if a game is suited to me.
Another bit that is missing is information about the reviewer. Is the magazine's FPS-addict reviewing the latest RPG? If so, he might not like the plodding pace and the focus on "blah, blah, blah" that I might call "story".
One site I really enjoy is GamersInfo.net. (Full disclosure, I've written reviews for the site, but for free since I really support what they do. I don't get paid by them.) They have reviewer profiles at the end of a review so you can find out what type of person reviewed the game. In my example above, you can see if an FPS fanatic reviewed the RPG and get some perspective. The site often does multiple reviews of the same game by different people. They also don't use numeric scores. I think it's a wonderful way to do useful reviews.
Posted by: Brian 'Psychochild' Green | June 9, 2008 7:12 PM
Simon,
Those comments sum up my post above. While it was a general assumption, my feelings on the matter come from the same places - forums and more often comment threads following reviews posted online.
"Now if i want to know how good a games graphics were compared to another game, I’ll have to read 7 paragraphs of text instead of looking at a simple, easy to understand interface that creates a well weighted average gamescore."
I strongly feel that this type of gamer does not want to think, but rather be told what to think or to have their decisions made for them - to have every decision reduced to one that is purely binary - irrespective of contributing factors, especially those that are highly subjective, e.g. graphics.
For example, Everquest 2 and World of Warcraft. Those games were released only weeks apart, but took very different approaches to graphics. Everquest 2 went with a photo-realistic look, using lots of intensive effects / shaders. World of Warcraft looks cartoony, but has arguably significantly more detail, character, and a distinct style.
The commenter above wants to believe that two art approaches can be compared equally and are not at all subjective.
A direct score-to-score comparison is an inaccurate measure and requires text to support why each is awarded a specific number. However that text expects that the reader can understand it and form their own opinion, not one dictated to them by a category (graphics) value.
"Now I won’t know how much better a 9.5 game is from another 9.5 game."
The irony is that this gamer would know, or at least know why the reviewer thinks so, if only they read the supporting text.
Posted by: Nathan | June 12, 2008 2:38 PM
play your game, i have never noticed the score~
Posted by: Kathy Mead | August 25, 2008 2:44 AM