Many of us in the search industry were caught off guard by the release of Panda 4.0. It had become common knowledge that Panda was essentially “baked into” the algorithm, now running several times a month, so a pronounced refresh was a surprise. While the impact seemed reduced given that it coincided with other releases, including a payday loans update and a potential manual penalty on eBay, there were notable victims of the Panda 4.0 update, among them major press release sites. Both Search Engine Land and Seer Interactive independently verified a profound traffic loss on major press release sites following the Panda 4.0 update. We can’t rule out that Google rolled out a handful of simultaneous manual actions, or that these sites were hit by the payday loans algo update, but Panda remains the inference to the best explanation for their traffic losses.
So, what happened? Can we tease out why press release sites were seemingly singled out? Are they really that bad? And why are they particularly susceptible to the Panda algorithm? To answer these questions, we must first address the main question: what is the Panda algorithm?
Briefly: What is the Panda Algorithm?
The Panda algorithm was a ground-breaking shift in Google’s methodology for addressing certain search quality issues. Using patented machine learning techniques, Google used real, human reviewers to determine the quality of a sample set of websites. We call this sample the “training set”. Examples of the questions they were asked are below:
Would you trust the information presented in this article?
Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
Would you be comfortable giving your credit card information to this site?
Does this article have spelling, stylistic, or factual errors?
Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
Does the article provide original content or information, original reporting, original research, or original analysis?
Does the page provide substantial value when compared to other pages in search results?
How much quality control is done on content?
Does the article describe both sides of a story?
Is the site a recognized authority on its topic?
Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
Was the article edited well, or does it appear sloppy or hastily produced?
For a health related query, would you trust information from this site?
Would you recognize this site as an authoritative source when mentioned by name?
Does this article provide a complete or comprehensive description of the topic?
Does this article contain insightful analysis or interesting information that is beyond obvious?
Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
Does this article have an excessive amount of ads that distract from or interfere with the main content?
Would you expect to see this article in a printed magazine, encyclopedia or book?
Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
Are the pages produced with great care and attention to detail vs. less attention to detail?
Would users complain when they see pages from this site?
Once Google had these answers from real users, they built a list of variables that might potentially predict these answers, and applied their machine learning techniques to build a model for predicting low performance on these questions. For example, having an HTTPS version of your site might predict a high performance on the “trust with a credit card” question. This model could then be applied across their index as a whole, filtering out sites that would likely perform poorly on the questionnaire. This filter became known as the Panda algorithm.
How do press release sites perform on these questions?
First, Moz has a great tutorial on running your own Panda questionnaire on your own website, which is useful not just for Panda but really any kind of user survey. The graphs and data in my analysis come from PandaRisk.com, though. Full disclosure, Virante, Inc., the company for which I work, owns PandaRisk. The graphs were built by averaging the results from several pages on each press release site, so they represent a sample of pages from each PR distributor.
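To make the averaging step concrete, here is a minimal sketch of how per-page survey results could be rolled up into one score per site. The function name `site_scores`, the input shape, and the sample figures are all hypothetical placeholders for illustration; this is not PandaRisk's actual code or data.

```python
from collections import defaultdict
from statistics import mean

def site_scores(page_results):
    """Average per-page 'yes' rates into one score per site.

    page_results: iterable of (site, yes_votes, total_votes) tuples,
    one tuple per surveyed page. Pages with no votes are skipped.
    """
    rates = defaultdict(list)
    for site, yes_votes, total_votes in page_results:
        if total_votes > 0:
            rates[site].append(yes_votes / total_votes)
    # Each page counts equally, regardless of how many reviewers saw it.
    return {site: mean(page_rates) for site, page_rates in rates.items()}

# Made-up numbers, purely to show the shape of the input.
sample = [
    ("example-pr-site.com", 12, 20),
    ("example-pr-site.com", 8, 20),
    ("example-edu-site.edu", 18, 20),
]
print(site_scores(sample))
```

Averaging page-level rates (rather than pooling all votes) keeps one heavily reviewed page from dominating a site's score, which seems closest to "averaging the results from several pages", though that reading is an assumption.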
So, let’s dig in. In the interest of brevity, I have chosen to highlight just four of the major concerns that came from the surveys, question-by-question.
Q1. Does this site contain insightful analysis?
Google wants to send users to web pages that are uniquely useful, not just unique and not just useful. Unfortunately, press release sites uniformly fail on this front. On average, only 50% of reviewers found that BusinessWire.com content contained insightful analysis. Compare this to Wikipedia, EDU and Government websites which, on average, score 84%, 79% and 94% respectively, and you can see why Google might choose not to favor their content.
But does this have to be the case? Of course not. Press release websites like BusinessWire.com have first mover status on important industry information. They should be the first to release insightful analysis. Now, press release sites do have to be careful about editorializing the content of their users, but there are clearly improvements that could be made. For example, we know that use of structured data and visual aids (i.e., graphs and charts) improves performance on this question. BusinessWire could extract stock exchange symbols from press releases and include graphs and data related to the business right in the post. This would separate their content from other press release sites that simply reproduce the content verbatim. There are dozens of other potential improvements that can be added either programmatically or by an editor. So, what exactly would these kinds of changes look like?
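On the programmatic side, the ticker extraction step could be sketched roughly as follows. This assumes releases follow the common "(NYSE: XYZ)" convention. The exchange list, the symbol pattern, the function name `extract_tickers`, and the sample release text (including its ticker symbols) are all invented for illustration, and a production version would need a far more complete set of exchanges and formats.

```python
import re

# Matches "(NYSE: XYZ)" style mentions. The exchange name is matched
# case-insensitively; the symbol must be 1-5 capitals with an optional
# ".X" class suffix. Both choices are simplifying assumptions.
TICKER_RE = re.compile(
    r"\(\s*((?i:NYSE|NASDAQ|AMEX))\s*:\s*([A-Z]{1,5}(?:\.[A-Z])?)\s*\)"
)

def extract_tickers(text):
    """Return unique (exchange, symbol) pairs in order of first appearance."""
    seen = []
    for exchange, symbol in TICKER_RE.findall(text):
        pair = (exchange.upper(), symbol)
        if pair not in seen:
            seen.append(pair)
    return seen

# Placeholder release text with made-up company names and symbols.
release = (
    "Example Corp (NYSE: EXMP) today announced a partnership with "
    "Sample Holdings (Nasdaq: SMPH). Example Corp (NYSE: EXMP) expects "
    "the deal to close next quarter."
)
print(extract_tickers(release))
```

The extracted pairs could then be handed to whatever charting or market data source the site already uses to render a graph next to the release.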
In our redesigned example of a BusinessWire press release, we simply inserted a graph from stock exchange data and included on the right-hand side some data from Freebase on the Securities and Exchange Commission, which could easily be extracted as an entity from the documentation using, for example, Alchemy API. These modest improvements to the page increased the “insightful analysis” review score by 15%.
Q2. Would you trust this site with your credit card?
This is one of the most difficult ideals to measure up to. E-Commerce sites, in general, perform better automatically, but there are clear distinctions between sites people trust and don’t trust. Press release websites do have an e-commerce component, so one would expect them to fare well compared to non-commercial sites. Unfortunately, this is just not the case. PR.com failed this question in what can only be described as epic fashion. 91% of users said they would not trust the site with their credit card details. This isn’t just a Panda issue for PR.com; this is a survival-of-the-business issue.
Luckily, there are some really clear, straightforward solutions to address this problem.
Extend HTTPS/SSL Sitewide
Not every site needs to have HTTPS enabled, but if you have a 600,000+ page site with e-commerce functionality, let’s just go ahead and assume you do. Users will immediately trust your site more if they see that pretty little lock icon in their browser.
Site Security Solutions
Take advantage of solutions like Comodo Hacker Proof or McAfee SiteAdvisor to verify that your site is safe and secure. Include the badges and link to them so that both users and the bots know that you have a safe site.
Business Reputation Badges
Use at least one trade group or business reputation group (like the Better Business Bureau) or, at minimum, employ some form of schema review markup that makes it clear to your users that at least some person or group of persons out there trusts your site. If you use a trade group membership or the BBB, make sure you link to them so that, once again, it is clear to the bots as well as your users.
Up-to-date Design
This is a clear issue time and time again. In the technology world, old means insecure. The site PR.com looks old-fashioned in every sense of the word, especially in comparison to the other press release websites. It is no wonder that it performs so horribly.
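Of the four fixes above, sitewide HTTPS is probably the easiest to audit programmatically. As a rough sketch, assuming you already have a list of absolute page URLs (from a sitemap or crawl, say), something like the following would flag the pages still served over plain HTTP. The function name `insecure_urls` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def insecure_urls(urls):
    """Return the absolute URLs whose scheme is anything other than https."""
    # urlsplit lowercases the scheme, so "HTTPS://..." is treated as secure.
    return [url for url in urls if urlsplit(url).scheme != "https"]

# Placeholder URLs; in practice these might come from a sitemap or a crawl.
pages = [
    "https://www.example.com/",
    "http://www.example.com/news/release-1",
    "https://www.example.com/checkout",
]
for url in insecure_urls(pages):
    print("Not HTTPS:", url)
```

Note that this only inspects the URLs themselves. It does not check certificates, redirects, or mixed content inside a page, all of which would need an actual crawl.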
It is worth pointing out here that Google doesn’t need to find markup on your site to come to the conclusion that your site is untrustworthy. Because the Panda algorithm likely takes into account engagement metrics and behaviors (like pogo sticking), Google can use the behavior of users to predict the performance on these questions. So, even if there isn’t a clear path between a change you make on your site and Googlebot’s ability to identify that change, that doesn’t mean the change cannot and will not have an impact on site performance in the search results. The days of thinking about your users and the bots as separate audiences are gone. The bots now measure both your site and your audience. Your impact on users can and will have an impact on search performance.
Q3. Do you consider this site an authority?
This question is particularly difficult for sites that both don’t control the content they publish and have a wide variety of content. This places press release websites squarely in the bullseye of the Panda algorithm. How does a website that accepts thousands of press releases on nearly any topic dare claim to be an authority? Well, it generally doesn’t, and the numbers bear that out. 75% of respondents wouldn’t consider PRNewswire an authority.
Notice, though, that Wikipedia performs poorly on this metric as well (at least compared to EDUs and GOVs). So what exactly is going on here? How can a press release site hope to escape from this authority vacuum?
Topically Segment Content
This was one of the very first reactions to Panda. Many of the sites that were hit with Panda 1.0 sub-domained their content into particular topic areas. This seemed to provide some relief but was never a complete or permanent solution. Whether you segment your content into sub-directories or sub-domains, what you are really doing here is helping make clear to your users that the specific content they are reading is part of a bigger piece of the pie. It isn’t some random page on your site; it fits in nicely with your website’s stated aims.
Create an Authority
Just because you don’t write the content for your site doesn’t mean you can’t be authoritative. In fact, most major press release websites have some degree of editorial oversight sitting between the author and the website. That editorial layer needs to be bolstered and exposed to the end user, making it obvious that the website does more than simply regurgitate the writing of anyone with a few bucks.
So, what exactly would this look like? Let’s return to the BusinessWire press release we were looking at earlier. We started with a bland page composed of almost nothing but the press release. We then added a graph and some structured data automagically. Now, we want to add in some editor creds and topic segmentation.
Notice in the new design that we have created the “Securities & Investment Division” and added an editor with a fancy title, “Business Desk Editor”, and a credentialed by-line. You could even use authorship publisher markup. The page no longer looks like a sparse press release but like an editorially managed piece of news content in a news division dedicated to this subject matter. Authority done.
Q4. Would you consider bookmarking/sharing this site?
When I look at this question, I am baffled. Seriously, how do you make a site in which you don’t control the content worth bookmarking or sharing? Furthermore, how do you do this with overtly commercial, boring content like press releases? As you could imagine, press release sites fare quite poorly on this. Over 85% of respondents said they weren’t interested at all in bookmarking or sharing content from PRWeb.com. And why should they?
So, how exactly does a press release website encourage users to share? The most common recommendations are already in place on PRWeb. They are quite overt with the usage of social sharing and bookmarking buttons (placed right at the top of the content). Their content is constantly fresh because new press releases come out every day. If these techniques aren’t working, then what will?
The problem with bookmarking and sharing on press release websites is two-fold. First, the content is overtly commercial, so users don’t want to share it unless the press release is about something truly interesting. Second, the content is ephemeral, so users don’t want to return to it. We have to solve both of these problems.
Unfortunately, I think the answer to this question is some tough medicine for press release websites. The solution is multi-faceted. It starts with putting a meta expires tag on press releases. Sorry, but there is no reason for PRWeb to maintain a 2009 press release about a business competition in the search results. In its place, though, should be company and/or categorical pages which thoughtfully index and organize archived content. While LumaDerm may lose their press release from 2009, they would instead have a page on the site dedicated to their press releases so that the content is still accessible, albeit one click away, and the search engines know to ignore it. With this solution, the pages that end up ranking in the long run for valuable words and phrases are the aggregate pages that truly do offer authoritative information on what is up-and-coming with the business. The page is sticky because it is updated as often as the business releases new information. You still get some of the shares out of new releases, but you don’t risk the problems of PR sprawl and crawl prioritization. Aside from the initial bump of fresh content, there is no good SEO reason to keep old press releases in the index.
So, I don’t own a press release site…
Most of us don’t run sites with thousands of pages of low quality content. But that doesn’t mean we shouldn’t be cognizant of Panda. Of all of Google’s search updates, Panda is the one I respect the most. I respect it because it is an honest attempt to measure quality. It doesn’t ask how you got to your current position in the search results (a classic genetic fallacy problem); it simply asks whether the page and site itself deserve that ranking based on human quality measures (as imperfect as it may be at doing so). Most importantly, even if Google didn’t exist at all, you should aspire to have a website that scores well on all of these metrics.
Having a site that performs well on the Panda questions means more than insulation from a particular algorithm update; it means having a site that performs well for your users. That is a site you want to have.
Take a look again at the questionnaire. Does your site honestly meet these standards? Ask someone unbiased. If your site does, then congratulations, you have an amazing site. But if not, it is time to get to work building the site that you were meant to build.
About russvirante —
I am the CTO of Virante, Inc. I am married to Morgan, who is frickin awesome, and I have two daughters Claren and Aven who are also frickin awesome. We live happily in Durham, NC.
Virante, Inc. is a full service Search, Social and Analytics Consulting Company.