How do TripAdvisor and Yelp detect fake reviews?

I don’t know how they do it, but here’s how I believe they do (or should).

  • Community policing through flagging of bad reviews
  • Batch rules that scan review text against word lists and regexps (looking for links, for example), crossed with account parameters (time on site, email type, other provided details) and behavior parameters (frequency and size of posts, text quality, browsing pattern)
  • Linking mechanism detecting account connections and repeat users to identify sock puppets, an attack on a store, or a store boosting its own reputation
  • Velocity mechanism to identify emerging trends (a certain store getting bombarded, a certain IP overrepresented, etc.)
  • Possibly: crawler for posted links, evaluating the website after the jump for content
  • Possibly: semantic analysis of the actual content, specifically focusing on semantic fields, grammar and punctuation analysis
  • Possibly: statistical models pointing at possible suspected reviews
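
To make the first few items concrete, here is a minimal sketch of a batch rule, a velocity check and a linking check. It is my own illustration, not anything TripAdvisor or Yelp are known to run: the `Review` fields, the `SPAM_WORDS` list, the seven-day "new account" cutoff and the hourly buckets are all assumptions picked for the example.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical word list and link pattern; real rules would be tuned on data.
SPAM_WORDS = {"cheap", "discount", "guaranteed"}
LINK_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


@dataclass
class Review:
    account: str
    ip: str
    store: str
    text: str
    account_age_days: int  # assumed account parameter
    hour: int              # hour bucket in which the review was posted


def rule_flags(review):
    """Batch rule: text scan (links, word list) crossed with account age."""
    flags = []
    words = set(re.findall(r"[a-z']+", review.text.lower()))
    if LINK_RE.search(review.text):
        flags.append("link")
    if words & SPAM_WORDS:
        flags.append("spam_word")
    # Account age only matters once the text itself looks suspicious.
    if flags and review.account_age_days < 7:
        flags.append("new_account")
    return flags


def velocity_alerts(reviews, key, limit):
    """Velocity: (value, hour) buckets holding more than `limit` reviews."""
    counts = Counter((getattr(r, key), r.hour) for r in reviews)
    return {bucket for bucket, n in counts.items() if n > limit}


def linked_accounts(reviews):
    """Linking: IPs used by more than one account (possible sock puppets)."""
    by_ip = defaultdict(set)
    for r in reviews:
        by_ip[r.ip].add(r.account)
    return {ip: accounts for ip, accounts in by_ip.items() if len(accounts) > 1}
```

Calling `velocity_alerts(reviews, "store", limit)` looks for a store getting bombarded, while `velocity_alerts(reviews, "ip", limit)` looks for an overrepresented IP. A shared IP alone is weak evidence (offices, cafes and mobile carriers all share addresses), so in practice it would be one signal among several rather than a verdict.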

All of these lead to manual review by (most probably) off-shored staff who make the actual decision. Extra points for real-time rules and models and actual real-time decisions based on more than just posting velocity, but I’d be (really) pleasantly surprised to hear that those exist.
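
One way the hand-off to manual review could work is to weight whatever signals fired on a review and queue the highest-scoring ones first. The sketch below is purely illustrative: the signal names, the integer `WEIGHTS` and the threshold are made-up assumptions, not values from any real system.

```python
import heapq

# Hypothetical signal weights; a real system would learn or tune these.
WEIGHTS = {"link": 2, "spam_word": 1, "new_account": 2, "velocity": 3, "shared_ip": 3}


def score(signals):
    """Sum the weights of the distinct signals that fired; unknown signals count 0."""
    return sum(WEIGHTS.get(s, 0) for s in set(signals))


def build_review_queue(flagged, threshold=4):
    """Return review ids at or above `threshold`, highest score first.

    `flagged` maps a review id to the list of signals raised for it.
    """
    # Negate the score so the min-heap pops the highest score first.
    heap = [(-score(sigs), rid) for rid, sigs in flagged.items()
            if score(sigs) >= threshold]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Anything below the threshold is simply published here; the human reviewers only see the queue, which keeps the expensive manual step focused on the most suspicious reviews.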
