Fighting Link Spam in the trenches

(from DieSpammersDie on Tools to Make Tools)

The Problem

Because FeedMeLinks has a decent Google PageRank and a deliberately low barrier to entry, “link spammers” have discovered the site and are posting hundreds of almost identical URLs in order to promote other online properties. This disturbs users and dilutes the value of the site as an online resource of good links.

What are their methods?

Link-spammers appear to use three main tactics:

  • manually posting and tagging links
  • importing gamed bookmarks files (often exported from sites like del.i.c.iou.s)
  • using shotgunning tools like 0n1yw1re.c0m to spray links across many social-bookmarking sites at once

Some Distinctive Characteristics of Link-spammers:

  • They look alike, choosing (auto-generated?) names like username87 or HumpBackName
  • They spring from a single source, often creating multiple accounts from a single IP
  • They are tag-happy, often creating nearly as many tags as links, sometimes even MORE tags than links
  • They lie about their email addresses, registering accounts with bogus, non-working addresses
  • They repeat themselves, importing series of links to domains that are very similar in the “bottom half” (the trailing part of the hostname), e.g. URLs like www.crap-dongs-whatever.info, www.dingus-chomper-whatever.info, etc.
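
The “bottom half” similarity in the last point can be checked mechanically. What follows is only a rough sketch, not the code FeedMeLinks runs: the helper names (domain_tail, repeated_tails) and the min_count cutoff are made up for illustration, and "bottom half" is assumed to mean the last hyphen-separated token of the first hostname label plus everything after the first dot.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_tail(url):
    """Return the assumed "bottom half" of a URL's hostname.

    Example: www.crap-dongs-whatever.info -> whatever.info
    """
    # urlsplit only fills in the hostname when it sees "//", so add it
    # for bare hostnames such as "www.example.com".
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    first_label, _, rest = host.partition(".")
    tail_token = first_label.rsplit("-", 1)[-1]
    return f"{tail_token}.{rest}" if rest else tail_token


def repeated_tails(urls, min_count=2):
    """Map each tail seen at least min_count times to its count (hypothetical cutoff)."""
    counts = Counter(domain_tail(u) for u in urls)
    return {tail: n for tail, n in counts.items() if n >= min_count}
```

Running repeated_tails over one user's import and looking for a handful of tails that account for most of the links would be one way to surface this pattern; how many repeats count as suspicious is a judgment call.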

Steps already taken to burn out link-spammers

  • implement a one-button spammer-reporting ability for users to flag potential link-spammers
  • increase logging of potentially spammy activities like importing bookmarks and failed login attempts
  • confirm the email addresses of all new accounts via a mailed verification code so spammers can’t easily create multiple accounts in order to circumvent the “no duplicate links per user” rule
  • convert web forms from GET to POST to make automating activities more difficult for spammers
  • privatize duplicate links upon their addition. Per-user duplicates are already disallowed; global duplicates are allowed, but only one public copy of the link will be visible. This means no more safety in numbers for spammers who work in teams to create multiple accounts from different IP addresses, or who create multiple accounts from the same IP more than a week apart.
  • look for users importing bookmarks files with a tag-to-link ratio (T2LR) of 0.XX and block or snuff their import
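
As an illustration of the T2LR check in the last step, here is a minimal sketch. It assumes an import has already been parsed into (url, tags) pairs, that the ratio means distinct tags divided by distinct links, and that an import is suspicious when the ratio is at or above the cutoff. The function names are hypothetical, and the threshold is a required argument because the actual cutoff is not given above.

```python
def tag_to_link_ratio(bookmarks):
    """Return distinct tags / distinct links for an iterable of (url, tags) pairs."""
    links = set()
    tags = set()
    for url, url_tags in bookmarks:
        links.add(url)
        # Normalize tags so "Spam" and " spam " count once.
        tags.update(t.strip().lower() for t in url_tags if t.strip())
    if not links:
        return 0.0
    return len(tags) / len(links)


def should_block_import(bookmarks, threshold):
    """True when the import's T2LR is at or above the caller-supplied threshold."""
    return tag_to_link_ratio(list(bookmarks)) >= threshold
```

A very small import can have a high ratio for innocent reasons (one link with three tags), so a real check would probably also want a minimum number of links before it fires.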

Next steps to take

  • server-side validate all new usernames
  • run the import-link heuristics nightly to snuff manual link-spammers
  • upon linking: flag users who post URLs with lots of hyphens as possible link-spammers (?, v. Kailash Nadh’s Fighting Spam Blogs: A Hypothesis)
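
The hyphen idea in the last step might look something like the sketch below. It is untested against real FeedMeLinks data and rests on two assumptions: only hyphens in the hostname are counted (paths often contain harmless hyphenated slugs), and the max_hyphens default of 2 is an arbitrary placeholder, not a tuned value. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def hostname_hyphen_count(url):
    """Count hyphens in the hostname only, ignoring the path and query."""
    # urlsplit only fills in the hostname when it sees "//", so add it
    # for bare hostnames such as "www.example.com".
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    return host.count("-")


def looks_hyphen_heavy(url, max_hyphens=2):
    """True when the hostname has more than max_hyphens hyphens (placeholder cutoff)."""
    return hostname_hyphen_count(url) > max_hyphens
```

Since this is a flag rather than a block, a user would presumably need several hyphen-heavy links, or a hit on one of the other heuristics, before anyone acts on it.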