This post was originally posted on dougpete.wordpress.com and mirrored to www.commun-it.org/community/dougpete/weblog/. If you want it elsewhere, just ask. I just might say yes.
I had a nice email from a person unknown yesterday in response to a blog post that I had made. I thought a private email was a little out of the ordinary, so I asked why the response came by email rather than as a comment on the original entry. The person got back to me and said that option wasn’t available. I thought that odd, so I quickly checked my WordPress and commun-it accounts and, sure enough, the comment feature was available. Now, I have long since stopped posting my blog to FirstClass because our current version doesn’t support commenting, so I got back to the individual for more details.
Well, it turns out that the post he read was on a server I had never heard of before. Here comes another time-wasting exercise as I find the server, read the comment, scratch my head, and then run a whois lookup to find out what the heck is going on.
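As an aside for the curious: the whois lookup I ran is normally a one-liner at a terminal, but the underlying protocol (RFC 3912) is simple enough to sketch in a few lines of Python. This is just an illustration, not what I actually typed — the server name and the helper function are my own placeholders:

```python
import socket

def build_query(domain):
    """A WHOIS query is just the domain name followed by CRLF (RFC 3912)."""
    return domain.encode("ascii") + b"\r\n"

def whois_query(domain, server="whois.iana.org", port=43):
    """Open a TCP connection to a WHOIS server, send the query, read until EOF."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# e.g. print(whois_query("example.org"))
```

The IANA server above only knows about top-level delegations; a real lookup usually follows a referral to the registrar's own WHOIS server.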
Turns out — I’d been scraped.
Now, everyone who posts to the web ultimately gets scraped. That’s how search engines know who you are and where your content resides. When you do a search, the engine works its way through its databases, finds your results using whatever algorithm it employs, and returns them in the fashion it’s designed to. If you want to check out evidence of scraping, check out the Wayback Machine. I’ve even had to use my favourite search engine’s “archive” at least once to recover a web page that I had accidentally clobbered. This is good and very much appreciated.
Now, we help the cause along when we post our content to a blog or put an RSS feed on our web resources. We’re encouraging others to find us and stay current with our latest thoughts. That’s how it’s supposed to work. With the appropriate tools, we can check out the latest so easily, thanks to the miracle of RSS.
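For anyone wondering what a reader — or a scraper — actually consumes, an RSS feed is just XML, and pulling the items out takes only the standard library. A minimal sketch with a made-up sample feed (the titles and URLs are placeholders, not my real feed):

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for a real feed fetched over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>My Weblog</title>
    <link>https://example.org/blog</link>
    <item>
      <title>On Being Scraped</title>
      <link>https://example.org/blog/on-being-scraped</link>
    </item>
  </channel>
</rss>"""

def feed_items(rss_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in feed_items(SAMPLE_FEED):
    print(title, "->", link)
```

A scraper needs nothing more than this plus an HTTP fetch — which is exactly why republishing someone else’s feed is so trivially easy.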
In the process, I even found out about the million RSS project. I did my quick acid test for my favourite news service – Reuters – yep, it’s there.
A scraper goes one step further, though. It scrapes your content, places it on his or her own server, and then makes it available as a public resource. The difference between this and the original is that you lose control over the content, your words may be attributed to someone or something else, and people who elect to comment on a posting may not even get the opportunity. In my reading of others who had been scraped, knew about it, and commented, the reactions range from indignation to appreciation for the free promotion of the content. I spent far too much time reading about this. It is free promotion, but you do lose control over its presentation.
On the other hand, it’s kind of a kick to think that the ramblings from my keyboard in southwestern Ontario are now fodder for someone else somewhere else.
It points to the futility of copyright protection for the little guy. Scrape a big corporation and you’ve got a potential legal problem. Scrape me and what am I going to do? There are a couple of more serious issues, though. First, the content might end up being seen as created by someone else. Second, if the scraper is selling advertising on their website, you might end up being a spokesperson for who knows what product.
So, do you go silent? Do you include a sentence like I did above? I’m not egotistical enough to think that I’m being scraped by a person who actually likes my content enough to do it manually. (It would be a hoot if it was though! <grin>) Do you report the abuse to the scraper’s ISP? Do you follow the advice of others and create a dummy RSS feed just for the scraper? Do you include copyright notices?
Or, do you just ignore it?
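For what it’s worth, the "dummy RSS just for the scraper" option doesn’t take much: serve the scraper’s IP address a decoy feed whose only item points back home. A hypothetical sketch — the function name, wording, and URL are mine, not any standard recipe:

```python
import xml.etree.ElementTree as ET

def decoy_feed(original_url):
    """Build a one-item RSS 2.0 feed whose only entry points back at the real blog."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Scraped content notice"
    ET.SubElement(channel, "link").text = original_url
    ET.SubElement(channel, "description").text = (
        "This feed was republished without permission; the original lives at "
        + original_url)
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "Read this post at its original home"
    ET.SubElement(item, "link").text = original_url
    return ET.tostring(rss, encoding="unicode")

print(decoy_feed("https://dougpete.wordpress.com"))
```

The server-side piece — recognizing the scraper’s requests and routing them to this feed instead of the real one — depends on your host, so I’ve left it out.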
Blogged with Flock
Tags: RSS, copyright, scraping