Do you work hard to write original content for your website? Imagine how frustrating it is to find your articles reprinted on other websites without your permission, without crediting you as the author or even linking back to you. It is clearly theft, and it’s certainly not fair that someone tries to profit from your hard work. The good news is that there are some things you can do to fight content theft.
How does website content theft happen?
Most often, someone copies your pages outright to reuse them, usually with an automated program. This is called web scraping. Another method is importing your website’s RSS feed to display your blog post content on their own websites.
How can you detect website content theft or site scraping?
- The simplest place to start is to copy a unique sentence from your web page, wrap it in quotation marks, and search for it on Google. This is a manual process, but it’s free.
- Another free option is Google Alerts. Enter phrases from your content, and Google will email you links to matching pages at the frequency you choose.
- You could use a paid service like Copyscape to automate monitoring of your original content.
- Your Content Management System might report trackbacks and pingbacks when another site republishes a post that links to your pages, which can alert you to a scraped copy.
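If you check many sentences by hand, a small script can save some typing. The Python sketch below is only an illustration with hypothetical helper names: it builds a Google search URL for an exact-phrase query (assuming Google’s usual `https://www.google.com/search?q=` address format) and checks whether text you have already copied from a suspect page contains your sentence, ignoring differences in capitalization, spacing, and curly quotes.

```python
import re
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL for the sentence wrapped in quotation marks."""
    # The quotation marks ask Google to match the phrase exactly.
    return "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    text = text.lower()
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()


def contains_copied_sentence(page_text: str, sentence: str) -> bool:
    """True if the (already extracted) page text contains the sentence."""
    return normalize(sentence) in normalize(page_text)


if __name__ == "__main__":
    print(exact_phrase_search_url("It is clearly theft"))
    # prints https://www.google.com/search?q=%22It+is+clearly+theft%22
```

Open the printed URL in your browser to run the search yourself.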
What can you do to prevent content theft and web scraping?
Unfortunately, not much. If you put technical blocks in place, you might also keep out legitimate visitors and search engines. You could try to determine the offending IP address and block it from accessing your website. The way to block an IP address or a range of IP addresses depends on the type of web hosting you have, so it’s best to search Google for instructions for your setup. If you want to block entire countries, search Google for lists of IP address ranges by country. If you use a Content Management System, there may be a plugin/extension to make it easier.
- Paid services like Distil Networks help you block what they call “theft bots”.
- Cloudflare can help filter your visitor traffic, for example by challenging or blocking suspicious bots before they reach your site.
- If you use a Content Management System, you could use a plugin like Wordfence to help monitor visitors, identify the thief and block the IP from future visits.
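To find a likely scraper’s IP address, you can look in your web server’s access log for addresses that request far more pages than a normal visitor would. The Python sketch below assumes a log in the common or combined format used by Apache and Nginx, where the client IP is the first field on each line; the function names and the request threshold are hypothetical choices, not a standard. It only flags candidates and checks an address against CIDR ranges you list. The actual blocking still happens in your hosting control panel, firewall, or plugin. Search engine crawlers also make many requests, so check who owns an IP before you block it.

```python
import ipaddress
from collections import Counter


def requests_per_ip(log_lines):
    """Count requests per client IP (assumed to be the first field of each line)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def suspicious_ips(log_lines, threshold):
    """Return IPs with more than `threshold` requests, busiest first."""
    return [ip for ip, n in requests_per_ip(log_lines).most_common() if n > threshold]


def is_blocked(ip, blocked_ranges):
    """True if `ip` falls inside any CIDR range, e.g. "203.0.113.0/24"."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in ipaddress.ip_network(r, strict=False) for r in blocked_ranges)


if __name__ == "__main__":
    sample_log = [
        '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /post-1 HTTP/1.1" 200 5120',
        '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /post-2 HTTP/1.1" 200 4870',
        '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /post-3 HTTP/1.1" 200 6011',
        '198.51.100.2 - - [10/Oct/2023:14:02:11 +0000] "GET / HTTP/1.1" 200 3200',
    ]
    print(suspicious_ips(sample_log, threshold=2))  # prints ['203.0.113.7']
```

In practice you would read the lines from your real log file and pick a threshold that fits your site’s normal traffic.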
As for your RSS feed, here are a few ways to limit content theft:
- Your Content Management System will likely have a setting to publish only a summary and not the entire post. That way, anyone copying your feed only gets a short excerpt.
- In the body of your blog posts, include hyperlinks to other pages/posts on your website. If a post is scraped, those links come along with it, so the other site’s visitors might click through to your website.
- If you use a feed management tool (e.g., Google FeedBurner, FeedBlitz), there are settings to insert your author info in the feed.
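Summary-only feeds and author info are normally just settings in your CMS or feed tool, but the idea is simple enough to sketch. The hypothetical Python function below, assuming your post is plain text, keeps only the first few words and appends a line crediting you with a link back to the original post, so anything copied from the feed still points readers to your site.

```python
def feed_summary(post_text: str, post_url: str, author: str, max_words: int = 50) -> str:
    """Return a short excerpt of the post followed by an attribution line."""
    words = post_text.split()
    summary = " ".join(words[:max_words])
    if len(words) > max_words:
        summary += "..."  # signal that the full post lives elsewhere
    return f"{summary}\n\nOriginally published by {author} at {post_url}"


if __name__ == "__main__":
    print(feed_summary("one two three four", "https://example.com/p", "Jane", max_words=2))
```

The 50-word default is an arbitrary choice; most CMS summary settings let you pick the length or use a hand-written excerpt instead.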
Is there any recourse?
As much as you might want to beat the snot out of someone, it’s unlikely you’ll get street justice. So you may be best off doing the following:
- Report it to Google using the free Google Scraper Report
- Use a free tool like DomainTools to identify the web hosting company the offending website is using. Look for the server info, then contact that hosting company to report the theft. If they are reputable, they’ll look into the claim.
There’s no guarantee these steps will work, but at least you’ll be doing something instead of feeling completely helpless.
Hope this is helpful. If you have additional suggestions that work to prevent content theft, please add them in the comments below. Thanks.