Blogs | Srijan

Yes, Drupal Too Can Deal With The Duplicate Content!

Written by Urvashi Melwani | Mar 23, 2020 7:00:00 AM

Duplicate content has been a widespread problem on the internet for years. Copying and pasting a webpage's content has become child's play, and many people do it without realizing the issues it creates.

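Besides, keeping duplicate or near-identical copies of the same content online means you are competing against yourself, at the cost of your search engine visibility. Google says it does not penalize duplicate content as such unless it appears intended to manipulate search results, but duplicated pages can still split ranking signals and leave the search engine to choose which version to show.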

This blog sheds light on what causes duplicate content, its most common types, and the Drupal modules that can help enterprises deal with it.

What Causes Duplicate Content?

Duplicate content is generated when multiple versions of a single page are created. In layman's terms, this generally happens when two or more pages share the same or very similar content.

Often the duplication is unintentional: a user copies content from an existing web page without realizing it, yet the site still has to face the consequences.

The sources of duplicate content fall into two major categories:

  1. Malicious
    This comprises those scenarios where spammers post content from your website without your permission.
  2. Non-malicious
    The non-malicious duplicate content can have different origins.
    1. Discussion forums that generate both the standard as well as stripped-down pages (targeted for mobile users)
    2. Printer-only web page versions, or
    3. Same products displayed on multiple pages of the eCommerce site

Additionally, duplicate content can be either identical or merely similar. Given this, below are the five most common types of duplicate content:


1. Scraped Content

Some websites scrape content from other reputable websites thinking that an increased volume of pages on their site will be a good marketing strategy, irrespective of the relevance or creative spirit of that content. 

However, this fails to add value for users unless the site also provides additional useful services or content of its own. In fact, it may also lead to copyright infringement in some cases.

Some examples of scraping include:

  1. Sites that replicate and republish content from other sites without adding any original content or value to it
  2. Sites that copy content from other sites, tweak it a bit, and republish it
  3. Sites that reproduce content feeds from other sites without offering any benefit to the user
  4. Sites aimed at embedding content such as video, images, or other media from other sites without considerable added value to the user

 

2. WWW and Non-WWW, and HTTP and HTTPS Page Versions Of Website

When both versions of the site, i.e., WWW and non-WWW, are accessible, the same content is served under two different URLs, which leads to duplication.

This is one of the oldest causes of duplicate content, and search engines still get confused at times and pick the wrong version.

Another scenario is HTTP vs HTTPS: if both versions are accessible, they also serve duplicate content to users.
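
To make the problem concrete, here is a minimal Python sketch of how the four scheme and host variants of one page can be collapsed into a single preferred URL. The domain `example.com`, the preference for HTTPS with `www`, and the function name `canonical_url` are assumptions made for illustration; in practice this is usually handled by a server-side redirect rather than application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the site prefers https and the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}


def canonical_url(url: str) -> str:
    """Map any scheme/host variant of the site to the preferred one."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return url  # not one of our hosts, so leave it untouched
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )


variants = [
    "http://example.com/book-y/",
    "http://www.example.com/book-y/",
    "https://example.com/book-y/",
    "https://www.example.com/book-y/",
]

for variant in variants:
    # Every variant maps to https://www.example.com/book-y/
    print(variant, "->", canonical_url(variant))
```

A search engine sees the four variants as four separate URLs until something, such as a permanent redirect to the preferred version, tells it otherwise.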

 

3. Printer-Friendly Versions

If your CMS generates printer-friendly pages that are linked from your article pages, Google can easily find them unless you specifically block them.

Now, which version would you like Google to show? The one with your ads and peripheral content, or the one that shows your article only?
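
One common way to block such pages is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to check which URLs a crawler would be allowed to fetch under such a rule. The `/print/` path prefix and the `example.com` URLs are hypothetical; your CMS may expose printer-friendly pages under a different path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of printer-friendly pages.
# The /print/ prefix is an assumption, not a path taken from any specific CMS.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/book-y/"))       # article page
print(is_crawlable("https://www.example.com/print/book-y"))  # printer-friendly page
```

With this rule in place, only the full article page remains open to crawlers.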

 

4. User Session IDs 

To keep track of your visitors and allow them to add, store, and buy products from their shopping cart, you need to give them a session.

A session is a brief record of what the visitor did on your site, and it can also contain things like the items in their shopping cart.

To retain that session as a visitor hops from one page to another, the unique identifier for that session, called the Session ID, needs to be stored somewhere. The usual place to do so is a cookie. However, search engines don't usually store cookies.

In that case, some systems fall back to putting the Session ID in the URL. This means that every internal link on the website gets that Session ID added to its URL, and since the Session ID is unique to that session, each visit generates a new set of URLs, and thus duplicate content.


5. URL Parameters Used For Tracking and Sorting

Another reason for duplicate content is URL parameters that don't change the content of a page, for example, in tracking links:

http://www.knowledge.com/book-y/

http://www.knowledge.com/book-y/?source=rss 

To a search engine, the two URLs shown above are not the same URL.

Although the latter allows you to track the source of traffic on your site, it might also make it harder for you to rank well, an unwanted side effect!

This is not limited to tracking parameters; it applies to every parameter that you can add to a URL without changing the important information. Whether the parameter is used for "changing the sorting on a set of products" or for "showing another sidebar", all of them cause duplicate content.
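
As a rough illustration, the Python sketch below strips parameters that do not change the page content, so that the tracked and untracked URLs from the example above reduce to the same address. The parameter names in `NON_CONTENT_PARAMS` (including a session ID parameter, as described in the previous section) are assumptions; every site would need its own list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site, these parameters never change
# the main content of a page.
NON_CONTENT_PARAMS = {"source", "sort", "sidebar", "sessionid"}


def strip_non_content_params(url: str) -> str:
    """Drop query parameters that do not affect the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


plain = strip_non_content_params("http://www.knowledge.com/book-y/")
tracked = strip_non_content_params("http://www.knowledge.com/book-y/?source=rss")
print(plain == tracked)  # both reduce to http://www.knowledge.com/book-y/
```

Parameters that do change the content, such as a page number, are kept, so `?page=2&sort=price` would reduce to `?page=2`.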

Modules That Help Deal with Duplicate Content in Drupal

The following Drupal modules can prove useful in dealing with duplicate content:

  1. Global Redirect Module

    The issue with the alias system in Drupal is that the default URL remains available, i.e., you can still find two URLs pointing to the same content on your website. Search engine bots can easily detect this duplicate content, which may lower your website's rank on search engines.

    Thus, the Global Redirect module checks whether an alias already exists for the requested URL and, if there is one, redirects to the alias URL.

    The module is also responsible for removing the trailing slash from the URL, checking that clean URLs are being used correctly, and checking permissions and access to nodes and URLs.
  2. Pathauto

    One of the prominent modules of Drupal, Pathauto automatically creates path/URL aliases for content (nodes, taxonomy terms, users) based on configurable patterns.

    For example, if you configure a pattern so that blog entries get paths like /category/my-node-title, Pathauto will automatically generate an SEO-friendly URL for each new entry. The patterns use tokens and can be altered by administrators.
  3. Intelligent Content Tools

    An important tool for website designers and content editors, the Intelligent Content Tools module offers three functionalities:
    1. Auto-Tagging
    2. Text Summarization, and
    3. Identifying Duplicate Content

      This smart module, based on Natural Language Processing, keeps you up to date on any duplicate content present on the site, and accordingly identifies and corrects the plagiarized content.

      However, this module is not covered by Drupal's security advisory policy.

  4. Taxonomy Unique

    Drupal, by default, allows its users to create identical terms in the same vocabulary. To resolve this, Taxonomy Unique ensures that no taxonomy term is saved when one with the same name already exists in the same vocabulary. Thus, it ensures that the saved names are unique. Further, you can configure it individually for each vocabulary and set up custom error messages for when a duplicate is found.
  5. Suggest Similar Titles

    The Suggest Similar Titles module helps ensure that titles are not duplicated for any type of content. It works by matching the proposed title against the node titles of existing content of the same type and flagging any that are similar. This helps admins and users avoid replicating content on the site.

    Additionally, it provides a settings page where you can tweak the following settings:

    1. Activate this feature for any content type(s)
    2. Enter the keywords in title comparison that you want to be ignored
    3. Choose the maximum number of titles that you want to show up as a suggestion
    4. Select whether this module should consider node permissions before showcasing node title as a suggestion
    5. You can enter the percentage of similarity between the titles. For instance, if you enter 68, then at least 68% matching titles will be considered similar.
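
To show how settings like these might interact, here is a minimal Python sketch of percentage-based title matching. It is not the module's actual implementation: the use of `difflib.SequenceMatcher` as the similarity measure, the function names, and the sample titles are all assumptions, and the node permission check is left out.

```python
from difflib import SequenceMatcher


def normalize(title: str, ignored_keywords: frozenset) -> str:
    """Lowercase the title and drop the keywords that should be ignored."""
    words = [w for w in title.lower().split() if w not in ignored_keywords]
    return " ".join(words)


def suggest_similar_titles(
    proposed: str,
    existing_titles: list,
    ignored_keywords: frozenset = frozenset(),
    threshold_percent: float = 68,
    max_suggestions: int = 5,
) -> list:
    """Return existing titles at least threshold_percent similar to proposed."""
    target = normalize(proposed, ignored_keywords)
    scored = []
    for title in existing_titles:
        candidate = normalize(title, ignored_keywords)
        # SequenceMatcher is a stand-in similarity measure (an assumption).
        score = SequenceMatcher(None, target, candidate).ratio() * 100
        if score >= threshold_percent:
            scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [title for _, title in scored[:max_suggestions]]


existing = ["Dealing with Duplicate Content in Drupal", "Release notes"]
print(suggest_similar_titles("Dealing With Duplicate Content In Drupal", existing))
```

Here the proposed title differs from an existing one only in capitalization, so it is suggested as a likely duplicate, while the unrelated "Release notes" falls below the 68% threshold.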

Wrapping up

Search engines are always looking for unique, quality content that is engaging and informative at the same time. Enterprises might find it difficult to create a website that is 100% free of duplication, but Drupal and its modules can go a long way toward it. Besides, your search engine rankings can improve greatly if you avoid the mistakes mentioned above.

Planning to migrate your website to Drupal? Drop us a line and our experts will be happy to assist you.