Website addresses are technically called URIs (Uniform Resource Identifiers). A URL (Uniform Resource Locator) is a type of URI that also describes how to locate the resource, and in everyday usage the two terms are used interchangeably. Whether site addresses are referred to as URLs or URIs, most search engines consider them to be a very important factor when ranking a website.
Strategically structured URLs can greatly influence a site’s ranking in search results. Conversely, poor URL structures can seriously damage site rankings, and in extreme cases even result in the site being dropped entirely from a search engine’s index.
URL Structure
Consideration of URL structure should cover three main aspects: the domain name and its subdomains, the presence of keywords, and the framework of files and directories.
Domain Names
The domain name is one of the primary factors Google and other search engines consider when ranking a website. Making brand names, important keywords or corporate identity part of the domain name is one of the most effective ways to help a site outrank competing sites that target the same keywords.
However, some rules do apply:
- Domain names with multiple dashes (-) do not typically rank as well as one- or two-word domain names.
- Older, more authoritative websites tend to outrank newer ones, as they are perceived to be the originators of the trademark term or keyword.
- Most search engines can detect the individual words in a multi-word domain name, with or without dashes.
Subdomains
Google treats individual subdomains as almost independent websites, and will even rank multiple subdomains of the same site on the first page of the search results for relevant keywords.
As a general rule, create a subdomain separate from the main domain when it presents content in a different language or for a different geographic market, or when it represents a different vertical segment or product group of the business. Common examples are uk.yahoo.com alongside www.yahoo.com, or search.live.com alongside favorites.live.com.
Using Keywords In The URL
Everything is good in moderation, and this is especially true for SEO. Using keywords in the URL is a good idea. Descriptive site, directory and file names help from a marketing point of view: they are easier for customers to remember and convey the gist of the content immediately. They also signal to search engines the principal keywords that summarise the page content.
Rules for using keywords in the URL are:
- Excessive keyword usage in the URL is widely treated as a sign of spam and should be avoided.
- Separate keywords with hyphens (-) in directory and file names.
- Do not use the underscore (_) character unless you want the words to be treated as one composite term. For example, london_hotels is treated as “london_hotels” and not “london hotels”.
- Keyword importance is greatest in the domain name, followed by the subdomain, then directories and subdirectories, and is least in the file name.
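The hyphen and underscore rules above can be applied automatically when directory or file names are generated from page titles. The Python below is a minimal sketch, not a standard library routine: the function name `slugify` and the five-word cap (a hypothetical guard against keyword stuffing) are assumptions made for illustration.

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 5) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters, then drop the non-ASCII marks (é -> e).
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Keep only runs of letters and digits; underscores, spaces and
    # punctuation all act as word separators.
    words = re.findall(r"[a-z0-9]+", text)
    # Hypothetical cap on word count to keep the URL short and avoid stuffing.
    return "-".join(words[:max_words])


print(slugify("Cheap London Hotels"))  # cheap-london-hotels
print(slugify("london_hotels"))        # london-hotels
```

Treating the underscore as a separator is what turns `london_hotels` into the two keywords `london-hotels`.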
File and Directory Framework
Because keywords carry less weight in file names than in directory names, a well-thought-out directory structure is essential. A directory name is also easier to remember, as it doesn’t include technical-sounding file extensions. In addition, using directories alone to specify page locations lets webmasters migrate from HTML pages to PHP pages without tedious redirects or server-side scripting.
In short, it is a good idea to use many directories rather than many files within a single directory, and to always link to directories without file names or extensions. For example:
www.accuracast.com/articles/ is better than www.accuracast.com/seo-weekly.php
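The HTML-to-PHP migration point can be illustrated with a small sketch of how a web server typically serves a directory URL from a default document. The file names in `INDEX_FILES`, their lookup order and the function name `resolve` are assumptions for this example; real servers configure this themselves.

```python
from __future__ import annotations

# Hypothetical order in which the server looks for a directory's default document.
INDEX_FILES = ("index.php", "index.html")


def resolve(path: str, existing_files: set[str]) -> str | None:
    """Map a requested URL path to the file that should serve it."""
    if not path.endswith("/"):
        # A file-style URL only works while that exact file exists.
        return path if path in existing_files else None
    # A directory-style URL is served by whichever default document exists.
    for name in INDEX_FILES:
        candidate = path + name
        if candidate in existing_files:
            return candidate
    return None


before = {"/articles/index.html"}
after = {"/articles/index.php"}
print(resolve("/articles/", before))  # /articles/index.html
print(resolve("/articles/", after))   # /articles/index.php
```

The public link `/articles/` stays the same before and after the switch to PHP, whereas a link to `/seo-weekly.php` would break if that file were renamed.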
Canonicalization And Duplicate Content
Canonicalization is a computer science term referring to the conversion of data that has more than one representation into a single, “canonical” form. In SEO, the canonicalization of URLs refers to the representation of a page with one standard URL, rather than in multiple formats.
This is best explained with an example. Consider the URL https://www.accuracast.com/services/. It can often be reached through all of the following variants:
- https://www.accuracast.com/services/
- https://www.accuracast.com/services
- http://accuracast.com/services/
- http://accuracast.com/services
- https://www.accuracast.com/services/?referrer=google
- https://www.accuracast.com/services?referrer=google and so on
All of these versions lead to the same page. However, search engines could treat them as six different pages with identical content, because strictly speaking the protocol, the presence or absence of www, the trailing slash and the query-string variables could each produce different content. This is what gives rise to URL canonicalization issues.
Multiple versions of the same web page can mislead search engines into believing that the site contains a lot of duplicate content, and some or all of these pages could be dropped from the search index.
Google Webmaster Tools now allows website owners to specify which version of the URL should be used. However, the most foolproof method of avoiding any URL canonicalization problems is to use only one standard reference format and to implement server-side redirects for all other formats. Hence, in the example above, any URL written in a form other than https://www.accuracast.com/services/ should redirect to the standard format on the server itself.
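The server-side redirect rule can be sketched in Python using only `urllib.parse`. This is a minimal illustration under stated assumptions, not a drop-in server configuration: it assumes the preferred form uses https, the www host and a trailing slash (so it suits directory-style URLs only), and that `referrer` is a tracking-only parameter that can be dropped. The names `canonicalize`, `redirect_for`, `CANONICAL_HOST` and `TRACKING_PARAMS` are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for this sketch: the preferred host, and parameters that never
# change the page content and can therefore be dropped.
CANONICAL_HOST = "www.accuracast.com"
TRACKING_PARAMS = {"referrer"}


def canonicalize(url: str) -> str:
    """Rewrite any variant of a directory-style URL into the standard form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"  # assumes every page lives at a directory-style URL
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    # Force https and the www host; drop the fragment, which servers never see.
    return urlunsplit(("https", CANONICAL_HOST, path, urlencode(kept), ""))


def redirect_for(url: str) -> tuple[int, str] | None:
    """Return (301, canonical URL) when the requested URL is not canonical."""
    target = canonicalize(url)
    return None if target == url else (301, target)


for variant in ("http://accuracast.com/services",
                "https://www.accuracast.com/services?referrer=google",
                "https://www.accuracast.com/services/"):
    print(variant, "->", redirect_for(variant))
```

Each variant from the list above maps to a single 301 target, and the standard URL itself returns `None`, so no redirect loop is created.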
Dynamic Website Addresses
As illustrated in the example above, dynamic website addresses that use variables can create duplicate content issues, because the same page can be reached through several non-canonical URLs. There is another problem associated with using variables in a web page’s address: there is a limit to how many variables search engines will follow when indexing a new URL.
Google claims to currently index websites with up to 5 variables in the address. This, however, is not always true. Webmasters using many popular shopping cart solutions often find that most of their online catalog is not being indexed, because search engines cannot access pages with too many variables in the URL.
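Auditing a catalog for this problem starts with counting the variables in each address. The sketch below uses Python’s `urllib.parse`; the function names and the example shop URL are hypothetical, and the default limit of three mirrors the recommendation that follows rather than any published search engine threshold.

```python
from urllib.parse import parse_qsl, urlsplit


def count_variables(url: str) -> int:
    """Count the name=value pairs in a URL's query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def too_many_variables(url: str, limit: int = 3) -> bool:
    """Flag URLs whose variable count exceeds the chosen limit."""
    return count_variables(url) > limit


url = "https://shop.example.com/catalog.php?cat=12&sub=4&sort=price&page=2&sid=abc"
print(count_variables(url))     # 5
print(too_many_variables(url))  # True
```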
The best way to deal with dynamic website URLs is:
- Avoid using variables completely. Where the content management system (CMS) permits it, use directories to categorise pages in the online catalog.
- Where variables must be used, ensure that the number of variables is kept to the bare minimum, and never more than three.
- Do not use the variable “id=”.
- Implement URL rewriting on the server if you must use variables, especially if the number of variables used in the URI is three or more.
- Link to pages using the search-friendly URI (created with the URL rewrite), rather than the original URI with all the variables.
- Use XML Sitemaps to help the search engines find all the pages that are not currently being indexed.
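URL rewriting, as recommended above, maps a search-friendly address onto the variable-laden one the CMS actually understands. Production sites usually do this in the web server’s rewrite module; the Python below only illustrates the mapping. The script name `/catalog.php`, the parameter names `category` and `product` (chosen to avoid “id=”), and both function names are hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

# Hypothetical internal script and the friendly pattern visitors see.
INTERNAL_SCRIPT = "/catalog.php"
FRIENDLY_PATTERN = re.compile(r"^/([a-z0-9-]+)/([a-z0-9-]+)/$")


def friendly_url(category: str, product: str) -> str:
    """Build the search-friendly address that internal links should use."""
    return f"/{category}/{product}/"


def rewrite(path: str) -> str | None:
    """Translate a friendly path into the internal URL the CMS understands."""
    match = FRIENDLY_PATTERN.match(path)
    if match is None:
        return None
    category, product = match.groups()
    return f"{INTERNAL_SCRIPT}?{urlencode({'category': category, 'product': product})}"


link = friendly_url("shoes", "red-trainers")
print(link)           # /shoes/red-trainers/
print(rewrite(link))  # /catalog.php?category=shoes&product=red-trainers
```

Visitors and search engines only ever see `/shoes/red-trainers/`; the variables stay internal to the server.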
URL Redirection
Migrating websites or web pages can be a harrowing experience for even the most seasoned webmasters. Drops in ranking are almost inevitable, so minimising the downtime is the best one can do when pages or websites must be moved. The best way to do so is to use a 301 (permanent) redirect.
Server-side redirects should be implemented for each page that has moved, and for the root domain too.
Temporary 302 redirects should be used only when a page needs to be redirected temporarily, as the search engines will continue to list the original URL in the results.
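The difference between the two redirect types can be sketched with Python’s standard `http` modules. This is an illustration, not a production server: the redirect tables and their paths are hypothetical, and `self.path` is matched exactly, including any query string.

```python
from __future__ import annotations

from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical tables: pages that moved for good, and pages diverted briefly.
PERMANENT = {"/seo-weekly.php": "/articles/seo-weekly/"}
TEMPORARY = {"/offers/": "/offers/maintenance/"}


def redirect_response(path: str) -> tuple[HTTPStatus, str] | None:
    """Choose a 301 for moved pages and a 302 for temporary diversions."""
    if path in PERMANENT:
        return HTTPStatus.MOVED_PERMANENTLY, PERMANENT[path]  # 301
    if path in TEMPORARY:
        return HTTPStatus.FOUND, TEMPORARY[path]  # 302
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Serves only redirects; every other path gets a 404."""

    def do_GET(self) -> None:
        result = redirect_response(self.path)
        if result is None:
            self.send_error(HTTPStatus.NOT_FOUND)
            return
        status, location = result
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()
```

The handler could be served with `http.server.HTTPServer(("", 8000), RedirectHandler).serve_forever()` for local testing.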
About the Author
Farhad is the Group CEO of AccuraCast. With over 20 years of experience in digital, Farhad is one of the leading technical marketing experts in the world. His specialities include digital strategy, international business, product marketing, measurement, marketing with data, technical SEO, and growth analytics.