Proper SEO and the Robots.txt File
AU NATURAL | By Mark Jackson, Search Engine Watch, Aug 12, 2008
When it comes to SEO, most people understand that a Web site must have content, "search engine friendly" site architecture/HTML, and meta data -- i.e., title tags, meta description, and meta keywords tags.
But lately, I'm seeing a lot of "optimized" Web sites that have completely ignored the robots.txt file. When optimizing a Web site, don't overlook the power of this little text file.
What is a Robots.txt File?
Simply put, if you go to domain.com/robots.txt, you should see a list of directories of the Web site that the site owner is asking the search engines to "skip" (or "disallow"). However, if you're not careful when editing it, you could put directives in that file that really hurt your business.
There's tons of information about the robots.txt file available at the Web Robots Pages, including the proper usage of the disallow feature, and blocking "bad bots" from indexing your Web site.
The general rule of thumb is to make sure a robots.txt file exists at the root of your domain (e.g., domain.com/robots.txt). To exclude all robots from indexing part of your Web site, your robots.txt file would look something like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

The above syntax tells all robots not to index the /cgi-bin/, /tmp/, and /junk/ directories on your Web site.
Real-Life Examples of Robots.txt Gone Wrong
I recently reviewed a Web site that had a good amount of content and several high quality backlinks. But, the Web site had virtually no presence in the SERPs. What happened? Well, the site's owner had included a disallow to "/". They were telling the search engine robots not to crawl any part of the Web site.
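That single character is all it takes. A robots.txt file that blocks every compliant crawler from the entire site looks like this:

User-agent: *
Disallow: /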
In another case, an SEO company edited the robots.txt file to disallow indexing of every part of a Web site after the site's owner stopped paying the SEO company.
And just yesterday, I reviewed a company's Web site and noticed that several directories from their former site were disallowed in their robots.txt file. Instead of blocking the search engines from the old legacy pages, the company should have set up 301 permanent redirects to pass the value from those old pages to the new ones. As it was, all of that value was lost.
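As a sketch of the right approach, assuming an Apache server (the paths here are hypothetical), a 301 redirect can be declared in an .htaccess file using mod_alias:

# Permanently redirect a legacy page to its replacement (hypothetical paths).
Redirect 301 /old-section/old-page.html http://www.domain.com/new-section/new-page.html

Each legacy URL with inbound links deserves its own redirect, so the link value flows to the page that replaced it.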
Robots.txt Dos and Don'ts
For SEO purposes, there are many good reasons to stop the search engines from indexing certain directories of a Web site while allowing others. Let's look at some examples.
Here's what you should do with robots.txt:
- Take a look at all of the directories in your Web site. Most likely, there are directories that you'd want to disallow the search engines from indexing, including directories like /cgi-bin/, /wp-admin/, /cart/, /scripts/, and others that might include sensitive data.
- Stop the search engines from indexing certain directories of your site that might include duplicate content. For example, some Web sites have "print versions" of Web pages and articles that allow visitors to print them easily. You should only allow the search engines to index one version of your content (see the sketch after this list).
- Make sure that nothing stops the search engines from indexing the main content of your Web site.
- Look for certain files on your site that you might want to disallow the search engines from indexing, such as certain scripts, or files that might contain e-mail addresses, phone numbers, or other sensitive data.
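For instance, if the printer-friendly copies mentioned above lived in a hypothetical /print/ directory, blocking the duplicate versions would take just two lines:

User-agent: *
Disallow: /print/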
Here's what you should not do with robots.txt:
- Don't fill your robots.txt file with comments. Lines beginning with "#" are technically valid, but they're noise for anyone who reads the file.
- Don't list all your files in the robots.txt file. Listing the files allows people to find files that you don't want them to find.
- There's no "Allow" command in the standard robots.txt syntax, so there's no need to add one. Anything you don't disallow is crawlable by default.
By taking a good look at your Web site's robots.txt file and making sure the syntax is set up correctly, you'll avoid search engine ranking problems. And by disallowing the search engines from indexing duplicate content on your Web site, you can overcome duplicate content issues that might hurt your search engine rankings.
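One way to sanity-check your file before it costs you rankings is a minimal sketch using Python's standard-library urllib.robotparser (the domain here is a placeholder):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file (placeholder domain).
rp = RobotFileParser()
rp.set_url("http://www.domain.com/robots.txt")
rp.read()

# Confirm the main content is still crawlable...
print(rp.can_fetch("*", "http://www.domain.com/"))          # expect True
# ...and that the directories you meant to block really are blocked.
print(rp.can_fetch("*", "http://www.domain.com/cgi-bin/"))  # expect False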
One last note: if you aren't sure whether you've done this correctly, please consult an SEO specialist.
Biography
Mark Jackson, President and CEO of VIZION Interactive, joined the interactive marketing fray in early 2000. His journey began with Lycos/Wired Digital, where he managed several integrated marketing programs with a focus on the finance vertical and strategic programs involving Quote.com and Lycos Finance. Mark then worked with AOL/Time Warner on cross-platform marketing programs. After witnessing the bubble burst and its lingering effects on job stability (two layoffs unrelated to performance), Mark established an interactive marketing agency and has cultivated it into one of the most respected search engine optimization firms in the United States.
VIZION Interactive was founded on the premise that honesty, integrity, and transparency are the pillars of strong partnerships. VIZION Interactive provides search engine friendly Web design/development/content management systems, and interactive marketing solutions including organic search engine optimization, pay-per-click bid management, social media strategies, media planning/buying/strategy, Web analytics, accessibility/compliance retrofitting, custom application development, systems integration, e-mail marketing, and consulting.
Mark is a board member of the Dallas / Fort Worth Search Engine Marketing Association (DFWSEM) and a member of the Dallas / Fort Worth Interactive Marketing Association (DFWIMA).
Mark received a BA in Journalism/Advertising from The University of Texas at Arlington in 1993 and spent several years in traditional marketing (radio, television, and print) prior to venturing into all things "web".
Article Archives by Mark Jackson:
» Proper SEO and the Robots.txt File - August 12, 2008
» Top SEO Firms Paid for by the Following... - August 5, 2008
» Press Releases and Search Engine Optimization - July 29, 2008
» Usability and SEO - July 22, 2008
» Duplicate Content -- A True Story - July 15, 2008
» What are Good Links, Anymore? - July 8, 2008
» More Articles by Mark Jackson