Tuesday, August 12, 2008

au Natural: Proper SEO and the Robots.txt File

By Mark Jackson, Search Engine Watch, Aug 12, 2008

When it comes to SEO, most people understand that a Web site must have content, "search engine friendly" site architecture/HTML, and meta data -- i.e., title tags, meta description, and meta keywords tags.

But lately, I'm seeing a lot of "optimized" Web sites that have totally disregarded the robots.txt file. When optimizing a Web site, don't disregard the power of this little text file.

What is a Robots.txt File?

Simply put, if you go to domain.com/robots.txt, you should see a list of directories of the Web site that the site owner is asking the search engines to "skip" (or "disallow"). However, if you're not careful when editing a robots.txt file, you could add directives that really hurt your business.

There's a ton of information about the robots.txt file available at the Web Robots Pages, including the proper usage of the disallow feature, and blocking "bad bots" from indexing your Web site.

The general rule of thumb is to make sure a robots.txt file exists at the root of your domain (e.g., domain.com/robots.txt). To exclude all robots from indexing part of your Web site, your robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

The above syntax would tell all robots not to index the /cgi-bin/, the /tmp/, and the /junk/ directories on your Web site.
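
If you'd like to double-check rules like these before uploading them, one option is a quick test with Python's standard urllib.robotparser module. This is only a sketch: the bot name "ExampleBot" and the page URLs are made-up placeholders, and real search engine robots may interpret a file differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the rules let the given robot fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Placeholder URLs: three inside disallowed directories, one ordinary page.
for url in (
    "http://domain.com/cgi-bin/search.cgi",
    "http://domain.com/tmp/draft.html",
    "http://domain.com/junk/old.html",
    "http://domain.com/index.html",
):
    print(url, "allowed" if is_allowed(url) else "disallowed")
```

Only the last URL should come back as allowed, since the other three sit inside the disallowed directories.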

Real Life Examples of Robots.txt Gone Wrong

I recently reviewed a Web site that had a good amount of content and several high-quality backlinks. But the Web site had virtually no presence in the SERPs. What happened? Well, the site's owner had included a disallow for "/". They were telling the search engine robots not to crawl any part of the Web site.
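
A mistake like that is easy to catch automatically. The sketch below, again using Python's urllib.robotparser, asks whether a robot may fetch the root path "/". The function name blocks_whole_site and the bot name are hypothetical, and the check assumes simple prefix rules like the ones in this column.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt, user_agent="ExampleBot"):
    """Return True if the rules stop the robot from fetching "/".

    With plain prefix matching, only a rule such as "Disallow: /"
    matches the root path, so this flags a site-wide block.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


site_wide_block = "User-agent: *\nDisallow: /\n"
partial_block = "User-agent: *\nDisallow: /tmp/\n"

print(blocks_whole_site(site_wide_block))
print(blocks_whole_site(partial_block))
```

The first file shuts every robot out of the entire site; the second only hides /tmp/.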

In another case, an SEO company edited the robots.txt file to disallow indexing of all parts of a Web site after the site's owner stopped paying the SEO company.

And just yesterday, I reviewed a company's Web site and noticed that several directories that were part of their former site were disallowed in their robots.txt file. The company should have set up a 301 permanent redirect to pass the value from the old Web pages on the site to the new pages instead of disallowing the search engines from indexing any of the old legacy pages. Thus, all of the value was lost.
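
To make the 301 idea concrete, here is a minimal sketch of a permanent redirect written as a small Python WSGI application. Everything in it is an assumption for illustration: the old and new paths are invented, and on most sites the redirect would be set up in the Web server's own configuration rather than in application code.

```python
# Hypothetical mapping from legacy URLs to their replacements.
LEGACY_REDIRECTS = {
    "/old-catalog/widgets.html": "/products/widgets/",
    "/old-catalog/about.html": "/about/",
}


def legacy_redirect_app(environ, start_response):
    """Answer requests for legacy pages with a 301 to the new page."""
    path = environ.get("PATH_INFO", "/")
    new_location = LEGACY_REDIRECTS.get(path)
    if new_location is not None:
        # 301 tells robots the move is permanent.
        start_response("301 Moved Permanently", [("Location", new_location)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

The old URLs stay reachable to robots, and each one points at its replacement instead of being hidden behind a disallow.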

Robots.txt Dos and Don'ts

There are many good reasons to stop the search engines from indexing certain directories on a Web site and allowing others for SEO purposes. Let's look at some examples.

Here's what you should do with robots.txt:

  • Take a look at all of the directories in your Web site. Most likely, there are directories that you'd want to disallow the search engines from indexing, including directories like /cgi-bin/, /wp-admin/, /cart/, /scripts/, and others that might include sensitive data.
  • Stop the search engines from indexing certain directories of your site that might include duplicate content. For example, some Web sites have "print versions" of Web pages and articles that allow visitors to print them easily. You should only allow the search engines to index one version of your content.
  • Make sure that nothing stops the search engines from indexing the main content of your Web site.
  • Look for certain files on your site that you might want to disallow the search engines from indexing, such as certain scripts, or files that might contain e-mail addresses, phone numbers, or other sensitive data.
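
The checklist above can be sketched in a few lines of Python: build the file from a list of directories to block, then confirm that the pages you care about are still open to robots. The /print/ directory, the article path, and the bot name are hypothetical, so substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Directories to keep robots out of; /print/ stands in for a
# hypothetical "print version" directory with duplicate content.
BLOCKED_DIRS = ["/cgi-bin/", "/wp-admin/", "/cart/", "/scripts/", "/print/"]

# Main content that must stay crawlable (hypothetical paths).
MUST_STAY_CRAWLABLE = ["/", "/articles/robots-txt.html"]


def build_robots_txt(blocked_dirs):
    """Build one record that applies the disallows to every robot."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {directory}" for directory in blocked_dirs]
    return "\n".join(lines) + "\n"


def blocked_by_mistake(robots_txt, must_stay_crawlable, user_agent="ExampleBot"):
    """Return the important paths that the rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in must_stay_crawlable
        if not parser.can_fetch(user_agent, path)
    ]


robots_txt = build_robots_txt(BLOCKED_DIRS)
print(robots_txt)
print("Blocked by mistake:", blocked_by_mistake(robots_txt, MUST_STAY_CRAWLABLE))
```

An empty "blocked by mistake" list means nothing in the file is standing between the search engines and your main content.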

Here's what you should not do with robots.txt:

  • Don't use comments in your robots.txt file.
  • Don't list all your files in the robots.txt file. Listing the files allows people to find files that you don't want them to find.
  • Don't add an "Allow" line. The original robots.txt standard has no such directive (some search engines support it as an extension), and anything you don't disallow can be crawled by default.

By taking a good look at your Web site's robots.txt file and making sure that the syntax is set up correctly, you'll avoid search engine ranking problems. By disallowing the search engines from indexing duplicate content on your Web site, you can potentially overcome duplicate content issues that might hurt your search engine rankings.
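
For the syntax side, a rough check might look like the sketch below. It is a simplification, not a full validator: the function name lint_robots_txt is made up, it only knows the two fields used in this column (User-agent and Disallow), and it treats everything else as suspect.

```python
def lint_robots_txt(text):
    """Return a list of (line_number, problem) pairs for a robots.txt body."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        # Ignore anything after "#" and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, separator, value = line.partition(":")
        field = field.strip().lower()
        if not separator:
            problems.append((number, "missing ':' separator"))
        elif field == "user-agent":
            if not value.strip():
                problems.append((number, "User-agent has no value"))
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append((number, "Disallow before any User-agent"))
        else:
            problems.append((number, f"unrecognized field '{field}'"))
    return problems


# A broken file: the "*" has slipped onto the wrong line.
broken = "User-agent:\n* Disallow: /cgi-bin/\nDisallow: /tmp/\n"
for number, problem in lint_robots_txt(broken):
    print(f"line {number}: {problem}")
```

A file that comes back with no problems can still block the wrong directories, so it's worth pairing a check like this with a look at what the rules actually disallow.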

One last note: if you aren't sure whether you can do this correctly, please consult with an SEO specialist.

Biography
Mark Jackson, President and CEO of VIZION Interactive, joined the interactive marketing fray in early 2000. His journey began with Lycos/Wired Digital where he managed several integrated marketing programs with a focus in the finance vertical and strategic programs involving Quote.com and Lycos Finance. Mark then worked with AOL/Time Warner on cross platform marketing programs. After having witnessed the bubble burst and its lingering effects on stability on the job front (two layoffs which were not related to performance), Mark established an interactive marketing agency and has cultivated it into one of the most respected search engine optimization firms in the United States.

VIZION Interactive was founded on the premise that honesty, integrity, and transparency forge the pillars that strong partnerships should be based upon. VIZION Interactive provides search engine friendly web design/development/content management systems, interactive marketing solutions including organic search engine optimization, pay per click bid management, social media strategies, media planning/buying/strategy, web analytics, web design/development accessibility/compliance retrofitting, custom application development, systems integration, email marketing and consulting.

Mark is a board member of the Dallas / Fort Worth Search Engine Marketing Association (DFWSEM) and a member of the Dallas / Fort Worth Interactive Marketing Association (DFWIMA).

Mark received a BA in Journalism/Advertising from The University of Texas at Arlington in 1993 and spent several years in traditional marketing (radio, television, and print) prior to venturing into all things "web".

