Old 03-30-2009, 07:25 PM   #1 (permalink)
Top Level Poster
 
Join Date: Feb 1975
Location: Irvine, CA
Posts: 445
Thanks: 2
Thanked 6 Times in 6 Posts
Default robots.txt

I have heard that this file can stop a crawler from following links. What are the benefits of keeping a robot from scanning a certain page? Is anyone currently using it?
__________________
www.affiliateprograms.com
GregK is offline  
Old 04-01-2009, 02:54 AM   #2 (permalink)
Senior Member
 
Join Date: Apr 2008
Posts: 22
Thanks: 0
Thanked 0 Times in 0 Posts
Default Re: robots.txt

Hi
You can list whatever paths you want there. For example, I use it to stop all agents from crawling the templates directory, and to stop linksmanager from crawling pages it doesn't need. See the example below:

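# All crawlers: keep out of the templates directory.
# Paths are relative to the site root, so the domain name is not part of the rule.
User-agent: *
Disallow: /Templates/
Allow: /

# linksmanager only: skip directories it doesn't need.
# A crawler that matches this group follows it instead of the * group.
User-agent: linksmanager
Disallow: /cgi-bin/
Disallow: /cp/
Disallow: /css/
Disallow: /EN/
Disallow: /images/
Disallow: /modlogan/
Disallow: /PT/
Disallow: /webalizer/
Disallow: /widgets/
Disallow: /secure/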
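
If you want to check what a set of rules actually blocks before uploading it, here is a minimal sketch using Python's standard urllib.robotparser. The rules are a trimmed, hypothetical variant of the example, with the Disallow path written relative to the site root; www.example.com, the sample paths, and the helper name allowed are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, trimmed from the example; paths are placeholders.
RULES = """\
User-agent: *
Disallow: /Templates/

User-agent: linksmanager
Disallow: /cgi-bin/
Disallow: /secure/
"""

rule_parser = RobotFileParser()
rule_parser.parse(RULES.splitlines())  # parse() accepts an iterable of lines


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the (assumed) example host."""
    return rule_parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/Templates/page.html"),
        ("Googlebot", "/index.html"),
        ("linksmanager", "/secure/login"),
        ("linksmanager", "/Templates/page.html"),
    ]:
        print(agent, path, allowed(agent, path))
```

One thing this makes visible: a crawler that matches its own User-agent group uses only that group, so in this sketch linksmanager is not bound by the /Templates/ rule in the * group.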
philpe is offline  
Old 04-01-2009, 05:23 AM   #3 (permalink)
Junior Member
 
Join Date: Apr 2008
Location: India
Posts: 58
Thanks: 0
Thanked 1 Time in 1 Post
Default Re: robots.txt

Robots.txt is useful for allowing or disallowing crawlers and bots from going through a specified page or directory.
__________________
Coderea Technologies - Build your outsourcing strategy now!
Coderea is offline  
Old 04-01-2009, 01:45 PM   #4 (permalink)
Top Level Poster
 
Join Date: Feb 1975
Location: Irvine, CA
Posts: 445
Thanks: 2
Thanked 6 Times in 6 Posts
Default Re: robots.txt

Why would we want the crawler to not see a page, though? It might find something on that page that helps the site rank.
__________________
www.affiliateprograms.com
GregK is offline  
Old 04-01-2009, 05:23 PM   #5 (permalink)
Senior Member
 
Join Date: Feb 2009
Posts: 44
Thanks: 0
Thanked 2 Times in 2 Posts
Default Re: robots.txt

Not always. For example, a duplicate version of a page (such as a print version), or pages that are taken from other places and are known duplicates, are content you want to block straight away. You can also "break" a link by having it go through a redirect in a blocked folder.

That saves you some PageRank as well, since Google won't count it as a link from your site (a reminder: when your site links to bad or banned sites, it can harm you as well).
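
A rough sketch of that setup, assuming print-friendly copies live under /print/ and outbound redirects go through /go/ (both folder names, the host www.example.com, and the helper crawlable are made up for illustration). It uses Python's standard urllib.robotparser to list which paths a compliant crawler would still fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: neither /print/ nor /go/ comes from a real site.
BLOCK_RULES = [
    "User-agent: *",
    "Disallow: /print/",  # duplicate, print-friendly copies
    "Disallow: /go/",     # redirect scripts for outbound links
]

robots = RobotFileParser()
robots.parse(BLOCK_RULES)


def crawlable(paths, agent="*"):
    """Return the paths a compliant crawler may fetch under BLOCK_RULES."""
    base = "http://www.example.com"  # placeholder host
    return [p for p in paths if robots.can_fetch(agent, base + p)]


if __name__ == "__main__":
    print(crawlable(["/article.html", "/print/article.html", "/go/partner"]))
```

This only shows what a well-behaved crawler would skip; how a search engine treats PageRank for blocked redirects is not something the code can confirm.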
Sandis is offline  
Old 04-02-2009, 11:02 AM   #6 (permalink)
Top Level Poster
 
Join Date: Feb 1975
Location: Irvine, CA
Posts: 445
Thanks: 2
Thanked 6 Times in 6 Posts
Default Re: robots.txt

I didn't think about duplicate content. I guess that makes sense.
__________________
www.affiliateprograms.com
GregK is offline  
Old 04-02-2009, 12:44 PM   #7 (permalink)
Member
 
Join Date: Feb 2009
Posts: 82
Thanks: 0
Thanked 1 Time in 1 Post
Default Re: robots.txt

If you don't want robots to follow a link to a "bad" site, does a nofollow link do the same thing?

Also, can you apply robots.txt to just a section of a page?

<body>
I like cats
<robots.txt>
No I don't
</robots.txt>
</body>
WestEagle is offline  
Old 04-03-2009, 11:02 AM   #8 (permalink)
Senior Member
 
Join Date: Feb 2009
Posts: 44
Thanks: 0
Thanked 2 Times in 2 Posts
Default Re: robots.txt

Yes, nofollow does the same thing, with some differences in how PageRank is passed.

I don't have any evidence that a <robots.txt> tag exists, especially for Google. Syntax like that also doesn't make sense, as it includes no rules.
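
For reference, robots.txt rules match URL paths, so they cannot mark off part of a page. The in-page controls that do exist are the page-level meta robots tag and the per-link rel="nofollow" attribute. Below is a minimal sketch, using Python's standard html.parser, that lists both for a page; the sample HTML, the URLs, and the class name RobotsHintParser are assumptions for illustration.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects meta robots directives and per-link nofollow flags."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []  # e.g. ["noindex", "follow"]
        self.links = []            # (href, is_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.meta_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_values))


# Made-up page: one normal link and one nofollow link.
SAMPLE = """
<html><head><meta name="robots" content="noindex, follow"></head>
<body>
<a href="/about.html">About</a>
<a href="http://example.org/untrusted" rel="nofollow">Untrusted site</a>
</body></html>
"""

hints = RobotsHintParser()
hints.feed(SAMPLE)

if __name__ == "__main__":
    print(hints.meta_directives)
    print(hints.links)
```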

Last edited by Sandis; 04-03-2009 at 11:08 AM.
Sandis is offline  
 
