Gustavo Lacerda

put in meta noindex tag for robots

From:

but can I be sure that it's not too late?

From:

you can check the google index for it. if it's there they'll remove it from the index within a few days. there's also a removal request page:

https://www.google.com/webmasters/tools/removals?pli=1

From:

edanaher.livejournal.com

Not sure exactly what you want, but the webpage removal request tool looks like you can have Google recheck it and remove it from the listings.

Also, the Google Webmaster Tools look like they might be helpful; you put a meta tag on the site to prove to Google that you own it, and then you can do some stuff with it (I played with it a bit years ago, so I don't remember much about it).

From:

Put the url into the google search box. If it's in the index, it should show up as the only result; otherwise no results will appear.

From:

This does not work. Many results appear.
It's as if the URL had spaces in it, turning into several search terms.

From:

easwaran.livejournal.com

Even if you put it in quotes?

From:

Sure it works. You have to put in "site:http://myURL.com/myURL..." to see if it's in the index.

From:

The URL http://www.optimizelife.com/wiki/Publications shows up if the search term is site:http://www.optimizelife.com/wiki/ (no quotes)
but not if it is "site:http://www.optimizelife.com/wiki/Publications" or "site:www.optimizelife.com/wiki/Publications" (with quotes)

I think what I need is:
site:http://www.optimizelife.com/wiki/Publications

From:

see http://gustavolacerda.livejournal.com/870676.html?thread=3173908#t3173908

From:

If that's the url you're trying to keep out of the index, I note that it doesn't have a robots.txt and it doesn't seem to have any meta noindex tags, so it will probably end up in the index even if it's not there now.
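For anyone who wants to check a page for the tag themselves, here is a minimal sketch using only Python's standard library. It assumes you already have the page's HTML as a string; the names `RobotsMetaFinder` and `has_noindex` are made up for this example, and it only looks at `<meta name="robots">` tags, not at HTTP headers or bot-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html_text):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives
```

Calling `has_noindex(page_source)` on the downloaded source of a page tells you whether the tag is present in what the server actually sends.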

From:

Hmm, given that that page _is_ in the index, I suspect that's not actually the page in question, just an example. So never mind. :-)

From:

*AND* I wouldn't be stupid enough to link to it on a public LJ post.

My PUBLICations should be as PUBLIC as possible.

From:

the format for robots.txt seems to be:

User-agent: BadBot # replace the 'BadBot' with the actual user-agent of the bot
Disallow: /private/

User-agent: *
Disallow: /directory/file.html

However, pages like http://www.optimizelife.com/wiki/Publications aren't files OR directories... or are they, as far as robots are concerned?

From:

HTTP has no real concept of files or directories, just paths. According to the latest standard I was able to find:

http://www.robotstxt.org/norobots-rfc.txt

a robots.txt entry matches a URL if the former is a prefix (bytewise) of the latter.
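To make the prefix rule concrete, here is a rough sketch in Python. It is a simplification, not the full matching procedure from the draft: it decodes %xx escapes on both sides and compares bytes, but it skips the draft's special handling of %2F. The function name `rule_matches` is made up for this example.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def rule_matches(rule_path, url):
    """Return True if a Disallow/Allow path is a bytewise prefix of the URL's path."""
    if not rule_path:
        # An empty Disallow value means nothing is disallowed, so it matches nothing.
        return False
    # Only the path part of the URL is compared; scheme and host are ignored.
    url_path = urlsplit(url).path or "/"
    # Simplification: decode all %xx escapes before comparing, including %2F.
    return unquote_to_bytes(url_path).startswith(unquote_to_bytes(rule_path))
```

Under this rule, `Disallow: /wiki/` covers http://www.optimizelife.com/wiki/Publications because `/wiki/` is a prefix of `/wiki/Publications`; whether the path corresponds to a file or a directory on the server never enters into it.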

From:

thanks.

I guess http://www.optimizelife.com/robots.txt should be able to cover the whole domain. (The wiki paths all begin with "http://www.optimizelife.com/")

I'd like to use a testbot... telling me exactly which URLs it will ignore.
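One way to get something like a testbot is Python's standard `urllib.robotparser` module. The sketch below is only an approximation: it shows how Python's parser reads a robots.txt, which may differ from how Googlebot reads it (Python's parser does plain prefix matching and knows nothing about Google's wildcard extensions). The helper name `ignored_urls`, the sample robots.txt, and the `/index.html` URL are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def ignored_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks everything under /wiki/ for all bots.
    sample_robots = "User-agent: *\nDisallow: /wiki/\n"
    candidates = [
        "http://www.optimizelife.com/wiki/Publications",
        "http://www.optimizelife.com/index.html",  # hypothetical URL
    ]
    for url in ignored_urls(sample_robots, candidates):
        print("would be ignored:", url)
```

To test the live file instead of a string, `RobotFileParser` also has `set_url()` and `read()`, which fetch robots.txt from the site before you call `can_fetch()`.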

From:

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449

btw, a robots.txt won't prevent indexing due to external links, right?

From:

jgrafton

From:

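It won't, no. If you use robots.txt to disallow a url, googlebot will never attempt to retrieve it, but the bare url can still end up in the index if other pages link to it. To keep a page out of the index, let googlebot crawl it and put a meta noindex tag on it.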

From:

You don't need any quotes when searching by site:url
