[personal profile] gusl
Can I be sure that a certain URL of mine hasn't been indexed by Google?

(no subject)

Date: 2009-09-13 10:54 pm (UTC)
From: [identity profile] peamasii.livejournal.com
put in meta noindex tag for robots
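For reference, the tag in question is a `<meta name="robots" content="noindex">` element in the page's head. A minimal Python sketch for checking whether a page's HTML already carries it follows; the `sample_page` markup and the helper names are made up for illustration, and a real check would run on the page source as it is actually served.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html_text):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Hypothetical page source, for illustration only.
sample_page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>...</body></html>"""

print(has_noindex(sample_page))
```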


(no subject)

Date: 2009-09-13 11:05 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
but can I be sure that it's not too late?

(no subject)

Date: 2009-09-13 11:07 pm (UTC)
From: [identity profile] peamasii.livejournal.com
you can check the Google index for it. if it's there, they'll remove it from the index within a few days of seeing the noindex tag. there's also a removal request page:

https://www.google.com/webmasters/tools/removals?pli=1

(no subject)

Date: 2009-09-13 11:12 pm (UTC)
From: [identity profile] edanaher.livejournal.com
Not sure exactly what you want, but the webpage removal request tool looks like you can have Google recheck it and remove it from the listings.

Also, the Google Webmaster Tools look like they might be helpful; you put a meta tag on the site to prove to Google that you own it, and then you can do some stuff with it (I played with it a bit years ago, so I don't remember much about it).

(no subject)

Date: 2009-09-13 11:33 pm (UTC)
From: [identity profile] gwillen.livejournal.com
Put the url into the google search box. If it's in the index, it should show up as the only result; otherwise no results will appear.

(no subject)

Date: 2009-09-14 12:03 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
This does not work. Many results appear.
It's as if the URL had spaces in it, turning into several search terms.

(no subject)

Date: 2009-09-14 02:50 am (UTC)
From: [identity profile] easwaran.livejournal.com
Even if you put it in quotes?

(no subject)

Date: 2009-09-14 06:46 am (UTC)
From: [identity profile] peamasii.livejournal.com
Sure it works. You have to put in "site:http://myURL.com/myURL..." to see if it's in the index.

(no subject)

Date: 2009-09-14 07:12 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
The URL http://www.optimizelife.com/wiki/Publications shows up if the search term is site:http://www.optimizelife.com/wiki/ (no quotes)
but not if it is "site:http://www.optimizelife.com/wiki/Publications" or "site:www.optimizelife.com/wiki/Publications" (with quotes)


I think what I need is:
site:http://www.optimizelife.com/wiki/Publications
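A small Python sketch of turning that unquoted site: query into a search link. The `https://www.google.com/search?q=` address and the `site_query_url` helper are assumptions for illustration; the sketch only builds the link and does not fetch anything.

```python
from urllib.parse import urlencode


def site_query_url(page_url):
    """Build a search link for the unquoted query site:<page_url>."""
    # urlencode percent-escapes the colon and slashes inside the query value.
    return "https://www.google.com/search?" + urlencode({"q": "site:" + page_url})


print(site_query_url("http://www.optimizelife.com/wiki/Publications"))
```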

(no subject)

Date: 2009-09-14 07:29 am (UTC)
From: [identity profile] gwillen.livejournal.com
If that's the url you're trying to keep out of the index, I note that it doesn't have a robots.txt and it doesn't seem to have any meta noindex tags, so it will probably end up in the index even if it's not there now.

(no subject)

Date: 2009-09-14 07:31 am (UTC)
From: [identity profile] gwillen.livejournal.com
Hmm, given that that page _is_ in the index, I suspect that's not actually the page in question, just an example. So never mind. :-)

(no subject)

Date: 2009-09-14 07:44 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
*AND* I wouldn't be stupid enough to link to it on a public LJ post.

My PUBLICations should be as PUBLIC as possible.

(no subject)

Date: 2009-09-14 07:50 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
the format for robots.txt seems to be:

User-agent: BadBot # replace the 'BadBot' with the actual user-agent of the bot
Disallow: /private/

or
User-agent: *
Disallow: /directory/file.html

However, pages like http://www.optimizelife.com/wiki/Publications aren't files OR directories... or are they, as far as robots are concerned?

(no subject)

Date: 2009-09-14 08:00 am (UTC)
From: [identity profile] gwillen.livejournal.com
HTTP has no real concept of files or directories, just paths. According to the latest standard I was able to find:

http://www.robotstxt.org/norobots-rfc.txt

a robots.txt entry matches a URL if the former is a prefix (bytewise) of the latter.
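The prefix rule can be sketched in a few lines of Python. This is a simplification under stated assumptions: it looks only at Disallow values, ignores Allow lines, user-agent grouping and percent-encoding, and the `rules` list is hypothetical.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_rules):
    """Return True if any Disallow value is a prefix of the URL's path."""
    path = urlsplit(url).path or "/"
    # An empty Disallow value means "allow everything", so skip it.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules = ["/private/", "/wiki/Publications"]  # hypothetical rules
for url in [
    "http://www.optimizelife.com/wiki/Publications",
    "http://www.optimizelife.com/wiki/PublicationsOld",
    "http://www.optimizelife.com/wiki/",
]:
    print(url, is_disallowed(url, rules))
```

Note that the second URL is matched too: a rule with no trailing slash also covers any longer path that starts with the same characters.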

(no subject)

Date: 2009-09-14 08:25 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
thanks.

I guess http://www.optimizelife.com/robots.txt should be able to cover the whole domain. (The wiki URLs all begin with "http://www.optimizelife.com/")

I'd like to use a testbot... telling me exactly which URLs it will ignore.
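One way to get something like a testbot locally is Python's standard `urllib.robotparser` module. The sketch below parses a hypothetical robots.txt, written in the format quoted above, and reports which URLs a compliant robot would skip. The rules and two of the test URLs are invented for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /wiki/Publications
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def report(urls, user_agent="*"):
    """Map each URL to True if the rules let user_agent fetch it."""
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = report([
    "http://www.optimizelife.com/wiki/Publications",
    "http://www.optimizelife.com/wiki/Home",
    "http://www.optimizelife.com/private/notes.html",
])
for url, allowed in results.items():
    print("fetch " if allowed else "ignore", url)
```

To test a live file instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`.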

(no subject)

Date: 2009-09-14 08:29 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
btw, a robots.txt won't prevent indexing due to external links, right?

(no subject)

Date: 2009-09-14 05:40 pm (UTC)
From: [identity profile] gwillen.livejournal.com
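Not entirely. If you use robots.txt to disallow a url, googlebot will not retrieve it, so the page's contents stay out of the index. But if other sites link to it, the bare url can still show up in results. To keep a page out of the index completely, leave it crawlable and use a meta noindex tag, since googlebot has to fetch the page to see that tag.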

(no subject)

Date: 2009-09-14 07:07 pm (UTC)
From: [identity profile] peamasii.livejournal.com
You don't need any quotes when searching by site:url

(no subject)

Date: 2009-09-14 07:11 pm (UTC)
From: [identity profile] peamasii.livejournal.com
Maybe you already found this out. If you create a webmaster account on google, it can run tests on your robots.txt and tell you what it's going to exclude. You can also submit an xml sitemap and do all sorts of other fun stuff, like monitor which pages are indexed, etc. If you still want to go further, install Google Analytics on your site (it's just a javascript snippet) and you'll get detailed stats about what and who and where.
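For the xml sitemap, a minimal Python sketch using the standard library is below. The namespace is the one defined by the sitemaps.org protocol; the `build_sitemap` helper and the choice of URL to list are assumptions for illustration, and a real sitemap file would normally also begin with an XML declaration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap <urlset> document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://www.optimizelife.com/wiki/Publications"]))
```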
