Question

Discussion in 'Computer Science & Culture' started by Enigma'07, May 23, 2004.

Thread Status:
Not open for further replies.
  1. Enigma'07 Who turned out the lights?!?! Registered Senior Member

    Messages:
    1,220
    I have a quick question: what are Google spiders?
     
  2. sargentlard Save the whales motherfucker Valued Senior Member

    Messages:
    6,698
    They are indexing thingies (?) for Google. That's how Google adds new sites to its database for users to find. They're crawling all over this forum, which is why Sciforums is indexed pretty well on Google.
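    Roughly, that crawl-and-index loop looks like this toy Python sketch (the start URL is just a placeholder, and a real spider does far more):

        from html.parser import HTMLParser
        from urllib.request import urlopen

        # Toy spider: fetch one page, pull out its links, and "index" them.
        class LinkFinder(HTMLParser):
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    href = dict(attrs).get("href")
                    if href:
                        self.links.append(href)

        database = {}  # url -> links found there; real engines store much more
        url = "http://www.sciforums.com/"  # placeholder start page
        html = urlopen(url).read().decode("utf-8", errors="replace")

        finder = LinkFinder()
        finder.feed(html)
        database[url] = finder.links  # a real spider would now visit each link

        print(len(database[url]), "links queued for crawling")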
     
  3. spuriousmonkey Banned Banned

    Messages:
    24,066
    And therefore better not say anything too stupid.
     
  4. spuriousmonkey Banned Banned

    Messages:
    24,066
    Who is caffine in this thread?
     
  5. Stryder Keeper of "good" ideas. Valued Senior Member

    Messages:
    13,105
    I think Thor mixed up this thread with the other one on websites.

    Spiders (and robots/bots/agents) are automated retrieval scripts, run on servers linked to databases, that hunt down specific criteria from websites: everything from the META tags enclosed in a page's HEAD tag right down to the full contents of the site. Through their server they then cache or index what they find in their database.

    (The HEAD tag is the first piece of a parsed file that can contain data. Some spiders are written just to read the HEAD tags and therefore don't read the rest of the document, whereas other spiders are written to read the whole document.)
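    A minimal sketch of a head-only spider in Python (the page and its META tags are invented for illustration) would stop reading at the closing HEAD tag:

        from html.parser import HTMLParser

        # Head-only spider sketch: collect META tags, stop at </head>.
        class HeadSpider(HTMLParser):
            def __init__(self):
                super().__init__()
                self.meta = {}
                self.in_head = True

            def handle_starttag(self, tag, attrs):
                if self.in_head and tag == "meta":
                    attrs = dict(attrs)
                    if "name" in attrs:
                        self.meta[attrs["name"]] = attrs.get("content", "")

            def handle_endtag(self, tag):
                if tag == "head":
                    self.in_head = False  # a whole-document spider keeps going

        page = """<html><head>
        <meta name="keywords" content="science, forums, spiders">
        <meta name="description" content="A made-up example page">
        </head><body>a head-only spider never reads this part</body></html>"""

        spider = HeadSpider()
        spider.feed(page)
        print(spider.meta)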

    Spiders follow the "ROBOTS rules" when they are programmed correctly; the ROBOTS rules basically tell them which pages to spider. ("Spider" in this instance means looking for links off of the page and then indexing them; the spider server will then send a spider to each of those pages to do the same there.)
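    Python's standard library happens to include a parser for those rules; a minimal sketch, with an invented robots.txt, looks like this:

        from urllib import robotparser

        # A site's ROBOTS rules live in its robots.txt; this one is made up.
        rules = [
            "User-agent: *",
            "Disallow: /private/",
            "Disallow: /cgi-bin/",
        ]

        rp = robotparser.RobotFileParser()
        rp.parse(rules)

        # A well-behaved spider checks before fetching a page:
        print(rp.can_fetch("*", "http://example.com/private/secret.html"))  # False
        print(rp.can_fetch("*", "http://example.com/index.html"))           # True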

    Spiders actually build a structured view not just of an individual's website but of all the websites that link from and to that individual, which is why the large search engines that use the resulting database (like Google) have a large proportion of the internet indexed and cached.

    (Caching in this instance is so that keywords can be indexed by the system and its link weight weighed, meaning how many links on the internet point to it. The more popular the site, the better its mark in the search engine; so if another website were tied with yours on the number of keyword matches, the search engine would compare link weights to decide who gets shown first in the list.)
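    As a toy illustration of that tie-break (every site name and link count here is invented):

        # Two sites tied on keyword matches, so compare link weight.
        inbound_links = {
            "siteA.example": 37,  # 37 pages elsewhere link to siteA
            "siteB.example": 4,
        }

        def rank(tied_sites):
            # Higher link weight (more inbound links) gets shown first.
            return sorted(tied_sites,
                          key=lambda s: inbound_links.get(s, 0),
                          reverse=True)

        print(rank(["siteB.example", "siteA.example"]))
        # ['siteA.example', 'siteB.example']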

    Resource for information:
    http://www.robotstxt.org/
     