Wednesday, February 20, 2008

Safely probing web sites

The whole recruitmenttech.com / Bernard Haldane scam thing got me using my old telnet trick to examine the contents of a website. A number of bad eggs like to use security flaws in IE or Firefox to infect your computer. Most people try to make sure their security patches are up to date. But there is always a period of time between when a virus is released and when a security patch is released to deal with it. If you visit the wrong website during that time you could be in for trouble.

What I like to do is avoid the security flaw by using a method the virus writers are not expecting. I like to use telnet. Telnet just shows me the raw text the server sends; it never renders the page or runs any script. I tend to telnet from different operating systems as well. You might not have the luxury of a dozen different operating systems. You could consider VMware or some other virtual machine software.

Anyways, here is how it works. I'll use telnet from the MSDOS prompt (the Windows command prompt). A web browser and telnet both open the same kind of plain TCP connection; the main difference is the port. Telnet defaults to port 23 and web browsers default to port 80. So if I wanted to use telnet to connect to, say, www.blogger.com I'd use:

telnet www.blogger.com 80

At this point the MSDOS telnet program will print nothing. If you press CTRL-] you get to the telnet settings. In there enter set localecho. Press ENTER twice, once to turn on local echo and once to get out of the telnet settings. You are now back at a blank screen again. Enter:

GET / HTTP/1.1
Host: www.bogus-computer.com


NOTE: you have to press ENTER twice at the end: once to send the Host: field and once to send the blank line that signals the end of the HTTP request header.
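
If you can't type fast enough, the same exchange can be scripted. Here is a minimal Python sketch of what I'm typing into telnet. The function names build_request and fetch_raw are my own, the Host: field is set to the name you connect to, and the Connection: close line is an extra I added so the server hangs up when it is done (otherwise the read loop would sit there until the timeout). It only talks plain HTTP on port 80 and never runs anything it downloads.

```python
import socket


def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # blank line ends the request header
        "",
    ]
    # HTTP lines end in CR LF, which is what ENTER sends in telnet.
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path="/", port=80, timeout=10.0):
    """Send the request over a plain TCP socket and return every byte sent back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Print the raw reply, header and body, exactly as telnet would show it.
    print(fetch_raw("www.blogger.com").decode("iso-8859-1"))
```

The reply comes back as bytes, so you can save it to a file and open it in a text editor instead of printing it.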

If you take too long to type things in, the computer at the other end will time out and hang up on you. If you type it in quickly enough you'll get something back like:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=0df2c62f96b9ffe7:TM=1203534740:LM=1203534740:S=0L2HwAZgQbyqrbmI; expires=Fri, 19-Feb-2010 19:12:20 GMT; path=/; domain=.google.com
Server: gws
Transfer-Encoding: chunked
Date: Wed, 20 Feb 2008 19:12:20 GMT

...

There should be a lot more as well. What I'm not posting here is all the stuff between <html> and </html>. What I have posted is the HTTP response header. Your web browser eats this and uses the information. For example, the Set-Cookie: field will make the web browser set a cookie.
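
To make the split between header and body concrete, here is a rough Python sketch that pulls a raw response apart the same way I do by eye. The function name split_response is mine, and the sample bytes are a shortened, made-up stand-in for the response above. It assumes the header ends at the first blank line and does nothing clever with the body (it does not undo Transfer-Encoding: chunked, for instance).

```python
def split_response(raw):
    """Split raw response bytes into (status line, header fields, body)."""
    # The header ends at the first blank line, i.e. CR LF CR LF.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    fields = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Keep a list rather than a dict: Set-Cookie can appear more than once.
        fields.append((name.strip(), value.strip()))
    return status_line, fields, body


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html; charset=ISO-8859-1\r\n"
        b"Set-Cookie: PREF=example; path=/\r\n"
        b"Server: gws\r\n"
        b"\r\n"
        b"<html>...</html>"
    )
    status, fields, body = split_response(sample)
    print(status)
    for name, value in fields:
        print(name, "=>", value)
```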

The GET command is a standard HTTP request method (HTTP, not HTML; HTML is what comes back in the body). The next part, /, is the path you want to get and the HTTP/1.1 is the protocol version to use. Some web servers only work with HTTP/1.0, some only work with HTTP/1.1, most will use both.

If you wanted the web page http://en.wikipedia.org/wiki/List_of_HTTP_headers, then the sequence would be:

telnet en.wikipedia.org 80
GET /wiki/List_of_HTTP_headers HTTP/1.1
Host: www.bogus-computer.com


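You'll notice that I always put a Host: field. HTTP/1.1 requires it: one computer often serves many web sites, and the Host: field tells the server which of them you want, so it should normally be the same name you gave to telnet. Some servers will still answer if you put a made-up name there, but you may get a different site or an error page. Leave the field out entirely and an HTTP/1.1 server will typically hang up or send you an error such as HTTP/1.1 400 Bad Request. On top of that, many web servers do not like talking to robots or automation and will send you an HTTP/1.1 403 Forbidden response if the request does not look like it came from a browser. Try using telnet to www.google.com and they will refuse you. They not only want the Host: information but they expect a number of other fields as well. If you go to the http://en.wikipedia.org/wiki/List_of_HTTP_headers web page, they talk about some of the common header fields and have a link to the HTTP standard on www.w3.org.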
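
Going from a URL to the three things you need (the name and port to give telnet, and the path for the GET line) is mechanical, so here is a small Python sketch of it. The function name request_parts is my own, and it assumes a plain http:// URL; telnet can't speak https.

```python
from urllib.parse import urlsplit


def request_parts(url):
    """Return (host, port, path) for a plain http:// URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80    # no port in the URL means the web default
    path = parts.path or "/"   # no path means the site's front page
    if parts.query:
        path += "?" + parts.query
    return host, port, path


if __name__ == "__main__":
    host, port, path = request_parts(
        "http://en.wikipedia.org/wiki/List_of_HTTP_headers"
    )
    print(f"telnet {host} {port}")
    print(f"GET {path} HTTP/1.1")
    print(f"Host: {host}")
```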

When you get the response back, you'll have to look through the body and see if there are other references, e.g. <SCRIPT> tags, which will create more GET requests. Your web browser normally gets the first page and from there does multiple GET requests for the rest of the contents (each <IMG> tag is a GET, running JavaScript can create more GET requests, etc.).
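
Hunting for those references by eye gets tedious on a big page. Here is a hedged Python sketch that lists them using the standard html.parser module. The class name ReferenceFinder and the list of tags it watches are my own choices and are certainly not complete. It only reads the text; it does not fetch or run anything, and it will not see requests that a script would make when run.

```python
from html.parser import HTMLParser


class ReferenceFinder(HTMLParser):
    """Collect (tag, url) pairs for tags that make a browser fetch more content."""

    # Assumption: a small, incomplete set of tags worth checking.
    WATCHED = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lower-cased.
        wanted = self.WATCHED.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.references.append((tag, value))


def find_references(html_text):
    """Return every watched reference found in the page text, in page order."""
    finder = ReferenceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.references


if __name__ == "__main__":
    page = '<html><SCRIPT SRC="/a.js"></SCRIPT><IMG SRC="/pic.gif"></html>'
    for tag, url in find_references(page):
        print(tag, url)
```

Each URL it prints is another GET you can do by hand. Inline scripts (a <SCRIPT> tag with no SRC) do not show up in the list, so you still have to read those in the page itself.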

Once you have downloaded everything you can then look at it with a text editor and see if there is anything in it which could harm your computer. If you don't know how to read JavaScript, this is obviously not an option for you, but as a big nerd this is what I do. :^)

Hope you enjoy this. Let me know if you have any questions.

Happy hunting!