Author Topic: Google Code Crawler?!  (Read 3046 times)

Offline dm-horus

  • Banned
  • Hero Member
  • *****
  • Posts: 1042
Google Code Crawler?!
« on: October 11, 2006, 10:40:15 AM »
I received this in my inbox this morning from Joomla support:

Quote
Critical Security Update!!

It has come to our attention that Google has released a new product, Google Code Search, that is capable of indexing and crawling through archive files stored in the public directories of web servers. We are reporting this as a security advisory because we have discovered that some site administrators are storing archives / backups of their website in the web root. Because of this, Google Code Search is able to crawl the archives and read unparsed PHP files as if they were plain text. This has resulted in the disclosure of some sensitive information including MySQL passwords and SMTP credentials.

http://dev.joomla.org/component/option,com...temid,33/p,198/
http://forum.joomla.org/index.php/topic,101880.0.html

I found this very disturbing so I did some more digging...

Here is some info straight from Google's project blog:

Quote
Post by Tom Stocky, Product Manager

Since Google Code Search launched a few days ago, we've received a lot of great feedback, including some about the dangers of exposing security flaws. Our goal with Code Search is to provide a useful resource for developers and help increase collaboration within the developer community. Unfortunately, tools that ease access to information for good can sometimes do so for bad... but it's our strong belief that the positive impact outweighs the negative, a belief thankfully shared by many of you.

We hope that Code Search will be used as a tool for solving security issues and helping people prevent exploits, since security through obscurity isn't really secure. In cases where we can help prevent certain malicious behavior, we'll do our best to do that. We're working on some changes already and we're very open to suggestions -- let us know if you have ideas.

Also, for those of you who want to keep your code from being crawled, please check out the FAQ that explains how to do that with a robots.txt file either on your website, the archive file or repository itself.
Link.
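
For what it's worth, the robots.txt part is nothing exotic. Something like the following at the top of a site tells well-behaved crawlers (Google's included) to stay out of a given directory; the /backups/ path here is just an example, obviously.

Code:
# Example robots.txt: keep crawlers out of the folder holding site backups
User-agent: *
Disallow: /backups/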

I also read a very interesting article on The Register here.
Quote
Yet, it's unlikely that programmers are going to be able to stay ahead of the quickly expanding list of searches that could find interesting code properties. A simple "todo +security" query calls up many programs that have unimplemented security features. Finding files with "confidential +proprietary" could pinpoint code that has been improperly released. And, searching for the function "gets"--a notoriously insecure string operation--can reveal programs that are likely vulnerable to a memory overflow, said Veracode's Wysopal.
.....
Quote
"This is like giving everyone a telescope," Wysopal said. "It is making them more efficient. Lets just hope that they are using this for good."
.....
Quote
"Any new technology allows for a new attack vector," Long said. "The big question is whether the good guys will discover it first."

From TechCrunch:
Quote
It does seem that the Google index of source code is a lot broader than those found at competing sites Krugle and Koders. For instance, Google Code Search will index the content of zip and tarball files on open source sites such as openssl.org, while the other search sites seem to return a lot of results from sourceforge and a few other centralized repositories.
.....
Quote
This looks like bad news for the startups in this space who will need to further innovate, but it is good news for Google, a company that hasn’t really been hitting home runs recently with some of its new products.
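
(If you haven't run into the gets() thing Wysopal mentions, it boils down to something like this. A made-up snippet, not pulled from anyone's indexed code:)

Code:
#include <stdio.h>

int main(void)
{
    char name[16];

    /* The risky version: gets() has no idea the buffer is only 16
       bytes, so anything longer on stdin runs right past the end.
       gets(name);                                                   */

    /* The bounded replacement stops at sizeof(name) - 1 characters. */
    if (fgets(name, sizeof(name), stdin) != NULL)
        printf("Hello, %s", name);
    return 0;
}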

So, Google created a bot to crawl code. Anyone can search for code by string whether it's broken or not, and admins have to modify their robots file to keep the damn thing out. To be honest, I consider this mildly malicious. It can be useful to a few people, but many more will undoubtedly abuse it, besides the fact that any code that turns up in a search may not work at all. I find this somewhat alarming. I've always found code repositories more useful, as code is contributed freely for public use by its authors. It's all in one spot and they all conform to certain guidelines. You usually know what you're getting.

It seems to me that Google is being incredibly naive about the whole thing. Now, I'm all for being optimistic and even idealistic, but damn. You're dealing with code here! You're letting people freely peer into site backends, source, and even archived/compressed files?! I think expecting people to behave themselves is a little too optimistic, even irresponsible. What do you guys think about all this?

You can take a look at Google Code Search right here.
« Last Edit: October 11, 2006, 10:54:45 AM by dm-horus »

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
Google Code Crawler?!
« Reply #1 on: October 11, 2006, 02:41:31 PM »
Well, if someone's login or database credentials are exposed because of this, then they deserve to get hacked. There are a couple of reasons:

1- The crawler can only find files that are exposed through links. Apparently people are linking to the backups, or the crawlers are getting directory listings (because there is no 'index' page in the directory; there's a one-line fix for that below). If the crawler can find it, so can any visitor to the site.

2- You don't put sensitive, unencrypted backups in the web root! That sounds about as stupid as if Galactic were to post the root password to the machine on the forums here.
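
On the directory listing part of point 1: if you're on Apache (just an assumption about your host), a single line in an .htaccess file in that directory shuts the listings off, provided the host lets you override Options.

Code:
# Stop Apache from generating a file listing when there is no index page
Options -Indexes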

As for 'unparsed PHP files': the crawler can't do anything the web server won't allow. If the file is named .php (and nothing else is keeping it from being parsed by PHP), then the server is not going to hand the raw script code to the client under any circumstances.

In other words, Google is only able to crawl and index sites which actually have raw source code available to look at. It can't just look at any site's arbitrary code (i.e. it couldn't start picking apart the PHP code we use here at OPU, since it's not available in raw / source form. The only way it can access the PHP pages we use is after they get rendered).

Bottom line is, people who have problems with this have no one to blame but themselves. Don't put raw DB dumps or backup files in a directory accessible to the web, and for gosh's sake, don't link to them so the robot can find them!

It's not a matter of security, it's just a matter of ignorance / stupidity on the webmaster / site developer's part if bad things are happening to them because of Google Code Search.

Offline dm-horus

  • Banned
  • Hero Member
  • *****
  • Posts: 1042
Google Code Crawler?!
« Reply #2 on: October 11, 2006, 04:25:11 PM »
I think you addressed a major issue, hacker, but how many people/admins/businesses are that careful? As far as I know, getting the job done takes priority over security, with the thought that nobody will be able to peek into the backend. This tool would allow someone to search for insecure code and, intentionally or not, get access to it. Like it says in the articles, some of the major code repositories were vulnerable to this at launch. How many do-it-yourselfers, or people who have do-it-yourselfers do it for them, will be aware of and secure against that kind of access? There's a LOT of stupid people out there.

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
Google Code Crawler?!
« Reply #3 on: October 11, 2006, 04:55:03 PM »
That's true, but still, it's not that hard to clean up your code to fix exploits and to avoid backing up stuff in the web root (or at least use password protection).
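
(By password protection I mean something like Apache basic auth: a few lines of .htaccess pointing at an htpasswd file kept outside the web root. The paths and names below are only examples.)

Code:
# Example .htaccess for a backup directory: require a login for everything in it.
# The password file itself should live outside the web root.
AuthType Basic
AuthName "Site backups"
AuthUserFile /home/someuser/.htpasswd
Require valid-user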

Also, it was certainly possible to find exploitable code out on the internet before Google Code Search. There have been sites like securityfocus, etc. around for a long time. If you know how a target site works, you can grab the source code for their CMS / forum / whatever and study that.

If someone is really bent on hacking a site they will find the information with or without a search tool like this. The people who you have to be concerned about are the script kiddies, among others, who use it as an easy portal to get the information they need.

Perhaps it could have a beneficial effect for people as well, by giving them a wake-up call to secure their sites and information.

And finally, as far as I'm concerned, if one isn't familiar with security on the internet and in scripts, then he/she probably shouldn't be using or writing production scripts in the first place.
« Last Edit: October 11, 2006, 04:56:43 PM by op2hacker »

Offline alice

  • Administrator
  • Hero Member
  • *****
  • Posts: 553
Google Code Crawler?!
« Reply #4 on: October 11, 2006, 10:48:23 PM »
Google is attempting to set new security standards web-wide by saying: if your code isn't secure, we'll show everyone the peepholes. The more I look, the more bugs I see getting fixed extremely quickly. Maybe they're achieving what they wanted to.

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Google Code Crawler?!
« Reply #5 on: October 12, 2006, 01:18:29 AM »
I'd say my outlook agrees with Hacker.

It seems pretty dumb for someone to store sensitive files in a public folder of a webserver. I don't care how lazy you are, there's just no excuse for that. Not even bothering to leave the file unlinked, or to drop in an index file to hide the directory contents, is lazier than I can comprehend. And I'm a pretty lazy person.

Bottom line, if you don't want people seeing it, don't post it.

I'd also like to point out that this doesn't make anything less secure, it just makes people more aware of security issues. The holes are there whether or not people know about them. In general I think programmers need to be much more aware than they currently are. (And stop using the awful outdated standard C routines with no buffer checking. Geeze, let them die. The replacements were made for a reason.)
« Last Edit: October 12, 2006, 01:19:19 AM by Hooman »

Offline CK9

  • Administrator
  • Hero Member
  • *****
  • Posts: 6226
    • http://www.outpost2.net/~ck9
Google Code Crawler?!
« Reply #6 on: October 13, 2006, 01:26:28 AM »
Hell, even I know not to put sensitive info in the root!
CK9 in outpost
Iamck in runescape (yes, I still play...sometimes...)
srentiln in minecraft (I like legos, and I like computer games...it was only a matter of time...) and youtube...
xdarkinsidex on deviantart

yup, I have too many screen names

Offline lordly_dragon

  • Sr. Member
  • ****
  • Posts: 389
Google Code Crawler?!
« Reply #7 on: October 13, 2006, 07:16:15 AM »
Heh, I remember one day my old server blew through its bandwidth limit... because a bot found my friend's Coldplay album and he hadn't put a dummy index file over the directory... whee, 8 GB in like 1 day 0_o

Running, scrambling, flying
Rolling, turning, diving, going in again
Run, live to fly, fly to live, do or die
Run, live to fly, fly to live. Aces high.

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
Google Code Crawler?!
« Reply #8 on: October 13, 2006, 07:49:12 AM »
Quote
(And stop using the awful outdated standard C routines with no buffer checking. Geeze, let them die. The replacements were made for a reason.)
We love buffer overflows! :P

But yes, people should start using snprintf() instead of sprintf(), strncat() instead of strcat(), etc.

Maybe it's because they're lazy, or because all the old C books don't talk about the versions that allow you to specify a buffer length :P
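
Something like this is all it takes, for anyone who learned from one of those old books. Just a toy example:

Code:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[32];
    const char *user = "some_user_supplied_string_that_is_way_too_long";

    /* sprintf(buf, "hello %s", user);  would happily write past the end of buf */
    snprintf(buf, sizeof(buf), "hello %s", user);     /* truncates instead of overflowing */

    /* strcat(buf, "!");  same problem */
    strncat(buf, "!", sizeof(buf) - strlen(buf) - 1); /* leaves room for the '\0' */

    printf("%s\n", buf);
    return 0;
}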