Monthly Archives: February 2006

Did Google Partner With Dell for Personalized Search?

Yesterday, my roommate bought a new desktop computer from Dell. As soon as he finished installing it, I noticed the Google Desktop logo on the desktop. I’m sure he didn’t download it, as he isn’t connected to the Internet yet.

It seems that Google has partnered with Dell (and maybe with other computer manufacturers) in order to get information about their users. We know that MSN is going to use the information stored on its users’ computers to deliver personalized/targeted results.

But what about Google? They certainly don’t manufacture computers, so if users don’t download the Google Desktop tool themselves, it would be hard for Google to get that precious information about them.

What if the tool came pre-installed, maximizing the number of people who use it? Well, it seems that’s exactly what they are doing with Dell. Dell certainly sells a lot of computers, and Google must be delighted to see all these people get the Desktop tool by default, without having to download anything.

I tried to find an article about a possible partnership between Google and Dell, but really didn’t find anything.

Two Stanford Graduates hope to challenge Google

Anand Rajaraman and Venky Harinarayan both studied at Stanford University, at the same time as Larry Page and Sergey Brin.
Anand and Venky are launching a promising search engine called Kosmix. They hope to challenge Google, which they judge not relevant enough.

Kosmix is basically a search engine whose main feature is the categorization of pages into different categories/sections, letting the user choose the category his query relates to when searching. This is a really neat feature, as people are often frustrated when they don’t get the information they need, for example receiving only search results about products for sale rather than purely informative pages.

Also, Kosmix differs from Google in that it will rely less on popularity (links, age, etc.) to rank results.
According to today’s article in the Mercury News:

Kosmix hopes to make online search even better and more relevant than Google — especially when people are researching information on specific topics. So far, Google has searched for pages based on a sort of popularity contest. You enter a word or phrase, and Google will search its database of Web pages to find out which pages with that word or phrase have been linked to the most. Google has made many refinements, but a page’s popularity — not necessarily its content — still drives its approach.

Instead, the start-up has developed a new kind of technology called “categorization.” First, it asks users to define a category for a search. If a search term is related to health, for example, users can make a query in a health-related search box. That way, Kosmix can find Web pages that are more closely associated in meaning with the search terms.

Kosmix will also use the content of the pages linking to yours in order to determine what your page is about and then categorize it.

“Kosmix then looks at what pages that link to other pages are saying — to take a bigger stab at judging the meaning or subject of the page. If a page is saying something similar to the page it links to, you can get enough information to categorize it by topic, Harinarayan says.”
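To make the idea concrete, here is a toy sketch of link-context categorization in Python. This is purely my own illustration of the concept as described, not Kosmix’s actual technology; the topics and keyword lists are invented.

```python
# Toy illustration: guess a page's category from the text of pages
# linking to it. The topics and keyword sets are made up for the example.
TOPIC_KEYWORDS = {
    "health": {"symptoms", "treatment", "cancer", "medicine", "doctor"},
    "shopping": {"price", "buy", "sale", "discount", "shipping"},
}

def categorize(link_contexts):
    """Score each topic by counting its keywords in the linking pages' text."""
    scores = {topic: 0 for topic in TOPIC_KEYWORDS}
    for context in link_contexts:
        words = set(context.lower().split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            scores[topic] += len(words & keywords)
    return max(scores, key=scores.get)

contexts = [
    "learn about prostate cancer symptoms and treatment options",
    "your doctor can explain the treatment in detail",
]
print(categorize(contexts))  # "health"
```

A real system would obviously need far richer topic models, but the principle is the same: the words surrounding inbound links vote on what the target page is about.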

Kosmix hopes to help people find more relevant results by allowing them to narrow their search with the category feature. Kosmix is currently testing their first search engine, focused on health. They are planning on adding different categories throughout the year. I tried their health search engine at www.kosmix.com and I have to admit that they have done a very good job. It’s a very comprehensive and intuitive search engine: when you do a search, it presents you all kinds of categories, such as “Symptoms”, “Definitions”, “Case Studies”, or even “Blogs”.

For example, say you are researching prostate cancer. Type in “prostate cancer” into Google, and you get millions of results, and most on the first page are highly relevant — offering information about symptoms and treatments. But it is hard to know what comes after the first page, without doing a lot of scanning. Type “prostate cancer” into Kosmix’s health search, and you’ll get relevant pages straight off, but also a helpful categorization of results along the left-hand column, including things like “men’s health,” indicating it is a male problem, “alternative medicine,” something you may not have thought about looking for, “blogs” and “message boards.”

It seems like a very promising search engine, and I really wish them good luck. They are doing something that none of the big three search engines has tried yet, as those are more focused on purchasing other services than on investing money in research.

Google is very good at fighting spam (in my mailbox)

I was checking my Gmail spam box today just to see if any regular email had been flagged as spam. I was intrigued by the subject of one spam message, so I opened it to see what kind of idiocy I would find today. Apparently Visa had contacted me to let me know that I had to update my credit card information.

Luckily, Gmail does a pretty good job of preventing people from being ripped off by these crooks, and displays a big red warning message about the danger of the spam.

You can see a screenshot of Gmail in action here.

I have to say that I’m kind of impressed by their technology here. Of course I would never fall for those kinds of emails, but unfortunately, many folks who are not Internet savvy might. Besides adding these obvious warning messages, they also seem to have disabled the “click here” link.

Thanks, Google, for showing me a good example of your actions against spam, at least the spam in my mailbox…

Dell forgot to use the robots.txt file

You often realize that Google accessed folders you didn’t want to be accessible when you happen to see them in search engine results. It can sometimes be very annoying: for example, if you uploaded a private document for a customer and gave him a link to it, but forgot to disallow the folder it was in with the robots.txt file. The same kind of problem happened to Dell, which published a confidential spreadsheet containing information about its new computers in a folder on its site, without linking to it from anywhere. Google crawled the site and indexed the page, which made it available online.

An article at ZDNet explains the situation:

Apparent specifications for Dell’s future notebooks were briefly exposed by Google’s search engine Tuesday, before the spreadsheet was removed from a Dell FTP site and from Google’s cache.

The basic configurations for the Dell Inspiron e1405, Inspiron e1505, Inspiron 640m and Inspiron 6400 were available, along with several other unannounced Dell products, via a Dell FTP (File Transfer Protocol) site. A poster at technology review site NotebookReview.com noticed the spreadsheet and posted the link in one of the site’s discussion forums.

Dell, however, declined to comment on its future laptops:

“We do not comment on unannounced products,” a Dell representative told CNET News.com.

In the spreadsheet, prices and specifications for older Dell products appear alongside recently introduced products and unannounced PCs with Intel’s Core Duo processors, which were expected to ship in February.

This is how the author of the ZDNet article explained what happened:

The search engine keeps a cache of pages from the last time it crawled the Web, but Webmasters can use an automated system on Google’s Web site to remove links that were not meant to be shared with the public.

Well, I guess the automated system he was talking about is nothing more than the robots.txt file, which allows anyone to tell Google or any other search engine which files should be disallowed. Google asks us to put condoms on paid links, but they don’t mind hooking up with any files they find.
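For what it’s worth, keeping a folder out of the crawlers’ reach takes only two lines in a robots.txt file at the root of the site (the folder name here is invented):

```
User-agent: *
Disallow: /private-docs/
```

Keep in mind this only asks well-behaved crawlers like Googlebot to stay out; it does nothing to protect the files from someone who already has the URL.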

Optimizing your web site structure

P.J. Fusco has published an excellent and very comprehensive article today on ClickZ about how to organize your pages to maximize the importance of your keywords, but also about how to keep a site easy to maintain, with accessible URLs.

A few quotes from this article that I found very interesting:

Using the right structure for your pages and directories:

Some Web sites contain only a few files and require a relatively simple architecture. Others are large and require a more sophisticated structure. Large or small, well-optimized sites adhere to specific naming conventions to ensure all information is readily accessible to search bots and spiders. The deeper you bury keyword rich content, the less likely search engines will find it. Some search engine spiders won’t go deeper than a certain number of subdirectories.

When building or optimizing your website, it’s important to keep your pages clearly organized. It’s preferable to organize your pages in directories rather than putting all of them under the root folder of your site. That will first allow you to keep your files well organized and easy to find when you need to make changes; also, putting files in directories whose names contain keywords will increase the relevance of your URLs. However, it’s not recommended to have deep subdirectories: that will make your files hard to find, and they will be crawled less often if not linked properly.
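As an illustration (the paths are hypothetical), a shallow, keyword-bearing structure beats a buried one:

```
/running-shoes/                     <- category directory with a keyword in its name
/running-shoes/trail-shoes.html     <- keyword-rich URL, only one level deep
/files/a/b/c/d/trail-shoes.html     <- buried too deep; spiders may never reach it
```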

On page optimization:

Keywords at the beginning of the title tag are given the most weight. By leading with keywords carefully chosen for specific Web pages, you can make each site page more relevant for keywords and keyword phrases used in popular Web searches.

I agree: the more targeted, the better. For less competitive keywords, a short and targeted title tag can make a huge difference.
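As a quick illustration (the page and site name are made up), leading with the keyword phrase might look like this:

```html
<!-- Keyword phrase first, brand name last (hypothetical page) -->
<title>Prostate Cancer Symptoms and Treatments - ExampleHealth</title>
```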

Then, about H1 tags, often ignored by webmasters who think that they are “too big”:

Some Web developers believe H1 tags are unsightly on the page — large, bold text that distracts from the overall site design. This needn’t be the case. The H1 tag’s font, size, color, and surrounding white space can all be defined using style sheets to complete the site design.

With CSS, it’s indeed very easy to resize your H1 if you think it looks too big. Personally, I always resize them to about 20px. You shouldn’t skip H1 tags just because you think it’s tedious to use CSS.
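For example, a style sheet rule along these lines (the sizes and colors are just my own picks) tames the default H1:

```css
/* Shrink the default H1 so it blends with the site design */
h1 {
  font-size: 20px;     /* roughly the size I use myself */
  margin: 0 0 10px 0;  /* tighten the surrounding white space */
  color: #333;
}
```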


Next, you must ensure each optimized page’s body copy is adequately long and keyword rich. If at all possible, incorporate at least 250 to 300 words on each page so the search engines have enough content to determine the page’s theme.

Include relevant keywords, particularly near the top of the page, as search engines weigh these words more heavily. Optimal keyword density is a highly debated topic. Generally speaking, 5 to 8 percent keyword density in body copy is ideal. But be careful not to go overboard, or your copy won’t read well. Body copy must be useful to visitors if it’s to be relevant to search engines.

The reason you should include your keywords near the top is that search engines give them more weight, but also, from the users’ point of view, you want visitors to see relevant content as soon as they land on your page, without having to scroll down to find out what your site is about.
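If you want to check where your copy falls in that 5-to-8-percent range, a back-of-the-envelope calculation is enough. Here is a minimal sketch (the sample copy is invented):

```python
def keyword_density(copy, keyword):
    """Percentage of words in the copy equal to the keyword (punctuation stripped)."""
    words = [w.strip(".,;:!?").lower() for w in copy.split()]
    if not words:
        return 0.0
    hits = words.count(keyword.lower())
    return 100.0 * hits / len(words)

copy = ("Our clinic offers screening and early detection programs. "
        "Talk to a doctor about prostate cancer screening options today.")
print(round(keyword_density(copy, "cancer"), 1))  # 5.6 -- within the quoted range
```

Note this counts exact word matches only; a real tool would also handle plurals and multi-word phrases.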

It’s also important hypertext links pointing to various site pages include your targeted keywords and keyword phrases as assigned to specific Web pages. Most major search engines still weigh link anchor text as highly relevant to the page being linked to. It’s best to keep text links relatively succinct; the longer the link text, the more diluted the theme.

Try to link to your most important pages from the home page using short, relevant anchor text, and try to use that same anchor text consistently when linking to those pages.
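Concretely (the URLs are invented), the difference between a keyword-bearing link and a generic one looks like this:

```html
<!-- Succinct, keyword-bearing anchor text (hypothetical URL) -->
<a href="/running-shoes/">Running shoes</a>

<!-- Generic anchor text passes no theme to the target page -->
<a href="/running-shoes/">Click here</a>
```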