Papers: Why do you search?

This should be the first question of my PhD research, in which I’d like to identify search tasks. One paper partially answers this question:

Rose, D. E., Levinson, D. 2004. Understanding User Goals in Web Search, WWW2004.

The authors give a hierarchy of search goals, derived from a study of AltaVista queries. The three main areas of their proposed framework are:

  • Navigational: the goal is to get to a website
  • Informational: the goal is to learn something
  • Resource: the goal is to obtain a resource (download, view, interact, obtain)

Their work is based on Broder, A. 2002. A taxonomy of web search. SIGIR Forum.

It seems that Rose is going further:

Rose, D. E. 2006. Reconciling Information-Seeking Behavior with Search User Interfaces for the Web, in Journal of the American Society for Information Science and Technology, 57(6):797-799.

In this recent short paper, Rose identifies the principles that should guide whoever is creating the next search engine:

  • Different interfaces should be available to match different search goals
  • The interface should facilitate the selection of appropriate contexts for the search
  • The interface should support the iterative nature of the search task. In particular, it should invite refinement and exploration.

[tags]web search, search goals, search tasks, rose, broder, levinson[/tags]

Some new trends in Search

According to Emre Sokullu, Read/WriteWeb, the three areas where Search Engines will improve are:

  • UI Enhancements
  • Technology Enhancements
  • Approach Enhancements (Vertical Engines)

I’m quite interested in all three areas, because my research focuses on UI issues related to search tasks.

Here (and here) Sokullu defines Search 2.0 as third-generation search, but Danny Sullivan does not agree. Alex Iskold talked previously about vertical search.

[tags]search engines, search 2.0, social search, vertical search[/tags]

Changes in Search

Via Slashdot, I read this post: 7 Search Evolutions for ‘07 by Dr. Riza C. Berkan. Dr. Berkan is a nuclear scientist specializing in artificial intelligence and fuzzy logic, and the CEO of hakia, “the Web’s new ‘meaning-based’ search engine with the sole purpose of improving search relevancy and interactivity”.

According to Dr. Berkan, the 7 search evolutions of the coming year will be:

    1. The first time a search engine will have an alternative to indexing; new technology like QDEXing will be developed.
    2. The first time ontological semantics will be used that will enable a search engine to perceive concepts beyond words and retrieve results with meaningful equivalents.
    3. The first time that search results will include highlighted best sentences as a result of semantic analysis rather than bolded keywords as a result of finding incidences.
    4. The first time that a single query will bring a gallery of results equivalent to running multiple queries about the meaningful variations of the same topic.
    5. The first time a search engine will let users evaluate answers on the spot by displaying uninterrupted and coherent text snippets, often letting searchers forgo having to click through to links and saving time.
    6. The first time a search engine will have a dialogue utility that will help point out best answers or suggest a Gallery for a more engaging human-like search experience.
    7. The first time a search engine’s data will grow by detection of new knowledge rather than by detection of new pages. Search engine growth by knowledge will be the new direction for the industry for 2007.

[tags]search engine, hakia, semantic web[/tags]

More Google products = few Google products

I like Google and a lot of its products. I use Gmail, Calendar, Reader and many others (Search Engine included). How many products does Google offer? I don’t know. Lots. There is a bit of confusion: Search, Groups, Blogs, AdSense/AdWords, Answers (no, it’s gone),…

By the way, I read somewhere that the frequent release of new products is a sort of human resources policy. It’s driven by the need to keep the search engineers satisfied. It’s not driven by a clear vision of the products to offer to the user.

According to Nicholas Carr (I don’t remember where I found the link), Google is planning to re-organize the whole thing, grouping its products into five big product groups. He thinks that these five groups will be:

Google Search (“Google” goes back to meaning just search: for all information types, on all devices, personalized)

AdMarket (a unified market place for buyers and sellers, spanning web text, web video, web banners, print, radio, TV)

YouTube (YouTube expands from video to become the common interface for all media sharing)

YouTools (what Apps for Your Domain morphs into, with different tool sets for businesses, families, universities, and hospitals)

YouFile (a personal information management service, covering health data, finances, etc.)

(source: Nicholas Carr’s blog)

Keep it simple.

[tag]google[/tag]

Two new applications: Google Reader and Thunderbird

Since last week, I’ve been using two new applications. Well, new for me.

I had already tried Google Reader. Now I’m giving it another chance for two reasons: BlogLines loses some of my feeds, and the Reader has changed.

The new Google Reader is really better than the old one. I agree with Jame Healy: I’d like to choose the order in which the blogs appear (in particular, I’d like to sort them by number of new posts); I’d like to have a desktop aggregator synchronized with the Reader; I’d like a blogroll.

The second application I’m trying is Mozilla Thunderbird (with the Lightning extension for Google Calendar). I need more features than I can find in PC-Pine, but I’m still waiting for the new PC-Pine (Alpine), which is still in development. Thunderbird is a powerful email application. I just made a little chaos with identities…

Social search is not new

Chris Sherman has some good thoughts on social search. First of all, he defines social search tools as “Internet way-finding services informed by human judgment.”

Actually, social search is older than search engines: the first rudimentary examples are the link pages that everyone, in the early days of the web, had on her site, starting with Tim Berners-Lee. These pages were organized lists of preferred links, chosen and commented on by someone. I remember that every Internet Service Provider had such a page.

Yahoo! was an evolution of these pages: an organized directory of selected websites.

Now Google, with the PageRank algorithm, is a good example of partially automated social search: webpages are collected by robots, but their rank is evaluated starting from what webmasters decided, that is, from the links they chose to put on their pages.
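To make the "ranking from webmasters' decisions" idea concrete, here is a minimal sketch of the textbook PageRank iteration. It is only an illustration, not Google's actual implementation: the toy web, the page names and the function name `pagerank` are my own assumptions, and 0.85 is simply the commonly cited damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to
    (i.e. the choices made by each webmaster).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same rank for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: each entry is a webmaster's list of links.
toy_web = {
    "home": ["about", "links"],
    "about": ["home"],
    "links": ["home", "about", "blog"],
    "blog": ["home"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web the "home" page ends up on top because every other page links to it: the robots only collect the pages, while the ordering comes from human linking decisions.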

Now, after the coming of Web 2.0, popular social search tools are bookmarking, tagging and voting services.

Chris Sherman focuses on some problems of social search: the web grows “too quickly for humans to keep up with it.” Algorithmic search is needed to be comprehensive. Furthermore, tags are not the solution to categorizing the web: language is ambiguous and, even with a controlled vocabulary, people are too lazy to use it.

Although I think there is a lot of work to do to solve this last problem (see my posts’ categorization, it’s a mess), I’m still positive about tagging and social search: the old ways of categorizing links have more problems. See Ontology is Overrated: Categories, Links, and Tags by Clay Shirky.

Detailed paper on social web

Kolbitsch, J. and Maurer, H. 2006. The transformation of the Web: How Emerging Communities Shape the Information we Consume. In Journal of Universal Computer Science, vol. 12, no. 2 (2006): 187-213. http://www.jucs.org/jucs_12_2/the_transformation_of_the
This is a very interesting paper on the social web, offering a detailed overview of the main applications that are shaping the new web, or social web, or Web 2.0.
From the abstract: This paper presents an overview of a broad selection of current technologies and services: blogs, wikis including Wikipedia and Wikinews, social networks such as Friendster and Orkut as well as related social services like del.icio.us, file sharing tools such as Flickr, and podcasting. These services enable user participation on the Web and manage to recruit a large number of users as authors of new content. It is argued that the transformations the Web is subject to are not driven by new technologies but by a fundamental mind shift that encourages individuals to take part in developing new structures and content. The evolving services and technologies encourage ordinary users to make their knowledge explicit and help a collective intelligence to develop.

In particular, I appreciated the defense of the non-hierarchical model (chapter 1.1) using ant colonies as an example. “Although [...] individual ants make wrong decisions, the large number of ants in colonies assures that decisions are ultimately correct”.

The paper briefly mentions the problem of the different versions of Wikipedia (one version for each language), which leads to unbalanced and non-communicating articles. After saying that it’s “hard to compare Wikipedia to a traditional encyclopaedia”, the paper devotes its last pages to social networks (chapter 8). Social networks are based on the concept of six degrees of separation and on the rule of 150 (since an average human brain can remember factual, emotional and social details of at most 150 people, a genuine social network is limited to about 150 people).

A first brick for my research: the authors consider Skype a social network, although users are not conscious of this aspect: a phone call constitutes a strong relationship between people. The same goes for email. This leads to the automated generation of a social network (versus a manually generated social network such as Orkut or openBC).

Second brick: socially powered search engines are a potential application of social networks. They may answer queries like “Has any of my acquaintances been on holidays in New Zealand?” or “recent articles on hypertext authored by people associated with Ted Nelson”.

Is Google going in this direction? Gmail and Search History are services that may lead to this social aspect of the search engine. There are a lot of privacy issues to solve (this will be a great problem), but something could already exist in an alpha version.
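A query such as the New Zealand one can be sketched as a keyword search restricted to the people near me in the social graph. The following is only a toy illustration under my own assumptions: the names, the graph, the items and the functions `people_within` and `social_search` are all hypothetical, and the paper does not describe any specific algorithm.

```python
from collections import deque


def people_within(graph, start, max_hops):
    """Breadth-first search: map each reachable person to the number
    of hops from `start`, going no further than `max_hops`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the allowed radius
        for neighbour in graph.get(person, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    return distance


def social_search(graph, items, start, keyword, max_hops=1):
    """Return (hops, author, text) for items that mention `keyword` and
    were written by someone within `max_hops` of `start`, closest first."""
    distance = people_within(graph, start, max_hops)
    hits = [
        (distance[author], author, text)
        for author, text in items
        if author in distance
        and author != start
        and keyword.lower() in text.lower()
    ]
    return sorted(hits)


# Hypothetical acquaintance graph (who knows whom) and published items.
graph = {
    "me": ["anna", "luca"],
    "anna": ["me", "marco"],
    "luca": ["me"],
    "marco": ["anna"],
}
items = [
    ("anna", "Photos from my holidays in New Zealand"),
    ("marco", "Hiking in New Zealand"),
    ("luca", "Notes on hypertext"),
    ("zoe", "New Zealand travel tips"),  # not connected to me at all
]

direct = social_search(graph, items, "me", "new zealand", max_hops=1)
wider = social_search(graph, items, "me", "new zealand", max_hops=2)
print(direct)
print(wider)
```

With one hop only my direct acquaintances are considered; with two hops friends of friends are included too, while people outside my network never appear in the results.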

Paper: A communicative approach to web communication: the pragmatic behaviour of internet search engines

Please have a look at my Academic Publications page: I’ve just updated it with the paper recently published in Qwerty, a new Italian journal, written in Italian, English and other languages, dedicated to technology, culture and information.

From the journal presentation:

“The journal arises from a growing awareness of the need to develop research and reflection on the impact, effects and nature of technology use and, as such, is intended to be a genuinely cross-disciplinary forum.
Qwerty wishes to provide a forum for discussion on the use of new technologies aimed at anyone interested in the use of technology in such fields as education, training, social and university research, including the cultural, social, pedagogical, psychological, economic, professional, ethical and aesthetical aspects of technology use.”

The paper reference is:

Cantoni, L., Faré, M., and Tardini, S. 2006. A communicative approach to web communication: the pragmatic behaviour of internet search engines. In Qwerty 1/2006: 49-62.

From the abstract:

In this paper, websites are not approached as being just technological artefacts – which they also are, indeed – but from the point of view of communication, which is (one of) their structural purpose(s). In this perspective, the Website Communication Model (WCM) provides a model that highlights five main areas of interest when dealing with websites: the areas of contents and services offered through the website, of the tools for accessing them, of the people who publish the website, of those who access and use it, and of the “ecological” context which the website is part of.
The need for such an approach to electronic communication is well represented by the behavior of internet search engines, which strongly rely on the ‘pragmatic’ aspects of web communication. In fact, when performing the activities of collecting web pages, indexing them into their databases, and responding to users’ requests, internet search engines are relying more and more on criteria that are not directly deducible from web resources themselves, but that allow to capture some information about the publishers and the users of the website.
In this article, examples are presented, which show the pragmatic criteria adopted by some internet search engines in the three main phases of their working: spidering, indexing and responding.

50 things about Google

The blog of Searchenginewatch.com has two interesting articles, written by Danny Sullivan:

25 Things I hate about Google
25 Things I love about Google

Here are some of these things.

He loves personalized search, continuous improvements in web search, news, Froogle, Gmail, Google Maps, Google Sitemap, googlers (in particular Matt Cutts and Marissa Mayer), innovations proposed by Google, it’s all free!, AdSense, pure search (vs. portals), Google Desktop Search’s cache, the Library Scanning project, translating the web, Google Groups, Google Earth, Google Analytics, Picasa, the fight against the US Department of Justice and others.

But more interesting are the things he hates:

    • Web search counts that make no sense. For example, Google says 59’800’000 matches, but if you go to the last results page, you’ll find only 879.
    • Results clustering doesn’t work very well.
    • Too many interfaces.
    • “Related searches” disappeared.
    • Make Google.com show the same results regardless of country.
    • RSS feed for web search.
    • Stop giving away Blogger for free.
    • Let Gmail display more than 100 items.
    • Let Gmail have customized blacklists.
    • Stop opening products to everyone, then getting overwhelmed.
    • Charge for things!
    • Fix the philosophy. You’re doing 100 different things rather than “one thing really, really well.”

    Some of these issues may appear to be in contradiction, but they aren’t. Read the complete stories for details.