Mooter search engine

After a Multimedia class today in which we considered search engines like Google, Clusty, and Kartoo, a student mentioned the Mooter search engine. It has a primitive (but fast) visual mapping of related topics. The sponsored links seem even more in the way than those on Google or Clusty. I doubt I would use it regularly, but it’s good to know it’s there.

AJAX Dynamic Web Programming

Wired has an interesting article entitled “You Say You Want a Web Revolution” that describes a recent trend in web-based programming that allows a page, once loaded, to change dynamically based on interim communication with a server. One of the star examples is Google Maps, where the maps can be manipulated smoothly without requiring a refresh of the page (unlike MapQuest, which does require one).

AJAX, like DHTML, isn’t itself a programming language; it’s an amalgam of technologies and techniques including HTML, CSS, DOM, and JavaScript (especially the XMLHttpRequest object). Though web browsers have their limitations and frustrations (consistency and compatibility chief among them), they do provide a rich foundation for web applications, in a generally familiar environment. I suspect the next generation of web-based text analysis tools will employ AJAX techniques.

The Business Readiness Rating

An initiative is underway to develop an open standard to facilitate assessment and adoption of open source software: the Business Readiness Rating (BRR). The proposal’s whitepaper describes a common dilemma: the need for software is identified, but there’s little guidance on how to assess the alternatives rigorously; moreover, much of the assessment effort is lost because it’s not shared with others.

Although the BRR has a very business-oriented feel to it (including, of course, the title), many of the issues are the same in the academic world. In fact, many of the problems are even more acutely felt because – perhaps like very small businesses – there’s usually not an IT staff on-site with the time and expertise to evaluate software for specific purposes (like, say, XML editors). Having a robust open source software assessment mechanism in place could help individuals make more useful evaluations and could make it easier for smaller groups to determine which alternative would be most appropriate. Obviously, as with any such initiative (and there have been similar ones in the past), there’s a danger that the interests of certain groups skew what is available, or that certain perspectives are insufficiently represented (the BRR is being sponsored by Carnegie Mellon West Center for Open Source Investigation, O’Reilly CodeZoo, SpikeSource and Intel).

In any case, I’ve been working with several colleagues (John Bradley, Steve Ramsay, Geoffrey Rockwell, Ray Siemens) on possible mechanisms for peer review of software in the humanities, and I suspect that the BRR will be of some interest to us as we move forward.

Bloggers and Academe

There’s a great discussion happening over at Matt Kirschenbaum’s blog on blogging in academe. The impetus for the discussion was an article in the Chronicle of Higher Education entitled “Bloggers Need Not Apply” (July 8, 2005). The author describes the experience of an academic job selection committee that consulted the blogs of a number of its candidates and, in general, found more to frown upon than to rejoice about. Several comic/tragic examples are given of blog content that exposed compromising traits ranging from dishonesty to neurosis.

I actually agree with part of what the author is saying: people – and particularly people looking for a job – should be mindful about what they say in a public forum, especially a written (or recorded) one. You can choose to entertain folks by, say, complaining bitterly about your employer or burrowing down into the depths of your soul, but you’ll need to live with the consequences. Even semi-professional posts (like this one!) can reveal ideological or ethical perspectives that may very well not be to the liking of someone whose opinion matters, a situation in which silence may have been preferable. If I’m on a selection committee that’s considering hiring a colleague for several decades, I’m also going to try to assemble as much public information as possible (including gleaning from blogs where possible). I think what the author gets right, almost incidentally, is that the Internet can tend to underplay notions of responsibility and accountability and that people are easily surprised by how their actions come back to haunt them. In some ways the simplicity of writing on the web isn’t commensurate with the potential consequences of what’s written.

However, as many have pointed out on Matt’s blog, the author of the Chronicle article (who uses a pseudonym – a ruse that won’t endear him to many readers in the context of an academic publication) seems eager to throw the baby out with the bathwater. Is it really that difficult to distinguish between personal and professional blogging? Should bad judgement by a handful of naive candidates really lead an author to issue a broad condemnation of blogging in a widely-read publication like the Chronicle? I just don’t understand why the author wasn’t capable of a bit more nuance (even at the cost of a catchy title). Now that I’ve dissed him, let’s just hope that he’s not on the selection committee for my next job application….

To help offset the negative light shed on blogging by the Chronicle article, Matt has sent a call for anecdotes where blogging has had a positive impact on careers – that’s a great idea. At the same time, I wouldn’t want new scholars to think of blogging as a sure means of wider recognition; it may help, it may not. Personally, I have a whole other set of reasons why I blog, reasons that have little to do with external professional “benefits”. Ultimately, I’ve continued blogging for its own sake, because I enjoy it.

Lessons from Silicon Valley

I enjoyed a Wired article entitled “Lessons from Silicon Valley” that featured Joe Kraus, one of the co-founders of Excite. Kraus and the author mention a number of ways in which the Internet really requires a different way of doing business. In particular,

  • “being early is the same as being wrong,” which is another way of saying that sticking your neck out can be dangerous, especially in the fast-paced world of the Internet, and that it can be better to wait and learn from what others have done right or wrong.
  • “the internet is a new sort of market place that needs new business plans to make it work”

Searching vs. Organizing

Wired has an article entitled “Tiger Tweaks Could Kill Folders” that suggests that the use of hierarchical folders for organizing files is a dying metaphor, to be replaced by very fast system-wide searches provided by tools like Mac OS X’s Spotlight. The shift from Yahoo! categorized directories to Google’s flat searches is provided as evidence for the strength of the “new” search paradigm. Indeed, Google Mail’s first marketing slogan is “Search, don’t sort”.

While I agree that searching features in Google Mail and Tiger are extremely powerful, I disagree that hierarchical sorting of items is going to disappear any time soon. I’ve been impressed by the speed of Spotlight, but in many instances it returns far too many results, and it’s easier and faster for me to go through my folder structures to find files I want. Likewise, I love doing fast searches through all my email with a certain keyword (I’ve kept all my messages since 1992), but many times I’d rather view a folder with all the messages having to do with, say, a certain research project (where messages don’t necessarily all contain a keyword that would allow for effective searching). In other words, I think searching and categorizing are complementary approaches and I’m happy that tools for doing both are becoming increasingly refined.

Exploring Enron and Graphing

A short while ago I posted a message about a New York Times article on the Enron emails. As a follow-up, Jeffrey Heer has been working on an Enron Corpus Viewer (formerly “enronic”) and has some intriguing graphs posted on his site. Unfortunately, there only seem to be screenshots available for download, not a fully functioning application. However, the graphing engine that makes most of the images possible seems to be available under the name Prefuse, and it apparently resembles TouchGraph. In the same family is Heer’s Vizster, a project for visualizing online social networks (the photo gallery here is well worth a visit).

Thanks to Stan Ruecker and Darren Harkness for the link.

Word Crunchers and Statistically Improbable Phrases

Deborah Friedell wrote an interesting mini-essay entitled “The Word Crunchers” in the New York Times Book Review last weekend (free registration required, see Bugmenot for logins). Friedell presents a solid, if extremely brief, history of quantitative text analysis, especially of the concordancing flavour. Much of the discussion seems to be in preparation for mentioning Amazon’s concordancing features, and in particular its Statistically Improbable Phrases feature that one can access when browsing certain books. The end of the article is remarkable, perhaps because it suggests that the author might not realize that the more things change, the more they stay the same:

Once it would have seemed unnecessary to point out that a statistical tool has no ear for allusions, for echoes, for metrical and musical effects, for any of the attributes that make words worth reading. Today, perhaps it bears reminding.
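
As a rough illustration of what “statistically improbable” can mean, here is a minimal sketch that ranks two-word phrases by how much more often they occur in one text than in a background collection. This is an assumption for the sake of example, not Amazon’s actual method (which isn’t described here); the function names `bigrams` and `improbable_phrases`, the add-one smoothing, and the scoring formula are all hypothetical choices.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, pull out word tokens, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(book, background, top=3):
    """Rank the book's bigrams by how over-represented they are versus the background."""
    book_counts = Counter(bigrams(book))
    bg_counts = Counter(bigrams(background))
    book_total = sum(book_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(book_counts) | set(bg_counts))

    scores = {}
    for phrase, count in book_counts.items():
        # Add-one smoothing (an assumption) keeps the ratio finite for
        # phrases that never appear in the background text.
        p_book = (count + 1) / (book_total + vocab)
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        # Weight the log-ratio by the raw count so repeated phrases rank higher.
        scores[phrase] = count * math.log(p_book / p_bg)

    ranked = sorted(scores, key=scores.get, reverse=True)
    return [" ".join(phrase) for phrase in ranked[:top]]


if __name__ == "__main__":
    # Toy texts invented for this example.
    book = "the white whale swam and the white whale dived and the white whale rose"
    background = "the cat sat and the dog sat and the bird sat and the cat ran"
    print(improbable_phrases(book, background, top=2))
```

A phrase that is common everywhere (like “and the”) scores low, while one that recurs only in the book floats to the top. As the quotation above suggests, a count like this says nothing about why the phrase matters.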

Kinetic Browsers

D’Art Design Gruppe has an intriguing browser that allows you to navigate a collection of projects by moving through a 3D space. I really like the kinetic browsing and I like the bird’s eye overview panel of the content, though I think it actually falls short of its potential. For instance, what does the location of each item in space represent (or is it random)? Does the overview panel have to show a flattened (2D) view of the space? What’s with the bright yellow items that seem irrelevant? Despite these criticisms, I think this is a very cool interface.

Be sure to use the arrow keys to navigate. Thanks to Drew for mentioning this site to me.

Text Analysis and Email

The New York Times had an interesting article last weekend about text analysis of email in the context of the Enron scandal (the article is available here, but requires free registration – see Bugmenot for logins).

Computer scientists are analyzing about a half million Enron emails. Here is a map of a week’s email patterns in May 2001, when a new name suddenly appeared. Scientists found that this week’s pattern differed greatly from others, suggesting different conversations were taking place that might interest investigators. Next step: word analysis of these messages.
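
One simple way to spot a week whose email pattern differs from the others is to count messages per sender-recipient pair for each week and compare the weeks with cosine similarity. The sketch below is only an illustration under that assumption, not the researchers’ technique (which isn’t described here); the message format (dictionaries with `from` and `to` keys) and all function names are hypothetical.

```python
import math
from collections import Counter


def week_vector(messages):
    """Count messages per (sender, recipient) pair for one week."""
    return Counter((m["from"], m["to"]) for m in messages)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_unusual_week(weeks):
    """Return the label of the week least similar, on average, to the others.

    Expects a dict mapping week labels to message lists, with at least two weeks.
    """
    vectors = {label: week_vector(msgs) for label, msgs in weeks.items()}

    def average_similarity(label):
        others = [vec for other, vec in vectors.items() if other != label]
        return sum(cosine_similarity(vectors[label], vec) for vec in others) / len(others)

    return min(vectors, key=average_similarity)
```

A week in which a new name suddenly starts sending mail would share few sender-recipient pairs with other weeks, so its average similarity would drop.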

I’d like to think that in a couple of years it will be scholars in humanities computing who will be contributing revealing (and aesthetically pleasing) visualizations of data and describing the techniques for their interpretation.
