Friday, February 02, 2007

Rethinking Science Gateways

Note: I intend to make this a series of articles on Web 2.0 technologies that follows my usual focus on nuts-and-bolts issues. However, I will start with one or two polemics.

Science gateways are essentially user-centric Web portals and Web Services for accessing Grid resources (primarily Globus and Condor in the US). Java-based portals are typically (but not always) built using the JSR 168 portlet standard. Numerous examples can be found in the Science Gateways Workshop at Global Grid Forum 14 and the associated special issue of Concurrency and Computation (the articles are available online but not yet published; search at http://www3.interscience.wiley.com/cgi-bin/jtoc/77004395/ to see them). See also GCE06 for a more recent overview.

Java-based gateways are characterized by their reliance on Enterprise standards, particularly numerous Java Specification Requests, and related Enterprise Web application development frameworks such as Struts, JavaServer Faces, Velocity, and so on. This has the advantage of allowing portals to work with numerous third-party jars, and (by implementing standards) it also lets the science gateway community interact with a much larger community of developers than the relatively specialized “web science” community alone would provide.

This has come with a price. Java Web frameworks are designed to encourage beautiful coding practices (such as the Model-View-Controller design pattern) that are necessary for a team of professional developers. However, many casual Web developers find these frameworks very difficult to learn, and (since many work in very small groups or individually) the benefits of a beautiful code base are outweighed by the difficulty of getting anything working. This high entry barrier has created a “portal priesthood” possessing specialized knowledge.

While the enterprise approach and associated development practices may be appropriate for software development teams building science gateways for entire institutions (e.g. the TeraGrid User Portal), many science gateways would undoubtedly benefit from “agile development” rather than “enterprise development” practices. This is because many science teams cannot afford specialized Web and Grid developers; they would instead like to build gateways themselves. Such would-be gateway developers typically possess a great deal of programming knowledge but are more likely to be driven by practical applications than by the desire to build elegantly architected software.

One of the recurring problems in building science gateways is that it is very time-consuming if one follows the traditional software design approach: a gateway development team must extract requirements from a set of scientific users, who may not be directly involved in the gateway coding. Typically, the resulting prototype interfaces must go through many iterations before they are useful to a working scientist or grad student.

Clearly many of these problems could be alleviated if the scientists could just build the interfaces themselves. Such user interfaces may be non-scaling, highly personal, or ephemeral, but the scientist gets what he or she wants. Well-engineered software tools are still needed, but they must be wrapped with developer libraries that are very easy to learn and use. The Google Maps API is the prime example of this sort of approach: very nice services (interactive online maps, geo-location services, direction services, interactive layer add-ons) can be built using JavaScript and thus developed “agilely” in the classic scripting sense.

One of the useful lessons of the Programmable Web is that there is a distinction between the Enterprise and the Internet, but the two are complementary rather than opposed. Obviously Google, Flickr, et al. need to build very sophisticated services (with such things as security for Grids and ACID transactions for the commercial world), but these internal guts should not be exposed in the service interface, or at least not in the “Internet” face of the service, any more than they would be exposed in the user interface. Again, Google serves as an example: calculating PageRank for the Internet is a very computationally demanding process, and Google undoubtedly has much specialized software for managing this distributed computing task. But this “enterprise” part of Google does not need to be exposed to anyone outside. Instead, we can all very easily use the results of this service through much simpler interfaces.

It seems clear that the Enterprise-centric portal interface portion of Science Gateways will need to be rethought, and I plan to do this in a series of posts examining technical issues of Web 2.0 development. The Web Service portion of the gateway architecture, on the other hand, is conceptually safe, but this does not mean it needs no changes. The lesson from the Programmable Web should be very clear: we must do a better job of building Web services. Too many gateways have collections of Web Services that are unusable without their portal interfaces. That is, the services have not been designed and documented so that anyone can, by inspecting the WSDL and reading up a bit, create a useful client to the service. Instead, the services are too tightly coupled to their associated portal. I propose that the acid test for gateway services should simply be: "Can you put it up on programmableweb.com?"
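To make the acid test a little more concrete, here is a minimal sketch in Python of a gateway service whose public face can be used with no portal at all. It rests on several assumptions: it uses plain HTTP and JSON rather than SOAP and WSDL, purely to keep it short; the `/jobs/<id>` URL pattern, the `_JOBS` table, and the `get_job_status` client are all hypothetical; and the in-memory table merely stands in for the Grid middleware (Globus, Condor, credentials) that a real gateway would keep behind the interface.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "enterprise" internals: a stand-in for real Grid job management.
# Clients never see how this table is populated or maintained.
_JOBS = {"42": {"id": "42", "state": "DONE", "resource": "cluster-a"}}


class JobStatusHandler(BaseHTTPRequestHandler):
    """Public face of the service: GET /jobs/<id> returns a small JSON document."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        job = _JOBS.get(parts[1]) if len(parts) == 2 and parts[0] == "jobs" else None
        body = json.dumps(job if job else {"error": "not found"}).encode()
        self.send_response(200 if job else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def get_job_status(base_url, job_id):
    """Standalone client: relies only on the documented URL pattern and JSON format."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), JobStatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    print(get_job_status(base, "42")["state"])
    server.shutdown()
    server.server_close()
```

The design point is that `get_job_status` depends only on what a service description would document (the URL pattern and the response format), not on any portal code, so a scientist could write an equivalent client in a few lines of whatever scripting language they prefer.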
