Marlon Pierce's Community Grids Lab Blog: February 2007

Tuesday, February 27, 2007

LibraryThing.com

This is brilliant: http://www.librarything.com/

See also the blog article from O'Reilly Radar: http://radar.oreilly.com/archives/2007/02/comparing_libra.html.

Thursday, February 22, 2007

Reading Property Files with Axis 1.x

From the past: here is a little code for loading a config.properties file from within an Axis 1.x service. The first branch is useful for debugging when you are running things outside the servlet container (i.e., firing from the command line rather than inside Tomcat). The second branch shows how to do this from within the servlet container.

//Needed to get the ServletContext to read the properties.
import org.apache.axis.MessageContext;
import org.apache.axis.transport.http.HTTPConstants;
import javax.servlet.ServletContext;
import javax.servlet.http.HttpServlet;
//Needed to load the properties themselves.
import java.io.FileInputStream;
import java.util.Properties;

...

//Do this if running from the command line.
if(useClassLoader) {
    System.out.println("Using classloader");
    //This is useful for command line clients but does not work
    //inside Tomcat.
    ClassLoader loader=ClassLoader.getSystemClassLoader();
    properties=new Properties();

    //This works if you are using the classloader but not inside
    //Tomcat.
    properties.load(loader.getResourceAsStream("geofestconfig.properties"));
}
else {
    //Extract the Servlet Context
    System.out.println("Using Servlet Context");
    MessageContext msgC=MessageContext.getCurrentContext();
    ServletContext context=
        ((HttpServlet)msgC.getProperty(HTTPConstants.MC_HTTP_SERVLET))
        .getServletContext();

    String propertyFile=context.getRealPath("/")
        +"/WEB-INF/classes/config.properties";
    System.out.println("Prop file location "+propertyFile);

    properties=new Properties();
    properties.load(new FileInputStream(propertyFile));
}

serverUrl=properties.getProperty("service.url");
baseWorkDir=properties.getProperty("base.workdir");
baseDestDir=properties.getProperty("base.dest.dir");
....
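The snippet above never closes its input streams. Here is a minimal sketch of the same idea in more recent Java (7 or later), assuming a hypothetical helper class called ConfigLoader that is not part of Axis. It uses the thread's context class loader, which inside Tomcat should be the webapp's own class loader and so should also see WEB-INF/classes, but I have not tested that against Axis 1.x.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of Axis. */
final class ConfigLoader {
    private ConfigLoader() {}

    /** Reads properties from an already opened stream. The caller closes the stream. */
    static Properties fromStream(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads properties from a file on disk and closes the stream afterwards. */
    static Properties fromFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return fromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads properties from the classpath using the context class loader. */
    static Properties fromClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Not on classpath: " + resourceName);
            }
            return fromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this, ConfigLoader.fromClasspath("config.properties").getProperty("service.url") would replace both branches, provided the file really is on the classpath in both environments.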

Wednesday, February 21, 2007

Aligning JSF Panel Grids

Found this at http://forum.java.sun.com/thread.jspa?threadID=736808&messageID=4233027 but the answer was a little mangled.

Suppose you want to put two tables of different sizes side-by-side in two columns of a containing table. The JSF you need is
<%@ taglib uri="http://java.sun.com/jsf/core" prefix="f"%>
<%@ taglib uri="http://java.sun.com/jsf/html" prefix="h"%>

<style>
.alignTop {
vertical-align:top;
}
</style>

<f:view>
<h:panelGrid columns="2" columnClasses="alignTop, alignTop">
<h:panelGrid columns="3" >

<h:outputText value="1"/>
<h:outputText value="2"/>
<h:outputText value="3"/>
<h:outputText value="4"/>
<h:outputText value="5"/>
<h:outputText value="6"/>

</h:panelGrid>

<h:panelGrid columns="3">

<h:outputText value="1"/>
<h:outputText value="2"/>
<h:outputText value="3"/>
<h:outputText value="4"/>
<h:outputText value="5"/>
<h:outputText value="6"/>
<h:outputText value="7"/>
<h:outputText value="8"/>
<h:outputText value="9"/>

</h:panelGrid>

</h:panelGrid>
</f:view>

That is, note you need two "alignTop" strings separated by a comma for the value of the columnClasses attribute. If you have only one "alignTop" it will only apply to the first column. To see this, switch the two inner panel grids.
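As I understand the standard HTML render kit, the comma-separated columnClasses list is matched to columns by position, and a column past the end of the list gets no class at all. The following Java sketch only models that rule so you can see why a single "alignTop" is not enough; the class and method names are hypothetical and this is not the actual JSF renderer code.

```java
/** Hypothetical model of how columnClasses entries map to columns. */
final class ColumnClasses {
    private ColumnClasses() {}

    /**
     * Returns the CSS class for a zero-based column index, or null when
     * the comma-separated list has no entry for that column.
     */
    static String classFor(String columnClasses, int column) {
        String[] classes = columnClasses.split(",");
        if (column < 0 || column >= classes.length) {
            return null;
        }
        String cssClass = classes[column].trim();
        return cssClass.isEmpty() ? null : cssClass;
    }
}
```

With "alignTop, alignTop" both columns get the class; with just "alignTop" the second column gets nothing and falls back to the browser's default vertical alignment.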

Probably there is a better way to do this.

Useful Tomcat JVM Settings

Dr. Chuck recommends using

JAVA_OPTS='-Xmx512m -Xms512m -XX:PermSize=16m -XX:MaxPermSize=128m -XX:NewSize=128m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC' ; export JAVA_OPTS

See http://www.dr-chuck.com/csev-blog/000222.html.

You can also export these as $CATALINA_OPTS.
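To check that the settings actually reached the JVM, a small sketch like the one below can help. It uses only standard JDK calls; the class name JvmSettings is made up for this example. Run it with the same JAVA_OPTS, or drop the two static calls into a JSP or servlet. Note that on Java 8 and later the PermSize flags are ignored because the permanent generation no longer exists.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper that reports what the running JVM was started with. */
final class JvmSettings {
    private JvmSettings() {}

    /** Maximum heap the JVM will try to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** The -X and -XX style arguments passed to this JVM. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        for (String flag : jvmFlags()) {
            System.out.println("JVM flag: " + flag);
        }
    }
}
```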

Wednesday, February 14, 2007

Some Notes on Axis2 Version 1.1

This is probably the first in a series.

* See http://ws.apache.org/axis2/ of course. Get the code and unpack.

* To run inside Tomcat, go to axis2-1.1.1/webapp and run "ant create.war" (you need ant). Then get the war from axis2-1.1.1/dist and drop it into Tomcat's webapps directory.

* I'll start by just summarizing the documentation, so these are my "cut to the chase" notes.

* Start with a POJO service like the one below.

package pojo.magic;
public class PojoBean {
    String name="Jojomojo";
    public PojoBean() {
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name=name;
    }
}

* To deploy this service by hand, you must do the following steps.
1. Compile it (use javac -d . PojoBean.java).
2. Copy the resulting directory (pojo) to webapps/axis2/WEB-INF/services.
3. Mkdir webapps/axis2/WEB-INF/services/pojo/META-INF
4. Create a services.xml file with your service metadata in that META-INF directory.

* Your services.xml file will look like this:

<service name="PojoService" scope="application">
<description>
Sample POJO service
</description>
<messageReceivers>
<messageReceiver
mep="http://www.w3.org/2004/08/wsdl/in-only"
class="org.apache.axis2.rpc.receivers.RPCInOnlyMessageReceiver"/>
<messageReceiver
mep="http://www.w3.org/2004/08/wsdl/in-out"
class="org.apache.axis2.rpc.receivers.RPCMessageReceiver"/>
</messageReceivers>
<parameter name="ServiceClass">pojo.magic.PojoBean</parameter>
</service>

* I created this file BY HAND. Why, oh why, do they not have a tool to do this? Note also that you will need to restart this webapp (or the whole Tomcat server) if you make a mistake writing the XML or put anything in the wrong place. Apparently there's no more JWS.

* Also note you'd better not get confused (as I initially did) about the format for this file. There seem to be several different versions of services.xml, depending on how you develop your service--that is, if you build your service with Axis2's AXIOM classes (see below), then your services.xml will be completely different.

* You can also make .aar files for your services. In the above example, this would just be the contents of the pojo directory that I copied into WEB-INF/services.

* Take a quick look at what you have wrought:

- WSDL is here: http://localhost:8080/axis2/services/PojoService?wsdl
- Schema is here: http://localhost:8080/axis2/services/PojoService?xsd
- Invoke REST: http://localhost:8080/axis2/rest/PojoService/getName

Examine the WSDL. You will note that they explicitly define messages using an XML schema (the ?xsd response) even for this very simple service, which only gets/sets with strings.

* Note of course the limitations of my POJO: I only have String I/O parameters, which can be mapped to XSD defined types. I did not try to send/receive structured objects (i.e., other JavaBeans), and of course I did not try to get/set Java objects.

* It is interesting that they support REST as well as WSDL/SOAP style invocation by default. For the above service, Axis is definitely a sledgehammer, since you could easily make the REST version of this in two seconds with a JSP page. And for more complicated inputs, REST would be really difficult.
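For what it's worth, here is a minimal sketch of a Java client for the REST-style URL listed above, using only standard JDK classes. The class and method names are hypothetical, and it assumes the service is deployed at http://localhost:8080/axis2 as in these notes. It simply prints whatever XML the server sends back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical client for the REST-style endpoint of the POJO service. */
final class PojoRestClient {
    private PojoRestClient() {}

    /** Builds [axis2 base]/rest/[service]/[operation]. */
    static String operationUrl(String axis2Base, String service, String operation) {
        String base = axis2Base.endsWith("/")
                ? axis2Base.substring(0, axis2Base.length() - 1)
                : axis2Base;
        return base + "/rest/" + service + "/" + operation;
    }

    /** Issues an HTTP GET and returns the response body as text. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes Tomcat is running locally with the PojoService deployed.
        System.out.println(get(operationUrl("http://localhost:8080/axis2", "PojoService", "getName")));
    }
}
```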

Using AXIOM

* This would be really overkill for our simple little POJO.

* Presumably this is useful if you need to send more complicated structured messages (which need to be serialized as XML). But you don't really want to do this if you can use XML Beans instead. That is, use the tool that specializes in serialization/deserialization of XML<->Java.

Tuesday, February 13, 2007

Notes on Amazon S3 File Services

Getting Started

* Best thing to do is go straight to the end and download the code, since the code snippets in the tutorial are not actual working programs. Go through the program S3Driver.java and then go back and read the tutorial:

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/gsg/?ref=get-started

* Then read the technical documentation,

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/

It is all pretty simple stuff--all the sophistication is on the server-side, I'm sure.

Building a Client

* Interestingly, (for Java) there is no jar to download. Everything works with standard Java classes if you use the REST version of the service. If you use the WSDL, you will need jars for your favorite WSDL2Java tool (e.g. Axis).

* Java 1.5 worked fine, but 1.4.2 has a compilation error. The readme has almost no instructions on how to compile and run. To build and run it on Linux:

Unpack it and cd into s3-example-libraries/java.
Edit both S3Driver.java and S3Test.java to use your access ID key and secret key.
Compile with find . -name "*.java"| xargs javac -d .
Run tests with "java -cp . S3Test" and then run the example with "java -cp . S3Driver"

This will compile various helper classes in the com.amazon.s3 package. These are all relatively transparent wrappers around lower level but standard HTTP operations (PUT, GET, DELETE), request parameters, and custom header fields, so you can easily invent your own API if you don't like Amazon's client code.

* Some basic concepts:
- To get started, you need to generate an Access ID key and a Secret Access Key.
- Your access id key is used to construct unique URL names for you.
- Files are called "objects" and named with keys.
- Files are stored in "buckets".
- You can name the files and buckets however you please.
- You can have 100 buckets, but the number of objects in a bucket is unlimited.

* Both buckets and s3objects can be mapped to local Java objects. The s3object is just data and arbitrary metadata (stored as a Java Map of name/value pairs). The bucket has more limited metadata (name and creation date). Note again this is all serialized as HTTP, so the local programming language objects are just there as a programming convenience.

* Buckets can't contain other buckets, so to mimic a directory structure you need to come up with some sort of key naming convention (i.e. try naming /mydir1/mydir2/file as the key mydir1.mydir2.file).
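A minimal sketch of that naming convention in Java follows. The class S3Keys and its methods are hypothetical names for this example, not part of Amazon's sample code.

```java
/** Hypothetical helper that flattens directory-style paths into dotted keys. */
final class S3Keys {
    private S3Keys() {}

    /** Turns "/mydir1/mydir2/file" into "mydir1.mydir2.file". */
    static String toKey(String path) {
        StringBuilder key = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip leading, trailing, and doubled slashes
            }
            if (key.length() > 0) {
                key.append('.');
            }
            key.append(segment);
        }
        return key.toString();
    }

    /** True if the key sits somewhere under the given directory-style path. */
    static boolean isUnder(String key, String directoryPath) {
        return key.startsWith(toKey(directoryPath) + ".");
    }
}
```

One caveat with dots as the separator: a file name such as report.txt becomes indistinguishable from a directory level, so you may prefer a separator that cannot appear in your file names.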

* In the REST version of things, everything is done by sending an HTTP command: PUT, GET, DELETE. You then write the contents directly to the remote resource using standard HTTP transfer.

* Security is handled by an HTTP request property called "Authorization". This is just a string of the form

"AWS "+"[your-access-key-id]"+":"+"[signed-canonical-string]".

The canonical string is a concatenation of all the "interesting" Amazon headers that are required for a particular communication. This is then signed by the client program using the client's secret key. Amazon also has a copy of this secret key and can verify the authenticity of the request.
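Here is a sketch of the signing step using only JDK classes. It assumes the signature is a Base64-encoded HMAC-SHA1 of the canonical string, which is my understanding of this version of the S3 REST API. The canonical string itself is taken as an input, since its exact layout is defined by Amazon's documentation. S3RequestSigner is a hypothetical name; Amazon's sample code has its own equivalent.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical signer; assumes Base64(HMAC-SHA1(secret, canonicalString)). */
final class S3RequestSigner {
    private S3RequestSigner() {}

    /** Signs the canonical string with the secret key. */
    static String sign(String secretKey, String canonicalString) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(canonicalString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    /** Builds the value of the Authorization request header. */
    static String authorizationHeader(String accessKeyId, String secretKey, String canonicalString) {
        return "AWS " + accessKeyId + ":" + sign(secretKey, canonicalString);
    }
}
```

The server can repeat exactly the same computation with its copy of the secret key, which is how it verifies the request.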

* Your files will be associated with the URL

https://s3.amazonaws.com/[your-access-key-id]-[bucket-name]/

That is, if your key is "ABC123DEF456" and you create a bucket called "my-bucket" and you create a file object called "test-file-key", then your files will be in the URL

https://s3.amazonaws.com/ABC123DEF456-my-bucket/test-file-key

* By default, your file will be private, so even if you know the bucket and key name, you won't be able to retrieve the file without also including a signed request. This URL will look something like this:

https://s3.amazonaws.com:443/ABC123DEF456-test-bucket/test-key?Signature=xelrjecv09dj&AWSAccessKeyId=ABC123DEF456

* Also note that this URL can be reconstructed entirely on the client side without any communication to the server--all you need is to know the name of the bucket and the object key and have access to your Secret Key. So even though the URL looks somewhat random, it is not, and no communication is required between the client and Amazon S3 to create this.

* Note also that this URL is in no way tied to the client that has the secret key. I could send it in email or post it to a blog and allow anyone to download the contents. It is possible I suppose to actually guess this URL also, but note that guessing one file URL this way would not help you guess another, since the Signature is a hash of the file name and other parameters--a file with a very similar name would have a very different hash value.
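To make the "no round trip to the server" point concrete, here is a sketch that builds such a signed URL locally. It rests on several assumptions that are mine, not from the sample code: the string being signed is "GET", two empty lines, an expiry time in seconds since 1970, and the resource path; the signature is Base64 HMAC-SHA1; and the URL carries an Expires parameter, which the example URL above omits. The object key is assumed to be URL-safe already. It needs Java 10 or later for the URLEncoder call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical builder for query-string authenticated download URLs. */
final class S3PresignedUrl {
    private S3PresignedUrl() {}

    /** Builds a signed GET URL entirely on the client side. */
    static String presign(String accessKeyId, String secretKey, String bucket,
                          String objectKey, long expiresEpochSeconds) {
        String resource = "/" + bucket + "/" + objectKey;
        // Assumed layout: verb, empty Content-MD5, empty Content-Type, expiry, resource.
        String stringToSign = "GET\n\n\n" + expiresEpochSeconds + "\n" + resource;
        String signature = hmacSha1Base64(secretKey, stringToSign);
        return "https://s3.amazonaws.com" + resource
                + "?AWSAccessKeyId=" + URLEncoder.encode(accessKeyId, StandardCharsets.UTF_8)
                + "&Expires=" + expiresEpochSeconds
                + "&Signature=" + URLEncoder.encode(signature, StandardCharsets.UTF_8);
    }

    private static String hmacSha1Base64(String secretKey, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            return Base64.getEncoder().encodeToString(
                    mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }
}
```

Calling presign twice with the same arguments gives the same URL, while changing a single character of the object key changes the whole Signature value.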

* You can also set access controls on your file. Amazon does this with a special HTTP header, x-amz-acl. To make the file publicly readable, you put the magic string "public-read" as the value of this header field.

* If your file is public (that is, can be read anonymously), then the URL

https://s3.amazonaws.com/ABC123DEF456-my-bucket/test-file-key-public

is all you need to retrieve it.

Higher Level Operations

* Error Handling: The REST version relies almost entirely on HTTP response codes. As noted in the tutorial, the provided REST client classes have minimal error handling, so this is something that would need to be beefed up.

For more information, see the tech docs:

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/

It is not entirely clear that the REST operations give you as much error information as the SOAP faults. Need to see if REST has additional error messages besides the standard HTTP error codes.
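As far as I know, S3's REST error responses do carry a small XML document in the body, with an Error root element and Code and Message children, in addition to the HTTP status code. Assuming that layout, a client could pull out the details with a sketch like this. S3ErrorBody is a hypothetical class, not part of Amazon's sample code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical holder for the Code and Message of an XML error body. */
final class S3ErrorBody {
    final String code;
    final String message;

    private S3ErrorBody(String code, String message) {
        this.code = code;
        this.message = message;
    }

    /** Parses an error body, assuming Code and Message child elements. */
    static S3ErrorBody parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new S3ErrorBody(firstText(doc, "Code"), firstText(doc, "Message"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parsable error document", e);
        }
    }

    /** Text of the first element with the given tag, or null if absent. */
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

With HttpURLConnection, the body of a failed request is available from getErrorStream() rather than getInputStream().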

* Fine-Grained Access Control: Both objects and their container buckets can have ACLs. The ACL of an object is actually expressed in XML and can be sent as stringified XML. For more information on this, you have to go to the full developer documentation:

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/

* Objects can have arbitrary metadata in the form of name/value pairs. These are sent as "x-amz-meta-" HTTP headers. If you want something more complicated (say, structured values or RDF triples) you will have to invent your own layer on top of this.
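A sketch of how a client might turn a metadata map into request headers, together with the x-amz-acl header mentioned earlier. The class name and the choice to lowercase the metadata names are my own assumptions; the resulting map would be applied to the PUT request with something like HttpURLConnection.setRequestProperty.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that prepares the Amazon-specific headers for an upload. */
final class S3Headers {
    private S3Headers() {}

    /** Builds the x-amz-acl and x-amz-meta-* headers for a PUT request. */
    static Map<String, String> forUpload(Map<String, String> metadata, boolean publicRead) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (publicRead) {
            headers.put("x-amz-acl", "public-read");
        }
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            // Header names are case-insensitive; lowercase them for consistency.
            String name = "x-amz-meta-" + entry.getKey().toLowerCase(Locale.ROOT);
            headers.put(name, entry.getValue());
        }
        return headers;
    }
}
```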

Security Musings

* Amazon provides X509 certs optionally, but all Amazon services work with Access Key identifiers. The Access Key is public and is transmitted over the wire as an identification mechanism. You verify your identity by signing your Access Key with your Secret Access Key. This is basically a shared secret, since Amazon also has a copy of this key. Based on the incoming Access Key Identifier, they can look up the associated secret key and verify that the message indeed comes from the proper person and was not tampered with by reproducing the message hash.

* You use your secret key to sign all commands that you send to the S3 service. This authorizes you to write to a particular file space, for example.

* This is of course security 101, but I wonder how they handle compromised keys. These sorts of non-repudiation issues are the bane of elegant security implementations, so presumably they have some sort of offline method for resolving these issues. I note that you have to give Amazon your credit card to get a key pair in the first place, so I suppose on the user side you could always dispute charges.

* Imagine, for example, that someone got access to your secret key and uploaded lots of pirated copies of large home movies. This is not illegal material (like pirated movies or software) but it will cost you money. So how does Amazon handle this situation? Or imagine that I uploaded a lot of stuff and then claimed my key was stolen in order to get out of paying. How does Amazon handle this?

Monday, February 12, 2007

A Quick Look at One Click Hosting

There are an enormous number of so-called one-click web hosting services, some offering (unbelievably) unlimited storage for an unlimited time for free. For a nice summary table, see of course, Wikipedia: http://en.wikipedia.org/wiki/Comparison_of_one-click_hosters.

Here are a few of the best-featured free services:

DivShare: Has a limit on the size of files, but has no limit on disk usage. Their blog reveals they have been in operation since late 2006 and have just passed the terabyte storage mark--i.e. they are really still a small operation.
FileHO: No limits on file sizes or disk usage, and no limit on time. Some interesting comments here. Sounds too good to be true. Provides FTP services as well as web interfaces. Not much other information about the company.
in.solit.us: Damned by its blog. They reveal that they were shut down temporarily in early 2007 by their host, Dreamhost, for violating terms of usage--probably for sharing illegal files. Overall, they look very amateurish.

All of these allow you to make files either public or private.

Usually, you get what you pay for, so my guess is that if your files really must be preserved, you should pay for this service. Amazon S3 is the most famous of these, with fees of $0.15/GB/month for storage and $0.20/GB of data transferred. Several smaller companies attempt to provide higher level services on top of S3.
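A quick back-of-the-envelope calculator for those rates, as a sketch. The class name is made up, the two prices are the ones quoted above, and any other charges Amazon may apply are ignored.

```java
/** Hypothetical cost estimate using the early 2007 prices quoted above. */
final class S3Cost {
    static final double STORAGE_USD_PER_GB_MONTH = 0.15;
    static final double TRANSFER_USD_PER_GB = 0.20;

    private S3Cost() {}

    /** Estimated bill for one month of storage plus data transfer. */
    static double monthlyUsd(double storedGb, double transferredGb) {
        return storedGb * STORAGE_USD_PER_GB_MONTH + transferredGb * TRANSFER_USD_PER_GB;
    }
}
```

For example, keeping 100 GB for a month and transferring 50 GB comes to 100 x 0.15 + 50 x 0.20 = 25 dollars.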

Sunday, February 11, 2007

Displaying RSS and Atom Feeds in MediaWiki

I used SimplePie to display Blogger Atom feeds in MediaWiki (using the ancient 1.5.2 version). Here's an example:

http://www.chembiogrid.org/wiki/index.php/SimplePieFeedTest

To add another feed, all you need to do is edit a page and add this:

<feed>http://chembiogrid.blogspot.com/atom.xml</feed>

(Replacing chembiogrid's atom.xml with the alternative feed URL).

More info is here: http://simplepie.org/. It is as easy as advertised. See particularly here: http://simplepie.org/docs/plugins/mediawiki. Or just google "simplepie mediawiki".

Avoid Magpie RSS. I spent the whole afternoon trying to get it to work with Atom with no luck. SimplePie worked in about 5 minutes.

MediaWiki's Secret RSS/Atom Feeds

Several MediaWiki "special pages" can be viewed as RSS and Atom feeds. See http://swik.net/MediaWiki/Documentation.

For example, our CICC wiki's recent changes are published via the URL
http://www.chembiogrid.org/wiki/index.php?title=Special:Recentchanges&feed=atom.
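The URL pattern is simple enough to generate. Here is a small sketch, with a hypothetical class name, that assumes the index.php?title=Special:...&feed=... form shown above and the two feed formats (rss and atom).

```java
/** Hypothetical builder for MediaWiki special-page feed URLs. */
final class MediaWikiFeeds {
    private MediaWikiFeeds() {}

    /** Builds [index.php URL]?title=Special:[page]&feed=[rss or atom]. */
    static String feedUrl(String indexPhpUrl, String specialPage, String format) {
        if (!format.equals("rss") && !format.equals("atom")) {
            throw new IllegalArgumentException("format must be rss or atom: " + format);
        }
        return indexPhpUrl + "?title=Special:" + specialPage + "&feed=" + format;
    }
}
```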

Friday, February 09, 2007

Fun with YUI's Calendar

* Thanks to Jason Novotny for pointing me to this.

* Download code from Yahoo's web site: http://developer.yahoo.com/yui/.

* To get started, you must first put the JavaScript libraries in an appropriate path on your web server. Yahoo does not for some reason supply a global URL. I did this by copying the YUI zip file into my Tomcat's ROOT webapp. More below.

* First trick: adding a calendar to a Web Form. Read the instructions. One variation on the instructions is that the src attribute needs to be set correctly. If you unpacked the YUI zip in ROOT as described above, then you should use the following minimal page.

<html>
<head>

<script type="text/javascript" src="/yui_0.12.2/build/yahoo/yahoo.js"></script>
<script type="text/javascript" src="/yui_0.12.2/build/event/event.js"></script>
<script type="text/javascript" src="/yui_0.12.2/build/dom/dom.js"></script>

<script type="text/javascript" src="/yui_0.12.2/build/calendar/calendar.js"></script>
<link type="text/css" rel="stylesheet" href="/yui_0.12.2/build/calendar/assets/calendar.css">

<script>
YAHOO.namespace("example.calendar");
function init() {
YAHOO.example.calendar.cal1=new YAHOO.widget.Calendar("cal1","cal1Container");
YAHOO.example.calendar.cal1.render();
}
YAHOO.util.Event.addListener(window,"load",init);
</script>

</head>

<body>
Here is the calendar <p>
<div id="cal1Container"></div>
<p>
</body>

</html>

Note this code seems to work just fine in the HTML <body> so I will put it there henceforth.

* Let's now look at how to get this value into a Web Form. We'll do the web form with JSF just to make it more difficult. The particular example I'll use is a front end to a data analysis service (RDAHMM) that will look for modes in a selected GPS station over the selected date range, but these details are not part of the web interface.

This will actually put the calendar right next to the form (i.e., not the way we would like it) but that's fine for now.

<%@ taglib uri="http://java.sun.com/jsf/html" prefix="h"%>
<%@ taglib uri="http://java.sun.com/jsf/core" prefix="f"%>
<html>
<head>
<title>RDAHMM Minimalist Input</title>

<script type="text/javascript" src="/yui_0.12.2/build/yahoo/yahoo.js"></script>
<script type="text/javascript" src="/yui_0.12.2/build/event/event.js"></script>
<script type="text/javascript" src="/yui_0.12.2/build/dom/dom.js"></script>

<script type="text/javascript" src="/yui_0.12.2/build/calendar/calendar.js"></script>
<link type="text/css" rel="stylesheet" href="/yui_0.12.2/build/calendar/assets/calendar.css">

</head>

<body>
<script>
//Set up the calendar object and register the page-load listener.
YAHOO.namespace("example.calendar");
function init() {
    YAHOO.example.calendar.cal1=new YAHOO.widget.Calendar("cal1","cal1Container");

    //Copy the selected date into the form's Begin Date field.
    var mySelectHandler=function(type,args,obj) {
        var dates=args[0];
        var date=dates[0];
        var year=date[0],month=date[1],day=date[2];
        var startDate=year+"-"+month+"-"+day;

        var newStartDateVal=document.getElementById("form1:beginDate");
        newStartDateVal.value=startDate;
    };

    YAHOO.example.calendar.cal1.selectEvent.subscribe(mySelectHandler,YAHOO.example.calendar.cal1, true);
    YAHOO.example.calendar.cal1.render();
}
YAHOO.util.Event.addListener(window,"load",init);
</script>

Here is the calendar <br>
<div id="cal1Container"></div>

The input data URL is obtained directly from the GRWS web service
as a return type.

<f:view>
<h:form id="form1">
<b>Input Parameters</b>
<h:panelGrid columns="3" border="1">

<h:outputText value="Site Code"/>
<h:inputText id="siteCode" value="#{simpleRdahmmClientBean.siteCode}"
required="true"/>
<h:message for="siteCode" showDetail="true" showSummary="true" errorStyle="color: red"/>

<h:outputText value="Begin Date"/>
<h:inputText id="beginDate" value="#{simpleRdahmmClientBean.beginDate}"
required="true"/>
<h:message for="beginDate" showDetail="true" showSummary="true" errorStyle="color: red"/>

<h:outputText value="End Date"/>
<h:inputText id="endDate" value="#{simpleRdahmmClientBean.endDate}"
required="true"/>
<h:message for="endDate" showDetail="true" showSummary="true" errorStyle="color: red"/>

<h:outputText value="Number of Model States"/>
<h:inputText id="nmodel" value="#{simpleRdahmmClientBean.numModelStates}"
required="true"/>
<h:message for="nmodel" showDetail="true" showSummary="true" errorStyle="color: red"/>
</h:panelGrid>
<h:commandButton value="Submit"
action="#{simpleRdahmmClientBean.runBlockingRDAHMM2}"/>
</h:form>
</f:view>
</body>
</html>

The basic thing to see is that the YUI event listener connects the "mySelectHandler" JavaScript function to calendar mouse clicks, which then update the form item with the id "form1:beginDate".
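One thing to watch on the server side: the handler builds the date as year-month-day without zero padding, so February 9, 2007 arrives in the bean as "2007-2-9". Below is a sketch of how a backing bean might parse that, using java.time (Java 8 or later, so well after this post's vintage). CalendarDates is a hypothetical helper, not part of the RDAHMM client bean.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for the unpadded dates produced by the calendar handler. */
final class CalendarDates {
    // "M" and "d" accept one or two digits, so both 2007-2-9 and 2007-12-25 parse.
    private static final DateTimeFormatter UNPADDED = DateTimeFormatter.ofPattern("yyyy-M-d");

    private CalendarDates() {}

    /** Parses a value such as "2007-2-9" into a LocalDate. */
    static LocalDate parse(String value) {
        return LocalDate.parse(value, UNPADDED);
    }

    /** Normalizes the value to zero-padded ISO form, e.g. "2007-02-09". */
    static String toIso(String value) {
        return parse(value).toString();
    }
}
```

A malformed value makes LocalDate.parse throw a DateTimeParseException, which the bean could turn into a validation message.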

Friday, February 02, 2007

Rethinking Science Gateways

Note: I intend to make this a series of articles on Web 2.0 technologies that follow my usual focus on nuts and bolts issues. However, I will start with 1-2 polemics.

Science gateways are essentially user-centric Web portals and Web Services for accessing Grid resources (primarily Globus and Condor in the US). Java-based portals are typically (but not always) built using the JSR 168 portlet standard. Numerous examples can be found in the Science Gateways Workshop at Global Grid Forum 14 and the associated special issue of Concurrency and Computation (the articles are available online but not yet published; search at http://www3.interscience.wiley.com/cgi-bin/jtoc/77004395/ to see them). See also GCE06 for a more recent overview.

Java-based gateways are characterized by their reliance on Enterprise standards, particularly numerous Java Specification Requests, and related Enterprise Web application development frameworks such as Struts, Java Server Faces, Velocity, and so on. This has the advantage of allowing portals to work with numerous third party jars and also (by implementing standards) allows the science gateway community to interact with a larger community of developers than one would expect from the relatively specialized “web science” community.

This has come with a price. Java Web frameworks are designed to encourage beautiful coding practices (such as the Model-View-Controller design pattern) that are necessary for a team of professional developers. However, many casual Web developers find these frameworks very difficult to learn and furthermore (as many work in very small groups or individually) the benefits of beautiful code bases are outweighed by the difficulty of getting anything working. This high entry barrier has created a “portal priesthood” possessing specialized knowledge.

While the enterprise approach and associated development practices may be appropriate for software development teams building science gateways for entire institutions (e.g. the TeraGrid User Portal), many science gateways would undoubtedly benefit from “agile development” rather than “enterprise development” practices. This is because many science teams cannot afford specialized Web and Grid developers. They would instead like to build gateways themselves. Such would-be gateway developers typically possess a great deal of programming knowledge but are more likely to be driven by practical applications rather than the desire to build elegantly architected software.

One of the recurring problems in building science gateways is that it is a very time-consuming activity if one follows the traditional software design approach: a gateway development team must extract requirements from a set of scientific users, who may not be directly involved in the gateway coding. Typically, the resulting prototype interfaces must go through many evolutions before they are useful to a working scientist or grad student.

Clearly many of these problems could be alleviated if the scientists could just build the interfaces themselves. Such user interfaces may be non-scaling, highly personal, or ephemeral, but the scientist gets what he or she wants. Well engineered software tools are still needed, but these must be wrapped with very easy to learn and use developer libraries. Google's Map API is the prime example of this sort of approach: very nice services (interactive online maps, geo-location services, direction services, interactive layer add-ons) can be built using JavaScript and thus in the classic scripting sense, “agilely” developed.

One of the useful lessons of the Programmable Web is that there is a distinction between the Enterprise and the Internet, but the two are complementary rather than opposed. Obviously Google, Flickr, et al need to build very sophisticated services (with such things as security for Grids and ACID transactions for the commercial world), but these internal guts should not be exposed in the service interface, or at least to the “Internet” face of the service, any more than they would be exposed in the user interface. Again, Google serves as an example: calculating Page Ranks for the Internet is a very computationally demanding process, and Google undoubtedly has much specialized software for managing this distributed computing task. But this “enterprise” part of Google does not need to be exposed to anyone outside. Instead, we can all very easily use the results of this service through much simpler interfaces.

It seems clear that the Enterprise-centric portal interface portion of Science Gateways will need to be rethought, and I plan to do this in a series of posts examining technical issues of Web 2.0 development. The Web Service portion of the gateway architecture is, on the other hand, conceptually safe, but this does not mean that the services themselves do not need to change. The lesson from the Programmable Web should be very clear: we must do a better job building Web services. Too many gateways have collections of Web Services that are unusable without their portal interfaces. That is, the services have not been designed and documented so that anyone can (by inspecting the WSDL and reading up a bit) create a useful client to the service. Instead, the services are too tightly coupled to their associated portal. I propose the acid test for gateway services should be simply "Can you put it up on programmableweb.com?"
