JavaRanch » Java Forums » Java » Web Services
RESTful architecture for integration of government information?

Rickard Öberg
Greenhorn

Joined: Jan 03, 2007
Posts: 5
Hi guys!

I have a REST question which I think is sort of interesting. I live in a small country (Sweden) which has just released a report stating that we need to be able to centralize all information about an individual's interactions with the government, both to make government handling more efficient and to give citizens easy access, through a web interface, to a complete list of their open queries/issues/etc., regardless of which government branch is responsible for handling them. The report includes a number with many zeroes as projected monetary gain if this is done.

As a developer I'm curious about how to construct an architecture to support this goal. We're talking 500+ gov. branches, with let's say 10 systems each, handling the information about 9M citizens in total. Basically the flow of information probably has to be something like this:
System->Branch->Centralized store->split by SSN->gov. branch or citizen who wants info about themselves

Each Branch is hence responsible for aggregating information from its own internal Systems, which is then sent to a Centralized store. With all the info in one place it is then reasonably easy to split it by individual and present it, using web/RSS/SMS/email/?.

To make things more interesting, the systems are very diverse in terms of how they are implemented (.Net/Java/C/whatever), and there is currently no common authentication/security infrastructure in place. There is also a shortage of consultants, so any architecture will have to be reasonably simple to deal with the fact that mostly "average" consultants, in a multitude of companies, will be implementing it (this is IMO what rules out any fancy WS-* approaches). Implementability is more important than technical excellence.

How would one go about making an architecture based on REST-ideas to support this scenario? Should it be push (SMTP?) or pull(HTTP)? How do you handle security/authentication? How do you handle the sheer volume of data? Can any current standards be applied (Atom, GData)? Is it at all possible?

Any ideas are most welcome. This is not a joke btw :-)

ps. The report (in Swedish) can be found here:
http://verva.se/web/t/Publication____4667.aspx
Leonard Richardson
author
Ranch Hand

Joined: Jun 08, 2007
Posts: 37
Rickard,

This is a fun question. Here are some thoughts off the top of my
head. The workflow you describe sounds a lot like document publishing,
so something like the Atom Publishing Protocol might work. Each branch
would expose an APP collection to which its 10 systems would
post. After massaging the data as necessary the branch would pass it
on to the central store, again using the APP. Citizens can then use
APP clients or feed readers to get information about themselves from
the central store. Email and web access would work off of the same
data store. I envision the system as a continuous rain of updates, but the central store can gather those updates by SSN and by query number.
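A minimal sketch of what a system-to-branch POST might look like in Java. The collection URI is hypothetical, and the Atom entry is built as a plain string for brevity (a real system would use an XML library); per the APP, a successful create returns 201 Created with a Location header pointing at the new member resource:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AppPostSketch {

    // Build a minimal Atom entry as a string; the timestamp is a placeholder.
    public static String buildEntry(String title, String summary) {
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
             + "<entry xmlns=\"http://www.w3.org/2005/Atom\">\n"
             + "  <title>" + title + "</title>\n"
             + "  <summary>" + summary + "</summary>\n"
             + "  <updated>2007-06-14T12:00:00Z</updated>\n"
             + "</entry>\n";
    }

    // POST the entry to the branch's APP collection (hypothetical URI)
    // and return the HTTP status code; 201 means the entry was created.
    public static int postEntry(String collectionUri, String entry) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(collectionUri).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/atom+xml;type=entry");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(entry.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

The same shape works at both hops: systems POST to the branch collection, and the branch POSTs on to the central store's collection.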

Probably the most difficult part (technically and politically) will be
designing the data format to be used in the central store. The
individual systems won't need to care about the exact format, but all
the branches will need to agree on it, and all the systems will have
to output the appropriate data even if the format is different. The
format could be as simple as an Atom entry which has a textual
description and a few extension tags associating the entry with a
person, a government branch, and an open query.
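As an illustration, such an entry might look like the sketch below; the "gov" extension namespace and its element names are invented for this example, not taken from any standard. The entry is parsed with the JDK's DOM parser just to confirm it is well-formed:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ExtendedEntrySketch {

    // A minimal Atom entry carrying person/branch/query extension
    // elements. The "gov" namespace URI is hypothetical.
    public static String sampleEntry() {
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
             + "<entry xmlns=\"http://www.w3.org/2005/Atom\"\n"
             + "       xmlns:gov=\"http://example.org/gov-extensions\">\n"
             + "  <title>Building permit application received</title>\n"
             + "  <updated>2007-06-14T12:00:00Z</updated>\n"
             + "  <gov:person>SSN-REDACTED</gov:person>\n"
             + "  <gov:branch>Municipal planning office</gov:branch>\n"
             + "  <gov:query>Q-2007-0042</gov:query>\n"
             + "</entry>\n";
    }

    // Parse the entry to verify it is well-formed, namespace-aware XML.
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException("not well-formed", e);
        }
    }
}
```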

The diversity of implementation languages won't be a problem. The lack
of common security infrastructure probably will be. The centralized
store, at least, will probably support a wide variety of
authentication mechanisms. You can do these as HTTP auth extensions,
but you'll need to create most of the extensions yourself. You may end
up porting some of the WS-Security authentication mechanisms to HTTP
auth extensions, the way username token was ported as WSSE. If so, it
would be nice to do those as an open standard so we all can build on
that work. Other authentication mechanisms may not correspond to
anything used publicly, and the govt. branches will have to hash it
out with the people minding the centralized store.
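For reference, the WSSE UsernameToken scheme mentioned above computes a password digest as Base64(SHA-1(nonce + created + password)) and sends it in an X-WSSE request header. A sketch of the client-side computation, assuming the variant used by early Atom implementations (details varied between implementations):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class WsseSketch {

    // Build an X-WSSE UsernameToken header value. The digest is
    // Base64(SHA-1(nonce + created + password)); the raw nonce goes
    // into the digest, and a Base64 copy of it goes into the header.
    public static String usernameToken(String user, String password,
                                       String nonce, String created) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                (nonce + created + password).getBytes(StandardCharsets.UTF_8));
            String passwordDigest = Base64.getEncoder().encodeToString(digest);
            String nonce64 = Base64.getEncoder()
                .encodeToString(nonce.getBytes(StandardCharsets.UTF_8));
            return "UsernameToken Username=\"" + user + "\", "
                 + "PasswordDigest=\"" + passwordDigest + "\", "
                 + "Nonce=\"" + nonce64 + "\", "
                 + "Created=\"" + created + "\"";
        } catch (Exception e) {
            throw new RuntimeException(e); // SHA-1 is always available
        }
    }
}
```

The appeal for this scenario is that the password itself never crosses the wire, yet the mechanism needs nothing beyond plain HTTP headers.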

To handle the volume of data, I might extend the APP to support batch
operations--make it possible to POST an entire feed to a collection or
something. I'd also try to streamline the trip from system to branch
to central store, by making the systems generate data ready for the
central store. The line will move faster if the branch doesn't do much
besides span a trust boundary.
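The batch idea might amount to wrapping many already-serialized entries in a single feed document and POSTing that in one request. Note this would be a local extension: stock APP collections accept one entry per POST, so both ends would need to agree on it.

```java
import java.util.List;

public class BatchFeedSketch {

    // Wrap serialized Atom entries (each without its own XML
    // declaration) in a single feed document for one batched POST.
    public static String buildFeed(List<String> entries) {
        StringBuilder feed = new StringBuilder();
        feed.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        feed.append("<feed xmlns=\"http://www.w3.org/2005/Atom\">\n");
        feed.append("  <updated>2007-06-14T12:00:00Z</updated>\n");
        for (String entry : entries) {
            feed.append(entry).append("\n");
        }
        feed.append("</feed>\n");
        return feed.toString();
    }
}
```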

Sam probably has opinions on this, too.
Rickard Öberg
Greenhorn

Joined: Jan 03, 2007
Posts: 5
Originally posted by Leonard Richardson:
Rickard,

This is a fun question. Here are some thoughts off the top of my
head. The workflow you describe sounds a lot like document publishing,
so something like the Atom Publishing Protocol might work. Each branch
would expose an APP collection to which its 10 systems would
post. After massaging the data as necessary the branch would pass it
on to the central store, again using the APP. Citizens can then use
APP clients or feed readers to get information about themselves from
the central store. Email and web access would work off of the same
data store. I envision the system as a continuous rain of updates, but the central store can gather those updates by SSN and by query number.

So, you would use APP to push changes from the branches to the central store? If the central store is down, or otherwise unavailable (network error, or similar), would the branch then have to store and resend, or could the central store ask periodically for changes too? I.e. both push and pull, but using Atom for both?


Probably the most difficult part (technically and politically) will be
designing the data format to be used in the central store. The
individual systems won't need to care about the exact format, but all
the branches will need to agree on it, and all the systems will have
to output the appropriate data even if the format is different. The
format could be as simple as an Atom entry which has a textual
description and a few extension tags associating the entry with a
person, a government branch, and an open query.

I agree that the format will be the main technical/POLITICAL issue :-) That goes with the territory. But it should be doable.


The diversity of implementation languages won't be a problem. The lack
of common security infrastructure probably will be. The centralized
store, at least, will probably support a wide variety of
authentication mechanisms.

Has anyone tried using CAS for this purpose? There are some ideas around here to use CAS for centralized logins to all gov. websites (i.e. one central login site with strong auth. that everyone else then piggybacks), and I guess there's no reason why this could not work for automated web interaction as well.


To handle the volume of data, I might extend the APP to support batch
operations--make it possible to POST an entire feed to a collection or
something. I'd also try to streamline the trip from system to branch
to central store, by making the systems generate data ready for the
central store. The line will move faster if the branch doesn't do much
besides span a trust boundary.

Right, that sounds like a good idea. If one can control the system integration reasonably well it should be possible to churn out well-behaved data early on in the pipeline.


Sam probably has opinions on this, too.
Any help would be much appreciated :-) It's both a fun project and an absolutely huge one, should it ever take off, so it would be nice to get it reasonably right...
Leonard Richardson
author
Ranch Hand

Joined: Jun 08, 2007
Posts: 37

So, you would use APP to push changes from the branches to the central
store? If the central store is down, or otherwise unavailable (network
error, or similar), would the branch then have to store and resend, or
could the central store ask periodically for changes too? I.e. both
push and pull, but using Atom for both?


In general, when you try an HTTP request and the server is down, you
store and resend. My design assumes that a system is more likely to
be down than a branch, which is more likely to be down than the
central store.

An alternative along the lines of what you suggest is a single APP
implementation in the branch, which the systems POST to and the
central store GETs. That's very elegant. The main downside I see is
that a branch only has one client. Once the client GETs data, the
branch might as well purge that data--but the purge would make GET an
unsafe operation, which doesn't fit with the constraints of
REST. There are ways to fix this, but they make the system more
complex than one where the branch forwards data to the central store.
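The store-and-resend behavior described above can be sketched as a simple pending queue: entries that fail to transmit stay queued and are retried on the next flush. The sender interface here is invented for illustration; in practice it would wrap the HTTP POST to the central store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

public class StoreAndResend {

    private final Deque<String> pending = new ArrayDeque<>();
    private final Predicate<String> sender; // true means delivered

    public StoreAndResend(Predicate<String> sender) {
        this.sender = sender;
    }

    // Queue an entry for delivery to the central store.
    public void submit(String entry) {
        pending.addLast(entry);
    }

    // Attempt to deliver everything currently queued; entries that
    // fail are re-queued for the next flush. Returns how many were
    // actually delivered this round.
    public int flush() {
        int delivered = 0;
        int toTry = pending.size();
        for (int i = 0; i < toTry; i++) {
            String entry = pending.removeFirst();
            if (sender.test(entry)) {
                delivered++;
            } else {
                pending.addLast(entry); // keep for retry
            }
        }
        return delivered;
    }

    public int pendingCount() {
        return pending.size();
    }
}
```

A branch would call flush() on a timer, so an outage of the central store merely delays delivery rather than losing data.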


Has anyone tried using CAS for this purpose? There are some ideas
around here to use CAS for centralized logins to all gov. websites
(i.e. one central login site with strong auth. that everyone else then
piggybacks), and I guess there's no reason why this could not work for
automated web interaction as well.


I'd never heard of CAS, but its architecture looks a *lot* like the
ad hoc mechanisms devised by Google, eBay, et al. to let a web service
client authenticate an end-user without handling the end-user's
password (a topic we cover at the end of chapter 8). In those
scenarios the credential holder is still a human who trusts their web
browser, but the same architecture should work when the credential
holder is a software program.
Rickard Öberg
Greenhorn

Joined: Jan 03, 2007
Posts: 5
About resends, would it make sense to do the Branch->Central posts using SMTP/IMAP instead, i.e. use an email server as a mediator? In this case the central repository would not have to be up all the time, at all, and the Branch would not even have to know where it is. All it does is email the data to some known mailbox. Or how much is it worth to always stay within HTTP in these kinds of architectures? I guess authentication will have to be reinvented if one uses SMTP, so that's one thing. But other than that? Any experience with that approach?
Leonard Richardson
author
Ranch Hand

Joined: Jun 08, 2007
Posts: 37
Yes, SMTP/IMAP or SMTP delivered to a special processor might make sense.
SMTP and IMAP are both stateful protocols, so I don't think you could do this RESTfully.

I suspect Sam would say that if you use SMTP you'll end up reinventing a lot more than you would by using HTTP. He might well be right. A URI is easier to scale up than an email address. But I think it's worth experimenting.
Peter Krantz
Greenhorn

Joined: Jun 12, 2007
Posts: 1
Originally posted by Rickard Öberg:
How do you handle security/authentication?


A partial solution to the security issue is the existing system SHS - Spridnings- och hämtningssystem (loosely translated as "dissemination and collection system"; an architecture overview in English is available as a PDF). If I remember correctly, SHS provides a directory (LDAP) of connected organizations, a secure transport mechanism, an envelope format with liberal payload options, and a message queue for async communication.


Many government agencies are already connected to SHS and there are suppliers that sell services to get connected. This may be an important success factor for a project of this size.

Regards,

Peter
Rickard Öberg
Greenhorn

Joined: Jan 03, 2007
Posts: 5
Hey Peter! Nice to see you here :-) (if I understood Jenny@Verva correctly, you now work for them, right?)

I read through the SHS architecture docs last week, and my gut feeling is that it will never work on the scale that the referenced ELSA report suggests. It is too complex and expensive. If the aim is 500*10 integrated systems (roughly), then you're not going to be able to find enough consultants to do it in a reasonable amount of time. The software needed to implement it is also going to be way too expensive for most gov. branches. This is part of why I am trying to find simpler solutions to this problem.


If I remember correctly SHS provides a directory (LDAP) of connected organizations, a secure transport mechanism, an envelope format with liberal payload options and a message queue for async communication.

And as far as I know there are not many vendors implementing the SHS specs (two, last time I checked), and those that do are not exactly cheap. This may be fine for integrating specialized gov. branches that have the energy and resources to do it, but not on the scale suggested in the report.


Many government agencies are already connected to SHS and there are suppliers that sell services to get connected. This may be an important success factor for a project of this size.

And on the contrary, in line with the above to me it seems like the current SHS initiative, which is based on WS-*, is really an inhibiting factor, as it is too ambitious and complex. I think it blocks simpler and more cost-effective solutions.

But I have admittedly not looked into it in great detail, and you are probably in a better position to know. If there are SHS implementations and enough consultants around to do "Mina Sidor" ("My Pages") in a reasonable amount of time and with reasonable funding, that's great. But from what I have seen so far, that's not likely to happen.

It'll be interesting to follow the developments in this area, for sure. If it takes off it's going to be one of the larger integration scenarios ever attempted...

/Rickard