I have a web service design question and hope to get your input.
I need to design two services, say WS1 and WS2. WS1 talks to a database to get some data; WS2 is a consumer of WS1 and is itself a service as well. For WS1, I need to make a DB call that returns a few hundred rows of data (4 or 5 fields each). I want to convert this data to XML before WS2 consumes it. My question is: what is the best way to design this, and in particular, how do I transmit this large amount of XML (the tags plus the real data) without affecting performance? First of all, I don't want to make a DB call each time WS2 consumes WS1. I am thinking of some sort of cache mechanism: WS1 makes its DB call up front (or the very first time it is consumed) and caches the data somewhere, so that when WS2 consumes WS1, WS1 reads this large amount of data from the cache. However, what if the DB source changes (updates, modifications, etc.)? How do I refresh my cache? Any other suggestions? What is the typical way for a web service (or web services) to handle a large amount of XML data?
Originally posted by Ricky Murphy: Any other suggestions?
The answer depends on the capabilities of your environment (and of your database specifically). Caching is problematic when the underlying data is updated by multiple sources at random times - it is much easier when you are dealing with static data, or with data that is only updated at well-known intervals that allow the cache to be synchronized. Also, industrial-strength database engines already cache their result data, so this capability should always be leveraged before introducing yet another caching layer. The simplest solution is to collocate the web service WS1 on the same hardware server as the database engine, eliminating any network latency between the web service and the database engine. You may not be allowed to deploy your web service on the database server. However, you can still collocate the web service with (a mirror of) its data if your database has data replication features (i.e. the primary database sends out updates to any replicated mirror databases). Lacking that, you could always write a series of triggers and/or stored procedures that send the updated data from the primary database to the database that is collocated with the web service.
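For the easy case mentioned above (data that only changes at well-known intervals), a time-based cache inside WS1 could look roughly like the sketch below. This is only an illustration: `IntervalCache`, `loader` and `max_age_seconds` are made-up names, the loader stands in for whatever code runs the real DB query, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class IntervalCache(Generic[T]):
    """Holds one loaded value and reloads it once it is older than max_age_seconds."""

    def __init__(self, loader: Callable[[], T], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader            # hypothetical: runs the DB query
        self._max_age = max_age_seconds  # assumed to match the known update interval
        self._clock = clock              # injectable so the cache can be tested
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Load on first use, or reload when the cached copy is too old.
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

If the data is updated by several sources at random times, a fixed interval like this will serve stale rows between reloads, which is why relying on the database engine's own caching is the safer first step.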
Do you actually have any (profiling) evidence that the network latency between the database and the web service will become a performance bottleneck? The data exchanged between DB-WS1 is sent in binary form which is much more compact than the data exchanged between WS1-WS2. The flow of a request would be something like this:
1) send SOAP request from WS2 to WS1
2) WS1 unmarshals SOAP request
3) send QUERY request from WS1 to DB
4) QUERY processing on DB
5) DB marshals QUERY result
6) send QUERY result from DB to WS1
7) WS1 unmarshals QUERY result and marshals XML data/SOAP response
8) send SOAP response from WS1 to WS2
The DB-WS1 network latency only affects [send QUERY request from WS1 to DB] and [send QUERY result from DB to WS1]. [send QUERY result from DB to WS1] is going to be minor compared to [send SOAP response from WS1 to WS2], even if you ignore [WS1 unmarshals QUERY result and marshals XML data/SOAP response]. Now, if you are considering a cache to reduce the load on the primary DB, then a secondary (replicated) DB collocated with the web service still makes sense, because the web service would query the secondary DB and the DB engine would handle the caching of the query results.
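To get a rough feel for the size difference between a compact binary result and the same rows as XML, something like the following can be run. It is only a sketch: the row layout, the element names and the binary format are invented for the comparison, and a real database wire protocol will differ in its details.

```python
import struct
import xml.etree.ElementTree as ET


def make_rows(count):
    # Made-up sample data: (id, name, amount, active)
    return [(i, f"name-{i}", i * 1.5, i % 2 == 0) for i in range(count)]


def pack_binary(rows):
    # Hypothetical compact layout per row:
    # 4-byte int, 2-byte name length, UTF-8 name, 8-byte double, 1-byte bool
    chunks = []
    for row_id, name, amount, active in rows:
        encoded = name.encode("utf-8")
        chunks.append(struct.pack("<iH", row_id, len(encoded)))
        chunks.append(encoded)
        chunks.append(struct.pack("<d?", amount, active))
    return b"".join(chunks)


def to_xml(rows):
    # One <row> element per record, one child element per field.
    root = ET.Element("rows")
    for row_id, name, amount, active in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "id").text = str(row_id)
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "amount").text = repr(amount)
        ET.SubElement(row, "active").text = "true" if active else "false"
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    rows = make_rows(300)
    print("binary bytes:", len(pack_binary(rows)))
    print("XML bytes:   ", len(to_xml(rows)))
```

The exact ratio depends on tag names and data, and a SOAP envelope adds a little more on top of the XML figure.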
Ultimately you want to keep your setup as simple as possible. So initially access the primary DB directly from WS1, but include the contingency of creating a secondary (replicated) DB collocated with WS1 if there is evidence of any undue load on the DB or excessive DB-WS1 network latency. [ January 15, 2008: Message edited by: Peer Reynders ]
Thank you, Peer, for your detailed suggestion. I am more concerned about Step 8 that you mentioned: "8) send SOAP response from WS1 to WS2". The issue is that if WS1 has a large XML payload, that will certainly affect WS2's ability to do its job. What would you recommend to handle this case? Thank you!
Originally posted by Ricky Murphy: I am more concerned about Step 8 that you mentioned: "8) send SOAP response from WS1 to WS2". The issue is that if WS1 has a large XML payload, that will certainly affect WS2's ability to do its job. What would you recommend to handle this case? Thank you!
That depends on what "large XML data" means, and also on your environment. This is the perfect opportunity for a tracer bullet. Create a static SQL query for your database that produces a suitably large (and preferably stable) result set, and code it up in a simple web service that marshals that data into the XML response payload, logging the times on the service when it starts the DB query, when it starts marshalling, when it starts to send the response and when it's done. Also code up a console client (again logging times of interest). Running the client on the same machine as the service will give you an idea of the overhead contributed by the DB query and the XML marshalling; try this experiment under different load conditions on the database server. Then run the console client "as far away as possible" over the network to get an idea of the network latency contribution; again, do this under different network load conditions.
This tracer bullet may tell you that you have nothing to worry about - if there is a problem then you have something that you can experiment with to assess different solution approaches.
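As a starting point for the service-side timing, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in for the real database, and the table, column and function names are all invented for the example; the SOAP layer and the network send are left out, so it only covers the query and marshalling steps.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET


def build_test_db(row_count):
    # Stand-in for the real database: a stable, suitably large result set.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER, name TEXT, price REAL, qty INTEGER)")
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        [(i, f"item-{i}", i * 0.5, i % 10) for i in range(row_count)],
    )
    conn.commit()
    return conn


def marshal(rows, columns):
    # Turn each row into an <item> element with one child per column.
    root = ET.Element("items")
    for row in rows:
        element = ET.SubElement(root, "item")
        for column, value in zip(columns, row):
            ET.SubElement(element, column).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def timed_request(conn):
    # Record when the query starts, when marshalling starts, and when it ends.
    query_start = time.perf_counter()
    cursor = conn.execute("SELECT id, name, price, qty FROM item ORDER BY id")
    rows = cursor.fetchall()
    marshal_start = time.perf_counter()
    columns = [description[0] for description in cursor.description]
    payload = marshal(rows, columns)
    marshal_end = time.perf_counter()
    timings = {
        "query_seconds": marshal_start - query_start,
        "marshal_seconds": marshal_end - marshal_start,
        "payload_bytes": len(payload),
    }
    return payload, timings


if __name__ == "__main__":
    payload, timings = timed_request(build_test_db(300))
    print(timings)
```

Pointing the same measurement at the real database, and adding timestamps around the response send and in the console client, would cover the remaining steps of the experiment.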
Thank you, Peer. I agree that a "tracer bullet" is a good place to begin. I will put some effort into a timing study to assess the impact. I think I will be able to ask more concrete, specific questions at that point.