I would like to know a few things about approaching a data sharing problem. Pentaho Data Integration has a 'Web Services' step that allows a designer to consume a web service for looking up data (input -> web service -> output). There are some confusing things about this that I would like to understand better before proposing a solution in my workplace.
1/ Is the Web Services step explained in your book?
2/ What is a WSDL file?
3/ If I am to propose a Web Service based on XML-RPC as a possible solution inside my office, is there a guideline / turnkey approach that can be implemented?
The backend would either be based in Tomcat or similar, or possibly as a Google App. The purpose is to receive a chunk of data (eg/ an ID number of sorts) and return the associated information living in one of our databases (multiple fields) - or some standard response if there was no data to return (ie/ lookup was unsuccessful). We would also at a later date want to implement a type of 'insert' and 'update' operation for the web service, but I don't believe that PDI supports this at the moment.
Any advice or recommendations are most welcome. We are just at the brainstorming stage at the moment, so other people's experiences in approaching this type of problem would be most welcome. The 'larger picture' for us is to be able to provide a single trusted source of information for our internal processes as well as to our external partners. We have some great data, knowledge and experience in our area and this is one way we can share this with others.
1/ I haven't had the honor of reading Maria's book yet, but the Pentaho DI manual describes each of its building blocks in detail, including the Web Services block.
2/ WSDL is the Web Services Description Language. It's used to define a SOAP interface. By publishing a WSDL file, you let clients set up communications with the web service in a language-independent manner. There are WSDL-to-Java compilers, WSDL-to-C#, etc.
3/ XML-RPC isn't the same thing as SOAP. It's XML, but a less rigorous specification designed to cut down on the overhead at the expense of some of the general-purpose nature of SOAP.
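To make the difference concrete, here is a minimal sketch of what an XML-RPC lookup request looks like on the wire. The method name `lookup.byId` and the class name are made up for illustration; only the `methodCall` / `methodName` / `params` envelope comes from the XML-RPC spec. Note there is no WSDL involved, which is part of why it is lighter than SOAP.

```java
public class XmlRpcLookupRequest {

    // Escape the characters that are unsafe inside XML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build an XML-RPC methodCall carrying a single string parameter.
    // "methodName" is whatever the server exposes; the name used in main()
    // below is a hypothetical example, not a real service.
    public static String buildCall(String methodName, String id) {
        return "<?xml version=\"1.0\"?>\n"
            + "<methodCall>\n"
            + "  <methodName>" + escape(methodName) + "</methodName>\n"
            + "  <params>\n"
            + "    <param><value><string>" + escape(id) + "</string></value></param>\n"
            + "  </params>\n"
            + "</methodCall>\n";
    }

    public static void main(String[] args) {
        // Print the request body that would be sent via HTTP POST.
        System.out.println(buildCall("lookup.byId", "A&B-42"));
    }
}
```

The server answers with a `methodResponse` document in the same style. In practice you would use an XML-RPC library rather than building strings by hand; this is only meant to show how little there is to the format.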
If you're looking at downloading large quantities of data via HTTP and you don't need it formatted for a general audience, you might look at just downloading straight text instead. It has the advantages of simplicity (you can generally do it as a simple HTTP GET of a JSP that returns something like rows of CSV data) and lower resource usage, since only the data is transmitted, without the markup and metadata that XML-based protocols add. It's also much easier to debug.
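A rough sketch of the plain-text approach, for what it's worth. I've used the small HTTP server that ships with the JDK (`com.sun.net.httpserver`) instead of a JSP so the example runs on its own; on Tomcat the same logic would sit in a JSP or servlet. The `/lookup` path, the `id` parameter, the `NOT_FOUND` response and the sample row are all assumptions for illustration, and the in-memory map stands in for the real database query.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class CsvLookupServer {

    // Hypothetical stand-in for the real database: ID -> descriptive fields.
    static final Map<String, String[]> DATA = Map.of(
        "42", new String[] {"Acme Ltd", "Sydney", "active"});

    // Return one CSV row for a known ID, or a fixed marker when the lookup fails.
    // Fields are joined as-is; real data containing commas would need quoting.
    public static String lookupCsv(String id) {
        String[] fields = DATA.get(id);
        return fields == null ? "NOT_FOUND" : id + "," + String.join(",", fields);
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/lookup", exchange -> {
            // Expect a query string of the form "id=42".
            String query = exchange.getRequestURI().getQuery();
            String id = (query != null && query.startsWith("id=")) ? query.substring(3) : "";
            byte[] body = (lookupCsv(id) + "\n").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // then try: GET http://localhost:8080/lookup?id=42
    }
}
```

You can test it from a browser or with curl, which is a large part of the debugging advantage.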
Good to see you in this forum!
In my book I don't cover web services. My purpose when writing the book was to cover the basics and the most used PDI features. I didn't think that web services met any of those goals.
However, the upcoming, more advanced Kettle book, Pentaho Kettle Solutions, is likely to cover them.
Thank you for the replies. I appreciate the time you have taken.
One of the reasons for wanting to follow the XML-RPC path is to allow a standardised implementation that is compatible with PDI and the Web Services step. The data scenario is one of basically doing a 'lookup' such that descriptive data is returned to 'external' (ie/ not the source) data systems so that these systems are basing their information on the same 'single source of truth' (which will actually be a dimension of a data warehouse).
The processes we have in place (okay, almost have in place - still a few kinks to iron out but very close to finished) provide us with a very high quality of data and trustworthiness. This is extremely uncommon in our industry, and has taken us a very long time to achieve (read tens of thousands of man hours). Sharing this both internally and externally with our partners will be of great benefit to us and allow much closer co-ordination between companies and departments. Being able to provide this in a way directly compatible with a data integration tool like PDI seems the perfect way to go.
I would love to follow the text only route, but we have to be able to support the argument that we are adhering to some form of industry accepted standard(s). Saying that we have a specification, and being able to provide a WSDL file seems like a good idea from the sound of it - the clients really need to do nothing except use it (and maybe keep it updated - which PDI can do for them too!).
Does anyone know of a simple Java class (EE compatible) or perhaps a framework that can 'showcase' approaching this problem? I would like to be able to better understand it myself and also to be able to provide an 'example' scenario to others involved. I am a little new to dealing with web services so please forgive my ignorance in this area.
Maria, thank you for the feedback. The copy of your book that I ordered for our library here has just arrived and I can pick it up today. Having a personal copy would be brilliant! Please keep me in the running :-)