Not able to retrieve data for Secured site

Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Hi,
There is a secure site:
https://farm3.sat.gob.gt/saqbe-arancel-publico/aduana/arancel/consulta/consulta.jsf
After the landing page loads, enter 01 in the "Posición arancelaria" text field and click "buscar". I am able to retrieve the resulting page via the HttpUnit API. But on the CLASIFICACIÓN ARANCELARIA page that follows, when I click 0101.10.90, I am not able to retrieve the next page via Java code.
Please help me as soon as possible.
I have tried the following approaches to retrieve the data:

1) Setting parameters into WebRequest:

// Retrieve the WebForm from the landing page response of the secured site:
WebForm webForm = response.getFormWithID("formar");

// Then get the request from the WebForm and set all relevant parameters for a particular HS code:

WebRequest req = webForm.getRequest();
req.setParameter("formar_SUBMIT","1");
req.setParameter("jsf_sequence","3");
req.setParameter("formar:_link_hidden_","");
req.setParameter("pCodigo","01019000");
req.setParameter("formar:_idcl","formar:tree:0:0:1:parter2");
WebResponse resp = webForm.submit();

But we are redirected back to the same landing page, instead of getting the subsequent page for the HS code.
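
One thing that may be worth checking here: the jsf_sequence value is hard-coded in the request, while the posts in this thread show it with different values ("3" and "5"), which suggests it changes between responses. The following is only a sketch under that assumption, not something confirmed for this site. HiddenFields and valueOf are hypothetical names, and the regular expressions assume double-quoted attributes, so a real HTML parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpUnit: pulls a field's value out of raw HTML.
class HiddenFields {

    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the value attribute of the first <input> tag that carries
    // name="fieldName" and has a value attribute, or null if none is found.
    static String valueOf(String html, String fieldName) {
        String nameAttr = "name=\"" + fieldName + "\"";
        Matcher tags = INPUT_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            if (tag.contains(nameAttr)) {
                Matcher value = VALUE_ATTR.matcher(tag);
                if (value.find()) {
                    return value.group(1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample markup made up for illustration; the real page may differ.
        String html = "<input type=\"hidden\" name=\"jsf_sequence\" value=\"3\" />";
        System.out.println(valueOf(html, "jsf_sequence"));
    }
}
```

With HttpUnit, the extracted value could then be passed on instead of a literal, for example req.setParameter("jsf_sequence", HiddenFields.valueOf(response.getText(), "jsf_sequence")).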

We also tested it after setting the following properties:

ClientProperties props = webConversation.getClientProperties();
props.setUserAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7");
props.setAcceptCookies(true);
props.setAutoRedirect(false);
props.setAutoRefresh(false);

But we are still not able to get the desired response.

2) Retrieving data via WebLink:

//Retrieve link from response for a particular hsCode
WebLink link = response.getLinkWith("0101.10.10");

//Get the webrequest for a particular link.
WebRequest req = link.getRequest();

//Now webConversation API must return the response.
WebResponse resp = webConversation.getResponse(req);

But the response doesn't contain the subsequent page associated with the particular HS code.

3) Setting request headers with session fields and other parameters:

First we retrieved the header fields from the request and response in our Java code and compared them with the header fields shown by HttpAnalyzer.
Then we set the fields that are alike or that don't exist in our request:

req.setHeaderField("ORA_WX_SESSION", "10.1.0.34:47873-2#3");
req.setHeaderField("Keep-Alive","300");
req.setHeaderField("Connection","keep-alive");
req.setHeaderField("JSESSIONID", "0a01002230d7760c3064f5d54d71b35e84bd606926fc.e3aOaNuPbhiQe3uMc3mPbNaRbO0");

But we still get the same response.
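
Note that session identifiers such as JSESSIONID are normally sent as name=value pairs inside a single Cookie request header, not as header fields of their own. A minimal sketch of carrying the cookies from one response over to the next request is shown here. SessionCookies and toCookieHeader are hypothetical names, and the sketch ignores cookie attributes such as Path, Domain and expiry, which a full cookie store would honor.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns Set-Cookie response values into one Cookie request value.
class SessionCookies {

    // Keeps only the leading name=value pair of each Set-Cookie value
    // and joins the pairs with "; " as expected in a Cookie header.
    static String toCookieHeader(List<String> setCookieValues) {
        StringBuilder header = new StringBuilder();
        for (String value : setCookieValues) {
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (pair.length() == 0) {
                continue; // skip empty entries
            }
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(pair);
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Sample values made up for illustration.
        List<String> received = Arrays.asList(
                "JSESSIONID=abc123; Path=/saqbe-arancel-publico; Secure",
                "ORA_WX_SESSION=10.1.0.34:47873-2#3; path=/");
        System.out.println(toCookieHeader(received));
    }
}
```

With a plain HttpsURLConnection, the input could come from connection.getHeaderFields().get("Set-Cookie") on the first response, and the result could be applied to the next connection with setRequestProperty("Cookie", ...).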

4) Using HttpsURLConnection:

We opened a URL connection and then read the response:

HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.setDoInput(true);
connection.setDoOutput(true);
String response = showText(connection.getInputStream());

This way we are able to retrieve the response for the landing page, but retrieving data for subsequent pages is still a challenge.

We have also tried setting some other options on HttpsURLConnection, following the example at this link:
http://www.java-samples.com/java/POST-toHTTPS-url-free-java-sample-program.htm

System.setProperty("java.protocol.handler.pkgs","com.sun.net.ssl.internal.www.protocol");
java.security.Security.addProvider(new com.sun.net.ssl.internal.ssl.Provider());
SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
connection.setSSLSocketFactory(factory);
connection.setFollowRedirects(true);

5) Using HttpClient:

HttpClient client = new HttpClient();
HttpMethod method = new PostMethod("https://farm3.sat.gob.gt/saqbe-arancel-publico/aduana/arancel/consulta/consulta.jsf");
client.executeMethod(method);
byte[] body = method.getResponseBody();
InputStream is = new ByteArrayInputStream(body);
String resp = showText(is);
method.releaseConnection();

6) Using SSL implementation:
As described at http://httpunit.sourceforge.net/doc/sslfaq.html, we downloaded the JSSE package from the Sun URL, placed the key jar files into the JVM ext directory, and updated the java.security file.
Then we tried all the options described above again, with the same result.

7) Using HttpAnalyzer:
We have also used HttpAnalyzer and some other tools to analyze the web content, but HttpAnalyzer is not able to trace data when the URL connection to a secured site is made from Eclipse.

So retrieving the tariff data for GT is still a challenge.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42944
I would use an HTTP proxy like tcpmon to study how accesses through the browser differ from accesses done programmatically by your code; there must be a difference somewhere.
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Hi,
Thanks for your response, but we have used many tools like HttpAnalyzer and it didn't help.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42944
What does "it didn't help" mean? Are you saying that the requests were bit for bit identical? I find that hard to believe.
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Ulf Dittmer wrote: What does "it didn't help" mean? Are you saying that the requests were bit for bit identical? I find that hard to believe.


Hi Ulf,
I have written the following code as per your suggestion, i.e. passing a bit for bit identical request:

ClientProperties props = webConversation.getClientProperties();

props.setUserAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7");
props.setAcceptCookies(true);

String url = "https://farm3.sat.gob.gt/saqbe-arancel-publico/aduana/arancel/consulta/consultaCapitulo.jsf";
WebRequest req = new PostMethodWebRequest(url);

req.setHeaderField("Host", "farm3.sat.gob.gt");
req.setHeaderField("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7");
req.setHeaderField("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
req.setHeaderField("Accept-Language", "en-us");
req.setHeaderField("Accept-Encoding", "gzip,deflate");
req.setHeaderField("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
req.setHeaderField("Keep-Alive","300");
req.setHeaderField("Connection","keep-alive");
req.setHeaderField("Referer","https://farm3.sat.gob.gt/saqbe-arancel-publico/aduana/arancel/consulta/consulta.jsf");
req.setHeaderField("Cookie","ORA_WX_SESSION=10.1.0.34:47873-2#3; tree=0%3A0%3Dx%3B0%3A0%3A0%3Dx; JSESSIONID=0a01002230d7ff823b331f984a638e24cdee6e54c0d3.e3aOaNuPbhiQe3uMc3mPbNeSbO0");
req.setHeaderField("Content-Type","application/x-www-form-urlencoded");
req.setHeaderField("Content-Length","122");

req.setParameter("formar_SUBMIT","1");
req.setParameter("jsf_sequence","5");
req.setParameter("formar:_link_hidden_","");
req.setParameter("pCodigo","02011000");
req.setParameter("formar:_idcl","formar:tree:0:0:0:parter2");

WebResponse resp = webConversation.getResponse(req);
saveAsHtml(resp.getText());

But still no luck retrieving the expected response.
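
The Content-Length above is typed in by hand, so it only holds as long as the parameters stay exactly the same. A minimal sketch of deriving the form-encoded body, and from it the length, is shown here. FormBody and encode are hypothetical names, and UTF-8 is an assumption about the encoding the site expects.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
class FormBody {

    // Encodes the parameters in iteration order as name=value pairs joined by '&'.
    static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Same parameters as in the post above.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("formar_SUBMIT", "1");
        params.put("jsf_sequence", "5");
        params.put("formar:_link_hidden_", "");
        params.put("pCodigo", "02011000");
        params.put("formar:_idcl", "formar:tree:0:0:0:parter2");

        String body = encode(params);
        System.out.println(body);
        System.out.println("Content-Length: " + body.length());
    }
}
```

For the five parameters in the post this gives a 122-character body, which agrees with the hard-coded value, so the length itself does not look like the problem in this particular request. The encoded body is pure ASCII, so its character count equals its byte count.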
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42944
How are you making sure that the requests are, in fact, bit for bit identical? Which HTTP proxy/monitor are you using?
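
One way to check this, sketched under assumptions: copy the header fields of the browser request and of the programmatic request, as shown by the monitor, into two maps and list only the fields that differ. HeaderDiff and diff are hypothetical names, and the sample values are made up. The sketch compares header names case-insensitively and values exactly; it does not look at the request body.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reports header fields that differ between two captured requests.
class HeaderDiff {

    // Returns header name -> "browser value | program value" for every field
    // whose value differs or that is present on one side only (shown as null).
    static Map<String, String> diff(Map<String, String> browser, Map<String, String> program) {
        Map<String, String> b = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        Map<String, String> p = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        b.putAll(browser);
        p.putAll(program);

        Set<String> names = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        names.addAll(b.keySet());
        names.addAll(p.keySet());

        Map<String, String> differences = new LinkedHashMap<String, String>();
        for (String name : names) {
            String browserValue = b.get(name);
            String programValue = p.get(name);
            boolean same = (browserValue == null)
                    ? programValue == null
                    : browserValue.equals(programValue);
            if (!same) {
                differences.put(name, browserValue + " | " + programValue);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, String> browser = new LinkedHashMap<String, String>();
        browser.put("Accept-Language", "en-us");
        browser.put("Cookie", "JSESSIONID=abc");

        Map<String, String> program = new LinkedHashMap<String, String>();
        program.put("Accept-Language", "en-us");
        program.put("cookie", "JSESSIONID=xyz");
        program.put("Content-Length", "122");

        for (Map.Entry<String, String> e : diff(browser, program).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

An empty result only says the listed headers match; cookie values and hidden form fields issued per session can still differ from one run to the next.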
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Ulf Dittmer wrote: How are you making sure that the requests are, in fact, bit for bit identical? Which HTTP proxy/monitor are you using?

I am using HttpAnalyzer, available at
http://www.ieinspector.com/
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42944
Does the page you're getting back contain an error message, or other indication what might be going wrong?

But fundamentally, I'd use a library like HtmlUnit to go from page to page; that way you don't need to handle all the parameters and cookies in your code, but can let the library do it for you.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19790

Himanshu, could you please UseCodeTags next time?


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Hi Ulf,
Thanks for your suggestion. The problem got solved using HtmlUnit.
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Hi
I was able to use HtmlUnit 2.6 with JDK 1.6 to download data from the website, but a new restriction has come into place: we may only use JDK 1.4 or below. So I had to port my application to JDK 1.4. Since HtmlUnit 2.6 doesn't support JDK 1.4, I have switched to an earlier version, HtmlUnit 1.14.
But when running HtmlUnit 1.14 with JDK 1.4, it throws the following error:

[27 Feb 2010 00:07:25,289] - INFO [main]: Requesting given URL. =https://farm3.sat.gob.gt/saqbe-arancel-publico/aduana/arancel/consulta/consulta.jsf
[27 Feb 2010 00:09:39,997] - ERROR [main]: Exception while initializing JavaScript for the page
org.mozilla.javascript.EvaluatorException: Can not get field 'ELEMENT_NODE' for type: Node
at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:98)
at org.mozilla.javascript.Context.reportRuntimeError(Context.java:966)
at org.mozilla.javascript.Context.reportRuntimeError(Context.java:1022)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.configureConstantsPropertiesAndFunctions(JavaScriptEngine.java:314)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.configureClass(JavaScriptEngine.java:292)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.init(JavaScriptEngine.java:212)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$000(JavaScriptEngine.java:96)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$1.run(JavaScriptEngine.java:156)
at org.mozilla.javascript.Context.call(Context.java:519)
at org.mozilla.javascript.Context.call(Context.java:450)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.initialize(JavaScriptEngine.java:167)
at com.gargoylesoftware.htmlunit.WebClient.initialize(WebClient.java:1084)
at com.gargoylesoftware.htmlunit.WebWindowImpl.setEnclosedPage(WebWindowImpl.java:115)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:238)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:116)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:89)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:450)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:407)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
at com.mycustoms.dataload.tools.crawl.AbstractWebCrawler.getResponseHtml(AbstractWebCrawler.java:221)
at com.mycustoms.dataload.tools.gt.GTWebCrawler.crawl(GTWebCrawler.java:86)
at com.mycustoms.dataload.tools.crawl.AbstractWebCrawler.run(AbstractWebCrawler.java:88)
at com.mycustoms.dataload.tools.gt.GTWebCrawler.main(GTWebCrawler.java:56)

Could you please suggest the reason?
Himanshu Agrawal
Greenhorn

Joined: Feb 16, 2010
Posts: 9
Hi
I am using the following code for this:

As soon as it tries to retrieve the response for the URL, it throws the exception shown in my last post.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42944
    
  68
I'd advise pushing back on the Java 1.4 requirement. That's been obsolete for years. Heck, even Java 5 is EOL'd now. Besides the fact that you won't get security fixes, it's becoming less and less likely that you'll be able to get help online, since fewer and fewer developers use it.
 