JavaRanch » Java Forums » Engineering » General Computing

How to build a web proxy

Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

I have seen many sites that provide access to websites which are blocked.
I know that they fetch the page from the server and provide you that page. I want to develop a similar application. But the problem is that the fetched page may have any number of hyperlinks, and when clicked they will try to go directly to their host server. What we want is that the links, when clicked, send the request to our application, and then that application retrieves the information from the main host server.

Can someone help me in this?


My Blog SCJP 5 SCWCD 5
V Vijay Veeraraghavan
Greenhorn

Joined: Apr 06, 2008
Posts: 21
The proxy server basically works as the middleman between the web and the internet browser. You give the browser (IE or Mozilla, for example) the IP and port number of the machine that runs the proxy server, e.g. 192.168.xxx.xx (or even localhost) and port 1000.
Now the browser will direct all requests to this machine. Whatever the link is, even a hyperlink in the HTML page, the request goes to this machine only. The proxy connects to the web address given and returns the page. Rules can be set in the proxy server to block clients from connecting to particular sites, like porn or public mail sites.
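From a Java program, the counterpart of typing the proxy's IP and port into the browser settings is passing a `java.net.Proxy` to `URL.openConnection`. A minimal sketch, assuming a proxy on localhost port 1000 as in the example above; the class name and target URL are placeholders.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

final class ProxyClientExample {

    // The same two settings the browser asks for: proxy host and port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Opens a connection whose requests go to the proxy instead of straight to the target host.
    // No network traffic happens until the response is read (e.g. getResponseCode()).
    static HttpURLConnection openViaProxy(String target, Proxy proxy) throws IOException {
        return (HttpURLConnection) URI.create(target).toURL().openConnection(proxy);
    }
}
```

Calling `getResponseCode()` on the returned connection sends the request to the proxy, which then fetches the page on the client's behalf.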

Working: the proxy server receives the request from the browser and creates a new thread that takes care of this request. Until the session is finished, this thread looks after this client. A proxy server is similar to a web server, the only difference being that the proxy server forwards the request to the internet and fetches the page for the requester, whereas the web server just serves the requested page from its own system.
To design the proxy in Java we can use sockets. I suppose we can do it with servlets too. Each session will run as a thread.
To develop it in another language like C or C++, the main process forks a child process for each session.

[ October 15, 2008: Message edited by: V Vijay Veeraraghavan ]
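The "receive the request, then hand it to a new thread" design can be sketched with plain `java.net` sockets. This is a minimal sketch of only the accept-and-spawn part: the class and interface names are made up for illustration, and what a handler does with the connection (parsing the request and forwarding it upstream) is left to the caller.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

final class ThreadPerClientServer {

    // What one worker thread does with its client connection (hypothetical interface).
    interface ClientHandler {
        void handle(Socket client) throws IOException;
    }

    // Accepts clients until the listener is closed (accept() then throws SocketException);
    // every client gets its own thread.
    static void serve(ServerSocket listener, ClientHandler handler) throws IOException {
        while (!listener.isClosed()) {
            Socket client = listener.accept();   // blocks until a browser connects
            Thread worker = new Thread(() -> {
                try (Socket c = client) {        // closed when this session ends
                    handler.handle(c);
                } catch (IOException e) {
                    System.err.println("session failed: " + e.getMessage());
                }
            });
            worker.setDaemon(true);              // don't keep the JVM alive for idle sessions
            worker.start();
        }
    }
}
```

The listener would be created with `new ServerSocket(1000)` for the port used above. For many concurrent clients, an `ExecutorService` thread pool is the usual alternative to starting an unbounded number of threads.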
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

To add to this, as I mentioned earlier, a page can have any number of hyperlinks and submit buttons. We have to code the servlet so that it can figure out which URL to send the request to, and when it gets the page, it forwards it to the end user.

We could use a map holding the id or name of each hyperlink and its correct href (location). We can then parse the page and replace the href wherever it is found. This seems to be a solution, but I think it is complex and open to a number of bugs. For example, if someone wants to open YouTube, then we have to stream the whole data synchronously.
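The link-rewriting idea can be sketched as follows. Instead of keeping a map from link id to href, this variant encodes each original URL into the rewritten link, so the proxy can read it back from the incoming request. A minimal sketch, assuming a hypothetical proxy entry point `/proxy?url=`; it only handles double-quoted absolute `href` values, and using a regex instead of an HTML parser shows exactly why the approach is bug-prone (relative links, `src`, form `action`, CSS and JavaScript URLs are all missed).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkRewriter {

    // Matches href="http://..." or href="https://..." (double quotes only).
    private static final Pattern ABSOLUTE_HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical entry point of our proxy web app.
    static final String PROXY_PREFIX = "/proxy?url=";

    // Points every absolute link back at the proxy, carrying the original URL as a parameter.
    static String rewrite(String html) {
        Matcher m = ABSOLUTE_HREF.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String encoded = URLEncoder.encode(m.group(1), StandardCharsets.UTF_8);
            String replacement = "href=\"" + PROXY_PREFIX + encoded + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `<a href="http://example.com/a?b=1">` becomes `<a href="/proxy?url=http%3A%2F%2Fexample.com%2Fa%3Fb%3D1">`, while a relative link such as `href="/local"` is left unchanged.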

Well, I am not very clear about it.

Help needed.
V Vijay Veeraraghavan
Greenhorn

Joined: Apr 06, 2008
Posts: 21
--------------------------
No... there is no need to alter the hyperlinks in the HTML pages, however many there are.
Please see the general documentation on proxy server architecture, because I think you have misunderstood the concept.
Once the proxy configuration is set in the browser, all requests sent from it are piped through the proxy IP, whether it is a hyperlink, a button submit, or any other way the page is requested. There is no need to tinker with the HTML page each time the proxy server fetches it.
If you need to set access rights, then the URL can be scanned for particular words. If such a word is present, the page is not fetched; instead the client is forwarded to an error page saying "this page cannot be accessed due to an access violation", etc. Otherwise the page is fetched and sent back to the browser.
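The URL check can be sketched as a simple word filter. A minimal sketch, assuming the blocked words come from a hypothetical configuration list; instead of redirecting to a separate error page it answers directly with an HTTP 403 response, and plain substring matching will also block harmless URLs that happen to contain one of the words.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

final class UrlFilter {

    private final List<String> blockedWords;   // hypothetical, e.g. loaded from a config file

    UrlFilter(List<String> blockedWords) {
        this.blockedWords = blockedWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .toList();
    }

    // True if the requested URL contains any blocked word (case-insensitive substring match).
    boolean isBlocked(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        return blockedWords.stream().anyMatch(lower::contains);
    }

    // Minimal HTTP/1.1 error response sent back instead of fetching the page.
    static String accessDeniedResponse() {
        String body = "<html><body>This page cannot be accessed due to an access violation.</body></html>";
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "HTTP/1.1 403 Forbidden\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "Content-Length: " + length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + body;
    }
}
```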
If this is still not clear, please read the documentation of some proxy servers.
Proxy Server is good to start with.

vijay
--------------------------------
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

Thanks a lot Vijay for clearing my doubts.

Vijay, one more point I want to clear up: as far as I know, we have to apply the proxy settings in the browser. If the user is not aware of any of this and just types in the URL of our web proxy, how is it all configured? I will also refer to some sites to learn the concept.

Really, what you provided has been a great help to me.

Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

Some of the proxies I am talking about are:

Kproxy
TigerGateway
V Vijay Veeraraghavan
Greenhorn

Joined: Apr 06, 2008
Posts: 21
The proxy servers available on the internet are mostly for surfing the internet anonymously; they are called anonymous proxy servers. They act as a middleman between the browser and the internet, inserting the proxy into the request path. The request goes like this...

browser -> request (http://www.youhide.com) -> proxy server (xxx.xxx.xx.xx:3128) -> http://www.youhide.com (done) -> the reply goes back to the requesting browser.

browser -> enter the address (www.gmail.com) on http://www.youhide.com and submit the page (POST request) -> proxy server (xxx.xxx.xx.xx:3128) -> http://www.youhide.com (reads the url parameter that came in, i.e. gmail.com, and requests it) -> connects to www.gmail.com -> reply to youhide.com -> reply from youhide.com to the proxy -> reply from the proxy back to the browser.

This is how anonymous web proxies work. If you want your proxy server to detect and block the use of these proxies, then it should check the POST request parameters. If the POST request contains any words resembling the words that should be blocked, it can restrict the request.
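Checking the POST parameters means decoding the `application/x-www-form-urlencoded` body first. A minimal sketch; the parameter name `url` and the word list are assumptions (each anonymizer site picks its own field names), and a repeated parameter keeps only its last value.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class FormParams {

    // Parses a body such as "url=http%3A%2F%2Fwww.gmail.com%2F&go=1" into name -> value.
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // True if any parameter value mentions a blocked word (case-insensitive).
    static boolean containsBlockedWord(Map<String, String> params, List<String> blockedWords) {
        for (String value : params.values()) {
            String lower = value.toLowerCase(Locale.ROOT);
            for (String word : blockedWords) {
                if (lower.contains(word.toLowerCase(Locale.ROOT))) return true;
            }
        }
        return false;
    }
}
```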

It depends on how the proxy server was developed.
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

So if we want to write our own web proxy, we have to write it as a usual web application and host it on some proxy server? Are there any open source proxy servers?

Please guide me. I want to write a web proxy as my spare-time project.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42599
Any number of existing proxy servers are available; some are listed in http://www.roseindia.net/opensource/freeproxyservers.php

Google for "java web proxy" and you'll find even more.


Ping & DNS - my free Android networking tools app
V Vijay Veeraraghavan
Greenhorn

Joined: Apr 06, 2008
Posts: 21
-------------------------
If you want to write a proxy server, there are two ways....
One is to deploy the server as an application in a network, through which all the computers in the network connect to the internet.
The second way is to write a server (web application) which works as a website providing proxy services to the public, although this web application can also be installed in a network environment.
* In Java, a web proxy can be written with servlets (web application).
* If you need to develop it as an application, then we can use sockets (standalone application).
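The web-application option can be sketched without a servlet container by using the JDK's built-in `com.sun.net.httpserver` package, swapped in here for the Servlet API the post mentions. A minimal sketch, assuming a hypothetical `/proxy?url=...` endpoint; it does no filtering or link rewriting, only fetches pages (no form forwarding), and copies back just the status, content type, and body.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class WebProxyApp {

    // Reads the hypothetical "url" query parameter; returns null unless it is an http(s) URL.
    static URI targetFromQuery(String rawQuery) {
        if (rawQuery == null) return null;
        for (String pair : rawQuery.split("&")) {
            if (!pair.startsWith("url=")) continue;
            try {
                URI uri = new URI(URLDecoder.decode(pair.substring(4), StandardCharsets.UTF_8));
                String scheme = uri.getScheme();
                return ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) ? uri : null;
            } catch (URISyntaxException | IllegalArgumentException e) {
                return null;   // malformed %-escape or URL
            }
        }
        return null;
    }

    // Fetches the target page and copies its status, content type and body to the visitor.
    static void handle(HttpExchange exchange) throws IOException {
        URI target = targetFromQuery(exchange.getRequestURI().getRawQuery());
        if (target == null) {
            exchange.sendResponseHeaders(400, -1);   // -1 means "no response body"
            exchange.close();
            return;
        }
        HttpURLConnection upstream = (HttpURLConnection) target.toURL().openConnection();
        int status = upstream.getResponseCode();
        String type = upstream.getContentType();
        if (type != null) exchange.getResponseHeaders().set("Content-Type", type);
        InputStream body = status >= 400 ? upstream.getErrorStream() : upstream.getInputStream();
        exchange.sendResponseHeaders(status, 0);    // 0 means "length unknown, use chunked"
        try (OutputStream out = exchange.getResponseBody()) {
            if (body != null) {
                try (InputStream in = body) {
                    in.transferTo(out);
                }
            }
        }
    }

    // Starts the app on the given port with a single /proxy context.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/proxy", WebProxyApp::handle);
        server.start();
        return server;
    }
}
```

Started with `WebProxyApp.start(8080)` (the port is only an example), a visitor would open `http://localhost:8080/proxy?url=http%3A%2F%2Fwww.example.com%2F`. Links inside the returned page still point at their original hosts, which is the problem raised at the start of the thread.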

Better clear your mind and gather lots of information on the topic (though too much information can also lead to confusion) before starting.

vijay
-------------------------
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

Thanks Vijay

I am aiming at the first point:
* in java, web proxy can be written in servlets (web application).


Have you done this type of work before? If yes, can you give some guidelines?
V Vijay Veeraraghavan
Greenhorn

Joined: Apr 06, 2008
Posts: 21
-----------------------
good, and all the very best

vijay
-----------------------
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42599
While a proxy could be written as a web app using the Servlet API, I'd make it a standalone application that uses raw sockets.

What the proxy needs to do is to take the incoming stream and send it someplace else. It doesn't need to do anything with the data stream, so making it a web app would introduce a lot of overhead (like decoding the data stream, extracting all the parameters and headers etc. - steps the web app would essentially need to reverse in order to send out the request to the target server). Much easier (and faster) to send the complete request stream on to the target server without doing those steps.
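Sending the stream on untouched amounts to copying bytes in both directions between the browser's socket and the target server's socket. A minimal sketch using `InputStream.transferTo` (Java 9+), with illustrative class and method names; it assumes the target server closes the connection when the response is complete (for example because the request carried `Connection: close`), otherwise the downstream copy waits until the server times out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

final class Relay {

    // Copies everything from in to out until in reaches end of stream.
    static long pump(InputStream in, OutputStream out) throws IOException {
        long copied = in.transferTo(out);
        out.flush();
        return copied;
    }

    // Client -> server runs on a helper thread; server -> client runs on the calling thread.
    static void relay(Socket client, Socket server) throws IOException, InterruptedException {
        Thread upstream = new Thread(() -> {
            try {
                pump(client.getInputStream(), server.getOutputStream());
                server.shutdownOutput();   // signal end of request to the target server
            } catch (IOException ignored) {
                // socket closed by the other direction; nothing more to forward
            }
        });
        upstream.start();
        try {
            pump(server.getInputStream(), client.getOutputStream());
        } finally {
            // The response is done: closing both sockets also unblocks the helper thread.
            client.close();
            server.close();
            upstream.join();
        }
    }
}
```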

The only thing the proxy server does need to do with the incoming requests is to extract the target URL, but that's contained in the first line of the incoming request, and can be extracted quite easily without using the Servlet API.
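Extracting the target from the first line can be sketched with `java.net.URI`. A minimal sketch with illustrative names; it handles only absolute `http://` or `https://` targets and rejects the `CONNECT host:port` form that browsers use for HTTPS through a proxy.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RequestLine {

    final String method;
    final URI target;
    final String version;

    private RequestLine(String method, URI target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Parses the first line a browser sends to a proxy, e.g. "GET http://host:8080/path HTTP/1.1".
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        URI target;
        try {
            target = new URI(parts[1]);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad request target: " + parts[1], e);
        }
        if (!target.isAbsolute() || target.getHost() == null) {
            throw new IllegalArgumentException("expected an absolute URL: " + parts[1]);
        }
        return new RequestLine(parts[0], target, parts[2]);
    }

    // Host the proxy must open a socket to.
    String host() {
        return target.getHost();
    }

    // Explicit port from the URL, else the scheme's default (443 for https, 80 otherwise).
    int port() {
        if (target.getPort() != -1) return target.getPort();
        return "https".equalsIgnoreCase(target.getScheme()) ? 443 : 80;
    }
}
```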

Be sure to read the HTTP specification; it has a lot to say about how proxies should work.
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

WoW

This seems to be a feasible solution. What I got is that we can simply strip the header of the request in the servlet, then set the new header depending on some parameters in the session, and then forward it to the real server.

Is that Right?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42599
What I got is we can simply strip the header of the request in the servlet and then we can set the new Header depending upon some parameters in the session and then forward it to the real Server.


You seem to have missed the point that you should not be using a servlet for this.

Also, I'm not sure what parameters you're talking about, and there shouldn't be any session.

You shouldn't dig as deep into the request as to even be able to distinguish HTTP headers. The main thing the proxy needs to do is to replace the absolute URL in the first line of the request by a URL that's relative to the root.
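The replacement can be sketched as a function on the request line alone, leaving the headers untouched (RFC 9112 calls the result the origin-form). A minimal sketch with an illustrative class name; it assumes a well-formed `METHOD SP URL SP VERSION` line and drops any `#fragment`.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RequestLineRewriter {

    // Turns "GET http://host/path?q HTTP/1.1" into "GET /path?q HTTP/1.1"
    // so the target server receives a URL relative to its root.
    static String toOriginForm(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + requestLine);
        }
        URI uri;
        try {
            uri = new URI(parts[1]);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad request target: " + parts[1], e);
        }
        if (!uri.isAbsolute()) {
            return requestLine.trim();   // already relative: nothing to rewrite
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                  // "http://host" means the root
        }
        String query = uri.getRawQuery();
        String relative = query == null ? path : path + "?" + query;
        return parts[0] + " " + relative + " " + parts[2];
    }
}
```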

But you really need to read the HTTP spec so that you know what's going on there.
Himanshu Gupta
Ranch Hand

Joined: Aug 18, 2008
Posts: 598

Thanks, I will read the HTTP spec first and then get back to you.

------------------------------------------------------
Really happy to get wonderful support here.
Thanks a lot everyone.