I have seen many sites that provide access to websites that are blocked. I know that they fetch the page from the server and pass it on to you. I want to develop a similar application. The problem is that the fetched page may contain any number of hyperlinks, and when clicked they will try to go directly to their host server. What we want is that a clicked link sends its request to our application, which then retrieves the information from the original host server.
A proxy server basically works as the middleman between the web and the browser. You give the browser (IE or Mozilla, for example) the IP address and port number of the machine that runs the proxy server, for example 192.168.xxx.xx, or even localhost, with port number 1000. The browser will now direct all requests to this machine. Whatever the link is, even a hyperlink in an HTML page, the request goes to this machine only. The proxy connects to the given web address and returns the page. Rules can be set in the proxy server to stop clients from connecting to particular sites, such as porn or public mail sites.
Working: the proxy server receives the request from the browser and creates a new thread to take care of it. Until the session is finished, this thread takes care of this client. A proxy server is similar to a web server; the only difference is that the proxy server forwards the request to the internet and fetches the page for the requester, whereas a web server simply serves the requested page from its own system. To write the proxy in Java we can use sockets. I suppose we can do it with servlets too. Each session runs as a thread. To develop it in another language such as C or C++, the main process forks a child process for each session.
[ October 15, 2008: Message edited by: V Vijay Veeraraghavan ]
To add to this: as I discussed earlier, a page can have any number of hyperlinks and submit buttons. We have to code the servlet so that it can figure out which URL to send the request to, and when it gets the page it forwards it to the end user.
We could use a map that holds the id or name of each hyperlink and its correct href (location). We can then parse the page and replace the HREF wherever one is found. This seems to be a solution, but I think it is complex and open to a number of bugs. For example, if someone wants to open YouTube, then we have to stream the whole data in a synchronous manner.
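A minimal Java sketch of the HREF-replacing idea described above. It assumes our application exposes a hypothetical endpoint such as `/fetch?url=...` that fetches whatever URL it is given; the class name `LinkRewriter`, the endpoint and the regular expression are illustrative assumptions only. The regex handles double-quoted `href` attributes and nothing else, so a real page would need a proper HTML parser, plus the same treatment for form actions, images and scripts.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkRewriter {
    // Illustrative only: matches href="..." with double quotes.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /**
     * Rewrites every matched link in html so that it points back at our
     * application. pageUrl is the address the page was fetched from and is
     * used to resolve relative links. proxyPrefix is the hypothetical
     * endpoint of our application, e.g. "/fetch?url=".
     * Throws IllegalArgumentException if a link is not a valid URI.
     */
    static String rewrite(String html, String pageUrl, String proxyPrefix) {
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn relative links into absolute ones first.
            String absolute = base.resolve(m.group(1)).toString();
            // URLEncoder.encode(String, Charset) requires Java 10 or later.
            String encoded = URLEncoder.encode(absolute, StandardCharsets.UTF_8);
            String replacement = "href=\"" + proxyPrefix + encoded + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Called as `LinkRewriter.rewrite(page, "http://www.example.com/index.html", "/fetch?url=")`, a link to `/watch?v=1` would come back pointing at `/fetch?url=` followed by the encoded absolute address, so the click returns to our application instead of going to the host server.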
Well, I am not very clear about it.
V Vijay Veeraraghavan
Joined: Apr 06, 2008
-------------------------- No, there is no need to alter the hyperlinks present in the HTML pages, however many there are. Please see the general documentation on proxy server architecture, because I think you have misunderstood the concept. Once the proxy configuration is set in the browser, all requests sent from it are piped through the proxy IP only, whether the page is requested through a hyperlink, a button submit or any other way. There is no need to tinker with the HTML page each time it is fetched by the proxy server. If you need to set access rights, then the URL can be scanned for particular words. If such a word is present, the page is not fetched and the request is forwarded to an error page saying "this page cannot be accessed due to access violation" or similar; otherwise the page is fetched and sent back to the browser. If this is still not clear, please read the documentation of some proxy servers. Proxy Server is good to start with.
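The word scan on the URL could look roughly like this in Java. `UrlFilter` and its word list are hypothetical names used for illustration, not part of any existing proxy product, and a plain substring match like this will also block harmless addresses that happen to contain a listed word.

```java
import java.util.List;
import java.util.Locale;

final class UrlFilter {
    private final List<String> blockedWords;

    UrlFilter(List<String> blockedWords) {
        this.blockedWords = blockedWords;
    }

    /** Returns true if the URL contains any blocked word, ignoring case. */
    boolean isBlocked(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String word : blockedWords) {
            if (lower.contains(word.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

The proxy would call `isBlocked` on the requested address before connecting anywhere: on `true` it answers with the error page, otherwise it fetches the page as usual.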
Vijay, one more point I want to clear up: as far as I know, we have to apply the proxy setting in the browser. If the user is not aware of any of this and simply types the URL of our web proxy, then how is all this configured? I will also refer to some sites to learn the concept.
Whatever you provided really proved to be a great help for me.
The proxy servers available on the internet are mostly for surfing the internet anonymously. They are called anonymous proxy servers. They act as a middleman between the browser and the internet, with the proxy included in the request path. The request goes like this...
browser -> enter the address (www.gmail.com) in http://www.youhide.com and submit the page (POST request) -> proxy server (xxx.xxx.xx.xx:3128) -> http://www.youhide.com (takes the URL parameter that came in, i.e. gmail.com, and requests it) -> www.gmail.com -> reply to youhide.com -> reply from youhide.com to the proxy -> reply from the proxy back to the browser.
This is the way anonymous web proxies work. If you want your proxy server to detect and block the use of such proxies, then it should check the POST request parameters. If the POST request contains any words that resemble the words that should be blocked, then it can restrict the request.
It depends on how the proxy server was developed.
------------------------- If you want to write a proxy server, there are two ways. One is to deploy the server as an application in a network, through which all the computers in the network connect to the internet. The second is to write a server (web application) that works as a website offering the services of a web proxy to the public. This web application can also be installed in a network environment, though.
* In Java, a web proxy can be written with servlets (web application).
* If you need to develop it as an application, then we can use sockets (standalone application).
Better to clear your mind and gather lots of information on the topic before starting (though too much info will also lead to confusion).
* in java, web proxy can be written in servlets (web application).
Have you done this type of work before? If yes, could you give some guidelines?
V Vijay Veeraraghavan
Joined: Apr 06, 2008
----------------------- good, and all the very best
Joined: Mar 22, 2005
While a proxy could be written as a web app using the Servlet API, I'd make it a standalone application that uses raw sockets.
What the proxy needs to do is to take the incoming stream and send it someplace else. It doesn't need to do anything with the data stream, so making it a web app would introduce a lot of overhead (like decoding the data stream, extracting all the parameters and headers etc. - steps the web app would essentially need to reverse in order to send out the request to the target server). Much easier (and faster) to send the complete request stream on to the target server without doing those steps.
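As a rough Java sketch of passing the bytes along untouched: the helper below copies whatever arrives on one stream to another without decoding it. `StreamRelay` is a hypothetical name. In a proxy, `in` would presumably be the client socket's input stream and `out` the target server socket's output stream, with a second copy running in the opposite direction for the response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamRelay {
    /**
     * Copies bytes from in to out until in reports end of stream.
     * Nothing is parsed or decoded on the way through.
     * Returns the number of bytes copied.
     */
    static long relay(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

This loop only ends when the sender closes its side, so it is an assumption that fits one request per connection. With persistent connections the proxy would need to know where each message ends, which is one of the things the HTTP specification covers.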
The only thing the proxy server does need to do with the incoming request is to extract the target URL, but that's contained in the first line of the incoming request, and can be extracted quite easily without using the Servlet API.
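A small Java sketch of pulling the target out of that first line, assuming the browser sends the absolute form `GET http://host/path HTTP/1.1` that HTTP clients use when talking to a configured proxy. `RequestLine` is a hypothetical class name, and `CONNECT` requests (used for HTTPS tunnelling) carry `host:port` instead of a URL and are not handled here.

```java
import java.net.URI;

final class RequestLine {
    final String method;
    final URI target;
    final String version;

    private RequestLine(String method, URI target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    /** Parses a line such as "GET http://www.example.com/index.html HTTP/1.1". */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], URI.create(parts[1]), parts[2]);
    }

    /** Host the proxy should connect to. */
    String host() {
        return target.getHost();
    }

    /** Port the proxy should connect to; 80 when the URL names none. */
    int port() {
        return target.getPort() == -1 ? 80 : target.getPort();
    }
}
```

With the host and port in hand, the proxy can open a plain socket to the target server and pass the request on.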
Be sure to read the HTTP specification; it has a lot to say about how proxies should work.
This seems to be a feasible solution. What I understood is that we can simply strip the header of the request in the servlet, then set the new header depending upon some parameters in the session, and then forward it to the real server.
Is that right?
Joined: Mar 22, 2005
What I got is we can simply strip the header of the request in the servlet and then we can set the new Header depending upon some parameters in the session and then forward it to the real Server.
You seem to have missed the point that you should not be using a servlet for this.
Also, I'm not sure what parameters you're talking about, and there shouldn't be any session.
You shouldn't dig as deep into the request as to even be able to distinguish HTTP headers. The main thing the proxy needs to do is to replace the absolute URL in the first line of the request by a URL that's relative to the root.
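The replacement described above could be sketched in Java like this. `RequestLineRewriter` is a hypothetical name, and the sketch assumes a well-formed request line with exactly three space-separated parts; everything after the first line is left untouched and is not this method's concern.

```java
import java.net.URI;

final class RequestLineRewriter {
    /**
     * Turns "GET http://www.example.com/a/b?x=1 HTTP/1.1"
     * into  "GET /a/b?x=1 HTTP/1.1".
     */
    static String toRelative(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        URI uri = URI.create(parts[1]);
        // Use the raw (still encoded) parts so nothing is decoded on the way.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://www.example.com" means the root
        }
        String query = uri.getRawQuery();
        String target = (query == null) ? path : path + "?" + query;
        return parts[0] + " " + target + " " + parts[2];
    }
}
```

The rewritten line is what gets sent to the target server as the first line of the forwarded request, followed by the rest of the original request as it arrived.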
But you really need to read the HTTP spec so that you know what's going on there.