Java text parsing and html parsing

vanio begic
Greenhorn

Joined: Aug 16, 2011
Posts: 5
OK, a few days ago I started creating my first real-world application, and it relies heavily on proxies as a source of anonymity. So I decided to add a proxy harvester and proxy checker which would, by utilizing Google, find fresh and working proxies every day. But after 4 hours of searching the web for HTML and text parsing in Java, all I have learned is that there are things called DOM, CORBA, regex and StringTokenizer. None of this helped, since nowhere did it explain how to use them or what their purpose is.

So my question is: can anybody direct me towards a book/site/document/anything that would help me start understanding Java's text processing capabilities and the classes that I can use?

Please note that my knowledge of this subject is non-existent. I am not asking you to tell me the advanced processes needed to parse the Google result page, as I plan to keep that as a challenge for myself. Thanks in advance.
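
For readers unsure what two of the names mentioned above actually do: StringTokenizer and regex are both core Java ways of cutting a string into pieces. The sketch below is only an illustration; the class name TokenizeDemo and the sample input are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: splits one line of text in two core-Java ways.
class TokenizeDemo {

    // StringTokenizer treats every character in the delimiter string
    // (here: space and comma) as a separator and skips empty tokens.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " ,");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split takes a regular expression: one or more spaces or commas.
    static List<String> withRegexSplit(String line) {
        return Arrays.asList(line.split("[ ,]+"));
    }

    public static void main(String[] args) {
        String line = "alpha, beta gamma";
        System.out.println(withTokenizer(line));
        System.out.println(withRegexSplit(line));
    }
}
```

Both methods return the same three words for this input. The regex version tends to scale better once the separators get more complicated than a fixed set of characters.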
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18991

CORBA? I can't imagine how you found something which said that CORBA was useful for processing text.

However, what you're asking is "I have these requirements, but I don't really know what they encompass because I'm a beginner, so can you show me a book which is all about them?" I suggest you refocus your search. Start with something specific which you don't know how to do, and ask about that.
Philip Thamaravelil
Ranch Hand

Joined: Feb 09, 2006
Posts: 99
If you don't require Java, Perl is amazing at parsing text. Its regular expressions and the simplicity of writing it are great on flat files.
vanio begic
Greenhorn

Joined: Aug 16, 2011
Posts: 5
Paul Clapham wrote:CORBA? I can't imagine how you found something which said that CORBA was useful for processing text.

However, what you're asking is "I have these requirements, but I don't really know what they encompass because I'm a beginner, so can you show me a book which is all about them?" I suggest you refocus your search. Start with something specific which you don't know how to do, and ask about that.



I actually know what I need to do. I need to open a google via proxy connection, make a search, get the HTML of the page, parse it in a way that will get me the URLs, open the URLs, and find all the lines of text that contain a proxy port such as :80, :8080... Then put those proxies in a text field and, on user request, export them to a chosen file.
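
The "find all the lines of text that contain a proxy port" step could be sketched with java.util.regex alone. This is a rough illustration, not a finished harvester: the class name ProxyLineScanner, the pattern and the sample text are assumptions made for the example, and the pattern does not check that each number is 255 or less.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "ip:port" candidates out of arbitrary text.
class ProxyLineScanner {

    // Loose pattern: four dot-separated groups of 1-3 digits, a colon,
    // then a port of 2-5 digits. It does not validate numeric ranges.
    private static final Pattern PROXY =
            Pattern.compile("\\b(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{2,5})\\b");

    static List<String> findProxies(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PROXY.matcher(text);
        while (m.find()) {
            // group(1) is the address part, group(2) is the port part
            found.add(m.group(1) + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "fast 10.0.0.1:8080 elite\n"
                + "no proxy on this line\n"
                + "192.168.1.20:80 anonymous";
        for (String proxy : findProxies(sample)) {
            System.out.println(proxy);
        }
    }
}
```

Whether a candidate actually works as a proxy still has to be checked separately; the regex only finds text that looks like one.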

But since I like to learn, I decided to learn more about text processing in Java, as I am sure it will come in handy sooner or later. Yes, you are right, maybe I need to refocus my search, but I did not ask to be pointed to a total beginner's guide. I would just like a highlight of a few basic classes, and I would be able to go on from there.


And as for the second answer: this project has no specific programming language, but since I started learning how to program about 2 months ago and I started with Java, I would very much like to proceed with learning and mastering Java.


So anyone else?
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10535

vanio begic wrote:
...I need to open a google via proxy connection,...Then put those proxies in a text field,and on user request export them to a chosen file.

Frankly speaking, I don't understand what you mean by "open a google..". What text field? Does your application have a UI? In case you did not know, a text field can display one line of text, without any scroll bars. You might want to reconsider this and use something else instead, in case you have multiple "proxies".

Anyway, coming back to your other requirement of
...get html of the page,parse it..
something like html parser might be useful to you.


vanio begic
Greenhorn

Joined: Aug 16, 2011
Posts: 5
Maneesh Godbole wrote:
vanio begic wrote:
...I need to open a google via proxy connection,...Then put those proxies in a text field,and on user request export them to a chosen file.

Frankly speaking, I don't understand what you mean by "open a google..". What text field? Does your application have a UI? In case you did not know, a text field can display one line of text, without any scroll bars. You might want to reconsider this and use something else instead, in case you have multiple "proxies".

Anyway, coming back to your other requirement of
...get html of the page,parse it..
something like html parser might be useful to you.



Yes, yes, I know a text field can hold only one line; I mix it up a lot with text area.

Frankly, the point here is that I want to use core Java as is, without external libraries. But thanks for the recommendation, I will check it out.
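
For the core-Java-only route, a minimal sketch of the "get html of the page, parse it" steps might look like the following. It uses only java.net and java.util.regex. The class name PageLinks, the timeouts and the href pattern are assumptions for illustration, and a regex is a fragile way to read real-world HTML, which is why a dedicated parser was suggested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: downloads a page through an HTTP proxy and lists its links.
class PageLinks {

    // Matches href="..." or href='...' in any letter case; group 1 is the URL.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Fetches the page body as text, routing the request through the given proxy.
    static String fetch(String pageUrl, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection(proxy);
        conn.setConnectTimeout(5000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    // Collects every href value found in the HTML text, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a> "
                + "<A HREF='http://example.org/b'>B</A>";
        System.out.println(extractLinks(html));
    }
}
```

The main method only exercises extractLinks on a fixed string so it runs without a network; fetch would be called with a real proxy host and port once one is known to work.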
 