
Java text parsing and html parsing

 
vanio begic
Greenhorn
Posts: 5
OK, a few days ago I started creating my first real-world application. That application relies heavily on proxies as a source of anonymity, so I decided to add a proxy harvester and proxy checker which would, by utilizing Google, find fresh and working proxies every day. But after 4 hours of searching the web for HTML and text parsing in Java, all I have learned is that there are things called DOM, CORBA, regex and StringTokenizer. None of this helped, since nowhere did it explain how to use them or what they are for.
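
For anyone landing here with the same question: regex and StringTokenizer are both part of the standard library. Below is a minimal sketch of the three everyday tools: the legacy `java.util.StringTokenizer`, `String.split`, and `java.util.regex.Pattern`/`Matcher`. The class and method names (`TextBasics`, `tokens`, `words`, `numbers`) are hypothetical. The `StringTokenizer` Javadoc itself calls it a legacy class and recommends `split` or `java.util.regex` for new code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextBasics {
    // Legacy approach: StringTokenizer splits on whitespace by default.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Modern approach: String.split takes a regular expression as the delimiter.
    static List<String> words(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Searching with a compiled regex: collect every run of digits.
    static List<String> numbers(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("open the  page"));     // [open, the, page]
        System.out.println(words("  open the  page "));   // [open, the, page]
        System.out.println(numbers("ports 80 and 8080")); // [80, 8080]
    }
}
```

Splitting answers "what are the pieces?", while `Matcher.find()` answers "where does this pattern occur?"; page scraping mostly needs the second.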

So my question is: can anybody direct me towards a book/site/document/anything that would help me start understanding Java's text processing capabilities and the classes I can use?

Please note that my knowledge of this subject is non-existent. I am not asking you to tell me the advanced steps needed to parse the Google result page, as I plan that as a challenge for myself. Thanks in advance.
 
Paul Clapham
Sheriff
Posts: 20208
CORBA? I can't imagine how you found something which said that CORBA was useful for processing text.

However, what you're asking is "I have these requirements, but I don't really know what they encompass because I'm a beginner, so can you show me a book which is all about them?" I suggest you refocus your search. Start with something specific which you don't know how to do, and ask about that.
 
Philip Thamaravelil
Ranch Hand
Posts: 99
If you don't require Java, Perl is amazing at parsing text. Its regular expressions and ease of writing make it great for flat files.
 
vanio begic
Greenhorn
Posts: 5
Paul Clapham wrote:CORBA? I can't imagine how you found something which said that CORBA was useful for processing text.

However, what you're asking is "I have these requirements, but I don't really know what they encompass because I'm a beginner, so can you show me a book which is all about them?" I suggest you refocus your search. Start with something specific which you don't know how to do, and ask about that.



I actually know what I need to do. I need to open a google via proxy connection, make a search, get the HTML of the page, parse it in a way that gets me the URLs, open those URLs, and find all the lines of text that contain a proxy port such as :80, :8080... Then put those proxies in a text field, and on user request export them to a chosen file.
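
The last text-processing step, finding proxy addresses such as `...:80` or `...:8080` in a page, is a natural fit for `java.util.regex`. Here is a minimal sketch, assuming the proxies appear as dotted IPv4 `address:port` pairs (hostnames and IPv6 are not handled). `ProxyExtractor` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProxyExtractor {
    // Dotted IPv4 address, a colon, then a port. Group 1 = address, group 2 = port.
    private static final Pattern PROXY =
            Pattern.compile("\\b(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{1,5})\\b");

    // Returns unique "address:port" strings in first-seen order.
    static List<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = PROXY.matcher(text);
        while (m.find()) {
            String address = m.group(1);
            int port = Integer.parseInt(m.group(2));
            if (validOctets(address) && port >= 1 && port <= 65535) {
                found.add(address + ":" + port);
            }
        }
        return new ArrayList<>(found);
    }

    // The regex allows 0-999 per octet, so anything above 255 is rejected here.
    private static boolean validOctets(String address) {
        for (String part : address.split("\\.")) {
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String page = "fresh list: 10.0.0.1:8080 and 192.168.1.20:80\n"
                + "bad: 999.1.1.1:80, dup 10.0.0.1:8080";
        System.out.println(extract(page)); // [10.0.0.1:8080, 192.168.1.20:80]
    }
}
```

The regex only finds candidates. Whether a proxy actually works still has to be checked by connecting through it.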

But since I like to learn, I decided to learn more about text processing in Java, as I am sure it will come in handy sooner or later. And yes, you are right that I may need to refocus my search, but I did not ask to be pointed to total-beginner material; I would just like a highlight of a few basic classes, and I would be able to go on from there.


As for the second answer: this project is not tied to a specific programming language, but since I started learning to program about 2 months ago, and started with Java, I would very much like to proceed with learning and mastering Java.


So anyone else?
 
Maneesh Godbole
Saloon Keeper
Posts: 10976
vanio begic wrote:
...I need to open a google via proxy connection,...Then put those proxies in a text field,and on user request export them to a chosen file.

Frankly speaking, I don't understand what you mean by "open a google..". What text field? Does your application have a UI? In case you did not know, a text field can display one line of text, without any scroll bars. You might want to reconsider and use something else instead, in case you have multiple "proxies".
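
"Open a google via proxy connection" most likely means requesting the page through a proxy server. Core Java supports this with `java.net.Proxy` and `URL.openConnection(Proxy)`. Here is a minimal sketch, assuming a plain HTTP proxy, an http/https URL and UTF-8 page text. `ProxyFetcher`, `open`, `readAll` and `fetch` are hypothetical names, and the timeout values are arbitrary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ProxyFetcher {
    // Opens (but does not yet connect) an HTTP connection routed through the proxy.
    // Assumes an http:// or https:// URL, so the cast to HttpURLConnection is safe.
    static HttpURLConnection open(String url, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setConnectTimeout(5000); // milliseconds, so dead proxies fail fast
        conn.setReadTimeout(10000);
        return conn;
    }

    // Reads a whole stream as UTF-8 text (assumed encoding), one '\n' per line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Connects through the proxy and returns the response body as text.
    static String fetch(String url, String proxyHost, int proxyPort) throws IOException {
        HttpURLConnection conn = open(url, proxyHost, proxyPort);
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: java ProxyFetcher <url> <proxyHost> <proxyPort>");
            return;
        }
        System.out.println(fetch(args[0], args[1], Integer.parseInt(args[2])));
    }
}
```

A dead or unreachable proxy makes `fetch` throw an `IOException` (for example a connect timeout), so the same call could double as a crude test for the proxy checker.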

Anyway, coming back to your other requirement of
...get html of the page,parse it..
Something like an HTML parser might be useful to you.
 
vanio begic
Greenhorn
Posts: 5
Maneesh Godbole wrote:
vanio begic wrote:
...I need to open a google via proxy connection,...Then put those proxies in a text field,and on user request export them to a chosen file.

Frankly speaking I dont understand what you mean by "open a google..". What text field? Does your application have a UI? In case you did not know, a text field can display one line of text, without any scroll bars. You might want to reconsider this using something else instead in case you have multiple "proxies"

Anyway, coming back to your other requirement of
...get html of the page,parse it..
something like html parser might be useful to you.



Yes, yes, I know a text field can show only one line; I mix it up a lot with text area.
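
With several proxies, a Swing `JTextArea` (usually wrapped in a `JScrollPane` to get scroll bars) is the component to use. Exporting its contents takes one call to `java.nio.file.Files`. Here is a minimal sketch, assuming Java 11 or later for `Files.writeString`. `ProxyListView`, `createArea` and `export` are hypothetical names. In a real window the target `Path` would come from something like `JFileChooser.getSelectedFile().toPath()`, and the area would be created and updated on the event dispatch thread (for example via `SwingUtilities.invokeLater`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.swing.JTextArea;

class ProxyListView {
    // Unlike a JTextField, a JTextArea holds many lines: one proxy per line here.
    static JTextArea createArea(List<String> proxies) {
        JTextArea area = new JTextArea(10, 30); // 10 visible rows, 30 columns
        area.setEditable(false);
        for (String proxy : proxies) {
            area.append(proxy + "\n");
        }
        return area;
    }

    // Writes the area's text to the chosen file, replacing any previous content.
    static void export(JTextArea area, Path target) throws IOException {
        Files.writeString(target, area.getText(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        JTextArea area = createArea(List.of("10.0.0.1:8080", "192.168.1.20:80"));
        Path out = Files.createTempFile("proxies", ".txt");
        export(area, out);
        System.out.print(Files.readString(out, StandardCharsets.UTF_8));
    }
}
```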

Frankly, the point here is that I want to use core Java as is, without external libraries. But thanks for the recommendation, I will check it out.
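
If staying with core Java is the goal, the JDK does ship an HTML parser: `javax.swing.text.html.parser.ParserDelegator`, which reports each tag to an `HTMLEditorKit.ParserCallback`. Here is a minimal sketch of collecting link URLs with it. `LinkExtractor` and `links` are hypothetical names. The `HTMLEditorKit` documentation describes its support as HTML 3.2 (with some extensions), so modern pages may not parse perfectly, but pulling out `href` values usually works.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor {
    // Collects the href of every <a> tag using the HTML parser bundled with Swing.
    static List<String> links(String html) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // <a name="..."> anchors have no href
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<p>text</p><a name=\"top\">no link</a><a href=\"/b\">B</a></body></html>";
        System.out.println(links(html));
    }
}
```

In the planned pipeline, the page text would come from the proxy connection, and each collected URL would then be fetched and scanned for proxy addresses.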
 