Text Processing in Python by David Mertz

Book Review Team

Joined: Feb 15, 2002
Posts: 959
Author/s : David Mertz
Publisher : Addison-Wesley
Category : Other
Review by : Margarita Isayeva
Rating : 8 horseshoes
This book provides a thorough overview of the techniques and the standard and non-standard modules used for tasks that fall under the "text processing" umbrella. The ideal reader is already familiar with Python or experienced in other languages. For the latter group, there is an appendix with a short introduction to Python basics.
The text is evenly divided into five chapters, 70-100 pages each.
Chapter 1 starts with a discussion of functional programming and higher-order functions, followed by an overview of the Python features and data types that matter for text processing. Relevant (even if only remotely) modules in the standard library are listed, and the most important of them are illustrated with examples. Chapter 2 shows how standard Python functions, including the string module, the most important of them, can be used to solve problems (example: counting the number of words in a given text). Chapter 3 offers a short introduction to regular expressions, followed by several example Python programs, usually about a page long (one of the problems to solve: detecting duplicate words). Chapter 4 starts with a light introduction to parsing, grammars and state machines. The author advises on when to use them and when not to, then proceeds to an overview of the standard library. The non-standard mx.TextTools, SimpleParse and PLY libraries are compared, and their functionality is described in more detail. Chapter 5 is devoted to assorted tasks, from working with e-mail to parsing HTML and XML, and consists mostly of overviews of standard and third-party libraries.
The overall approach is somewhat concept-oriented. There are questions and problems to solve at the end of the chapters, since students are one segment of the book's target audience. Practitioners will appreciate the book as a solid reference on the available Python text-processing tools.
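
To give a feel for the two example problems mentioned above (counting the words in a text and detecting duplicate words), here is a minimal sketch using only the standard library. It is not taken from the book; the function names `count_words` and `find_duplicate_words` and the sample sentence are made up for illustration, and "word" is assumed to mean a whitespace-separated token.

```python
import re

def count_words(text):
    """Count whitespace-separated tokens in text (a simple notion of 'word')."""
    return len(text.split())

# A word, some whitespace, then the same word again (backreference \1).
# The \b anchors keep the match from starting or ending inside a longer word.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_duplicate_words(text):
    """Return the first word of each immediately repeated pair, in order."""
    return [match.group(1) for match in DUPLICATE_WORD.finditer(text)]

if __name__ == "__main__":
    sample = "This is is a test of the the parser."  # hypothetical input
    print(count_words(sample))
    print(find_duplicate_words(sample))
```

The word-boundary anchors matter here: without them, the "is" at the end of "This" followed by " is" would be reported as a duplicate.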
