Tips to create ubb code filter

John Todd
clojure forum advocate
Posts: 3479
Mac Objective C Clojure
Hi all.
I have developed a small web forum, and now I'd like to add the following feature.
As you all know, forums let users insert emoticons and markup codes in their posts.
When a user's comment is retrieved from the database, a filter should scan the text and replace those codes with HTML tags.
My question is: how do I build such a filter?
I mean, what should I use?
Examples :
[imagehere]http://images.com/jt.gif[/imagehere]
[boldhere]JT[/boldhere]
)))
The previous examples should produce:
an <img> HTML tag and a <b> HTML tag, each built from the text between the UBB codes.
))) should be replaced with a smiley picture.
How can I get the text between the UBB codes? How can I delete the UBB codes themselves?
Assume that I have 8 UBB codes and 12 emoticons. Should I apply my patterns one by one? What about performance? How does JR implement this functionality?
Any hints or tips?
Thanks.
[ March 25, 2005: Message edited by: John Todd ]
Ernest Friedman-Hill
author and iconoclast
Posts: 24207
Mac OS X Eclipse IDE Chrome
You can use String.replaceAll() on each of the patterns, of course, but as you suspect, this isn't the most efficient way to do things. The fastest way would be to build a finite state machine (FSM) out of all the patterns, then feed each character in the String to the FSM; that way you make only one pass through the text. Off the top of my head, I can think of one place where this approach is described in great detail: Brian Kernighan and Rob Pike's "The Practice of Programming", published by Addison Wesley a few years ago.

The best advice is always to try the easy way and see if it works well enough. If it's too slow, then you can try making it faster using the FSM approach.
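
For what it's worth, the "easy way" could look roughly like the sketch below. This is not how UBB or this site does it; the class name, the rule table, and the smiley image path are all made up for illustration, and the tag names are simply the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SimpleUbbFilter {
    // Hypothetical rule table: regex -> replacement, applied in insertion order.
    // The smiley path is an assumption, not something from the original post.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("\\[imagehere\\](.*?)\\[/imagehere\\]", "<img src=\"$1\">");
        RULES.put("\\[boldhere\\](.*?)\\[/boldhere\\]", "<b>$1</b>");
        RULES.put("\\)\\)\\)", "<img src=\"smileys/laugh.gif\">");
    }

    // One full pass over the text per rule: simple, but N rules mean N passes.
    static String render(String text) {
        String out = text;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            out = out.replaceAll(rule.getKey(), rule.getValue());
        }
        return out;
    }
}
```

With 8 codes and 12 emoticons that is 20 passes per post, which is probably fine for short comments. A real forum would also want to HTML-escape the user's text before applying the rules.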

Note that the original UBB (the software that runs this site) is written in rather convoluted Perl, not Java.
M Beck
Ranch Hand
Posts: 323

Originally posted by Ernest Friedman-Hill:
Note that the original UBB (the software that runs this site) is written in rather convoluted Perl, not Java.



if i were feeling less charitable, this is the point where i would ask you if you'd ever seen any other kind of Perl.
Stan James
(instanceof Sidekick)
Posts: 8791
UBB doesn't look that different from Wiki markup to me. At Fitnesse.org you can download the Wiki source and see how Robert & Micah Martin did it with regular expressions. I lifted the technique for my own Wiki.
 
M Beck
Ranch Hand
Posts: 323
personally, i suspect i'd still do it with a state machine. i think i would have it pushing closing tags onto a stack as it replaced opening tags; that way, i'm fairly sure i could work out some scheme to ensure each tag was properly closed. doing that with regexes, i think, would be tricky.

it's not as if it'd be any slower, either. most regular expression parsers are implemented as state machines scanning their input anyway. regexes would likely be more compact, of course, and quicker and easier to develop - but one might have to be a true master of regular expressions in order to make it any more readable or maintainable.
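
a minimal sketch of the stack idea, just to make it concrete. the class name and the [italichere] tag are invented for the example ([boldhere] is from the original question), it needs Java 9+ for Map.of, and it only handles simple paired tags, so treat it as an illustration and not a finished filter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

final class StackUbbFilter {
    // Hypothetical tag table: UBB tag name -> HTML element name.
    private static final Map<String, String> TAGS =
            Map.of("boldhere", "b", "italichere", "i");

    static String render(String text) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // HTML elements awaiting a close
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int end = (c == '[') ? text.indexOf(']', i) : -1;
            if (end < 0) {               // ordinary character, or '[' with no ']'
                out.append(c);
                i++;
                continue;
            }
            String name = text.substring(i + 1, end);
            boolean closing = name.startsWith("/");
            String html = TAGS.get(closing ? name.substring(1) : name);
            if (html == null) {          // not one of our tags: keep '[' as text
                out.append(c);
                i++;
            } else if (!closing) {       // opening tag: emit it, remember the close
                out.append('<').append(html).append('>');
                open.push(html);
                i = end + 1;
            } else if (open.contains(html)) {
                // Close any inner elements first so the output nests properly.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(html));
                i = end + 1;
            } else {
                i = end + 1;             // stray closing tag: drop it
            }
        }
        while (!open.isEmpty()) {        // close whatever the author left open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

the stack is what buys the "properly closed" guarantee: a post that forgets its closing tag, or closes tags in the wrong order, still comes out as well-nested HTML.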
 
Stan James
(instanceof Sidekick)
Posts: 8791
I'm with you on using whatever makes most sense to you. Finite state parsing is pretty well understood and known to work, so I wouldn't criticize it.

I borrowed the RegEx technique from Fitnesse. It's pretty cool. You define a set of handler classes, each with a pattern. It wraps each pattern in parens, so there is one capturing group per pattern, joins them all with "|", and adds the handlers to a collection in the same order.

(pattern1)|(pattern2)|(pattern3)

When the matcher finds a match on that giant mess, it can check which capturing group matched to tell which pattern fired, and get the handler by that index. I pass the matching fragment to the handler. The matcher seems to find the largest pattern it can, so many of the handlers call the matcher recursively for any nested patterns. I gotta read up on the theory there some day.

Anyhow with that you can define patterns like this:

opentag(.*)closetag

to get markup that has open & close elements.
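
Here is a rough sketch of that idea in plain java.util.regex. It is my own reconstruction for illustration, not the Fitnesse code: the class and method names are invented, the tag names are the ones from the original question, and the smiley path is made up. I also used reluctant .*? where the pattern above uses greedy .* so that two bold runs in one post don't merge into one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationUbbFilter {
    private final List<UnaryOperator<String>> handlers = new ArrayList<>();
    private final StringBuilder regex = new StringBuilder();
    private Pattern combined;

    // Patterns must not contain capturing groups of their own, so that
    // group k of the combined regex always belongs to handler k-1.
    AlternationUbbFilter add(String pattern, UnaryOperator<String> handler) {
        if (regex.length() > 0) regex.append('|');
        regex.append('(').append(pattern).append(')');
        handlers.add(handler);
        combined = null; // recompile lazily on the next render
        return this;
    }

    String render(String text) {
        if (combined == null) combined = Pattern.compile(regex.toString(), Pattern.DOTALL);
        Matcher m = combined.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());       // copy text between matches
            for (int g = 1; g <= handlers.size(); g++) {
                if (m.group(g) != null) {            // this alternative matched
                    out.append(handlers.get(g - 1).apply(m.group(g)));
                    break;
                }
            }
            last = m.end();
        }
        return out.append(text.substring(last)).toString();
    }

    // Drops "[tag]" from the front and "[/tag]" from the back of a fragment.
    static String inner(String fragment, String tag) {
        return fragment.substring(tag.length() + 2, fragment.length() - tag.length() - 3);
    }

    // Hypothetical rule set; the bold handler re-runs the filter on its body.
    static AlternationUbbFilter forumDefaults() {
        AlternationUbbFilter f = new AlternationUbbFilter();
        f.add("\\[boldhere\\].*?\\[/boldhere\\]",
              s -> "<b>" + f.render(inner(s, "boldhere")) + "</b>");
        f.add("\\[imagehere\\].*?\\[/imagehere\\]",
              s -> "<img src=\"" + inner(s, "imagehere") + "\">");
        f.add("\\)\\)\\)", s -> "<img src=\"smileys/laugh.gif\">");
        return f;
    }
}
```

Calling AlternationUbbFilter.forumDefaults().render(post) makes one pass over the post at the top level, and only the handlers that can contain nested markup (bold here) recurse into their own fragment.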

My earlier code read a character at a time and parsed for tags, maintained state, etc. I haven't run fine-grained timings, but my perception is no difference in performance. Makes me think that even with RegEx, this is not the slow part of my Wiki.

Cheers!
 