Quartz Scheduler - CPU Intensive/DB Intensive

 
Ranch Hand
Posts: 17424
Rafael,

It seems as though 2.1.5 brought on more processor spikes that 2.1.4 didn't have. When looking at my cpu, it seems that there is a process or thread that is taking constant cpu as if stuck in a loop. My web server is at a constant 40-50%. The only configuration settings that could have something to do with it (that I can tell) are listed below:

background.tasks = true

# the period in milliseconds the config files are watched for changes
# set it to 0 (zero) to disable it completely
file.changes.delay = 10000

These do not seem out of the ordinary, so I am wondering if something is running in a constant loop without even a fraction-of-a-second sleep statement to break up the CPU usage.

Any ideas?

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Hey Rafael,

Yes, I am using quartz. I am going to try something...

I am guessing that the code is stuck in GenericSearchIndexerDAO. How about if I put a Thread.sleep(10) statement within that inner while loop?



The pauses will be so quick that it shouldn't take much away from the code's execution time and it should bring the cpu back to normal.
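
A minimal sketch of the sleep-in-the-loop idea, assuming nothing about JForum's actual indexer code: `ThrottledLoop` and `processAll` are hypothetical names, and the `Consumer` stands in for whatever work the inner loop does per record.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not JForum code: pauses briefly after each item so a
// long-running batch loop does not monopolize a CPU core.
final class ThrottledLoop {
    private ThrottledLoop() {}

    // Runs work on every item, sleeping pauseMillis after each one.
    // Returns the number of items processed before finishing or being interrupted.
    static <T> int processAll(List<T> items, Consumer<T> work, long pauseMillis) {
        int processed = 0;
        for (T item : items) {
            work.accept(item);
            processed++;
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis); // give other threads a turn
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    break;                              // stop early if asked to
                }
            }
        }
        return processed;
    }
}
```

The trade-off is plain arithmetic: a 10 ms pause per iteration adds at least 10 seconds per 1,000 iterations, so the loop uses less CPU at any moment but takes longer to finish.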

Do you think this will work or do you think it is getting stuck somewhere within the quartz jar?

By the way... I forgot to mention that this problem only occurs in the morning. I am trying to locate the default scheduled time for the Quartz indexer to see if that is truly the issue, but right now I assume it is.

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
So you are using Quartz to index the messages, right? Well, take a look at the file WEB-INF/config/quartz-jforum.properties

One thing is certain: message indexing is very CPU and hard disk intensive.

You can try Thread.sleep(). For now, anything is worth trying until we find the source of the problem.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
There is Quartz running, for scheduled jobs, but it's unlikely that this lib is causing the problem.

Anyway, open your JForumBaseServlet.java (net/jforum), and remove this line:



Depending on the CVS version you have, you will also need to remove this one:



and see if things get better.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
I do believe the Quartz indexer to be the problem. My CPU went to 100% (dual processor) and stayed there for 2 hours. I went in to fix the situation by setting it back to the default indexer. I shut down the web server causing the CPU problems and then restarted it. After doing this, everyone making posts was getting errors, which were coming from GenericPostDAO's addNew() method. I commented out the SearchFacade.index(post) line and then it worked again. Basically, right now, if I leave that line in, nobody can make any posts. I haven't traced it any further yet.

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Ok, I'll test it here.

Btw, scheduled indexing uses system resources once a day (so it's better to schedule it after midnight, or something), while indexing at post time takes some resources every time a new message is posted.
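
To illustrate the "once a day, after midnight" idea, here is a small sketch that works out how long to wait until the next nightly run. It uses only the JDK and is not how JForum or Quartz schedule the job; `NightlySchedule`, `delayUntilNextRun`, and the 02:00 run time in the usage note are assumptions made for the example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: computes the wait until the next daily run time.
final class NightlySchedule {
    private NightlySchedule() {}

    // Returns the time from 'now' until the next occurrence of 'runAt'.
    // If 'runAt' has already passed today (or is exactly now), the next run is tomorrow.
    // Works on local wall-clock time and ignores daylight-saving shifts.
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }
}
```

The resulting delay could be passed as the initial delay to `ScheduledExecutorService.scheduleAtFixedRate` with a period of one day, for example with `LocalTime.of(2, 0)` as the run time.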

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
Do you have the error that was raised? I have the key "search.indexer.implementation" set to "net.jforum.util.search.simple.SimpleSearchManager" and it didn't throw any exception.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
Rafael,

Why is it that the code is trying to index posts when we usually rely on the database to do such a thing? Since I haven't looked further into the code, I don't quite understand what is being done behind the scenes.

Maybe I don't understand exactly what is being indexed. Is it the cache that is being indexed, and if so, what does that have to do with a search?

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
The message is broken up and indexed. In other words, JForum will get the message contents, split them by space, insert each word as a single record in the database, and then associate the generated word id with the post id. Of course, when a word already exists, only the association part is done.

This way the search performs much, much better, since we don't use any LIKE queries. (And yes, this performs better than full text search from MySQL / PostgreSQL.)
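
The scheme described above can be sketched in memory as follows. This is only an illustration, not JForum's DAO code: the two maps stand in for the word table and the word-to-post association table, `WordIndex` is a hypothetical name, and lower-casing the words is an assumption added for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the word table and the word-to-post association table.
final class WordIndex {
    private final Map<String, Integer> wordIds = new HashMap<>();           // word -> word id
    private final Map<Integer, Set<Integer>> postsByWord = new HashMap<>(); // word id -> post ids
    private int nextWordId = 1;

    // Splits the text on whitespace, registers each new word once,
    // and associates every word id with the post id.
    void index(int postId, String text) {
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            String word = raw.toLowerCase(Locale.ROOT);
            Integer wordId = wordIds.get(word);
            if (wordId == null) {           // new word: "insert" it
                wordId = nextWordId++;
                wordIds.put(word, wordId);
            }
            // existing or new word: only the association is added
            postsByWord.computeIfAbsent(wordId, k -> new HashSet<>()).add(postId);
        }
    }

    // Exact-word lookup; no LIKE-style scan over message text is needed.
    Set<Integer> search(String word) {
        Integer wordId = wordIds.get(word.toLowerCase(Locale.ROOT));
        if (wordId == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(postsByWord.get(wordId));
    }

    int wordCount() {
        return wordIds.size();
    }
}
```

The sketch also shows where the cost goes: every word of every post triggers a lookup and at least one write, which is cheap in a map but becomes several SQL statements per word against a database.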

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
Gotcha.

So what you are saying is that I need to uncomment the line I currently have commented (SearchFacade.index(post)) and get it back into production.

I will try and debug the issue I am having in terms of the posts failing. I will also turn quartz search indexing back on and put a sleep statement into the code.

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Ok, I changed the topic title so that it makes more sense to the direction this thread is going.

Posts were failing because I commented out


That line was needed to init the SystemGlobals file for the indexer (since I still had indexing turned on).

Ok, no problem.

Now, I have added two sleep statements...one to file GenericScheduledSearchIndexerDAO and another to GenericSearchIndexerDAO (just to cover all bases).

My question is, in the scenario of multiple forum/web servers, if Quartz is enabled, will this still work OK? It should, because it's only indexing based on what has been posted to each individual server (meaning no overlapping or duplicate data)?

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
After further analysis of the situation on indexing and how posts are made, I would like to help others understand what is happening.

With indexing turned on, you have two choices...

1) Default indexing where each post made gets broken up by word and inserted to the database.

2) Scheduled quartz indexing. This method is the same except for the fact that the words of each post are inserted to the db at a scheduled time.

Watching the intensity of both methods, I have to say that I am not convinced that either is a good method for speeding up searches. The reason has less to do with the speed of a search than with the cost of inserting/deleting the indexed keywords. I believe inserting/deleting all the keywords of a post (especially on a busy system) is more db/cpu intensive than doing a full text index using only the db. I would recommend that people using the forums on a very lightly trafficked site go ahead and turn on indexing using the SystemGlobals properties file. People using the forums on a heavily trafficked site shouldn't use indexing, as it will bring your system down.

These are only my recommendations/opinions after testing both methods in a live production / heavily trafficked environment.

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Rafael,

As merely a suggestion, I think the following two queries will still be sufficient in terms of data:

1) Full text search on subject

SELECT ? FROM jforum_posts_text WHERE post_subject REGEXP ?

2) Search by Username

You could even give users the ability to turn on a full text search by posts, but that could be a configurable option.

This would greatly improve the performance of jforum and make it less complicated, as simplicity is always a good thing... just a thought.
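
As a rough illustration of a subject-only regex search, the sketch below does the matching in memory with `java.util.regex` instead of the database's REGEXP operator. It is not the query above: `SubjectSearch` and `findBySubject` are hypothetical names, the map stands in for the subject column, case-insensitive matching is an assumption, and Java's regex dialect is not identical to MySQL's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory analogue of "WHERE post_subject REGEXP ?".
final class SubjectSearch {
    private SubjectSearch() {}

    // Returns the ids of posts whose subject contains a match for the regex,
    // in the iteration order of the given map.
    static List<Integer> findBySubject(Map<Integer, String> subjectsById, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : subjectsById.entrySet()) {
            if (pattern.matcher(entry.getValue()).find()) { // match anywhere in the subject
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Like the query, this scans every subject on each search, so the cost grows with the number of posts rather than being paid at posting time.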

Brakker

[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Full text search is database dependent, which implies complex code to maintain and test. I'm not ruling out using it, but it is more likely that we go with Lucene (http://jakarta.apache.org/lucene) than full text.

I will take a look at how phpbb / <put other forum systems here> do this job. Currently I'm doing 4 or 5 SQL statements *per* word, which is not good.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
I totally understand as you have looked into this stuff way more than I have.

I will most likely change the search queries to be tailored to our specific needs. Robust code is great for users who want plug and play, but for us, speed is the name of the game.

I currently have 55,000 records in the jforum_posts_text table and without converting to a full text index, I did a few queries just to test the results.

Here are the benchmarks:




Now selecting only from the subjects





Basically, the search would do fine even without full text indexing until, of course, it grows too big.

[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net
Well, I have a forum with 153,289 messages (jforum_posts) and 400,000 entries in jforum_search_words right now. The users table is ~8000 entries long.

For large systems, I believe that the approach used by jforum is very good - however, I'm not saying that my actual implementation is "good".

The search itself is not a problem - it may be 100ms faster or slower most of the time, and that will not make much difference.
The problem lies in message splitting and storing.

However, as I understand from your last message, REGEXP will perform well enough, without the performance impact of having to index the messages first.

If this proves to be true, I could consider making the search classes more "pluggable", so you - or anybody else - can write code to handle the search.
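
One way "pluggable" search classes could look, as a sketch only: an interface plus a loader that instantiates whichever implementation class is named in configuration, in the spirit of the "search.indexer.implementation" key mentioned earlier. `SearchBackend`, `SubstringSearchBackend`, and `SearchBackends` are hypothetical names, not JForum classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical contract every search implementation would satisfy.
interface SearchBackend {
    void index(int postId, String subject);
    List<Integer> search(String term);
}

// Simplest possible implementation: no index at all, just a scan over subjects.
final class SubstringSearchBackend implements SearchBackend {
    private final List<Integer> ids = new ArrayList<>();
    private final List<String> subjects = new ArrayList<>();

    @Override
    public void index(int postId, String subject) {
        ids.add(postId);
        subjects.add(subject.toLowerCase(Locale.ROOT));
    }

    @Override
    public List<Integer> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < subjects.size(); i++) {
            if (subjects.get(i).contains(needle)) {
                hits.add(ids.get(i));
            }
        }
        return hits;
    }
}

// Loads the implementation named by a configuration value.
final class SearchBackends {
    private SearchBackends() {}

    static SearchBackend load(String className) {
        try {
            return (SearchBackend) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load search backend: " + className, e);
        }
    }
}
```

With this shape, swapping the word-splitting indexer for a REGEXP-based or Lucene-based search would mean changing one configuration value rather than the posting code.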

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
Well, I made some changes to the indexer and got it working... well... faster... so much faster... incredibly faster.

There are some issues to fix. I hope I can get it finished by next weekend.

Rafael
[originally posted on jforum.net by Rafael Steil]
 
Migrated From Jforum.net
What are you changing?

I made a change to the query to search by subject only using the regexp without a full text index. It's not in production, but will be tomorrow morning. It's fast enough and keeps things simple.

Let me know what you are working on when you have time.

Thanks,

Brakker
[originally posted on jforum.net by coolbreeze]
 
Migrated From Jforum.net

brakker wrote: What are you changing?

I made a change to the query to search by subject only using the regexp without a full text index. It's not in production, but will be tomorrow morning. It's fast enough and keeps things simple.

Let me know what you are working on when you have time.

Thanks,

Brakker



Can you show us your changes? Well, I am thinking of a new way to store the search index.
[originally posted on jforum.net by diego_sl]
 