• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

use of LIKE

 
Ranch Hand
Posts: 163
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Ok this is nothing related to JDBC but more a database design q.

I have say, a 'news' table with 'sites' as one if its column,

a news item can be assigned to more than one site and so 'sites' column contains a 'list' of all site ids that this news item is assigned to.

so whenever there is a query to get all itesm related to a site, we need to do a 'like' and use regexp such as like 'sid,%' or '%,sid,%' or '%,sid' etc.


I know this is a bad design, that the news-site association should be stored in a different table etc etc... but that implies all 'getnews' queries then will have to be 'join's between news and site tables.

And if there are any such dependencies, say another field called 'categories' then teh ideal design would be to add another 'categories' table in the join to get news items for a site and a category...

And so on...

Can anybody convince me that NO MATTER WHAT, storing the ids as a list in a single column and using 'like' for searches is ALWAYS a bad idea and always a performance hit compared to the multiple table joins solution???
 
author
Posts: 4335
39
jQuery Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Easily.

Since string searching has to be done on every record in the table, there's no way to optimize this. In every query you could be searching through huge amounts of data which may be spread across a large drive.

The correct solution is to create a table with two columns, (nid,sid), where records in the table may have multiple news ids but (nid,sid) are unique primary keys. Then you can build a tree index on sid and retrieve results extremely fast (logn).

The thing you're forgetting is that more complex data structures can be managed with intelligent uses of indexes where as overly simplistic, denormalized data, is extremely limited. The database can't build an index to intelligently search through a field with LIKE-style syntax.
 
Scott Selikoff
author
Posts: 4335
39
jQuery Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
BTW a second solution, although it'd practically be an anti-pattern since maintenance would be horrible, is if you know there is a limit on the number of pairs such as 5, you could add 5 columns in the original table instead of a list. Its better than a list is than it has faster performance but you now have to build index on 5 different fields, and as I said, maintenance would be hell.
 
Brian Percival
Ranch Hand
Posts: 163
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks.. just wanted a confirmation, that no matter how bulky joins might get, it is always to go ..
 
Ranch Hand
Posts: 338
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Brian Percival:
Thanks.. just wanted a confirmation, that no matter how bulky joins might get, it is always to go ..



Joins are not a bad thing.

A query (full of joins) against a well-designed star schema will always out perform a normalized data structure representing the same data.
 
Ranch Hand
Posts: 425
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Performance comes into consideration only if you know how much volume of data is stored in the table. I am not in favor of creating multiple tables and have joins - DB's are getting faster and storage is cheap. But using LIKE will definitely result in (a) FTS (Full table scan) or (b) Full Index Scan if table is IOT.

By the way you didn't say which database you are using. If you happen to use Oracle then you can look at InterMedia/Text Indexes to improve the performances and avoid FTS. It offers additional features as well. Using Text index depends on how many inserts you anticipate in a second.
 
Paul Campbell
Ranch Hand
Posts: 338
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Purushothaman Thambu:
Performance comes into consideration only if you know how much volume of data is stored in the table. I am not in favor of creating multiple tables and have joins - DB's are getting faster and storage is cheap. But using LIKE will definitely result in (a) FTS (Full table scan) or (b) Full Index Scan if table is IOT.

By the way you didn't say which database you are using. If you happen to use Oracle then you can look at InterMedia/Text Indexes to improve the performances and avoid FTS. It offers additional features as well. Using Text index depends on how many inserts you anticipate in a second.



I fail to understand the resistance to utilizing joins. What is the basis for the point of view... it isn't something resisted in the BI or data warehousing world. Is this something that causes a problem for JDBC?
 
Purushoth Thambu
Ranch Hand
Posts: 425
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Didn't mean that joins are bad. But we can void them if possible. I don't see why I should be creating multiple tables ,normalize, index , fk constraint, non mergeable views and stuff if join doesn't promise huge performance/scalability improvement.
 
Paul Campbell
Ranch Hand
Posts: 338
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Purushothaman Thambu:
Didn't mean that joins are bad. But we can void them if possible. I don't see why I should be creating multiple tables ,normalize, index , fk constraint, non mergeable views and stuff if join doesn't promise huge performance/scalability improvement.



That's the point. Your query and performance in presenting the query results is completely dependent on the design of your data base. The payoff of doing the ground work now... and honestly... how hard is it to write multiple create DDL statments... though in an ideal world it would be generated from your logical data model.
 
I have a knack for fixing things like this ... um ... sorry ... here is a consilitory tiny ad:
a bit of art, as a gift, the permaculture playing cards
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic