This week's book giveaway is in the OCAJP 8 forum. We're giving away four copies of OCA Java SE 8 Programmer I Study Guide and have Edward Finegan & Robert Liguori on-line! See this thread for details.
This is just a fairly general question to canvas people's experience or opinions. We are weighing up the pros and cons of how to model multi-lingual data in a relational database. The "clean" design we reckon is to have a translation table for every entity (so "person" would also have "person_tr"), each translation table would mirror the translatable attributes (i.e. just character data) and be differentiated by locale. So for the entry in "person":
we would have the following in "person_tr":
This makes the data easy to maintain, however we are concerned that we have tables that might have quite a high number of translatable entities associated with them, so searches by a translated value become significantly more complex. We've got a variety of exotic ways of mitigating this, but I'm curious if anyone has already dealt with this problem or can suggest a better design?
Oracle Apps (Oraccle eCommerce Suite) uses a similar design as the one you are suggesting.
To avoid that each report or form has to recreate the proper links between the core and translation tables, they supply a view that does that work for you.
The view uses a stored function in it's joins to look up the current logged on user's language, and will return the data in that language.
Why multi-language in the database at all? Technically, its a facet of the UI. The person is a "secretary", regardless of the language or how its spelled. In other words, it mixes data and presentation layers, to have the database concerned about supporting multiple languages. If possible, I would leave the multi-language aspect in the control of whatever is outputting the data and use common i18n properties files as needed.
Is there an advantage to having the database know about different languages?
The application of the client I work for has to support 2 languages: Dutch and French (because we live in Belgium ) So we have a similar issue like you, although in our case it's very limited. When we had to decide about the approach we would take we discussed 2 alternatives:
a) the one you mentioned
b) for each translatable attribute create 2 columns: one for dutch and one for french
We opted for the 1st alternative, because it's a lot easier to maintain (certainly if you have to add a 3rd language). So we have a language table with the possible languages and our person_tr table has columns person_id, language_id and a column for each translatable attribute.
Regarding searches it's not that complex, maybe also because we have built a framework ourselves which takes care of these things. Of course you have to join to the appropriate tables, but I would not consider joining as something complex.
Jan - views are a distinct possibility, unfortunately we have quite a significant support headache for that. We support multiple versions of multiple databases, so pushing the logic into the database is not a popular choice.
Scott/Mark - the reasons is we need end users to be able to set this stuff up on the fly. Though I'd agree with your logic (and its how we deal with all static text data), we can't really push this maintenance on to our customers. My secretary is maybe a bad example, a more common example of what we need is a business description associated with an entity - so several hundred lines of changing text explaining a thing. We need to support customers adding these as new languages are supported by them, and updating their content frequently (in some cases many times in one day).
Roel - its good to hear your experience. Our primary diver was Canadian customers, so a very similar requirement as you probably have in Belgium. Our concern with the complexity of join is that we have some big entities that join to a lot of other entities (we have an awful legacy data model). We already mitigate performance issues by denormalizing some of these; we'd loose that if we used translation tables. So a query currently performing five joins might jump to being a query performing twenty five joins, which we are worried will be a significant determiner of the performance of the application. Needless to say these big entities are by far the most frequently queried things in the database.
Thanks again for you input. Its good to hear we are having the same sort of thoughts as you. Now we just need to make a decision...