Is it possible to build a data warehouse system using Hadoop?

 
 
author
Posts: 15
Yes, but you need to understand what you want from your data warehouse. If you need to ingest largely structured data and run SQL-type queries against it, that is the core Hive use case, and it is probably what most people have in mind when they think of a DW on Hadoop.
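
To make the Hive use case a bit more concrete, here is a minimal sketch in Python that only builds the HiveQL text for exposing delimited files on HDFS as a table and then aggregating over it. The table name, column names and HDFS path are made up for illustration, and the script does not connect to a cluster; you would paste the printed statements into whatever Hive client you use.

```python
# Sketch only: "sales", its columns and the HDFS path are hypothetical.
# Nothing here talks to Hive; the functions just build HiveQL strings.

def create_external_table_ddl(table, columns, location, delimiter=","):
    """Build HiveQL that exposes delimited files under `location` as a table."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def daily_revenue_query(table):
    """Build a typical warehouse-style aggregate over the assumed columns."""
    return (
        f"SELECT sale_date, SUM(amount) AS revenue\n"
        f"FROM {table}\n"
        f"GROUP BY sale_date\n"
        f"ORDER BY sale_date"
    )


if __name__ == "__main__":
    columns = [("sale_date", "STRING"), ("store_id", "INT"), ("amount", "DOUBLE")]
    print(create_external_table_ddl("sales", columns, "/data/warehouse/sales"))
    print()
    print(daily_revenue_query("sales"))
```

An external table is used in the sketch because it leaves the files where they already sit on HDFS; a managed table would be an equally reasonable choice depending on how you ingest.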

If you need more custom analytics then it's possible, but at that point your integration with the front-end, user-facing query tools will also become more specialised.

I also recommend thinking about what Hadoop is good at and which workloads may make more sense in a traditional DW. At my current company I've got both, and I see it as a case of complementary and not necessarily competing technologies.

In either case, remember that Hadoop makes a core trade-off: it optimizes for throughput at the cost of latency. In a true DW-type scenario this shouldn't be a major concern, as almost by definition no one expects DW queries to have sub-second response times.
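
A back-of-the-envelope way to picture that trade-off: model each engine as a fixed startup cost plus a scan rate. All numbers below are invented purely for illustration and are not measurements of Hadoop or any other system; the point is only the shape of the curve, where a high-throughput engine loses on tiny queries and wins on big scans.

```python
# Toy cost model with made-up numbers; not a benchmark of any real system.

def estimated_seconds(data_gb, startup_s, throughput_gb_per_s):
    """Fixed startup overhead plus time to scan `data_gb` at a given rate."""
    return startup_s + data_gb / throughput_gb_per_s


# Hypothetical profiles: a batch engine that is slow to start but scans fast,
# and a low-latency engine that answers quickly but scans slowly.
BATCH = {"startup_s": 30.0, "throughput_gb_per_s": 2.0}
LOW_LATENCY = {"startup_s": 0.05, "throughput_gb_per_s": 0.1}


def faster_engine(data_gb):
    """Return which hypothetical profile finishes first for this data size."""
    batch = estimated_seconds(data_gb, **BATCH)
    low = estimated_seconds(data_gb, **LOW_LATENCY)
    return "batch" if batch < low else "low-latency"


if __name__ == "__main__":
    for size_gb in (0.001, 1, 10, 1000):
        print(f"{size_gb} GB -> {faster_engine(size_gb)}")
```

With these assumed figures the crossover sits at a few GB; with real systems you would have to measure it for your own data and queries.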

What is really nice is that with Hadoop you can have the best of both worlds: use Hive for the more traditional heavy analytic queries, and then perhaps HBase for lighter, user-facing queries. And to look ahead a little, both Cloudera's Impala and Apache Drill will offer a much lower-latency SQL interface to data residing on HDFS, so the fusion will become greater.

Garry
 