ETL stands for Extract, Transform, and Load. It is a technique for executing transformations and aggregations on data and loading the results into a table. Apache Hadoop is a platform for storing and analyzing large amounts of data, and it provides different tools for aggregating data. In addition, we can use tools like Hive, Spark, and Vertica to perform ETL transformations on data.
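To make the three steps concrete, here is a minimal Python sketch of an ETL pass: extract some rows, aggregate them, and load the result into a table. The data and table names are made up for illustration; SQLite stands in for whatever target database you would actually load into.

```python
import sqlite3

# Extract: pretend these rows came from a source system (hypothetical data).
raw_orders = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 30.0},
]

# Transform: aggregate order amounts per region.
totals = {}
for row in raw_orders:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

# Load: insert the aggregated rows into a target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region_totals (region TEXT, total REAL)")
conn.executemany("INSERT INTO region_totals VALUES (?, ?)", sorted(totals.items()))

result = dict(conn.execute("SELECT region, total FROM region_totals"))
```

Real ETL tools do the same three steps, just against many more source and target formats and much bigger data sets.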
Purvi Barot wrote: ETL stands for Extract, Transform, and Load. It is a technique for executing transformations and aggregations on data and loading the results into a table. Apache Hadoop is a platform for storing and analyzing large amounts of data, and it provides different tools for aggregating data. In addition, we can use tools like Hive, Spark, and Vertica to perform ETL transformations on data.
In other words, is the work done by Hadoop MapReduce just "ETL on extremely large data sets"?
ETL is typically done using external utility programs such as Hitachi/Pentaho PDI or Talend. These are "Swiss Army Knife" utilities and are capable of transferring and/or transforming data sets in a variety of formats, including, but definitely not limited to, CSV files, Excel spreadsheets, XML files, generated data, database tables, NoSQL servers, remote data servers such as Amazon S3, email, FTP, web services, and much more.
Technically, what ETL tools do is fetching, processing, and storage. They're very much tuned for massive data processing, but they don't hold data themselves (they store into things like databases), and they generally don't work well as a generic application framework.
I've used ETL to do things like pull tables once an hour (business hours) from a database, format them as CSV files, and then upload them to a remote reporting server via FTP. I worked with a system that pulled gigabyte flat files down from a remote IBM mainframe, converted EBCDIC to ASCII, translated IBM's unique binary number formats to something more Java-friendly, and built Oracle Financials transactions. Generally speaking, anything involving large sets of data that is too awkward for shell scripts and simple utilities like awk and Perl, but not gnarly enough to demand a custom application program, is a potential candidate for me to use ETL tools.
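That first hourly job could be sketched roughly like this in Python. An in-memory SQLite database stands in for the real source DB, and the table/host names are hypothetical; the FTP upload step is shown only as a comment since it needs a live server.

```python
import csv
import io
import sqlite3

# Stand-in for the source database; in a real job this would be a connection
# to the production DB, and "sales" is a made-up table name.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, product TEXT, qty INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
               [(1, "widget", 5), (2, "gadget", 3)])

def table_to_csv(conn, table):
    """Extract every row of a table and format it as CSV text."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)
    return buf.getvalue()

report = table_to_csv(db, "sales")

# The upload step would then push `report` to the reporting server, e.g.:
#   from ftplib import FTP
#   with FTP("reports.example.com") as ftp:
#       ftp.login(user, password)
#       ftp.storbinary("STOR sales.csv", io.BytesIO(report.encode()))
```

An ETL tool packages exactly this kind of extract-format-transfer pipeline as configurable steps, with scheduling, error handling, and many more source and sink formats built in.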
Monica Shiralkar wrote:Would my original question in the thread have made sense had I said between ETL and Big Data (instead of Hadoop)?
Not really. Big Data is about problems arising from the handling of very large data sets, and the applications and approaches needed to handle those, whereas ETL concerns itself with data transfer in and out of DBs and data transformation, irrespective of data size. Certainly Big Data also needs ETL tools, but those two things are largely orthogonal.
ETL stands for extraction, transformation, and loading. We get data from different source systems, i.e., external sources and operational systems, and ETL tools let us convert that data into whatever format we want. Both Hadoop and ETL tools are used to move and transform data from many different sources and load it into various targets. Complex ETL jobs can be deployed and executed in a distributed manner thanks to the programming and scripting frameworks on Hadoop. Hadoop is not an ETL tool itself, but it acts as a helper: when working with an ETL tool at scale, we will most likely use Hadoop MapReduce and HDFS (the Hadoop Distributed File System), because jobs are executed in a distributed manner. Hadoop helps the ETL tool process the data by using the MapReduce technique. Hadoop is a good platform for ETL because it can serve as an all-purpose staging area and landing zone for big data, while the ETL process feeds traditional warehouses directly.
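To show what the MapReduce technique mentioned above actually does, here is a single-process Python sketch of the map, shuffle, and reduce phases using word counting, the classic example. In real Hadoop, the mappers and reducers run as distributed tasks and the framework performs the shuffle across nodes; this just simulates the flow in memory.

```python
from collections import defaultdict

# Map: each mapper emits (key, value) pairs from its input split.
def mapper(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle: group all values by key (Hadoop does this across the cluster).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce: each reducer aggregates the values for one key.
def reducer(key, values):
    return (key, sum(values))

splits = ["ETL on Hadoop", "Hadoop runs ETL at scale"]
mapped = [pair for line in splits for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
```

The same map/shuffle/reduce shape fits many ETL aggregations: the map phase transforms raw records, and the reduce phase aggregates them per key before loading.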