When ETL is done using Hive where does the transform part happen?

 
Monica Shiralkar
Ranch Hand
ETL means Extract, Transform, Load. Hive is a tool from the Hadoop ecosystem for analysis using ad hoc queries. To use it, one moves data into the default Hive data warehouse directory, and once a Hive table has been created, one can run ad hoc queries for analysis. ETL also involves a transform step. So where does that step happen when ETL is done using Hive? Or does it happen only when Hive is used for something other than ad hoc analysis, which is its primary use?
Thanks.
 
Christopher Webster
Ranch Hand
There is no magic here.  Hive jobs run on Hadoop and are executed in the same way as other Hadoop jobs.

  • When you run a process on Hadoop, where is the process executed?  (clue: it's a distributed processing platform)
  • What does Hive do?  (clue: translates SQL queries into Hadoop jobs)
  • Where do the jobs execute?  (see above)


  • Meanwhile, ETL is Extract (read), Transform (process), Load (write), so figure out where each of these operations would happen.  

    If you're working with Hive tables, then presumably the data will be read from/written to your Hive directories.

    Your Hive processing will almost certainly involve several shuffles, and when you write the data to your target table, it will need to be moved again, so you have to assume there will be a lot of data moving around the cluster at certain stages of your process.  

    You can run EXPLAIN on your Hive queries to see the execution plan, including the stages where the processing happens.

    Like I say, no magic, and no free lunches.
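To make the three stages concrete, here is a minimal sketch in plain Java, not Hive code. The class `EtlSketch`, its methods, and the sample rows are all hypothetical and run in memory on one machine. In Hive the same roles would be played by the source table being read, the query itself, and the target table being written, with the query translated into distributed Hadoop jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an ETL job; none of these names come from Hive.
final class EtlSketch {

    // Extract: read the raw records. Here the "source table" is a list of CSV lines.
    static List<String> extract() {
        return List.of("alice,10", "bob,5", "alice,7", "carol,0");
    }

    // Transform: parse, filter, and aggregate. This is the work a query expresses.
    static Map<String, Integer> transform(List<String> rawLines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : rawLines) {
            String[] fields = line.split(",");
            String user = fields[0];
            int amount = Integer.parseInt(fields[1]);
            if (amount > 0) { // drop rows with a zero amount
                totals.merge(user, amount, Integer::sum);
            }
        }
        return totals;
    }

    // Load: write the result to the target. Here the "target table" is a list of lines.
    static List<String> load(Map<String, Integer> totals) {
        List<String> target = new ArrayList<>();
        totals.forEach((user, total) -> target.add(user + "\t" + total));
        return target;
    }

    public static void main(String[] args) {
        load(transform(extract())).forEach(System.out::println);
    }
}
```

The point of the sketch is only that the transform is whatever happens between reading and writing: parsing, filtering, grouping, summing. If `transform` returned its input unchanged, the job would still have the same three stages.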
     
    Monica Shiralkar
    Ranch Hand
    Thanks.

    Christopher Webster wrote:

    If you're working with Hive tables, then presumably the data will be read from/written to your Hive directories.

    Your Hive processing will almost certainly involve several shuffles, and when you write the data to your target table, it will need to be moved again, so you have to assume there will be a lot of data moving around the cluster at certain stages of your process.  



    But when we read data from Hive directories and when the data is written to target tables, there is no transformation of the data. That is exactly what I am trying to understand.
     
    Paul Clapham
    Sheriff
    You seem to have the idea that the input has to be changed in some way to qualify as "transforming". That's like asking whether adding zero to a number is really addition.
     
    Monica Shiralkar
    Ranch Hand

    Paul Clapham wrote:You seem to have the idea that the input has to be changed in some way to qualify as "transforming".



    Thanks. Yes, that was exactly my doubt. What else could transformation mean, other than modifying the input?
     
    Paul Clapham
    Sheriff
    And then I pointed out to you (not very clearly) that there exists a "null transformation" which does nothing to the input. Would you tell people that they aren't allowed to use the null transformation? Would you write code to throw an exception if the output was the same as the input? Perhaps your Java compiler would reject a method with no code in the body?

    Doing nothing should be an option.
     
    Monica Shiralkar
    Ranch Hand

    Paul Clapham wrote:
    Doing nothing should be an option.



    Yes, but then why not call it something other than ETL (Extract, Transform, Load)?

    When I read about Hive doing ETL it confuses me, because I have worked with Hive but do not know about ETL.
     
    Paul Clapham
    Sheriff

    Monica Shiralkar wrote:Yes, but then why not call it something other than ETL (Extract, Transform, Load)?



    Because usually it does a non-trivial transform. Occasionally it may do a null transform. There's no need to think of a special word for that when "Transform" describes it perfectly well. People in the computer world are quite comfortable with the idea of operations which might sometimes not do anything anyway.
     
    Monica Shiralkar
    Ranch Hand
    I am still trying to understand it. I mean, aeroplanes fly, and we do not call a bus an aeroplane on the grounds that it does "null flying".
     
    Paul Clapham
    Sheriff
    Well, sorry. This is a standard and uncontroversial way to describe things in the computer business, so if you can't understand it you might as well just declare defeat.
     
    Monica Shiralkar
    Ranch Hand
    I think the clue for me lies in

    Paul Clapham wrote:

    Because usually it does a non-trivial transform.



    And


    might sometimes not do anything anyway.



    I need to understand what a non-trivial transform looks like here, and, if it "might sometimes not do anything", what it does the other times.

     
    Ron McLeod
    Sheriff

    Paul Clapham wrote:Because usually it does a non-trivial transform. Occasionally it may do a null transform. There's no need to think of a special word for that when "Transform" describes it perfectly well. People in the computer world are quite comfortable with the idea of operations which might sometimes not do anything anyway.

    Monica Shiralkar wrote:I am still trying to understand it. I mean, aeroplanes fly, and we do not call a bus an aeroplane on the grounds that it does "null flying".


    Java supports the concept of an identity function, where the value returned is the same as the value passed in to the function.  Does that mean that it is not a real function?
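For what it's worth, here is a small sketch of where an identity function earns its keep. `Function.identity()` is the standard JDK method in `java.util.function`. The class `IdentitySketch` and its helpers `copyWith` and `lengthsByWord` are hypothetical names made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical examples; only Function.identity() and the stream API are from the JDK.
final class IdentitySketch {

    // A pipeline step whose transform is chosen by the caller.
    static <T, R> List<R> copyWith(List<T> source, Function<T, R> transform) {
        return source.stream().map(transform).collect(Collectors.toList());
    }

    // toMap needs a key function; identity() says "the element is its own key".
    static Map<String, Integer> lengthsByWord(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(Function.identity(), String::length));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("hive", "etl");
        System.out.println(copyWith(rows, String::toUpperCase));   // [HIVE, ETL]
        System.out.println(copyWith(rows, Function.identity()));   // [hive, etl]
        System.out.println(lengthsByWord(rows));
    }
}
```

The method `copyWith` does not need to know or care whether the transform changes anything. The identity function lets the caller reuse the same pipeline when no change is wanted, instead of writing a second "copy without transforming" method.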
     
    Monica Shiralkar
    Ranch Hand

    Ron McLeod wrote:
    Java supports the concept of an identity function, where the value returned is the same as the value passed in to the function.  Does that mean that it is not a real function?



    Thanks. Now I am first trying to understand why an identity function is used. I mean, why would we use a function which returns exactly what it receives?

     
     
    Monica Shiralkar
    Ranch Hand

    Ron McLeod wrote:
    Java supports the concept of an identity function, where the value returned is the same as the value passed in to the function.  Does that mean that it is not a real function?



    Thanks. Now I am (still) trying to understand why an identity function is used. I mean, why would one use a function which returns exactly the same output as the input it receives?
     
    Paul Clapham
    Sheriff

    Monica Shiralkar wrote:Thanks. Now I am (still) trying to understand why an identity function is used. I mean, why would one use a function which returns exactly the same output as the input it receives?



    What for? If people can only use things which you personally understand, then the work of programmers is going to come to a standstill. Instead, perhaps you should consider that people can write code which you don't understand, and accept it as valid. Then you will be able to move on.
     
    Monica Shiralkar
    Ranch Hand

    Paul Clapham wrote:

    Monica Shiralkar wrote:Thanks. Now I am (still) trying to understand why an identity function is used. I mean, why would one use a function which returns exactly the same output as the input it receives?



    If people can only use things which you personally understand, then the work of programmers is going to come to a standstill.



    Yes, absolutely.

    consider that people can write code which you don't understand, and accept it as valid. Then you will be able to move on.



    Yes, of course, I do not have any doubt about it being valid. But I can move on only after making an attempt to understand its purpose, for my own learning.
     