This week's book giveaway is in the Cloud forum.
We're giving away four copies of Terraform in Action and have Scott Winkler on-line!
See this thread for details.
Win a copy of Terraform in Action this week in the Cloud forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Liutauras Vilda
Sheriffs:
  • Jeanne Boyarsky
  • Rob Spoor
  • Bear Bibeault
Saloon Keepers:
  • Jesse Silverman
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
Bartenders:
  • Piet Souris
  • Al Hobbs
  • salvin francis

Recursion confusion...recursion confusion..rec

 
village idiot
Posts: 1208
1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
This is from a Computer Science course offered for free at Udacity, so it may belong in another forum, but since it is written in Python, I put it here.
This quiz asks us to write a procedure that strips the html tags out of a string and return the plain text as a list. I decided to do it recursively and
stumbled across something I don't understand. If I run the code as is, the first 3 tests return None while the 4th returns the desired list. If I
uncomment line 10 and comment out line 11, all 4 tests return the desired list. The difference is whether remove_tags(string) is called from
line 11, or is returned by uncommenting line 10.  I have tried stepping through both versions of the code in the
debugger to see if I can picture exactly what the differences between the two are, but it is still not clear.

My thinking: Both ways update the variable 'string' in the same way, as shown by calling print string.split() inside the else block on line 14.
Since calling print string.split() produces the list I want in both versions, why isn't that list returned by the call on line 15 in both versions?
If string itself has been stripped of all tags by looping through the if block, why doesn't the last call to remove_tags(string) on line 11 simply
fall through to the else block and return the result of string.split() as a list? Why does return need to be called inside the if block for this to work?

I am missing some important concept in my understanding of recursive functions. Can anyone help me with this?


 
Saloon Keeper
Posts: 13427
302
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Carol,

After executing the last statement in the if-clause, it doesn't fall through to the else-clause. Execution continues with the next statement after the else-clause. Since you don't have any statements after the else-clause that belong to the remove_tags function, the function returns None.

Let's take a simple example: <p>Hello World!</p>

When you call remove_tags, it finds '<' so it proceeds with the if-clause. The string variable is set to "Hello World!</p>", and then remove_tags is called recursively on that string.

The second call finds '<', so once again it proceeds with the if-clause and sets the string variable to "Hello World!" and calls remove_tags recursively.

The third call doesn't find '<', so the else-clause is executed. It splits "Hello World!" into ["Hello", "World!"] and returns that list.

Now we're back on line 11 of the second function call. The list that was returned by the third call is discarded, and the function continues with the next statement. Since there is no next statement, the function returns None.

Now we're back on line 11 of the first function call. The None that was returned by the second call is discarded, and the function continues with the next statement. Since there is no next statement, the function returns None.

If you comment out line 11, and uncomment line 10, the second function call would have returned the list returned by the third call, and the first call would have returned the list returned by the second call.

Another way of achieving the same effect is removing one level of indentation from line 15. The return statement then doesn't belong to the else-clause, but it will also be executed after the if-clause finishes.
 
Carol Murphy
village idiot
Posts: 1208
1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks for that explanation! The "unwinding" of the recursive calls is a great stumbling block for my visualization of what is happening in this code. Bear with me here.
When I moved the return string.split() out of the else loop, the string it used was the one returned by the second call. Only the first tag was removed. The fact that

print string.split() and return string.split() don't reference the same object inside the else block still confuses me. Why isn't the string that was passed in with the
third call to remove_tags(string) returned after it was printed? I think I am misunderstanding what happens during the
return process itself. It seems to me that the last method call bypasses the if statement so the string passes as an argument should be split and returned without any
trouble. I am not seeing how the value that gets thrown out inside the if block affects what gets passed into the else block.

I feel I'm close, but I don't quite see it yet.
 
Stephan van Hulst
Saloon Keeper
Posts: 13427
302
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Carol Murphy wrote:When I moved the return string.split() out of the else loop, the string it used was the one returned by the second call. Only the first tag was removed. The fact that print string.split() and return string.split() don't reference the same object inside the else block still confuses me.


Sorry, I may have added to your confusion a bit here. I said that moving the return statement out of the else-clause would be equivalent to returning the results of the recursive call. This is not true.

I think you'll agree with me that this pseudo-code:
Is equivalent to this pseudo-code:

So instead of moving the return statement out of the else-clause, let's add it to the if-clause to easily see why it's not sufficient to return string.split(), but that we must return the recursive results.

In the first function call, we find that string has the value "Hello World!</p>" before we make the recursive call to remove_tags. The recursive call then returns ["Hello", "World!"], but we completely ignore its results, and just return string.split("Hello World!</p>"). Remember that the recursive call does NOT change the value of the string variable in the current call. Strings are immutable, and you pass the object that is held by string, not the string variable itself.

Clearly, it's pretty useless to call a function unless it has side-effects, or you do something with its return value. Since your remove_tags function doesn't have side effects (it doesn't affect data outside of its stack frame), you MUST do something with the return value.
 
Stephan van Hulst
Saloon Keeper
Posts: 13427
302
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think this is a classic case of getting confused by your own function names. Having appropriate function names is *especially* important in recursive functions.

When I read remove_tags(string), I expect it to remove tags from a string. That's not what your function does though (and it's also impossible, because strings are immutable). Given well-formed XML, it returns a list of all words that appear in text nodes. Here's how I would do it instead:

Disclaimer: I've never written Python in my life before, so I don't know if this works correctly.
 
Carol Murphy
village idiot
Posts: 1208
1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks for your response, Stephan.

Your solution makes sense. I'm still working on getting a mental image of how recursive function calls operate, and I sometimes
expect them to operate like iterative code, with which I am more comfortable.

Thanks for your help!

Cheers!
 
reply
    Bookmark Topic Watch Topic
  • New Topic