
Proper Silence Detection - having problem

 
Dan Bizman
Ranch Hand
Posts: 387
In my app, I listen on a voice modem line for when the other end's microphone actually starts sending data. The problem is, the line puts out occasional sounds/blips every now and then even when no one's connected to it.

Currently, my code checks the bytes received against a threshold. Since blips occur occasionally, I thought I could check whether the signal is above the threshold twice in a row, but that happens even when no one is connected to the mic. I also thought I could count how many times it has gone over the threshold, but that's no good either (it sometimes happens multiple times with no one on the line), nor is checking how far over the threshold it has gone.

Help! I'm at a complete loss as to what to do. How do I properly check for "silence"? (i.e. know they're really able to speak/hear and it's not just blips on the line?)
 
Jim Yingst
Wanderer
Posts: 18671
What API are you using here? Is there any documentation discussing these spurious signals?

I'm guessing that the bytes you receive represent some sort of signal amplitude. You say you're applying a threshold somehow. I guess there are at least three ways I can think to do this:
  • Ignore any signal that does not exceed threshold amplitude
  • Ignore any signal that does not exceed a threshold of elapsed time
  • Ignore any signal that does not exceed some combination of amplitude and elapsed time. The simplest such combination would be to add up the absolute values of the bytes of the received signal. A more correct approach may be to add the squares of the amplitudes, since for most signals energy is proportional to the square of amplitude.

Do any of those techniques sound like something potentially useful which you haven't yet tried?
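The third option above can be sketched roughly as follows. This assumes the bytes are signed 8-bit samples centered on zero, which has not been confirmed anywhere in this thread; the class and method names are made up for illustration.

```java
// Hypothetical helper: combines amplitude and elapsed time by summing over a window.
// Assumption: each byte is one signed 8-bit sample centered on zero.
final class WindowEnergy {

    /** Adds up the absolute values of all samples in the window. */
    static long sumOfAbs(byte[] samples) {
        long total = 0;
        for (byte s : samples) {
            total += Math.abs(s); // byte is promoted to int, so -128 is handled correctly
        }
        return total;
    }

    /** Adds up the squares of all samples; proportional to the energy in the window. */
    static long sumOfSquares(byte[] samples) {
        long total = 0;
        for (byte s : samples) {
            total += (long) s * s;
        }
        return total;
    }
}
```

Either sum grows with both how loud the signal is and how long it lasts, so a single short blip contributes less than sustained speech of moderate level. The threshold to compare against would have to be tuned on real line data.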
    [ October 22, 2006: Message edited by: Jim Yingst ]
     
    Dan Bizman
    Ranch Hand
    Posts: 387

    Originally posted by Jim Yingst:
    A more correct approach may be to add the squares of the amplitudes, as for most signals, energy is proportional to the square of amplitude.



    I think that might have some potential to it! I saw some code sample online that was doing something similar. Can you give me a little more info on how that works/why? Or point me to a resource talking about that?

    Thanks!
     
    Dan Bizman
    Ranch Hand
    Posts: 387
    Unfortunately, it's never going to be an exact science, so there's some guessing involved, but using the square roots of the samples seems to be much more accurate! Thanks!

    Now what I do is convert the byte array of data to a float array of samples and then check the average value of the square roots of those samples. I noticed that while there are still bumps, they're less frequent and much less pronounced. More importantly, when the caller actually picks up, it's now consistently above a certain level (save for the very rare time). Previously, there was no consistent level that it stayed above. So now I can fairly reliably check for it being above a level 3 or 4 times in a row, and 9 times out of 10 that happens at the right time.

    Why is it that the square roots are more accurate?

    Anyways, thanks again!
    [ October 22, 2006: Message edited by: Dan Bizman ]
     
    Jim Yingst
    Wanderer
    Posts: 18671
    [Dan]: now, when the caller actually picks up, it's consistently above a certain level (save for the very rare time)

    OK, that's promising. But when you say "level", are you saying that all the bytes in the stream exceed some threshold value? Or that they are oscillating about a mean of zero, but the amplitude exceeds some threshold? I realize I've inserted some unstated assumptions into the second scenario, but they're fairly common in reality, so I'm making some guesses. If in fact all the bytes are uniformly above some threshold value, then I confess I don't really know what sort of signal you're talking about. Maybe some more info on what API you're using would be useful.

    As for the business about the sum of the squares of the bytes being significant - well, for starters, that assumes the common case of analyzing a signal which has been normalized to an average of zero. That is, the numeric values should average to zero, oscillating from positive to negative in equal amounts of each (on average). If that's not the case, maybe the first thing you should be looking at is: what is the average value of the bytes? Maybe just computing the average from the last n seconds would be good. Once you know the average, subtract it from each value to normalize it, then square the differences and add them together. As a last step, divide the sum by the number of samples and take the square root. This is discussed in greater detail here.
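The normalize-then-square procedure described above might look something like this. It is only a sketch: it assumes the samples have already been converted to a float array, and the class and method names are invented for illustration.

```java
// Hypothetical two-pass computation: remove the average, then take the
// root of the mean squared difference (RMS about the mean).
final class Rms {

    /** Average value of the samples; 0 for an empty array. */
    static double mean(float[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (float v : samples) {
            sum += v;
        }
        return sum / samples.length;
    }

    /** Root mean square of the samples after subtracting their average. */
    static double rmsAboutMean(float[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double m = mean(samples);
        double sumSq = 0.0;
        for (float v : samples) {
            double d = v - m;   // difference from the average
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / samples.length);
    }
}
```

A flat signal gives a result of zero no matter what its constant value is, while a signal that swings around its average gives the size of the swing.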

    All of which doesn't really answer your questions very directly, but it's a complex subject, and I think at this point there are too many unknowns about what sort of signal you're dealing with - it's hard to say how much of this theory really applies.
     
    Dan Bizman
    Ranch Hand
    Posts: 387
    Argh! What was working beautifully yesterday, no longer is. I think there's more noise on the line today or something, but it's no longer consistent.

    Originally posted by Jim Yingst:
    But when you say "level", are you saying that all the bytes in the stream exceed some threshold value? Or that they are oscillating about a mean of zero, but the amplitude exceeds some threshold?



    To tell you the truth, I'm not 100% certain. I've never worked with sound before so this is all a bit confusing. Here's what I'm doing:

    1. The TAPI system sends me 1 second's worth of bytes

    (Note: MSFT says the format of waveIn's stream is always the same, so I use that default)

    2. I convert the bytes to float samples via Tritonus' FloatSampleBuffer.convertByteToFloat(...)

    3. Grab the average of: the sum of the square roots of the samples in that full 1 second



    I then check #3 (see above) against my threshold value. But I'm not sure I'm doing this all correctly. After #3, I'm getting values of approximately 0.49f to 0.54f. Is this correct? Am I doing something wrong?
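For reference, step 2 could be done by hand along these lines. This sketch assumes 16-bit signed little-endian mono PCM, which is only a guess; the actual waveIn default format should be checked. It is not the Tritonus implementation, and the names are invented.

```java
// Hypothetical decoder. Assumption: 16-bit signed little-endian mono PCM.
final class PcmDecode {

    /** Converts raw PCM bytes to floats in the range [-1.0, 1.0). */
    static float[] pcm16LeToFloat(byte[] raw) {
        int n = raw.length / 2;            // a trailing odd byte is ignored
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            int lo = raw[2 * i] & 0xFF;    // low byte, treated as unsigned
            int hi = raw[2 * i + 1];       // high byte, keeps its sign
            short sample = (short) ((hi << 8) | lo);
            out[i] = sample / 32768f;
        }
        return out;
    }
}
```

If the decoded samples really are centered on zero, roughly half of them will be negative, and Math.sqrt returns NaN for negative input. So an average of square roots that comes out as a clean 0.49 to 0.54 may be a hint that the samples are not centered on zero, or that the format assumption is off.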
     
    Jim Yingst
    Wanderer
    Posts: 18671
    That looks reasonable. However, it's not certain that the signal has been normalized to zero. For example, you could have a flat signal with a value around .49-.53, and that would give you this result. We really want to know how much the waves are moving up and down - the difference from the average. So it would be helpful to calculate the average of these signals too, so you can subtract it out:

    From statistics, what we're ultimately measuring here is the standard deviation of the signal - how big its variations are. This uses the identity

    meanOfSquares = squareOfMean + squareOfStandardDeviation

    or

    standardDeviation = sqrt( meanOfSquares - squareOfMean )

    where meanOfSquares = sumOfSquares / length and squareOfMean = (sum / length)^2.

    It may also be worthwhile to just print out the value of sum / length, so you know what the average is. If it's consistently zero, or very close to it, then that makes things easier for you. But if it's nonzero, you probably need to pay attention to its value.
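A one-pass version of that standard deviation calculation could be sketched like this. It is not the code originally posted in this thread; the names are invented, and the clamp to zero is an added precaution against rounding error.

```java
// Hypothetical one-pass standard deviation:
// sqrt( sumOfSquares / length - (sum / length)^2 )
final class SignalStats {

    static double standardDeviation(float[] samples) {
        int length = samples.length;
        if (length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (float v : samples) {
            sum += v;
            sumOfSquares += (double) v * v;
        }
        double mean = sum / length;
        double variance = sumOfSquares / length - mean * mean;
        // Rounding can push a near-zero variance slightly negative,
        // and Math.sqrt of a negative number is NaN, so clamp at zero.
        return Math.sqrt(Math.max(0.0, variance));
    }
}
```

A flat signal at 0.5 returns 0 here, and a signal alternating between 0.25 and 0.75 returns 0.25, so the result reflects the size of the variations and not the average level.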

    I should also note that this may be more computation than is really necessary, as your earlier idea of checking whether the signal values exceed some threshold is also a viable alternative, with much less multiplication along the way. To screen out the occasional random spike (a click of some sort?) you could measure how many times the threshold is exceeded in a given time frame. If the clicks are short in duration ( < .2 s perhaps?) while voice is not, that would be a way to ignore clicks. It really depends on the sort of noise you have.
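One way to act on the duration idea is to compute a level for each short window and look for the longest unbroken run above the threshold. This is a sketch under assumptions: the 20 ms window size in the comments, the threshold, and all names are hypothetical, and the per-window levels are assumed to be non-negative (for example absolute or RMS values).

```java
// Hypothetical click filter: a signal counts as voice only if it stays
// above the threshold for enough consecutive windows.
final class ClickFilter {

    /** Length of the longest run of consecutive levels above the threshold. */
    static int longestRunAbove(float[] levels, float threshold) {
        int best = 0;
        int run = 0;
        for (float level : levels) {
            if (level > threshold) {
                run++;
                best = Math.max(best, run);
            } else {
                run = 0; // a quiet window ends the current run
            }
        }
        return best;
    }

    /**
     * With 20 ms windows (an assumption), minRun = 10 corresponds to the
     * 0.2 s figure mentioned above.
     */
    static boolean looksLikeVoice(float[] levels, float threshold, int minRun) {
        return longestRunAbove(levels, threshold) >= minRun;
    }
}
```

Isolated clicks produce runs of one or two windows and are ignored, while sustained sound passes. Whether 0.2 s is the right cutoff depends on the noise actually seen on the line.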
     
    Dan Bizman
    Ranch Hand
    Posts: 387

    Originally posted by Jim Yingst:



    When I use your code I get NaN. Perhaps the code is missing a Math.abs(..) call? Would it still do what you originally intended then?
     
    Jim Yingst
    Wanderer
    Posts: 18671
    Right, careless of me, missed a 1/length factor in one term:

    Various versions of this formula can be found under standard deviation.
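To illustrate how a missing 1/length factor can lead to NaN, here is a hypothetical reconstruction of that kind of slip next to a corrected form. The originally posted code is not reproduced here, so which term was affected is a guess, and the names are invented.

```java
// Hypothetical illustration of the NaN problem; not the original posted code.
final class NanDemo {

    /** Guessed form of the slip: the squared-sum term is divided by length only once. */
    static double missingFactor(float[] samples) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (float v : samples) {
            sum += v;
            sumOfSquares += (double) v * v;
        }
        int length = samples.length;
        // The second term is too large by a factor of length, so the
        // difference can go negative and Math.sqrt then returns NaN.
        return Math.sqrt(sumOfSquares / length - sum * sum / length);
    }

    /** Both terms scaled consistently: mean of squares minus square of mean. */
    static double corrected(float[] samples) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (float v : samples) {
            sum += v;
            sumOfSquares += (double) v * v;
        }
        int length = samples.length;
        double mean = sum / length;
        return Math.sqrt(sumOfSquares / length - mean * mean);
    }
}
```

For four samples of 0.5, the first form takes the square root of 0.25 - 1.0 and yields NaN, while the second yields 0. Wrapping the first in Math.abs would hide the NaN but report about 0.87 for a perfectly flat signal, so it would not measure the variation.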
    [ October 23, 2006: Message edited by: Jim Yingst ]
     
    Dan Bizman
    Ranch Hand
    Posts: 387

    Originally posted by Jim Yingst:
    Right, careless of me, missed a 1/length factor in one term:

    Various versions of this formula can be found under standard deviation.



    Thanks, that appears to work VERY well, but I'll need to try it on different days to see how it holds up with different amounts of line noise.

    Just to make sure the parentheses are correct, the final return statement's doing this?



    Is that correct?
     
    Jim Yingst
    Wanderer
    Posts: 18671
    No, I've corrected the parentheses in my code above. It's equivalent to:
     