Proper Silence Detection - having problem

Dan Bizman
Ranch Hand

Joined: Feb 25, 2003
Posts: 387
In my app, I listen on a voice modem line for when the other end's microphone actually starts sending data. The problem is, the line puts out occasional sounds/blips every now and then even when no one's connected to it.

Currently, my code checks the bytes received against a threshold. Since occasional blips occur, I thought I could check whether the signal is above threshold twice in a row, but that happens even when no one is connected to the mic. I also thought I could count how many times it has gone over threshold, but that's no good either (it sometimes happens multiple times with no one on the line), nor is checking how far over threshold it has gone.

Help! I'm at a complete loss as to what to do. How do I properly check for "silence"? (i.e. know they're really able to speak/hear and it's not just blips on the line?)
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
What API are you using here? Is there any documentation discussing these spurious signals?

I'm guessing that the bytes you receive represent some sort of signal amplitude. You say you're applying a threshold somehow. I can think of at least three ways to do this:
  • Ignore any signal that does not exceed threshold amplitude
  • Ignore any signal that does not exceed a threshold of elapsed time
  • Ignore any signal that does not exceed some combination of amplitude and elapsed time. The simplest such combination would be to add the absolute values of the bytes of the received signal. A more correct approach may be to add the squares of the amplitudes, as for most signals, energy is proportional to the square of amplitude.

Do any of those techniques sound like something potentially useful which you haven't yet tried?
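The two amplitude-and-time combinations mentioned above can be sketched as follows. This is only an illustration, not code from the thread: the class name `FrameEnergy`, the method names, and the use of a `float[]` frame of samples are all assumptions.

```java
// Hypothetical helper (not from the thread): two ways to summarise one frame of samples.
final class FrameEnergy {
    private FrameEnergy() {}

    /** Sum of absolute sample values: the simplest combination of amplitude and duration. */
    static double sumOfAbs(float[] samples) {
        double total = 0.0;
        for (float s : samples) {
            total += Math.abs(s);
        }
        return total;
    }

    /** Sum of squared sample values: proportional to signal energy for most signals. */
    static double sumOfSquares(float[] samples) {
        double total = 0.0;
        for (float s : samples) {
            total += (double) s * s;
        }
        return total;
    }
}
```

Either total would then be compared against a threshold chosen by experiment. Both grow with the frame length, so the threshold only makes sense for a fixed frame size.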
    [ October 22, 2006: Message edited by: Jim Yingst ]

    "I'm not back." - Bill Harding, Twister
    Dan Bizman
    Originally posted by Jim Yingst:
    A more correct approach may be to add the squares of the amplitudes, as for most signals, energy is proportional to the square of amplitude.


    I think that might have some potential to it! I saw some code sample online that was doing something similar. Can you give me a little more info on how that works/why? Or point me to a resource talking about that?

    Thanks!
    Dan Bizman
    Unfortunately, it's never going to be an exact science, so there's some guessing involved, but using the square roots of the samples seems to be much more accurate! Thanks!

    Now what I do is convert the byte array of data to a float array of samples and then check the average value of the square roots of those samples. There are still bumps, but they're less frequent and much less pronounced. More importantly, when the caller actually picks up, the value is now consistently above a certain level (save for the very rare exception). Previously, there was no level that it stayed above consistently. So now I can fairly reliably check for 3 or 4 readings in a row above that level, and 9 times out of 10 that happens at the right time.
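The "3 or 4 readings in a row" check could look roughly like the sketch below. The class name `ConsecutiveFrameDetector`, its parameters, and the idea of feeding it one measurement per buffer are assumptions made for illustration; the thread does not show the actual code.

```java
// Hypothetical sketch: report "connected" only after several readings in a row exceed a level.
final class ConsecutiveFrameDetector {
    private final double level;        // level each reading must exceed (chosen by experiment)
    private final int requiredFrames;  // how many readings in a row are needed
    private int framesInARow = 0;

    ConsecutiveFrameDetector(double level, int requiredFrames) {
        this.level = level;
        this.requiredFrames = requiredFrames;
    }

    /** Feed one per-buffer measurement; returns true once enough readings in a row exceed the level. */
    boolean accept(double frameValue) {
        if (frameValue > level) {
            framesInARow++;
        } else {
            framesInARow = 0; // any quiet reading resets the count
        }
        return framesInARow >= requiredFrames;
    }
}
```

A single blip then cannot trigger detection on its own, at the cost of a delay of `requiredFrames` buffers before a real caller is recognised.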

    Why is it that the square roots are more accurate?

    Anyways, thanks again!
    [ October 22, 2006: Message edited by: Dan Bizman ]
    Jim Yingst
    [Dan]: now, when the caller actually picks up, it's consistently above a certain level (save for the very rare time)

    OK, that's promising. But when you say "level", are you saying that all the bytes in the stream exceed some threshold value? Or that they are oscillating about a mean of zero, but the amplitude exceeds some threshold? I realize I've inserted some unstated assumptions into the second scenario, but they're fairly common in reality, so I'm making some guesses. If in fact all the bytes are uniformly above some threshold value, then I confess I don't really know what sort of signal you're talking about. Maybe some more info on what API you're using would be useful.

    As for the business about the sum of the squares of the bytes being significant - well, for starters, that assumes the common case of analyzing a signal which has been normalized to an average of zero. That is, the numeric values should average to zero, oscillating from positive to negative in equal amounts of each (on average). If that's not the case, maybe the first thing you should be looking at is: what is the average value of the bytes? Maybe just computing the average from the last n seconds would be good. Once you know the average, subtract it from each value to normalize it, and then start squaring the differences and adding them together. As a last step you take the square root of the sum. This is discussed in greater detail here.
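The normalize-then-square steps described above can be sketched as follows. The class and method names are hypothetical. One addition that is not in the post: the sum is divided by the number of samples before the square root is taken, so that the result does not depend on the buffer length.

```java
// Hypothetical sketch: how far the samples swing around their own average.
final class CenteredRms {
    private CenteredRms() {}

    /** Average value of the samples; 0 for an empty buffer. */
    static double mean(float[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (float s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Subtract the average, square the differences, average them, take the square root. */
    static double rmsAboutMean(float[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double mean = mean(samples);
        double sumOfSquaredDiffs = 0.0;
        for (float s : samples) {
            double diff = s - mean;
            sumOfSquaredDiffs += diff * diff;
        }
        // Dividing by the length is an addition here (assumption), see the note above.
        return Math.sqrt(sumOfSquaredDiffs / samples.length);
    }
}
```

A flat signal gives 0 no matter what its constant value is, which is the point of subtracting the average first.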

    All of which doesn't really answer your questions very directly, but it's a complex subject, and I think at this point there are too many unknowns about what sort of signal you're dealing with - it's hard to say how much of this theory really applies.
    Dan Bizman
    Argh! What was working beautifully yesterday, no longer is. I think there's more noise on the line today or something, but it's no longer consistent.

    Originally posted by Jim Yingst:
    But when you say "level", are you saying that all the bytes in the stream exceed some threshold value? Or that they are oscillating about a mean of zero, but the amplitude exceeds some threshold?


    To tell you the truth, I'm not 100% certain. I've never worked with sound before so this is all a bit confusing. Here's what I'm doing:

    1. The TAPI system sends me 1 second's worth of bytes

    (Note: MSFT says the format of waveIn's stream is always the same, so I use that default)

    2. I convert the bytes to float samples via Tritonus' FloatSampleBuffer.convertByteToFloat(...)

    3. Compute the average of the square roots of the samples over that full second



    I then check #3 (see above) against my threshold value. But I'm not sure I'm doing this all correctly. After #3, I'm getting values of approximately 0.49f to 0.54f. Is this correct? Am I doing something wrong?
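For reference, a byte-to-float conversion can also be written by hand. The sketch below assumes 16-bit signed little-endian mono PCM, which is purely an assumption: the thread never states the actual waveIn format, and this is not how Tritonus is implemented. The class and method names are hypothetical.

```java
// Hypothetical sketch: convert raw PCM bytes to floats in the range [-1, 1).
// ASSUMPTION: 16-bit signed little-endian mono samples; the real format may differ.
final class PcmToFloat {
    private PcmToFloat() {}

    static float[] fromSigned16LittleEndian(byte[] pcm) {
        int count = pcm.length / 2;            // a trailing odd byte is ignored
        float[] samples = new float[count];
        for (int i = 0; i < count; i++) {
            int low = pcm[2 * i] & 0xFF;       // low byte, treated as unsigned
            int high = pcm[2 * i + 1];         // high byte keeps its sign
            int value = (high << 8) | low;     // -32768 .. 32767
            samples[i] = value / 32768f;
        }
        return samples;
    }
}
```

Under this convention roughly half the samples of an audio signal are negative, and `Math.sqrt` of a negative number is NaN, which is one reason to square samples rather than take their square roots.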
    Jim Yingst
    That looks reasonable. However, it's not certain that the signal has been normalized to zero. For example, you could have a flat signal with a value around .49-.53, and that would give you this result. We really want to know how much the waves are moving up and down: the difference from the average. So it would be helpful to calculate the average of these signals too, so you can subtract it out:

    From statistics, what we're ultimately measuring here is the standard deviation of the signal - how big its variations are. This uses the idea that

    meanOfSquares = squareOfMean + squareOfStandardDeviation

    or

    standardDeviation = sqrt( meanOfSquares - squareOfMean )

    where meanOfSquares = sumOfSquares / length and squareOfMean = (sum / length)^2.

    It may also be worthwhile to just print out the value of sum / length, so you know what the average is. If it's consistently zero, or very close to it, then that makes things easier for you. But if it's nonzero, you probably need to pay attention to its value.
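A single-pass version of the standard deviation calculation might look like the sketch below. It is not the code originally posted in this thread (that code is not preserved here); the class name `SignalStats` and the method name are assumptions. It uses the identity that the variance equals the mean of the squares minus the square of the mean.

```java
// Hypothetical sketch: standard deviation of a buffer from running sums, in one pass.
final class SignalStats {
    private SignalStats() {}

    static double standardDeviation(float[] samples) {
        int length = samples.length;
        if (length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (float s : samples) {
            sum += s;
            sumOfSquares += (double) s * s;
        }
        double mean = sum / length;
        double variance = sumOfSquares / length - mean * mean;
        // Rounding can push a tiny variance slightly below zero; clamp so sqrt never returns NaN.
        return Math.sqrt(Math.max(0.0, variance));
    }
}
```

Printing `sum / length` alongside the result shows whether the signal is centered on zero, as suggested above.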

    I should also note that this may be more computation than is really necessary, as your earlier idea of checking whether the signal values exceed some threshold is also a viable alternative, with much less multiplication along the way. To screen out the occasional random spike (a click of some sort?) you could measure how many times the threshold is exceeded in a given time frame. If the clicks are short in duration ( < .2 s perhaps?) while voice is not, that would be a way to ignore clicks. It really depends on the sort of noise you have.
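Counting how often the threshold is exceeded within one buffer can be sketched like this. The names `ThresholdCounter`, `countAbove` and `fractionAbove` are hypothetical, and the samples are assumed to be already centered on zero.

```java
// Hypothetical sketch: how much of a buffer lies above an amplitude threshold.
final class ThresholdCounter {
    private ThresholdCounter() {}

    /** Number of samples whose magnitude exceeds the threshold. */
    static int countAbove(float[] samples, float threshold) {
        int count = 0;
        for (float s : samples) {
            if (Math.abs(s) > threshold) {
                count++;
            }
        }
        return count;
    }

    /** Fraction (0..1) of samples whose magnitude exceeds the threshold; 0 for an empty buffer. */
    static double fractionAbove(float[] samples, float threshold) {
        if (samples.length == 0) {
            return 0.0;
        }
        return (double) countAbove(samples, threshold) / samples.length;
    }
}
```

In a one-second buffer, a click lasting under 0.2 s can contribute at most about 20% of the samples, so a required fraction can separate short clicks from sustained sound. Even loud speech passes through zero many times per second, so the required fraction has to be tuned against real recordings.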
    Dan Bizman
    Originally posted by Jim Yingst:



    When I use your code I get NaN. Perhaps the code's missing a Math.abs(..) call? Would it still be doing your original intention then?
    Jim Yingst
    Right, careless of me, missed a 1/length factor in one term:

    Various versions of this formula can be found under standard deviation.
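To illustrate how a missing 1/length factor can lead to NaN, and why a Math.abs(..) call would hide the problem rather than fix it, here is a sketch. It shows one possible way of dropping the factor, not necessarily the one in the code posted earlier, and all names are hypothetical.

```java
// Hypothetical sketch of the bug: the sum is squared without first dividing by the length.
final class MissingFactorDemo {
    private MissingFactorDemo() {}

    /** Mean of squares minus the square of the UNSCALED sum (deliberately wrong). Assumes a non-empty buffer. */
    private static double wrongRadicand(float[] samples) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (float s : samples) {
            sum += s;
            sumOfSquares += (double) s * s;
        }
        return sumOfSquares / samples.length - sum * sum;
    }

    /** The radicand can be negative, and Math.sqrt of a negative number is NaN. */
    static double withMissingFactor(float[] samples) {
        return Math.sqrt(wrongRadicand(samples));
    }

    /** Math.abs removes the NaN, but the result is no longer a standard deviation. */
    static double withAbsWorkaround(float[] samples) {
        return Math.sqrt(Math.abs(wrongRadicand(samples)));
    }
}
```

For the samples 1 and 3 the true standard deviation is 1. The buggy radicand is 5 - 16 = -11, so the first method returns NaN and the second returns the square root of 11, about 3.32, which is simply a wrong answer.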
    [ October 23, 2006: Message edited by: Jim Yingst ]
    Dan Bizman
    Originally posted by Jim Yingst:
    Right, careless of me, missed a 1/length factor in one term:

    Various versions of this formula can be found under standard deviation.


    Thanks, that appears to work VERY well, but I'll need to try it on different days to see how it copes with differences in line noise.

    Just to make sure the parentheses are correct, the final return statement's doing this?



    Is that correct?
    Jim Yingst
    No, I've corrected the parentheses in my code above. It's equivalent to:
     
     