JavaRanch » Java Forums » Java » Java in General
Recognize dial tone in audio file(FFT)

Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Hi all,

I have no knowledge at all about the Java Sound package. What I need to do is implement a function that can load audio from a microphone or an audio file, analyze it, and then determine whether there is a dial tone in the stream. In my country (Italy) the dial tone is a 425 Hz tone that is not continuous: 200 ms tone, 200 ms silence, 600 ms tone, 1000 ms silence, and then it loops.
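A synthetic version of that cadence is handy for testing a detector before the modem is connected. The sketch below is a hypothetical helper (the class name `DialToneSynth` and its methods are made up for illustration); it assumes samples normalized to [-1, 1] and a pure 425 Hz sine during the "on" parts of the cycle.

```java
/** Hypothetical test helper: synthesizes the Italian dial-tone cadence. */
final class DialToneSynth {
    static final double TONE_HZ = 425.0;
    // Cadence in milliseconds: tone, silence, tone, silence (then it repeats).
    static final int[] CADENCE_MS = {200, 200, 600, 1000};

    /** True if the cadence is "tone on" at time tMs. */
    static boolean toneOn(double tMs) {
        int period = 0;
        for (int d : CADENCE_MS) period += d;          // 2000 ms
        double t = tMs % period;
        // Even-indexed segments are tone, odd-indexed segments are silence.
        for (int i = 0; i < CADENCE_MS.length; i++) {
            if (t < CADENCE_MS[i]) return i % 2 == 0;
            t -= CADENCE_MS[i];
        }
        return false;
    }

    /** durationMs of signal at sampleRate; amplitude is the peak value of the sine. */
    static double[] generate(int sampleRate, int durationMs, double amplitude) {
        int n = (int) ((long) sampleRate * durationMs / 1000);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double tMs = 1000.0 * i / sampleRate;
            if (toneOn(tMs)) {
                out[i] = amplitude * Math.sin(2 * Math.PI * TONE_HZ * i / sampleRate);
            }
        }
        return out;
    }
}
```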

My idea is this: start the microphone, record for 2000 ms, load the recorded data, run an FFT and search for that frequency, returning true or false.

However, I don't know how to code that: I've recorded the audio to a WAV file and loaded it into a byte[] array. Now I don't know how to run an FFT on that data or how to scan the FFT result for this tone.
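One way to get from the WAV bytes to numbers an FFT can work with, sketched with the standard `javax.sound.sampled` API. It assumes mono, signed 16-bit PCM (as in the 8 kHz/16-bit file posted later in the thread); `PcmReader`, `decode16` and `readWav` are illustrative names, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper; assumes mono, signed 16-bit PCM. */
final class PcmReader {
    /** Converts signed 16-bit PCM bytes to doubles in [-1, 1). */
    static double[] decode16(byte[] pcm, boolean bigEndian) {
        double[] out = new double[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i + (bigEndian ? 1 : 0)] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[2 * i + (bigEndian ? 0 : 1)];        // high byte keeps its sign
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    /** Reads a mono, signed 16-bit WAV file into normalized samples. */
    static double[] readWav(File file) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat f = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                    || f.getSampleSizeInBits() != 16 || f.getChannels() != 1) {
                throw new UnsupportedAudioFileException("expected mono 16-bit signed PCM, got " + f);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];                      // a multiple of the 2-byte frame size
            int n;
            while ((n = in.read(buf)) > 0) bytes.write(buf, 0, n);
            return decode16(bytes.toByteArray(), f.isBigEndian());
        }
    }
}
```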

Can anyone help me?

Thank you!
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

Hi Alberto,

I'm not sure exactly what the problem is. Your approach sounds reasonable.

Are you asking how to call the FFT function, which variant to use, how to interpret the results?

The FFT is a fairly processor-intensive operation, O(n log n), so choosing the sample rate and the length of the FFT is significant.

I really don't know your application, but if you're looking only for those particular frequencies and expect a fairly quiet background, you're probably fine with a fairly low sampling rate of 2 or 3 kHz.

The FFT in general likes power-of-2-sized chunks; most implementations demand it. The longer the FFT, the better the frequency resolution, especially at low frequencies, but the harder it is to discern short sounds.
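The trade-off can be put in numbers: bin spacing is sampleRate / fftLength, and one block lasts fftLength / sampleRate. A small hypothetical helper (names invented for this example):

```java
/** Hypothetical helper for choosing an FFT length. */
final class FftSizing {
    /** Smallest power of two >= n (for n >= 1). */
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    /** Spacing between FFT bins in Hz. */
    static double binSpacingHz(double sampleRate, int fftLength) {
        return sampleRate / fftLength;
    }

    /** Index of the bin whose center is closest to freqHz. */
    static int nearestBin(double freqHz, double sampleRate, int fftLength) {
        return (int) Math.round(freqHz * fftLength / sampleRate);
    }

    /** Duration of one FFT block in seconds. */
    static double blockSeconds(double sampleRate, int fftLength) {
        return fftLength / sampleRate;
    }
}
```

At 8 kHz, for instance, a 1024-point FFT gives bins 7.8125 Hz apart and covers 128 ms, while a 4096-point FFT gives bins about 1.95 Hz apart but spans 512 ms, longer than the 200 ms tone burst.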

The output of the FFT is a bit strange but not bad once you figure out how to interpret it.
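To make the output less strange: for N real input samples only bins 0..N/2 are unique, bin k corresponds to k * sampleRate / N Hz, and the magnitude sqrt(re^2 + im^2) is what you'd compare across bins. Below is a textbook iterative radix-2 FFT to show that mapping; it is a sketch, not JTransforms or any other library's API, and `SimpleFft` is an illustrative name.

```java
/** Textbook radix-2 FFT sketch (not a library API). */
final class SimpleFft {
    /** In-place forward FFT; the length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (Integer.bitCount(n) != 1 || im.length != n)
            throw new IllegalArgumentException("length must be a power of two");
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterflies with twiddle factors e^(-2*pi*i*k/len).
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr;        im[a] += xi;
                }
            }
        }
    }

    /** Magnitudes of bins 0..n/2 of a real signal; bin k is k * sampleRate / n Hz. */
    static double[] magnitudes(double[] signal) {
        double[] re = signal.clone(), im = new double[signal.length];
        fft(re, im);
        double[] mag = new double[signal.length / 2 + 1];
        for (int k = 0; k < mag.length; k++) mag[k] = Math.hypot(re[k], im[k]);
        return mag;
    }

    /** Index of the largest bin, skipping DC (bin 0). */
    static int peakBin(double[] mag) {
        int best = 1;
        for (int k = 2; k < mag.length; k++) if (mag[k] > mag[best]) best = k;
        return best;
    }
}
```

For a 425 Hz tone sampled at 8 kHz with N = 1024, the peak lands in bin 54, i.e. 54 * 8000 / 1024 = 421.875 Hz, the bin center nearest to 425 Hz.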

Perhaps if you give us more detail on what you've done so far and where you're having difficulties, we can be more specific. Which FFT package are you using, and how are you calling it?

Joe

It's not what your program can do, it's what your users do with the program.
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

One more thing, if you're stuck with the FFT basics, I did a google search for "Java FFT tutorial" and got lots of hits.

One of the early ones is http://www.developer.com/java/other/article.php/3457251/Fun-with-Java-Understanding-the-Fast-Fourier-Transform-FFT-Algorithm.htm which might be helpful
Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Hi Joe and thanks for your answer.

So, my application is basically a line listener that collects data about phone costs: a voice modem has its speaker output connected to the line-in of the sound card, and the Java program must listen to it periodically. When listening to the line, it has to determine whether the line is free, so it needs to intercept the 425 Hz dial tone, which in Italy is not continuous: 0.2 s LOUD, 0.2 s SILENCE, 0.6 s LOUD, 1 s SILENCE, and loop. So it needs to open the audio stream, record 2 s of audio, apply a Discrete Fourier Transform to the sampled data, and finally return TRUE or FALSE depending on whether the line is considered free or busy.

Someone on the web has pointed me to the Goertzel filter, which evaluates only a few selected frequencies and is more efficient. It doesn't give any other spectrum information, but for detecting the dial tone it could be very good.
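The Goertzel idea in a few lines: a second-order recursion that yields the squared DFT magnitude at one chosen frequency, at O(N) cost per frequency. This is a minimal sketch of the standard algorithm (not the implementation Alberto found), assuming the samples are already converted to doubles; `GoertzelPower` is an illustrative name.

```java
/** Minimal Goertzel sketch: power of one frequency over a block of samples. */
final class GoertzelPower {
    /** Squared DFT magnitude of x[from .. from+len) at targetHz. */
    static double power(double[] x, int from, int len, double targetHz, double sampleRate) {
        double w = 2 * Math.PI * targetHz / sampleRate;
        double coeff = 2 * Math.cos(w);
        double s1 = 0, s2 = 0;                  // s[n-1] and s[n-2]
        for (int i = from; i < from + len; i++) {
            double s0 = x[i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        // |X(w)|^2 = s1^2 + s2^2 - 2cos(w) * s1 * s2
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }
}
```

Because `targetHz` is used directly rather than rounded to a bin index, the result is the DTFT magnitude at exactly that frequency. Comparing the power at 425 Hz with the power elsewhere, block by block, is one way to turn this into a yes/no per block.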

So my questions are:
1) Should I record the audio from the mic, save it and load it, or work live on the audio stream from the mic?
2) Which FFT library should I use?
3) How do I tell from the DFT result whether the dial tone has been detected?
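On question 1, working live avoids the save-and-reload step. A sketch with `javax.sound.sampled.TargetDataLine`, assuming the modem audio arrives on the default capture device and that 8 kHz, 16-bit, mono, signed, little-endian PCM is supported there (not every sound card offers every format); `LineCapture` is a hypothetical name.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical live-capture helper; format support depends on the sound card. */
final class LineCapture {
    /** 8 kHz, 16-bit, mono, signed, little-endian PCM. */
    static final AudioFormat FORMAT = new AudioFormat(8000f, 16, 1, true, false);

    /** Records millis of audio (whole frames) from the default capture line; blocks. */
    static byte[] record(int millis) throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, FORMAT);
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("no capture line supports " + FORMAT);
        }
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        byte[] data = new byte[bytesFor(millis)];
        line.open(FORMAT);
        try {
            line.start();
            int read = 0;
            while (read < data.length) {
                int n = line.read(data, read, data.length - read); // blocks until data arrives
                if (n <= 0) break;                                  // line stopped or closed
                read += n;
            }
        } finally {
            line.stop();
            line.close();
        }
        return data;
    }

    /** Bytes needed for millis of audio in FORMAT, rounded to whole frames. */
    static int bytesFor(int millis) {
        int frames = Math.round(FORMAT.getFrameRate() * millis / 1000f);
        return frames * FORMAT.getFrameSize();
    }
}
```

The returned bytes are little-endian, so they would be decoded with `bigEndian = false` before any FFT or Goertzel step.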

Thanks a lot for your help!
Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Another thing: since I only need to detect a 425 Hz frequency, which sample rate should I use? You say 2-3 kHz, so basically I have to apply a low-pass filter to keep only the signal below 2 kHz; this way the DFT or Goertzel filter should be more efficient and faster. In that case, how do I apply that filter?
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

Hi Alberto,

First I want to make clear that I am not a telephony expert by any means, and while I do a fair amount of signal processing and analysis in my current job, I am unfamiliar with how the phone people do things.

Recognizing dial tones (and the DTMF tones) is a very simple DSP problem, but still a significant programming task.

I would approach this problem experimentally, working from very simple algorithms to understand what the signal looks like and the minimum it takes to recognize the pattern you want. Then move on to more sophisticated (read: complicated) techniques to increase accuracy.

So here are a few thoughts to get you started:

I've been using JTransforms (https://sites.google.com/site/piotrwendykier/software/jtransforms) to do FFTs in Java and have been very pleased. If you google, there are tutorials available.

Variations of the FFT such as the DCT and the Goertzel filter are optimizations you may need to reduce the load on the CPU, but I don't think they will simplify the problem very much.

As I see it, you have two issues: one, identify the on-off pattern of the dial tone; and two, verify that the tone is in the proper frequency range. I suggest you not assume that whatever transform you use is going to return a clean spike at exactly 425 Hz.

The low-pass filter you asked about is an anti-aliasing technique, and I doubt you'll need it to identify such a simple, high-SNR pattern. In theory you could identify 425 Hz with any sample rate above 850 Hz (twice the tone frequency). By going to 2-3 kHz you reduce the aliasing, and the high SNR (signal-to-noise ratio) of a dial tone is going to make it a fairly minor issue anyway.
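The folding that anti-aliasing guards against is easy to compute: a real tone at f sampled at fs shows up at f mod fs, mirrored into the range 0..fs/2. A hypothetical helper to illustrate (it models only ideal sampling of a single tone, with no filter):

```java
/** Hypothetical illustration of aliasing for a single real tone. */
final class Aliasing {
    /** Apparent frequency, in 0..sampleRate/2, of a tone at freqHz after sampling. */
    static double aliasedHz(double freqHz, double sampleRate) {
        double folded = freqHz % sampleRate;                              // fold into [0, fs)
        return folded <= sampleRate / 2 ? folded : sampleRate - folded;  // mirror into [0, fs/2]
    }
}
```

At a 2 kHz sample rate, for example, a 1575 Hz or 2425 Hz component would be indistinguishable from 425 Hz; at 8 kHz, 425 Hz is far below the 4 kHz limit.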

While the pattern you are looking for is exactly 2 s long, I would suggest acquiring a bit more data, so you aren't relying on the recording starting exactly at the beginning of the pattern.

If you could attach a WAV or MP3 file of a dial tone, a dead line and someone talking (say 4 s each), I can easily make some plots of what you can expect to see. If you do that, please record them at 8 kHz, which is standard for low-quality telephony, so I can show the effect of lowering the sample rate.

Joe
Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Ok, it seems I really can't get it...


I've been using a Goertzel algorithm (found on the web) to search for frequencies in a WAV file. When running it on a 2-second WAV recorded at 44100 Hz containing a 3000 Hz sound wave, the output of the script is:


When using a 44100 Hz file containing a 500 Hz sound wave, the output is:



Here's my attempt:



This is the test function


Other class functions



What am I missing?
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

You have a bit too much code there for me to digest with the time available. So allow me to make general comments rather than debug it.

What are the arguments to that Goertzel function you are using?

Please note that running that on 5000 frequencies is not going to be more efficient or effective than doing an FFT, but it's not going to hurt the algorithm either.
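A rough count shows why, assuming the 2 s, 44.1 kHz recordings from the thread, counting one Goertzel loop iteration or one FFT sample-per-stage step as a unit of work, and ignoring constant factors:

$$
\begin{aligned}
N &= 2\ \text{s} \times 44100\ \text{Hz} = 88200 \\
\text{Goertzel at } M = 5000 \text{ frequencies:}\quad M N &= 5000 \times 88200 = 4.41 \times 10^{8} \\
\text{one FFT, padded to } N' = 2^{17} = 131072:\quad N' \log_2 N' &= 131072 \times 17 \approx 2.2 \times 10^{6}
\end{aligned}
$$

So the 5000-frequency sweep is on the order of 200 times more work. As a rule of thumb, Goertzel only pays off when you need a handful of frequencies, roughly fewer than log2 N of them.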

I'm not sure, but I think what you're missing is the relationship between the time and frequency domains.

If you take a Fourier transform over your whole input signal, you get one set of frequency/phase values: the ones it takes to recreate the whole waveform. Capturing the on/off nature of your dial tone requires a lot of power at frequencies other than the tone itself.

I would suggest you work with multiple transforms of short segments of data; something on the order of 0.1 s is probably good enough. Then when you run the transform (whatever variant), you'll get some segments that are a pure 425 Hz tone, and the segment number will let you relate each one to the time at which it occurred.
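A sketch of that framing, assuming 0.1 s frames and using a single-bin DFT (correlating each frame with a 425 Hz cosine and sine) rather than a full FFT. The `minFraction` threshold and the class name `ToneFrames` are assumptions for illustration; a real detector would probably also want an absolute level threshold, so that faint hum on an otherwise quiet line isn't counted as tone.

```java
/** Hypothetical per-frame tone detector (single-bin DFT per frame). */
final class ToneFrames {
    /**
     * Splits x into consecutive frames of frameLen samples and marks a frame "on"
     * when at least minFraction of its energy sits at targetHz.
     */
    static boolean[] detect(double[] x, double sampleRate, int frameLen,
                            double targetHz, double minFraction) {
        int frames = x.length / frameLen;                 // a trailing partial frame is ignored
        boolean[] on = new boolean[frames];
        double w = 2 * Math.PI * targetHz / sampleRate;
        for (int f = 0; f < frames; f++) {
            double re = 0, im = 0, energy = 0;
            for (int n = 0; n < frameLen; n++) {
                double v = x[f * frameLen + n];
                re += v * Math.cos(w * n);                // DFT at targetHz, real part
                im -= v * Math.sin(w * n);                // DFT at targetHz, imaginary part
                energy += v * v;
            }
            // For a pure tone at targetHz this ratio is close to 1; for white noise about 2/N.
            double fraction = energy > 0 ? 2 * (re * re + im * im) / (frameLen * energy) : 0;
            on[f] = fraction >= minFraction;
        }
        return on;
    }
}
```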

It would be much easier for me to explain this if I had a .wav file of the signal you're working with. If it's not too much trouble please attach one to your response. I'll see if I can come up with an example from something I have lying around.

Joe
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

I just reread my previous post and I want to clarify one point.

If you capture at 44 kHz you're going to have very good frequency resolution, but 425 Hz is still not going to be an exact spike in the frequency domain, so you will get power in some other frequencies as well, though it will be very close.

Joe
Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Joe Areeda wrote: What are the arguments to that Goertzel function you are using? [...]


First of all, thank you for all your support; it is very much appreciated. I'm not familiar with this stuff, I have only a little knowledge about it, and I'm doing my best to solve this problem.
Attached to this post is the dial tone sound wave, sampled at 8 kHz with 16-bit samples. I can't record a "conversation" right now, but the attached file was recorded the same way the program will record.

About the arguments:
Goertzel test = new Goertzel(44100,(float)k,256,false);

It takes the sample rate, the frequency to scan for, the size of the inner buffer array, and a boolean for debugging mode.
I was running it on 5000 frequencies to see whether the MAX amplitude comes out at the right frequency; in other words, I'm testing this algorithm. On a file that contains a continuous 500 Hz sound wave, I expect the highest result at 500 Hz, but I get 435! Same for a file that contains a 1000 Hz sound wave: I get about 1990 Hz.
At the moment I don't care about the on/off feature of my signal; I'm using continuous signals in test files to learn how it works. I should have said that before, sorry.
I get your idea of analyzing 0.1 s chunks (easier and faster); however, shouldn't it also work on a continuous 50 Hz waveform of 1 s?
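One way to test the algorithm separately from the file handling is to feed it a synthesized tone of known frequency and check where the sweep peaks. If the synthesized tone comes out right but the WAV does not, the byte-to-sample conversion is a likely suspect. For reference, a 256-sample block at 44100 Hz lasts about 5.8 ms and has a bin spacing of about 172 Hz, so an implementation that rounds the target frequency to the nearest bin can only report values on that coarse grid. The sketch below is hypothetical (`SweepTest` is not Alberto's class) and uses a 0.1 s block with the unrounded Goertzel coefficient.

```java
/** Hypothetical harness: sweep a synthesized tone to check a Goertzel detector. */
final class SweepTest {
    /** Goertzel power of x[0..n) at freqHz (unrounded coefficient). */
    static double power(double[] x, int n, double freqHz, double sampleRate) {
        double coeff = 2 * Math.cos(2 * Math.PI * freqHz / sampleRate);
        double s1 = 0, s2 = 0;
        for (int i = 0; i < n; i++) {
            double s0 = x[i] + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** A pure sine of known frequency, for testing. */
    static double[] tone(double freqHz, double sampleRate, int n) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = Math.sin(2 * Math.PI * freqHz * i / sampleRate);
        return x;
    }

    /** Sweeps loHz..hiHz in stepHz increments; returns the frequency with the most power. */
    static double strongest(double[] x, int n, double sampleRate,
                            double loHz, double hiHz, double stepHz) {
        double bestHz = loHz, bestPower = -1;
        for (double f = loHz; f <= hiHz; f += stepHz) {
            double p = power(x, n, f, sampleRate);
            if (p > bestPower) { bestPower = p; bestHz = f; }
        }
        return bestHz;
    }
}
```

With a 4410-sample (0.1 s) block at 44100 Hz, sweeping 100..5000 Hz in 1 Hz steps should put the peak at the synthesized frequency.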

Thank you for all your interest again!


P.S. I can't attach a WAV file directly; here's the link: dial_tone_8000_16bit.wav (will be available for a week)
Alberto Geniola
Greenhorn

Joined: Dec 23, 2012
Posts: 7
Joe Areeda wrote: If you capture at 44KHz you're going to have very good frequency resolution but still 425Hz is not going to be an exact spike in the Frequency Domain [...]

Correct, but I don't expect to get power 1000 Hz away from my target frequency... I'm not sure, but I think there is a bug in the code and I don't know where to look for it.
Joe Areeda
Ranch Hand

Joined: Apr 15, 2011
Posts: 318
    

I've never put images into this forum so you may see this post while I work on it. Give me a few minutes.

The data you posted is a bit over 4 seconds long, while my program needs an integer number of seconds (long story) so it's padded with zeros.

The first image (dt-ts) is a time series of what the digitized data looks like.

If I take an FFT of all 4 seconds, we get the second plot (dt-fft-1.png). Note that it is on a log-log scale, with the major peak at 426 Hz and smaller peaks at 200 and 100 Hz. So yes, I think there is a bug in determining frequencies. Notice the power in other frequencies; a lot of that has to do with the on/off nature.

The third image (dt-fft-2.png) takes multiple FFTs of 0.1 s each and averages them together. That peak is at 430 Hz. The imprecise frequency has to do with the shorter FFTs, but averaging over 40 transforms helps a lot with noise.
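The averaging described above can be sketched as follows: split the signal into 0.1 s segments, take the magnitude spectrum of each, and average. This is a simple unwindowed version (Welch's method additionally windows and overlaps the segments and averages power rather than magnitude), and it uses a naive O(N^2) DFT so that 800-sample segments need no power-of-two padding; `AveragedSpectrum` is an illustrative name, not what was used for the plots.

```java
/** Hypothetical sketch: average of per-segment magnitude spectra. */
final class AveragedSpectrum {
    /** Averages DFT magnitudes (bins 0..segLen/2) over non-overlapping segments of x. */
    static double[] average(double[] x, int segLen) {
        int bins = segLen / 2 + 1, segments = x.length / segLen;
        double[] avg = new double[bins];
        for (int s = 0; s < segments; s++) {
            int off = s * segLen;
            for (int k = 0; k < bins; k++) {
                double re = 0, im = 0;
                for (int n = 0; n < segLen; n++) {          // naive DFT of one bin
                    double ang = 2 * Math.PI * k * n / segLen;
                    re += x[off + n] * Math.cos(ang);
                    im -= x[off + n] * Math.sin(ang);
                }
                avg[k] += Math.hypot(re, im) / segments;
            }
        }
        return avg;
    }

    /** Frequency in Hz of the largest non-DC bin. */
    static double peakHz(double[] avg, int segLen, double sampleRate) {
        int best = 1;
        for (int k = 2; k < avg.length; k++) if (avg[k] > avg[best]) best = k;
        return best * sampleRate / segLen;
    }
}
```

With 0.1 s segments at 8 kHz the bins are 10 Hz apart, so a 425 Hz tone falls midway between the 420 and 430 Hz bins, which fits the imprecise peak in the averaged plot.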

There's a lot of details to producing these plots and most of it is irrelevant to the discussion but I'm happy to answer any questions.

Joe
The post is complete now.



[Figures: dt-ts.png, time series of the digitized dial tone; dt-fft-1.png, FFT of the whole recording (log-log scale); dt-fft-2.png, average of 0.1 s FFTs]

Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1081
    

There were significant flaws in the use of the Goertzel code. I have appended a modified version which gives the correct frequency of 425 Hz on the sample file provided earlier. As it stands, the code is not rugged and needs significant tidying.

Notes -
1) Though this gives the correct frequency (425 Hz), in its current form it seems slow compared to using the FFT, which also gives exactly 425 Hz. I have used my own mixed-radix 2, 3 and 5 FFT, which allows one to transform exactly 4 seconds of the 8 kHz data in the file.
2) To my mind, knowing the maximum occurs at 425 Hz is less than half the battle. I see the biggest battle in detecting the repeated 200 ms tone, 200 ms silence, 600 ms tone, 1000 ms silence cycle. I can see several possible ways of doing this; since this interests me, maybe I will have a go after the Xmas holiday, but don't hold your breath.
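Richard's algorithm isn't posted in the thread, so the following is only one possible approach, sketched under assumptions: each 100 ms frame has already been classified on/off (for example by a per-frame 425 Hz detector), so the cadence becomes 2 on, 2 off, 6 on, 10 off frames. Because a recording can start anywhere in the cycle, every rotation of the pattern is tried, with a small mismatch allowance for frames that straddle a boundary. `CadenceMatcher` is a hypothetical name.

```java
/** Hypothetical cadence check on 100 ms on/off frames. */
final class CadenceMatcher {
    /** Italian dial-tone cadence in 100 ms frames: 200 on, 200 off, 600 on, 1000 off. */
    static final boolean[] PATTERN = build(new int[] {2, 2, 6, 10});

    private static boolean[] build(int[] runs) {
        int total = 0;
        for (int r : runs) total += r;
        boolean[] p = new boolean[total];
        int i = 0;
        for (int r = 0; r < runs.length; r++)
            for (int j = 0; j < runs[r]; j++) p[i++] = (r % 2 == 0); // even runs are "tone on"
        return p;
    }

    /**
     * True if the first PATTERN.length frames match some rotation of PATTERN
     * with at most maxMismatches differing frames (the recording may start mid-cycle).
     */
    static boolean matches(boolean[] frames, int maxMismatches) {
        int n = PATTERN.length;
        if (frames.length < n) return false;
        for (int shift = 0; shift < n; shift++) {
            int mismatches = 0;
            for (int i = 0; i < n && mismatches <= maxMismatches; i++) {
                if (frames[i] != PATTERN[(i + shift) % n]) mismatches++;
            }
            if (mismatches <= maxMismatches) return true;
        }
        return false;
    }
}
```

A continuous tone or a silent line differs from every rotation in at least 8 frames, so both are rejected even with a mismatch allowance of 1 or 2.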


Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1081
    

Even after many years of using forums I am still surprised when people walk away from threads they start.

Since the topic of this thread interests me I have put some effort into defining an algorithm to identify the Italian dialling tone. A Java implementation of my algorithm works very very well even when the period is not exactly 2 seconds and/or when the tone frequency is not exactly 425 Hz and in the presence of significant white noise. If the OP is interested I will post the algorithm details (not the code) but at the moment I see no point.

Yes - sour grapes!
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2431
    

Richard,

I wouldn't be so quick to call the OP off. Around this time, a lot of people take some time off. This thread was started 2 days before Christmas, and I would expect Christmas to be rather big in Italy.
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1081
    

Jayesh A Lalwani wrote:
I wouldn't be so quick to call the OP off.


Am I yet able to conclude that the OP has abandoned this thread?
 