I unfortunatelly have never done DSP before, but read a lot of Articles on it by now. But I cannot get to the point of really understanding what I can do to solve my Problem, so here is what I got:

I used the Java Sound API to open a TargetDataLine through a Mixer, which is the line-in or mic port of my soundcard. What I get from that is a byte[].

And here is what I want to achieve:

I want an equalizer like App. which shows me the Volume of the single bands like an 8 band EQ. Visualizing the Data later on is not the Problem. But how can I get to the data that gives me for example the volume of the frequencies 30-80hz? I know I have to apply an Fast Fourier Transition to the Input Stream somehow, but how? I found implementations of FFTs but they all just show itself by a generated Signal, not an input Stream, like I have. So how can I do a FFT on the byte[] Stream I get off my AudioSystem and how can I get the volume like for the example I named above?

As I recall, FFT works on blocks of data so you should be reading the stream into a byte[] in chunks of a suitable size (typically some power of 2) which are then passed to the FFT/whatever analysis.

I would be thinking in terms of 3 separate Threads (hurrah for Java Threads!)

1. input stream to byte[] objects sent to a queue of byte[] objects for 2. (discard byte[] objects if queue is too full) 2. FFT or other algorithm processes blocks from the queue and deposits result objects in an output queue (likewise discard if queue is too full) 3. display Thread tries to keep up with output of 2

Let us know what you come up with, sounds fascinating.

Bill

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

The Three-Thread-Stack sounds good. But how do I process the byte[]? Here is what I got:

How can I compute bytes? Sorry if this sounds stupid...

William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 13005

5

posted

0

Well, Im not too familiar with audio formats but it seems to me the byte[] is a compact representation of a wave form that can also be represented as a float[]. I am guessing the float values are expected to be scaled between -1.0 and 1.0 so you need to figure out how to get a byte to that range. It could be as simple as casting a byte to an int, then to a float:

int v = 0xff & b[n] ; // masking so int values stay in 0 - ff range float f = ((float) ( v - 0x80 ))/ 127.0 ; x[n] = f ; // guessing that byte values representing negative part of wave form are less than 0x80 ( 127 decimal )while positive part is 0x80 - 0xff

I've done some considerable DSP work myself, so can hopefully lend a hand The byte[] is usually the raw PCM data. PCM data is an interleaved sequence of samples and channels. So if you have 8-bit stereo PCM, and write in (sample,channel) notation, you have something like:

(0,0)(0,1)(1,0)(1,1)(2,0)(2,1)...

where, for example, (2,0) is the third sample in the first channel. Like most things to do with arrays, we index from 0 in both cases.

So your first job is to interpret the samples and channels. In the simple example above, this is easy. But it gets more complex if you have 16-bit stereo PCM (this, or 24-bit, is usual for high audio)---then you have:

(0a,0)(0b,0)(0a,1)(0b,1)...

where a and b are the two bytes which make up each sample (16 bits = 2 bytes). But depending on the PCM format, this could be big or little endian. Java is big endian; the most common PCM is little endian. So when you extract the data into Java ints/doubles/whatever, you have to swap the byte order (note that Java NIO can help you here). Second thing: PCM values can either be signed (the usual) or unsigned. So you have to take account of this conversion too, and for a 2-byte value you would need to use an int rather than a short if it's unsigned (unless you want lots of conversions using 0xFFFF masks everywhere that is).

The FFT is actually a simple transformation, but is fairly computationally expensive. As pointed out earlier, you need to do it on blocks of the above samples. Small enough blocks that your data doesn't take too long to buffer out (avoiding high latency), long enough that you don't have to run the FFT too often, as it is O(N log N) whereas running it on small blocks too often approaches the naive O(N^2). Don't be fooled by the simplicity of the equations: this can make considerable difference on blocks of even reasonable sizes (e.g. 512 or 1024). BTW, if you aren't already familiar with it, the Cooley-Tukey algorithm is your safest bet as it is O(N log N). In addition, rather than computing "(float) Math.cos (arg)" every time, you can use some elementary trig identities to reduce this to multiplications and additions only. Using cos and sin functions is computationally very expensive; simple times and addition are not.

Using the three threads sounds overkill. Rather than blocking to wait for input as you'd have to in that model (or risk out of memory since you'll read an entire uncompressed file into the queue at once), you can easily input, process and output in a single queue. All sound cards have an output buffer which helps avoid the jitter caused by delays in processing. You won't notice any problems even using a small buffer. Also if you have only 44100Hz (CD quality) as the sample rate, then your 2GHz processor has plenty of cycles spare to do the FFT computations! This is the "normal" way to do it... no need for multi-threading in a simple sound app.

Finally, there's something quite subtle you may have missed: if you intend to manipulate the results in Fourier (spectral) space, for example to do EQ processing, and output the results as samples, you'll need to use the IFFT to get back again. However, this will degrade the signal giving you crackles at the boundaries between frames. It's called Gibb's phenomenon and is caused by discontinuities you create. To avoid this, you'd need to use the overlap-add method or similar, or the audio will sound awful and you'll wonder why!

Hope that helps. This is a long and complicated subject which really takes weeks or months of study to understand fully. I think a strong background in mathematics makes it considerably easier

Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340 / AmazonAmazon UK )

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

Phew... hard stuff. I can't help myself.

I have those two functions for the Input:

So, what exactly do I have to do with the byte[] I have to link it to an EQ Visualisation that visualizes the Volume of 8 frequency specs?

Sorry, I looked into tutorials and I ordered two books on the subject to really get into it, but I need to get this done somehow.

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

Since this is digital data, you need the discrete FT, of which the FFT is a special case. Since it's discrete, you need to choose a window size to measure in; the larger the window, the better the results but the larger the latency between reading and producing output. For example, a spectrum analyser may take 30 second windows, but that means you have to wait 30 seconds before output. The DFT is a one-to-one mapping from sample space to spectral space. So number of input audio samples = number of output frequencies in spectral space. Before deciding how large you need the window, you need to ask yourself two things: (1) how much latency can you afford to have (this places an upper bound U on the size of the window); (2) what minimum resolution of output frequencies do you need (which places the lower bound L on the window--bearing in mind frequency aliasing with the FFT gives you only half the number of useful frequencies you'd expect). Hopefully L <= U so all is good and you can then choose any value W with L <= W <= U for your window size. If L > U, what you want is impossible!

Right, so you've chosen W. This now needs to be the number of samples in all channels in your input. As you've done, it's best to start with a mono input (1 channel) or it'll just get complicated since you'll need to combine the channels in some way (or process them via FFT separately, but that's really costly). So the size of the input byte[] = W * (number of bytes in a sample) = W*S/8 since sample size, S, is usually expressed in bits. Overall therefore the reading loop takes the form:Now, the key here is to choose W optimally so that doing everything including the drawing (though the FFT is usually the expensive bit) is within the ability of your CPU. Otherwise data will start backing up and the sound card's buffer will overflow so input samples are lost. Swing will run graphs in a different thread, so you can just supply the graph with the double[] and call it's repaint() or whatever. Try to convert the fftMag method you've shown above to using double[] in and out as it'll be much more accurate. One thing we find quickly with FFT is that it is prone to rounding errors - sometimes float just isn't good enough, so double is best if possible.

Right, the final bit is to implement the decode(byte[],double[]) method which does the conversion you were after since the first post... This will depend on data extracted from the AudioFormat. Of course a general method can be written; in your case you need to observe:

Sample size is 2 bytes

Using little endian so need to reverse byte order

PCM Signed

Using this, the decode method can take the form:There's just some bit shifting to do on account of the wrong "endianness". The (short) ensures the sign in the result is accounted for correctly; if this was unsigned you'd use a "0xFFFF &" instead before setting as a double.

And that's pretty much it I think... Let me know if that works for you!

Usual disclaimer: I have tested none of the above and it is all from memory. It may be slightly incorrect and may not compile. Use with care! [ October 12, 2008: Message edited by: Charles Lyons ]

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

And just a couple of things on the FFT implementation too: try to re-use arrays as much as possible rather than constructing new ones. There is overhead in both CPU and memory in creating new objects (and running the GC), so avoid it where possible. Secondly, FFT produces a set of moduli of complex numbers which typically only have relative meaning (it is possible to express them in the relative dB scale with a bit of prodding around). On the other hand, Math.sqrt is expensive. Avoid it. If this is just playing around or for graphical comparison, you can probably make do with the square of all the frequency components. If not, scale your axes on the graph (non-linearly) to compensate. Also you really don't need to divide by n or multiply by 2 as for comparison all samples will look the same if multiplied by n/2. Graphically, things will look the same for comparison (just scale the axes, this time linearly); mathematically this is an arbitrary scale factor which can be placed on the IFFT rather than on the forward FFT. So you don't need the Math.sqrt nor the division, nor the multiplications (instead I do one multiplication by 0.5*0.5=0.25 on out[0], just so things stay relative for if/when you do the axis scaling later). In most Java apps, all these overheads are negligible, but in real time apps, every cycle counts. So where your FFT has:you can replace it with:Cool [ October 12, 2008: Message edited by: Charles Lyons ]

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

Woaa... that is knowledge. Thanks for your help. I will try to get this to work. What still bugs me, is the fact that I don't really know what to do with the set of complex numbers... Are those the frequencies? How do I draw a Graph with them? damn... I sound stupid...

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

Ok, got it running. With W = 1024 I have a double[] with 512 values. Now what are they expressing? If I want do show them in a graph and take them as y values, what are my y values? My brain smokes as I am trying to understand the formulas on wikipedia...

Originally posted by Sebastian Herrlinger: Woaa... that is knowledge. Thanks for your help. I will try to get this to work. What still bugs me, is the fact that I don't really know what to do with the set of complex numbers... Are those the frequencies? How do I draw a Graph with them? damn... I sound stupid...

The results of an FFT is a set of complex pairs representing the real and imaginary parts of the Discrete Fourier Transform of the input data. How you are presented with these pairs depends on the FFT implementation but that is just an implementation detail. For example, my FFT implementations present the results as an array of alternating (real, imag) doubles.

Each complex result has a related frequency. Assuming you have taken the transform of N data points with a sample rate of S samples per second then the frequency resolution is S/N so the nth complex pair is associated with frequency n*S/N .

An estimate of the Power Spectral Density of the input data is given by the modulus squared of the complex values.

The fact that the FFT is taken over a finite number of points causes a distortion of the spectral density. In order to reduce the distortion it is necessary to use a 'window' function to pre-process the data before taking the FFT. There are a large number of 'window' functions (Hanning, Hamming, Poisson etc). Google will find you further information on this but I would suggest you first use 'Hamming' and only worry further about windowing if you have very large single frequency component in your data.

Retired horse trader.
Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

I see... I cannot get a running Equalizer Visualisation without getting really into this. So back to learning...

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

Ok, I did it. So what I get now is a curve.

Mhhh... looks nicer, if all the curves are drawn, but the screenshot didn't work. What I did was:

So thats the Waveform of the Signal? A single Signal? How can I link it to a Form like:

Thank you really much to help me through this. I think I am on the way of understanding, though it is hard. Also I'm german and the english vocabulary confuses me sometimes [ October 13, 2008: Message edited by: Sebastian Herrlinger ]

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

It all depends on how accurate you need to be really. Since the DFT is an approximation to the continuous version, it's never going to be 100% accurate (like trying to represent a recurring decimal as a double or float). In the above description I used the rectangular window for simplicity. Sure, it isn't the best in the world, but if you're only starting out on your journey it's good enough to play with before making your program more complicated. This is a handy little page displaying some of the windows and trade-offs. To use a window, remember to pre-calculate all the values before your program starts, to avoid calling costly functions during execution. Rectangular is the worst, but it's also the easiest for you for now, without entering into too much detail. This can get complicated quickly, for example if you want to modify the EQ before outputting, you really need to overlap those windows using the overlap-add method or similar I quoted earlier---otherwise you get blocking artefacts. I'd say steer clear for the time being until you understand the basics.

Each complex result has a related frequency. Assuming you have taken the transform of N data points with a sample rate of S samples per second then the frequency resolution is S/N so the nth complex pair is associated with frequency n*S/N.

This is all true, but to add one further thing: it turns out that for a real input signal (such as your audio, though sometimes using a complex input in engineering is useful), the complex frequencies above S/2 are complex conjugates of those below S/2. You can prove this mathematically in just a couple of lines. Hence the modulus of the nth complex frequency is the same as that of the (S-n)th (up until n=S/2). This aliasing is the reason for Nyquist's theorem---that you sample at twice the maximum frequency contained in the signal. So at your 8kHz, you can examine only up to 4kHz in your signal. The other moduli (above S/2) you have to discard because they are redundant information. Your FFT algorithm does this already.

So in your case, you have the output out[n] from the FFT algorithm, which is already of size S/2 to take account of what I mentioned above. Each of these values is a modulus squared of the corresponding frequency n*S/W. So in your case, each value is n*8000/1024 ~= n*7.81 for n in [0,S/2] corresponding to [0,4kHz]. So your output increases in roughly 7.8Hz increments, which is fairly high resolution. You can see that on a 44100Hz (CD quality) signal, using 1024 as the window only gives you 43Hz increments, considerably lower resolution but often good enough nonetheless.

Yes, I suggest you read a lot into the topic. It can be fascinating. Really you need the know the mathematics behind this though, and there is substantial background in techniques (complex numbers, integrals and contour integrals, and some engineering principles) to understand it all fully. A scholarly book on DFTs or DSP will set you up. DSP itself then diverges from this point: you can look at filtering either in Fourier space or using convolution in time space (surprisingly, this is usually more costly than doing FFT then IFFT!). You'll also meet things like the windowing we mentioned, FIR filters, and the complex z-transform. You don't really need to know much detail about the inverse transform, though if you want the challenge, it's good to demonstrate it using contour integrals in the complex plane---indeed, this is where the z-transform originates.

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

How can I link it to a Form like:

This is something on the power density in the spectrum. At the moment you are pretty much there with it. To get a histogram like you have, you could simply add together the powers of the frequency components inside the ranges you require. That should do as a quick fix (you can poke around with the FFT algorithm a bit to get an actual power spectrum as described in the previous link). Alternatively you can use a smaller window size (W) and the DFT will do the work for you, though it'll be less accurate near the boundaries of your frequency blocks.

Also I'm german and the english vocabulary confuses me sometimes

But mathematics is universal? I understand what you mean though---grasping these concepts is difficult without a language barrier, and you're doing well to get this far!

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

Thanks again very much for your superior help! I managed to get what I want... in a way. But the CPU is very busy for just having a visual EQ. Is there anything I can do to speed things up with the FFT or make it use less CPU? You wrote something about using log instead of sin() and cos(). How do I do that? Haven't read anything about that in the Articles I have here right now. Sorry again for my bad Math I did a little Video and uploaded it (Big day, first youtube video =))

There you go... you can see what I have so far. I also know now how I could make it 32 or 64 bands to display. I just let the "FFT Loop" call the custom paint()-Method of my EQ-AWT-Canvas-Object every cycle. Without the recording Tool I get about 60 fps on my 2Ghz/1GB (Single Core, about 7 years).

Another Question that came up was: How do I measure the volume of each frequency spectrum I have? Decibell? All I did for now, was to multiply my doubles with 100 and (int) them to have usable numbers for animation. Is there any kind of normalization I could do?

Sorry to bother you with another bunch of questions...

Thanks very much!!!

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

Well it looks impressive to me...

Is there anything I can do to speed things up with the FFT or make it use less CPU? You wrote something about using log instead of sin() and cos(). How do I do that? Haven't read anything about that in the Articles I have here right now. Sorry again for my bad Math

Actually, this kind of optimisation is a computer science problem; the mathematics is sound and it makes no difference if you use sin and cos or the trig trick. To optimise, we note sin and cos are expensive functions; multiplications and additions are relatively cheap however. So you need to use the addition/subtraction trig formulae shown on Wikipedia (the angle sum and difference identities). Given your algorithm and those identities, can you think how you might be able to replace the sin() and cos() in your FFT method with a single sin() or cos() evaluation and a set of multiplications and additions? I'll leave you to see if you can work it out.

How do I measure the volume of each frequency spectrum I have? Decibell?

What you discover quickly is there is no single definition of what volume is. It has to be relative to some fixed value. The link I gave in my previous post gives you an example of a particular display. It is usual to display an EQ in terms of the power of the signal too, not the volume. Again, the link I gave in my previous post covers that. Decibel (dB) is only a logarithmic scale---which better represents how we hear volume---and again, it is relative so you have to decide what you need it relative to. Unless you have a specific need for units on your scale, you may only want to scale the axes logarithmically to give a more perceptive display.

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

Is there anything I can do to speed things up with the FFT or make it use less CPU?

Also, I didn't look carefully at your algorithm. Is this a naive FFT implementation? If so, it'll be O(n^2). You can make it O(n log n) which is a massive improvement*. The Cooley-Tukey algorithm is the most common example of the latter. Google for some code examples if you can't figure out how to do it from the equations. You can still use the sin/cos trig identity trick above to reduce cycles even further.

* As an example, I once used a naive FFT to try to solve a PDE---it took over 20 minutes. The equivalent O(n log n) FFT took under 1 minute! Some careful thought and attention to detail pays wonders when it comes to optimising algorithms! [ October 14, 2008: Message edited by: Charles Lyons ]

[Charles]: Is this a naive FFT implementation? If so, it'll be O(n^2).

One could argue that's not really an FFT then. It's just an FT.

Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836

posted

0

You're absolutely correct (though it's a DFT to be precise). Hardly anyone uses a plain DFT due to its inefficiencies, so it's a slip of my tongue to refer to everything as an FFT... my mistake! Sometimes I just write IFT too when I really mean IFFT (and probably vice-versa occasionally). Indeed I should have said the naive DFT is O(n^2) while any FFT algorithm is O(n log n). Thanks for the correction.

Originally posted by Mike Simmons: [Charles]: Is this a naive FFT implementation? If so, it'll be O(n^2).

One could argue that's not really an FFT then. It's just an FT.

Indeed. The Fast Fourier Transform (FFT) is just a fast method O( nlog(n))of implementing the Discrete Fourier Transform (DFT) O(n^2).

There are many optimizations one can make to make the FFT even faster. For example, one can save a load of time by pre-computing all the sin() and cosine() values and pre-computing the reorder indexes. With some clever indexing, the tables are only half the length of the data being transformed.

If the OP is willing to include some native code in his system then he should take a look at fftw which is written in C and assembler. This is 8 times faster than my Java based FFT and my Java based FFT is pretty fast. fftw is also about 8 times faster than my pure C++ based FFT and 6 times faster than my FORTRAN based FFT.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

[Charles]: You're absolutely correct (though it's a DFT to be precise).

Sure - but as FFT is discrete too, the D seemed irrelevant to the point I wanted to make, which was that an O(n^2) transform is not Fast.

[Charles]: Hardly anyone uses a plain DFT due to its inefficiencies, so it's a slip of my tongue to refer to everything as an FFT

Yeah, that's probably what happened in the original post as well, talking about FFT as a general synonym for Fourier Transform.

Sebastian Herrlinger
Greenhorn

Joined: Oct 08, 2008
Posts: 11

posted

0

I would use natives if there is also one for mac, because that's the target system which I will be using. For now I have to figure out how to get my DFT to a FFT... I am on it... Thanks for all your help!

Sorry for re-posting same question.
I need to calculate the volume in decibel unit of selected audio wave file under the time domain. It is ok to print the volume in decibel unit with time. But I have no idea about how to implement and have no knowledge of formula.

If you have to make a calculation of the volume in decibels but you don't have any idea of the theory and mathematics behind what you need to program, then it's going to be very hard. How can you write a program if you don't understand what it's supposed to do? Before you start programming, do some research to learn exactly what your program is supposed to calculate.