JavaRanch » Java Forums » Java » Java in General

Boxing, unsigned bytes, and how to process them at various speeds.

Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Winston Gutkowski wrote:
Stevens Miller wrote:What is absolute, hard, undeniable fact, however, is that Java's lack of unsigned integral primitives is a major pain in my backside.

Actually, you do: - char - and it works just as you'd expect an unsigned short to...

The problem is that without those final casts, it attempts to print it out as a character.

I suspect there's also a lot of promotion to int going on behind the scenes; but that would be the case for any smaller primitive.


Yeah, that's the problem. In my case, I am working with rasters of pixel data. There's even an object for those. But pixels aren't made of shorts, they're made of bytes. Those bytes, in turn, aren't stored in buffers full of chars. They're stored in DataBufferByte objects (or, maddeningly, in some cases, DataBufferInt objects). To deal with those as unsigned values, you do end up having to promote them into larger primitives (which, of course, are signed, but the unsigned values I need to deal with are all within those larger primitives' positive ranges). It wouldn't help my problem (of all that overhead involved in promotion), but there isn't even a DataBufferChar object to use (there is a DataBufferUShort object, but I assure you that's useless to me, as none of my pixels can fit into a short, and there is no easy way to obtain my rasters as pixels composed of three shorts).
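Getting at those raw bytes looks something like this (a minimal sketch using the standard BufferedImage API; the & 0xFF mask is exactly the promotion overhead in question):

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

public class RasterBytes {
    /** Read the first sample of a TYPE_3BYTE_BGR image as an unsigned value. */
    static int firstSampleUnsigned(BufferedImage img) {
        DataBufferByte buf = (DataBufferByte) img.getRaster().getDataBuffer();
        byte[] data = buf.getData();   // raw B, G, R samples; signed bytes in Java
        return data[0] & 0xFF;         // mask to treat [-128,127] as [0,255]
    }
}
```

With a blue value of 200 in the image, data[0] is -56 as a Java byte, and the mask recovers 200.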

Bottom line is, because Java has no unsigned bytes, I do all my image manipulation in C, via native methods. For the Hell of it, I have looked into how the JDK handles some byte manipulations, and was jaw-dropped (gobsmacked, to some of you) to see a lot of code that actually looks like this:


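Something along these lines (a sketch of the pattern, not actual JDK source, assuming pixels packed one per int as 0xAARRGGBB):

```java
// A sketch of the shift-and-mask style being described; hypothetical helper,
// assuming pixels packed one per int as 0xAARRGGBB.
public class ShiftMask {
    /** Add delta to the red channel of every packed pixel, clamped to 255. */
    static void brightenRed(int[] pixels, int delta) {
        for (int i = 0; i < pixels.length; i++) {
            int argb = pixels[i];
            int r = (argb >> 16) & 0xFF;                  // shift, then mask out red
            r = Math.min(255, r + delta);                 // do the math as an int
            pixels[i] = (argb & 0xFF00FFFF) | (r << 16);  // mask again, shift back
        }
    }
}
```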

"Eeeeeeww!," I said, and promptly learned how to call native methods, where such things look more like this:



Now, as it turns out, that little Java nightmare above is pretty fast, but my rasters tend to have a million pixels in them and I need to operate on them in less than a frame-time for normal video. Even with old NTSC (PAL, for some of you), that means processing about a quarter-million pixels in 33 milliseconds, or about 133 nanoseconds per pixel. I know we're never supposed to optimize, but in a regime like that, I don't have time to shift-mask-add-shiftBack-maskAgain, just to get a plain old unsigned byte in and out of my buffer.

There's a lot I love about Java. But, after programming for 30 years in C and happily using unsigned char data to handle pixel bytes, you probably can't imagine how I felt on the day I was madly flipping pages in all my books, looking for the "unsigned" modifier, as I became increasingly aware that, in Java, it does not exist. If you stand in certain parts of my lab, you can still hear the echoes of, "but that can't be true!"

<huff> <puff>

Okay, we now return you to your regularly scheduled thread.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716

Stevens Miller wrote:But pixels aren't made of shorts, they're made of bytes...

Ooof, really? Sounds like Mr. Blobby time to me; but maybe it's part of that wonderful spec that is NTSC (Note: != PAL...by a long chalk)

Computer screens have been at least 16-bit for more than 20 years, and 32-bit for at least 10 (I know because my clunky old Dell is 7 years old, and it wasn't new then).

It also sounds to me like you're moaning about the fact that Java's basic unit of operation is an int, rather than the fact that its bytes are signed. Bitwise operations (apart from '>>') couldn't give a toss about signs.

Not being an expert on these things, I don't know; but couldn't you get more throughput by dealing with your pixels 4 at a time?

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38509
Actually those two bitwise operations can possibly be completed in two clock cycles. Both >> and & are very fast. I am pretty sure you can do & in one clock cycle.
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Winston Gutkowski wrote:
Stevens Miller wrote:But pixels aren't made of shorts, they're made of bytes...

Ooof, really? Sounds like Mr. Blobby time to me; but maybe it's part of that wonderful spec that is NTSC (Note: != PAL...by a long chalk)

Computer screens have been at least 16-bit for more than 20 years, and 32-bit for at least 10 (I know because my clunky old Dell is 7 years old, and it wasn't new then).

It also sounds to me like you're moaning about the fact that Java's basic unit of operation is an int, rather than the fact that its bytes are signed. Bitwise operations (apart from '>>') couldn't give a toss about signs.

Not being an expert on these things, I don't know; but couldn't you get more throughput by dealing with your pixels 4 at a time?

Winston

The pixels you see on your screen are actually 24-bit pixels (high-end image-processing hardware will often take that to 30 or even 36, but what I'll say here extends pretty naturally to those architectures, which, in any case, are almost non-existent for most purposes). Each pixel is made up of three bytes, one for red, one for green, and one for blue.

Even in the bad old days of eight-bit video, those eight bits looked up a 24-bit red/green/blue triplet of bytes in a 256-entry table. 16-bit video (which, in a lot of cases, was actually 15-bit video) was sometimes composed of triplets that allowed five bits for two of those three colors, and six bits for the third (usually green, I believe, on the theory that your eye is more sensitive to different shades of green than it is to different shades of red or blue).

There is/was a more complex scheme that, again, converted the 16-bit entries to 24-bit triplets by either a kind of math function or else, when memory got as cheap as sand, again by a look-up table. (In that scheme, one approach was to partition the RGB colorspace into a 40x40x40 set of subspaces, with each of 64,000 values indexing one such space, and then choosing a color from that space to proxy for all the colors it contained; there were much more complex schemes as well, but they are mostly for the history books now.)
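That five-six-five unpacking can be sketched as follows (a hypothetical helper; the bit-replication trick is one common way to map the maximum 5-bit value 0x1F to 255 rather than 248):

```java
public class Rgb565 {
    /** Expand a 5-6-5 packed pixel to an 8-bit-per-channel {r, g, b} triplet. */
    static int[] expand(int pixel565) {
        int r5 = (pixel565 >> 11) & 0x1F;  // 5 bits of red
        int g6 = (pixel565 >> 5)  & 0x3F;  // 6 bits of green (eye is most sensitive)
        int b5 = pixel565 & 0x1F;          // 5 bits of blue
        // Replicate the high bits into the low bits so full-scale maps to 0xFF.
        int r = (r5 << 3) | (r5 >> 2);
        int g = (g6 << 2) | (g6 >> 4);
        int b = (b5 << 3) | (b5 >> 2);
        return new int[] { r, g, b };
    }
}
```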

Today, with memory being almost free and graphics hardware being very fast, the pixels you see on your screen are each directly represented by an RGB triplet, one byte per color, with no look-ups, math, or other voodoo involved. If you want to show a black pixel, you load its associated memory locations with, in effect, a three-byte array, {0, 0, 0}. If you want gray, you use something like {127, 127, 127}. White, unsurprisingly, is {255, 255, 255}. Depending on the order, pure bright red would be {255, 0, 0}, and yellow would be {255, 255, 0}. A pleasing sky blue might be {50, 100, 200}.

A lot of hardware actually treats these arrays as though they were four bytes each, as that does simplify some aspects of moving image data around by aligning it on 32-bit boundaries. The fourth byte is usually called "alpha," and can be any of these: 1) a transparency value; 2) a value not used for anything; 3) chimerical (that is, the address location exists, but there is no real storage associated with it).

When you look at the DataBuffer that defines a raster, it might be a buffer of ints, or it might be a buffer of bytes. If it is a buffer of ints, you need to do the shifting and masking I described to extract the individual color values. If it is a buffer of bytes, you don't have to do that, but you do have to cope with the fact that they are signed.

In a simple case, suppose you want to combine two images to form a double exposure. If, at one pixel location, the RGB values in one image are {100, 50, 200}, and at the same pixel location in the other image they are {120, 30, 10}, their combined values will simply be {220, 80, 210}. Alas, in Java, what I would like to have be 200 in a byte is regarded as -56, so my attempt to form a double exposure at this pixel gets me this result if I just add them arithmetically: {-36, 80, -46}. Note how that first byte went from two positive values to a negative value, due to wrap-around.
If I next want to know the maximum value in each pixel, that negative sign is going to thwart me, isn't it? While it's true that the low eight bits of -46 and 210 are the same in two's-complement, I simply can't do math with negative pixel values.
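That double-exposure example can be reproduced in a few lines (a sketch; the numbers are the ones from the paragraphs above):

```java
public class DoubleExposure {
    /** Naive per-channel sum of two pixels held as signed Java bytes. */
    static byte[] addNaive(byte[] p, byte[] q) {
        byte[] out = new byte[p.length];
        for (int i = 0; i < p.length; i++) {
            out[i] = (byte) (p[i] + q[i]);  // signed arithmetic; wraps past 127
        }
        return out;
    }
}
```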

Now, the astute programmer (into which category I would certainly put Winston) will notice that I have ignored a problem with adding bytes that might be anywhere on the range of [0,255], which is that sums above 255 are possible. Won't I have to promote to a larger storage primitive to deal with that? No, not always. Suppose I want to average two images. I could divide all six values by two, then add them, sure to stay within my range (yes, that loses some precision, but one often makes that trade when doing image work). Our two pixels from before would average, in unsigned math, to {110, 40, 105}. But Java's enforced sign yields an "average" of {110, 40, -23}, the two's-complement equivalent of {110, 40, 233}. So one simply cannot ignore the signs Java applies to my unsigned color bytes and hope it will all work out, because it doesn't.
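Again in code (a sketch of the halve-then-add average, showing the signed result against the masked, unsigned-style one):

```java
public class Average {
    /** Halve-then-add average treating the bytes as signed (what Java gives you). */
    static int signedAvg(byte a, byte b) {
        return a / 2 + b / 2;
    }

    /** The same average done on the unsigned values, by masking up to int first. */
    static int unsignedAvg(byte a, byte b) {
        return (a & 0xFF) / 2 + (b & 0xFF) / 2;
    }
}
```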

These examples may seem to be creating needless problems, because Java's extensive imaging library already provides for a lot of these operations. It is in the source for some of them that I realized there truly is no alternative in Java to shifting and masking, shifting and masking, because that's how a lot of them actually process images. When I saw that, I had to get up and go for a walk, because I realized a huge amount of my C code was never going to be ported to Java. And a lot of what I do goes way, way beyond what is provided for in the Java libraries. I simply can't ignore the effects of signed bytes in my image-processing work, nor do I wish to spend the time it takes promoting every last byte I deal with to an int, doing my processing, then "demoting" my results back to bytes. Even when you know you can ignore some of the effects of signed bytes, Java promotes bytes to ints in basic operations anyway. That is, a statement as simple as "byte c = a + b;" (where a and b are both bytes) won't compile, because "a + b" is an int, not a byte.
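Spelled out (hypothetical values; the explicit narrowing cast is what the compiler demands):

```java
public class BytePromotion {
    /** Sum two bytes; the result of a + b is an int, so a cast is required. */
    static byte addBytes(byte a, byte b) {
        // byte c = a + b;       // does not compile: possible lossy conversion
        return (byte) (a + b);   // explicit narrowing cast back to byte
    }
}
```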

I should admit that people who do more exotic imaging work than I do (I am thinking mostly of streaming video here) would agree with me in part about all this, but also disagree with me in part, because a lot of video (maybe most of it, these days) is not streamed in RGB triplets. It is streamed in packed formats that often use things like pairs of 16-bit values to represent two pixels, each with the same color values, but two different brightnesses (that's a vast oversimplification, but a more detailed explanation would be useless here, and very likely give at least one of us a headache, if he doesn't already have one). One can do some remarkable image processing directly on those 16-bit values without decomposing them into 8-bit triplets, and one saves a lot of processing time in the bargain. However, even in that case, one will need to be able to deal with some unsigned 8-bit data at various steps, and there are even more formats that do not rely on 16-bit values (so, even if Java's char could help, it would be of limited application).
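To sketch the kind of packed format meant (assuming a YUY2-style layout, which is one such format: two pixels in 32 bits, each with its own brightness, sharing the color values; the byte order here is an assumption):

```java
public class Yuy2 {
    /** Extract {y0, u, y1, v} from a packed YUY2-style macropixel (two pixels,
        32 bits, assumed y0 in the low byte). All values come back unsigned. */
    static int[] unpack(int macro) {
        return new int[] {
            macro         & 0xFF,  // y0: first pixel's brightness
            (macro >> 8)  & 0xFF,  // u:  shared color component
            (macro >> 16) & 0xFF,  // y1: second pixel's brightness
            (macro >> 24) & 0xFF,  // v:  shared color component
        };
    }
}
```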

At the end of the day, I can tell you that people like me (that is, folks who do image-processing work and got our start with C) have all lamented this at one time or another, and we have all learned to live with it, since, like it or not, Java ain't got no unsigned bytes. Some seem to cope by shifting and masking. Others, like me, use native methods. Still more find ways to get their work done within the limits of existing API calls (this works great, for example, when you are moving sprites around in a game, rotating an image, etc.). But, I can assure you, it was simply a dumb decision to exclude unsigned bytes and unsigned ints from Java.

(FWIW, though I am no network programmer, I am told that they feel much the same way, as unsigned bytes show up a lot in packets as counters. I'll have to leave that to one of them to elaborate upon.)
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Campbell Ritchie wrote:Actually those two bitwise operations can possibly be completed in two clock cycles. Both >> and & are very fast. I am pretty sure you can do & in one clock cycle.

Yup, you're right, Campbell. In my timing tests, those ops really rip. But, when you are trying to manipulate three-million bytes, thirty times every second, everything has to rip. A single clock cycle can, literally, become a big percentage of the time you've got. Very roughly, I need to pump about a billion bytes per second. Even fractions of nanoseconds matter a lot to me, in that regime.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38509
The trouble with unpackedBytes[index][2] is that you are turning an int[] (or int*) into a byte[][] (or char[][] or char**). To do that you have to abandon checking the type of the array, or even the size of the array, and as we all know, overflowing the size of an array can produce rubbish results or even allow malware into memory. So the price you pay for type-safety is (unpackedBytes[index] >> 16) & 0xff

Can you display all 24 bits together, or do you have to separate the three R G and B bytes as they go to the screen? I presume from previous discussion that the answer is no.

Yes, it would probably have been a lot easier for you if we had an unsigned keyword. I agree there.

I also think this is far too complicated for “beginning” and shall move this thread.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38509
Actually I have only moved half the thread, because it seemed to change subject halfway through.
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Campbell Ritchie wrote:The trouble with unpackedBytes[index][2] is that you are turning an int[] (or int*) into a byte[][] (or char[][] or char**). To do that you have to abandon checking the type of the array, or even the size of the array, and as we all know, overflowing the size of an array can produce rubbish results or even allow malware into memory. So the price you pay for type-safety is (unpackedBytes[index] >> 16) & 0xff



Yes and no. If the array were a byte[][] to begin with, I wouldn't be changing anything. I only used that example because so much of Java's imaging library treats pixels as bytes packed into integers. If I had my 'druthers, it would all be unsigned bytes, start to finish. (Agreed, of course, that pointers allow for overflow, but that implicates the entirety of the managed v. unmanaged language debate, which I am sure neither of us wants to see happen here.)

Can you display all 24 bits together, or do you have to separate the three R G and B bytes as they go to the screen? I presume from previous discussion that the answer is no.


Notwithstanding all my discussions about screens, earlier, the operative fact is that my data originates as a buffer full of RGB triplets. Well, okay, let's be completely accurate here: I get my data from a Webcam, which gives it to me as an array of 32-bit values, effectively ABGR quartets, packed into integers. If it gave it to me as ABGR quartets in unpacked signed ints (that is, array[0] is an integer indicating my first pixel's red value, array[1] is an integer indicating my first pixel's green value, array[2] is an integer indicating my first pixel's blue value, array[3] is an integer indicating my first pixel's alpha value, array[4] is an integer indicating my second pixel's red value, and so on), I'd be happy as a clam in mud. Indeed, that would actually make some of the things I do easier, because I could ignore the potential for intermediate results to exceed the range I could represent in an unsigned byte.

But that's not how I get them. I get them as an array of bytes, one color each (or as ints, one pixel each, packing three colors and an alpha value; take your pick since, in C, I can treat each one as though it were the other). Given that I get my data that way, I must either copy it to an array of a wider type, or do that tedious business of "correcting" for cases where the high bit is "1" in one of my bytes. Either way, it's a problem that only exists because "unsigned byte" doesn't mean squat to Java.
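The unpacked layout described above can be produced in one pass (a sketch; the byte positions assume R in the low byte and A in the high byte, which is an assumption about the packing):

```java
public class Unpack {
    /** Expand packed pixel ints (R in the low byte, A in the high byte; an
        assumed layout) into one unsigned sample per int: r, g, b, a, r, g, ... */
    static int[] toSamples(int[] packed) {
        int[] out = new int[packed.length * 4];
        for (int i = 0; i < packed.length; i++) {
            int p = packed[i];
            out[4 * i]     = p         & 0xFF;  // red
            out[4 * i + 1] = (p >> 8)  & 0xFF;  // green
            out[4 * i + 2] = (p >> 16) & 0xFF;  // blue
            out[4 * i + 3] = (p >> 24) & 0xFF;  // alpha
        }
        return out;
    }
}
```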

If you think I sound frustrated ("moaning," was how Winston put it, I believe, and I won't argue with him), it's partly because it causes me all the problems that it does, but also partly because of this, which you can read in its entirety online:

James Gosling wrote:Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is.


I am in humble awe of Gosling's language's success and its impact on computers, computing, and, yes, the entire world. But in this statement, I know from my own experience that Gosling is so amazingly, blazingly wrong that it boggles the mind to think he is who he is. Read any of the papers published in the SIGGRAPH Proceedings from, oh, I dunno, the first one up until I stopped reading them in the mid '80s, and tell me none of those authors understood unsigned arithmetic. When I worked at the NYIT Computer Graphics Lab, if Gosling had strolled the halls, saying, "I have designed a new, superior language that will change computing forever, one that will make lost memory and invalid pointers impossible," we would have cheered him. If he had gone on to say, "And it shall only permit signed bytes," we would simply have concluded he was kidding us.

Yes, it would probably have been a lot easier for you if we had an unsigned keyword. I agree there.


Yes. Yes it would. Knowing that Gosling decided against it to, it would appear, save us from ourselves, is kind of hard to accept gracefully. Well, if nothing else, I suppose living with it is good for my soul.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716

Stevens Miller wrote:I realized there truly is no alternative in Java to shifting and masking...

You'll have to forgive me for my ignorance (despite your kind words), but I was under the impression that even in C, bitwise operations are still the quickest (although possibly not as beneficial, timewise, as in Java), so if you can say '<< 1' rather than '* 2', you may be saving a few cycles.

I also wonder about your analysis - although, I say again, I've never dealt with real-time video - because it seems to me to be focused on the byte (the how), rather than the business of processing huge volumes of data (the what).

As a DBA, sysadmin and programmer, I have had to deal with lots of systems (including real-time ones) involving huge volumes of data before; and the problem has always broken down into the three basic IPO areas:
1. How fast you can read the input (and that will involve I/O and network pipelines).
2. How fast you can process that input.
3. How fast you can write (or stream) it to wherever it is needed.
and your focus appears to be entirely on point (2).

I can understand your thinking because, as a Java programmer, that is basically the only area that you have any control over; however, there comes a point at which (2) will be fast enough - ie, at which your program, which may historically have been the bottleneck to the process, either
(a) has been fully optimized, or
(b) (and this, to me, is programming Nirvana) has been optimized sufficiently that it is no longer a bottleneck without compromising design.

If I'm writing to an IDE drive that can take 40 MB/s continuous throughput (as opposed to bus speed), and my program can handle that, surely my job is done? Indeed, with compression algorithms, be they lossless or lossy, I could probably achieve that with significantly less I/O on the part of my program.

So, what can my program do? It can process its data in the quickest way possible; and in Java the simplest way to do that is by making every numeric entity (and your data would seem to consist of nothing but) involved in a numeric operation an int. Memory, as we both know, is cheap and, above all, fast; so I can't imagine that the business of expanding bytes to ints, or contracting them at the other end of a memory-based pipeline, is going to add anything more than a fixed and measurable overhead to the process; and furthermore, it can probably be threaded.
What you do with those numbers in the middle and how you do it is entirely up to you.

Like I say, I've never had to deal with the problem that you are, but I do wonder if you're not focusing on too small an area when it comes to optimization.

Winston
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Winston Gutkowski wrote:
Stevens Miller wrote:I realized there truly is no alternative in Java to shifting and masking...

You'll have to forgive me for my ignorance (despite your kind words), but I was under the impression that even in C, bitwise operations are still the quickest (although possibly not as beneficial, timewise, as in Java), so if you can say '<< 1' rather than '* 2', you may be saving a few cycles.

I never checked it, but some Unix wonks I worked with in the '80s claimed that the Berkeley Unix C compiler actually compiled "* 2" and "/ 2" into the bit-shift operations (at least, on the VAX 11/780, they said). But the thing to understand in my case is that, if I had unsigned bytes available, I wouldn't need to do any shifting at all. All that shifting around is to unpack bytes that are packed into ints.
1. How fast you can read the input (and that will involve I/O and network pipelines).
2. How fast you can process that input.
3. How fast you can write (or stream) it to wherever it is needed.
and your focus appears to be entirely on point (2).

Correct. I don't read the input so much as it is fed to me. The camera pumps it out at a fixed frame rate. I can buffer a few frames, but that does me no good if I can't eventually catch up. The alternative is to drop a frame every so often (my system actually does this on slower CPUs, and doing that was a multi-threaded b*tch I will describe on another day). I also don't really write my data out, either. The business of processing it results in it being stored in a buffer that the hardware uses to render it to the screen. That is, once I'm done with Step 2, Step 3 is actually done as well. (If you are familiar with Microsoft's DirectShow, you'll know how that works; if not, your analysis is still correct.)

I can understand your thinking because, as a Java programmer, that is basically the only area that you have any control over; however, there comes a point at which (2) will be fast enough - ie, at which your program, which may historically have been the bottleneck to the process, either
(a) has been fully optimized, or
(b) (and this, to me, is programming Nirvana) has been optimized sufficiently that it is no longer a bottleneck without compromising design.


Yes, your thinking makes sense. Here, "fast enough" is very rigidly defined: if my code can keep up with the camera's output frame-rate, it is fast enough. If it can't, it is not.

So, what can my program do? It can process its data in the quickest way possible; and in Java the simplest way to do that is by making every numeric entity (and your data would seem to comprise of nothing but) involved in a numeric operation an int.

Well, like I said, if the camera produced its output as buffers full of ints, where each int was a single color value (instead of three color values packed as bytes into what Java thinks is an int), I'd be in clover. The problem is not that I prefer bytes to ints; the problem is that "making every numeric entity involved in a numeric operation an int" would consume a substantial amount of the time I have between frames.
Memory, as we both know, is cheap and, above all, fast; so I can't imagine that the business of expanding bytes to ints, or contracting them at the other end, of a memory-based pipeline, is going to add anything more than a fixed and measurable overhead to the process; and furthermore, it can probably be threaded.


Fixed and measurable, yes. You are getting a good grip on my problem: it is an entirely numeric issue, with no real "i/o" as we tend to think of it (no file access, no database connection, none of that). Now, as it happens, I have got my program working pretty well with its inner loop optimized to the point where, for a 640x480 image, I can process the whole thing in just under 30ms. It is a 100% compute-bound loop; no waiting, blocking, or sleeping. At 30ms, I can keep up with a 30 frames-per-second camera, with a little CPU to spare. Yes, one could allocate a CPU on another thread to do the unpacking, but, here's the thing: my process works, but it could be better. I would rather add more code to my inner loop, and split my image into chunks, with each chunk being allocated to its own CPU, before giving up any CPU cycles to anything I don't really need to do. That is, even though I am "fast enough," that's only because I have reached an acceptable result. I could reach an even better result, if I had a faster CPU, or if I could use fewer CPU cycles on each frame than I currently am. In that regime, the idea of devoting any cycles to shuffling bytes in and out of ints seems like a waste of computing power I would rather use for more important things.


What you do with those numbers in the middle and how you do it is entirely up to you.

Like I say, I've never had to deal with the problem that you are, but I do wonder if you're not focusing on too small an area when it comes to optimization.


In this particular case, I'm pretty sure I'm not. What I am grappling with is a common problem in image-processing work. You've got an image; it's made of pixels; they're made of bytes; you need to do something to those bytes that grows O(n) in the number of bytes. Typically, that means you have an inner loop that you will do anything, anything, to optimize. No matter what you do, there is going to be some image size that takes your inner loop longer than the time between frames to complete, at which point, your inner loop is just not fast enough.

Again, all of this focus on bytes is because the camera (and, as far as I know, this is pretty much how they all work) hands me my color data in bytes, not ints. I have no choice but to be working in C, btw, as (and I have searched a lot on this) the DirectShow functions I need are not accessible from Java without native methods. So, in C, I get a buffer full of bytes representing the colors of pixels. I want my Java code to manipulate those color values, but Java wants ints for that since I need to work in an unsigned numerical range. So, I just stay in C for that part. It works fine, my code is (so far) fast enough, and I am right at the limit of what my CPU can do for me. I could make my algorithm better, but I need a faster CPU or a more efficient inner loop to improve it without dropping frames. In a situation like that, I literally don't have time to shuffle bytes.

Now, maybe I've been programming graphics stuff too long, but, when I saw that bytes are always [-128, 127], and never [0, 255], I not only got frustrated, but I did wonder, "What good is a signed type with a range like that? Who would use it? For what?" In my experience, the only use I've ever made of eight-bit data types has been for pixel color values (unsigned, by their very nature) and for storing text characters (also unsigned, by their very nature). I'm still curious: what good is a signed byte? Putting my question another way, if bytes are always going to be signed, why have bytes at all?
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2849

The way I look at it, Java is an excellent applications programming language, and C is an excellent systems programming language, and never the twain shall meet. Back when I used C++ for application programming, it was pretty common to find a bug where someone compared a signed value to an unsigned value and got unexpected results. In fact, Lint used to advertise in Dr. Dobb's every week with little "find the bug" puzzles, and unsigned/signed errors were a pretty common theme. These weren't stupid developers; it was just an easy mistake to make, especially as C allows typedefs, which may obscure whether a type is signed or not, and C++ allows operators to work on classes, so the definition of the storage type can be really obscured. I think that's what James Gosling meant. Developers certainly understand what unsigned and signed integers are, but they might not fully appreciate the dangers of mixing signed and unsigned types in an application. C++ fails, in my opinion, because it tries to turn C into an applications programming language without removing its systems programming features.

Similarly, you are using Java for what I would call a systems programming job. Because of that, you are hitting the reverse of the problems that C++ has. I think doing some of your work in native functions developed in C, or, hell, assembly, makes a lot of sense. If you're not using the right tool for the job, the fault isn't the tool's.

BTW, this is a very interesting thread. Thanks for that!
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716

Stevens Miller wrote:Now, maybe I've been programming graphics stuff too long, but, when I saw that bytes are always [-128, 127], and never [0, 255], I not only got frustrated, but I did wonder, "What good is a signed type with a range like that? Who would use it? For what?" In my experience, the only use I've ever made of eight-bit data types has been for pixel color values (unsigned, by their very nature) and for storing text characters (also unsigned, by their very nature). I'm still curious: what good is a signed byte? Putting my question another way, if bytes are always going to be signed, why have bytes at all?

Well, I have done a little bit of work (and timing) on lots of bitwise operations recently, and I know they're pretty darn quick; so I thought I'd check out a few things for you on my clunky old Dell.

The difference between a loop that does nothing and one that assigns an unsigned byte value (ie, b & 0xFF) to an int?
about .36 of a nanosecond - that is 0.36 seconds difference over 1 billion iterations.

Then I assigned it (still unsigned) to a 64K cyclically indexed int buffer: .75 of a nanosecond.

So, assuming you have a multi-core processor, that means you could thread that and pump unsigned ints into your mill program (hopefully running in another thread) at the rate of over a billion per second, which would be a pretty decent frame rate even at 1080p.

Now obviously, those figures need to be taken with a pinch of salt; but this is a 7 year old machine with a lot of creaks (in a lot of tests my 3 year old laptop actually runs faster).

Just thought you might be interested. Java bitwise ops are damn quick.
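For anyone who wants to reproduce that, a harness along these lines does the trick (a sketch; absolute figures vary with hardware and JIT warm-up, and the checksum exists only to stop the JIT from eliminating the loop):

```java
public class MaskBench {
    /** Copy src into dst as unsigned ints, cycling iters times; returns a
        checksum so the loop can't be dead-code-eliminated. src.length must
        be a power of two so the index mask works. */
    static long pump(byte[] src, int[] dst, long iters) {
        long checksum = 0;
        int mask = src.length - 1;
        for (long i = 0; i < iters; i++) {
            int idx = (int) i & mask;
            dst[idx] = src[idx] & 0xFF;  // the unsigned-byte assignment under test
            checksum += dst[idx];
        }
        return checksum;
    }

    public static void main(String[] args) {
        byte[] src = new byte[1 << 16];          // 64K cyclically indexed buffer
        new java.util.Random(42).nextBytes(src);
        int[] dst = new int[src.length];
        pump(src, dst, 10_000_000L);             // warm up the JIT
        long iters = 100_000_000L;
        long t0 = System.nanoTime();
        long sum = pump(src, dst, iters);
        double nsPerOp = (System.nanoTime() - t0) / (double) iters;
        System.out.printf("%.2f ns per op (checksum %d)%n", nsPerOp, sum);
    }
}
```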

Winston
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523

Winston Gutkowski wrote:Well, I have done a little bit of work (and timing) on lots of bitwise operations recently, and I know they're pretty darn quick; so I thought I'd check out a few things for you on my clunky old Dell.

The difference between a loop that does nothing and one that assigns an unsigned byte value (ie, b & 0xFF) to an int?
about .36 of a nanosecond - that is 0.36 seconds difference over 1 billion iterations.

Then I assigned it (still unsigned) to a 64K cyclically indexed int buffer: .75 of a nanosecond.


Yup, those numbers are roughly what I got in my tests (implicitly doubled, since you have to pack your bytes back again), about a year ago. For the work I am doing, that's time I would rather have available for something that affects the result, not compensates for a flaw in the language.

I mean, Winston... use a multi-core, multi-threaded (and you haven't mentioned how I synch this or the overhead involved in that yet) approach, just so I can get my bytes into ints? As a work-around for the fact that C has "unsigned byte" and Java doesn't? With all due respect, my colleague, are you kidding me? I mean, imagine if this were still in the "Beginning Java" forum and some new programmer arrived, curious to know how you cope with the lack of unsigned bytes in Java, and read this. Multi-processored, multi-threaded code... just because Mr. Gosling thought we were too dumb to get what "unsigned" meant. I don't think that's the kind of come-on that sells the product.

(To put your numbers into context, you're finding it takes about 360 ms to cope with 1Gb of data. I have 33 ms to do all of my processing on, roughly, 3Mb of data. Scaling that to match, doubling for the repacking, and adding a pinch for synch and other overhead that goes with threading, and you're proposing to use about 10% of my CPU, which, as it happens, is almost exactly what I have left, when I'm still trying to squeeze a few more cycles out for improving my result. I completely concur with your findings, but I would disagree that they describe a practical option for me.)
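The scaling in that parenthetical can be checked in a few lines; the constants below are the figures quoted in this thread, and the doubling is Stevens' unpack/repack factor:

```java
// Sanity check of the back-of-envelope scaling above, using the figures
// quoted in this thread (0.36 ns per masked assignment, roughly 3 MB of
// pixel data per frame, a 33 ms frame budget).
public class BudgetCheck {
    public static void main(String[] args) {
        double nsPerOp = 0.36;                  // Winston's measured cost per op
        double bytesPerFrame = 3_000_000;       // ~3 MB of pixel data
        double opsPerFrame = bytesPerFrame * 2; // doubled: unpack, then repack
        double msPerFrame = opsPerFrame * nsPerOp / 1_000_000;
        System.out.printf("%.2f ms of a 33 ms frame budget%n", msPerFrame);
        // ~2.16 ms raw; adding threading and synch overhead pushes it
        // toward the ~10% of CPU Stevens estimates
    }
}
```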

Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523
    
    3

Greg Charles wrote:The way I look at it, Java is an excellent applications programming language, and C is an excellent systems programming language, and never the twain shall meet.


That's a distinction I've never seen made before, though I don't say that's a reason to disagree with it.

C came along at a time when we needed it. A powerful language that provided features not available in a lot of other languages, in a form that worked on small platforms. The fact that most of the Unix OS could be written in C was, I believe, a testimonial to its flexibility. Whether that means C is a "systems programming language" or not, I wouldn't presume to say.

Back when I used C++ for application programming, it was pretty common to find a bug where someone compared a signed value to an unsigned value and got unexpected results. In fact, Lint used to advertise in Dr. Dobbs every week with little "find the bug" puzzles, and unsigned/signed errors were a pretty common theme.


I used lint all the time. That was before we had smart IDEs to help us find these things. I would risk those errors happily to have "unsigned byte" in Java.

These weren't stupid developers; it was just an easy mistake to make, especially as C allows typedefs, which may obscure whether a type is signed or not, and C++ allows operators to work on classes, so the definition of the storage type can be really obscured. I think that's what James Gosling meant. Developers certainly understand what unsigned and signed integers are, but they might not fully appreciate the dangers of mixing signed and unsigned types in an application. C++ fails, in my opinion, because it tries to turn C into an applications programming language without removing its systems programming features.


That's a very interesting take, and one I am not equipped to assess. I will only say that, if one finds "unsigned" too dangerous to use, then one shouldn't use it. I don't want to be told I can't be trusted with it, however. (I have never, in over 30 years of C programming, btw, used "typedef." Just never saw the point, really.)

Similarly, you are using Java for what I would call a systems programming job. Because of that, you are hitting the reverse of the problems that C++ has. I think doing some of your work in native functions developed in C, or, hell, assembly, makes a lot of sense. If you're not using the right tool for the job, the fault isn't the tool's.


Ahem. Image processing is hardly systems programming.

BTW, this is a very interesting thread. Thanks for that!


It ain't the thread, it's the seamstresses. I post here (and, pretty much only here, these days) because you get a better class of response.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Stevens Miller wrote:
Greg Charles wrote:Similarly, you are using Java for what I would call a systems programming job. Because of that, you are hitting the reverse of the problems that C++ has. I think doing some of your work in native functions developed in C, or, hell, assembly, makes a lot of sense. If you're not using the right tool for the job, the fault isn't the tool's.


Ahem. Image processing is hardly systems programming.


I think I agree with Greg here. What you are doing is not image processing in the traditional sense where speed only sort of matters in terms of responsiveness to the user. You are really programming an extension to the webcam, need to worry on a low level about system resources, and need to concentrate on an effective algorithm that can take full advantage of the hardware (the camera, and CPU) with minimal overhead. There is a gray line between system programming and application programming but I think based on what I read in this thread, your app leans on the system side.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716
    
  20

Stevens Miller wrote:Yup, those numbers are roughly what I got in my tests (implicitly doubled, since you have to pack your bytes back again)

Wrong. Again, you're thinking like a C programmer. You bang those ints out to another buffer and let another thread deal with compacting them.

I mean, Winston... use a multi-core, multi-threaded (and you haven't mentioned how I synch this or the overhead involved in that yet) approach, just so I can get my bytes into ints? As a work-around for the fact that C has "unsigned byte" and Java doesn't? With all due respect, my colleague, are you kidding me?



Actually, as I said before, the "unsigned" part has nothing (or very little) to do with it. The problem you're running into is that Java's basic unit of numeric operation is an int.
And what processor these days isn't multi-core? Or at the very least dual core. The fact is that in Unix, the standard pipe symbol (|) would do exactly what I'm talking about (ie, separate the processes of input, process and output) automatically; and in Java threading is a lot easier than it is in C/C++. By forcing everything into a single thread you've just prevented the OS (or JVM) from doing any parallel processing for you, so your latency will always be determined by the sum of the operations involved.

And synching is entirely up to you. If you make your buffers 64 meg instead of 64k, surely to God it's a relatively easy matter to 'chunk' your stream I/O so that sync points only occur every 64 or 128k? That's precisely how buffered I/O works.
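The chunked hand-off Winston describes might look something like the following; the chunk size, queue depth, and end-of-stream convention are all illustrative assumptions, not his design:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A sketch of chunked buffering: the producer fills 64K byte chunks and
// the consumer widens them to ints, so the threads only synchronise once
// per chunk rather than once per byte.
public class ChunkedPipeline {
    static final int CHUNK = 64 * 1024;

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(8);

        Thread producer = new Thread(() -> {
            try {
                for (int c = 0; c < 4; c++) {
                    // a real producer would fill this from the camera stream
                    queue.put(new byte[CHUNK]); // one sync point per 64K
                }
                queue.put(new byte[0]);         // empty chunk marks end-of-stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long total = 0;
        while (true) {
            byte[] chunk = queue.take();
            if (chunk.length == 0) break;       // end-of-stream
            int[] widened = new int[chunk.length];
            for (int i = 0; i < chunk.length; i++) {
                widened[i] = chunk[i] & 0xFF;   // unsigned widening in this thread
            }
            total += widened.length;
        }
        producer.join();
        System.out.println(total);              // prints 262144 (4 chunks of 64K)
    }
}
```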

If I've got time over the weekend I may try out some simple throughput tests of threaded and non-threaded solutions (and they will be simple; I ain't writin' no real-time video pixel-mangler), but I honestly think that you're obsessing on a very small part of your overall problem.

Winston
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523
    
    3

Steve Luke wrote:
Stevens Miller wrote:
Greg Charles wrote:Similarly, you are using Java for what I would call a systems programming job. Because of that, you are hitting the reverse of the problems that C++ has. I think doing some of your work in native functions developed in C, or, hell, assembly, makes a lot of sense. If you're not using the right tool for the job, the fault isn't the tool's.


Ahem. Image processing is hardly systems programming.


I think I agree with Greg here. What you are doing is not image processing in the traditional sense where speed only sort of matters in terms of responsiveness to the user. You are really programming an extension to the webcam, need to worry on a low level about system resources, and need to concentrate on an effective algorithm that can take full advantage of the hardware (the camera, and CPU) with minimal overhead. There is a gray line between system programming and application programming but I think based on what I read in this thread, your app leans on the system side.


I would say it is much closer to a game or real-time simulator. But, I would also say that supports your "gray line" point. However, it is interesting to note that Winston is making suggestions from the "Java is the right tool for what you are doing" perspective, while Steve is saying "Java is not the right tool for what you are doing."

What's that tell you?
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523
    
    3

Winston Gutkowski wrote:
Stevens Miller wrote:Yup, those numbers are roughly what I got in my tests (implicitly doubled, since you have to pack your bytes back again)

Wrong. Again, you're thinking like a C programmer. You bang those ints out to another buffer and let another thread deal with compacting them.


Well, you're handing out my cores pretty lavishly, and all for the business of unpacking/packing, which only exists to get around an aspect of the language: Java can't do what C can do. I'd still rather have unsigned bytes and use my cores for what I am trying to do, not to make up for a lack of a feature in Java that I have in C. (As a side question, what makes you think multi-threading is not something C programmers think of? I can assure you that my C-side code is heavily multithreaded, in some cases running on threads started by the Java side, in some cases started on the C side. I wouldn't agree that C programmers don't tend to use multi-threading.)

I mean, Winston... use a multi-core, multi-threaded (and you haven't mentioned how I synch this or the overhead involved in that yet) approach, just so I can get my bytes into ints? As a work-around for the fact that C has "unsigned byte" and Java doesn't? With all due respect, my colleague, are you kidding me?



Actually, as I said before, the "unsigned" part has nothing (or very little) to do with it. The problem you're running into is that Java's basic unit of numeric operation is an int.


I have to say that the "unsigned" part is why I have this problem at all. Consider: suppose I am averaging adjacent pixels (to blur them, which is a very common image-processing operation). I don't give a fig if the intermediate steps involved in summing two pixel values are treated as ints or not, but I need the values themselves to be regarded as positive. That is, a plain average like (p1 + p2) / 2 does what I want in C. In Java, it only works if p1 < 128 and p2 < 128.
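A sketch of the problem, assuming the expression in question is the plain average (p1 + p2) / 2 (the variable names are guessed from the surrounding text):

```java
// In C, with unsigned char operands, (p1 + p2) / 2 just works. In Java,
// bytes sign-extend when promoted to int, so the naive version is wrong
// whenever exactly one operand is >= 128 as a pixel value.
public class PixelAverage {

    static int naiveAverage(byte p1, byte p2) {
        return ((p1 + p2) / 2) & 0xFF;          // broken for mixed-sign operands
    }

    static int maskedAverage(byte p1, byte p2) {
        return ((p1 & 0xFF) + (p2 & 0xFF)) / 2; // mask first, then sum
    }

    public static void main(String[] args) {
        byte p1 = (byte) 200, p2 = (byte) 100;     // pixel values 200 and 100
        System.out.println(naiveAverage(p1, p2));  // prints 22, not the true average
        System.out.println(maskedAverage(p1, p2)); // prints 150, as intended
    }
}
```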

And what processor these days isn't multi-core? Or at the very least dual core. The fact is that in Unix, the standard pipe symbol (|) would do exactly what I'm talking about (ie, separate the processes of input, process and output) automatically; and in Java threading is a lot easier than it is in C/C++. By forcing everything into a single thread you've just prevented the OS (or JVM) from doing any parallel processing for you, so your latency will always be determined by the sum of the operations involved.


I think you may have misused the word "latency," there, but it actually addresses another issue. By pipelining my processing, you create a stream of frames. If each frame in the stream is processed within a frame-time at every step in the pipe (more precisely, if no core is bound to a frame for more than one frame-time), you definitely do keep up with the camera's output. But, you introduce a lag of as many frames as there are steps in the pipe. Even at 30 fps, you'd be surprised how few frames of lag (latency) it takes before you notice it and find it bothersome. My current system introduces two frames of latency. I could stand three, but not four.

And synching is entirely up to you. If you make your buffers 64 meg instead of 64k, surely to God it's a relatively easy matter to 'chunk' your stream I/O so that sync points only occur every 64 or 128k? That's precisely how buffered I/O works.

That would help undo the latency issue and would definitely be a viable approach. I still say it's massive complexity used to solve a problem that never should have existed in the first place, and doesn't exist in C (which is why I use C for that part of it).

If I've got time over the weekend I may try out some simple thoughput tests of threaded and non-threaded solutions (and they will be simple; I ain't writin' no real-time video pixel-mangler ), but I honestly think that you're obsessing on a very small part of your overall problem.


Well, you've proven before that you know your stuff, so I'll keep an open mind. But give me a bit of credit here, too: I've been pumpin' pixels for a very long time, sometimes through multi-threaded pipes. My obsession in this case is pretty well-placed.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716
    
  20

Stevens Miller wrote:I would say it is much closer to a game or real-time simulator. But, I would also say that supports your "gray line" point. However, it is interesting to note that Winston is making suggestions from the "Java is the right tool for what you are doing" perspective, while Steve is saying "Java is not the right tool for what you are doing."

I'm not sure I was saying that. What I'm saying is that you may not be focusing on the right areas if you want to do it in Java (and this is, after all, a Java forum). I suspect that the number crunching probably is quicker in C/C++ than in Java - and you don't have to worry about converting your streams - but how much quicker I have no idea (I suspect it's less than you think, though). A Java version is probably a lot more portable as well.

Winston
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716
    
  20

Stevens Miller wrote:...Does what I want in C. In Java, it only works if p1 < 128 and p2 < 128.

Or both are >= 128.

OK, so they have to be masked, which adds about .35 to the op if you're not prepared to convert.

I also tested the same op with shorts and chars, and the results are so close to int that I can't really make any definitive judgement; although char is generally the quickest - which suggests another solution: why not just use a Reader and Writer and deal with the pixels as characters? Then you're guaranteed unsigned results, and you may save some time on sheer volume.
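Winston's char idea in miniature (a sketch only; whether routing pixels through a Reader/Writer actually pays off is not tested here):

```java
// char is Java's only unsigned integral type (16-bit), so pixel
// arithmetic on chars never goes negative, though a cast back to
// char is needed to store the result.
public class CharPixels {
    public static void main(String[] args) {
        char p1 = 200, p2 = 100;          // pixel values held as chars
        int avg = (p1 + p2) / 2;          // promoted to int, but always non-negative
        System.out.println(avg);          // prints 150: no masking needed
        char stored = (char) avg;         // cast needed to store it back
        System.out.println((int) stored); // prints 150 (cast so println shows a number)
    }
}
```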

Fun stuff.

Winston
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523
    
    3

Winston Gutkowski wrote:
Stevens Miller wrote:...Does what I want in C. In Java, it only works if p1 < 128 and p2 < 128.

Or both are >= 128.

Everybody loves a smart-ass.

OK, so they have to be masked, which adds about .35 to the op if you're not prepared to convert.

And looks oh-so-readable.

I also tested the same op with shorts and chars, and the results are so close to int that I can't really make any definitive judgement; although char is generally the quickest - which suggests another solution: why not just use a Reader and Writer and deal with the pixels as characters? Then you're guaranteed unsigned results, and you may save some time on sheer volume.


Or, of course, I could just use a language that has an unsigned primitive in the size I get my data to begin with...

Fun stuff.


Quite. It is amazing to me just how fast modern hardware (and the JIT compiler, I would guess) handles these things. I just remain convinced that an unsigned byte primitive would be the better way.

I'll do a few timings of my own, since this discussion has suggested a few things I didn't know last time. Let's see what we come up with.
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 523
    
    3

Winston Gutkowski wrote:
Stevens Miller wrote:I would say it is much closer to a game or real-time simulator. But, I would also say that supports your "gray line" point. However, it is interesting to note that Winston is making suggestions from the "Java is the right tool for what you are doing" perspective, while Steve is saying "Java is not the right tool for what you are doing."

I'm not sure I was saying that. What I'm saying is that you may not be focusing on the right areas if you want to do it in Java (and this is, after all, a Java forum). I suspect that the number crunching probably is quicker in C/C++ than in Java - and you don't have to worry about converting your streams - but how much quicker I have no idea (I suspect it's less than you think, though). A Java version is probably a lot more portable as well.

My Java was much weaker when I designed the system than it is today, so I might well do more in Java today if I were to design it over again. But, I do think this is a case where some of the stuff I'm doing is best done in C. To an extent, I have no choice but to do at least some of it in C, since DirectShow only provides a C interface. (There's a C# wrapper out there, but I'll be honest and admit I don't know how to get it to work. DirectShow is so hard to use that I created native methods that deliberately hide most of what it does from the Java side, just so the messy stuff wouldn't invade my nice, neat, easy-to-understand Java side. If a Java wrapper is even possible, I still wouldn't want it, but that's because of DirectShow, not anything to do with Java.)

You know, it might be worth it here to explain that this issue is almost accidental in its origins. That is, eight-bit bytes have their origins in hardware choices made a long time ago. Some say they were the right size for EBCDIC characters. Others point to the natural progression of integral powers of 2 (since the evolution from eight-bit bytes has been to 16, then 32, then 64, and so on; indeed, even the hoary old 8008's predecessor, the 4004, was a four-bit processor). Regardless, no one chose the "byte" as we know it today because it was a tidy size for color-channel values. Quite independently, vision researchers (mostly coming from the world of photography, which, back then, had little use for computers) were saying that, from black to white, the human eye cannot differentiate adjacent shades any closer than some degree of change. In the early days of computer graphics, displays were, effectively, one-bit displays. A pixel (or vector) was either "on" or "off." Later, more bits gave us more colors. Eventually, someone asked, "How many bits do we need before we can present smoothly shaded images, instead of regions of single colors?" The vision scientists were saying we can only see 10,000,000 colors. As we all know, 2^24 = 16,777,216. Well, that's clearly bigger than 10,000,000, so the thinking was that we could nicely meet the limits of human vision with three eight-bit color channels, and make good use of existing hardware at the same time. One byte, one color, one channel. Neat, right?

Only, that turns out to have been based on a false assumption, which was that your eye can't distinguish any two shades of one color if each is on a range from 0 to 255, where 0 = black, and 255 = "full" red (or green, or blue). Firstly, "full" is relative. Some displays can cover a wider range of brightnesses than others. Further, hardware that naturally converted 256 values into 256 shades tended to do so in equal steps. But, most of us will see [0, 0, 0] and [1, 1, 1] as the same shade (that is, black) and [254, 254, 254] and [255, 255, 255] as also the same shade (white). But, in the middle, most of us can actually distinguish [127, 127, 127] from [128, 128, 128]. So why did we ever "settle" on eight-bit color channels? Because, for most things, it is good enough and, people being the sheep we are, the idea of one-byte/one-channel was just too good to pass up. In contemplation of our actual abilities to see different shades, some displays use 10-bit (and, at the very high end, 12-bit) color channels. Alas, if that had been standard practice at all levels from the start, my entire problem with unsigned bytes would go away, because, at a minimum, we would all be using shorts to store color values. Since even a signed short gives you (me, that is) 15 unsigned bits, I would have been happily doing all my graphics work in signed values, secure in the knowledge that no two 12-bit values would add together in a way that overflowed my primitive's range. But, two different lines of evolution converged on each other in a way that screwed me: computer architecture evolved into the standard eight-bit byte, and human vision evolved into being happy with (if not utterly fooled by) 256 distinct shades per each of three color channels. Intel and Almighty God (not the same person, no matter what they are saying in Santa Clara) shook hands on a number that, when Gosling was interviewing C programmers, just happened to be the one he wouldn't leave unsigned. 
If either the byte or the eye had been designed differently, none of this conundrum would have been affected by what the father of Java learned in those interviews.

Sorry if that's a boring history lesson, but I find it interesting and I also like to hear myself talk (what, you don't read aloud as you post in the forums? why not?), so there it is.

Everybody have a good weekend.

Stevens
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716
    
  20

Stevens Miller wrote:Sorry if that's a boring history lesson...

Not at all. I knew some of what you mentioned, but not all; and I really enjoy that background stuff.

BTW, have you ever watched QI? I think you might enjoy it.

Everybody have a good weekend.

You too. Good hunting.

Winston
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38509
    
  23
Stevens Miller wrote: . . . Everybody have a good weekend.

Stevens
Thank you
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7716
    
  20

Stevens Miller wrote:Or, of course, I could just use a language that has an unsigned primitive in the size I get my data to begin with...

Except that presumably you're not - or at least not completely - otherwise this discussion would never have started.

Winston
 