
Ryan McGuire

Rancher
since Feb 18, 2005

Recent posts by Ryan McGuire

Bill Platt wrote:

Paul Clapham wrote:Does your assignment allow you to do that?



There isn't anything specified that says that I can't do it.

Unfortunately, the data set that was provided is a table that contains almost 38K entries for customers, plus a table that contains the orders, so creating a table would be time-consuming.


Each state contains in excess of 500 entries, which is why I wanted to use the OR clause.

Thanks for the response.

--Bill



If I'm not mistaken, Paul was suggesting a simple 50-row-max table that showed the region for each state:


Now by JOINing and GROUP-BYing you should be able to get data on a region-by-region basis.

Also, sometimes you don't know ahead of time what kind of Animal you need.

3 months ago
Aha!!!  It turns out I left out some code that I didn't realize had a bearing on the question.  Here's some new and improved code:

A.js:


B.js:


When B.b() is called from line 5 of A.js, the "this" inside of b() is "global" so this.console.log() is valid, and everything works fine.  

However, when B.b() is called as in line 6, "this" is the object the method was called on (B, i.e. the exports of B.js) rather than the global object, so this.console is undefined and calling this.console.log() throws a TypeError.

In the real code I found this in, the B.b() function never actually used "this" for anything, so that didn't trigger any alarms.  I guess A.js is using the (0, B.b) idiom to make sure B.b is executed in the global context in case it ever does use "this".

I apologize for not providing all the relevant code the first time around.

Disclaimer: I still don't quite get all the subtle details, but I at least know what to do a web search for now.
Look at line 9.  What kind of argument does range() take and what does it return?  What kind of things can be operands for a subtraction?  Is there a function that can get the size of a list or array?
4 months ago

Mike Simmons wrote:I'm less impressed by the Stack Overflow post, particularly the part that gave us the addProperty() method discussed here.  The problem is that whether or not the individual methods are thread-safe (yes for StringBuffer, no for StringBuilder), the overall code gives unpredictable results  because you need more than one append() call to add a property, and there is nothing preventing two different threads from interfering with each other.  Even if we use a "thread safe" StringBuffer, the code using the StringBuffer is not thread safe.  So it's a confusing example.

Mike, regarding your original question, I would say that in general it's often pretty difficult to *demonstrate* a lack of thread safety.  Most of the time if you write unsafe code, it will probably work fine, most of the time.  But sometimes, a problem may occur. And when it does occur, it tends to be hard to understand the cause, especially since we can't reliably reproduce the problem.  



Agreed about the SO post.

In any but the nicest multi-threaded environment, the code given above probably won't give you what you want consistently.

If you call main() from two different threads the first one is likely to output "1=2,a=b,c=d,e=f" and the second one will give you "1=2,a=b,c=d,e=f,a=b,c=d,e=f".  (...where the trailing ",a=b,c=d,e=f" in the second output comes from the second thread.)

The second thread might take control between the addProperty() calls in main.  The second thread could complete first with output, "1=2,a=b,a=b,c=d,e=f" and then the first thread would restart and complete with "1=2,a=b,a=b,c=d,e=f,c=d,e=f"

The second thread might even interrupt between append() calls in addProperty().  Second thread: "1=2,a=b,c,a=b,c=d,e=f"  First thread: "1=2,a=b,c,a=b,c=d,e=f=d,e=f".

And of course there are other possibilities as well.

If you do indeed want the StringBuilder/StringBuffer to start with "1=2" and then have one copy of the other properties for each thread that runs main(), you could make main() thread-safe by making the chunk of code that calls addProperty() synchronized.  Google "java synchronized".
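
A minimal sketch of that synchronized approach follows. The original code is not reproduced in this post, so the class name PropertyDemo, the addAll() method, and the body of addProperty() are assumptions made up for illustration; only the "1=2" starting value and the a=b, c=d, e=f properties come from the outputs quoted above.

```java
public class PropertyDemo {
    // Shared buffer, assumed to start as "1=2" as in the outputs quoted above.
    private final StringBuilder props = new StringBuilder("1=2");

    // Hypothetical stand-in for addProperty(): several append() calls per
    // property, so it is not atomic on its own.
    void addProperty(String key, String value) {
        props.append(',');
        props.append(key);
        props.append('=');
        props.append(value);
    }

    // Stand-in for the chunk of main() that adds the properties. Every caller
    // locks on the same object, so the three calls run as one unit.
    String addAll() {
        synchronized (props) {
            addProperty("a", "b");
            addProperty("c", "d");
            addProperty("e", "f");
            return props.toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PropertyDemo demo = new PropertyDemo();
        Thread t1 = new Thread(() -> System.out.println(demo.addAll()));
        Thread t2 = new Thread(() -> System.out.println(demo.addAll()));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Whichever thread gets the lock first sees "1=2,a=b,c=d,e=f" and the other sees "1=2,a=b,c=d,e=f,a=b,c=d,e=f". With the lock held around all three calls, one thread can no longer cut in between another thread's addProperty() or append() calls.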



4 months ago
I ran across some code in a NodeJS application recently that boils down to the following:

In A.js


In B.js:


Why is line 4 of the first file written like that?  If I understand correctly, the comma expression in the parens evaluates both operands and has a value equal to the second, so the value of the comma expression is the B.b function.  B.b is then called with a as an arg and the result is assigned back to a.  As you might expect, the output of the code is "a is now 3".  I see what it's doing, but what was the point of using (0, B.b) instead of just plain B.b?

The code was too nicely indented and commented for this to be an obfuscation tactic.

Any thoughts?
We're 95% (probably more) a Windows shop.  However, there are a few Linux-based containers that we'd like to take advantage of.

Correct me if I'm wrong, but in general containers use the host kernel, so it's easy to run Windows containers on Windows hosts and Linux containers on Linux hosts, but it's difficult (though not impossible) to "cross host".  What is the best way to host Linux-based containers on a Windows installation of Docker?  Would such a thing be so horribly non-performant that it's not worth trying, or is it a reasonable way to go?

(BTW, don't try to convince me to completely switch over the entire enterprise to Linux.  It just ain't gonna happen.)
6 months ago

Campbell Ritchie wrote:

. . . 3 ms to complete for a data set of N=500, how long would you expect it to take for an N=50,000 . . .

At this point, you might start to move out of Big O notation; maybe 50000 executions will fill the available heap space and require multiple GC runs, causing space complexity to become its limiting factor. Repeated String concatenation is rather like that, though the Java9+ performance is much faster than in earlier versions.



Maybe if we increase the initial data set size by a factor of 3 or 5 or 12, instead of 100, we can avoid the big O expression becoming invalid while still making my point about the memorized table being restricted to only doubling the data.
8 months ago

Junilu Lacar wrote:But that kind of begs the question "Why try to come up with all these other approaches for this exercise when the 2nd column already tells you what to do?"



There are a handful of reasons I went into the math for figuring it out:
1. At least for myself, it's a lot easier to remember a single method to figure it out than to memorize a table.
2. By knowing the general method for figuring it out, you can figure out the estimates for other big O expressions that aren't in the table, such as the n²logn that Campbell contributed.
3. The 2x < work < 4x rubbed me the wrong way.  Even at the level of N=500 or 1000, the run time is multiplied by a factor of 2.2 (if I recall correctly).  As N continues to grow, the "4x" end of the range becomes meaningless.
4. The "increase by 1" for the O(logN) row in the table also rubbed me the wrong way.  As my first message in this thread suggested, giving a unitless number in this context is meaningless and can give wildly different results depending on what unit the original test run was measured in - 0.003 s, 3 ms, 3000 microseconds, 3000000 ns.  For t=3ms at N=500, the amount to be added for each doubling is 0.33 ms or so.
5. By working out the math, you can also estimate the time needed for data set sizes other than those that are some power of 2 larger than the known one.  e.g. If an O(N logN) algorithm takes 3 ms to complete for a data set of N=500, how long would you expect it to take for an N=50,000 data set?  Even if you say that N increased by a factor of 100, which means it "doubled" between 6 and 7 times, you still need to apply the rule from the table 6 and 7 times and then estimate somewhere in between.  If you know the math, you can just work it out directly.
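
As a rough sketch of point 5, the direct calculation could look like this in Java. The class and method names (BigOEstimate, estimateMs) are made up for illustration, and the code assumes the run time is exactly a coefficient times the big O expression, with no lower-order terms.

```java
import java.util.function.DoubleUnaryOperator;

public class BigOEstimate {
    // Estimate the run time at newN from a single measurement (knownN, knownMs),
    // assuming run time = c * growth(N) exactly.
    static double estimateMs(DoubleUnaryOperator growth, double knownN,
                             double knownMs, double newN) {
        double c = knownMs / growth.applyAsDouble(knownN); // solve for the coefficient
        return c * growth.applyAsDouble(newN);             // apply it to the new size
    }

    public static void main(String[] args) {
        DoubleUnaryOperator nLogN = n -> n * Math.log10(n);
        System.out.printf("N=1000:  %.1f ms%n", estimateMs(nLogN, 500, 3.0, 1000));
        System.out.printf("N=50000: %.1f ms%n", estimateMs(nLogN, 500, 3.0, 50_000));
    }
}
```

With the 3 ms at N=500 measurement this comes out to about 6.7 ms for N=1000 and about 522 ms for N=50,000. The base of the logarithm doesn't matter for the result, because any constant factor is absorbed into the coefficient.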


8 months ago

Ana Smith wrote:Thank you, also there is math associated with this so I want to learn how to prove that it is 2x<work<4x



As far as the math goes, I'm going to point at my first post in this thread:  Use the known data set size and time to determine a coefficient, and then use that coefficient to estimate the time for the larger data set size.

c * 500 * log(500) = 3 ms
c * 500 * 2.7 = 3 ms
c * 1350 = 3 ms
c = 0.00222 ms

0.00222 ms * 1000 * log(1000) = t
0.00222 ms * 3000 = t
6.66 ms = t

So yes... the time estimate for 1000 is between 2x and 4x the measured time for 500.

The worst part about this, as has been alluded to earlier, is that it provides an exact number for what is really an estimate at best, thus potentially giving you a false sense of security.  While the time is O(N logN) as N gets very large, there may well be lower-order terms that contribute significantly to the run time in the 500-1000 range.

HOWEVER, I do think that working through the math will let you see what's contributing to the estimates.  As you can see above, for O(N logN), doubling N from 500 to 1000 will take the value that the coefficient above is multiplied by from 1350 to 3000 - i.e. a little more than double.  You can backtrack through the math I did above to see where those two numbers came from for the two data set sizes.

8 months ago
I guess it's just a question of degree.  You agreed with me that using a unit of one second or one minute is unreasonable, even though using either of those units would also yield a perfectly fine log curve.  My point is that using a unit of ms, which is still off by a factor of three, is also unreasonable.  However, I will concede that being off by a factor of 3 is certainly more reasonable than being off by a factor of 3000 or 180,000.
8 months ago

Junilu Lacar wrote:I don't think the point of the exercise is about getting precise measurements. Big O is about worst case. What would O(250), O(125), and O(62) be then? Are we really concerned about what those exact numbers are in this exercise?



I would say that the point of the exercise is to give the student an idea of how big O notation works.  If a bit of notation in the table is meaningless, then it fails at supporting the point of the exercise.  

I would change "increased by 1" to "increased by a constant amount".  At least that would be correct, even though you would still have to do the math to figure out what that "constant amount" is.
8 months ago

Junilu Lacar wrote:

Ryan McGuire wrote:For the O(log N) row...  Increased by 1 what?  ms?  s? minute?


I would assume the student is expected to apply critical thinking here as well. Understanding the shape of the growth curve of O(log N) is important. If n = 500 is completed in 3 ms, is it really reasonable to think that n = 1000 would take 1 whole second longer, much less 1 whole minute longer?



I 100% agree that some critical thinking is needed.  My point is that "increased by 1" is meaningless exactly because it doesn't indicate what the units are.  In fact, if you work through the math, the amount that is added for doubling of the data set size from 500 to 1000 is considerably less than 1 ms.  Once you know what that additional time is when increasing the data set size to 1000, you can easily work out the expected run times for data set sizes 2000, 4000, 8000, etc.
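
Here is a small sketch of that calculation for the O(log N) row, again using the 3 ms at N=500 measurement. The names LogDoubling and addedPerDoublingMs are invented for this example, and it assumes the run time is exactly c * log(N).

```java
public class LogDoubling {
    // Time added by each doubling of N for an O(log N) algorithm,
    // derived from one measurement (knownN, knownMs).
    static double addedPerDoublingMs(double knownN, double knownMs) {
        double c = knownMs / Math.log10(knownN); // coefficient from the known run
        return c * Math.log10(2);                // log(2N) - log(N) = log(2)
    }

    public static void main(String[] args) {
        double step = addedPerDoublingMs(500, 3.0);
        double t = 3.0;
        for (int n = 1000; n <= 8000; n *= 2) {
            t += step; // the same constant amount is added for every doubling
            System.out.printf("N=%d: %.2f ms%n", n, t);
        }
    }
}
```

The step works out to roughly 0.33 ms, so each doubling adds about a third of a millisecond rather than "1" of anything.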

8 months ago
For the O(log N) row...  Increased by 1 what?  ms?  s? minute?

How to do these in general:

1. Create an equation that sets the known time equal to an unknown coefficient times the big O expression with the known value for N substituted in.
2. Solve that for the unknown coefficient.
3. Use that coefficient with the other known value for N to get a (reasonable estimate of the) time for that N value.

Let's try the O(N) row.  The known N is 500 and the known time is 3 ms.

Step 1: Let's call the unknown coefficient "c":  3ms = c * 500
Step 2: c = 3ms / 500.
Step 3: t = (3ms/500) * 1000, so t = 6ms.
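
The same three steps can be written once for any row. The symbols here are introduced just for this illustration: f is the big O expression, N_1 and t_1 are the known size and time, and N_2 and t_2 are the new size and its estimated time. The coefficient cancels out, leaving a simple ratio:

```latex
t_1 = c \, f(N_1)
\quad\Rightarrow\quad
c = \frac{t_1}{f(N_1)}
\quad\Rightarrow\quad
t_2 = c \, f(N_2) = t_1 \cdot \frac{f(N_2)}{f(N_1)}
```

For the O(N) row, f(N) = N, so t_2 = 3 ms * 1000 / 500 = 6 ms, matching step 3.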

Now do that for the other rows.

For the O(log N) row, the equation for step 1 would be c * log(500) = 3 ms.

You do the rest.
8 months ago

Sam Peterson wrote:Why does the greater than print out? We exit the while loop when i is equal to 10, not greater than 10.



Similarly, you might also ask why the line "10<10" is printed, since that is clearly false and you seem to have guarded against executing the println() when i is not strictly less than 10.

If i has an initial value of, say, 9, the expression i++ has the effect of incrementing i by 1 up to 10, but the expression itself has the "old" value of i, which is 9.

Try editing your code so that the while loop condition uses the prefix increment operator instead of the postfix one:



What does the program output now?  Do you see why there's a difference?
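
For reference, here is the difference between the two operators in isolation. This is a standalone sketch, not the original program from the question; the class name IncrementDemo and the variable names are made up.

```java
public class IncrementDemo {
    public static void main(String[] args) {
        int i = 9;
        // Postfix: the comparison uses the old value 9; i is 10 afterwards.
        boolean postfix = i++ < 10;
        System.out.println(postfix + ", i=" + i); // prints "true, i=10"

        int j = 9;
        // Prefix: j is incremented to 10 first, and 10 is what gets compared.
        boolean prefix = ++j < 10;
        System.out.println(prefix + ", j=" + j);  // prints "false, j=10"
    }
}
```

Both variables end up at 10, but only the postfix form lets the comparison succeed, which is why a loop body guarded by a postfix test can still see a value that would have failed it.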
9 months ago