Rick Copeland

Member since Apr 09, 2013
Recent posts by Rick Copeland

Mark Spritzler wrote: Also, it would be great to know if they will ever allow you to use one of the field values from one attribute in the update of another attribute.

Unfortunately, it's not currently possible to update one field based on another field's values in MongoDB (like adding two fields together, for instance). You can do queries similar to this using the aggregation framework, but it's not really the same.
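To illustrate the aggregation-framework workaround (field and collection names here are invented for the sake of the example): the pipeline can *compute* a value from two fields at query time, but it cannot write that value back into the document the way an update would.

```javascript
// Hypothetical shape: documents with numeric fields 'a' and 'b'.
// This pipeline projects a computed 'total' = a + b for each document,
// but the stored documents themselves are left unchanged:
const pipeline = [
  { $project: { total: { $add: [ "$a", "$b" ] } } }
];

// In the mongo shell you would run something like:
//   db.things.aggregate(pipeline)
```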
9 years ago
Doing a spike to figure out the best schema is often the easiest way to get to a good schema, unfortunately. At this point in time, NoSQL schema design is in its infancy compared with RDBMSs. Some things that I would keep in mind when designing a schema:

  • When embedding an array inside a document, make sure the array doesn't grow without bound (so it won't overrun the 16MB document size limit)
  • Make sure that embedded arrays don't grow too frequently, as frequent growth can turn a fast in-place update into a slow copy operation
  • Try to keep things that need to be updated atomically inside the same document, since making correctness guarantees across documents without real transactions is tedious and error-prone.
  • Ideally, have the 'unit of work' you're dealing with only touch a single document. In a web app, for example, this would mean that each page only loads a single document.

Of course, many of these rules of thumb can't be satisfied all the time, and thus comes the 'art' of schema design in MongoDB. The best advice I can give is to try different approaches, turn on slow query logging and the profiler, make sure you have the right indexes defined, and be ready to adapt when you find a better way to do it.
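As a sketch of the first two points about embedded arrays (collection and field names are made up, and this assumes the $push/$slice combination available in MongoDB 2.4+): you can append to an embedded array while keeping it capped, so it never grows without bound.

```javascript
// Append a comment but keep only the 100 most recent ones, so the
// embedded 'comments' array stays bounded and the document stays
// well under the 16MB limit. (Names here are hypothetical.)
const update = {
  $push: {
    comments: {
      $each:  [ { author: "alice", text: "Nice post!" } ],
      $slice: -100   // negative slice keeps the last 100 elements
    }
  }
};

// Shell usage: db.posts.update({ _id: postId }, update)
```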

    Hope that helps!
    9 years ago
    Well, you can remove several elements from an array, or you can update a single element, but there is no way to update multiple elements in a single pass unless you replace the array all at once. To do the replacement, consider that you might want to increment all the integers in the 'b' property of the following document:

    In this case, you might do the following in Javascript:
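A minimal sketch of what that might look like (the concrete document below is assumed; any document whose 'b' field is an array of integers works the same way):

```javascript
// Assumed example document:
const doc = { _id: 1, b: [ 1, 2, 3 ] };

// Increment every element, then replace the whole array at once:
doc.b = doc.b.map(function (x) { return x + 1; });

// In the mongo shell you would then persist the replacement, e.g.:
//   db.things.save(doc);
// or equivalently:
//   db.things.update({ _id: doc._id }, { $set: { b: doc.b } });
```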

    Hope that helps!
    9 years ago
Well, the question of whether you really need transactions in banking is actually an interesting one... generally, banks do not use ACID transactions for everything; they instead use methods to bound the risks of inconsistency and perform periodic reconciliation of their ledgers. For instance, it's important to a bank that you be able to withdraw funds from an ATM even if that ATM cannot connect to the server, so they typically put a limit of $500 or so on withdrawals from an ATM. So it's possible to overdraw an account, but not by much.

    My book has a specific example in the ecommerce chapter describing how you can implement transaction-like guarantees (in this case, for inventory management) using a technique that I call optimistic update with compensation.

If you really need ACID transactions, you can actually build a 2-phase commit structure atop MongoDB collections; I show how to do this in chapter 3, IIRC. I will warn you, however, that it's really easy to forget an edge case when you're building something like this, so if you can restrict your atomic updates to a single document, you'll be much less likely to make a mistake.
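The book's actual inventory example is more involved; the following toy sketch (in-memory objects standing in for documents, with invented names throughout) shows just the conditional-update core that "optimistic update with compensation" relies on: apply a change only if the document still looks the way it did when you read it, and retry or compensate otherwise.

```javascript
// Stand-in for a findAndModify-style conditional update: apply
// 'change' only if every field in 'query' still matches the document.
function conditionalUpdate(doc, query, change) {
  for (const key in query) {
    if (doc[key] !== query[key]) return false; // lost the race
  }
  Object.assign(doc, change);
  return true;
}

const inventory = { _id: "sku-1", qty: 5 };

// We read qty = 5 and want to reserve 2 units; the update succeeds
// only if qty is still 5 at the moment it is applied:
const reserved = conditionalUpdate(inventory, { qty: 5 }, { qty: 3 });

// On failure you would re-read the document and retry (or run a
// compensating action), rather than holding a lock or a transaction.
```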
    9 years ago
I suppose it depends on how you want to use the object once it's in MongoDB. You certainly can choose to just shove everything into a GridFS "file", but you lose query-ability (a GridFS file should be treated like a BLOB would be in an RDBMS).

    If the object is over 70MB and you don't want to decompose it structurally, you are going to need to store it in GridFS, since MongoDB documents are limited to 16MB each.

    Hope that helps!
    9 years ago

Pradeep bhatt wrote: ok cool. Also i think mongo db does not support ACID properties of transactions, this may speed up db operations

    This is true; MongoDB performs atomic operations, but they are limited to a single document.
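For instance, a single update document can touch several fields of one document atomically (field and collection names here are invented):

```javascript
// Both counters move together or not at all, because they live in
// the same document; no transaction machinery is involved:
const update = { $inc: { pageViews: 1, "stats.todayViews": 1 } };

// Shell usage: db.pages.update({ _id: pageId }, update)
// Across two documents, no such guarantee exists without extra work.
```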
    9 years ago
    You have two basic options for logging to MongoDB. One is to modify your code to log to MongoDB. Another option is to have a separate process that runs periodically (perhaps at log rotation time) to aggregate the log files into MongoDB. The second approach has the advantage of not requiring any changes to your existing applications, but it does mean that the log data in MongoDB will be somewhat out-of-date.

    As for geospatial indexing, I'm not sure of anything that Microsoft/Yahoo may have, so I'll constrain my comments to what MongoDB does. MongoDB does not contain any map data (unless you put some in). The geospatial indexing in MongoDB allows you to store geospatial coordinates (longitude and latitude) for different objects and then execute queries like "find objects in a certain collection, sorted by how close they are to a certain point" or "find all the objects with coordinates that fall within a particular circle" -- things like that.
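In shell terms, those two kinds of query look roughly like the following (collection name and coordinates are invented, and this assumes a 2d index on the 'loc' field, e.g. db.places.ensureIndex({ loc: "2d" })):

```javascript
// "Find objects sorted by how close they are to a certain point":
const nearQuery = { loc: { $near: [ -73.99, 40.73 ] } };

// "Find all the objects with coordinates that fall within a
// particular circle" (center point plus radius, in the same units
// as the stored coordinates):
const circleQuery = {
  loc: { $geoWithin: { $center: [ [ -73.99, 40.73 ], 0.25 ] } }
};

// Shell usage:
//   db.places.find(nearQuery)
//   db.places.find(circleQuery)
```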

    I hope that helps!
    9 years ago
    Using just the native Java driver, no, you have to construct special BSON objects. However, there is an open-source "Object-Document Mapper" available for Java called Morphia. I have not used it myself, but I have heard good things about it.
    9 years ago
    There's actually a Second Edition of the Definitive Guide that just came out... probably worth checking it out, as well.
    9 years ago
    I'd say that MongoDB is the one with the widest uptake. Each one has its own strengths and weaknesses; the video linked above is probably a good starting point.
    9 years ago
    MongoDB is one of the databases that people mention as being well-suited for "big data," though scaling is always a tricky thing to do right.

    Since MongoDB does provide a fairly straightforward way to do sharding (partitioning) of your data across many physical servers, it tends to scale well horizontally. So if your data can be partitioned that way, MongoDB is possibly a good fit.

    So in summary, yes, it scales well to big data, though maybe not as well as something like Apache Hadoop (though it is a good bit easier to use than Hadoop).
    9 years ago
    Thanks for the welcome! Looking forward to it!
    9 years ago
    Well, that's a pretty open-ended question, which is probably why the answers you've gotten so far have been unsatisfactory.

    In general terms, yes, MongoDB is great for structured data (also for semi-structured data). Some pros and cons to consider (with the caveat that I'm painting with a broad brush, and there are exceptions to every rule):

    MongoDB over SQL
    - MongoDB has a more flexible model (hierarchical documents versus rows/columns only)
    - MongoDB is easier to scale horizontally (automatic sharding & built-in async replication, etc.)
    - MongoDB schema migrations tend to require less downtime than SQL migrations
    - MongoDB really shines when you have a single application using MongoDB as a persistence engine since you can model your schema after the queries you need
    - In many cases, MongoDB requires fewer random disk seeks than SQL due to its document structure

    SQL over MongoDB
    - SQL typically takes much less space on disk due to its static schema
    - SQL databases (particularly normalized databases) offer a great persistence layer for sharing between multiple applications
    - It's easier to find people who know SQL than MongoDB
    - SQL databases scale better vertically

    There are probably a good number of other trade-offs, but those are the ones I can think of right now.

    Hope that helps!
    9 years ago
    I think that you'd get the most out of it if you already had a basic knowledge of Python and MongoDB (completing the online tutorials would be more than enough). If you have basic familiarity with Python, you should be fine with the examples in the book. Likewise with Javascript (I'm not much of a Javascript programmer myself, but it's impossible to say much about MongoDB without mentioning Javascript, since it's the embedded scripting language).
    9 years ago
    I think that's a reasonable approach as a research topic. Some sub-topics I'd expect to see covered in such a project would be the CAP theorem, ACID transactions, and two-phase commit (all should be easily searchable on Wikipedia). As for implementation, I'd focus on the physical storage layer (some SQL databases have pretty advanced journalling and buffer management systems; NoSQL tends to be more basic: MongoDB uses the virtual memory system and memory-mapped files, for example). The approach to distributing load across many physical machines (horizontal scaling) and the approach to maximizing performance on a single machine (vertical scaling) could also provide a nice bit of material.

    I think the performance evaluation flows out of the use case, and the constraints that each system imposes on you, so rather than saying "SQL is faster" or "NoSQL is faster", it might be more appropriate to show problem domains where SQL or NoSQL (or certain types of NoSQL) databases each excel.
    9 years ago