Kyle Banker

author
Greenhorn
since Nov 22, 2011
Engineer at 10gen. Author of MongoDB in Action.
New York, NY

Recent posts by Kyle Banker

Hi Arun,

The first chapter of the book provides an overview of many NoSQL technologies and compares them to MongoDB. But the book is really devoted to MongoDB. If you'd like a deep dive into a single technology, then this will be a good book. But if you just want to learn about all these technologies as a whole, there are probably better resources.

Regards,
Kyle
6 years ago
Hi Jason,

The book has a few examples in Ruby and samples in an Appendix for Java. But on the whole, the book is language-agnostic. The techniques you'll learn will apply to any language environment.

Kyle
6 years ago
Hi Thomas,

What I can say is that although both are classified as "document databases", they are quite different.

There are huge differences in architecture, storage engine, etc. MongoDB has single-master replication with automated failover. CouchDB has multi-master replication, which requires client-side resolution of conflicts. MongoDB scales by partitioning data across shards. CouchDB is purely replication-based, if I recall correctly.

I do know that there have been big changes in the CouchDB project of late (see CouchBase) but don't know much about them. I'm also not familiar with CouchDB design documents. Sorry I can't be more help!

Kyle
6 years ago
Hi Marcus,

Yes, you can use any value for _id so long as it's unique.

In MongoDB, there's no multi-master replication and within a single node, all writes are serialized. So conflicts aren't possible. Last write wins.

In your example, whether the writes conflict depends on how you issue them. If you use $set to set the value of the field containing said sentence, then, as before, the last write will win. You can use optimistic locking to prevent one writer from overwriting a concurrent change, as you might do with an RDBMS. MongoDB, like CouchDB I believe, does not support transactions.
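
To make the optimistic locking idea concrete, here is a minimal sketch in Python. It uses an in-memory dict in place of a real collection, and the `version` field, `update_if_unchanged`, and `StaleWriteError` are hypothetical names chosen for illustration, not part of MongoDB or any driver.

```python
class StaleWriteError(Exception):
    """Raised when a document changed after the caller read it."""


def update_if_unchanged(store, doc_id, expected_version, changes):
    """Apply changes only if the stored version still matches what was read."""
    doc = store[doc_id]
    if doc["version"] != expected_version:
        raise StaleWriteError(
            f"{doc_id}: expected v{expected_version}, found v{doc['version']}"
        )
    doc.update(changes)
    doc["version"] += 1  # every successful write bumps the version
    return doc


# Stand-in for a collection, keyed by _id (an assumption for this sketch).
store = {"post-1": {"_id": "post-1", "sentence": "first draft", "version": 1}}

# Two writers read the same version before either one writes.
seen_by_a = store["post-1"]["version"]
seen_by_b = store["post-1"]["version"]

update_if_unchanged(store, "post-1", seen_by_a, {"sentence": "edit from A"})

try:
    update_if_unchanged(store, "post-1", seen_by_b, {"sentence": "edit from B"})
    b_succeeded = True
except StaleWriteError:
    b_succeeded = False  # B must re-read and retry instead of overwriting A
```

Against a real server, the comparison and the write would need to happen in a single update whose query matches both `_id` and the expected version, so that no other writer can slip in between the check and the change.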

Kyle
6 years ago
Hi Sujoy,

For what it's worth, most Java developers I know use Morphia:
http://code.google.com/p/morphia/

Regards,
Kyle
6 years ago
Hi Kandpal,

Yes, there's an entire chapter on sharding in my book. You can also read a lot about it in the online docs:
http://www.mongodb.org/display/DOCS/Sharding+Introduction

Kyle
6 years ago
Hi Paul,

You have to run it as a separate process. MongoDB is written in C++, so it wouldn't be embeddable in Java anyway.

Kyle
6 years ago
Hi Askar,

I'd recommend using a single collection as opposed to a collection per user. You'll get much better space utilization this way, and as you saw, collections can eventually be sharded if needed.

What is the nature of the event? We often recommend pre-aggregating the data using counters within the document. You can see some concrete examples of this technique in the following presentation:
http://speakerdeck.com/u/mongodb/p/mongodb-for-analytics-john-nunemaker-ordered-list
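
As a rough illustration of pre-aggregating with counters, the Python sketch below builds the query and the `$inc` update for a single page view. The document layout (one document per page per day, with `total` and `hourly` fields) and the function name are assumptions made for this example; they are not taken from the presentation.

```python
from datetime import datetime


def view_counter_update(page, ts):
    """Build the query and update that bump one day's counters for a page."""
    # One document per page per day (hypothetical layout).
    query = {"page": page, "day": ts.strftime("%Y-%m-%d")}
    # $inc adds to a field and accepts dotted paths into sub-documents.
    update = {"$inc": {"total": 1, f"hourly.{ts.hour}": 1}}
    return query, update


query, update = view_counter_update("/home", datetime(2011, 11, 22, 14, 5))
```

The pair would typically be sent to the server as an upsert, so the first event of the day creates the document and later events only increment it. Reading a day's totals is then a single document fetch rather than a scan over raw events.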

Kyle
6 years ago
Hi Arun,

My two cents: there's way too much variety among the databases billed as NoSQL to be able to say anything definitive about them. So, for me, the short answer is no.

Regards,
Kyle
Hi Alessandro,

There are quite a few differences. The primary purpose of a graph database is to represent graphs and provide easy traversal of them. MongoDB, on the other hand, has been built as a generic data store with a rich query language that can replace a relational database or a key-value store for a number of use cases (web applications, e-commerce, caching, session storage, analytics, etc.).

Kyle
6 years ago
Hi Arun,

I can speak for MongoDB, which does make it easy to change schemas on the fly. It's easy, for instance, to do so lazily by writing code that can respect the old schema while updating each document when it's next accessed. If changing the schema also means changing indexes, well, that's the sort of schema migration that's essentially analogous to that of an RDBMS: it may require some planning.
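
A minimal sketch of the lazy approach, in Python with an in-memory dict standing in for a collection. The `schema_version` field, the name-splitting change, and both function names are hypothetical and exist only to illustrate the pattern.

```python
CURRENT_SCHEMA = 2


def upgrade(doc):
    """Return the document in the current shape, plus whether it changed."""
    # Documents written before versioning are treated as version 1.
    if doc.get("schema_version", 1) >= CURRENT_SCHEMA:
        return doc, False
    upgraded = dict(doc)
    # Hypothetical change: v1 stored one "name" string, v2 splits it in two.
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["schema_version"] = CURRENT_SCHEMA
    return upgraded, True


def load_user(store, user_id):
    """Read a user, migrating the stored document the first time it is touched."""
    doc, changed = upgrade(store[user_id])
    if changed:
        store[user_id] = doc  # write the new shape back
    return doc


store = {
    "u1": {"_id": "u1", "name": "Ada Lovelace"},
    "u2": {"_id": "u2", "first_name": "Alan", "last_name": "Turing",
           "schema_version": 2},
}
user = load_user(store, "u1")
```

Documents that are never read stay in the old shape, so the reading code has to keep understanding both versions until a background pass (or enough traffic) has touched everything.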

Kyle
6 years ago
Hi Khalil,

No, I don't think that they will "dominate" per se. I believe that RDBMSs will be important for a long time. However, I do think that there are quite a few use cases that are more compelling for NoSQL databases than for RDBMSs.

Kyle
Hi Karthik,

The MapReduce API is a bit difficult to use, I agree. But the biggest problem with MongoDB's MapReduce is that it's kind of slow for large data sets. MongoDB 2.1 (unstable) includes the new aggregation framework that will replace MapReduce for most purposes. It's a lot faster, if a bit less flexible, than the current MapReduce. You can read about it here:
http://www.mongodb.org/display/DOCS/Aggregation+Framework

The main reason I chose not to spend a lot of time in the book on MapReduce is that this new framework will become much more relevant to most developers with the next release.

Regards,
Kyle
6 years ago
Hi Chris,

Thanks for the question.

I cannot completely assure you that people aren't learning MongoDB at least in part because they don't want to learn or use SQL. MongoDB has a fairly intuitive query language, and it's a small reason why people gravitate toward it.

As with any database, success with MongoDB does depend on being able to think critically about how you're going to use your data. MongoDB supports rich, dynamic data structures, and you can change them on the fly without having to issue ALTER TABLE statements, but this does not mean that you can get away with sloppy data modeling.

We've seen the same problem of people wanting to use ORMs to hide the realities of the database, but this doesn't work for non-trivial applications, just as it doesn't for SQL databases.

As for use cases, I'd say that MongoDB is ideal for these situations:

1. When the application's data is inherently unstructured. Think products in an e-commerce site. Each product can have an arbitrary set of attributes. MongoDB documents make this pretty easy to model.

2. Rich data models that don't require joins. You'll often see relational schemas that break a single "object" into a dozen different tables. If the object in question has to be constructed using a SQL join every time it's displayed, and if there's no ancillary benefit to having the data modeled in this way, then there's a lot of unnecessary added complexity there. Consider a page in a content management system. Why does each individual element need to be in a separate record or table? A MongoDB document can typically store all these elements in a structured way while still facilitating sophisticated queries over them. This has the added benefit of providing good locality.

3. Analytics. I won't go into detail now, but there are certain types of analytics applications (think website activity tracking) that MongoDB has been optimized for.

4. High availability. MongoDB's replication system provides automated failover.

5. Sharding. If you have a lot of data but want to run on commodity hardware, MongoDB's sharding can be quite compelling.

Hope that helps!

Kyle
6 years ago