posted 4 years ago
Hello
Cassandra is a database built for write-intensive workloads. Its write performance is generally higher than that of most other NoSQL databases. Cassandra follows a peer-to-peer architecture, as opposed to the master-slave architecture of MongoDB and most RDBMS setups. That means you can write to any peer and Cassandra will take care of synchronizing the data, which is a large part of why writes are fast.
Having said that, Cassandra has some shortcomings when it comes to querying data, so data modeling is the most important part of using Cassandra well. To keep reads and writes fast, Cassandra essentially lets you query only by the primary key. The partition key splits the data into partitions, so Cassandra can work out from the partition key which partition holds your data. The clustering key keeps the rows within a partition stored in sorted order. I am not aware of a way to do custom sorting on an arbitrary field in Cassandra. Of course, you can create secondary indexes on fields outside the primary key in order to query by them, but the moment you do that you can degrade performance drastically.
All this makes data modeling quite a challenge in Cassandra. Often you model the data for one requirement, and when a new requirement comes along you have to change the data model again. Cassandra also has a steeper learning curve than MongoDB.
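To make the partition key / clustering key idea a bit more concrete, here is a small toy model in plain Python. It is only a sketch of the concept and not how Cassandra is actually implemented: the class `ToyTable`, its methods and the node names are all made up for illustration, and the simple hash-modulo placement stands in for Cassandra's real token ring.

```python
import bisect
import hashlib
from collections import defaultdict


class ToyTable:
    """Toy model of a table with a partition key and a clustering key.

    Hypothetical illustration only; not Cassandra's real storage engine.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # node name -> partition key -> list of (clustering_key, value), kept sorted
        self.storage = {node: defaultdict(list) for node in self.nodes}

    def node_for(self, partition_key):
        # Hash the partition key to pick a node (assumption: simple modulo
        # placement instead of a token ring).
        digest = hashlib.md5(str(partition_key).encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def write(self, partition_key, clustering_key, value):
        rows = self.storage[self.node_for(partition_key)][partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)  # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, value))  # keep clustering order

    def read(self, partition_key, start=None, end=None):
        # Only one partition on one node is touched; rows come back already
        # sorted by clustering key, optionally limited to an inclusive range.
        rows = self.storage[self.node_for(partition_key)].get(partition_key, [])
        return [
            (k, v)
            for k, v in rows
            if (start is None or k >= start) and (end is None or k <= end)
        ]


table = ToyTable(["node-a", "node-b", "node-c"])
table.write("sensor-1", 30, "c")
table.write("sensor-1", 10, "a")
table.write("sensor-1", 20, "b")
print(table.read("sensor-1"))            # rows in clustering-key order
print(table.read("sensor-1", start=15))  # range scan inside one partition
```

Notice there is no method to look rows up by `value`. In this model that would mean scanning every partition on every node, which is roughly the reason queries on non-key fields are expensive.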
The best tool
Apache Spark
Spark is an open-source processing engine built for speed, ease of use and sophisticated analytics, and it is often used as a framework on which to build analytics tools.
Spark has a huge amount of backing, with over 750 contributors from over 200 organizations working to develop and advance it.
Companies such as Hortonworks and IBM have been busy integrating Spark capabilities into their big data platforms, and it could be set to become the default analytics engine for Hadoop.
I hope this helps.