Tuesday, February 10, 2015

NoSQL – HBase vs Cassandra vs MongoDB




What is NoSQL?
NoSQL refers to a class of data management technologies designed to meet the increasing volume, velocity, and variety of data. NoSQL databases store and retrieve data modeled in ways other than the tabular relations used in relational databases. NoSQL is also read as “Not only SQL” to emphasize that these systems may also support SQL-like query languages.
Why do I need NoSQL?
Relational databases face the following challenges:
  • Not well suited to very large volumes (petabytes) of data with a variety of data types (e.g. images, videos, text)
  • Difficult to scale to large data volumes
  • Scale-up is limited by the memory and CPU capacity of a single machine
  • Scale-out is limited by cache-dependent read and write operations
  • Sharding (breaking the database into pieces stored on different nodes) causes operational problems (e.g. managing the failure of a shard)
  • Complex RDBMS model
  • Strong consistency limits the scalability of an RDBMS
Compared to relational databases, NoSQL databases are generally more scalable and can offer better performance. NoSQL databases address the challenges above by providing:
  • A scale-out, shared-nothing architecture, capable of running on a large number of nodes
  • A non-locking concurrency control mechanism, so that real-time reads do not conflict with writes
  • Scalable replication and distribution – thousands of machines with distributed data
  • An architecture providing higher performance per node than RDBMS
  • Schema-less data model
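Scale-out and distributed replication are often built on hash partitioning; Cassandra, for instance, places data on a token ring. A minimal sketch of the idea, assuming a toy consistent-hashing ring in plain Python (the `HashRing` class, the node names and the MD5-based `ring_position` function are hypothetical, not any database's actual partitioner): each key is hashed onto the ring and stored on the next `replicas` distinct nodes clockwise.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key or node name to a ring position (first 8 bytes of MD5, an assumed choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring: each key lives on `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs; bisect on the positions finds a key's owner.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def nodes_for(self, key: str):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        owners = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replicas:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.nodes_for("user:42"))  # three distinct owner nodes for this key
```

Because only the ring segment a new node takes over changes hands, adding a node moves a fraction of the keys instead of reshuffling all of them; production systems add virtual nodes and topology awareness on top of this.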
CAP Theorem and NoSQL databases
The CAP theorem describes three properties that a distributed system may provide:
  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
The theorem states that a distributed system cannot guarantee all three properties at the same time. NoSQL databases therefore choose different combinations of C, A and P:
CA – Single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.
CP – Some data may not be accessible, but the rest remains consistent and accurate.
AP – System is still available under partitioning, but some of the data returned may be inaccurate.
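The difference between CP and AP behaviour during a partition can be shown with a toy two-replica store. This is a hypothetical model written for illustration (`TwoReplicaStore` and its `mode` flag are made up): in CP mode it rejects requests it cannot replicate, and in AP mode it keeps answering but lets the replicas diverge. With only two replicas neither side of the partition holds a majority, so the CP variant simply refuses everything while partitioned.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Hypothetical two-replica store; mode is "CP" or "AP"."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while a and b cannot talk to each other

    def write(self, replica: Replica, key, value):
        peer = self.b if replica is self.a else self.a
        if self.partitioned and self.mode == "CP":
            # CP: the write cannot reach the peer, so refuse rather than diverge.
            raise RuntimeError("unavailable: partition, write rejected")
        replica.data[key] = value
        if not self.partitioned:
            peer.data[key] = value  # synchronous replication while connected
        # AP during a partition: only the local copy changes; the peer stays stale.

    def read(self, replica: Replica, key):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: partition, read rejected")
        return replica.data.get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(store.a, "x", 1)
    store.partitioned = True
    try:
        store.write(store.a, "x", 2)
        print(mode, "read from b:", store.read(store.b, "x"))  # AP prints the stale value 1
    except RuntimeError as err:
        print(mode, err)  # CP prints the rejection
```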
The following graph shows where RDBMS and different NoSQL databases fit into the CAP theorem.
[Figure: where RDBMS and NoSQL databases fit in the CAP theorem]

NoSQL: BASE rather than ACID
Many NoSQL databases are BASE systems that relax strong consistency in favor of availability. A BASE system has the following characteristics:
  • Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
  • Soft State indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
  • Eventual Consistency indicates that the system will become consistent over time, given that the system does not receive input during that time.
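Soft state and eventual consistency can be illustrated with replicas that accept writes independently and later reconcile. The sketch assumes last-write-wins by timestamp as the conflict rule (Cassandra, for example, resolves conflicting writes to the same cell by timestamp); `LwwReplica` and `anti_entropy` are hypothetical names, and the pairwise exchange stands in for whatever background repair a real system uses.

```python
class LwwReplica:
    """Hypothetical replica storing (timestamp, value) per key; the newest write wins."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, ts):
        current = self.data.get(key)
        # Compare timestamps first; repr(value) breaks ties deterministically.
        if current is None or (ts, repr(value)) > (current[0], repr(current[1])):
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(r1, r2):
    """Exchange every entry in both directions; afterwards both replicas agree."""
    for key, (ts, value) in list(r1.data.items()):
        r2.put(key, value, ts)
    for key, (ts, value) in list(r2.data.items()):
        r1.put(key, value, ts)


r1, r2 = LwwReplica(), LwwReplica()
r1.put("cart", "book", ts=1)      # this write lands on r1 only
r2.put("cart", "book+pen", ts=2)  # a later write lands on r2 only
print(r1.get("cart"), r2.get("cart"))  # soft state: the replicas disagree
anti_entropy(r1, r2)
print(r1.get("cart"), r2.get("cart"))  # converged on the newest write
```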
NoSQL Classification
| NoSQL type | Document data store | Key-value | Column | Graph |
|---|---|---|---|---|
| Data model | Collection of key-value collections | Collection of key-value pairs | Column families | “Property graph” of nodes and relationships |
| Strength | Tolerant of incomplete data | Fast look-ups | Fast look-ups | Graph algorithms (shortest path, etc.) |
| Weakness | Query performance; no standard query syntax | Stored data has no schema | Very low-level API | Not easy to cluster; may need to traverse the whole graph to get an answer |
| Examples | MongoDB, CouchDB | Amazon SimpleDB, Redis | HBase, Cassandra | InfoGrid, InfiniteGraph |

Read/write speed: column > document > key-value > graph
Query/navigation speed: graph > key-value > column > document
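To make the four data models in the classification concrete, the snippet below mimics each one with plain Python structures and adds a breadth-first search for the shortest-path strength listed for graph stores. These are in-memory illustrations with made-up data, not the storage formats or APIs of the products named as examples.

```python
from collections import deque

# Key-value: opaque values looked up by key.
kv = {"session:17": "user=alice;ttl=3600"}

# Document: nested, self-describing records; fields may be missing.
documents = [
    {"_id": 1, "name": "alice", "tags": ["admin"]},
    {"_id": 2, "name": "bob"},  # no "tags" field: incomplete data is tolerated
]

# Column family: row key -> column family -> column -> value.
columns = {
    "row1": {"info": {"name": "alice", "city": "Hanoi"}, "stats": {"logins": "42"}},
}

# Graph: adjacency list of nodes and their relationships.
graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave"], "dave": []}


def shortest_path(g, start, goal):
    """Breadth-first search: path with the fewest hops from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in g.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(kv["session:17"])
print([d.get("tags", []) for d in documents])
print(columns["row1"]["info"]["city"])
print(shortest_path(graph, "alice", "dave"))  # ['alice', 'bob', 'dave']
```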
HBase vs Cassandra vs MongoDB
Key characteristics
HBase:
  • Distributed and scalable big data store
  • Strong consistency
  • Built on top of Hadoop HDFS
  • CP on CAP
Cassandra:
  • High availability
  • Incremental scalability
  • Eventually consistent
  • Trade-offs between consistency and latency
  • Minimal administration
  • No single point of failure (SPOF): all nodes in Cassandra are the same
  • AP on CAP
MongoDB:
  • Schema-free: schemas can change as applications evolve
  • Full index support for high performance
  • Replication and failover for high availability
  • Auto-sharding for easy scalability
  • Rich document-based queries for easy readability
  • Master-slave model
  • CP on CAP
Good for
HBase:
  • Read-optimized workloads
  • Range-based scans
  • Strict consistency
  • Fast reads and writes with scalability
Cassandra:
  • Simple setup and maintenance
  • Fast random reads/writes
  • Flexible parsing / wide-column requirements
  • Workloads that do not need multiple secondary indexes
MongoDB:
  • RDBMS replacement for web applications
  • Semi-structured content management
  • Real-time analytics and high-speed logging, caching and high scalability
  • Web 2.0, media, SaaS, gaming
Not good for
HBase:
  • Classic transactional applications or even relational analytics
  • Applications that need full table scans
  • Data that must be aggregated, rolled up, or analyzed across rows
Cassandra:
  • Secondary indexes
  • Relational data
  • Transactional operations (rollback, commit)
  • Primary and financial records
  • Stringent authorization requirements on data
  • Dynamic queries/searching on column data
  • Low latency
MongoDB:
  • Highly transactional systems
  • Applications with traditional database requirements such as foreign key constraints
Use cases
  • HBase: Facebook Messages
  • Cassandra: Twitter, travel portals
  • MongoDB: Craigslist, Foursquare
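HBase keeps each table's rows sorted by row key, so a scan bounded by a start row and a stop row only visits the rows in that key range; this is what makes it well suited for range-based scans. A minimal in-memory sketch of that idea (not the HBase client API; `SortedRowStore` and the `sensor1#date` row-key layout are hypothetical), with an inclusive start row and an exclusive stop row:

```python
import bisect


class SortedRowStore:
    """Hypothetical toy table kept sorted by row key."""

    def __init__(self):
        self.keys = []  # row keys in sorted order
        self.rows = {}  # row key -> row data

    def put(self, row_key, row):
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)  # keep keys sorted on insert
        self.rows[row_key] = row

    def scan(self, start_row, stop_row):
        """Yield (key, row) for start_row <= key < stop_row."""
        i = bisect.bisect_left(self.keys, start_row)  # jump straight to the range
        while i < len(self.keys) and self.keys[i] < stop_row:
            key = self.keys[i]
            yield key, self.rows[key]
            i += 1


table = SortedRowStore()
for i, day in enumerate(("2015-02-09", "2015-02-10", "2015-02-11", "2015-03-01")):
    table.put(f"sensor1#{day}", {"reading": i})

feb = [key for key, _ in table.scan("sensor1#2015-02", "sensor1#2015-03")]
print(feb)  # the three February rows only
```

A query on anything other than a row-key range would have to walk every row, which is why applications needing full table scans are listed as a poor fit for HBase.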
Generally, Cassandra tends to perform better than the other two when the data volume is very large.
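Cassandra's trade-off between consistency and latency comes from tunable consistency levels such as ONE, QUORUM and ALL, which set how many replicas must answer a read or acknowledge a write. A common rule of thumb: with replication factor N, a read from R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below checks that rule by brute force in plain Python; it does not use the Cassandra driver, and `quorum` and `always_overlaps` are hypothetical helpers.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-replica read set intersect every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


N = 3  # replication factor
for r, w in [(1, 1), (quorum(N), quorum(N)), (1, N)]:
    print(f"R={r} W={w}: R+W>N is {r + w > N}, overlap guaranteed: {always_overlaps(N, r, w)}")
```

Waiting for one replica on each side (R = W = 1) gives the lowest latency but can return stale data; QUORUM reads and writes cost extra replica round trips but guarantee that a read overlaps the latest acknowledged write.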


Copyright @ 2013 thevwa.com.

Designed by THEVWA.COM & Thế Nguyễn