Data Storage on Social Networking Sites

Social Networking

The phenomenon of social networking has taken over the web in recent years. People join social networking sites (SNSs) every day, and a large share of the world's population now uses them, so these sites generate a massive amount of data each day. Since most readers already know what social networking sites are, I won't go into detail about them. I will focus on the point of discussion: storage issues.

Challenges and issues related to data storage on SNSs:

With a user base of 1 billion, social networking sites face a number of challenges in providing good service to their users. Facebook alone has more than 500 million users, of whom 700,000 are online at any instant. Just imagine how efficient and smart the storage scheme must be for such a huge body of data. Storing everything in a single database becomes impractical. Traditional relational databases are difficult to scale out across many machines once the data grows to this size. Joins are very expensive on very large tables, and harder still when the data is spread across different databases. An RDBMS requires a pre-declared schema, and adding new columns or constraints to a huge database is both costly and difficult. The number of data access operations per second also decreases as the data grows. These are the main challenges. What is the solution?

NoSQL as an alternative to SQL:

NoSQL was introduced to tackle these data-related problems. It is an approach used in distributed data stores that intentionally avoids SQL. NoSQL stores are easy to run on conventional load-balanced clusters. Data is persistent (they are not just caches). They scale with the available memory. They have no fixed schema, so the schema can change without downtime. Each store has its own query system rather than a standard query language. Many NoSQL databases are ACID within a single node of the cluster and eventually consistent across the cluster.
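
To make "no fixed schema" and "eventually consistent" more concrete, here is a minimal in-memory sketch. It is not the API of any real NoSQL product: the names ToyCluster, Node, put, get and sync, as well as the sample records, are made up for illustration, and replication is simulated by an explicit sync() call rather than a network.

```python
class Node:
    """One node in a toy cluster: a dict mapping keys to schema-less records."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ToyCluster:
    """Hypothetical cluster: writes land on one node, replicas catch up later."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]
        self.pending = []  # writes not yet applied to the replicas

    def put(self, key, record):
        # The write is applied immediately on the first node only.
        self.nodes[0].data[key] = dict(record)
        self.pending.append((key, dict(record)))

    def get(self, key, node_index=0):
        # A replica may return stale data (or None) until sync() has run.
        return self.nodes[node_index].data.get(key)

    def sync(self):
        # Apply the pending writes, in order, to every replica.
        for key, record in self.pending:
            for node in self.nodes[1:]:
                node.data[key] = dict(record)
        self.pending.clear()


cluster = ToyCluster(["a", "b", "c"])

# No pre-declared schema: the two records carry different fields.
cluster.put("user:1", {"name": "Alice", "city": "Pune"})
cluster.put("user:2", {"name": "Bob", "followers": 42, "tags": ["music"]})

stale = cluster.get("user:1", node_index=2)  # replica has not caught up yet
cluster.sync()
fresh = cluster.get("user:1", node_index=2)  # now consistent across the cluster
print(stale, fresh)
```

Between the write and the sync, a reader on another node can see old data. That window is the price an eventually consistent store pays for not coordinating every write across the whole cluster.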

Databases used by SNS’s:

SNSs use a combination of MySQL and NoSQL databases:
MySQL: The primary database for many SNSs, suitable for frequently used data.
Cassandra: A NoSQL database with very fast data access, used by Facebook.
Hadoop/HBase: A very large distributed file system (with HBase as a database on top of it) used to store replicas of MySQL data; used by Yahoo and Facebook.
Voldemort: A distributed key-value storage system used by LinkedIn.
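
As a rough illustration of what "distributed key-value storage" means, the sketch below spreads keys over several nodes by hashing each key. It does not reproduce the API or the partitioning scheme of Voldemort or any other product listed above. The class name PartitionedKVStore and the key format are assumptions, and each "node" is just a Python dict.

```python
import hashlib


class PartitionedKVStore:
    """Hypothetical key-value store that partitions keys across nodes by hash."""

    def __init__(self, num_nodes):
        # Each node is simulated by an in-memory dict.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


store = PartitionedKVStore(num_nodes=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Number of keys held by each node; the load is spread, not piled on one node.
sizes = [len(node) for node in store.nodes]
print(sizes)
```

Because every lookup touches only the node that owns the key, no single machine has to hold or search the whole data set. Simple modulo hashing reshuffles most keys when a node is added, so production systems typically use more elaborate schemes such as consistent hashing.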

The increasing size of data on SNSs has created a need for smart storage methods. The same data can be shared by numerous users simultaneously, rather than duplicated for each of them, which keeps storage compact and avoids fragmentation. The efficiency of data retrieval can be increased by storing data in different databases according to how frequently it is used.
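
The idea of choosing a store by usage frequency can be sketched as a two-tier lookup. This is only an illustration under simple assumptions: the class TieredStore, the promotion threshold, and the sample keys are hypothetical, and both tiers are plain dicts standing in for a fast store and a slower bulk store.

```python
from collections import Counter


class TieredStore:
    """Hypothetical two-tier store: frequently read items move to a 'hot' tier."""

    def __init__(self, hot_threshold=3):
        self.hot = {}          # stands in for a fast store for popular data
        self.cold = {}         # stands in for a slower bulk store
        self.hits = Counter()  # read count per key while it is in the cold tier
        self.hot_threshold = hot_threshold

    def put(self, key, value):
        # New data starts in the cold tier.
        self.cold[key] = value

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        value = self.cold[key]  # raises KeyError for unknown keys
        self.hits[key] += 1
        if self.hits[key] >= self.hot_threshold:
            # Promote data that is read often enough.
            self.hot[key] = self.cold.pop(key)
        return value


tiered = TieredStore(hot_threshold=3)
tiered.put("profile:1", "popular profile")
tiered.put("profile:2", "rarely viewed profile")

for _ in range(3):
    tiered.get("profile:1")  # read three times, so it gets promoted
tiered.get("profile:2")      # read once, so it stays in the cold tier

print(sorted(tiered.hot), sorted(tiered.cold))
```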