« Flashless | Main | SAFE Act considered harmful »

Scaling Reads: Replication vs Partitioning

The common method of read scaling is to copy read heavy data to more nodes. This increases the available read throughput of a datum by increasing the total amount of resources dedicated to serving it. This will always be necessary when the read load for a single datum surpasses the resources of a single node (IO, CPU, Network, etc). This method scales well but has the down side that it is often not efficient with respect to storage space. This is not so often a issue for data on disk but is a often issue for data cached in memory. The following example demonstrates the difference:

Let us posit that there two nodes { node1, node2 } and a data set having four datums, { A, B, C, D } with the load distribution <A:40%, B:40%, C:10%, D:10%>. To scale reads one could fully replicate the datums across both nodes, node1:{ A, B, C, D }, node2:{ A, B, C, D }. We would then distribute the query load evenly between the two nodes so they were each handling one half of the total load for the data. A second option would be to share the load between the two nodes by partitioning the data across the nodes. One such repartitioning is node1:{ A, C }, node2:{ B, D } resulting in the load distribution of  node1:<A:40%, C:10%>, node2:<B:40%, D:10%>.  With this partitioning the read load is still evenly distributed across the nodes but only half as much data is stored at each node.

Even though this example appears to assumes that reliable load data is available for each datum it is possible to achieve a even distribution of load by methods such as repartitioning with only node level knowledge.  One such method would periodically randomly select a datum from node a with above average load and move it to a node with a below average load. Such a scheme should lead to a balanced cluster over time. ( This scheme will likely be discussed in more detail in a later post. )

The cost of using partitioning as a read scaling method is the added complexity of creating the partitions on the data and maintaining the routing information required to deliver queries to the appropriate node.

Tags:

TrackBack

TrackBack URL for this entry:
http://0x0000.org/cgi-bin/mt/mt-tb.cgi/19

Comments (1)

Most of the existing sharding schemes (Livejournal, etc) have used the method of partitioning ALL data for a specific user to a specific shard/server.

This is primitive, easy to implement, but far from elegant.

To further complicate matters if each 'chunk' of data is small and you don't have very man hotspots this method of data distribution actually works.

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About

This page contains a single entry from the blog posted on November 28, 2007 6:45 PM.

The previous post in this blog was Flashless.

The next post in this blog is SAFE Act considered harmful.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.33