Network Coding for Large Scalecontent Distribution
Essay Preview: Network Coding for Large Scalecontent Distribution
Report this essay
Network Coding for Large ScaleContent Distribution
AbstractЖ We propose a new scheme for content distributionof large files that is based on network coding. With networkcoding, each node of the distribution network is able to generateand transmit encoded blocks of information. The randomizationintroduced by the coding process eases the scheduling of blockpropagation, and, thus, makes the distribution more efficient.This is particularly important in large unstructured overlaynetworks, where the nodes need to make block forwardingdecisions based on local information only. We compare networkcoding to other schemes that transmit unencoded information (i.e.blocks of the original file) and, also, to schemes in which onlythe source is allowed to generate and transmit encoded packets.We study the performance of network coding in heterogeneousnetworks with dynamic node arrival and departure patterns, clus-tered topologies, and when incentive mechanisms to discouragefree-riding are in place. We demonstrate through simulations ofscenarios of practical interest that the expected file download timeimproves by more than 20-30% with network coding comparedto coding at the server only and, by more than 2-3 timescompared to sending unencoded information. Moreover, we showthat network coding improves the robustness of the system andis able to smoothly handle extreme situations where the serverand nodes leave the system.I. INTRODUCTIONTypical content distribution solutions are based on placingdedicated equipment inside or at the edge of the Internet.The best example of such solutions is Akamai [1], whichruns several tens of thousands of servers all over the world.In recent years, a new paradigm for Content Distributionhas emerged based on a fully distributed architecture wherecommodity PCs are used to form a cooperative network andshare their resources (storage, CPU, bandwidth).Cooperative content distribution solutions are inherently selfscalable, in that the bandwidth capacity of the system increasesas more nodes arrive: each new node requests service from,and, at the same time, provides service to other nodes. Becauseeach new node contributes resources, the capacity of thesystem grows as the demand increases, resulting in limitlesssystem scalability. With cooperation, the source of the file, i.e.the server, does not need to increase its resources to accommo-date the larger user population; this, also, provides resilienceThis work was done while the first author was with Microsoft research.to “flash crowds”Ж a huge and sudden surge of traffic thatusually leads to the collapse of the affected server. Therefore,end-system cooperative solutions can be used to efficiently andquickly deliver software updates, critical patches, videos, andother large files to a very large number of users while keepingthe cost at the original server low.The best example of an end-system cooperative architectureis the BitTorrent system, which became extremely popularas a way of delivering the Linux distributions and otherpopular content. BitTorrent splits the file into small blocks,and immediately after a node downloads a block from theorigin server or from another peer, the node behaves as a serverfor that particular block, and, thus, contributes resources forserving the block. Further design choices, such as intelligentselection of the block to download, and parallel downloadingof blocks, improve the performance of the system. For adetailed description of the BitTorrent system see [2].Despite their enormous potential and popularity, existingend-system cooperative schemes such as BitTorrent, maysuffer from a number of inefficiencies which decrease theiroverall performance. Such inefficiencies are more pronouncedin large and heterogeneous populations, during flash crowds, inenvironments with high churn, or when cooperative incentivemechanisms are in place. In this paper we propose a new end-system cooperative solution that uses network coding, i.e. dataencoding at the interior nodes of the network, to overcomemost of these problems.A. Network CodingNetwork coding is a novel mechanism proposed to improvethe throughput utilization of a given network topology [3].The principle behind network coding is to allow intermediatenodes to encode packets. Compared to other traditional ap-proaches (e.g. building multicast trees), network coding makesoptimal use of the available network resources and, moreover,computing a scheduling scheme that achieves such rate iscomputationally easy. An overview of network coding and adiscussion of possible Internet applications is given in [4].Using ideas borrowed from network coding, we proposean end-system content distribution solution which optimallyuses the resources of the network. Every time a client needs1
Page 2
Node A Node B Packet 1 Packet 2 Node CPacket 1 Packet 1, or 2, or 1⊕2?SourceFig. 1. Network Coding benefits when nodes only have local information.to send a packet to another client, the source client generatesand sends a linear combination of all the information availableto it (similarly to XORing multiple packets). After a clientreceives enough linearly independent combinations of packets,the client can reconstruct the original information.In a large distributed cooperative system finding an op-timal packet propagation scheme that minimizes the clientdownload time is very difficult. This is especially the case inpractical systems that cannot rely on a central scheduler and,instead, allow nodes to make local decisions. The schedulingproblem becomes increasingly difficult as the number ofnodes increases, when nodes are at different stages in theirdownloads, or when incentive mechanisms are introduced toprevent leeching clients. As we will see in this paper, networkcoding makes efficient propagation of information in a largescale distributed system with no central scheduler easier, evenin the scenarios described above.To illustrate how network coding improves the propagationof information without a global coordinated scheduler weconsider the following (simple) example. In Figure 1 assumethat Node A has received from the source packets 1 and 2.If network coding is not used, then, Node B can downloadeither packet 1 or packet 2 from A with the same probability.At the same time that Node B downloads a packet from A,Node C independently downloads packet 1. If Node B decidesto retrieve packet 1 from A, then both Nodes B and C willhave the same packet 1 and, the link between them can notbe used.If network coding is used, Node B will download a linearcombination of packets 1 and 2 from A, which in turn can beused with Node C. Obviously, Node B could have downloadedpacket 2 from A and then use efficiently the link with C.However, without