Archive for the 'Ramblings' Category
Future for Mimir: C
Background:
Mimir is a distributed file system that we created for our Senior Design. It used FreePastry which is written in Java and overall was fairly successful. Files could be saved, retrieved, deleted. There was user authentication and overall it worked OK in a stable environment. One of the issues that it had however was speed issues doing the encoding and interacting with the network stack. The encoding issue had a lot to do with java performing extra casting for the XOR operations that were key to our erasure code. The network issues were mostly due to not being fully asynchronous and how we were using the network. The network proved to be a large problem particularly with saving a file though also with retrieving a file. After first trying to use the overlay network directly for saving a file we resorted to polling the network for the nodes and then sending the data through a direct tcp socket to the individual connections. This was an issue because we had to 1 open a lot of connections and 2 loop over them to save the data.
Update:
Since then i have come up with a way to distribute the file evenly over all of the nodes without incurring a bottleneck at the top of a tree. To do this when a node wants to send a file out it picks a random ID in the ID space and also selects it’s compliment( ie if the id space is a circle the node 180 degrees from it.) The node then sends half of the blocks of data to node A and half to node B. node A and be split up their half of the circle into halves and each half then repeats this until they pick themselves for a given node(ie they are the only node in their remaining id range.
This has some benefits first it means that the originating node has to do fewer lookups to save the file and does not need to know anything about the sate of the network. Secondly as long as the node id’s are evenly distributed in the space the file will be evenly distributed.
n the downside this means that the speed of saving the file is determened by the speed of the originating node to nodes A and B. However our design case is for LANs so this should not be an issue and we should beable to better saturate the network.
Note: There is no reason to limit the tree to be binary it could be split into any number of pie pieces.
To help achieve this as well as even grater speed we are planing to use Chimera which is the successor to Tapestry and is written in C. This will allow us to perform the XOR operations using various processor optimizations as well as assembly if needed without the pain of the Java C bridge. In addition this should make our code much smaller!
No commentsStatus Update
It has been a long time since we have updated and we figure it’s time for a status update.The first question i’d like to answer is what have we been doing with our time. The answer is of course being code monkeys. In our last post we said that it worked but there was still a lot of work to do. Of that work we now have message authentication and moved all of the configurations into one file. Also lots of cleaning has been done with more to do. The final improvement is in usability. Users and Storage Nodes will be able to automatically register. This means that installation will be extremely easy just drop some files and change the configuration a little and wuala. Speaking of Wua.la I’d like to take some time to talk about our “Competitors”. Of them there really are only two one is RobuSTore and the other is Wuala. First lets talk about RobuSTore they are the closest thing to our project and are from UC SD. They use LT Codes like our project does and have some of the same goals. What is different is that they are concerned mostly with speed and want some redundancy on top of that, and are also concerned with large files on a pre configured network. They do not use multicast or peer to peer systems (or so it appears) and they know where all the bits of data are. This is different from our goals. We hope to have a fast system yes but we want a dynamic system on unreliable dynamically configured networks with unreliable disks and computers in a word Byzantine. The reason for this is that we want to replace SANS and NAS with peoples laptops and PC’s not with cheaper servers. We also want the network to be flexible and work in a peer to peer way so that it can recover from individual network failures.Wuala on the other hand is very different they use ReedSolomon encoding which while an optimal erasure code is much more computationally complex and would slow down storage. Also ReedSolomon does not scale with the number of nodes in a dynamic way it needs pre defined variables to encode and decode thus is not Rate-less like LT Codes are. It also works in a peer to peer system though. However it is designed to be a “web service” instead of something a company could have for their personal use. It is also designed to be very distributed and has no central meta data storage which would not work for companies who need to know who saved what and get access to it. Also by the end of the quarter we will have a paper we will try to get published.
No commentsVarious Meeting Notes
Attached are the notes from various meetings, they are probably not useful to anyone who was not there unfortunately.
- Meeting Notes 1
- Meeting Notes 2
- Meeting Notes 3
- Meeting Notes 4
- Meeting Notes 5
- Meeting Notes 6
- Meeting Notes 7
- Meeting Notes 8
Some Ramblings
Implementing our senior project on Pastry/Scribe.
if using pre-encoded data
Use Scribe to get a list of nodes with disk space free then send the encoded data to the nodes based on their id’s through pastry. If the node has failed a nearby node will have received the data instead. This node may be-able to save the data OR find a new location for it by any-casting it. If it can save the data then we have no extra bandwidth cost of finding a new location for it otherwise we increase the number of packets.
if encoding data on the network, get the list off all available nodes on the multicast network, create a table of who should compute what parodies and the data dependancies for those parody blocks. There will be global topics for all 0 blocks 1 blocks etc… to our max # of blocks in a file/subfile. The nodes should be listening on this already, they will receive the necessary blocks on this communication and compute requested parity data. When done they should send a NOK if there was some error, depending on the error and how severe there may be some consequences.
For sending data to the host we just use Pastry to route the data to the id of the host which requested the data on the data request channel.
I wonder how security will work I imymagine if when we get the save request we can save the public key for all the users.. though that would not support sharing files.
Solution: Make the Metadata controller sign requests for data then the metadata controller can handle access control and data requests are only valid with the metadata controller’s signature.
How would multiple metadata controllers work with this system with ought having all nodes know what controllers are valid. One way is to have a master MDC that is the CA for all nodes and metadata controllers, think kerberos, if that node goes down we are ok just no new nodes or metadata controllers can join the network.
One problem would be keeping all the certs in sync. Or make it so any mdc issued cert is valid as long as the parent cert is valid. this would allow mdc’s to issue certs and work independently of one another. MDC’s may still require some shared storage or database to keep all this data in. This will probably work though may make verification slightly harder.
Also it is important to note that computing the parody on the box saves bandwidth but may have trouble getting the file off the machine as fast so may have issues when computers leave the network when a file has not finished saving, though this is something that all file systems must deal with. One possible fix is to store a copy like journaling so that if there is an error we can recover from it next time we connect to the network.
No comments

