MaidSafe Makes Data Safe

If you can understand the organization of ants, you can understand MaidSafe.[1]

Alone, an ant is both vulnerable and easily marginalized. Yet when working with the rest of the colony, ants with roughly the same petite attributes can take down larger prey, clear paths and protect the mound from disasters (both natural and man-made).

I spoke with David Irvine, the CEO of MaidSafe, and he used this type of analogy to describe how the decentralized attributes of MaidSafe work. MaidSafe bills itself as a wholly decentralized internet, acting as a decentralized storage platform available to anyone globally. Over the past 8 years, the team has concocted a cauldron of algorithms which harmonize, orchestrate and control individual nodes that continuously repeat simple basic tasks – just like worker ants.

And while one or two nodes may be knocked off the network, user data is replicated to four servers in geographically disparate locations, so the redundancy and security of the network are purportedly second to none.

Consensus

MaidSafe developers accomplished this by starting with a vastly simplified data network composed of self-validating immutable data. If Bob takes data, such as a name, and hashes it, the same input must always produce the same output, so any copy of the data can be checked against its hash. That makes it very difficult to find an attack vector or create a virus to alter the state or disrupt the network.
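To make the idea of self-validating immutable data concrete, here is a minimal sketch. It assumes SHA-256 as the hash and uses a hypothetical `ImmutableStore` class; the article does not say which hash function or storage interface MaidSafe uses.

```python
import hashlib


def address_of(data: bytes) -> str:
    """The address of a piece of data is simply the hash of its content."""
    return hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption


class ImmutableStore:
    """Hypothetical in-memory stand-in for a network of storage nodes."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        addr = address_of(data)
        # Storing the same content twice yields the same address and one copy.
        self._chunks.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        data = self._chunks[addr]
        # Self-validation: anyone can re-hash the data and compare.
        if address_of(data) != addr:
            raise ValueError("chunk failed self-validation")
        return data
```

Because the address is derived from the content, altered data no longer matches its address, and any reader can detect the change without trusting the node that served it.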

Another reason the data is effectively hard to delete is that the team implemented a chained consensus approach secured via a cryptographic technique. The network can say, “I don’t know who you are, but I can identify you.” Even though the individual computers are very stupid, MaidSafe asks each of them to do something simple, and all the computers work in unison to accomplish it. If a node does not do what it is asked, it is kicked out.

Rather than taking a grand AI approach, Irvine and the team did not create a huge system with millions of lines of code, such as the supercomputer behind Watson.[2] Instead they took each person’s computer and had it conduct very simple, repeatable, measurable small tasks.[3] In a fulfillment of Heinlein’s words – specialization is for insects – Irvine observes that after 150 million years of natural selection, each ant has a different function: a cleaner ant, a forager ant, a soldier ant – all of whom change their behavior and patterns based on the goals of the colony.

In Irvine’s words, “every node is like a group of ants; while a drop of water could kill one, when all the millions of nodes come together, that is when you see scalability. When one node tries to delete information, other neighbors can report that node to one another and remove the compromised node from the network.”

As a node joins the network, it must hold a cryptographic key pair, and it connects to the nodes closest to its address; the rest of the nodes then listen to it. The MaidSafe network uses the consensus-based mechanism noted above, chaining the consensus together (similar to a blockchain). As a consequence, the nodes in each group watch one another, observing their deterministic behavior. In contrast to Tor or FreeNet, each node in MaidSafe is individually different, creating an autonomous network in which each node has an address and fulfills certain tasks, such as storing, looking up, validating and screening data down this immutable chain (called a Transaction Manager).[4]
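The article does not say how "closest" is measured. A minimal sketch, assuming a Kademlia-style XOR distance and an address derived by hashing the node's public key (both are assumptions, as are the function names), could look like this:

```python
import hashlib
from typing import Sequence


def node_address(public_key: bytes) -> int:
    """Assumed scheme: a node's address is the hash of its public key."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")


def closest_nodes(target: int, known: Sequence[int], group_size: int = 4) -> list:
    """Return the group_size known addresses nearest to target.

    Distance is the XOR of the two addresses, read as an integer.
    """
    return sorted(known, key=lambda addr: addr ^ target)[:group_size]


# Toy 4-bit addresses: distances from 0b1000 are 1, 8, 4 and 2 respectively.
group = closest_nodes(0b1000, [0b1001, 0b0000, 0b1100, 0b1010], group_size=2)
# group == [0b1001, 0b1010]
```

A joining node would compute its own address and then treat the resulting group as the neighbors that watch it and that it watches in turn.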

As Irvine noted, “it is probably not very smart for an ant to go up and do circles along the wall, yet if you drop dirt near the mound, ants immediately clean it up.[5] Similarly, prey can be taken down by working together. MaidSafe is based on the same philosophy, that individually we are not smart enough. In our brain, every cell by itself is silly and useless, but if you combine all of the neurons and synapses, it creates something amazing: human beings. In our brain, there are trillions of cells performing a single action and if you damage it the system routes around it. In fact, our team is not necessarily made of some strange cryptographic geniuses; we are relatively ordinary. What we did do is make the logic as simple as it needs to be. We cannot describe the behavior of all the ants or brain cells or a group of humans but we can describe one cell or one ant.” Thus, when you join all the nodes together, the whole system is greater than the sum of its parts via teamwork.

Big Hairy Audacious Goals

Restrained by computer science, according to Irvine there were three particular problems that the team needed to solve: creating a system with strong encryption that is autonomous and nullifies human interaction.[6]

1) How do you encrypt data in such a way that, no matter how the encryption is broken, the data still cannot be utilized? The system must be designed so that you can even give the data to your worst enemy, yet they cannot use it even if they break all the encryption.

2) In Irvine’s words, “humans cannot resist change, evolution forces us to adopt it. An autonomous network must resist human interference because human action is not repeatable and can be corrupted by trying to game the system. Instead, we put our faith into math; if humans try to interfere, the rest of the network abandons that node.”

3) Another way MaidSafe is different is through a self-authentication mechanism that lets people log in and obtain private data without having to know an address. One of the vulnerabilities of previously designed networks was that you could not log into a distributed network without an address to connect to, which creates a single point of failure. With MaidSafe, there are no passwords on the system nor passwords on your computer.

To tackle the autonomous hurdle, MaidSafe takes information, chops it into smaller pieces and, using a data map, takes a hash of each piece and puts it into a container. The system then compresses the chunk and encrypts it with AES-256, using the hash of another chunk (similar to the way Merkle hash trees are used in Bitcoin).[7] Chunk1 is combined with the hash of Chunk2; this step is not used for encryption purposes but to randomize the data, so that Chunk1 becomes random. The process then repeats itself with each ‘child.’ The result is a pre-encryption hash, which is then appended to Chunk1. In Irvine’s words, “It is not a one-time pad, which is the biggest snake oil scheme, rather it is a pre-encryption hash, based on how uniform random information is distributed. We take these two hashes and XOR them with chunks of information.[8] If a hacker intercepts it, all they can do is print out everything in hex, from 000 to FFF, and just guess. You can’t repeat it and there is not enough paper in the world to visualize it.”
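The chunk, hash and XOR steps described above can be sketched roughly as follows. This is a simplified illustration and not MaidSafe's actual algorithm: it leaves out compression and the AES-256 stage, assumes SHA-256 for the pre-encryption hashes, stretches the neighboring hashes into a pad with SHAKE-256, and uses a tiny chunk size. All function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 8  # deliberately tiny so the example is easy to follow


def split(data: bytes, size: int = CHUNK_SIZE) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad_for(index: int, hashes: list, length: int) -> bytes:
    """Build a pad for one chunk from the hashes of two other chunks."""
    n = len(hashes)
    seed = hashes[(index + 1) % n] + hashes[(index + 2) % n]
    return hashlib.shake_256(seed).digest(length)


def self_encrypt(data: bytes):
    chunks = split(data)
    # The data map holds the pre-encryption hash of every chunk.
    data_map = [hashlib.sha256(c).digest() for c in chunks]
    stored = [xor(c, pad_for(i, data_map, len(c))) for i, c in enumerate(chunks)]
    return data_map, stored


def self_decrypt(data_map: list, stored: list) -> bytes:
    chunks = [xor(c, pad_for(i, data_map, len(c))) for i, c in enumerate(stored)]
    for expected, chunk in zip(data_map, chunks):
        if hashlib.sha256(chunk).digest() != expected:
            raise ValueError("chunk does not match the data map")
    return b"".join(chunks)
```

In this sketch only the holder of the data map can undo the XOR, and because the pads are derived from the content itself, identical files always produce identical stored chunks.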

And how do you encrypt so that, even if every encryption algorithm is broken, the data will not reveal itself? According to Irvine, it is impossible to create true noise, and because of that, every encryption system since Caesar has eventually been broken. Users therefore cannot rely on encrypting with AES alone, even though many believe “it won’t be broken” (famous last words).[9] Thus Irvine insists the data has to be held under a mechanism like this one: pseudo-random data is derived from a hash of random input and then XORed with the resultant data, tending towards the elusive one-time pad.

In addition to the patents describing this (which will be held by a non-profit organization and used solely for defensive purposes), there are several academic papers discussing self-encrypting data that is highly obfuscated (according to Irvine, academics dislike calling it encryption; they call it obfuscation).[10] The data is encrypted by itself through pre-encryption hashes, which means that if Bob had the current number one single and stored it as output chunks, he would need the original information in order to decrypt it. Thus there is real-time de-duplication and, more to the point, the network knows it has a copy but does not know where it came from. As a consequence, based on industry figures, MaidSafe has the potential to recover 95% of the world’s disk space. For instance, a backup tape typically holds 20 duplicates of a file on average; if an enterprise removes the unnecessarily redundant duplicates and keeps one copy of each file, it recovers 19/20, or 95%, of that space. And when supply chains are pooled together, new economies of scale take place.[11] Or in Irvine’s words, “Google has to use twice the amount of space because they don’t want to mix data together. This encryption method could be used by any of these companies, creating a client side with a very high level of encryption, at 1/20th of the space.”
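The 95% figure follows directly from the 20-copies example. A small sketch, assuming files are identified by a SHA-256 content hash (the function name and the sample data are illustrative only):

```python
import hashlib


def deduplicated_size(files: list) -> int:
    """Total bytes needed when identical content is stored only once."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


song = b"x" * 1000          # stand-in for one file
backups = [song] * 20       # the same file backed up 20 times

raw_size = sum(len(f) for f in backups)      # 20000 bytes
stored_size = deduplicated_size(backups)     # 1000 bytes
saved_fraction = 1 - stored_size / raw_size  # about 0.95
```

The saving depends entirely on how many duplicates exist; data with no repeated content would see no reduction at all.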

Another part of the puzzle was self-authentication: how can Alice not connect to a node, if Bob can connect to it? MaidSafe solves this by the following process. A user puts a computer with 10 GB of free space on the network, where it is cryptographically identified and a certificate is created for the computer. Thus the network knows and recognizes an ID that can store a certain amount of files – this is a valuable resource because it is a limited, scarce quantity. In turn, the network will allow you to store on other systems because you have generated a proof-of-resource (verified by the Transaction Manager) showing you have provided 10 GB of space. You can then use that proof to consume the same amount of space on the network (i.e., quid pro quo), much as holding a token on the Bitcoin network allows a user to access and transfer it.
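The quid pro quo can be pictured as simple bookkeeping: space provided on one side, space consumed on the other. The sketch below is a toy model under that assumption; the `ResourceLedger` class and its methods are hypothetical, and it leaves out the cryptographic proof that the Transaction Manager is said to verify.

```python
GB = 10 ** 9


class ResourceLedger:
    """Toy record of how much space each node ID has provided and used."""

    def __init__(self):
        self._provided = {}
        self._used = {}

    def register(self, node_id: str, bytes_provided: int) -> None:
        """Record space that a node has made available to the network."""
        self._provided[node_id] = self._provided.get(node_id, 0) + bytes_provided

    def allowance(self, node_id: str) -> int:
        """Space the node may still consume elsewhere on the network."""
        return self._provided.get(node_id, 0) - self._used.get(node_id, 0)

    def request_storage(self, node_id: str, size: int) -> bool:
        """Grant the request only if it fits within the node's allowance."""
        if size > self.allowance(node_id):
            return False
        self._used[node_id] = self._used.get(node_id, 0) + size
        return True
```

A node that registers 10 GB could store 4 GB and would then be refused a further 7 GB, since only 6 GB of its allowance remains.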

The next step, according to Irvine: “although humans receive that certificate, the system does not immediately store 10 GB of information. Instead it takes pre-encryption hashes and creates a root directory of all information (all of the files you want to access in your life), which gets shrunk with a recursive compression algorithm. That tiny information is then encrypted by a password generated by you; simultaneously another random number is made up, and this is stored at two random locations. Thus, there are two passwords: the first identifies the key of where it is stored, and the second key encrypts the value, the root of the information value stored.” If Bob goes to any other computer connected to the internet, this password functions like a brain wallet, and the data is decrypted locally in memory.[12] This creates a virtual fail-safe system in which anyone, including a hacker, can go to your blog residing on the MaidSafe network but no one can take it down or DDoS it because it does not exist; it is merely bits of information located at a mathematically unknown location.
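The two-secret scheme in that quote can be sketched as follows: one secret determines where the encrypted root is stored, and the other encrypts it. Everything here is an assumption made for illustration: PBKDF2 as the key derivation, a SHAKE-256 keystream as a stand-in for a real cipher, a dictionary in place of the network, and all function names.

```python
import hashlib

network = {}  # stand-in for the distributed store: location -> encrypted bytes


def derive(secret: bytes, purpose: bytes, length: int = 32) -> bytes:
    """Stretch a human-chosen secret into fixed-length key material."""
    return hashlib.pbkdf2_hmac("sha256", secret, purpose, 10_000, dklen=length)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from key (NOT a production cipher)."""
    pad = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, pad))


def save_root(locator_secret: bytes, password: bytes, root: bytes) -> None:
    location = derive(locator_secret, b"location")  # first secret: where
    key = derive(password, b"encryption")           # second secret: how
    network[location] = keystream_xor(key, root)


def login(locator_secret: bytes, password: bytes) -> bytes:
    """Run on any machine: recompute the location, fetch, decrypt in memory."""
    location = derive(locator_secret, b"location")
    key = derive(password, b"encryption")
    return keystream_xor(key, network[location])
```

Nothing in `network` reveals who owns an entry or what it contains; a wrong locator secret finds nothing, and a wrong password yields only noise.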

In Irvine’s view, such functionality allows people to finally “start implementing the real web. If you go to any computer and type in your password, the computer can be remapped as your own drive. 8.3 exabytes of information can exist on this disk. Because it is a virtual disk, it resides in memory and thus there is no sign that you have used it, whatsoever. That is how you log into your data: you don’t know where it is but that data becomes your computer.”

So in short, you use your password to decrypt data to a client and the network is not aware that the data exists and consequently there is nothing for security exploits like Heartbleed to compromise.[13]

This then leads Irvine to opine that this is “probably the first time we can use biometrics to secure data.” Yet he cautions, “because many institutions already have copies of them, Bob can still steal Alice’s known scans, which means that some users may not be able to utilize that functionality yet.”

Despite this limited drawback, which would affect any similar system, “there are several fairly big things when you think about it. The network is most difficult to compromise because the edge cases of a network look after themselves. We are talking about cryptographically securing identities. Previously we have used some web-of-trust, which does not work because of human trust, or currently Verisign, which handles public keys yet its servers are centralized and vulnerable. What we have done is to take the functionality of Verisign and turn it into a mathematical program, and instead of using timestamps, MaidSafe uses entropy. It binds the use of time (time is never transmitted) because it is unnatural. Instead, entropy is always changing, the network is always increasing entropy (via Brownian motion). Thus we had to remove time to get the algorithms to work properly and in fact, complexity would be increased if you introduced time.”

The team

Irvine notes that solving these problems required unconventional thinking. As a mechanical engineer he was never formally trained in computing, which he believes gives him an advantage. Because computer engineers are trained early on to log in and get a certificate from Verisign, they will largely not deviate from the presented solution.[14] Yet if you give developers new tools to go through life “they will create magical things,” suggests Irvine.

In his words, “one of the guys in the office was previously a fireman yet is a math genius and one of the top C++ programmers globally. We may not have the best team in the world, but it is a pretty good team, happy to make mistakes, get up in front of others on white boards and know that they may be told ‘that approach has vulnerabilities.’”

The project began in February 2006, and most of the investors are close friends and family members, which according to him “makes us pretty honest, to do the right thing for the right reasons.”

When people compare some of the underlying tech to Bitcoin, Irvine notes that, “back in 2006, I did a write-up of what we are doing; we used to call it cybercash or a perpetual coin, which was a cryptographic currency that is distributed and cannot be audited at all yet could reach billions of transactions per second, which meant providing the velocity of cash itself. And while Bitcoin solves the double-spending problem, it does so on a centralized blockchain which is like sharing the same spreadsheet, which is not brilliant in terms of scalability. If MaidSafe had existed back then, Bitcoin could have probably been a distributed network and solved the Byzantine Generals’ Problem in a decentralized manner.[15] However, what Bitcoin has shown is that if you take an automated program [and] replace the rubbish, it works better than if humans [are] involved. And for our team, when simplifying code we look at what code we can delete. Whereas in other organizations people put a patch on top of code, this ultimately creates too many if statements. And for us, we know what we will likely delete, not by adding more if statements.” Thus one of the reasons he credits for the success of his team over competitors that have tried is that MaidSafe developers did not simply add extra code or rules to fix a problem, but rather removed the underlying faulty code and replaced it with something more basic.

MaidSafe is giving away all its assets, including its software, and consequently over the past 6 weeks its developer mailing list has grown to over 200 people asking how they can participate in the project (dubbed Project SAFE).[16] To Irvine, “We gave it to the world, teaching folks how to program.” And as part of the upcoming Safecoin crowdfunding, the company will set up 6 international, independent development pods, each designed to compete with the MaidSafe team.[17] Or in other words, they have taken an 8-year investment, raised money and set up their own competition. But to Irvine, there is a bigger picture: “I don’t care who came up with the theory, we removed the ego from it on day 1. We don’t keep track of who invented parts of MaidSafe. And once it is in the community, that’s the way it should be.”

[1] Maidsafe

[2] Watson is an AI project developed by IBM and most notably, defeated two humans in a Jeopardy! tournament in 2011.

[3] Marvin Minsky described one of these ideas, of emergent order arising from simplistic agents in The Society of Mind

[4] Tor is incidentally releasing its own project called Toroken. See Behind the movement to build a faster anonymous network from The Daily Dot

[5] In the preface of his upcoming book, Andreas Antonopoulos uses Leafcutter Ants as an example of constructive order emerging from a set of simple rules. See Mastering Bitcoin: Unlocking digital crypto-currencies from O’Reilly Media

[6] Big Hairy Audacious Goals is a term coined by James Collins and Jerry Porras in Built to Last.

[7] Advanced Encryption Standard (AES) is an encryption standard based on the Rijndael cipher, which was first published in 1998. It was standardized by NIST in 2001 and has been widely adopted by many different institutions.

[8] XOR stands for exclusive or, a logical operation.

[9] For an enjoyable historical fiction account on cryptography, readers are recommended to peruse Cryptonomicon by Neal Stephenson.

[10] See Homomorphic encryption and Cryptographic Code Obfuscation: Decentralized Autonomous Organizations Are About to Take a Huge Leap Forward by Vitalik Buterin

[11] David Irvine used an example of a modern light bulb, whose component parts were sourced from an interconnected global supply chain (e.g., filament from a company in China, glass from a company in the US, etc.).

[12] Brain Wallets: The What and the How by Vitalik Buterin

[13] Heartbleed is a security vulnerability in OpenSSL

[14] See Neuro-linguistic programming

[15] Paul Bohm has a good explanation for how Bitcoin solved the Byzantine Generals’ Problem (related to, but distinct from, the Two Generals’ Problem), see Bitcoin’s Value is Decentralization

[16] The whitepaper describing this is MaidSafe.net announces project SAFE to the community

[17] See Safecoin (a type of “AppCoin”), Safecoin, why it’s safe and what it means for us all and MaidSafe Prepares for Safecoin Crowd-Sale to Facilitate “Decentralized Internet” from MarketWired
