Towards A More Distributed, Trusted and Resilient Data World

This work is part of a project funded by the EU: the GLASS (SinGLe Sign-on eGovernAnce paradigm based on a distributed file exchange network for Security, transparency, cost effectiveness and truSt) project [7].

Introduction

The Internet was created as a distributed network, and then the Web came along and centralised it. While the Internet supported multiple routes to any destination, the Web created centralised sources of information. But it’s not just these centralised sources of information that may cause problems for our future: the cash-rich tech companies have also been acquiring innovative companies in order to give themselves a “full hand”. For example, Facebook now owns Instagram and WhatsApp, Google has YouTube, and Microsoft has LinkedIn and GitHub. This week it was announced that Microsoft may acquire Discord.

In the Cloud, too, Amazon and Microsoft dominate, and have a great deal of power over the industry (and the world). This centralisation of information could thus be a great worry for the future, especially as just a few companies will have so much power over the flow of information, and can switch off those they deem unfit for their platform. Along with this, they may work with governments to suppress the flow of information. In the worst case, we end up with a Big Brother world. The recent removal of Donald Trump from Twitter showed the true power of these companies.

Centralisation of information, too, is not good for the resilience of the Internet, as a single failure could cause major problems in gaining access to data.

So what’s the solution? How can we create a more distributed, trusted, and resilient data infrastructure? Well, one solution is the IPFS (InterPlanetary File System), and it’s part of our current research work.

The IPFS (InterPlanetary File System)

IPFS implements a distributed infrastructure using P2P methods, where there is no centralised server. As with torrent networks, it is designed to be censorship-resistant. Benet [1] outlines that IPFS can be likened to the Web, but with content-addressed hyperlinks, and to a single BitTorrent swarm that exchanges objects within one Git repository.

IPFS breaks files up into blocks (or chunks), and uses a Merkle DAG (Directed Acyclic Graph) to define the version control of files, along with a distributed hash table (DHT) to track them. Within a traditional blockchain infrastructure, we store transactions sequentially, and it can take some time to reach a consensus through the building of blocks. With a DAG, each transaction becomes a block, and this speeds up the consensus mechanism. Sergio Demian Lerner [2] outlined that in a DAG there are no fixed blocks, and that each transaction brings with it its own proof of work. Within this, he defined the usage of a fast cache for the most recent transactions, where older transactions cannot be used as a reference.
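To make the chunking and hashing idea a little more concrete, here is a minimal sketch in JavaScript (Node.js). It is an illustration under stated assumptions rather than how IPFS itself is implemented: the function names (`sha256Hex`, `chunkAndHash`, `buildRoot`) and the tiny 8-byte chunk size are hypothetical, and a plain hex SHA-256 digest stands in for a real IPFS content identifier.

```javascript
const crypto = require("node:crypto");

// Hex SHA-256 digest. Assumption: this stands in for a real IPFS CID.
function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Split a Buffer into fixed-size chunks and hash each one (the leaves).
function chunkAndHash(buffer, chunkSize) {
  const leaves = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    const chunk = buffer.subarray(offset, offset + chunkSize);
    leaves.push({ id: sha256Hex(chunk), size: chunk.length });
  }
  return leaves;
}

// The root node only lists the ids of its children, so its own id
// changes whenever any chunk underneath it changes.
function buildRoot(leaves) {
  const links = leaves.map((leaf) => leaf.id);
  return { id: sha256Hex(links.join("\n")), links };
}

// Two versions of a file that differ by one letter in the last chunk.
const v1 = buildRoot(chunkAndHash(Buffer.from("hello distributed world"), 8));
const v2 = buildRoot(chunkAndHash(Buffer.from("hello distributed World"), 8));

console.log(v1.links[0] === v2.links[0]); // unchanged chunk keeps its id
console.log(v1.id === v2.id);             // root id differs between versions
```

In this sketch, the one-letter change gives a new id for the last chunk and for the root, while the first two chunks keep their ids and could be shared between the two versions.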

Distributed Architecture

Chen et al [3], as shown in Figure 1, define four core layers: storage (Layer 4), routing (Layer 3), virtual chain (Layer 2), and blockchain (Layer 1). Within the blockchain layer, it is possible to build a new blockchain or to use Bitcoin’s blockchain. A prominent distributed database technology that builds on blockchain technology is Blockstack [4]. By default, Blockstack uses the Gaia distributed database, which stores its data in a decentralised way in the users’ web browsers rather than on a centralised web server, thus enhancing privacy. The Blockstack framework is currently built on the Bitcoin blockchain, but can be moved to another platform.

Figure 1: IPFS Architecture [3]

In the virtual chain layer, the transactions are processed and verified, and then sent to the blockchain layer to be stored. Each transaction must be signed with the private key of the sender, and is then verified with their public key. Typically, one type of transaction lets a node bind its IP address to its associated account (such as defined by its public key), and another lets it declare the files that it is associated with. Files can either be long-term immutable or occasionally mutable, and are broken into blocks to be bartered in the BitSwap protocol. Each of the blocks is then identified with a content identifier (CID). With the BitSwap protocol, nodes distribute want-lists to their peers, and these contain the list of CIDs for the blocks that they want to receive. Each node remembers which blocks its peers want. Whenever a node receives a block, it checks its list to see if one of its connected peers wanted the received block. The BitSwap protocol thus involves a node having two lists: the blocks they want and the blocks they have. Nodes then barter between themselves. Within a BitTorrent exchange, the data is exchanged in a single torrent.
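The want-list behaviour described above can be sketched as a small in-memory simulation. This is only a toy model under my own assumptions: the `SwapNode` class, its method names and the `cid-1`/`cid-2` labels are hypothetical, peers call each other directly rather than over a network, and there is none of the accounting that real bartering would need.

```javascript
// Hypothetical in-memory node: peers are plain objects, not network links.
class SwapNode {
  constructor(name) {
    this.name = name;
    this.have = new Map();      // CID -> block data held locally
    this.want = new Set();      // CIDs this node wants to receive
    this.peerWants = new Map(); // peer -> Set of CIDs that peer wants
    this.peers = new Set();
  }

  connect(peer) {
    this.peers.add(peer);
    peer.peers.add(this);
  }

  // Send our current want-list to every connected peer.
  broadcastWants() {
    for (const peer of this.peers) peer.receiveWantList(this, [...this.want]);
  }

  // Remember what the peer wants, and send any block we already hold.
  receiveWantList(peer, cids) {
    const wants = new Set(cids);
    this.peerWants.set(peer, wants);
    for (const cid of cids) {
      if (this.have.has(cid)) {
        wants.delete(cid);
        peer.receiveBlock(cid, this.have.get(cid));
      }
    }
  }

  // Store a new block, then pass it on to any peer that asked for it.
  receiveBlock(cid, data) {
    if (this.have.has(cid)) return;
    this.have.set(cid, data);
    this.want.delete(cid);
    for (const [peer, wants] of this.peerWants) {
      if (wants.delete(cid)) peer.receiveBlock(cid, data);
    }
  }
}

// Topology: A - B - C. Only A holds the block "cid-1".
const a = new SwapNode("A");
const b = new SwapNode("B");
const c = new SwapNode("C");
a.connect(b);
b.connect(c);
a.have.set("cid-1", "block data");
b.want.add("cid-1").add("cid-2");
c.want.add("cid-1");

c.broadcastWants(); // B remembers that C wants cid-1
b.broadcastWants(); // A sends cid-1 to B, and B forwards it to C
```

After these two broadcasts, both B and C hold `cid-1`, even though C never talked to A, while `cid-2` stays on B's want-list because no peer holds it.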

The routing layer extracts the information from Layer 2, and maps the routing address of an account to the files or blocks associated with that account. Within the storage layer, the data is actually stored (as mutable storage and immutable storage). In [3], the authors make improvements to IPFS by adding a zig-zag file storage structure in order to provide a triple replication scheme (for frequently used data) and an erasure code storage scheme (for infrequently used data). The authors state that the method can differentiate between hot data and cold data. With hot data storage, we store data near the computation, where there is fast access to the data, whereas cold data can be stored within cloud storage.
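To give a feel for the trade-off between the two schemes, here is a small sketch. It does not reproduce the zig-zag structure or the specific erasure code used in [3]; instead I have swapped in the simplest erasure code I know of, a single XOR parity fragment, and the function names and sample fragments are hypothetical.

```javascript
// Triple replication: keep three full copies (3x storage overhead).
function replicate(block) {
  return [Buffer.from(block), Buffer.from(block), Buffer.from(block)];
}

// Single-parity erasure code: XOR all equal-length fragments together.
function xorParityEncode(fragments) {
  const parity = Buffer.alloc(fragments[0].length);
  for (const fragment of fragments) {
    for (let i = 0; i < parity.length; i++) parity[i] ^= fragment[i];
  }
  return parity;
}

// Rebuild ONE missing fragment by XOR-ing the parity with the survivors.
function xorParityRecover(survivors, parity) {
  return xorParityEncode([...survivors, parity]);
}

// Hot data: three copies of the same block.
const hotCopies = replicate(Buffer.from("hot!"));

// Cold data: three equal-length fragments plus one parity fragment.
const fragments = [Buffer.from("cold"), Buffer.from("data"), Buffer.from("here")];
const parity = xorParityEncode(fragments);

// Pretend the middle fragment was lost, and rebuild it from the rest.
const recovered = xorParityRecover([fragments[0], fragments[2]], parity);
console.log(recovered.toString());

// Storage overhead: stored bytes divided by original bytes.
const replicationOverhead = hotCopies.length;                     // 3 copies
const parityOverhead = (fragments.length + 1) / fragments.length; // 4 / 3
```

The replicated block costs three times its size but can be read from any copy, while the parity scheme costs only a third extra here, at the price of a rebuild step and tolerance of just one lost fragment.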

Data sharing

Naz et al [5] implement a data-sharing model (Figure 2) with a number of entities:

  • Owner. This relates to the entity that is sharing the data, such as a government entity.
  • Customer. This relates to an entity that can download files from an IPFS server using reconstructed hashes.
  • Workers. These help customers to decrypt content, authenticate new customers through signatures, and query smart contracts for customer data requests.
  • Arbitrator. This entity resolves disputes between buyers and sellers for the requested content.

With Naz’s model [5], an owner creates metadata for a file they wish to share, such as the filename, file type, file size, and a description. This information, and a complete copy of the file data, is then added to the IPFS. An example is [5]:

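// Upload the plain file metadata to IPFS.
// buf, fileMeta, res, recipient_addr and recipient_email come from the
// surrounding request handler (not shown here).
ipfs.files.add(buf, function (err, meta_result) {
  if (err) {
    console.log(err);
    return res.sendStatus(500);
  }
  console.log(meta_result);
  // Return the metadata hash, the file hash and the recipient details
  res.json({
    "meta_hash": meta_result[0].hash,
    "file_hash": fileMeta.hash,
    "address": recipient_addr,
    "email": recipient_email
  });
});

Figure 2: Data sharing on IPFS [5]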

Once the file is loaded onto the IPFS, the owner receives the hashes of the data back, and then contacts trusted worker nodes. These worker nodes have their key-pairs stored within smart contracts, and are responsible for decrypting content. The file hashes are split into $k$ shares using the Shamir Secret Sharing (SSS) method and encrypted using $n$ random keys. These shares are then stored, along with security information, on a blockchain. It is important to encrypt these hashes, as an adversary could otherwise rebuild the file based on the hashes. Only valid customers who have paid for access can then rebuild the files. A secret $S$ can thus be split into the shares {$S_1$, …, $S_n$}, where there are $n$ shares and a threshold of $k$. Overall, $k$ shares are required to rebuild the secret. These shares are stored and encrypted in a smart contract, and can only be decrypted and rebuilt by verified workers (who are found by the owner through their public key).
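For readers who have not met Shamir's method before, the following is a minimal sketch of splitting a secret into $n$ shares with a threshold of $k$. It is not the code from [5]: the prime field ($2^{127}-1$), the function names, and the sample secret are my own assumptions, and the encryption of the shares is left out. A full 256-bit hash would not fit in this field, so a real system would need a larger prime or would split the hash into smaller pieces.

```javascript
const crypto = require("node:crypto");

// Assumption: a 127-bit Mersenne prime as the field; the secret must be smaller.
const PRIME = 2n ** 127n - 1n;

// Reduce into the range [0, PRIME), also for negative inputs.
const mod = (a) => ((a % PRIME) + PRIME) % PRIME;

// Modular exponentiation by repeated squaring.
function modPow(base, exp) {
  let result = 1n;
  base = mod(base);
  while (exp > 0n) {
    if (exp & 1n) result = mod(result * base);
    base = mod(base * base);
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via Fermat's little theorem (valid because PRIME is prime).
const modInv = (a) => modPow(a, PRIME - 2n);

// Random coefficient from 16 random bytes (slight modulo bias, fine for a sketch).
function randomFieldElement() {
  return mod(BigInt("0x" + crypto.randomBytes(16).toString("hex")));
}

// Split `secret` into n shares, so that any k of them can rebuild it.
function split(secret, n, k) {
  // Random polynomial of degree k-1 whose constant term is the secret.
  const coeffs = [mod(secret)];
  for (let i = 1; i < k; i++) coeffs.push(randomFieldElement());
  const shares = [];
  for (let x = 1n; x <= BigInt(n); x++) {
    let y = 0n; // Horner evaluation of the polynomial at x
    for (let i = coeffs.length - 1; i >= 0; i--) y = mod(y * x + coeffs[i]);
    shares.push({ x, y });
  }
  return shares;
}

// Lagrange interpolation at x = 0 recovers the constant term (the secret).
function combine(shares) {
  let secret = 0n;
  for (const { x: xi, y: yi } of shares) {
    let num = 1n;
    let den = 1n;
    for (const { x: xj } of shares) {
      if (xj === xi) continue;
      num = mod(num * -xj);
      den = mod(den * (xi - xj));
    }
    secret = mod(secret + yi * num * modInv(den));
  }
  return secret;
}

// Hypothetical secret: a short value standing in for (part of) a file hash.
const secret = 0x1f2e3d4c5b6a7988n;
const shares = split(secret, 5, 3); // n = 5 shares, threshold k = 3
const rebuilt = combine([shares[4], shares[1], shares[3]]);
console.log(rebuilt === secret);
```

Any three of the five shares rebuild the same secret here, whereas fewer than three reveal nothing about it, which is the property the model above relies on.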

Ali et al [6] used a side-chain method to preserve network privacy, where a validator node runs the side chain (Figure 3). Within the network, each IoT device has a public and a private key, which it uses to encrypt data for the validator. The validator then adds the data onto a side chain. A smart contract then records that the device may only communicate with the validator. It also stores the public key and the hash of the data stored on IPFS for a device, and the public key and access rights of requesters from the consortium.

Figure 3: Side chains [6]

GLASS Project

At Edinburgh Napier University (ENU), we are working on the integration of data sharing into the InterPlanetary File System (IPFS). The team at ENU includes myself, Pavlos Papadopoulos, Dr Christos Chrysoulas, Dr Owen Lo and Dr Zakwan Jaroucheh. The work integrates with the identity, blockchain and cryptography work of the Blockpass ID Lab in Edinburgh.

GLASS brings together recognized European SMEs and large enterprises, with universities and public authorities from all parts of Europe, including Luxembourg (LU), Germany (DE), United Kingdom (UK), Greece (GR), Cyprus (CY), Portugal (PT), Belgium (BE) and Turkey (TR), to cooperate on the development of a novel eGovernance paradigm. The multi-disciplinary expertise of each organization, including big data analytics, distributed systems, blockchain, deep learning, innovation & project management, business development, risk management, legal compliance and eGovernment service delivery, as well as the active engagement with stakeholders and end-users throughout every phase of the project, creates a consortium capable of successfully delivering the expected outcomes [7].

References

[1] Benet, J. (2014). IPFS: Content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561.

[2] Lerner, S. D. (2015). DagCoin: a cryptocurrency without blocks. White paper.

[3] Chen, Y., Li, H., Li, K., & Zhang, J. (2017, December). An improved P2P file system scheme based on IPFS and Blockchain. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 2652–2657). IEEE.

[4] Ali, M., Shea, R., Nelson, J., & Freedman, M. J. (2017). Blockstack: A new decentralized internet. Whitepaper, May.

[5] Naz, M., Al-zahrani, F. A., Khalid, R., Javaid, N., Qamar, A. M., Afzal, M. K., & Shafiq, M. (2019). A secure data sharing platform using blockchain and interplanetary file system. Sustainability, 11(24), 7054.

[6] Ali, M. S., Dolui, K., & Antonelli, F. (2017, October). IoT data privacy via blockchains and IPFS. In Proceedings of the seventh international conference on the internet of things (pp. 1–7).

[7] https://www.glass-h2020.eu/