The emerging decentralized storage systems (DSSs), such as InterPlanetary File System
(IPFS), Storj, and Sia, provide people with a new storage model. Instead of being centrally managed,
the data are sliced up and distributed across the nodes of the network. Furthermore, each data
object is uniquely identified by a cryptographic hash (ObjectId) and can only be retrieved by
ObjectId. Because DSSs lack the search functions that existing centralized storage systems provide,
their application scenarios are restricted. In this paper, we first apply
decentralized B+Tree and HashMap to the DSSs to provide keyword search. Both indexes are kept in
blocks. Since these blocks may be scattered on multiple nodes, we ensure that all operations involve
as few blocks as possible to reduce network cost and response time. In addition, the version control
and version merging algorithms are designed to effectively organize the indexes and facilitate data
integration. The experimental results show that our indexes have excellent availability and scalability.
Keywords: decentralized storage systems; keyword search; decentralized B+Tree; decentralized
HashMap
1. Introduction
With the rapid development of internet technology, centralized storage has become an important
business model in our daily life. Centralized storage not only provides a variety of storage services for
both individuals and businesses but also supports different kinds of queries, thus meeting the needs
of users. However, centralized storage systems depend on a trusted third party and therefore inevitably
suffer from a single point of failure. Even if centralized storage systems are backed up for
data availability, they remain exposed to force majeure factors (such as political censorship),
which can leave users unable to access their own data.
In view of the above, data storage requires a more secure and free environment.
The emerging DSSs, such as InterPlanetary File System (IPFS) [1], Storj [2], and Sia [3], can provide
people with a new storage model. They are built on a peer-to-peer (P2P) network, so there is no
need to rely on third-party platforms. In these systems, data are not managed by a central node but
divided into blocks and distributed through the network. A data object can be accessed as long as it
exists on any node. Each node in the network can share its free disk space, thus reducing the cost of
decentralized storage. Users need not worry about losing access to their own data,
because DSSs can be combined with blockchain to ensure data availability [4,5].
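
The storage model described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the actual scheme of IPFS, Storj, or Sia: the names `Node`, `put`, `get`, and `object_id`, the use of SHA-256, the tiny `BLOCK_SIZE`, and the hash-modulo placement rule are all hypothetical choices made for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # assumed, unrealistically small so the example stays readable


def object_id(data: bytes) -> str:
    """Return a content-derived identifier (assumed here: hex SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """A toy storage node: a mapping from ObjectId to block bytes."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}


def put(data: bytes, nodes: List[Node]) -> List[str]:
    """Split data into blocks, spread them over nodes, return the ObjectIds."""
    ids = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        oid = object_id(block)
        # Placement by hash value is an illustrative assumption only.
        nodes[int(oid, 16) % len(nodes)].blocks[oid] = block
        ids.append(oid)
    return ids


def get(oid: str, nodes: List[Node]) -> bytes:
    """Fetch a block from any node that holds it and verify it against its id."""
    for node in nodes:
        block = node.blocks.get(oid)
        if block is not None and object_id(block) == oid:
            return block
    raise KeyError(oid)


nodes = [Node(), Node(), Node()]
ids = put(b"hello world", nodes)
restored = b"".join(get(oid, nodes) for oid in ids)
```

In this sketch a block can be fetched from whichever node holds it, and the identifier doubles as an integrity check. The only lookup key, however, is the ObjectId itself, which is the restriction that motivates adding a keyword index.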
One of the key reasons why traditional centralized storage systems can be applied to various
fields is that they provide rich query services, which is exactly what DSSs lack. In DSSs,
each node or data object is assigned a unique identifier (NodeId or ObjectId) by a cryptographic hash