R3D3: A doubly opportunistic data structure for compressing and indexing massive data

Nagy, Máté and Tapolcai, János and Rétvári, Gábor (2019) R3D3: A doubly opportunistic data structure for compressing and indexing massive data. INFOCOMMUNICATIONS JOURNAL, 11 (2). pp. 58-66. ISSN 2061-2079


Download (894kB) | Preview


Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in informationtheoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.

Item Type: Article
Subjects: Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
Z Bibliography. Library Science. Information Resources / könyvtártudomány > Z665 Library Science. Information Science / könyvtártudomány, információtudomány
Depositing User: MTMT SWORD
Date Deposited: 09 Jun 2020 11:52
Last Modified: 21 Dec 2020 12:37

Actions (login required)

Edit Item Edit Item