JCP 2016 Vol.11(6): 504-512 ISSN: 1796-203X
doi: 10.17706/jcp.11.6.504-512
Research on Cassandra Data Compaction Strategies for Time-Series Data
Bai Lu, Yang Xiaohui
College of Computer and Information Technology, Beijing Jiaotong University, Haidian District, Beijing, China.
Abstract—The storage and analysis of time-series data is a subject of intense interest in current database research. Time-series data, a sequence of data points collected at fixed time intervals, is an important basis for business analysis and prediction. Cassandra, a NoSQL database, is often used to store time-series data because of the characteristics of its data model. In real applications, the life cycle of time-series data is usually managed by setting a TTL (time to live); expired data is not deleted immediately but is removed during compaction. This paper examines how different compaction strategies affect time-series data storage and studies three Cassandra compaction strategies: Size-Tiered Compaction Strategy, Leveled Compaction Strategy, and Date-Tiered Compaction Strategy. The strategies are compared in tests based on stable data storage, recording speed, the number of sorted string table (SSTable) files, and other measures. Finally, the experiments identify the compaction strategies suited to time-series application scenarios.
Index Terms—Cassandra, time-series data, distributed storage system, compaction strategy.
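The TTL behaviour described in the abstract (expired data stays on disk until compaction removes it) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's implementation: the names `Cell`, `read_live`, `compact`, and `cells_on_disk` are hypothetical, SSTables are modelled as plain Python lists, and tombstones, grace periods, and the three strategies' file-selection rules are all ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """One time-series data point with a TTL (hypothetical toy type)."""
    key: str
    value: float
    written_at: int  # write time, in seconds
    ttl: int         # time to live, in seconds

    def expired(self, now: int) -> bool:
        return now >= self.written_at + self.ttl


SSTable = List[Cell]  # toy stand-in for an immutable on-disk file


def read_live(sstables: List[SSTable], now: int) -> List[Cell]:
    """Reads skip expired cells, but the cells still occupy space."""
    return [c for table in sstables for c in table if not c.expired(now)]


def compact(sstables: List[SSTable], now: int) -> List[SSTable]:
    """Merge all tables into one and drop expired cells (toy compaction)."""
    merged = [c for table in sstables for c in table if not c.expired(now)]
    return [sorted(merged, key=lambda c: c.key)]


def cells_on_disk(sstables: List[SSTable]) -> int:
    return sum(len(table) for table in sstables)


if __name__ == "__main__":
    tables = [
        [Cell("a", 1.0, written_at=0, ttl=10), Cell("b", 2.0, written_at=5, ttl=10)],
        [Cell("c", 3.0, written_at=12, ttl=10)],
    ]
    now = 18
    print(cells_on_disk(tables), len(read_live(tables, now)))  # before compaction
    tables = compact(tables, now)
    print(cells_on_disk(tables), len(read_live(tables, now)))  # after compaction
```

In this model, expired cells are invisible to reads yet still counted by `cells_on_disk` until `compact` runs, which is why the choice of compaction strategy matters for TTL-managed time-series data.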
Cite: Bai Lu, Yang Xiaohui, "Research on Cassandra Data Compaction Strategies for Time-Series Data," Journal of Computers vol. 11, no. 6, pp. 504-512, 2016.
General Information
ISSN: 1796-203X
Abbreviated Title: J.Comput.
Frequency: Bimonthly
Editor-in-Chief: Prof. Liansheng Tan
Executive Editor: Ms. Nina Lee
Abstracting/Indexing: DBLP, EBSCO, ProQuest, INSPEC, ULRICH's Periodicals Directory, WorldCat, etc.
E-mail: jcp@iap.org