Volume 12 Number 4 (Jul. 2017)
JCP 2017 Vol.12(4): 362-370 ISSN: 1796-203X
doi: 10.17706/jcp.12.4.362-370

Efficient Cross User Client Side Data Deduplication in Hadoop

Priteshkumar Prajapati, Parth Shah, Amit Ganatra, Sandipkumar Patel
1Department of Information Technology, C.S.P.I.T., CHARUSAT, Anand, India.
2Department of Computer Engineering, C.S.P.I.T., CHARUSAT, Anand, India.


Abstract—Hadoop is widely used in applications such as the Aadhaar card, healthcare, media, ad platforms, fraud detection and crime, and education. However, it does not provide an efficient, optimized data storage solution. We observed that when a user uploads the same file twice under the same file name, Hadoop does not save the second copy, but when a user uploads the same content under a different file name, Hadoop accepts the upload. In practice, many users (cross user) upload files with identical contents under different names, which wastes storage space. We address this problem by providing data deduplication in Hadoop. Before uploading data to HDFS, we calculate the hash value of the file and store it in a database (HBase) for later use. When the same or another user later tries to upload a file with the same content, our DeDup module calculates its hash value and checks it against HBase. If the hash value matches, the module displays the message “File already exists”. Experimental analysis on text, audio, video, zip, and other file types demonstrates that the proposed solution optimizes storage space while incurring very small computation overhead.
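
The hash-then-check flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the hash function, so SHA-256 is an assumption; the `DedupIndex` class, its `upload` method, and the `/user/name` path layout are hypothetical; and plain Python dictionaries stand in for HBase (the hash index) and HDFS (the file store).

```python
import hashlib
import io


def file_digest(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks.

    SHA-256 is an assumption; the abstract only says "hash value".
    """
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


class DedupIndex:
    """Hypothetical in-memory stand-in for the hash index kept in HBase."""

    def __init__(self):
        self._paths = {}  # digest -> path of the first stored copy

    def upload(self, user, name, stream, store):
        """Store the file only if its content has not been seen before.

        `store` is a dict standing in for HDFS (path -> bytes).
        Returns (stored, path): `stored` is False for a duplicate, and
        `path` then points at the copy that already exists.
        """
        digest = file_digest(stream)
        if digest in self._paths:
            # Same content already uploaded, possibly by another user
            # or under another name: skip the upload.
            return False, self._paths[digest]
        path = f"/{user}/{name}"  # hypothetical path layout
        stream.seek(0)            # rewind after hashing
        store[path] = stream.read()
        self._paths[digest] = path
        return True, path


# Usage: two users upload identical content under different names.
hdfs = {}
index = DedupIndex()
first = index.upload("alice", "report.txt", io.BytesIO(b"same content"), hdfs)
second = index.upload("bob", "copy.txt", io.BytesIO(b"same content"), hdfs)
```

Because the digest is computed on the client before any data is sent, a duplicate costs one hash computation and one index lookup rather than a full upload, which is the saving the abstract attributes to client-side deduplication.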

Index Terms—Cloud storage, deduplication, Hadoop, Hadoop distributed file system, Hadoop database.


Cite: Priteshkumar Prajapati, Parth Shah, Amit Ganatra, Sandipkumar Patel, "Efficient Cross User Client Side Data Deduplication in Hadoop," Journal of Computers, vol. 12, no. 4, pp. 362-370, 2017.
