Volume 12 Number 4 (Jul. 2017)
JCP 2017 Vol.12(4): 362-370 ISSN: 1796-203X
doi: 10.17706/jcp.12.4.362-370

Efficient Cross User Client Side Data Deduplication in Hadoop

Priteshkumar Prajapati, Parth Shah, Amit Ganatra, Sandipkumar Patel
¹Department of Information Technology, C.S.P.I.T., CHARUSAT, Anand, India.
²Department of Computer Engineering, C.S.P.I.T., CHARUSAT, Anand, India.


Abstract—Hadoop is widely used for applications such as Aadhaar card, healthcare, media, ad platforms, fraud detection and crime, and education. However, it does not provide an efficient, optimized data storage solution. We observed that when a user uploads the same file twice under the same file name, Hadoop does not allow the second copy to be saved. When a user uploads a file with the same content under a different file name, however, Hadoop accepts the upload. In practice, many users (cross user) upload files with identical contents under different names, which wastes storage space. To address this problem, we provide data deduplication in Hadoop. Before data is uploaded to HDFS, we calculate the hash value of the file and store that hash value in a database (HBase) for later use. When the same user or another user then tries to upload a file with the same content, possibly under a different name, our DeDup module calculates its hash value and checks it against HBase. If the hash value matches, the module returns the message "File already exists" instead of storing the file again. Experimental analysis on text, audio, video, and zip files demonstrates that the proposed solution saves storage space while incurring very small computation overhead.
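The client-side check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DeDup module: SHA-256 is assumed as the hash function (the abstract says only "hash value"), and two in-memory dictionaries stand in for the HBase hash table and for HDFS. The names `content_hash` and `DedupUploader` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict

CHUNK_SIZE = 64 * 1024  # read in 64 KiB pieces so large files need not fit in memory


def content_hash(stream: BinaryIO) -> str:
    """Return the SHA-256 hex digest of everything readable from `stream`.

    SHA-256 is an assumption; the abstract does not name the hash function.
    """
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        h.update(chunk)
    return h.hexdigest()


class DedupUploader:
    """Hypothetical client-side deduplication check.

    `hash_index` stands in for the HBase table (hash -> first stored path);
    `store` stands in for HDFS (path -> file bytes). Both are plain dicts.
    """

    def __init__(self) -> None:
        self.hash_index: Dict[str, str] = {}
        self.store: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> str:
        # The hash is computed on the client, before anything is sent.
        digest = content_hash(io.BytesIO(data))
        if digest in self.hash_index:
            # Same content already stored by any user under any name: skip it.
            return "File already exists"
        if path in self.store:
            # Mirrors plain HDFS behaviour: an existing path is not overwritten.
            return "Path already exists"
        self.store[path] = data         # "upload" to the HDFS stand-in
        self.hash_index[digest] = path  # record the hash for later checks
        return "Uploaded"


if __name__ == "__main__":
    up = DedupUploader()
    print(up.upload("/user/alice/song.mp3", b"audio-bytes"))  # Uploaded
    print(up.upload("/user/bob/track.mp3", b"audio-bytes"))   # File already exists
    print(len(up.store))                                      # 1
```

Because the lookup is keyed on content rather than on file name, a second user's copy under a new name is rejected before any bytes reach storage, which is the cross-user saving the abstract describes. In a real deployment the dictionaries would be replaced by HBase and HDFS client calls.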

Index Terms—Cloud storage, deduplication, Hadoop, Hadoop distributed file system, Hadoop database.


Cite: Priteshkumar Prajapati, Parth Shah, Amit Ganatra, Sandipkumar Patel, "Efficient Cross User Client Side Data Deduplication in Hadoop," Journal of Computers, vol. 12, no. 4, pp. 362-370, 2017.
