Volume 5 Number 4 (Apr. 2010)
JCP 2010 Vol.5(4): 500-507 ISSN: 1796-203X
doi: 10.4304/jcp.5.4.500-507

Efficient Selection and Integration of Hidden Web Database

Xuefeng Xian1, 2, Pengpeng Zhao1, 2, Yuanfeng Yang1, 2, Jie Xin2, and Zhiming Cui1, 2
1 JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou, China
2 The Institute of Intelligent Information Processing and Application, Soochow University, Suzhou, China


Abstract—An ever-increasing amount of valuable information is stored in web databases, "hidden" behind search interfaces, which has opened a new application area for information retrieval and integration. There may be hundreds or thousands of web databases on the web that provide data relevant to a particular domain. A primary challenge for internet-scale hidden web database integration is therefore to determine which web databases to include in the integration system, so that the system contains as much high-quality data as possible with the least possible overlap. In this paper, we present an approach that iteratively selects and integrates candidate web databases. The core of this approach is a benefit function that evaluates how much integrating a given web database would benefit an integration system in its current state. We devise a benefit function based on the volume and quality of the new data that integrating the web database adds to the integration system. We show how to apply our approach efficiently in practice to select and integrate web databases. Our experiments on real hidden web databases indicate that the set of web databases selected and integrated by our approach yields an integration system with significantly higher utility than a wide range of other strategies.
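The abstract does not state the exact form of the benefit function, so the following is only a minimal sketch of the iterative select-and-integrate idea under loudly stated assumptions: each web database is modeled as a set of record keys, each record has a hypothetical quality score in [0, 1], and the benefit of a candidate is assumed to be the total quality of the records it would add that the integration system does not already hold. The names `benefit`, `greedy_select`, `budget`, and `min_benefit` are illustrative, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a web database is represented by the set of record keys it returns.
Database = Set[str]


def benefit(candidate: Database, integrated: Set[str], quality: Dict[str, float]) -> float:
    """Hypothetical benefit: summed quality of the records the candidate would add
    that are not already in the integration system (new-data volume weighted by quality).
    Records already integrated (overlap) contribute nothing."""
    new_records = candidate - integrated
    return sum(quality.get(r, 0.0) for r in new_records)


def greedy_select(
    databases: Dict[str, Database],
    quality: Dict[str, float],
    budget: int,
    min_benefit: float = 0.0,
) -> Tuple[List[str], Set[str]]:
    """Iteratively integrate the candidate with the highest benefit for the current
    state of the system. Stops after `budget` databases, or earlier when no remaining
    candidate adds more than `min_benefit`."""
    selected: List[str] = []
    integrated: Set[str] = set()
    remaining = dict(databases)
    while remaining and len(selected) < budget:
        # Benefits are recomputed each round because they depend on what is integrated.
        name, best = max(
            ((n, benefit(db, integrated, quality)) for n, db in remaining.items()),
            key=lambda pair: pair[1],
        )
        if best <= min_benefit:
            break
        selected.append(name)
        integrated |= remaining.pop(name)
    return selected, integrated


if __name__ == "__main__":
    # Toy data (invented for illustration): B fully overlaps A.
    dbs = {"A": {"r1", "r2", "r3"}, "B": {"r2", "r3"}, "C": {"r4"}}
    quality = {"r1": 0.9, "r2": 0.8, "r3": 0.7, "r4": 0.5}
    selected, _ = greedy_select(dbs, quality, budget=3)
    print(selected)  # ['A', 'C']
```

In this toy run, database B is never selected even though it holds more high-quality records than C, because once A is integrated everything B offers is overlap and its benefit drops to zero. Re-evaluating benefits against the current state in every round is what lets a greedy selection of this kind penalize overlap; the paper's own benefit function and stopping rule may differ.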

Index Terms—hidden web, data integration, web database selection.


Cite: Xuefeng Xian, Pengpeng Zhao, Yuanfeng Yang, Jie Xin, and Zhiming Cui, "Efficient Selection and Integration of Hidden Web Database," Journal of Computers, vol. 5, no. 4, pp. 500-507, 2010.
