IJSTR

International Journal of Scientific & Technology Research

ISSN 2277-8616

IJSTR, Volume 3, Issue 2, February 2014 Edition




Website: http://www.ijstr.org



EXTRACTION OF WEB BLOCKS FROM WEB PAGES AND ANALYSIS OF EXTRACTION ALGORITHMS

 

AUTHOR(S)

S. K. SHIRGAVE, V. B. BINAGE

 

KEYWORDS

Index Terms: Fragment, ContentExtractor, DeSeA.

 

ABSTRACT

Abstract: A web page can be divided into blocks called fragments. A fragment is a portion of a web page that has a distinct theme or functionality and is distinguishable from the other parts of the page. Dividing web pages into fragments provides significant benefits, so good methods for doing it are needed. Manual fragmentation of web pages is expensive, error-prone, and does not scale. Because of these problems, automatic extraction of web fragments with the ContentExtractor algorithm and the DeSeA algorithm has been widely used. The proposed work has the following features: 1) detect fragments using the ContentExtractor algorithm; 2) extract the fragments detected in step (1); 3) detect fragments using the DeSeA algorithm; 4) extract the fragments detected in step (3); 5) analyze the fragments extracted by the two algorithms.
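
The detect-then-extract idea can be illustrated with a minimal sketch. It is not the ContentExtractor or DeSeA algorithm, neither of which is specified in this abstract. It assumes a simple redundancy heuristic: a page is cut into fragments at block-level tags, and a fragment that repeats across many pages of a site is treated as template material rather than content. The tag set `BLOCK_TAGS`, the function names, and the `max_share` threshold are all hypothetical choices made for this illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical choice of tags whose boundaries start a new fragment.
BLOCK_TAGS = {"div", "table", "tr", "td", "p", "ul", "li", "h1", "h2", "h3"}


class FragmentSplitter(HTMLParser):
    """Cut a page into text fragments at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._buffer = []

    def _flush(self):
        # Join buffered text, normalise whitespace, keep non-empty fragments.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.fragments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def split_into_fragments(html):
    """Detection step: return the list of text fragments found in one page."""
    parser = FragmentSplitter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any text left after the last tag
    return parser.fragments


def informative_fragments(pages, max_share=0.5):
    """Extraction step: keep fragments seen in at most max_share of the pages."""
    per_page = [split_into_fragments(page) for page in pages]
    # Count, for each fragment text, the number of pages that contain it.
    doc_freq = Counter(f for frags in per_page for f in set(frags))
    limit = max_share * len(pages)
    return [[f for f in frags if doc_freq[f] <= limit] for frags in per_page]
```

Given two pages that share a navigation block, `informative_fragments` drops the shared block and keeps the text that is unique to each page. Exact text matching is the crudest possible similarity test; a practical system would compare fragments on richer features.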

 

REFERENCES

[1]. Sung-Won Jung, and Hyuk-Chul Kwon, “A Scalable Hybrid Approach for Extracting Head Components from Web Tables”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 18, NO. 2, FEBRUARY 2006.

[2]. Jeong-Woo Son, Jae-An Lee, Seong-Bae Park, Hyun-Je Song, Song-Jo Lee, Se-Young Park, “Discriminating Meaningful Web Tables from Decorative Tables Using a Composite Kernel”, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[3]. Chen Hong-ye, “Method of Web Information Extraction Based on Decision Tree”, 2009 International Forum on Information Technology and Applications.

[4]. H.H. Chen, S.C. Tsai, and J.H. Tsai, “Mining Tables from Large Scale HTML Texts”, Proc. 18th Int’l Conf. Computational Linguistics, July 2000.

[5]. M. Hurst, “Layout and Language: Beyond Simple Text for Information Interaction—Modeling the Table”, Proc. Second Int’l Conf. Multimodal Interfaces, 1999.

[6]. G. Ning, W. Guowen, W. Xiaoyuan, and S. Baile, “Extracting Web Table Information in Cooperative Learning Activities Based on Abstract Semantic Model”, Proc. Sixth Int’l Conf. Computer Supported Cooperative Work in Design, pp. 492-497, 2001.

[7]. Y. Wang and J. Hu, “A Machine Learning Based Approach for Table Detection on the Web”, Proc. 11th Int’l World Wide Web Conf. WWW 2002, pp. 7-11, 2002.

[8]. S. Soderland, “Learning to Extract Text-Based Information from the World Wide Web”, Proc. Third Int’l Conf. Knowledge Discovery and Data Mining (KDD), Aug. 1997.

[9]. M. Hurst, “Layout and Language: Challenges for Table Understanding on the Web”, Proc. 1st WDA at 6th ICDAR, pp. 27-30, Sept. 2001.

[10]. A. Tengli, Y. Yang, and N. L. Ma, “Learning Table Extraction from Examples”, Proc. 20th COLING, pp. 987-993, Aug. 2004.

[11]. Margaret Dunham, Data Mining: Introductory and Advanced Topics, ISBN: 0130888923, Prentice Hall, 2003.