IJSTR

International Journal of Scientific & Technology Research



IJSTR, Volume 9, Issue 1, January 2020 Edition




Website: http://www.ijstr.org

ISSN 2277-8616



AN EFFECTIVE IMPLEMENTATION OF WEB CRAWLING TECHNOLOGY TO RETRIEVE DATA FROM THE WORLD WIDE WEB (WWW)


 

AUTHOR(S)

F. M. Javed Mehedi Shamrat, Zarrin Tasnim, A.K.M Sazzadur Rahman, Naimul Islam Nobel, Syed Akhter Hossain

 

KEYWORDS

Web Crawling, Web Technology, Data, Python, Data Extraction, Algorithm.

 

ABSTRACT

The Internet (or simply the web) is an enormous, rich, easily accessible source of information, and its user base is growing rapidly. To retrieve information from the web, search engines are used, which access pages according to users' requirements. The web is very large and contains structured, semi-structured, and unstructured data. Most of the data on the web is unmanaged, so it is not possible to access the whole web at once in a single attempt; search engines therefore rely on web crawlers. A web crawler is an essential part of a search engine. Information retrieval deals with searching for and retrieving information within documents, and it also covers searching online databases and the web. In this paper, we discuss, develop, and program a web crawler that fetches information from the internet and filters the data for usable and graphical presentation to users.
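As an illustration of the fetch-and-filter idea described in the abstract, the following is a minimal sketch of a breadth-first crawler in Python (the language named in the keywords). It is not the authors' implementation. The names `LinkParser`, `fetch_http`, and `crawl`, the `max_pages` limit, and the keyword-based filter are all assumptions made for this example. It uses only the standard library and, for brevity, does not check robots.txt or rate-limit requests, which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values of <a> tags and the page <title> text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_http(url):
    """Default fetcher (hypothetical helper): download a page over HTTP."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=fetch_http, max_pages=10, keyword=None):
    """Breadth-first crawl from seed; return {url: title} for kept pages."""
    queue, seen, results = deque([seed]), {seed}, {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        # Filter step (assumed): keep pages whose raw HTML mentions the keyword.
        if keyword is None or keyword.lower() in html.lower():
            results[url] = parser.title.strip()
        # Resolve relative links, drop #fragments, and enqueue unseen URLs.
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Passing the fetcher as a parameter keeps the crawl logic separate from network access, so the same function can be exercised against an in-memory dictionary of pages before it is pointed at a live site.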

 
