International Journal of Scientific & Technology Research



IJSTR >> Volume 10 - Issue 5, May 2021 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Big Data: Challenges, Popular Tools Of Big Data - Benefits And Applications




Sadia Zafar, Haroon ur Rashid Kayani, Hafiz Burhan ul Haq, Imran Khalid, Ayesha Nasir



Big Data, Data Science, Big Data Tools, Open Source Tools, Artificial Intelligence, Machine Learning, Data Analysis



Big Data is generated everywhere in the world in various digital formats. By 2020, billions of devices were estimated to be connected to fast internet, producing massive volumes of data at high speed, and this has drawn the attention of researchers in academia, government, and industry. Big Data is valuable for enhancing business productivity and for enabling breakthroughs in many fields of science. However, handling Big Data undoubtedly poses many challenges, including data analysis, data visualization, data storage, and the need for new technology to deal with Big Data problems. This paper presents these challenges, the new tools for Big Data exploration, and their benefits and applications, to help researchers and users choose better tools for their businesses and needs.


