Author: Govind P. Gupta
Publisher: IGI Global
Year: 2023
Pages: 233
Language: English
Format: epub (true)
Size: 16.0 MB
Advanced Computational Intelligence (CI) techniques have been designed and developed in recent years to cope with various Big Data challenges and provide fast and efficient analytics that assist in making critical decisions. With the rapid evolution and development of internet-based services and applications, this technology is receiving attention from researchers, industries, and academic communities and requires additional study. Convergence of Big Data Technologies and Computational Intelligent Techniques considers recent advancements in Big Data and Computational Intelligence across fields and disciplines and discusses the various opportunities and challenges of adoption. Covering topics such as Deep Learning, data mining, smart environments, and high-performance computing, this reference work is crucial for computer scientists, engineers, industry professionals, researchers, scholars, practitioners, academicians, instructors, and students.
With the rapid evolution of internet-based services and applications, the Big Data problem has recently emerged in almost every domain, including social network analysis, healthcare informatics, cybersecurity analytics, and sentiment analysis. Big Data refers to collections of huge datasets that are so complex in structure and size that they cannot be handled with traditional data management tools. The application of Computational Intelligence (CI) techniques to Big Data analytics is now an emerging research area in the software industry and among data engineering researchers and academics. Recent advances in CI techniques, which converge mathematical modelling with data engineering and optimization, motivate big data engineers and researchers to apply them across engineering domains related to Big Data. Advanced CI techniques have been designed and developed in recent years to cope with the various Big Data challenges and to provide fast and efficient analytics that support critical decision-making. Big Data processing and analytics have found applications in domains such as healthcare, intelligent transportation systems, smart cities, the smart grid, and smart environments.
• Hadoop and HDFS: Hadoop is a distributed computing framework developed for cluster computing and large-scale data processing. Inspired by the Google File System, it includes an open-source distributed file system, the Hadoop Distributed File System (HDFS), for managing big data spread across a clustered computing system. Computation over that data is expressed using the MapReduce programming model. Various tools have been developed for different applications in the Hadoop ecosystem, such as Cassandra and HBase for data storage, Hive for data aggregation and summarization, Apache Avro for data serialization, Pig for dataflow scripting, and Mahout as a machine learning library.
• Apache Spark: A Big Data processing framework developed at the AMPLab, University of California, Berkeley. It is designed with distributed, advanced in-memory processing capabilities to overcome the limitations of the Hadoop processing framework, and it handles both batch and stream processing of big data. Spark ships with its own machine learning library. Its architecture was designed to overcome the limitations of Hadoop's MapReduce engine, and Spark can process data 10 to 100 times faster than MapReduce. It provides APIs in four languages: Java, Python, Scala, and R.
• MapReduce Programming: A programming model designed for developing highly scalable programs on a Hadoop cluster. It has two main components: map and reduce. The map function converts a set of input data into key-value pairs; the reduce function takes the map output as input and combines the pairs that share the same key. Any sequence of map and reduce functions can be applied to a big data problem to perform data analysis.
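The map and reduce phases described above can be sketched in plain Python. This is a single-process simulation of the model, not a Hadoop job; the word-count task and the function names are illustrative:

```python
def map_phase(lines):
    # Map: emit a (word, 1) key-value pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + Reduce: group pairs sharing the same key and
    # combine their values (here, by summing the counts).
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

# Two "input splits" that a real cluster would process in parallel.
lines = ["big data needs big tools", "data tools scale"]
result = reduce_phase(map_phase(lines))
print(result)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1}
```

In a real Hadoop job, many mapper tasks would run `map_phase` on separate input splits, and the framework's shuffle step, rather than a local dictionary, would route pairs with the same key to the same reducer.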
• Splunk: A time-series indexer tool designed for big data problems. It provides the Splunk Search Processing Language (SPL) for searching, manipulating, and analyzing big data.
• MongoDB: An open-source document database designed for high-performance, scalable management of document data. It can handle both structured and unstructured data.
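MongoDB stores schemaless, JSON-like documents and retrieves them with filter documents. A minimal pure-Python sketch of that query style follows; the collection contents and field names are invented, and a real client would issue the same filter through a driver such as pymongo's `find` method:

```python
# Each record is a schemaless, JSON-like document, as in MongoDB:
# documents in one collection need not share the same fields.
collection = [
    {"_id": 1, "name": "sensor-a", "type": "temperature", "value": 21.5},
    {"_id": 2, "name": "sensor-b", "type": "humidity", "value": 40},
    {"_id": 3, "name": "sensor-c", "type": "temperature"},  # no "value" field
]

def find(docs, query):
    # Simulate a MongoDB-style equality filter: a document matches when
    # every key in the query is present in it with the same value.
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

matches = find(collection, {"type": "temperature"})
print([d["name"] for d in matches])  # ['sensor-a', 'sensor-c']
```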
• Apache CouchDB: An open-source NoSQL database designed and developed to handle multiple formats of data. It stores data in the JSON format.
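A CouchDB record is simply a JSON document. A minimal sketch is shown below; the field names and values are invented for illustration, while `_id` and `_rev` are the identifier and revision fields that CouchDB itself manages:

```json
{
  "_id": "reading-2023-001",
  "_rev": "1-a0b1c2",
  "device": "sensor-a",
  "readings": [21.5, 21.7, 21.4],
  "metadata": { "unit": "celsius", "location": "lab" }
}
```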
• Apache Kudu: Designed for fast data insertion and update operations and for efficient table scans. It was built with the objective of removing the limitations of both HDFS and HBase; it can ingest very high-velocity data and supports real-time analytics over it.
• Apache Impala: A massively parallel processing (MPP) engine for executing SQL tasks over distributed big datasets. Built on the Hadoop platform, it is a high-performance query engine designed specifically to integrate with different big data storage engines, such as HDFS, Kudu, Amazon S3, and HBase, and to process queries over them.
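Impala executes standard SQL over data in those storage engines. The flavor of query involved can be sketched with Python's built-in sqlite3 module; here an in-memory sqlite3 database stands in for an Impala cluster, and the table name and data are invented:

```python
import sqlite3

# In-memory stand-in for a distributed table; Impala would read the
# same kind of table from HDFS, Kudu, S3, or HBase instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (city TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("delhi", 10), ("mumbai", 25), ("delhi", 5), ("mumbai", 15)],
)

# An aggregation query of the kind an MPP engine splits across nodes:
# each node aggregates its partition, then partial results are merged.
rows = conn.execute(
    "SELECT city, SUM(clicks) FROM events GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('delhi', 15), ('mumbai', 40)]
```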
Download Convergence of Big Data Technologies and Computational Intelligent Techniques