Big data in Azure

  • 1. BigData in Azure Venkatesh
  • 2. Introduction to Azure • Azure Cloud Service • PaaS • IaaS
  • 3. What is BigData • Analyzing extremely large datasets computationally to reveal patterns, trends and associations. • Characterized by 3Vs (Volume, Velocity and Variety). • Enhanced insight and decision making.
  • 4. BigData vs Database
  • 5. Microsoft BigData solutions • Microsoft supports Hadoop-based BigData solutions. • Built on top of the Hortonworks Data Platform (HDP). • Three distinct solutions based on HDP: • HDInsight • HDP for Windows • Microsoft Analytics Platform
  • 6. Microsoft Data Platform
  • 7. Hadoop • Hadoop – Framework for solving BigData problems using a scale-out “divide and conquer” approach. • HDFS – Hadoop Distributed File System. Allows data to be split across multiple nodes. • MapReduce – Enables distributed processing.
  • 8. Hadoop Components • Cluster – Collection of server nodes that stores data using HDFS and processes it. • Datastore – The data store on each server is a distributed storage service (HDFS or equivalent). • Query – BigData processing queries run using MapReduce.
  • 9. HDInsight • Implementation of Hadoop that runs on Azure Platform • Pay only for what you use • Dynamic allocation of Nodes in the cluster • Integrated with Azure storage
  • 10. HDInsight – Data Storage • HDInsight supports the following types of storage: • HDFS (Standard Hadoop) • Azure Storage Blob • HBase
  • 11. HDInsight – Data Processing • Run jobs directly on the cluster using MapReduce. • Use external programs to connect to the cluster. • Pig – Execute queries by writing scripts in a high-level language. • Hive – SQL-like queries on the data. • Mahout – ML library for running data mining queries. • Storm – Real-time computation for processing fast, large streams of data.
  • 12. Data Loading Options
  • 13. Designing for HDInsight • Determine the analytical goals and source data • Plan and configure the infrastructure • Obtain data and submit it to HDInsight • Process the data • Evaluate the results • Tune the solution
  • 14. Azure DataLake • Single place to store all structured and semi-structured data in native format • Unlimited data size • Compatible with HDFS
  • 15. Creating HDInsight Cluster
  • 16. Summary • Hadoop – De facto solution to the BigData problem • Windows Azure HDInsight Service • Native Hadoop implementation • Managed Hadoop Service for Windows Azure
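
The “divide and conquer” approach from slides 7 and 8 (split the data across nodes, process each piece, combine the results) can be illustrated with a small single-process sketch. This is not Hadoop or HDInsight code: the function names (`split_into_blocks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the round-robin splitting are assumptions made for illustration, and a real cluster would run the map and reduce steps on separate machines.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(lines, num_nodes):
    """Distribute lines round-robin over num_nodes lists (a stand-in for HDFS splitting)."""
    blocks = [[] for _ in range(num_nodes)]
    for i, line in enumerate(lines):
        blocks[i % num_nodes].append(line)
    return blocks


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group the values from all blocks by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_blocks):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_nodes=3):
    """Run split -> map -> shuffle -> reduce in a single process."""
    blocks = split_into_blocks(lines, num_nodes)
    mapped = [map_phase(block) for block in blocks]  # one map task per simulated node
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data in Azure", "Hadoop splits big data", "Azure runs Hadoop"]
    print(word_count(sample))
```

Each call to `map_phase` sees only its own block, which is what lets the work scale out: adding nodes adds map tasks without changing the reduce logic.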