VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps

VMworld 2013. Chris Greer, FedEx; Richard McDougall, VMware.
  • 1. Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps. Chris Greer, FedEx; Richard McDougall, VMware. Session VAPP5402 (#VAPP5402)
  • 2. Beyond Mission Critical: Virtualizing Big-Data, Hadoop and Cloud Apps. Richard McDougall, CTO, Storage and Application Services; Chris Greer, Enterprise Architect, FedEx. © 2013 VMware Inc. All rights reserved.
  • 3. Virtualize Everything: Next Generation Apps. Traditional Applications: traditional enterprise storage; HW-based resiliency, QoS. Next Gen Cloud Apps: scale out, flash, DAS; application specific storage. Diagram labels: Virtual Storage Arrays, vSphere, SAN/NAS, Object / BLOB, All SSD Array, Server-side Flash.
  • 4. The complexity enterprise IT and developers face today. Steps shown: an idea for a cool app; spec a server config; justify server costs; procurement process; wait for HW to arrive; wait for IT ops to image the server; install a database; LOB architecture approval; central IT architectural approval; justify more servers for scale testing; wait for more HW; configure ACLs and LBs. Also shown: new infrastructures; new languages and frameworks; new devices and domains; new data types and requirements.
  • 5. Cloud Foundry – Announced Today on vSphere. Diagram labels: Micro Clouds, Public Clouds, Private Clouds; Data Services, Msg Services, Other Services; .js.
  • 6. Big Data - Not Just for the Web Giants – Now the Intelligent Enterprise
  • 7. Market Segment Analysis → Personalized Customer Targeting. Real-time analysis allows instant understanding of market dynamics. Retailers can have an intimate understanding of their customers' needs and use direct, targeted marketing.
  • 8. The Emerging Pattern of Big Data Systems: Retail Example. Diagram labels: Real-Time Streams; Exa-scale Data Store; Parallel Data Processing; Real-Time Processing; Machine Learning; Data Science; Cloud Infrastructure.
  • 9. Storage: Plan for Peta-scale Data Storage and Processing. [Chart: PB of data (log scale, 0.01 to 1000) by year (2000 to 2015) for Online Apps and Analytics.] Analytics Rapidly Outgrows Traditional Data Size by 100x.
  • 10. Unprecedented Scale. “Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data. We are creating an Exabyte of data every minute in 2013; Yottabyte by 2030.
  • 11. Post Mortem → Proactively Maintained Connected Product. A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback.
  • 12. Cloud Infrastructure Supports Mixed Big Data Workloads. Workloads: Machine Learning, Hadoop, Real-Time Analytics. Cloud Infrastructure layers: Management, Network/Security, Storage/Availability, Compute.
  • 13. Cloud Infrastructure Supports Multiple Tenants. Tenants: Web User Analytics, Financial Analysis, Historical Customer Behavior. Cloud Infrastructure layers: Management, Network/Security, Storage/Availability, Compute.
  • 14. Software-defined Datacenter: Compute. The Core Values of Virtualization Apply to Big Data: (1) Agility / rapid deployment; (2) Lower capex; (3) Isolation for resource control and security; (4) Operational efficiency. Layers: Management, Network/Security, Storage/Availability, Compute.
  • 15. Strong Isolation between Workloads is Key. Workload 1: Hungry; Workload 2: Reckless; Workload 3: Nosy. All run on the same Cloud Infrastructure.
  • 16. Virtualizing Hadoop. Elasticity & Multi-tenancy: shrink and expand cluster on demand; independent scaling of compute and data; strong multi-tenancy. High Availability: high availability for entire Hadoop stack; one click to set up; battle-tested. Operational Simplicity: rapid deployment; one-stop command center; easy to configure/reconfigure.
  • 17. Big Data Extensions: Core Components. Serengeti: core is open source; tool to simplify virtualized Hadoop deployment & operations. Hadoop Virtualization Extensions (HVE): virtualization changes for core Hadoop; contributed back to Apache Hadoop. Virtual Hadoop Manager (VHM): advanced resource management on vSphere.
  • 18. Big Data Family of Frameworks. Compute layer: Hadoop (batch analysis); HBase (real-time queries); NoSQL (Cassandra, Mongo, etc.); Big SQL (Impala, Pivotal HawQ); Other (Spark, Shark, Solr, Platfora, etc.). Also shown: File System/Data Store; Virtualization; Hosts.
  • 19. Traditional Hadoop vs. Elastic Hadoop. Traditional Hadoop: converged compute/storage. Elastic Hadoop: elastic compute with scale-out network storage.
  • 20. Software-defined Datacenter: Storage. Requirements of Next Generation Storage: (1) 10x lower cost of storage; (2) Handle explosive data growth; (3) Support a variety of application types; (4) Solve the privacy and security issues. Layers: Management, Network/Security, Storage/Availability, Compute.
  • 21. HDFS Model. [Diagram labels: ESX hosts; Hadoop Compute VMs (JT, TT); Hadoop HDFS VMs (NN; HDFS or MAPR VM); Non-Hadoop VMs; Local Disks; SAN/NAS; DRS; VHM; VirtualCenter Management Server.] Legend: JT: JobTracker; TT: TaskTracker; NN: NameNode; VHM: Virtual Hadoop Manager.
  • 22. Big-Data using Local Disks. Servers with local disks: 16-24 core server; 12-24 SATA 2-4TB disks; 10 GbE adapter. High-performance 10 GbE top-of-rack switch per rack. iSCSI/NFS for shared storage for vMotion, etc.
  • 23. Scale-out Storage for Big Data. [Chart: cost per GB ($0 to $5.50) versus petabytes deployed (0.5 to 128). Labels: Traditional SAN/NAS; Distributed Object Storage; HDFS, MAPR, CEPH; Scale-out NAS (Isilon, NTAP).]
  • 24. Big Data Storage. Elastic Compute on Scale-out Network Storage. Storage features listed: Hadoop protocol; snapshots; POSIX apps; full NFS access; replication; erasure coding.
  • 25. Big Data with Scale-out NAS. Hosts connect through top-of-rack switches to Isilon scale-out NAS, which holds shared data. A local disk or SSD in each host holds temp (transient) data.
  • 26. Chris Greer, FedEx Services
  • 27. Breakthrough Use Cases. Web Log Analysis: initial exploration was around detection of mobile devices accessing the website; analysis of 570 billion web server log entries took approximately 9 minutes to complete on a small cluster. ZIP Code Analysis: analysis of data to determine which ZIP codes are the highest source or destination for shipments. Shipment Analysis: analysis of shipment information to determine patterns that may delay a package.
  • 28. Agile Big Data at FedEx. Security: trusted isolation; well-known, auditable platform. Agility: deploy in minutes; optimize for shift in workload characteristics. Elasticity: create true multi-tenancy; mixed workloads.
  • 29. Hadoop Service at FedEx: vSphere + Isilon. Storage: scale-out Isilon cluster (shared data; NAS + Hadoop). Compute: elastic vSphere cluster (mixed workloads; vSphere; existing rack mount servers).
  • 30. Agility: Automation of Hadoop Cluster Management. Diagram labels: Deploy; Resize; Elastic scaling; Customize; Incorporate best practices; Manage; Tune configuration; Run; Execute jobs; Access HDFS.
  • 31. Agility: Ease of Management Due to Consolidation. Tasks shown: HW procurement and sizing; cluster setup and provisioning; monitoring.
  • 32. Elasticity: Mixed Workloads on a Shared Platform. Diagram labels: Dept A: Marketing; Dept B: Operations; Production, Test, Experimentation; log files, social data, transaction data, historical data. Common Infrastructure: can be shared by multiple logical Hadoop clusters and prioritized with VMware resource pools. Data Segregation: data that should not be shared can be kept separate and leverage VMware security controls for isolation.
  • 33. Security. Known Security Model: VMs provide the required levels of isolation for different workloads. Trusted Auditable Platform: leverage virtualization as the platform; known to auditors; accepted as a valid deployment model.
  • 34. Summary
  • 35. Customers Winning from Consolidated Big Data Platforms. “Dedicated hardware makes no sense”; “Software-defined Datacenter enables rapid deployment multiple tenants and labs”; “Our mixed workloads include Hadoop, Database, ETL and App-servers”; “Any performance penalties are minor”. Layers: Management, Network/Security, Storage/Availability, Compute.
  • 36. Q&A
  • 37. Other VMware Activities Related to This Session: HOL-SDC-1309 - vSphere Big Data Extensions; VAPP5484 – Big Data Extensions Advanced Features; VAPP5626 – Big Data Panel.
  • 38. THANK YOU
  • 39. Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps Chris Greer, FedEx Richard McDougall, VMware VAPP5402 #VAPP5402
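
Slide 11 quotes 10 Terabytes per hour and 90 Petabytes per year for a single jet engine. The deck does not show the arithmetic; the short Python sketch below is one way to check that the two figures are consistent. It assumes continuous operation (24 hours a day, 365 days a year) and decimal units (1 PB = 1000 TB), neither of which is stated on the slide.

```python
# Rough consistency check of the slide 11 figures.
# Assumptions (not from the slide): the engine runs continuously,
# and units are decimal (1 PB = 1000 TB).
TB_PER_HOUR = 10
HOURS_PER_YEAR = 24 * 365
TB_PER_PB = 1000


def petabytes_per_year(tb_per_hour: float, hours: float = HOURS_PER_YEAR) -> float:
    """Convert an hourly data rate in TB to a yearly volume in PB."""
    return tb_per_hour * hours / TB_PER_PB


print(f"{petabytes_per_year(TB_PER_HOUR):.1f} PB per year")  # 87.6 PB per year
```

Under these assumptions the result is 87.6 PB, which rounds to the roughly 90 Petabytes per year quoted on the slide.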
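
Slide 27 describes a web log analysis that detects mobile devices accessing the website. The deck does not say how FedEx implemented it, so the following is only a minimal map-and-reduce style sketch in plain Python, not the presenters' method. The keyword list `MOBILE_HINTS`, the function names, and the sample log lines are all hypothetical, and the code assumes the user agent is the last double-quoted field on each line, as in the common "combined" web server log format.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical keyword list; real device detection would need a
# maintained user-agent database.
MOBILE_HINTS = ("iphone", "ipad", "android", "blackberry", "windows phone")

# Assumption: the user agent is the last double-quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (device_class, 1) for one log line, or nothing if unparsable."""
    match = USER_AGENT_RE.search(line)
    if not match:
        return
    agent = match.group(1).lower()
    for hint in MOBILE_HINTS:
        if hint in agent:
            yield hint, 1
            return
    yield "other", 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts per device class."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def count_devices(lines: Iterable[str]) -> Counter:
    """Run the map step over every line and reduce the results."""
    return reduce_counts(pair for line in lines for pair in map_line(line))


# Made-up sample lines for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [26/Aug/2013:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)"',
    '10.0.0.2 - - [26/Aug/2013:10:00:01 +0000] "GET /track HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Linux; Android 4.2; Nexus 4)"',
    '10.0.0.3 - - [26/Aug/2013:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64)"',
]

if __name__ == "__main__":
    for device, total in count_devices(SAMPLE_LINES).most_common():
        print(device, total)
```

On a Hadoop cluster the map and reduce steps would run in parallel across many input splits; here they run in a single process so the logic can be read and tested on its own.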