Big data, Hadoop - lunchtime talk 2015.02.26

Transcript
  • 1. Big Data Consulting Hadoop, big data Robert Gibbon - www.bigindustries.be
  • 2. The information age ■ The “economic third wave” has badly hit many blue chip organisations ■ Manufacturing and retail are in rapid decline in Europe and the US ■ Tech, connectivity and information are restructuring our societies ■ Levels of political and social engagement have surged ■ Trading platforms are empowering small businesses
  • 3. Innovation ■ Mass-production hates innovation ■ Innovation means change – a huge cost with little benefit for production-line economies ■ Continuous improvement mentality ■ Knowledge services need to innovate to differentiate ■ Change in a virtual world can be cheap and yield huge rewards ■ Continuous reinvention mentality
  • 4. The rover bicycle, 1885
  • 5. Big data viz. innovation ■ In a free market like the web, innovation can open up new opportunities ■ Consumer access to grid computing tech is a recent innovation ■ Grid computing opens up new opportunities that would otherwise not be viable ■ Ideal for ventures architected around the long-tail economic model
  • 6. The future - thingternet ■ The internet of things is with us ■ Billions of connected devices, even digital tattoos
  • 7. Big data viz. internet of things ■ Billions of connected devices create a huge amount of data ■ Until big data tech, Internet of Things was nearly impossible to monetize
  • 8. The internet of things is a wild west ■ Many new, unsolved challenges ■ Privacy ■ Governance ■ Civil liberties ■ New challenges = new opportunities
  • 9. let's get back to hadoop
  • 10. Hadoop refresher ■ FOSS solution for processing terabytes to petabytes of data ■ Using arrays of regular servers ■ Hadoop core: ■ HDFS - a scale-out file system ■ YARN - a scale-out application resource manager ■ Runtimes: ■ Spark, Impala, Flink, MapReduce, Kafka, SolrCloud etc. ■ Components for data protection, access control and operational management ■ NoSQL databases ■ HBase, Accumulo, Cassandra etc.
  • 11. what can you do with hadoop?
  • 12. Storage ■ Pure online data storage, with no other processing ■ Low cost per-GB for petascale online storage ■ Option to directly query and analyse the data is available if required.
  • 13. Search ■ Example: huge, constantly changing catalogue of products – like eBay and Amazon ■ SolrCloud – an advanced search engine serving terabytes of content from Hadoop
  • 14. Messaging ■ A distributed message queue backed by a Hadoop cluster - Apache Kafka ■ Elastically scalable ■ Messages are persisted and replicated for durability ■ TBs of messages per broker with predictable performance
  • 15. Targeting ■ Personalised content for users ■ Generates and consumes a huge amount of log data ■ for reporting ■ for predictive analysis ■ Predictive analysis is compute intensive ■ Can be TBs of data per day
  • 16. Self-service Business Intelligence ■ Enterprise Data Hub paradigm ■ A very popular emerging use case ■ Business users directly access raw datasets using specialised discovery tools built on top of Hadoop - Datameer, Platfora and others
  • 17. Data warehousing ■ Migration of Enterprise Data Warehouse to Hadoop ■ Big cost savings versus traditional vendors like Oracle and Teradata
  • 18. Machine learning ■ Predictive analytics with Spark MLlib or Revolution R Enterprise ■ Automatically predict component failures for proactive intervention
  • 19. Big Database ■ Low latency, high throughput, high concurrency, high volume ■ Algotrading ■ Realtime ad auctions ■ Volumes of 200 billion transactions per day reliably served in realtime
  • 20. Device management ■ Analysis and response to threats detected by SPI module on remote switch ■ Automated systems management – shut down heating when nobody is home to reduce heating bill and emissions ■ Monitor driver propensity to break the speed limit - offer lower insurance premiums to good drivers
  • 21. hadoop - mature?
  • 22. Choice of vendors
  • 23. Solid operational management
  • 24. Impala v Teradata
  • 25. Free grid computing
  • 26. Free scale-out database
  • 27. Growing commercial ecosystem
  • 28. Secure and available ■ RPC authentication and encryption with PKI ■ Data encryption at rest and in transit ■ Kerberos resource access control - HDFS, YARN ■ Table cell level permissions - Accumulo ■ Online snapshot backups ■ No SPoF
  • 29. thanks for listening be.linkedin.com/in/robertgibbon
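
Slide 10 lists MapReduce among the Hadoop runtimes without showing what the programming model looks like. As a rough illustration only, the sketch below runs the three MapReduce stages (map, shuffle, reduce) for a word count in plain Python on a single machine. It does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key.

    On a real cluster this step moves data between servers; here it is
    just an in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["big data hadoop", "hadoop big data big"]
    print(word_count(sample))
```

Because each call to map_phase and reduce_phase depends only on its own input, a framework can spread those calls across many regular servers, which is the scale-out property the slide refers to.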