Pentaho Big Data Fundamentals
With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.
Pentaho provides the right set of tools to each user, all within a tightly coupled data integration and analytics platform that supports the entire big data lifecycle. For IT and developers, Pentaho provides a complete, visual design environment that simplifies and accelerates data preparation and modeling. For business users, it provides tools for visualizing and exploring data. And for data analysts and scientists, it provides full data discovery, exploration, and predictive analytics.
Using a combination of instructor-led presentations and hands-on exercises, this course provides an overview of big data technologies and an overview of the Pentaho tools for both working with big data and for visualizing it. This course helps prepare you for the Pentaho Data Integration Certification Exam.
This course is the third course in the Database Developer path. Students who need a comprehensive overview of big data tools and technologies should take this course instead of PDI4000: Pentaho Data Integration and Big Data. PDI4000 is intended for students experienced in both PDI and big data.
Upcoming sessions (Online, English):
- January 28, 2014, 10:00 AM EST
- February 25, 2014, 10:00 AM EST
- March 18, 2014, 10:00 AM EDT
At the completion of this course, you should be able to:
- Identify the purpose and value of various big data technologies: Hadoop, HDFS, Hive, MapReduce, NoSQL databases, and so on
- Read and write data using HDFS
- Orchestrate big data jobs in Pentaho Data Integration
- Use Pentaho Data Integration (and Pentaho MapReduce) to manipulate big data
- Read and write data using a NoSQL data source
- Visualize big data using Pentaho InstaView
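To make the MapReduce model named in the objectives above concrete, here is a minimal word-count sketch in plain Python, outside any Hadoop or Pentaho tooling. The function names and sample lines are illustrative only; the point is the two phases: map emits key/value pairs, and reduce aggregates them per key.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the emitted counts for each distinct key (word)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big insight", "data flows fast"]
result = reduce_phase(map_phase(lines))
print(result["big"])   # 2
print(result["data"])  # 2
```

In a real Hadoop cluster the map and reduce phases run in parallel across HDFS blocks, with a shuffle-and-sort step routing each key to a single reducer; the local version above preserves only the programming model.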
Before taking this class, students should complete course PDI2000: Pentaho Data Integration I or have equivalent field experience with Pentaho Data Integration. Big data knowledge is helpful but not required. Some basic knowledge of the Linux operating system (CentOS) is required.
Online courses require a broadband Internet connection, the use of a modern Web browser (such as Microsoft Internet Explorer or Mozilla Firefox), and the ability to connect to the WebEx Training Center. For more information on WebEx Training Center requirements, see www.webex.com. Online courses use Pentaho’s cloud-based exercise environment. Students are provided access to a virtual machine used to complete the exercises.
For online courses, students are provided with a secured, electronic course manual. Printed manuals are not provided for online courses. When an electronic manual is provided, students are encouraged to print the exercise book before class begins, though this is not required.
Students attending this course on-site should contact their Customer Success Manager for hardware and software requirements. You can also email us at email@example.com for more information regarding on-site training requirements.
Day 1 Agenda
Pentaho and Big Data
Big Data Overview and Architecture
Hadoop, HDFS and Flume
Writing Data to HDFS using Flume
Working with Structured Data
Working with MapReduce
Working with Pentaho MapReduce
Day 2 Agenda
Working with Hive
Working with Pentaho InstaView
Reporting on Big Data
Working with NoSQL Databases
Oozie, Pig and Sqoop
Transforming Data using Pig
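Pig expresses data transformations (LOAD, FILTER, GROUP, aggregate) declaratively in Pig Latin. As a rough sketch of what such a pipeline computes, here is an equivalent in plain Python; the sample records and field names are invented for illustration and do not come from the course materials.

```python
from collections import Counter

# Hypothetical sample records, (user, action) tuples,
# standing in for what a Pig LOAD statement would read.
records = [
    ("alice", "click"), ("bob", "view"),
    ("alice", "click"), ("carol", "view"),
    ("bob", "click"),
]

# Roughly: FILTER records BY action == 'click';
clicks = [(user, action) for user, action in records if action == "click"]

# Roughly: GROUP clicks BY user; FOREACH ... GENERATE group, COUNT(clicks);
counts = Counter(user for user, _ in clicks)

print(sorted(counts.items()))  # [('alice', 2), ('bob', 1)]
```

On a cluster, Pig compiles this kind of script into MapReduce jobs that run over HDFS data; the local version only mirrors the logical steps.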