My Roadmap to Learning Data Science (So Far) Part 2: Understanding the Basics of Hadoop

I did not see any courses on Coursera that dealt with Hadoop, but Udacity had an introductory course on Hadoop and MapReduce so I made a note to look at that after I got started with my other data science courses.

When I did get around to the Hadoop class, I thought it was a nice introduction to big data, but I was having trouble with the coding assignments, so I decided to look for another course or tutorial. I tried searching Coursera for MapReduce instead of Hadoop and found a University of Washington course called "Introduction to Data Science." (This course is not part of the data science specialization.)

It seemed definitely worth checking out, and I thought it might offer a perspective on some skills I need beyond what the data science specialization covers. The course was not open for enrollment at the time, but I could access the course materials and watch the videos. In the first week's videos, the instructor explained that the course would focus more on theory and less on application. While I think theory is very important, I was looking for something that would focus more on application.

I wanted to learn how to install Hadoop on my machine and run a metaphorical "Hello world" program in Hadoop before moving forward with any theory. It just suits my personal learning style better.

I decided there was no harm in continuing to watch the videos for a bit to see what I would get out of them. What I did learn is that MapReduce is a programming model (so a theoretical "thing" that Hadoop implements) and not something specific to Hadoop.
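To make that distinction concrete, here is a minimal sketch of the MapReduce idea in plain Python, with no Hadoop involved. It is only an illustration: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example and are not part of Hadoop or of any course mentioned here, and everything runs in memory on one machine rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle step: group all emitted values by key. In a real framework
    # this grouping happens between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in order, entirely in memory (hypothetical helper).
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello world", "hello Hadoop"]))
```

Counting words like this is often treated as the "Hello world" of MapReduce. The model is just these steps; a system like Hadoop takes care of running them over data spread across many machines.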

Although this course was not particularly what I was looking for, it did help expand my "broader picture" of data science so it was useful in that regard.

I was still looking for something to help me get started with Hadoop. While researching tutorials and courses before watching the University of Washington videos, I had come across a list of free Hadoop tutorials on a site called SkilledUp (which is an awesome resource!).

I decided to try those tutorials next. They were by Cloudera, which is the same company that created the Hadoop and MapReduce Udacity course I mentioned earlier. I watched a couple of tutorials, but I was not getting the step-by-step guide to installation I was looking for.

I looked through Cloudera's courses and found this tutorial called "Get Started with Hadoop in Less Than Thirty Minutes" and I was excited because I thought this just might be what I had been searching for. Unfortunately, it was not.

Another search turned up a Hadoop tutorial on Udemy. Although Udemy requires you to pay for many of their courses, one of their courses had a free video on how to set up Hadoop on your machine! I made a note of this and decided to put Hadoop on hold and get back to R.

During and after these Hadoop adventures, I reached out to some people in the industry to get their thoughts on Hadoop tutorials. They had various recommendations on books and other online platforms. I even heard that R and Hadoop are incompatible, which I do not fully understand without a better grasp of Hadoop, but perhaps I will in the future.

In between working on the Coursera specialization, I did try a Udacity course on R and a DataCamp tutorial on R to get different perspectives on the material. At the moment, I'm back to the Coursera data science specialization.

Thanks for reading! I'll keep you updated as I progress further.
