The Roadmap to a successful Big Data Career
The hype around Big Data has subsided and reality has kicked in. As more and more companies realize the true potential of Big Data technologies, they are joining in droves to experiment with, adopt, and grow their Big Data portfolios. As a result, there has been a noticeable spurt in Big Data requirements popping up in the marketplace. As with any technology that is new and hot, supply generally lags demand. In fact, the gap is so wide that an average IT professional with a Hadoop certification on his or her resume can comfortably land an 80k salary job. Add a good chunk of DW/BI and/or Java skills or experience, and the salary ask can easily jump another 20%.
1. Need for a Clear Roadmap for the New Entrant –
But as with any new technology, we need a clear roadmap to succeed in this space. In the last few years, many new training companies have started offering Big Data courses, both online and in classroom format. There is a widespread attempt to sell this service quick and dirty over the Internet. While many of them genuinely try to offer a good curriculum, the offering falls short of the complete package needed to succeed in the marketplace. For example, learning HDFS and MapReduce (two key Hadoop components) is not complete after solving the sample wordcount problem that most trainers showcase in their courses. There is nothing wrong with using wordcount as a foundational learning exercise for MapReduce, but to succeed in job interviews you need projects that showcase real-life business scenarios and solve complex problems.
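For readers who have not yet seen it, the wordcount exercise can be sketched in plain Python, simulating the map, shuffle and reduce phases that Hadoop performs at scale across a cluster. This is a teaching sketch, not actual Hadoop API code.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum the counts collected for each word
    return word, sum(counts)

def wordcount(lines):
    # Shuffle phase: group intermediate pairs by key,
    # as Hadoop does between the map and reduce stages
    groups = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    return dict(reducer(w, c) for w, c in groups.items())

print(wordcount(["big data is big", "data is the new oil"]))
# -> {'big': 2, 'data': 2, 'is': 2, 'the': 1, 'new': 1, 'oil': 1}
```

The interview-ready step is to move beyond this toy problem: the same map/shuffle/reduce pattern, applied to messy, high-volume business data, is what real projects demonstrate.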
Against this background, it is necessary to draw a clear and exhaustive roadmap to succeed in the job market.
2. Meeting the Pre-requisites –
It is easy to get caught up in the hype and start Hadoop learning on the wrong foot. To get the best out of any Hadoop training, it is always better to start with the right skillsets. Let’s try to understand what the Hadoop ecosystem looks like and why you need a little preparation before you jump into this new technology.
The following diagram depicts in brief the ecosystem of Hadoop and the platform it is built upon.
Core Java – Since Hadoop ecosystem products are primarily written in Java, most product integration happens in Java. It is good to have core Java skills to get the best out of Hadoop. That said, many Hadoop components (like Pig and Hive) do not require you to be a Java expert. Nonetheless, this is a nice-to-have skillset for Hadoop learning.
Linux – Linux is the preferred OS platform for Hadoop installation and system management. Basic Linux skills will make you feel at home in the Hadoop ecosystem.
SQL – Hadoop is primarily a target data storage platform, and in most modern architectures, structured (tabular) storage is a basic necessity if not mandatory. Hadoop has its own structured data store (Hive) and uses HiveQL (a SQL-like query language) for data access from HDFS. A basic understanding of SQL is a must to get involved with Hadoop.
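As a minimal sketch of the SQL fluency expected here, the aggregate query below is run against SQLite, since HiveQL closely mirrors standard SQL for simple SELECT/GROUP BY queries. The `page_views` table and its columns are hypothetical, chosen for illustration only.

```python
import sqlite3

# Build a tiny in-memory table standing in for a Hive table of web activity
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("u1", "home"), ("u1", "cart"), ("u2", "home")],
)

# The kind of aggregate query a Hadoop developer writes daily in HiveQL:
# count views per page, most-viewed first
rows = conn.execute(
    "SELECT page, COUNT(*) AS views FROM page_views "
    "GROUP BY page ORDER BY views DESC"
).fetchall()
print(rows)  # [('home', 2), ('cart', 1)]
```

If this query reads naturally to you, the jump to HiveQL will be small; if not, a short SQL refresher before starting Hadoop training is time well spent.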
3. Learning the Hadoop Eco system –
This is where you learn the concepts, architecture and usage of the different Hadoop components in the ecosystem.
They are depicted on this diagram –
4. Expanding the learning to real-life use cases –
Putting all your learning to use is the natural next step after completing any course. Hadoop is no different; in fact, since data storage and data management go together in any modern data architecture, it is important to understand the end-to-end flow to get the right perspective on real-life Hadoop solutions.
Let me explain with a small example, depicted in this diagram. If company xyz wants to understand the behavior of the customers and products of its web-based retail stores, the typical sequence of data flow could comprise the following steps:
1. Collect the weblogs from all its web servers, in batch or in real time
2. Ingest the data into a distributed storage platform like Hadoop
3. Process it in a highly parallelized computing environment like MapReduce
4. Store the cleansed, business view of the data in a structured format like Hive tables
5. Access the data with high-level query languages like HiveQL or reporting tools like Tableau and Jaspersoft
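The five steps above can be sketched end to end in a few lines of Python. The log format and product paths here are hypothetical, and a real pipeline would of course use Flume/Kafka, HDFS, MapReduce and Hive rather than an in-memory script; the point is the shape of the flow: collect, cleanse, process, store a business view.

```python
import re
from collections import Counter

# Step 1: hypothetical combined-log-style lines collected from the web servers
weblogs = [
    '10.0.0.1 - - [01/Jan/2024] "GET /product/42 HTTP/1.1" 200',
    '10.0.0.2 - - [01/Jan/2024] "GET /product/42 HTTP/1.1" 200',
    '10.0.0.1 - - [01/Jan/2024] "GET /product/7 HTTP/1.1" 404',
]

def parse(line):
    # Step 2 (ingest/cleanse): extract the requested path and status code
    m = re.search(r'"GET (\S+) HTTP/[\d.]+" (\d{3})', line)
    return (m.group(1), int(m.group(2))) if m else None

# Step 3 (process): keep successful requests only, then aggregate
# (this Counter stands in for the parallel MapReduce job)
parsed = [p for p in map(parse, weblogs) if p]
views = Counter(path for path, status in parsed if status == 200)

# Step 4 (store): the cleansed business view, as it might land in a Hive table
business_view = [{"product_path": p, "view_count": c} for p, c in views.items()]
print(business_view)  # [{'product_path': '/product/42', 'view_count': 2}]
```

Step 5 would then be a HiveQL query or a Tableau dashboard over that business view. Being able to narrate each stage like this is exactly the end-to-end articulation interviewers look for.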
Getting an end-to-end picture of the data pipeline in a Hadoop-based data management platform is extremely important for articulating your projects well and making a good impression on the interviewer.
5. Adding certification Credentials –
One of the easier ways to establish credibility in a new technology space is to get certified by an industry-accepted vendor. A variety of Hadoop certifications are offered by popular vendors like Cloudera and MapR. The certifications are relatively easy to pass, but they add a lot of credibility for new entrants.
It is highly recommended to get certified right after completing the Hadoop developer course.
6. Getting Interview battle-ready –
Last but not least among the areas of weakness is getting interview battle-ready. Even though the supply of Hadoop skillsets seriously lags market demand, clearing a technical interview is no cakewalk for a new entrant. Apart from certification and learning the key components of the Hadoop ecosystem, it is important to develop an industry perspective so you can face the experts on the other side of your phone line. Interview preparation is about effective storytelling, connecting the dots across the technology and business landscape, and presenting a 360-degree view of your Hadoop learning.
Several mock-interview sessions with an industry expert will immensely benefit this area of preparation.
7. Putting it all together –
Putting these milestones in order and making them time-bound will help you land the Big Data job you deserve and aspire to. Without a clear roadmap, it remains a daydreaming exercise, with a lot of unplanned effort wasted over a long stretch of time.
In the next diagram we have depicted time-bound boxes inside an illustrative roadmap –
Timeline to reach the goal –
Big Data technologies are vast and diverse. Core Hadoop technologies like HDFS and MapReduce, advanced frameworks like Apache Storm, Apache Kafka, Apache Spark and YARN, and newer products like graph databases, Redis, Cassandra and MongoDB are just a few of the names one could count in this space. But if one has to start somewhere, it is advisable to start with the core Hadoop platform and slowly move into the other areas of expertise. Although it is difficult to put a number on it, the typical timeline to reach a comfortable level of competency in Hadoop is around 3 months, given consistent effort all along and a predefined set of time-bound tasks, the so-called roadmap.
8. Getting the Big Picture –
Big Data is too big a word to be confined within a basket of technologies or frameworks. As the data deluge goes beyond the physical boundaries of corporate data centers, and business use cases test the limits of traditional architectures, Big Data technologies provide many options and opportunities to scale, compute and deliver information in a more distributed yet coherent fashion. This is where the confines of Big Data grow larger and more diverse.
In a broader sense, we can depict the different layers of Big Data technology as in the next picture.
Depiction of Larger Big Data Technology space
9. Next Step –
As for the next step in your Big Data journey, picking the right training platform and the right advisors are critical factors for successful training and a successful career. Apart from the self-motivation and fire needed to tread the tortuous path of this multi-discipline learning, a bit of hand-holding will go a long way on your path to success. While there is no dearth of training institutes around, passionate educators offering a multi-discipline curriculum are a bit rare these days.
To everyone’s success, wishing you the very best,
Big Data Evangelist and Educator
@KnowledgePact.com, a Premier Big Data training and consulting firm