BUILDING A DATA SCIENCE WORKFLOW
2022-03-28 22:37
An Effective Communication Strategy and Guided Mental Framework for Data Science Teams.
Getting started in data science and working on projects can seem confusing, scary, and even irrational to junior data professionals and new entrants to the field. Common complaints include “the process is not straightforward” and “I don’t have a mental picture or flow of what to do next”. Most of the time, this fear comes simply from a lack of adequate explanation and communication of the pathway by guides, mentors, or leadership. Writing machine learning algorithms is useful, but without a clear understanding of the workflow, it is very hard to follow the thought process behind a project. Like a journey of a thousand miles, every data science project has a beginning and an end. This is the concept of the “Data Science Workflow”.
One way to think about the benefit of a well-defined data science workflow is that it acts like a set of guardrails or trackers that help you plan, organize, and implement your data science project. Workflows drive efficiency and help you keep track of each step in every phase of the project. Much like project tracking, a workflow makes the process seamless, efficient, and understandable.
Mastering the data science workflow is crucial, and it lays the foundation for data science projects in the real world. Without a mental model, it is usually very difficult to grasp the process and know exactly which steps to take. Often, as projects become more complex, the learning process can feel like a zig-zag.
A simple definition:
“A data science workflow defines the phases (or steps) in a data science project. Using a well-defined data science workflow is useful in that it provides a simple way to remind all data science team members of the work to be done to do a data science project.”
JEFF SALTZ
From experience, I recognize this as the missing link in my early struggles as a newcomer to the data science ecosystem. I struggled to follow the thought process and became very confused. With the benefit of hindsight, I can point to two reasons why I was struggling.
First, “I didn’t have a trusted guide who could show me how to define and solve data science problems.”
Secondly, “I didn’t have a complete mental model of how the pieces fit together.” (When do you do “feature engineering”? When do you use “fit” vs “transform”? And so on.)
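On the “fit” vs “transform” question, a minimal sketch may help. It assumes a hypothetical `SimpleScaler` class written only for this illustration (it is not a real library API), and the numbers are made up. The idea, as commonly practiced, is that “fit” learns parameters from the training data, while “transform” applies those already-learned parameters to any data, including data that was not seen during fitting.

```python
from statistics import mean, pstdev


class SimpleScaler:
    """Hypothetical stand-in for a library scaler (illustration only)."""

    def fit(self, values):
        # "fit" learns parameters (here, mean and standard deviation) from data.
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 to avoid dividing by zero
        return self

    def transform(self, values):
        # "transform" applies the already-learned parameters to any data.
        return [(v - self.mean_) / self.std_ for v in values]


# Made-up numbers, for illustration only.
train = [10.0, 20.0, 30.0]
test = [40.0]

scaler = SimpleScaler().fit(train)      # learn from the training data only
train_scaled = scaler.transform(train)  # apply to the training data
test_scaled = scaler.transform(test)    # reuse the same parameters on new data
```

Note that the test value is scaled with the mean and standard deviation learned from the training data; the scaler is not fitted again on the test data.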
For data leaders, introducing workflows will help drive the message home to junior professionals and newcomers on the team. Workflows are just as important as machine learning algorithms.
“Why is the Workflow important?”
First, communication is key. “Building an effective workflow will have a greater impact on your thought process and data science journey than your ability to pick between algorithms. Also, once you’ve carefully mastered the workflow, you can iterate through different algorithms quickly as the case may be, even if you don’t understand them deeply.”
Secondly, “Mastering and understanding algorithms is useful. However, it’s quite hard to know in advance which algorithm will work best for a particular problem. That’s why it’s critical to build a flexible workflow that enables you to easily experiment with different algorithms.”
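As a rough illustration of such a flexible workflow, the sketch below keeps the steps fixed (fit on training data, predict on held-out data, score) and swaps only the algorithm. The two model classes, the `run_workflow` helper, and the toy data are all hypothetical and written just for this example; a real project would more likely plug in models from a machine learning library.

```python
from statistics import mean


class MeanBaseline:
    """Always predicts the average target seen during training."""

    def fit(self, xs, ys):
        self.value_ = mean(ys)
        return self

    def predict(self, xs):
        return [self.value_ for _ in xs]


class SimpleLinear:
    """One-feature least-squares line: y = slope * x + intercept."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs):
        return [self.slope_ * x + self.intercept_ for x in xs]


def mean_absolute_error(actual, predicted):
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def run_workflow(model, train, test):
    """Same steps for every model: fit on train, predict on test, score."""
    train_x, train_y = train
    test_x, test_y = test
    model.fit(train_x, train_y)
    return mean_absolute_error(test_y, model.predict(test_x))


# Made-up toy data, for illustration only.
train = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
test = ([5.0, 6.0], [10.0, 12.0])

# Swapping algorithms only means changing this dictionary.
candidates = {"mean baseline": MeanBaseline(), "simple linear": SimpleLinear()}
scores = {name: run_workflow(model, train, test) for name, model in candidates.items()}
best = min(scores, key=scores.get)  # lowest error wins
```

Because every candidate goes through the same `run_workflow` steps, trying another algorithm means adding one entry to `candidates` rather than rewriting the process.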
With that being said, let’s walk through what a simple data science workflow looks like.
Note that this is a sample workflow; real projects may require you to keep fine-tuning and returning to earlier steps until you get the desired outcome.
From the diagram above, we can easily form a mental flow of the activities involved at each step of the way. This keeps our thought process and our work well organized. Moreover, it helps beginners and early-career data professionals understand what they are doing and what they are supposed to do at every step.
From the image above, we can see how beginners approach projects differently from professionals. Building this approach in organizations would benefit teams and
For most beginners, the process begins at Step 2, “Prepare and Explore the data”, without an understanding of the business problem. But as we develop into better professionals, we need to understand the business problem in context by probing and asking questions, and then, in conclusion, visualize, present, and communicate the impact of the findings to our audience so that the necessary action can be taken.
Soft skills such as communication matter more at this last stage than writing code and building models. This is where most data professionals need further training.
Finally, as mentioned, understanding algorithms is useful, but building a flexible workflow enables you to easily experiment with different algorithms. Building an effective workflow will also have a greater impact on your thought process and data science journey than your ability to pick between algorithms. No data science project is ever 100% complete. This is why iterating through the data science workflow is key to continuing to build on and refine your results.