Written by Oleg Kazantsev
Agile delivery methods have, over the past 20 years, become firmly ingrained in the collective know-how of the technology world. It’s common knowledge that Agile allows small teams to be productive at scale, delivering value to a business when it’s most needed and absorbing risks and uncertainty like no other approach.
But proposing Agile methods to professionals in the Data & Analytics field is more likely to yield a resounding “Yeah…no.” Everyone agrees that agility is great, but many, many data people feel that Agile methodology just isn’t made for big data.
One of the big reasons for this methodology dysmorphia (as one might call it) is confusion around MVP. The Minimum Viable Product (MVP), central to the philosophy of Agile delivery, is a releasable and value-adding feature of a system that can be packaged and rolled out to the end-user at the completion of an iterative delivery cycle.
Agile originated in the world of application-centric startups, so an MVP is usually illustrated by Agile coaches as some widget, enhancement, or new capability. It’s an appetizer that a server brings to the restaurant table so the customers can bite into something while they wait for the main course. Cutting up a large project into mini finished ‘courses’ like this keeps everyone from getting impatient and hangry.
When a similar gastronomic metaphor was used in a recent Agile class for a Data & Analytics division of a major financial institution, everyone nodded. Then, one person raised a hand.
“But what we deal with here is DATA,” they said. “It’s more like juice. Or wine. Or water. How do you cut water?”
Bullseye. That’s the MVP problem for data in a nutshell. It’s hard to define segments in data—there’s no clear development or test stage. It all overlaps. Contrasted with software features, data and information are much less compartmentalized into usable objects with a clear human-centric function.
Does creating and standardizing a new staging table on the backend constitute an MVP? Is a large data load an MVP? Or should a consulting team just divide a linear, waterfall-like project into multiple two-week sprints, skipping the concept of MVP altogether, and call it “as Agile as it gets?”
Let’s explore the fine art of water-cutting (that is, how to define MVP in Data & Analytics) using three real-world examples.
Here’s the use case: A large financial institution is looking to give its multiple divisions equal access to its troves of accumulated data from a central location—ideally, a cloud. How should they divide such a large cloud transformation into meaningful MVPs?
Answer: When the scale and volume of data transformation are the main challenge, consider dividing the data journey itself, using the concept of Medallion architecture.
Medallion, coined by Databricks, divides the data journey into three stages:
• Bronze: raw data, centralized and stored much as it arrives from source systems.
• Silver: cleansed, standardized, schematized data.
• Gold: aggregated, business-ready data served to end users.
In standard Agile, a minimum viable product is something that can be rolled out to the end user (usually, the application consumer). So, is Gold the only possible MVP? Not necessarily. Dividing the data journey into MVPs requires a strong understanding of the value of data organization-wide. Consider this:
• Large companies often act as raw data merchants for smaller, solution-centric startups. Raw but centralized data (such as event-streaming logs from Kafka or user interactions from social media sites) is a valuable resource for these transactions. This could constitute a definition of MVP for Bronze data.
• Aggregated, standardized data may be used by different divisions of the same company to enable their own software development lifecycle activities. For example, schematized (even if not business-ready) data could be used to train machine learning and AI algorithms. It may also be consumed by other in-house applications and used as test data in lower environments. All of these let us define Silver data as MVP.
• No such complexities exist for the Gold data as MVP. It’s business-ready and thus constitutes the default definition of MVP (and, in fact, that of the final deliverable).
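To make the three stages a little more tangible, here is a minimal sketch of a Medallion-style flow in Python with pandas. The table names, columns, and quality rules are invented for illustration; they are not any institution's actual schema, and a production version would live in a platform like Databricks rather than in one script.

```python
# Minimal Medallion-style pipeline sketch (illustrative only).
import pandas as pd

def load_bronze(raw_events: list[dict]) -> pd.DataFrame:
    """Bronze: land raw events as-is, adding only ingestion metadata."""
    bronze = pd.DataFrame(raw_events)
    bronze["ingested_at"] = pd.Timestamp.now(tz="UTC")
    return bronze

def refine_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Silver: deduplicate, enforce types, drop records that fail basic quality rules."""
    silver = bronze.drop_duplicates(subset="event_id").copy()
    silver["amount"] = pd.to_numeric(silver["amount"], errors="coerce")
    return silver.dropna(subset=["amount"])

def aggregate_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Gold: business-ready aggregate, e.g. totals per division."""
    return silver.groupby("division", as_index=False)["amount"].sum()

raw = [
    {"event_id": 1, "division": "retail", "amount": "120.50"},
    {"event_id": 1, "division": "retail", "amount": "120.50"},   # duplicate event
    {"event_id": 2, "division": "markets", "amount": "not-a-number"},
    {"event_id": 3, "division": "markets", "amount": "75.00"},
]

gold = aggregate_gold(refine_silver(load_bronze(raw)))
print(gold)
```

The point of the sketch is that each function's output is independently consumable, which is exactly what lets Bronze and Silver qualify as MVPs in their own right.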
Here’s the use case: A risk management unit produces and analyzes multiple reports from various upstream sources and rolls them out to regional and global directors via a business intelligence application. How can they start delivering value early?
Answer: When an existing business process undergoes digital transformation, automation, and enhancement, consider delivering value end-to-end, one expertise area at a time. In that way, each end-to-end (E2E) delivery of a unique report or widget forms an MVP.
Critical to this approach is a clear understanding of business processes. Once it’s known how the reports and widgets can be thematically grouped, they can be divided into data swimlanes. Sometimes, a complex report with elaborate filtering will form a swimlane of its own; sometimes, multiple minor widgets drawing data from the same upstream source could be united into a single swimlane.
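Here is a hypothetical sketch of that grouping logic in Python. The report names, upstream sources, and the "complex reports get their own lane" rule are assumptions made up for illustration:

```python
from collections import defaultdict

# Hypothetical report inventory for a risk management unit.
reports = [
    {"name": "Regional VaR summary", "source": "risk_engine", "complex": True},
    {"name": "Counterparty exposure widget", "source": "risk_engine", "complex": False},
    {"name": "Limit utilization widget", "source": "risk_engine", "complex": False},
    {"name": "Liquidity dashboard", "source": "treasury_feed", "complex": False},
]

swimlanes = defaultdict(list)
for report in reports:
    # A complex report forms its own swimlane; minor widgets sharing an
    # upstream source are bundled into a single lane.
    lane = report["name"] if report["complex"] else report["source"]
    swimlanes[lane].append(report["name"])

for lane, members in swimlanes.items():
    print(f"Swimlane '{lane}': {members}")  # each lane is a candidate E2E MVP
```

Each resulting lane is a candidate for an end-to-end MVP delivered in its own iteration.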
Why the term “swimlane”? Because a competitive swimmer always finishes last if they jump lanes or fail to finish their distance, no matter how well they swam. People need to keep to their lane.
What should you consider when defining MVPs by data swimlanes?
Here’s the use case: A stale data segment (meaning one that hasn’t been used or evaluated in a while) was identified and needs remediation. The data governance team hopes to audit the data with business owners and then roll it out to production in batches. How does the client ensure that all involved systems can consume the remediated data without errors? How does the client eliminate errors/issues in user-facing applications after the remediation?
Answer: When structural changes are not requested, eliminating or reducing risks in the data rollout adds value. Risk reduction builds confidence in company data and data-driven decisions, which is the basis of data governance.
Using risk reduction as an MVP first requires a clear map of business processes and data consumption within the organization. The team should also have a realistic understanding of the volume of impacted data compared to the systems’ ETL bandwidth.
(ETL stands for Extract, Transform, Load: the process of extracting data from various sources, transforming or reshaping it into a suitable format, and loading it into a target database or data warehouse for analysis and reporting.)
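For readers who prefer to see it rather than read it, here is a bare-bones ETL sketch in Python using only the standard library. The file name, table, and columns are placeholders, not the client's actual systems:

```python
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    """Extract: read raw rows from a source file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: keep only the fields the target schema needs, cast the balance."""
    return [(row["account_id"], float(row["balance"])) for row in rows if row["balance"]]

def load(records: list[tuple], db_path: str = "warehouse.db") -> None:
    """Load: write the reshaped records into the target table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS balances (account_id TEXT, balance REAL)")
    con.executemany("INSERT INTO balances VALUES (?, ?)", records)
    con.commit()
    con.close()

# Wired together end to end (assumes an accounts.csv with account_id and balance columns):
# load(transform(extract("accounts.csv")))
```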
Based on this information, the team can build an Ishikawa diagram of risks and downstream effects the data change could cause. Also known as the “fishbone diagram,” this drawing will define each MVP as a diagonal “fishbone” of a known possible point of failure. Thus, the sprints or phases of the data remediation project will focus on eliminating risks at each possible cause-and-effect juncture.
In the real-life scenario, the data governance team aimed to remediate around 100,000 records.
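A simplified sketch of what that batched, risk-checked rollout could look like in Python follows. The batch size and the validation stub are assumptions for illustration; in practice, each check corresponds to a “fishbone” on the diagram (ETL loads complete, reports reconcile, user-facing applications show no new errors):

```python
RECORDS_TO_REMEDIATE = 100_000   # from the real-life scenario above
BATCH_SIZE = 10_000              # assumed; sized to the systems' ETL bandwidth

def downstream_checks_pass(batch: range) -> bool:
    """Stand-in for validating one 'fishbone': in reality, each check is a
    real test against downstream systems and user-facing applications."""
    return True

def rollout(total: int, batch_size: int) -> None:
    promoted = 0
    for start in range(0, total, batch_size):
        batch = range(start, min(start + batch_size, total))
        if not downstream_checks_pass(batch):
            print(f"Batch starting at record {start} failed checks; halting rollout")
            return
        promoted += len(batch)
        print(f"Promoted {promoted:,}/{total:,} remediated records to production")

rollout(RECORDS_TO_REMEDIATE, BATCH_SIZE)
```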
In any “MVP as risk reduction” project, the process continues like this until the project reaches the fish’s head: the point at which the whole data volume rolls out to production with full confidence.
Data science is an ever-evolving field, and the perceived fluidity of its main subject—information—shouldn’t fool people into thinking it can’t be practically divided into value-adding MVPs. When performing Agile transformation and delivery of Data & Analytics projects and programs, we must resist counterproductive urges: treating backend artifacts like staging tables as MVPs, or slicing a waterfall plan into two-week sprints and skipping the concept of MVP altogether.
Clients want to see value before a project goes to production; that’s an understandable desire and a key value of Agile methodology. Like software projects, data projects still have iterations. Like water, data projects have phases. It just takes a bit of adjustment and, yes, agility to define them. Maybe you can’t cut water—but you certainly can cut ice.
Oleg Kazantsev is a Senior Management Consultant and Agile Coach for Launch's Digital Business Transformation Studio. With a background in Data & Analytics, he has a particular passion for and expertise in storytelling through data.