Minimum Actionable Dataset
By this point, all thinking humans have come to accept that there are ethical problems with the current ungoverned way businesses, governments and organizations capture, store, use and monetize data.
I'm not going to debate the ethics themselves, or suggest any systemic or social methods for addressing the ethical failings that have created this problem (other than to suggest that ethics really needs to be a required course in both CS programs and MBA ones). Instead I'm simply going to suggest a new philosophical guideline.
Introducing the Minimum Actionable Dataset (aka MADs)
When defining the dataset your application, site or service is going to capture, store and use, start with the minimum set of data needed to make critical product and business decisions. Add more data only when it is required to make these decisions.
It is no longer acceptable to capture any and all data and then figure out later which of it is valuable and which you want to analyze and use. But it's also unrealistic (and stupid) to not use data. We need data to make good decisions about what customers and users want, value and need out of our products and services. We need data to be able to run our businesses effectively, responsibly, sustainably and professionally.
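To make the guideline concrete, here is a minimal sketch of capture-time filtering. The field names and the decisions attached to them are made up for illustration; the only point is that a field with no decision behind it never gets stored.

```python
# Hypothetical allowlist: every captured field is tied to the decision it supports.
# These field names and decisions are illustrative assumptions, not a recommendation.
MINIMUM_ACTIONABLE_DATASET = {
    "plan_tier": "pricing decisions",
    "signup_date": "retention analysis",
    "feature_used": "product roadmap prioritization",
}


def capture(raw_event: dict) -> dict:
    """Keep only allowlisted fields; anything else is dropped before storage."""
    return {
        key: value
        for key, value in raw_event.items()
        if key in MINIMUM_ACTIONABLE_DATASET
    }


# What the client happens to send us.
raw = {
    "plan_tier": "pro",
    "signup_date": "2019-03-01",
    "feature_used": "export",
    "ip_address": "203.0.113.7",
    "contacts": ["a@example.com"],
}

# What we actually keep.
stored = capture(raw)
print(stored)
```

The filter runs at the point of capture, not at the point of analysis, so the extra fields are never written anywhere in the first place.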
A little analogy....
Lazy developers, when working with a relational database, will often run a "SELECT * FROM" query to pull all the data into the app, and then filter, sort and search that full set for the data they actually need within the app logic itself.
This is a Bad Thing.
It is equally bad (and lazy) to do this exact thing with your data capture.
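For anyone who hasn't seen the database version of this, here is a small sketch using Python's built-in sqlite3 module. The table, columns and rows are invented for the example.

```python
import sqlite3

# In-memory database with a made-up users table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT, birthdate TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, email, plan, birthdate) VALUES (?, ?, ?, ?)",
    [
        ("Grace", "grace@example.com", "pro", "1906-12-09"),
        ("Linus", "linus@example.com", "free", "1969-12-28"),
        ("Ada", "ada@example.com", "pro", "1815-12-10"),
    ],
)

# The lazy way: pull every column of every row, then filter and sort in the app.
all_rows = conn.execute("SELECT * FROM users").fetchall()
pro_names_lazy = sorted(row[1] for row in all_rows if row[3] == "pro")

# The deliberate way: ask only for the column and rows the feature needs.
pro_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE plan = ? ORDER BY name", ("pro",)
    )
]

print(pro_names_lazy, pro_names)
```

Both approaches produce the same list of names, but the first one dragged every email and birthdate into the app to get there.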
Defining Your MADs
One of the reasons why I identify laziness as a root cause is my acceptance that for many developers, it's easier to simply capture all the data than to define the Minimum Actionable Dataset.
Defining this dataset is a collaborative activity, one that involves multiple points of view and individuals within the organization. And that is messy, interpersonal and difficult. You need to define the data that Product requires to make appropriate decisions around customer and user needs, desires and wishes. This obviously should be defined by Product. You need to define the data that the Business requires to make appropriate decisions around business strategy, tactics, pricing, capitalization, partnerships and so on.
So you just capture the requirements from both those stakeholders, merge them and voila, right?
No. Wrong.
This is how we got to the "capture it all and let god sort it later" mentality. Both Business and Product have a vested interest in raising the threshold of what is considered "minimum" in this case. So you'll need a counterbalance. You'll need someone who is the "keeper of the threshold."
Most companies do not have a Head of Privacy or a Head of Data Policy. If you do... this is your counterbalance, and this person should be part of the decision-making and should have veto power (though, obviously, they can be overruled by the CEO if needed). In this context, their role isn't to "protect the user" or "keep us in legal compliance" as much as it is to "defend the minimum threshold."
In other words... their job in this process is to be the one who challenges every new data point captured and requires a justification for why the business and the product couldn't be run effectively without that data.
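That review step can be sketched as a simple gate. This is only an illustration: the class names, fields and approval flag are hypothetical, and in real life the hard part is the human conversation, not the code.

```python
from dataclasses import dataclass, field


@dataclass
class DataPointRequest:
    """A request to start capturing one new data point (hypothetical shape)."""
    name: str
    requested_by: str        # e.g. "Product" or "Business"
    decision_supported: str  # the decision that can't be made without this data


@dataclass
class ThresholdKeeper:
    """Stands in for whoever defends the minimum threshold."""
    approved: dict = field(default_factory=dict)

    def review(self, request: DataPointRequest, keeper_approves: bool) -> bool:
        # No stated decision means no justification, so the request fails outright.
        if not request.decision_supported.strip():
            return False
        # Even a justified request still needs the keeper's sign-off (the veto).
        if not keeper_approves:
            return False
        self.approved[request.name] = request
        return True


keeper = ThresholdKeeper()

ok = keeper.review(
    DataPointRequest("plan_tier", "Business", "setting pricing tiers"),
    keeper_approves=True,
)
unjustified = keeper.review(
    DataPointRequest("contacts", "Product", ""),
    keeper_approves=True,
)
vetoed = keeper.review(
    DataPointRequest("location", "Product", "might be useful for ads someday"),
    keeper_approves=False,
)

print(ok, unjustified, vetoed, sorted(keeper.approved))
```

The default answer is "no": a data point only makes it into the approved set when someone has named the decision it supports and the keeper has agreed.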
Why MADs?
The tech industry has lost the trust of our customers and our users.
And we have not stepped up, and are not stepping up, to take the lead on fixing this.
We have to. And quickly.
If we don't.... Winter is coming (and by Winter... I mean Regulation).
It's time.