What prevents companies from filtering the massive amounts of data they consistently collect?

Anonymous Author
I was first responsible for trying to solve this problem some 20 years ago. Organizations don’t understand the full life cycle requirements of all of their data, or how to maintain ongoing standards for simple things like: is this a company confidential document or not? What if somebody's role changes? If it's company confidential, is it company confidential with a one-year save or a seven-year save? When does it stop being confidential? And that's just one aspect of information collection in an organization. The other part of the problem is that when I attempted to put tools in place to save spindle space, I ended up spending three times more on saving that space than I saved in actual money on disk space. And that problem still persists for many people today. Why would you ever delete an email when it costs you more in time to delete that one email than you'll ever make up in space savings on a spindle? That problem permeates the industry. It just means everybody keeps everything.
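To make the cost argument concrete, here is a rough back-of-the-envelope sketch. Every figure in it (seconds per decision, labor rate, message size, storage price, retention period) is a hypothetical assumption chosen for illustration, not a number from the comment above; plug in your own.

```python
def deletion_break_even(
    seconds_per_decision: float,
    hourly_labor_cost: float,
    message_size_mb: float,
    storage_cost_per_gb_year: float,
    years_retained: float,
) -> dict:
    """Compare the labor cost of deciding to delete one message
    with the storage cost of simply keeping it."""
    # Cost of the human time spent reviewing and deleting one message.
    labor_cost = seconds_per_decision / 3600 * hourly_labor_cost
    # Cost of storing that message for the whole retention period.
    storage_cost = (
        message_size_mb / 1024 * storage_cost_per_gb_year * years_retained
    )
    return {
        "labor_cost": labor_cost,
        "storage_cost": storage_cost,
        # ratio > 1 means deleting costs more than keeping.
        "ratio": labor_cost / storage_cost,
    }


# Hypothetical inputs: 5 seconds per email, $36/hour labor,
# a 0.1 MB message, $0.25 per GB per year, kept for 7 years.
result = deletion_break_even(
    seconds_per_decision=5,
    hourly_labor_cost=36.0,
    message_size_mb=0.1,
    storage_cost_per_gb_year=0.25,
    years_retained=7,
)
print(result)
```

Under these assumed inputs the ratio comes out far above 1, which is the "everybody keeps everything" outcome. Manual cleanup only pays off if the per-item decision cost drops a lot (for example through automated, policy-driven deletion) or the per-item storage cost rises.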
0 upvotes
Anonymous Author
If you think about product design, this is where it really comes up. You have a lot of manufacturers now that want to go direct to consumers. In order to do that, they have to bring the consumer's opinion (their input, their stream of consciousness) into the earliest point of ideation, engineering, and design. And they want to keep that loop going all the way through the manufacturing and production process. In order to do that, you have to have access to your shop floor systems, your product life cycle system, your ERP, all your inventory, your supply chain management system, all the big back-office stuff. And what keeps coming up in my world over and over again is: we ultimately want to run AI, but we can't get there until we start doing streaming. And we can't do streaming because we have to parse out the parts of the data that are operational to a machine talking to a machine, and the parts that are operational to a supplier dumping a recipe on a machine. Or alternatively, when a customer is monitoring a line, we want to control what they see, because if they start seeing the quality numbers, even though we're catching the problems, that could impact our contract and cost us what could be a huge order. Now, they look at the stuff that's on MQ and the fire-and-forget paradigm and say, "Well, we can get rid of this, but wait, can we get rid of it?" Because what if the customer wants to pick that up off the line? Or what if the supplier needs to know that that piece of data is intrinsic to the next shipment of jelly bean parts inventory coming into the factory? So we don't know what we don't know. How do you filter that? And how do you then start getting rid of what you really don't need? Where's the sunset between "I need it now" and "I need it for comparison against historic"? But my historic is two minutes ago, so can I dump my historic, replace it with my current, and get into this iterative cycle of constantly cleansing and refreshing data? How does that impact my insight?
It also translates into where that Edge device has to be in order to keep refreshing that quantity of data.
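The "my historic is two minutes ago" cycle can be sketched as a rolling time window. This is only a minimal illustration under stated assumptions: the `RollingWindow` class, its method names, and the 120-second window are hypothetical, and readings are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Optional


class RollingWindow:
    """Keep only readings newer than `window_seconds`; drop the rest."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> None:
        """Store a new reading and evict anything that has aged out."""
        self._readings.append((timestamp, value))
        while (
            self._readings
            and timestamp - self._readings[0][0] > self.window_seconds
        ):
            self._readings.popleft()

    def baseline(self) -> Optional[float]:
        """Average of the readings still in the window (the 'historic')."""
        if not self._readings:
            return None
        return sum(v for _, v in self._readings) / len(self._readings)

    def __len__(self) -> int:
        return len(self._readings)


# Hypothetical line readings: (seconds since start, measured value).
window = RollingWindow(window_seconds=120)
window.add(0, 10.0)
window.add(60, 12.0)
window.add(130, 14.0)  # the reading at t=0 is now older than 120 s
print(len(window), window.baseline())
```

The trade-off is the one raised above: anything evicted is gone, so any insight that depends on a comparison further back than the window is lost unless an aggregate is kept somewhere else.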
1 upvotes