I come across a lot of people who proudly proclaim that they are not “data people.” They avoid spreadsheets, they hate columns of numbers, and they claim to get confused easily amidst it all. I’m here to help them all understand – data is your friend and everyone is a data person.
Let’s start with a simple clarification about what “data” is. Data is simply information. It doesn’t have to be a million-line spreadsheet; it can be the text of an email. Data is any recorded and referenceable piece of information. That’s it. If you go through your email to count the number of times you were asked a question, you are doing data analysis.
The common misunderstanding with data is that you need to know everything about Excel to be a data person. That belief confuses raw data with formatted data. Anyone can work with formatted data, but raw data is a different animal.
Raw data is information that comes in without having been cleaned, checked, validated, or organized. Turning raw data into formatted data is not something that just anyone should do. You have to understand the original intent of the data, understand relational data standards, and generally be comfortable inside data tools. This is a specialized activity.
After the data is formatted, it’s now anyone’s to work with. At this point, working with data largely comes down to asking questions and using the data to answer those questions.
The basic skill set of many jobs can be boiled down to “knowing what questions to ask and getting the right answers.” Those answers may come from experience, reading tea leaves, interviewing other experts, or (most commonly) analyzing the data. If you know what questions to ask, you are 75% of the way to being good at working with data.
Many people believe that an answer with 99% confidence is better than one with only 70% confidence. On the surface, with no additional information, maybe this could make sense. But the world of trade-offs almost always makes the 70% solution better.
Trade-offs occur in every decision. Moving any decision from 70% confidence to 99% requires time and complexity. Time is a non-renewable resource that we can never get back. Delaying a decision to increase confidence can often cost a lot of time and only yield false levels of new confidence. Complexity is similar: the more complexity involved in a decision, the more likely an error exists somewhere in the assumptions.
One of the great lessons I’ve learned in my career is that 70% confidence is often enough to move forward with. Get the next 10/20/30% confidence from real life experience and feedback. If you spend time modeling and trying to get everything perfect for release, key opportunities will pass you by.
The best engineers understand this rule. Poor engineers will strive for 100% and take the time to try to get there. That time is wasted, and the additional confidence is often false because of the new complexity. But they now have a thick document full of assumptions to fall back on, one they can use to justify any change outside of their expectations.
“Set it and forget it” is a popular saying on many late night infomercials. Take some new cooker, throw your food into it, push a few buttons and then a few hours later you have amazing gourmet meals with no effort. At least that’s the theory.
In the business world, many people have begun treating their algorithms the same way. They create these elaborate rules for metrics, benchmarking and scoring that will assess a thousand variables to come up with the perfect rank. The best will even apply the probability curves around the score that is generated. Today they may even give results that make sense.
Time is fickle. As time passes, conditions change. The rules that governed a process no longer apply because people begin moving back to cities, or technologies change the way that work is done, or home officing continues to pick up, or local policies change the way that financials are calculated. Something always changes.
But this change is often not handled well in algorithms. Often, the team that builds them puts a pin in them and then moves on to the new shiny toy, letting the old one run with no supervision. What this really means is that there is no one around to catch it when it stops returning valid answers. To a layperson it may seem like good numbers – everything worked, the data is all there, the results are consistent with what was previously calculated – yet the answers are no longer statistically valid for some reason.
Shelf life is a mandatory concept within the perishable food space. It should also be a concept within the data science space. Data can go stale over time, just as algorithms can stop being applicable.
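One way to picture a shelf life for data and algorithms is a simple "best before" check. The sketch below is only an illustration: the `Artifact` class, the item names, the dates and the 90 and 180 day windows are all made up for the example, and a real review schedule would depend on how quickly your own conditions change.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Artifact:
    """A dataset or algorithm with a hypothetical 'best before' window."""
    name: str
    last_validated: date   # when someone last checked that the results still made sense
    shelf_life: timedelta  # how long that check is assumed to stay good

    def expires_on(self) -> date:
        return self.last_validated + self.shelf_life

    def is_stale(self, today: date) -> bool:
        return today > self.expires_on()


def stale_artifacts(artifacts, today):
    """Return the names of everything past its shelf life as of `today`."""
    return [a.name for a in artifacts if a.is_stale(today)]


# Illustrative inventory; names, dates and windows are assumptions.
inventory = [
    Artifact("site_scoring_model", date(2023, 1, 15), timedelta(days=90)),
    Artifact("labor_market_extract", date(2023, 5, 1), timedelta(days=180)),
]

print(stale_artifacts(inventory, today=date(2023, 6, 1)))
# prints ['site_scoring_model']
```

The point is less the code than the habit: every algorithm gets an owner and a date by which someone has to look at it again.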
It may seem unnecessary to point out, but we are living in the Information Revolution. It’s spurred on by the increase in digital communications but the simple fact is that our world is all about harnessing information for the maximum return. If Return on Information could be accurately calculated it would likely be the new #1 metric that every business measures itself on.
- Fast food menus are driven by the purchasing behavior of customers. To maximize profits it is necessary to use information to both optimize menu options and prices – ideally by location. A $1 fry may be successful in Georgia but less so in California.
- Brick and mortar retail is size dependent: sometimes big stores are ideal and sometimes smaller stores are. If you are not able to harness your customer intelligence to know which is right for you, then you have a 50/50 shot (or less) of guessing right.
- Locating a corporate HQ is dependent upon the labor in the market and the trends in competition for your target skills, as well as the growth or decline of those skills in the area. Your decision 20 years ago to locate somewhere may no longer be an optimal solution even if it still seems so on the surface.
- Your retail partners, distributors, customer locations, product mix and inventory levels define the optimal supply chain. The mix of all the above likely changes (beyond some threshold) every 3 months or more. How are you using information around each to model the conditions that necessitate a change in location strategy, even if it is simply changing where inventory is pre-positioned? Sometimes having empty space in a warehouse today is the best long-term cost avoidance option.
Information drives everything about real estate. Knowing what is in your lease contract, a given landlord’s financial drivers, the macro and micro characteristics of the market, the labor pool you are trying to tap, future business plans that could impact the decision 3 to 10 years down the road... all of these need to be brought together to optimize any given real estate decision. There’s a lot that can go into a given decision but that doesn’t mean all of it needs to go in. Overkill is a real problem in analysis.
All this to say: invest in knowing how to harness and use information in your real estate decisions. It doesn’t have to all be some fancy, expensive technology (although that may be a component) but it does need to have a rational and consistent approach that meets your needs.
I am a big believer in the process of active self-reflection. To really improve yourself you must look closely at yourself and see the good, bad, worth and ugliness. We are all made up of grey areas. There are parts to us that are sometimes good, sometimes bad. If we don’t understand when it moves from good to bad we won’t know our limits. If we don’t know our limits we are likely to put ourselves (or others) into situations that are not going to turn out how we would want.
Self-reflection is difficult because it forces us to try to understand our own motivations and actions. For myself, it often causes quite a bit of dissonance between the way others talk about me versus the way I see myself. I know myself as flawed and in need of improvement but they may see someone trying hard to do the right things even if something doesn’t work out as expected.
The active piece is just as important in the process. All of us do self-reflection when we screw up or something goes wrong. We try to make sure we don’t repeat the bad thing that just happened. But we should also be reflecting after a random phone call to mom to make sure we are being the right son for the moment and considering the impact of our words in a mundane situation.
This is no different than being a great salesperson but just doing it all the time. The best salespeople I have ever seen know the impact of every word that they use. They know exactly how the person they are meeting with will likely respond to a certain phrase or providing pricing too soon or the way they dress for the meeting or the way they hold their arms. That salesperson has reflected on the impact of every single one of their actions to make themselves as effective as possible.
This is the same concept but taken more broadly. What you’ll find is that as soon as you really begin to understand yourself and your own motivations, you will also begin to better understand others and their motivations and reactions. It’s a pretty cool circle to experience.
My first couple of years of professional work were spent doing AutoCAD for all kinds of industrial sites and office layouts. I got very good at quickly going through lots of iterations to give project managers a lot of choices for what they could do. I did not do a very good job of really understanding what my drawings represented in the real world.
As time went on I actually got to live in and experience the sites that were built from my drawings. I got to understand the importance of thinking about the number of power outlets on empty walls and the difference between a 40-inch hallway and a 48-inch hallway. In time I even came to understand that each seat in my plan was going to be the full-time home of another person for quite a long time.
It is amazing to me how the up-front process can go so quickly that no one has time to really think through all the above on small to medium projects – sometimes even really large projects. Hindsight being what it famously is, it is important for us to take time after the fact to see what we can do to come back in and further improve things. Go live day does not mean that nothing else can be done later. More is always possible.
There are many instances where numbers refuse to cooperate and add up to the correct penny. It can be a very frustrating experience when you are trying to create a file that can be validated and audited. Validating numbers that do not add up right is always a test.
The first thing you must do is identify why the numbers are not adding up correctly. Is it a data issue? Associated with a particular location? Rounding issues that impact the equations? Incomplete information? Incompatible data? There are any number of reasons that it won’t happen. If you cannot identify the reasons why the data is not adding up, you will never be able to convince someone else of the issue. Showing the causes of the error is step 1.
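Rounding is one of the easier causes to demonstrate. Here is a minimal sketch, assuming a made-up $100.00 charge split evenly across three locations and half-up rounding to the cent; the function names are hypothetical and your own allocation rules will differ.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount):
    """Round a Decimal amount to the nearest cent, halves rounding up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def rounding_gap(total, shares):
    """Rounded total minus the sum of the individually rounded parts."""
    parts = [to_cents(total * share) for share in shares]
    return to_cents(total) - sum(parts, Decimal("0"))


# Hypothetical example: $100.00 split evenly across three locations.
total = Decimal("100.00")
shares = [Decimal(1) / Decimal(3)] * 3

gap = rounding_gap(total, shares)
print(gap)  # prints 0.01
```

Each location rounds to 33.33, so the parts sum to 99.99 and a penny goes missing. Being able to show that arithmetic line by line is what lets you say "this is a rounding issue" rather than "something is off."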
After you show how the error was introduced, you need to show the error inherent in the rest of the data you are showing. Should someone take the numbers at face value, or are they within +/- some percent of what is shown? Once you convince people of the overall issues, you then need to convince them that the data you are giving them is trustworthy. Proving the current data is step 2.
Step 3 is the hardest – communicating what is going on. This is where most people go wrong. I have seen many a good analyst fall apart in the face of communicating bad data. They try to distance themselves from it instead of owning the work they spent getting it as close to correct as possible. When data is not perfect people want to know that they can trust the person who put it together. If that person does not own it there is often no way to recover even if everything else is done right.
Confidence goes a long way whether in sales, data analytics or solution development. You need to convince people that you have confidence in what you put together as well as convince them that the basis for any recommendations or progress is reliable given a set of assumptions.