Everyone can be a data person – that includes you!

I come across a lot of people who proudly proclaim that they are not “data people.” They avoid spreadsheets, they hate columns of numbers, and they claim to get confused easily amidst it all. I’m here to help them all understand – data is your friend and everyone is a data person.

Let’s start with a simple clarification about what “data” is. Data is simply information. It doesn’t have to be a million-line spreadsheet; it can be the text of an email. Data is any recorded and referenceable piece of information. That’s it. If you go through your email counting the number of times you were asked a question, you are doing data analysis.
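
To make that concrete, here is a tiny sketch of that exact exercise in Python. The folder name and the question-mark heuristic are my own assumptions for illustration, not a prescription.

```python
# A minimal sketch of "email as data": count how many exported messages
# asked a question. The folder name and heuristic are illustrative assumptions.
from pathlib import Path

email_dir = Path("exported_emails")  # assumed: one .txt file per message

question_count = 0
for message in email_dir.glob("*.txt"):
    text = message.read_text(errors="ignore")
    # Crude heuristic: treat any message containing a question mark as a question.
    if "?" in text:
        question_count += 1

print(f"Messages that asked me a question: {question_count}")
```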

The common misunderstanding with data is that you need to know everything about Excel to be a data person. This confuses raw data with formatted data. Anyone can work with formatted data, but raw data is a different animal.

Raw data is information as it comes in, before it has been cleaned, checked, validated, or organized. Turning raw data into formatted data is not something just anyone should do. You have to understand the original intent of the data, understand relational data standards, and generally be comfortable inside data tools. This is a specialized activity.
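
As a rough illustration of what that specialized step looks like, here is a hedged sketch in Python with pandas. The file, the column names and the rules are hypothetical; a real pipeline depends on the original intent of the data and the standards it has to meet.

```python
# A sketch of the raw-to-formatted step: clean, check, validate, organize.
# File and column names are hypothetical; real rules depend on the data's intent.
import pandas as pd

raw = pd.read_csv("raw_leases.csv")  # assumed: a messy export of lease records

formatted = (
    raw.rename(columns=str.lower)                                   # organize: consistent names
       .assign(
           city=lambda d: d["city"].str.strip().str.title(),        # clean: tidy text
           start_date=lambda d: pd.to_datetime(d["start_date"],
                                               errors="coerce"),    # validate: real dates only
       )
       .dropna(subset=["city", "start_date"])                       # check: drop invalid rows
       .drop_duplicates()                                           # check: remove repeats
)

formatted.to_csv("leases_formatted.csv", index=False)
```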

After the data is formatted, it’s now anyone’s to work with. At this point, working with data largely comes down to asking questions and using the data to answer those questions.

The basic skill set of many jobs can be boiled down to “knowing what questions to ask and getting the right answers.” Those answers may come from experience, reading tea leaves, interviewing other experts, or (most commonly) analyzing the data. If you know what questions to ask, you are 75% of the way to being good at working with data.
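
To show how low the bar really is once the data is formatted, here is a small example with made-up numbers: pick a question, then let the table answer it.

```python
# Working with formatted data is mostly asking questions of it.
# The office data below is invented purely for illustration.
import pandas as pd

offices = pd.DataFrame({
    "market":      ["Atlanta", "Atlanta", "Denver", "Denver", "Austin"],
    "cost_per_sf": [28.0, 31.5, 34.0, 36.5, 30.0],
    "occupancy":   [0.82, 0.91, 0.74, 0.88, 0.95],
})

# Question: which market has the lowest average cost per square foot?
print(offices.groupby("market")["cost_per_sf"].mean().sort_values())

# Question: where is occupancy weakest?
print(offices.groupby("market")["occupancy"].mean().idxmin())
```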

Most algorithms cannot be set and then forgotten.

“Set it and forget it” is a popular saying on many late night infomercials. Take some new cooker, throw your food into it, push a few buttons and then a few hours later you have amazing gourmet meals with no effort. At least that’s the theory.

In the business world, many people have begun treating their algorithms the same way. They create elaborate rules for metrics, benchmarking and scoring that assess a thousand variables to come up with the perfect rank. The best will even apply probability curves around the score that is generated. Today those algorithms may even give results that make sense.

Time is fickle. As time passes, conditions change. The rules that governed a process no longer apply because people begin moving back to cities, or technologies change the way work is done, or working from home continues to pick up, or local policies change the way financials are calculated. Something always changes.

But this change is often not handled well in algorithms. Often the team that builds them puts a pin in them and moves on to the new shiny toy, letting the old one run with no supervision. What this really means is that there is no one around to catch it when it stops returning valid answers. To a layperson the numbers may still look good – everything ran, the data is all there, the results are consistent with what was previously calculated – yet the answers are no longer statistically valid.

Shelf-life is a mandatory concept in the perishable food space. It should also be a concept in the data science space. Data goes stale over time, and in much the same way algorithms stop being applicable.
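
One way to respect that shelf-life is to re-check, before trusting an old algorithm, whether the inputs it sees today still look like the inputs it was built on. The sketch below is a generic drift check of my own, not any particular team’s method; the numbers and threshold are illustrative.

```python
# A "shelf-life check" sketch: flag when today's inputs no longer resemble
# the data the algorithm was calibrated on. Numbers and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def inputs_have_drifted(baseline: np.ndarray, current: np.ndarray,
                        alpha: float = 0.01) -> bool:
    """Two-sample test: has the input distribution shifted since calibration?"""
    _statistic, p_value = ks_2samp(baseline, current)
    return p_value < alpha

# Example: rents used to calibrate a scoring model vs. rents arriving this quarter.
rng = np.random.default_rng(0)
baseline_rents = rng.normal(30, 4, size=5_000)  # conditions at build time
current_rents = rng.normal(36, 6, size=1_000)   # conditions today

if inputs_have_drifted(baseline_rents, current_rents):
    print("Inputs have drifted – re-validate the algorithm before trusting its scores.")
```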

Information and data are the basis of everything real estate related – be sure you invest resources in them.

It may seem unnecessary to point out, but we are living in the Information Revolution. It’s spurred on by the increase in digital communications but the simple fact is that our world is all about harnessing information for the maximum return. If Return on Information could be accurately calculated it would likely be the new #1 metric that every business measures itself on.

  • Fast food menus are driven by the purchasing behavior of customers. To maximize profits it is necessary to use information to both optimize menu options and prices – ideally by location. A $1 fry may be successful in Georgia but less so in California.
  • Brick and mortar retail is size dependent, sometimes big stores are ideal and sometimes smaller stores are. If you are not able to harness your customer intelligence to know which is right for you then you have a 50/50 shot (or less) of guessing right.
  • Locating a corporate HQ is dependent upon the labor in the market and the trends in competition for your target skills, as well as the growth/decline of that skill in the area. Your decision 20 years ago to locate somewhere may no longer be optimal even if it still seems so on the surface.
  • Your retail partners, distributors, customer locations, product mix and inventory levels define the optimal supply chain. The mix of all the above likely changes (beyond some threshold) every 3 months or so. How are you using information about each to model the conditions that necessitate a change in location strategy – even if it is simply changing where inventory is pre-positioned? Sometimes having empty space in a warehouse today is the best long-term cost avoidance option.

Information drives everything about real estate. Knowing what is in your lease contract, a given landlord’s financial drivers, the macro and micro characteristics of the market, the labor pool you are trying to tap, future business plans that could impact the decision 3 to 10 years down the road… all of these need to be brought together to optimize any given real estate decision. There’s a lot that can go into a given decision but that doesn’t mean all of it needs to go in. Overkill is a real problem in analysis.

All this to say: invest in knowing how to harness and use information in your real estate decisions. It doesn’t all have to be some fancy, expensive technology (although that may be a component), but it does need to have a rational and consistent approach that meets your needs.

Nate Silver was right – saying otherwise is just misleading.

This one is a little late given that the election was a month ago, but I still think it is worthwhile considering the opinions I still hear about his performance.

I’m walking out of this election with an increased respect for Nate Silver and the work he is doing at fivethirtyeight.com. Statistics and prediction modeling are hard. Even that is an understatement, because any time you try to predict the future – even tomorrow – it’s more likely that you end up slightly wrong than completely right.

A big portion of my job involves trying to understand the impact of decisions today on the business tomorrow. If we build out an office for 40 people today, what is the likelihood we have to close or expand it in 3 years? What is the likelihood that we can support 50 people in the same place without redoing the furniture? Is this city still going to be the right location for this function based on both business and geographic trends?

Nate Silver took a beating in the two weeks leading up to the election. He was consistently and regularly called out for being too optimistic about Donald Trump’s chances of being elected president.

Why do I respect Nate Silver more today than before? Because he understands the single biggest rule of data analysis: garbage in, garbage out. If you have any questions at all about the quality of the data you are given, it is your responsibility to account for that fact and note its potential impacts. Saying that you were wrong because the data was wrong is exactly the wrong answer, because the follow-up question is “did you have any reason to suspect the data was wrong?” – any answer other than “yes” makes you look incompetent, and “yes” means you should have accounted for it in the first place.

Data is fickle, and many people think that data itself provides an answer. But if that were really true, IBM’s Watson would have taken over the planet already. Real intelligence is being able to understand what data means. Where is it best applied, where should it not be applied at all, where is it misleading, where is it incomplete, where is it biased, what conditions could lead to a change in trends… There is an art to actually dealing with data.

It does not surprise me in the slightest that most polling aggregators this year showed Clinton at a 98% chance of winning the election. The data seemed to reflect that, and relying on the cold hard numbers would point almost any model that direction. But this simply proves that some people are better at this than others, and you should never trust any model until you sense-check its approach, its strengths and its weaknesses. It’s like restaurants. The aggregators who gave Clinton a 90%+ chance are fast food: they throw everything in and hand you a generic burger. Those that talked a lot about uncertainty are chefs: they know what to do with the raw ingredients they are handed, throw out the worst, and make the best sing before the plate goes out.
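
To be clear, I have no inside view of anyone’s actual model; the toy simulation below just illustrates the principle with invented state leads and error sizes. Assume every state’s polling error is independent and you get near-certainty; allow one shared national miss and the same polls give a much humbler answer.

```python
# Toy Monte Carlo: why correlated polling error separates the "98% certain"
# aggregators from the cautious ones. All leads and error sizes are invented.
import numpy as np

rng = np.random.default_rng(538)
n_sims = 200_000
polled_leads = np.array([3.0, 4.0, 2.5, 5.0, 3.5])  # hypothetical leads in 5 decisive states

def win_probability(errors: np.ndarray) -> float:
    """A simulation counts as a win if the candidate carries at least 3 of the 5 states."""
    outcomes = (polled_leads + errors) > 0
    return (outcomes.sum(axis=1) >= 3).mean()

# Naive model: each state's polling error is independent.
independent = rng.normal(0, 2.5, size=(n_sims, polled_leads.size))

# Cautious model: a shared national miss moves every state the same direction.
shared = rng.normal(0, 2.5, size=(n_sims, 1))
local = rng.normal(0, 1.5, size=(n_sims, polled_leads.size))

print(f"Win probability, independent errors: {win_probability(independent):.1%}")
print(f"Win probability, correlated errors:  {win_probability(shared + local):.1%}")
```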

Small sample sizes and their influence on decisions.

If you follow sports you are likely familiar with the concept of small sample sizes. Every hitter in baseball will occasionally have an 0-for-5 day with three strikeouts. If you look only at that one game you might come away with the wrong impression of the hitter. Sometimes a great hitter will have a month where these games happen consistently. If you focus on that month you will miss the fact that he usually has a month well above average that makes up for it.
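
A quick simulation makes the point (the at-bat counts are rough guesses, purely for illustration): a true .300 hitter’s average over any single month swings all over the place even though nothing about the hitter has changed.

```python
# Small samples in action: simulate a true .300 hitter month by month.
# At-bat counts are rough, for illustration only.
import numpy as np

rng = np.random.default_rng(7)
true_average = 0.300
at_bats_per_month = 100   # a rough monthly workload
months = 6                # roughly a season

hits = rng.binomial(at_bats_per_month, true_average, size=months)
monthly_averages = hits / at_bats_per_month

print("Monthly averages:", np.round(monthly_averages, 3))
print("Season average:  ", round(hits.sum() / (months * at_bats_per_month), 3))
```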

In real estate, small sample sizes are everywhere. Do you really think that comps are a completely accurate representation of a market? Depending on the size of the market and how they were chosen, the comps may be selectively picked and reflect a trend that runs counter to other market forces.

What defines whether a dataset is “small”? This is a tough question, because even a dataset of millions of records could be small depending on what it is trying to reflect (Amazon sales trends would be an example). Similarly, a dataset of 10 points could be large for a fairly rare occurrence.

Unless you are dealing directly with statistics, the issue of sample size is rarely addressed. I’ve experienced people throwing “facts” out based on a self-selected dataset more often than I care to remember. People most often buy into small-sample datasets because 1) it is difficult to get more or different data and 2) the data reflects the reality they expect to see.

A great example of this is in the sales world.  Sales people are in front of a certain type of buyer with certain approaches to a sales environment.  They may hear the same thing multiple times with multiple clients over a month and suddenly think the market has shifted on them.  The more likely reality is that the way their pitch is structured causes a similar response.  The salesperson hears a disconnect and assumes it is based on the market when in fact it is based on a reaction to their words.

Decisions are hard. It requires a lot of self-reflection to truly understand the process you use to make them. Are you really making decisions using good data, or are you simply using data that reflects the result you were hoping for at the start?

The problem with predictions based on data.

Prediction is the art of taking information about the past and applying it to future circumstances to understand what is likely to happen. It is premised on the assumption that the future will follow the same or similar rules as the past. Behavior is expected to remain largely consistent over time.

For many applications this works really well. Purchasing and retail wouldn’t work well at all without forecasting. Population changes by geography are largely predictable.

But often knowledge of the past changes the future. Knowing that it is losing its youth population to larger cities, a smaller community may undertake initiatives to retain or attract residents by offering incentives for businesses to locate there. Or, knowing competitor trends, a company revises its business strategy to attract new customers. The past changes the future in unpredictable ways.

The future can also vary from predictions because new information becomes available. Who could have predicted the impact of the iPhone in 2005, before it was released? It completely changed the course of several industries within 3 years, let alone a full decade.

Predictions using data (especially the buzzwordy “Big Data”) sometimes feel like they are more certain than qualitative predictions.  This is often untrue.  No data set can completely encapsulate a scenario or situation regardless of how large or unique the data set is.

Risk and the new may not be the centerpiece of prediction, but they had better be included if the prediction is going to have any worth at all.

Measuring for the #workplace of the future and the issue of “anonymous” data.

Workplace standards, design and tracking are hot topics. Everyone wants to do the most with the least amount of space possible. But the understanding of the workplace is quickly beginning to overlap with employees’ concerns about their own privacy.

This brings us to the Quartz article about London’s Daily Telegraph installing sensors on desk chairs to track utilization. This is something that probably 50+% of the companies I talk to are considering. How else can you figure out whether a given desk is actually in use on a regular basis?

The issue with sensors is whether the data can truly be anonymous. The reality is that a given seat is either assigned or commonly used by the same person. Maybe that person works from home three days a week (on average) and their boss is on the fence about it. Measure this way and you are going to get false-positive utilization from people seeking to game the system and keep their seat. You are also going to make a lot of people nervous who believe the data will be used against them somehow.

The reality is that there are enough managers in this world who would misuse this data. They would want to analyze their particular team’s seat usage patterns to find trends (whether real or not). But that’s not how these sensors are meant to be used.

All in all this creates a difficult overlap. You can’t reduce space without knowing how space is used, but by measuring with sensors you create an artificial environment in which you may not be able to completely trust the numbers. (If I wanted my desk to seem used every day, I’d ask my neighbor for the favor of spinning my seat every 30 minutes.)

Just something to think on. It doesn’t make these sensors useless, and it doesn’t mean other methods would work better. Each company needs to analyze its own culture to determine the best method.