The average business has the potential to collect a lot of data. Business operations generate data, and the more a business does, the more data it generates. Internal processes can generate a lot of data, and external processes produce data as well.
Businesses generate data, but they don't necessarily collect all of it. Unless a business is performing some sort of data analytics, it doesn't have much reason to collect a lot of data.
Of course, when a business implements a BI solution, this all changes. At that point, it becomes beneficial to collect all the data that a business possibly can. The more data that a business has access to, the more powerful their data analysis tools will be.
All that data can quickly become a problem, though. Once a business collects its data, it has to figure out a place to put it. Data storage and management is one of the biggest problems that businesses face when they start to use data in a larger way.
Many businesses are subject to regulations or laws that limit how they can access or store some of their data. For the most part, businesses are required to store customer data in special ways. In some industries, like law or health care, there are even more laws that govern what businesses can and can't do with people's data.
Given these limitations, and the scale of the data being stored, there are many different types of data storage solutions. Businesses need to do their research and figure out which strategy is best for them, so that they can properly leverage the data they collect.
On-premise storage
In the past, the most common method that businesses used to store their data was physical, on-premise storage. Businesses would download their data to servers on their network, and if someone wanted to access that data, they'd need to access that server.
Many businesses still use on-premise storage, but it's becoming steadily less common as cloud-based business solutions become more widespread. Just like any other approach, it has its strengths and weaknesses that make it a good fit for some, and a bad choice for others.
A lot of businesses that use on-premise storage do so out of inertia. They've always stored their data like this, and they often don't even know there are other ways. They rarely have any data governance procedures in place, and anyone with access to the server can edit or delete data.
This is the worst-case scenario with on-premise storage. In situations like these, it's very hard for a business to really leverage their data. In general, most businesses that are using a legacy on-premise solution would be better served with a different strategy.
On-premise solutions have other limitations. They're usually the hardest type of storage solution to connect to. It's much harder for a cloud-based BI solution to connect to an on-premise database than it is for it to connect with another cloud-based tool.
They're also the least convenient. They're hard to access remotely, usually require some sort of technical staff to maintain, and rely on that technical staff to fix issues. If the server goes down, there's a lot of pressure on the IT team to get it back, and there's a much larger risk of losing the whole database in something like a server failure.
However, there are reasons that a business would specifically want to use an on-premise data storage solution.
First, on-premise solutions give a business the most direct control over security. Cloud-based strategies aren't insecure, but they do involve a third party accessing and storing your data. With an on-premise solution, a business can be sure that it is the only party with access to its data. Businesses that have special requirements that govern data access might have to use an on-premise solution.
Second, on-premise storage is cheaper than other strategies, if the business already has the storage infrastructure for it. A server's running cost is usually cheaper than the recurring costs of a paid, cloud-based solution. However, if a business has to buy the storage to implement their on-premise solution, it's not as cost-effective.
Cloud-based data warehousing
Within the last few years, most businesses that store large amounts of data have started to shift away from on-premise solutions, and towards cloud-based ones.
There are cloud-based storage vendors that cater to every size of business, but most businesses that need a solution like this will look at the largest, enterprise-scale offerings like Amazon Redshift or Snowflake.
Cloud-based solutions operate completely on the cloud. When a business collects data, that data gets uploaded automatically to their cloud-based storage. If a business wants to access that data, they need to access their cloud storage to find it.
These tools usually act as pure storage tools. They usually don't have any sort of data analysis or business intelligence features. Their only purpose is to act as a data warehouse, and facilitate data storage, access, and transfer.
Cloud storage has a few distinct advantages over other options. Most businesses that need to store massive amounts of data use some sort of cloud storage solution, since it's the strategy that's best suited for big data and enterprise operations.
First, businesses don't need to invest in any physical data architecture of their own. Since all the data is stored on their cloud service, they don't need any servers or hard drives to store their data. This lowers the cost of data storage overall.
Second, uptime is much less of a worry. For the most part, a cloud-based solution will be online 24/7. Businesses don't have to babysit a server to make sure they can access their data at any time. Plus, employees can access their data remotely, as long as they have their credentials.
Third, cloud storage solutions are designed to handle massive data sets. When data sets get big enough, they start to outstrip the ability of less robust tools to handle them. This makes it very hard to actually access the data you want, since queries become too slow to be practical. Enterprise cloud solutions, though, can handle data sets with millions or billions of rows.
In general, large businesses with large data sets prefer enterprise-scale cloud-based solutions over other storage strategies. However, some aspects of cloud-based data warehousing make them less effective for other businesses.
Businesses that have specific data governance regulations often can't share their data with a third party in this way. Some vendors do comply with some regulations, but it'll be on a case-by-case basis. Businesses usually can't store extremely sensitive information like medical records using a cloud-based app unless the vendor meets the regulations that apply to that data.
Smaller businesses usually won't need enterprise-scale storage solutions, and it'll be cost-prohibitive for them to pay enterprise prices for them. These sorts of solutions are usually fairly expensive.
For example, Snowflake charges $23 per month per terabyte of storage if a business pays up front, and $40 per month per terabyte for pay-as-you-go storage. There are other solutions that are much cheaper; businesses need to carefully weigh the benefits of paying a premium for their storage.
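To make the difference between those two rates concrete, here is a minimal Python sketch that works through the arithmetic. It uses only the per-terabyte figures quoted above, which may not match current pricing, and it covers storage alone; compute and other charges are assumed to be billed separately and are left out. The function names are illustrative, not part of any vendor's tooling.

```python
# Per-terabyte monthly storage rates quoted in the text (USD).
# Assumption: these are list prices at the time of writing; check current pricing.
UPFRONT_RATE_PER_TB = 23.0
ON_DEMAND_RATE_PER_TB = 40.0


def monthly_storage_cost(terabytes: float, rate_per_tb: float) -> float:
    """Return the monthly storage cost in USD for a given volume and rate."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * rate_per_tb


def annual_savings_from_upfront(terabytes: float) -> float:
    """Return how much cheaper the up-front rate is over 12 months, in USD."""
    on_demand = monthly_storage_cost(terabytes, ON_DEMAND_RATE_PER_TB)
    upfront = monthly_storage_cost(terabytes, UPFRONT_RATE_PER_TB)
    return 12 * (on_demand - upfront)


if __name__ == "__main__":
    # Compare the two plans at a few example data volumes.
    for tb in (1, 10, 50):
        upfront = monthly_storage_cost(tb, UPFRONT_RATE_PER_TB)
        on_demand = monthly_storage_cost(tb, ON_DEMAND_RATE_PER_TB)
        savings = annual_savings_from_upfront(tb)
        print(f"{tb} TB: ${upfront:,.2f}/mo up front, "
              f"${on_demand:,.2f}/mo on demand, "
              f"${savings:,.2f}/yr difference")
```

At small volumes the gap between plans is minor, so the choice matters mostly for businesses that already know they will store tens of terabytes or more.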
Cloud-based solutions tend to be hard to implement, technically speaking. Enterprise-scale solutions like Snowflake usually require a technical support team to implement, which makes them somewhat inaccessible to those without technical staff. There are solutions that are simpler to implement, and they don't require any on-premise work, either.
BI tool as data warehouse
Many businesses exist in a middle zone where they're large enough to need a unified data storage solution, but too small to effectively leverage standalone data warehousing tools. Data storage options for these businesses need to be robust enough to handle large amounts of business data, but cheap enough to make financial sense.
For these businesses, it often makes a lot of sense to use a BI tool as a data warehouse. Since a BI tool needs to access business data and store it anyway, it's a good move to just load all your business data into your BI tool and use it as a storage solution.
This has the added advantage of making more data accessible for analysis. A BI tool can only perform analysis on data that it can see. If it's acting as an organization-wide data warehouse, then it'll be able to analyze any data set.
It's also much easier to connect data sources to a BI tool than it is to connect to other data warehousing options. Most BI tools offer simple-to-use connectors with common business systems, which allow this data to flow right from one tool to another. Some tools also allow users to pipe on-premise data like Microsoft Office files right into the data warehouse.
The value of this strategy will vary from tool to tool. Some tools let their users store more data for pretty minimal fees, while others make their customers pay a premium for more data storage. At a certain point in any tool, it'll start to get too expensive to store more data in it. That's the point where investing in a standalone data warehouse is a good idea.
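One rough way to find that point is a break-even calculation. The Python sketch below assumes a simplified cost model: the BI tool charges a flat fee per terabyte, while a standalone warehouse charges a lower per-terabyte rate plus a fixed monthly overhead for things like implementation and technical staff. Every number and name in it is hypothetical and only meant to show the shape of the comparison; real pricing is rarely this linear.

```python
from typing import Optional


def break_even_terabytes(
    bi_rate_per_tb: float,
    warehouse_rate_per_tb: float,
    warehouse_fixed_monthly: float,
) -> Optional[float]:
    """Return the data volume (TB) above which a standalone warehouse is cheaper.

    Assumed cost model (hypothetical):
      BI tool cost   = bi_rate_per_tb * TB
      warehouse cost = warehouse_fixed_monthly + warehouse_rate_per_tb * TB
    Returns None when the warehouse never becomes cheaper.
    """
    rate_gap = bi_rate_per_tb - warehouse_rate_per_tb
    if rate_gap <= 0:
        # The BI tool is at least as cheap per terabyte, so the fixed
        # overhead of a warehouse is never recovered.
        return None
    return warehouse_fixed_monthly / rate_gap


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    volume = break_even_terabytes(
        bi_rate_per_tb=100.0,
        warehouse_rate_per_tb=25.0,
        warehouse_fixed_monthly=1500.0,
    )
    if volume is None:
        print("The standalone warehouse never becomes cheaper.")
    else:
        print(f"A standalone warehouse becomes cheaper above {volume:.1f} TB.")
```

A business storing less than the break-even volume would pay less by staying in the BI tool; above it, the warehouse's lower per-terabyte rate outweighs its fixed overhead.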
BI tools also have the same data governance problems as other third-party applications. There are some workarounds, like federated connectors, but usually, businesses with special rules on how they can store data can't use a BI tool as a data warehouse.
There are many different strategies that businesses can use for storing and accessing their data. These are just three of the most common solutions; businesses might want to use strategies not listed here, or a blend of the strategies we talked about.
Regardless, businesses that have reached a certain level of data complexity need to implement some consistent data storage solution. With an effective data storage solution, businesses can leverage all of their data towards insight.