The data moat - the best way for an AI startup to defend its technology.

Noy Shulman
3 min read · Feb 8, 2021

The previous post in this series discussed the problem with trying to create a technological moat in an AI-based startup. Today we will discuss the next question that arises: can a startup that bases its technology on deep learning nevertheless create a moat?

A short recap of last week’s post:

  1. Deep tech startups try to protect their product by basing it on cutting edge technology that is hard to duplicate. We referred to this as a technological moat.
  2. Novel AI algorithms are being invented frequently and are published for public use as open-source code.
  3. The second point makes the first one very difficult to achieve.

So how can a company build a moat around an AI-based product? The answer is data.

To better understand why data is the answer, it will be helpful to understand what made the AI renaissance we are experiencing possible. The significant improvements we saw over the last few years in AI technology happened due to three major causes:

  1. The advancements in computing power.
  2. Development of better Deep Learning algorithms.
  3. Data. Huge amounts of data being collected, stored, and labeled.

The last point is the one most relevant to us. In contrast to other types of software products, creating an AI model consists of more than coding. The model has to be trained using labeled data. As a simple analogy, coding the model is analogous to creating an artificial brain. Training the model is analogous to teaching that brain how to perform a certain task. The artificial brain (artificial neural network) is close to useless without training.

While the architecture of the artificial neural network is easily replicable, a proprietary dataset can give a company a competitive advantage and serve as a barrier to entry — the data moat.

Generally speaking, the data used for training has a more significant impact on the performance of the algorithm than the architecture of the neural network. Data collection creates a positive feedback loop for a product. This feedback loop is demonstrated in the following diagram:

A company deploys an AI model as part of its product. If done correctly, the product should also collect data. That data is passed back to the developers, who can use it to retrain the model and make it better. A better model improves the product, which will hopefully attract more users, who in turn generate more data, and the cycle continues.
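The loop described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the saturating quality curve, the growth rate, the samples-per-user figure, and all names (`ProductState`, `model_quality`, `run_cycle`, `simulate`) are invented for the example and are not measurements from any real product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProductState:
    users: int       # active users of the product
    samples: int     # training examples collected so far
    quality: float   # toy model-quality score in [0, 1)


def model_quality(samples: int, half_point: int = 10_000) -> float:
    """Assumed saturating curve: more data helps, with diminishing returns."""
    return samples / (samples + half_point)


def run_cycle(state: ProductState, samples_per_user: int = 5,
              growth_rate: float = 0.5) -> ProductState:
    """One turn of the loop: collect data, retrain, attract users."""
    # The deployed product collects data from its current users.
    samples = state.samples + state.users * samples_per_user
    # Developers retrain the model on the larger dataset.
    quality = model_quality(samples)
    # A better product attracts more users (assumed proportional to quality).
    users = state.users + int(state.users * growth_rate * quality)
    return ProductState(users=users, samples=samples, quality=quality)


def simulate(cycles: int, start_users: int = 1_000) -> List[ProductState]:
    """Run the loop several times and return the state after each cycle."""
    state = ProductState(users=start_users, samples=0, quality=0.0)
    history = [state]
    for _ in range(cycles):
        state = run_cycle(state)
        history.append(state)
    return history


if __name__ == "__main__":
    for i, s in enumerate(simulate(5)):
        print(f"cycle {i}: users={s.users} samples={s.samples} quality={s.quality:.2f}")
```

Under these assumptions, every cycle adds data, which raises quality, which adds users, which adds even more data in the next cycle. The real dynamics depend on the product, but the compounding shape is the point.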

The data moat actually digs itself. A good example of this can be found in the race to build autonomous vehicles. Tesla was one of the first companies to develop and deploy semi-autonomous vehicles. The electric vehicle manufacturer deployed these systems in its cars and, as a result, collected enormous amounts of data that it could use to improve them.

Data collection is not a trivial task. Using the same example, the amount of data that is gathered by a vehicle’s sensors is enormous, even by today’s standards. Sending and storing all this data is extremely costly. The challenge is to find the best data, store it, and use it for training.
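One common way to decide which data is "best" is uncertainty sampling, a technique from active learning: prefer the examples the current model is least sure about. The sketch below applies it under a storage budget. It is an assumption chosen for illustration, not the method of this article or of any particular carmaker, and the names `Sample` and `select_for_upload` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    sample_id: str
    confidence: float  # model's top-class probability, in [0, 1]
    size_mb: float     # cost of sending and storing this sample


def select_for_upload(samples: Iterable[Sample], budget_mb: float) -> List[Sample]:
    """Greedily keep the least-confident samples that still fit the budget."""
    chosen: List[Sample] = []
    used = 0.0
    # Visit samples from least to most confident.
    for sample in sorted(samples, key=lambda s: s.confidence):
        if used + sample.size_mb <= budget_mb:
            chosen.append(sample)
            used += sample.size_mb
    return chosen


if __name__ == "__main__":
    recorded = [
        Sample("a", confidence=0.99, size_mb=40),
        Sample("b", confidence=0.51, size_mb=40),
        Sample("c", confidence=0.62, size_mb=40),
        Sample("d", confidence=0.95, size_mb=10),
    ]
    print([s.sample_id for s in select_for_upload(recorded, budget_mb=100)])
```

With a 100 MB budget, the two uncertain clips and the small one are kept, and the large clip the model already handles confidently is dropped.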

Improving data collection is an iterative process. This process isn’t only about the amount of data but also its quality. The training phase defines the requirements for the collection phase. Let’s say our autonomous vehicle has trouble recognizing electric scooters when they ride on sidewalks. Training the algorithm on data that contains scooters on sidewalks will reduce this problem. Hence, the data collection effort has to target these examples.
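A minimal sketch of how evaluation results could drive collection requirements: compute the error rate per class, flag the weak classes, and collect only incoming samples that match them. The 20% threshold, the label names, and the functions `weak_classes` and `should_collect` are assumptions for illustration. The sketch also assumes each incoming sample already carries rough tags, for example from a cheap on-device detector or human triage.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def weak_classes(results: Iterable[Tuple[str, bool]],
                 max_error: float = 0.2) -> Set[str]:
    """Return the labels whose error rate on the evaluation set exceeds max_error.

    Each result is a (true_label, was_prediction_correct) pair.
    """
    totals: Counter = Counter()
    errors: Counter = Counter()
    for label, correct in results:
        totals[label] += 1
        if not correct:
            errors[label] += 1
    return {label for label, n in totals.items() if errors[label] / n > max_error}


def should_collect(tags: Set[str], targets: Set[str]) -> bool:
    """Collect a sample if any of its tags is a current collection target."""
    return bool(tags & targets)


if __name__ == "__main__":
    evaluation = (
        [("car", True)] * 9 + [("car", False)]
        + [("scooter_on_sidewalk", False)] * 3
        + [("scooter_on_sidewalk", True)] * 2
    )
    targets = weak_classes(evaluation)
    print(targets)
    print(should_collect({"scooter_on_sidewalk", "night"}, targets))
    print(should_collect({"car"}, targets))
```

Rerunning this after each retraining round updates the targets, which is one way the training phase can keep refining what the collection phase looks for.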

The data collection mechanism is refined and improved with each iteration. After a few iterations, this process will produce high-quality data that is tailored to the company’s needs. The company with the best data will have the best AI algorithm. A better algorithm attracts more users, and more users generate more data. With that, the data moat is complete.

To sum everything up, an AI-based startup shouldn’t try to create a moat by inventing a new neural network architecture. It should build a smart data collection mechanism that is integrated with the product from day one.

Thanks for reading!




Noy Shulman

A Data Scientist and AI algorithm researcher. My expertise is helping companies build an AI strategy.