What is the Cost of Maintaining the Correctness of a GenAI Service? (Tech Strategy – Podcast 268)


This week’s podcast is about estimating the costs of providing GenAI products and services. And this is changing.

You can listen to this podcast here, which has the slides and graphics mentioned. Also available at iTunes and Google Podcasts.

Here is the link to TechMoat Consulting.

Here is the link to our Tech Tours.

My approach to assessing GenAI operating costs is the checklist below.

  1. What is the cost of compute, including energy and cooling? Initial cost vs. ongoing?
    1. The core compute is going to drive a lot of the costs. Especially if you are using an AI cloud service provider. If you have a downloaded open-source model, then it’s mostly the initial cost.
    2. These compute costs depend on the requirements of the AI workloads:
      1. The compute requirements
      2. The timing requirements
      3. The memory requirements
  2. What is the cost of creating and maintaining the desired correctness over time?
    1. How accurate and correct you need the foundation model to be is a big deal. And this factor determines how much of the operations are done by software versus humans. It can make the costs of GenAI products look a lot more like services than traditional software.
    2. My main questions are:
      1. How much does correctness matter? What level of correctness does the product need to compete? What is the cost of inaccuracy?
      2. How does correctness change over time? Is it stable and flatlining, advancing, naturally changing, or deteriorating?
      3. How much of a long tail is there in the domain? Is the long tail a liability or a benefit?
      4. How much of the process is iterative? How much are humans in the loop?
      5. What is the initial cost of training and getting to the desired level of correctness?
      6. What is the ongoing cost of inference at the desired level of correctness? How much are humans in the loop?
      7. What is the cost and frequency of fine tuning and/or retraining?
  3. What are the cost implications of increasing scale?
    1. What are the scale advantages vs. disadvantages?
    2. Does the Jevons paradox apply?
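As a rough illustration, the checklist above can be turned into a back-of-envelope cost model. This is a minimal sketch: every number in it (token prices, review costs, retrain costs) is an invented placeholder, not a real quote from any provider.

```python
# A back-of-envelope version of the checklist above. Every number here
# is an invented placeholder, not a real quote from any provider.

def monthly_genai_cost(
    tokens_per_month: float,          # total inference tokens processed
    price_per_million_tokens: float,  # API price or amortized self-hosting cost
    outputs_per_month: int,           # answers/images/etc. produced
    human_review_share: float,        # fraction of outputs a human must check
    human_cost_per_review: float,     # labor cost per reviewed output
    retrains_per_month: float,        # e.g. 0.25 = one retrain per quarter
    cost_per_retrain: float,
) -> dict:
    compute = tokens_per_month / 1_000_000 * price_per_million_tokens
    correctness = (
        outputs_per_month * human_review_share * human_cost_per_review
        + retrains_per_month * cost_per_retrain
    )
    return {"compute": compute, "correctness": correctness,
            "total": compute + correctness}

# Example: 50M tokens/month on a mid-size model, humans reviewing 10%
# of 20,000 outputs, one full retrain per quarter.
costs = monthly_genai_cost(
    tokens_per_month=50_000_000,
    price_per_million_tokens=0.50,
    outputs_per_month=20_000,
    human_review_share=0.10,
    human_cost_per_review=2.00,
    retrains_per_month=0.25,
    cost_per_retrain=10_000,
)
# compute comes to roughly $25; correctness to roughly $6,500
```

Even with made-up inputs, the shape of the result matches the point of question two: the human-in-the-loop correctness cost can easily dwarf the raw compute bill.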

Cheers, Jeff

—-

Related articles:

From the Concept Library, concepts for this article are:

  • Generative AI and Agents
  • GenAI Costs: Cost of Compute
  • GenAI Costs: Cost of Correctness

From the Company Library, companies for this article are:

  • n/a

—–transcript below

00:05
Welcome, welcome everybody. My name is Jeff Towson and this is the Tech Strategy Podcast from Techmoat Consulting and the topic for today, what is the operating cost of maintaining the correctness of generative AI products and services? Now this is kind of the culmination of four pretty long articles I wrote about AI architecture. Kind of covered a lot, data centers.

00:32
the nature of generative AI compute, the costs associated with it, and really at the center of that, there's, I think, one key question, which is this idea that you have to maintain a certain level of accuracy and correctness in whatever domain your intelligent app or service is serving. And that actually changes kind of everything about the costs. And it can be huge, it can be small.

00:57
It can change your business strategy. It can change your competitive dynamics. To me, that's kind of the question at the core of all this, to get to a good strategy doing this stuff. So, I'm going to sort of wrap it all together today, those four articles, into really, I think, that one key point, and that'll be the topic for today. Let's see, housekeeping stuff.

01:22
Or disclaimer, nothing in this podcast or my writing or website is investment advice. The numbers and information from me and any guests may be incorrect. The views and opinions expressed may no longer be relevant or accurate. Overall, investing is risky. This is not investment, legal or tax advice. Do your own research. And with that, let's get into the topic. Okay, as mentioned, there have been four pretty long articles about this subject. I'm just going to kind of wrap it all together here. But the first article,

01:52
is basically about the build-out of AI data centers, which everyone's talking about. It's crazy. Everyone's spending all this money. You've got Memphis with Elon Musk. You've got OpenAI doing Stargate. I mean, it's all over the place and the money is huge. And I mean, the key point there is why are they doing this? Well, because these aren't really data centers. This is really the architecture of intelligence.

02:18
And these data centers, because of the way the chips are set up and the way they compute and the huge amount of networking that happens within them, they basically operate like one big computer, which is different than what a data center used to be. It used to be you access some storage, you get some compute, things like that. No, no, these are different animals entirely, and there's kind of a race in one dimension to go bigger than anyone else. China's not really doing that race. They have about…

02:46
one tenth of the data centers that the US is building, about 500 versus 50. They're doing something a little different. They're being a little cleverer in their networking and other things. So different strategies, but that was article number one. And there was a podcast on that as well, which was podcast 265. Okay, article number two was, all right, at the center of all that, there's this idea of why…

03:14
Why are AI workloads different than traditional compute workloads? And it's not just that one is based on GPUs and the other is on CPUs. No, it's completely different. Everything about it is pretty much different because at the center, the nature of the compute for AI is just different than the nature of the compute of traditional software, which is much more deterministic.

03:40
You know, you give it orders, it is executed, it's run the same way every single time. Okay, it's just different with generative AI. The math is different, it's parallel processing, it's more probabilistic. So, you ask a generative AI program the same question five times, you're going to get five different answers. Well, that doesn't happen with saving a spreadsheet in Excel. So, the nature of the compute at the center is different.

04:06
And then everything around that is pretty much different. The database is different. The memory and storage are different. And really the biggest difference is you really have three components in traditional software compute, which is what we’ve been talking about forever. Even if it had machine learning in it. We had sort of the compute, we had the database, and we had the app.

04:30
And we never really, or at least I didn't, talk about the compute part very much, because it was always kind of the same. We always talked about the app and the database it was tied to, and how that created a service, an internal workflow in a company, a service that you sell to customers. We always talked about the apps and the database. Well, when you go to generative AI, you get a fourth dimension, which is you have the app, you have the database, you have the compute, and you have the foundation model.

05:00
So, we've got four pieces now, and you know, those first two podcasts were sort of recognizing, look, even the compute is different there. You've got to understand that first. Then we can start talking about the app and the database and the foundation models that are being used within there. And really the foundation model is kind of the key bit, because it changes the cost structure. A lot of the economics that we are used to in software, that people really like,

05:31
yeah, it's not the same. It's really not the same, and it can be unattractive, and sometimes you can go from traditional SaaS products that get you 60% gross margins down to 20%. You know, it depends how much human labor is involved in correcting these foundation models and keeping them working correctly.

05:53
You know, it can be all over the map. Getting bigger, which is traditionally very good in software; you get bigger as a SaaS company, you get all sorts of advantages. The way traditional software, SaaS, tended to work was you build it once. It's pretty much similar for everybody, so you can distribute it to everybody. Maybe you get some recurring revenue. And as you get bigger and bigger, your margins, which were already quite good, because you're reselling basically the same, pretty much standardized product to everybody.

06:24
As you get even bigger and scale grows, you get more advantages. Your margin starts to increase. You can outspend your rivals on R&D. You might get network effects. You get switching costs. There's a lot of advantages. Well, a lot of that may not happen with generative AI. The margins might be quite low. And as you get bigger, you might actually get some major disadvantages from scale. As you get bigger and bigger, you could see your cost structure go negative.

06:52
Instead of getting better, it could worsen dramatically. And the question that determines that is correctness, which is what I'll talk about. There's other things, but that's kind of the big one. So anyways, it's a really interesting subject when you take it apart. And all of this feeds into the idea of what should my AI strategy be. What should my agentic strategy be? Well, if you don't know the cost structure, you can't really figure that out. And that's what I'll talk about today. So that's kind of why I've been working on this so much. OK.

07:22
Let’s go a little bit into, I have basically a checklist which I will put in the show notes for pretty much how you take apart the cost structure of a product or service that is mostly about generative AI. And I mean, it’s not everything, but it’s what I’ve been using. So, question number one is basically the compute question. What is the cost of compute?

07:46
And that includes energy and cooling because if you’re building these products, the biggest bill you’re probably going to get every month is from your cloud provider because you’re using their service, their AI service or from OpenAI. Or if you host it yourself, okay, what is the cost of doing that? But that’s going to come down to, what does it cost to do compute? And secondary to that is how much energy does it use and how much cooling is required? And you got to think about it in sort of two phases. What did it cost me to build this thing out?

08:17
And then what are the ongoing costs? Now, if you do it sort of with a downloaded open-source model like a DeepSeek, well, your initial cost might be significant, 10, 15, $20,000. But your ongoing costs might be quite small. If you build your entire business by customizing models on Google Cloud and using all their AI tools and feeding your data in there and customizing using it, OK.

08:41
Your upfront cost might be less, but your ongoing cost might be very significant. And as you grow and use more and more, your overall cost structure might increase. There’s the compute side, but then there’s also the human labor side, which can really explode. I’ll talk about that. So, the first bucket is that. What is the cost of compute? That’s what most people are going to see. Question number two. This is the main question for today. What is the cost of…

09:07
creating and maintaining the desired level of correctness. It’s a very different thing to offer a product or service that says, we’re really good at spell check. Use our app for spell check. Okay, keeping that correct over time probably doesn’t cost that much because spelling doesn’t change that much. If you have a generative AI model that’s all about doing accurate medical diagnosis,

09:36
Well, the cost of maintaining a certain level of correctness, of accuracy, in your answers could be significant, and there could be a real breadth of answers. Instead of just checking the spelling of words, you could be checking what? Neurology, gastroenterology, internal medicine. I mean, it can explode in sort of scope of questions addressed, and the accuracy is a big, big deal. So, in that case, yeah, the cost can be a major deal.

10:05
And it can change over time. The third question, which I’m not going to get into today, I’ll write a short article about this for subscribers in the next couple of days. What are the cost implications of increasing scale? We are used to the idea that there’s a lot of advantages to scale getting bigger in general. That would be size and then also a scale advantage versus rivals. But there’s also disadvantages. We don’t see those as much in software. We’re very used to seeing those in things like factories.

10:35
which is why factories don’t get big beyond a certain point. We’re kind of used to the idea that you can scale software to global scale like Google search and it’s all fine. With generative AI, we may see major disadvantages to increasing scale and these things may not grow beyond a certain size just like factories never do. So, we’ll talk about that. Well, that’ll be an article. Anyway, so that’s kind of the idea. So, I’ll start with this idea of

11:04
Question number one, checklist number one. Yeah, what's the cost of compute? Now, traditionally we would talk about, okay, you've got to buy the chips, you've got to use the chips, you've got to see how many FLOPS you're turning, how much are you using it, how much time does it take? And then there's things around that like, okay, how much memory is used? Fine, but when we move into things like generative AI, well, it gets more complicated.

11:32
We get to that fourth component of, yeah, it’s not just the compute. We also have the foundation model, which is driving the usage of compute more than anything else. Okay. Well, how does that change the cost of compute, the usage? Well, everyone talks about the size of models. So, this first factor is, look, how big is your foundation model? There’s a reason they’re called LLMs.

12:01
large language models. Well, one, they’re text-based language. Two, it’s got the word large right in the title. But if you’re looking at very small models, well, those are going to use much less compute. They’re much simpler. They’re quicker to train. They’re cheaper to use for inference. So, we can sort of put, yeah, what are the costs of various different model sizes? And it’s hard to sort of nail this down because it’s changing every month, but…

12:29
Yeah, you want to think about the cost. Are you going to use very small models, which are typically, let's say, under 1 billion parameters? So, the number of weights that you're setting, the parameters, if it's under a billion, those are pretty small things. You can run them on smartphones. You can run them on edge devices. They're not very smart. Their reasoning is poor. They forget the context very quickly, you know.

12:54
You've got to keep telling it what you want all the time. They can't do complex tasks, but a lot of companies are doing this. Google's got Gemma, Meta's got TinyLlama. Yeah, we see those out there. We can move into sort of a pretty good spot of seven to 10 billion parameter models, which are small, but they're surprisingly effective. People kind of call that the sweet spot. Surprisingly capable

13:22
for relatively cheap and fast inference. So, Llama's got a model at eight billion parameters, Gemma's got one at seven billion, French Mistral's got one at seven billion, fine. We can start to move into small to medium, let's say 13 to 20 billion parameters. The reasoning starts to get good. That's interesting, and you can still pretty much run those on consumer hardware with a couple of tricks. All right.

13:52
I think where most people, most businesses, are talking about is probably in the 13 to 20 billion if you're a small business. If you're a medium-sized business, you're probably talking about medium models, which are 30 to 70 billion parameters. I think that's where most businesses are ending up. Very capable. Capable of, you know, sort of frontier-level performance. But to do that, you're going to need multiple GPUs. And so,

14:22
you know, the big companies, Alibaba's Qwen, they have, I mean, they've got a lot of models, but they've got a 72 billion parameter model. Llama's got a 70 billion one. I think that's where most people are going to end up, businesses. Now, that's the next level above that, if you're building it yourself, running it in your own company. Now, if you're just accessing stuff on the cloud, you can access whatever you want. And then we get to the large models, which get all the press.

14:49
You know, these are the 100 billion, 200 billion, 300 billion parameter models. That number keeps increasing all the time. That's, you know, doing all this stuff that no one thought was possible, and what's possible seems to change every couple of months. Pretty expensive to run. You really probably can only access those by APIs. So, I don't think many companies are building this stuff in-house. That's Grok, that's Llama, that's GPT-4.

15:18
You know, that whole thing. So, you want to think about the cost of these things, and then you've got to think about what level of performance do I really need? The bigger you go, you get better reasoning, you get better knowledge. They don't make as many mistakes. You can give them more complicated instructions. You can give them larger context windows, which is pretty important. And the weird thing is you keep seeing what they call emergent abilities.

15:46
Things are emerging out of these that are kind of new, like chain-of-thought reasoning, coding, multilingual performance, multimodal performance. The emergent capabilities, from what I read, I mean, I'm sort of citing what I've read, because I don't do this myself too much, so take that with a grain of salt, but emergent capabilities show up at, let's say, 70 billion parameters, in that range.

16:14
What are some other factors? As you get bigger, the jump in performance decreases. You go from 7 billion to 70 billion parameters in your model, you're going to see a huge jump in performance across everything. You go from 100 to 300, the jump is going to be there, but it isn't going to be as big as before, right? So, there's sort of diminishing returns that people talk about. And of course, the cost goes up. I'll give you a couple of numbers I've looked up.

16:44
It kind of goes up by a factor of 10 every time. If you've got a 7 billion parameter model and you're going to process 1 million tokens, let's say by API, that's, let's say, 5 to 15 cents for 1 million tokens for the small models. You take that up a factor of 10, let's say to a 100 billion parameter model. They cost about 10 times as much. So instead of

17:12
five cents per million tokens, it's 50 cents per million tokens. So, it kind of goes up, you know, as you go from 10 billion to 100 billion parameters, the cost pretty much goes up by 10. Anyways, take that with a big grain of salt, because this changes all the time. And then when you go up to 300, 400 billion parameter models, it's going to take another 10x jump. Suddenly it's $5 per million tokens, things like that. Anyways, but that's kind of how I think about it. And, you know, I keep an eye on all this stuff.
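That rough "factor of 10" pricing rule can be sketched as a one-line estimator. The 7 billion parameter base size and the five-cents-per-million-tokens base price are the illustrative figures from the episode, not a real price list, and the linear scaling is only a rule of thumb:

```python
# One-line estimator for the "factor of 10" rule of thumb above.
# The 7B base size and $0.05-per-million-tokens base price are
# illustrative figures, not real API prices.

def est_price_per_million_tokens(params_billions: float,
                                 base_params: float = 7.0,
                                 base_price: float = 0.05) -> float:
    """Scale price linearly with parameters: 10x the size, ~10x the price."""
    return base_price * (params_billions / base_params)

for size in (7, 70, 400):
    print(f"{size}B model: ~${est_price_per_million_tokens(size):.2f} per 1M tokens")
```

Real per-token prices move around constantly and differ by provider, so treat this only as a way to reason about the order of magnitude.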

17:40
But you want to sort of think about inference versus training. Training is very expensive; inference, much less so. You want to think about the size of the model. You also want to think about the type of model, because these models are very, very different. Basic chat, text-based models, kind of what OpenAI started with, they're kind of commodities now. I mean, you can pretty much get them for free,

18:10
or close to free, anywhere you go. That was groundbreaking two and a half years ago, and now it's kind of a free commodity model. So basic text-based models, the cost is dropping dramatically. If you move into something like the other extreme, which would, let's say, be video generation or multimodal models, well, there's not that many companies that really can do that well yet. So, there's sort of a premium. Not as many options, not a commodity yet.

18:40
So, they tend to cost more, but that's decreasing as well. Image generation generally is just not as complicated as other things. So, some of these are a lot more difficult than others. So yeah, the models are kind of going to emerge all over the place. They're going to be very different animals. We sort of think about them the same way, but you know, that's something to think about. As mentioned, sort of inference-time requirements. Yeah, that's a big deal.

19:10
Like if you use a reasoning model, DeepSeek R1, it's going to use, let's say, 10 to 25 times more inference compute than a basic model. Why? Because thinking is an iterative process. It'll do it once, it'll think about it again, it'll think about it again, it'll think about it. So, if you've got models that are highly iterative, well, that's going to cost more too. They're going to use more compute. And anyways, and then on top of all this, we keep seeing new models.

19:40
People are talking about the VLA models now, the vision-language-action models, which are basically for robots mostly. You can put them in robots that walk around, because the thing about robots, which are really interesting to think about putting intelligence in, is they don't have what ChatGPT has on your browser. On your browser, ChatGPT has access to the internet and all the data that's online.

20:08
A robot walking down the street actually has very little data. All it sees is what’s coming in through its sensors. That’s almost nothing compared to what an online AI can do. So, these things start to get trained in virtual simulations and that’s how you can kind of give them a ton of data. You basically create synthetic data. So anyways, there’s these VLA models that are really interesting for embodied intelligence. You’ll hear a lot of people talking about world models.

20:36
Like some of the pioneers of all of this stuff have started to come out and say, look, we don’t think this is going to get us to AGI. We don’t think text-based, language-based AI is going to get to artificial general intelligence. Because that is not really how that happens. Human beings did not become human beings by reading. We became human beings by going out in the world and seeing things and touching things and learning.

21:04
from the world itself. So, there’s sort of this idea of perception-based foundation models or world models where you can either go out in the world and perceive things and learn that way or you can sort of generate simulated worlds and put it there. So, you’ll hear things like that and then of course there’s agentic models which is something I’m focused on. All right, last point on this.

21:29
Inference versus training cost. One of the things that comes up when we talk about correctness of models: if your model is sort of undergoing data drift, let's say you're doing something like diagnosing skin conditions with a generative AI app. It's looking at someone's skin and it's sort of trying to diagnose it there. Well, you're going to really focus on the accuracy, the correctness. Is it 95% right? Is it 90% right? Is it 80% right? And you'll have a…

21:58
sort of level in your mind, what I call the desired level of correctness. What is it? Well, we might say 99% if it's cancer-related, 90% if it's not potentially cancer. A pimple, we don't need to be 99% right for a pimple. But you know, if it's a mole that might be cancerous, then maybe we do. So, you'll come up with a level of sort of accuracy required for your product or service to be competitive. Okay, so you train to that level, you deploy, you start using inference.

22:28
It can start to degrade in quality over time. You have data drift. What do you do then? Well, you can fine-tune, but are we going to have to retrain? Because retraining is a lot more expensive than using inference. You know, a hundred, a thousand times more compute to train than just normal ongoing inference. So, if you have a model, this is why I've sort of said this very specifically. What is the cost of creating and maintaining

22:58
the desired level of correctness in your AI product and service? If you have to retrain frequently, because what you're doing tends to degrade in accuracy frequently, your costs are going to explode. I mean, it's not just that you want a simpler problem, you want something that's not going to degrade in accuracy, because every time you have to fully retrain or significantly retrain, it's going to be a big cost hit.

23:26
You really want to train once, deploy, and then just do a lot of little fine-tuning over time, but not keep doing that full retraining. So, you've got to think about the cost of training versus inference versus retraining. And some models, depending on the domain you've sort of chosen for yourself, for your product or service, you're going to deal with these things differently. Okay, so that's kind of question number one. I gave you three questions for how I think about costs and these things.
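That training-versus-retraining tradeoff can be sketched with invented numbers. The per-retrain and per-fine-tune dollar figures below are placeholders; the point is only that retrain frequency, set by how fast your domain drifts, dominates the maintenance bill:

```python
# Hypothetical comparison of two maintenance regimes over one year.
# The dollar figures are invented placeholders, not real costs.

def annual_maintenance_cost(full_retrains: int, fine_tunes: int,
                            cost_full_retrain: int = 50_000,
                            cost_fine_tune: int = 1_000) -> int:
    return full_retrains * cost_full_retrain + fine_tunes * cost_fine_tune

# Domain drifts fast: full retrain every quarter, plus monthly fine-tunes.
drift_heavy = annual_maintenance_cost(full_retrains=4, fine_tunes=12)
# Domain is stable: train once, then just monthly fine-tunes.
stable = annual_maintenance_cost(full_retrains=1, fine_tunes=12)
# 212,000 vs. 62,000: same product, roughly 3.4x the maintenance cost,
# driven entirely by how often the domain forces a full retrain.
```

Under these assumptions the compute per retrain never changed; only the frequency did, and that alone multiplied the bill.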

23:52
Number one is what is the cost of compute including energy and cooling and then initial cost versus ongoing cost. Fine. Next, let me get into the second one, which is really the point of this podcast. Okay, question number two on my checklist was what is the cost of creating and maintaining the desired level of correctness over time? You know, how accurate and correct do you need your foundation model to be?

24:20
Well, that really depends on what you're doing. If you're doing a chatbot for your online direct-to-consumer e-commerce site, you might have a chatbot, a conversational AI, that is very good at questions like, where is my package? What time will my package arrive? How do I do a refund? Well,

24:45
getting a certain level of accuracy and correctness in those answers is probably quite doable. That's very different than something like, I'm going to describe my medical condition, which has to do with abdominal pain that I've been dealing with for two years, with six other related symptoms and occasional blah, blah, blah. Well, that's a complicated question, and you're going to need kind of a lot of expertise to get that. What level of accuracy do you need for that?

25:17
How are you going to maintain that over time? It's just a bigger, more complicated question. And the reason this impacts the economics so much is whenever an AI can't do it on its own, you have to have humans in the loop. You can't just hand off certain questions. You can give the "when is my package going to arrive" question right to the AI. It will give it right back to the person. You don't need a human involved. That medical question,

25:46
I mean, you're not going to give an answer like that to a patient. No, it's going to have to then go to a doctor. And the doctor's going to look at the answer and maybe give some feedback and adjust it, and use that feedback to the model so that the model can stay accurate or get more accurate, but also to sort of apply their judgment on top of the answer and give a recommendation to the customer or whatever. So, the cost structure, this is why I like these generative AI

26:16
apps and services, you know, they look a lot more like software plus services, right? They don't just look like software products. They look like a software product plus a consulting firm stuck on the side. And the economics seem to be that way. This is Martin Casado, who I really like reading. He does infrastructure at Andreessen Horowitz. He writes, he talks about this a lot, because I,

26:42
I think he actually can see the numbers, because they've invested in a lot of these companies and he's sort of seeing how their gross margins are playing out. From what I can see from what he's written, he basically says, look, this doesn't look like software and SaaS gross margins, 50, 60%. This looks like software plus a service business, 20%, 30%. You know, software services businesses often get 15 to 20% gross margins, things like that.
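The human-in-the-loop routing behind those margins, easy questions answered by the model alone while hard or high-stakes ones get escalated to a person, can be sketched as a simple gate. This is a minimal sketch; the confidence score and the 0.9 threshold are assumptions standing in for whatever your model or evaluation stack actually provides:

```python
# Sketch of confidence-based human-in-the-loop routing. The confidence
# score and threshold are assumptions, not a real model API.

from dataclasses import dataclass

@dataclass
class ModelAnswer:
    text: str
    confidence: float   # assumed 0-1 score from the model/evaluator
    high_stakes: bool   # e.g. medical advice vs. "where is my package?"

def route(answer: ModelAnswer, threshold: float = 0.9) -> str:
    # High-stakes or low-confidence answers go to a human, whose edits
    # also become feedback for keeping the model accurate.
    if answer.high_stakes or answer.confidence < threshold:
        return "human_review"
    return "auto_reply"    # send the model's answer straight back

print(route(ModelAnswer("Your package arrives Tuesday.", 0.97, False)))  # auto_reply
print(route(ModelAnswer("Possible diagnosis: ...", 0.97, True)))         # human_review
```

The share of traffic that falls into the `human_review` branch is exactly what drags software-like gross margins toward service-business margins.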

27:11
And when you start to take that apart, it's interesting. So anyways, I've broken that second question down into basically seven sub-questions, which I've listed in the notes. I'll go through them quickly. But yeah, the first thing is, I mean, you've got to sort of have a business discussion before you get into the costs and the technical details, which is like, how accurate does this thing have to be?

27:34
Who are we competing against? How accurate is their model? Does it really matter? If we're doing generative AI to create images for logos, so if you want to design your own logo for a company, or be a design studio where people send you things and you let them use your app to design the logo, how accurate does that have to be? Well, it's creative. So, there isn't inherently a right or wrong answer. You just generate five logos for the client.

28:04
They pick one and then you iterate: okay, I like number two, but I want you to change it to blue and I want you to make a star on the top. So, you pick number two that was generated, and then you put it back in the model with the adjustments or the changes, and it shoots out 10 more. Well, in that case, creative endeavors, actually the accuracy is not that hard to deal with as a process. And you can either have a graphic designer working with them, which would be a cost for you,

28:31
or you can just have an iterative tool that the customer can use themselves. And then in that case, it looks a lot like software economics. So, it turns out with creative stuff, the accuracy is often not as big of a problem. Now, if you're generating, let's say, images of people in your tool, and all the hands have like six fingers, okay, that would be an accuracy problem. So, there's a certain level of accuracy there. You could say the same thing with

28:58
autonomous vehicles. How accurate does the autonomous vehicle have to be driving down the highway? No manual interventions ever. No accidents ever. 99%, 100 % or does it just have to be more accurate than a human? Does your accuracy just have to be higher than the next best alternative or your competitor? You just have to be higher than a substitute, the human driving or a competitor or rival.

29:28
That's interesting. What about accuracy in math? Can you have an AI tool that, when you give it math problems, let's say it's a scientific tool, it adds two and two together and says it's seven? Well, in that case, I mean, your accuracy really has to be 100%. So, once you start taking apart the accuracy and correctness level, you can see it's very different. Math has to be 100%. Creativity is pretty flexible.

29:57
AV, autonomous driving, well, it depends. Okay, we don't want accidents, that would be one level of required accuracy, but does it have to be 100%, or just better than the substitute or alternative? What does your competitive market demand of you? Things like that. So anyways, that's a good way to think about it, and usually you're going to put a number around this. Our accuracy has to be 95%.
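Putting a number around correctness, as described above, can be sketched as a simple monitoring check: pick a target, measure accuracy on an ongoing evaluation set, and flag when drift pulls you below it. The target and the monthly accuracy figures here are illustrative, not from any real system:

```python
# Sketch of tracking correctness against a desired level. The 95%
# target and the monthly accuracy numbers are illustrative only.

def below_target(measured_accuracy: float, target: float = 0.95) -> bool:
    """True when measured accuracy has drifted below the desired level."""
    return measured_accuracy < target

monthly_accuracy = [0.95, 0.94, 0.93, 0.92]   # drifting down month by month
flags = [below_target(a) for a in monthly_accuracy]
print(flags)   # first month meets target; later months trigger a review
```

In practice the flag would kick off the fine-tune-or-retrain decision, which is where the real costs discussed earlier come in.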

30:21
What's our model doing today? Well, it was 85, but we reinvested, we spent more money to get it to 90, and then we had to spend even more money, a lot more, to get it to 95. Increasing performance usually takes more and more time and effort. It's easier to get from zero to 80 than from 80 to 90. So, you have to spend all this time to get this level of performance. Okay, we've got 95, that's our desired level, but we've noticed it's been trending down to 94, 93, 92 in the last month.

30:52
We may need to retrain or may need to do something. So that's kind of your metric, correctness. So how much, that would be sort of question number one under this bucket: how much does correctness matter for your product or service? How much do you need to compete? How much do you need for safety, things like that? What is the cost of inaccuracy? If you misdiagnose a terminal illness, that's a big deal. If the car

31:21
slams into something, the cost of inaccuracy is very, very high. So, you want to think about that as well. Okay, sub-question number two. How does the correctness change over time? Does it flatline? Is it advancing? Is it just changing unpredictably? Is it deteriorating? So, what does that mean? In certain types of tasks, if we’re creating a tool that does spell check or grammar,

31:49
or maybe just translates something from English to French or whatever, you can kind of see that over time the performance level is going to flatline. Pretty soon, pretty much everything can do spell check. It’s not like company A has dramatically better spell check than company B. Nah, it’s kind of a flatlined capability. So, once you get up to a certain level, you’re pretty good. And what that means is, grammar doesn’t change.

32:20
And words don’t change. So, you would expect it to sort of be flatlined and stable. Once you get it to a certain level of correctness, you’re probably fine. I think autonomous vehicles will actually get to that. I think we’ll be at a point in five to 10 years where all the cars just drive perfectly all the time and that’s just the way it is. I think it’s going to be sort of flatlined and stable. Now there may be fields where

32:49
performance continues to increase and advance. Think scientific AI tools: laboratory assistants, agentic AI, tools you use in the lab, tools you use for reasoning, tools that can read every biology paper ever written and come up with new questions and run new tests. Well, that’s not going to be stable and flatlined like driving a car or doing spell check. It’s going to keep going up year after year after year, probably. So, your

33:17
required level of correctness, the bar is going to keep rising year after year after year. So, an advancing field, that’s a different type of question than sort of a stable flatlined field. There might be other fields where, look, the correct answer just keeps changing all the time. If you ask, what is the weather today? Well, the weather changes all the time. OK.

33:44
What would be a good strategy to make money in the stock market, the Hong Kong stock market this month? Well, the generative AI could come up with a strategy and it might actually work. You might have a very high level of correctness, but two months later, maybe it won’t work anymore because the market changed. So, the whole nature of correctness, if you’re applying it to a changing field, it means something different. Digital marketing is like this.

34:13
You know, what works in digital marketing changes all the time. So, you always have to be changing. Well, so that matters. And then the fourth bucket is one we already talked about, which is deterioration. Look, maybe your model’s just not performing like it used to. Maybe it’s data drift, maybe consumer behavior is changing, maybe competitors are changing things. There could be a lot of reasons why your performance, your correctness, is decreasing and degrading over time.

34:42
And in that case, we may need to fine-tune, we may need to retrain, we may need to look at a lot of the basic assumptions we’ve used. So, this question, number two: how does correctness change over time? Is it flatlining and stable? Is it advancing? Is it just naturally changing? Is it deteriorating? You kind of have to know that when you start to assess how you’re doing. We’re at 90%. What does that mean? So that’s kind of question number two. Okay, question number three.

35:10
This one, I actually think, is fascinating. And this came from Martin Casado as well: how much of a long tail is there within your generative AI product’s domain? And is the long tail a liability? Is it a benefit? It can be both. It can be a lot of things. And that’s very weird to think about, because up until recently, the long tail was really cool. What’s the long tail?

35:39
We would talk about the long tail with something like Google search or TikTok. Why is TikTok so powerful? Well, because there are a lot of videos out there in the world for common things, right? Like, I want to see the game from last night. I want to see a video about what to do in Hong Kong on a trip. Well, those are very common topics. So, you know, you’re in the matching business, which TikTok is.

36:08
It matches what a consumer wants to view with content that meets that. Well, for common things that people like, a lot of apps can do that. But because TikTok was so much larger, it had far more content. And so, it was able to serve a lot of long tail use cases that would be very difficult for a smaller app to serve. If you have a really weird, bizarre niche

36:37
interest, you know, like: I want to see videos of guys who hunt iguanas in Florida housing complexes with air guns. That is actually a thing. I found it on YouTube one day, and it turns out there are a whole lot of videos on that. They can match that niche interest with that niche content creation in the long tail.

37:01
And that was a big part of why TikTok is so powerful. Whatever you’re interested in at any given moment, it can find. And the network effect is based on this to some degree. And you could say the same thing with Google search. Why is Google search so much better? And why does it have 90% global market share? Well, because it can do long tail searches that no one else can; smaller apps or services just struggle with them.

37:27
And then there’s a flywheel, and more people search for weird things. So, the long tail was a huge benefit in video libraries and in search results. You could also say it’s kind of that way in marketplaces. A company that has more products or services, like a Walmart or an Amazon, is better. But the long tail there is much, much smaller. Anyways, the long tail was a big deal. Okay, when you start moving into generative AI

37:57
types of apps and services, what are we talking about with the long tail? Say I have an e-commerce direct-to-consumer site, and I have a chatbot, a conversational AI I’ve put into the interface instead of basic keyword search AI, which is very powerful. That’s totally what Alibaba is doing right now. Okay, there could be simple inquiries like, when is my package going to arrive? Fine.

38:26
How do I do a return? Fine. Those are pretty narrow, common inquiries that are coming in from customers. But what if the customer says something like, what type of apparel do you recommend I wear for the winter?

38:48
Well, okay, that’s still somewhat common, but it’s a lot more open-ended. There are a lot of different ways that can be replied to. And we could go further down the path, from super common, to kind of common, to deep into the long tail, which is like: what style and size of jacket should I wear on my upcoming trip to France in December?

39:15
That’s pretty specific. It’s pretty unique to me. It would have to know specifics about me. It would have to know about France. It would have to know about the weather. It would have to know about apparel in general. It would have to know about men’s fashion. That’s a pretty niche question. Well, if you’re building a generative AI to answer a really large long tail of these types of inquiries, how do you achieve correctness

39:44
in all of those long tail type use cases and inquiries? The further you go into the long tail, the harder it is to maintain your level of correctness. So that shapes what kind of product or service you offer your customers. Are you opening yourself up to a dramatic long tail? There’s a big difference between a skin diagnosis app, where you just point the camera and it tells you pimple, mole, hair, this one’s fine,

40:12
to a full medical doctor that can answer questions about anything. Right? So this long tail question is really interesting to think about. And I think that’s one of the biggest differences when you go from traditional software, and even machine learning, which is what Google search and TikTok are, into generative AI type situations: the long tail can become a huge liability, and it can balloon your costs.

40:40
Because when you have to answer all those long tail type inquiries, the way you’re going to get a sufficient level of correctness is with people. You’re going to have to put people in the loop, and it’s going to cost you a lot to support that type of service versus “what time does my package arrive.” So anyways, that one’s fascinating. I’m thinking about it a lot in terms of how it changes competitive dynamics and whether network effects are going to get blown apart, which I think they might be in some cases.
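As a rough illustration of how the long tail balloons costs, here’s a back-of-the-envelope model. All the dollar figures, the function name, and the split between automated and human-handled queries are made-up assumptions, not numbers from the episode:

```python
# Back-of-the-envelope sketch of how a long tail balloons support costs.
# Hypothetical assumption: automated answers cost fractions of a cent,
# while human-in-the-loop escalations cost dollars each.

def monthly_support_cost(total_queries, long_tail_share,
                         automated_cost=0.002,  # $ per automated answer
                         human_cost=3.00):      # $ per human escalation
    """Blend automated and human-escalation costs for one month of traffic."""
    long_tail = total_queries * long_tail_share
    common = total_queries - long_tail
    return common * automated_cost + long_tail * human_cost

# A narrow product (2% long tail) vs. an open-ended one (20% long tail),
# both at one million inquiries a month:
print(round(monthly_support_cost(1_000_000, 0.02)))  # 61960
print(round(monthly_support_cost(1_000_000, 0.20)))  # 601600
```

Same traffic, roughly ten times the cost, purely because more of the inquiries fall into the long tail and get escalated to humans.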

41:09
All right, well, I’ll finish up the rest quickly. Number four, how much of the process is iterative? And are humans involved in the iterations? Now, the logo design scenario I talked about was: look, I want you to design a logo for me. The AI spits out 10 options. I say, I like number two, but make it green. It spits out another option. I can iterate that way.

41:35
The compute costs will go up because you’re doing multiple iterations. But I’m actually going to get to a pretty good level of correctness, or at least in this case, satisfaction, by just iterating my way there. And for the app provider, it’s actually pretty automated. From their perspective, it’s all automated because I’m doing the iterations as a customer. It’s a pretty good business. You can get to correctness. And we’re not putting humans in the loop. But what if it’s more of a medical question or something more complicated?

42:05
where, to get to the right level of correctness, you have to do lots of iterations and you have to have a human in the loop of those iterations. Well, that can compound your problems pretty dramatically in terms of cost. That’s reasoning models, that’s things that are difficult to get the answer to, where maybe the customer does three, five, 10, 15 rounds. Coding can be like this. Coders basically do lots and lots of iterations.

42:33
Right? They say, generate this code for me. Then, okay, now change this, now change this. And in these complex processes, creating videos, creating TV shows, creating code, you have tons of iterations using the tool. Well, the big question is: is there a human in the loop on those iterations? And who’s paying for that? Now, if it’s a software coder using an AI tool, the product doesn’t have to pay for that. But in other cases, like medical stuff, yeah.

43:02
You’d have to pay for that, and it would be very expensive. Okay, that’s most of what I wanted to go through. The remaining three questions for this bucket take the four qualitative questions I just gave you and boil them down to numbers. What is the initial cost of training and getting to the desired level of correctness? What is the ongoing cost of inference at the desired level of correctness,

43:30
and how many humans are involved in that? And the third: what is the cost and frequency of fine-tuning and/or retraining? That’s going to get you a ballpark figure for the cost structure. But I think you’ve got to do the first four questions, which are pretty qualitative, to know how difficult and expensive a problem you’ve taken on in your generative AI type product or service.
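Those last three questions can be turned into a simple calculator. A minimal sketch, where every dollar figure and parameter name is a hypothetical placeholder:

```python
# Hedged sketch: boiling the three quantitative questions down to a
# first-year ballpark. Every figure here is a made-up illustration.

def first_year_cost(initial_training, monthly_inference,
                    monthly_human_review, retrain_cost, retrains_per_year):
    """Initial training + 12 months of inference and human review + retraining."""
    return (initial_training
            + 12 * (monthly_inference + monthly_human_review)
            + retrains_per_year * retrain_cost)

# Example: $500k to train, $40k/month of inference, $25k/month of humans
# in the loop, and quarterly retraining at $100k each:
print(first_year_cost(500_000, 40_000, 25_000, 100_000, 4))  # 1680000
```

The point of the sketch is that the human-review and retraining terms, driven by the four qualitative questions, can easily rival the compute terms.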

43:57
So, choosing the scope of the problem is super important in this case. And I think that’s pretty much it. I’ll put the checklist in the show notes. Take a look; maybe that’s helpful. Now, there’s a big caveat on all of this: when I built out my frameworks and models for digital strategy, I was able to look back at 20 years of companies and see their financials and see how it plays out. So, all the competitive advantage stuff, economies of scale, switching costs, we can see it in the numbers as real.

44:28
Most of what I just said is theory. I don’t have a whole lot of income statements and balance sheets to see how this has played out over time. I’m taking reports from people starting to use these tools and seeing what happens. So this is a lot more forward guessing as opposed to historical analysis. So that’s a big weakness.

44:48
But yeah, that’s kind of where I am in all this. I think there’ll be one last article on the implications of scale for all this. So that’ll be basically five parts in this understanding AI infrastructure series. And that will be it, well, at least for now. As for me, it’s been a pretty good week. I’ve been catching up on articles I was behind on, so I’m pretty much caught up now.

45:13
I’m going to be in China this week, mostly Shenzhen area, which will be great. I really like Shenzhen. Visiting a couple companies, doing some other stuff. I’ll talk about that next week. Yeah, it’s going to be a great week. And then I guess that gets to December, end of the year already. Anyways, I hope this is helpful and I will talk to you next week. Bye bye.

——–

I am a consultant and keynote speaker on how to accelerate growth with improving customer experiences (CX) and digital moats.

I am a partner at TechMoat Consulting, a consulting firm specialized in how to increase growth with improved customer experiences (CX), personalization and other types of customer value. Get in touch here.

I am also author of the Moats and Marathons book series, a framework for building and measuring competitive advantages in digital businesses.

Note: This content (articles, podcasts, website info) is not investment advice. The information and opinions from me and any guests may be incorrect. The numbers and information may be wrong. The views expressed may no longer be relevant or accurate. Investing is risky. Do your own research.
