Building genAI products

Building generative AI into enterprise software applications has huge implications for the development process, and many organizations have yet to reckon with them. Heck, I’m not entirely sure I’ve grasped their full reach myself. Here’s what I think it means.

Making an LLM call in an application is very different from calling a traditional software library. How different? Three things stand out to me:

  1. For a given input, traditional software returns a pre-determined output. Not so with LLMs, where even a temperature of zero doesn’t guarantee identical outputs (see the sketch after this list).
  2. Newer library versions are typically backward compatible. LLMs make no such promise, and in practice model upgrades have changed behavior.
  3. Writing useful production-grade software typically requires a sophisticated, specialized toolchain. However, prompts are only a text editor away, thus democratizing the art.
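To make the first difference concrete, here’s a minimal sketch of a determinism check, assuming an OpenAI-style chat completions API (the model name and prompt are arbitrary placeholders). A traditional function would pass this check trivially; an LLM may not, even at temperature zero:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; substitute whatever model you use
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # the most deterministic setting, but not a guarantee
    )
    return response.choices[0].message.content

prompt = "Summarize our return policy in one sentence."
first, second = ask(prompt), ask(prompt)

# With a traditional library call, this would always print True.
print("identical outputs:", first == second)
```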

Before playing those implications forward, let’s outline the enterprise software context in which this is happening: big, complex software applications built and operated by dedicated teams of specialists who use industry practices honed through decades of hard-earned lessons, for customers who may welcome improvements but don’t want change, running in hostile and regulated environments to support critical business processes.

Introducing LLMs into that ecosystem brings magical capabilities, but almost none of the tools and practices that have allowed us, most of the time, to successfully stack code by the tens of millions of lines.

As I look at my own cross-functional teams building genAI-powered software, I ponder a few questions:

  • Are our prompts treated like code, with source control, reviews, design, and documentation? (See the sketch after this list.)
  • Are our genAI-powered apps architected, modular, and connected to our data pipeline?
  • Can we test them for functionality, scalability, security, adversarial robustness, and backward compatibility?
  • Do we understand runtime characteristics like cost of inference at scale, latency, logging, and traceability?
  • What do “shift left” and CI/CD mean when applied to genAI products?
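On the first question, one hedged illustration of treating prompts like code: keep each prompt as a versioned file under source control and pin tests to it. The directory layout, file naming, and helper functions below are assumptions for the sketch, not an established convention:

```python
from pathlib import Path

# Hypothetical repo layout: prompts/<name>/<version>.txt, reviewed like code
PROMPT_DIR = Path("prompts")

def load_prompt(name: str, version: str) -> str:
    """Load a reviewed, source-controlled prompt by name and pinned version."""
    return (PROMPT_DIR / name / f"{version}.txt").read_text()

def render(template: str, **variables: str) -> str:
    """Fill placeholders like {ticket_text} in a prompt template."""
    return template.format(**variables)

def test_summarizer_prompt_declares_its_inputs():
    # Pinning the version means a prompt edit shows up in code review
    # and breaks tests, exactly like any other interface change.
    template = load_prompt("ticket_summarizer", "v3")
    assert "{ticket_text}" in template
```

The point isn’t this particular layout; it’s that a prompt change becomes a diff someone reviews rather than a string someone edits in place.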

I don’t blame anyone for using LLMs before having all the answers. I certainly am not waiting! LLMs are so darn effective. Indeed, at JA we now track dozens of use cases making over 1.4M LLM calls per day while churning through a wallet-stretching 25M tokens/hour. In two-plus years of using LLMs in production, we’ve discovered some of the answers to our questions, but there’s a long way to go, and the goalposts shift with each newly emerging genAI capability.
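Tracking call volume and token burn like that requires instrumentation at the call site. This isn’t our production telemetry, just a minimal sketch of the idea, assuming the same OpenAI-style API as above (the pricing constant, model name, and log fields are placeholders):

```python
import logging
import time

from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.telemetry")
client = OpenAI()

COST_PER_1K_TOKENS = 0.002  # placeholder rate; use your model's actual pricing

def tracked_call(prompt: str, use_case: str) -> str:
    """Make an LLM call and log latency, token usage, and estimated cost."""
    start = time.monotonic()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.monotonic() - start
    tokens = response.usage.total_tokens
    log.info(
        "use_case=%s latency=%.2fs tokens=%d est_cost=$%.5f",
        use_case, elapsed, tokens, tokens / 1000 * COST_PER_1K_TOKENS,
    )
    return response.choices[0].message.content
```

Aggregating those log lines per use case is what turns “LLMs feel expensive” into a number you can budget against.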

In summary, I feel that we have yet to fully exit the artisanal phase of genAI app development and are still in the process of defining what industrial-scale development means. Still, what a privilege it is to be watching the ecosystem evolve in real time. It is a professional joy reminiscent of watching the internet grow into a viable platform for business.

If you’ve seen around the genAI bend, what pro tips do you have to share?