Rethinking the Role of a Systems Integrator for Artificial Intelligence

By admin On Aug 20, 2024

In a now-famous blog post from 2014, journalist Steve Cichon compared a Radio Shack ad from 1991 to the then-current iPhone. The difference was stark. Of the 15 items in the ad, 13 had effectively disappeared, all of them replaced by the singular smartphone. The technology landscape embodied in 1991 was unrecognizable by 2014 — and vice versa.

1991 was also the year of the Gulf War. Looking at the defense platforms fielded that year compared to 2014 — or 2024, for that matter — shows a very different story. Not only are the technologies entirely recognizable to a 2024 audience, but the majority of those systems are still in active service.

By any measure, the development of novel technologies in the defense sector has slowed, lagging well behind the pace of commercial industry. Attempts to diagnose the problem are myriad. We talk about intellectual property rights. We talk about commercial-off-the-shelf versus government-off-the-shelf software. We talk about vendor lock, open architectures, and application programming interfaces. I’ve heard countless debates about traditional versus nontraditional defense contractors and about the idea of doing business differently — making it smarter and more agile.

As part of the founding team of Striveworks, a company that does business with the Department of Defense, we have a vested interest in government acquisitions, particularly around AI. However, we have decades of experience in the commercial sector and have seen first-hand how efficient, competitive market structures in our public equity and futures markets can create a competitive flywheel that benefits market participants.

The defense sector’s sluggish innovation is not for lack of trying on the government side. The Department of Defense has numerous innovation initiatives, from Small Business Innovation Research grants to the Defense Innovation Unit, AFWERX, the Army Applications Lab, and others. Defense leaders have made concerted efforts to lower the barriers to entry for startups and other nontraditional defense contractors to do business with the Defense Department. The so-called Last Supper, the post–Cold War consolidation of defense primes, has also been appropriately highlighted as a consequentially negative development in our defense industrial base. The rapid evolution of AI places even more pressure on us as a nation to develop new pathways of acquisition that match the requirements of technology.

These efforts are all useful, well intended, and valuable, but they ultimately treat a symptom and not the disease. If every market exists to match supply and demand, these activities only increase the supply of novel capabilities, and that’s not the fundamental problem.

The problem is one of competition and the incentives it drives. The market for AI capabilities is uniquely suited to be recast as a continuous competition between models for “the right to inference” individual data points. Creating this constant competition will align incentives for vendors and government alike, creating better outcomes and reducing cost.

The Problem with a Single Buyer Market

When it comes to defense, the number of buyers is small. For large systems, there’s really only a single customer: the U.S. Defense Department (allied and partner military sales are highly correlated with U.S. acquisitions). This puts the defense market very close to monopsony. Buying decisions occur infrequently, often only once every few years. In this environment, buyers rightly fear that vendors lack strong incentives to keep improving and iterating their products after they win a contract.

This asymmetry is why supply-side interventions don’t address the root cause of slow innovation. Encouraging nontraditional vendors and lowering barriers to entry to the defense market increase competition to the left of a major program contract award; once the contract is awarded, though, the incentive problem remains. (Certainly, there are incentives for growth, renewal, and expansion, but the discount factor on a recompete in five years or a lateral expansion in three years is significant.)

To address this problem, contracting officers end up devoting immense effort to defining what is and isn’t government intellectual property, attempting to lay out interfaces and define compatible subsystems, and grappling with concepts like vendor lock. Like the efforts to increase supply, these measures sharpen competitive pressures at the margin but don’t cut to the core of the problem. Ultimately, the challenges with defense technology must get fixed on the demand side: by eliminating the monopsony and returning to a healthy market structure where many buyers participate to incentivize innovation, as in most commercial markets.

Why AI Needs a Systems Dis-Integrator

Exquisite, unique systems such as stealth bombers, nuclear-powered submarines, and aircraft carriers don’t fit this multiple-buyer market template. Each buy is too specialized and necessarily small in quantity. A different dynamic exists for business enterprise software such as word processors, chat, video conferencing, and enterprise resource planning systems: The overlap between how the Defense Department uses these tools and how any large commercial company does is nearly perfect. Congress and the Defense Department have done well to take a “free ride” on the iterative buying decisions of the commercial market.

In this context, AI exists in a unique market. Only a few technologies have fundamentally restructured economic markets before. The printing press slashed the marginal cost of production for information (and the advent of digital media then zeroed it out). Likewise, the internet effectively eliminated the cost of distribution for information. The rise of the AI market will ultimately prove just as disruptive to the basic nature of economic transactions.

How does this relate to the business model of a systems integrator in defense? AI shares some properties with software: Once you have the model, the marginal costs to produce and distribute its output — the inference — are extremely low. But there is a fundamental difference between AI and software. AI is non-monolithic. Software gains leverage through monolithic horizontal scaling. This element of software is driving people to rethink the role of the systems integrator for software. Unlike software, where deterministic outputs to the same input are a core principle, the performance of AI models is highly contextual and time-dependent. Models drift, data changes, and even so-called foundational models are fine-tuned, retrained, and repurposed to new users and new use cases. There is no best model — only the best model for a particular data point at a particular time. The “fit” between a data point and a model is ephemeral and unique. This characteristic of AI has already been internalized by investors, regulators, and, of course, industry.

Yet the impacts that this essential difference can have on systems integrators for defense are still underappreciated. The business model, implemented through technology, that best suits the ephemeral nature of AI is a two-sided “marketplace” between data points and validated models. We call this approach a systems dis-integrator, and it’s a technological function as well as an organizational process. This approach isn’t new: Electronified financial markets, automated bidding in online advertising, and even the matching process for medical residency admissions have already disaggregated buyers, generated competitive pressures, and driven down costs. All these examples operate as two-sided markets of bidders and offerors matched through a digital marketplace.

The same approach should apply to the Defense Department’s AI initiatives. With a two-sided market, data streams — individual data points — would come together with a host of AI models that compete (i.e., bid) to demonstrate their appropriateness for each particular data point. A matching engine would pair these data points and models, much like buyers and sellers are matched in financial markets or real-time bidding networks for digital ads.
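As a rough illustration of the bidding idea above, the following Python sketch pairs a single data point with the model that submits the highest bid. Everything in it is hypothetical: the `ModelEntry` type, the self-reported `bid` function, and the model names are assumptions made for the example, not a description of any fielded system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a data point is a dict of metadata tags.
DataPoint = Dict[str, str]


@dataclass
class ModelEntry:
    name: str
    # Hypothetical bid function: the model's claimed fit (0 to 1) for a data point.
    bid: Callable[[DataPoint], float]


def match(point: DataPoint, catalog: List[ModelEntry]) -> Optional[Tuple[str, float]]:
    """Return (model name, bid) for the highest positive bid, or None if there is none."""
    bids = [(m.name, m.bid(point)) for m in catalog]
    bids = [b for b in bids if b[1] > 0.0]
    if not bids:
        return None
    return max(bids, key=lambda b: b[1])


# Two illustrative models: a maritime specialist and a generalist.
catalog = [
    ModelEntry("maritime-detector", lambda p: 0.9 if p.get("domain") == "maritime" else 0.1),
    ModelEntry("general-detector", lambda p: 0.5),
]

print(match({"domain": "maritime"}, catalog))  # ('maritime-detector', 0.9)
print(match({"domain": "desert"}, catalog))    # ('general-detector', 0.5)
```

In a real marketplace, bids would presumably be validated against observed performance rather than taken on a model’s word; the sketch only shows the shape of the per-data-point competition.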

Most importantly, the Defense Department could stop worrying about “picking winners” or initializing competitions once a decade. By shifting the point of competition from the prime contractor or subcontractor down to the data point and inference, the government can turn the acquisition of AI into a highly iterative, highly competitive market at the stroke of a pen. From geospatial intelligence to decision support, there are millions of data points flowing through AI systems every day in the Defense Department. Creating “markets” for these data points to be matched with models fosters a newfound level of competition that benefits the Defense Department and warfighters.

Steps Forward

The Defense Department and intelligence community have started initiatives in this direction. They have piloted programs that have tried to run iterative development — model “bake off” competitions, such as Project Maven, and other initiatives at the National Reconnaissance Office and National Geospatial-Intelligence Agency. Given the technology available at the time, these efforts were well directed, and they generated a lot of lessons learned — both good and bad — for delivering consistent, performant, and operationally relevant inferences to warfighters. These programs set out to acquire “gold standard” models — soliciting model submissions from a restricted pool of vendors and evaluating model efficacy on a static, holdout dataset. The most performant models were then purchased, and the programs repeated this process every three to nine months. This is a great first step — but we must go much further.

There are no such gold standard models. Model performance is constantly in flux and, unfortunately, typically degrading. This degradation can be driven by environmental factors or adversary actions. Importantly, those adversarial actions can be not just high-tech interventions, like adversarial patches, but very low-tech interventions as well. Basic camouflage and military deception have significant impact on AI systems, and these countermeasures can be deployed in days — and for hundreds of dollars. The United States cannot compete over the long term if its ability to cycle new models into operational milieus is measured in months and millions of dollars. The United States and its allies are already confronted with this economic calculus in air defense today: Knocking down a $500 quadcopter with a $3 million missile is a fundamentally losing proposition, regardless of the specifics of a particular engagement.

But it doesn’t have to be this way for AI. The Defense Department needs to think about models not as exquisite systems but as consumables — a “class XI” of supplies. The rapid commoditization of foundational models, paired with an automated technological solution that selects domain-specific models for inference in real time, creates a dynamic AI ecosystem where adaptations can occur millisecond to millisecond and the marginal cost is measured in cents, not millions.

Systems Dis-Integration Opens a Direct Marketplace for AI Inferences

Five years ago, AI model management was an intensely manual process, and the concept of a systems dis-integrator would have been purely theoretical. Recent advances in the automation of AI model management make this systems dis-integrator approach a realizable vision today. Model builders could bring their models into a catalog. Once loaded in the system, an evaluation framework would register model metadata and the model’s source-derived training data. On the other end of the marketplace, customers who need inferences would leverage the increasing proliferation of data ontologies (for example, the Army’s Unified Data Reference Architecture) to programmatically deliver data points preloaded with metadata. A matching engine would then route data points to the most appropriate model. Depending on the use case, the matching algorithm can weigh factors such as the statistical properties of the inference data, the associated metadata, user feedback on model performance, inference latency, and inference resource load. As with the matching algorithms in commercial markets, the matching algorithm would be public and available for iteration over time. While this algorithmic matching requires additional computation on the margin, the compute required is a small fraction of that needed to perform the inference itself, because the match looks like a database query rather than a graphics processing unit–intensive inference computation. Provided that the compute and infrastructure needed to run the models already exist, the marginal burden of matching is not significant.
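To make the matching step more concrete, here is a minimal Python sketch of one way a public matching algorithm could combine several of the factors named above into a single score. The `Candidate` fields, the weights, and the normalization caps are all assumptions chosen for illustration, and the sketch covers only a subset of the factors (it omits the statistical properties of the inference data).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    metadata_overlap: float  # 0..1, assumed share of data-point tags the model covers
    feedback_score: float    # 0..1, assumed rolling user feedback on past inferences
    latency_ms: float        # expected inference latency
    gpu_seconds: float       # expected resource load per inference


# Illustrative weights only; a real marketplace would publish and iterate on these.
WEIGHTS: Dict[str, float] = {
    "metadata_overlap": 0.4,
    "feedback_score": 0.4,
    "latency": 0.1,
    "resources": 0.1,
}


def score(c: Candidate, max_latency_ms: float = 1000.0, max_gpu_seconds: float = 1.0) -> float:
    """Weighted sum in [0, 1]; lower latency and lower resource load score higher."""
    latency_term = 1.0 - min(c.latency_ms / max_latency_ms, 1.0)
    resource_term = 1.0 - min(c.gpu_seconds / max_gpu_seconds, 1.0)
    return (
        WEIGHTS["metadata_overlap"] * c.metadata_overlap
        + WEIGHTS["feedback_score"] * c.feedback_score
        + WEIGHTS["latency"] * latency_term
        + WEIGHTS["resources"] * resource_term
    )


def route(candidates: List[Candidate]) -> Candidate:
    """Send the data point to the highest-scoring candidate model."""
    return max(candidates, key=score)


candidates = [
    Candidate("specialist", 0.9, 0.8, latency_ms=400.0, gpu_seconds=0.5),
    Candidate("generalist", 0.5, 0.7, latency_ms=100.0, gpu_seconds=0.2),
]
best = route(candidates)
print(best.name)  # specialist
```

The scoring itself is a handful of arithmetic operations over catalog metadata, which is consistent with the point that matching is cheap relative to the inference it routes.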

Why consider an approach like this one? Because it has huge, direct benefits for both sides: the model builders and the data-owning consumers. In this scenario, model builders compete purely on the merits of their developed models. It also provides a lower barrier-to-entry pathway for model builders into different segments of these inference markets. Model builders can submit a specialized model, via application programming interface, that seeks to carve out advantage in one small segment of the marketplace. With the right scaffolding, this can be done at a lower upfront cost to the builder and with the ability to iterate, resubmit, expand, trim, and so forth, much more often.

Meanwhile, inference buyers get to optimize the quality of every individual inference — not just the average performance over all inferences. Commercial vendors, government teams, research groups, federally funded research and development centers, and open-source developers all compete on a level and objective playing field. This approach bypasses the emotional appeals of proposals and lets the heart of the matter — model performance — speak for itself. Further, this concept extends and enhances the efforts already under way to implement thoughtful and codified processes for test, evaluation, validation, and verification of AI models prior to deployment. Those test and evaluation processes can remain an important prior step to the acceptance of models into a deployment scaffolding. Once in that deployment scaffolding, this concept of real-time inference matching provides an additional, complementary layer of safety around deployed models — reducing the risk that models with performance below operationally required levels touch production data and drive erroneous decisions. Active learning and other “on-the-fly” approaches to changing model performance can continue to be governed by the appropriate test and evaluation processes.
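The difference between optimizing the average and optimizing each inference can be shown with a toy example. In the Python sketch below, the accuracy table, the segment names, and the operational floor `MIN_ACCURACY` are invented numbers used only to illustrate the argument.

```python
from typing import Dict, Optional

# Hypothetical validated accuracy of each model on each data segment.
ACCURACY: Dict[str, Dict[str, float]] = {
    "model_a": {"urban": 0.95, "maritime": 0.60, "desert": 0.70},
    "model_b": {"urban": 0.70, "maritime": 0.92, "desert": 0.65},
}
MIN_ACCURACY = 0.80  # assumed operationally required level


def average_accuracy(model: str) -> float:
    """Average accuracy across segments, the metric a static bake-off would reward."""
    scores = ACCURACY[model].values()
    return sum(scores) / len(scores)


def best_for_segment(segment: str) -> Optional[str]:
    """Pick the most accurate model for a segment, or None if none clears the floor."""
    name = max(ACCURACY, key=lambda m: ACCURACY[m][segment])
    return name if ACCURACY[name][segment] >= MIN_ACCURACY else None


print(best_for_segment("urban"))     # model_a
print(best_for_segment("maritime"))  # model_b
print(best_for_segment("desert"))    # None
```

In this made-up table, model_b has the slightly higher average, so a single-winner competition would select it and accept 0.70 accuracy on urban data. Per-segment routing sends urban data to model_a instead, and it declines to route desert data at all because no model clears the assumed floor.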

The nature and type of risks borne by market participants in such a system are also dramatically reallocated. Model vendors exclusively carry the risk for model performance: Non-performant models don’t get used, and their vendors don’t earn payments. The government wears the operational risk — defining market access, incentive structures, and so forth.

This is demonstrably different from today’s system. Right now, the government carries all the risk associated with performance: The overwhelming majority of contracting for AI models is done on a firm-fixed-price or time-and-materials construct where the model is an explicit deliverable. Unlike a usage-based approach, the standards of traditional contracting fail to match the highly iterative, evolving nature of constantly improving models. Consequently, the government bears the cost of retraining or finding a new model if the delivered one doesn’t perform. As the National Security Commission on Artificial Intelligence observed in its 2021 final report,

Critically, the Defense Acquisition System must shift away from a one-size-fits-all approach to measuring value from the acquisition process. Adherence to cost, schedule, and performance baselines is rarely a proxy for value delivered, but is particularly unsuited for measuring and incentivizing the iterative approaches inherent in AI and other software-based digital technologies. Unless the requirements, budgeting, and acquisition processes are aligned to permit faster and more targeted execution, the U.S. will fail to stay ahead of potential adversaries.

Because those contracting actions are large and infrequent (months or years) and because funds are de facto even if not de jure obligated up front, the incentives to continue innovating are considerably dulled — many of the people on both sides of the acquisition decision won’t even be in the same job when it’s time to recompete. The acquisitions professionals in the government are also humans who are stretched very thin; between increased workload, increased regulation, and the increased frequency and duration of continuing resolutions, asking contracting officers to “just think outside the box” presumes a luxury of time and space that doesn’t exist. Like a prisoner’s dilemma, the rational result of our system of incentives today is globally suboptimal. Working with the acquisitions professionals and building a distinct, competition-based acquisition pathway for AI is the better path forward.

Meanwhile, the model vendor carries all the operational risk: getting market access, balancing the equities of their intellectual property portfolio with a government customer fearful of vendor lock, etc. The Defense Innovation Unit and other innovation shops are doing admirable work to bring down this operational risk, particularly with expanding market access. But even so, it remains the wrong risk for emerging vendors to carry.

This allocation of risks is subpar for everyone. The government loses quality of product because model builders are incentivized to devote precious energy to developing byzantine distribution channels inside government acquisition systems — rather than ruthlessly focusing on better and better models that directly contribute to customer value.

Under a true marketplace system, payment would shift to a per-inference structure. Models that do not perform cost the government nothing, unlike in our current system. Per-inference payment might raise concerns of high costs if it were uncapped, but, if so, there is an easy fix: The government could institute a rebate model. On a regular basis, the government would allocate a fixed budget pro rata across all successful inferences over that time period. Overall cost would remain a fixed “not-to-exceed” number, and each model’s payment would be directly proportional to the share of successful inferences it earned. Usage-based pricing has close analogs. In the commercial space, the buying model for “models as a service” is already dominated by pricing per token or per application programming interface call. There has been broad signaling from acquisition shops pushing the cutting edge, like the Tradewinds team in the chief digital and artificial intelligence office in the Defense Department, that consumption models can be executed within existing federal acquisition law. But in a world where defense primes are vociferous that they need all of the cost risk to be worn by the government, what’s been missing is the “killer use case” that forces a shift in thinking. AI inference is that use case.
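The rebate arithmetic is simple enough to sketch in a few lines of Python. The function name, the vendor names, and the dollar figures below are hypothetical; the only thing taken from the text is the rule itself, a fixed budget split pro rata by successful inferences.

```python
from typing import Dict


def allocate_rebate(budget: float, successful: Dict[str, int]) -> Dict[str, float]:
    """Split a fixed not-to-exceed budget pro rata by each model's successful inferences."""
    total = sum(successful.values())
    if total == 0:
        # No successful inferences in the period: nothing is paid out.
        return {name: 0.0 for name in successful}
    return {name: budget * count / total for name, count in successful.items()}


# Hypothetical period: a $100,000 budget and 1,000 successful inferences.
payments = allocate_rebate(
    100_000.0,
    {"vendor_a": 600, "vendor_b": 300, "vendor_c": 100, "vendor_d": 0},
)
print(payments)
# {'vendor_a': 60000.0, 'vendor_b': 30000.0, 'vendor_c': 10000.0, 'vendor_d': 0.0}
```

Total spending never exceeds the budget regardless of inference volume, and a model that wins no inferences, like vendor_d here, earns nothing.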

Conclusion

The monopsonistic nature and long buying cycles of the defense markets create immense problems for a government buyer looking to maintain the same sharp competitive pressures that spur value delivery in commercial industry. The traditional approach of picking a large systems integrator and applying indirect pressure through that prime to subcontractors has consistently been inconsistently successful. Arguably, the persistence of the phrase “picking a winner” in government acquisitions tells you everything you need to know about the failure of competitive pressures to persist after a contract is awarded. The perniciousness of the concept of picking winners is even sharper in the critical field of AI because the concept of a “best” AI model really only exists at the most granular level: The optimal model to infer on a single data point for a single task is not guaranteed to be optimal anywhere else.

A viable approach to achieving optimality throughout the whole data and task space, in the form of a highly automated matching market, exists in commercial industry — in finance, advertising, higher education, and other two-sided marketplaces. In this world, the systems integrator for AI functions best as the maintainer of marketplaces, providing model developers and model consumers open, objective, and competitive access. Competition occurs billions of times a day, not once or twice a decade, and models win on performance, not PowerPoints. In government and in industry, as acquisitions experts and amateurs — as a nation — there’s a clear consensus that a “business as usual” approach to acquisitions is a threat to our national security. In a multi-polar world, technological progress is nearly inevitable, as is the ability of an efficient solution to displace inefficient ones. Our continued competitiveness on an increasingly AI-dominated battlefield demands that we complete the transformation of AI acquisition into an efficient market structure.

Jim Rebesco is co-founder and chief executive officer of Striveworks, a machine learning operations company.

Anthony Manganiello, a retired Army officer, is co-founder and chief administrative officer of Striveworks.

Image: Staff Sgt. Joseph Pagan
