The world needs more than a data lab - it needs a data economy
By David Arnež | Co-founder at Inflectiv

Bobby Samuels (CEO, Protege) got the diagnosis right. The frontier of AI is jagged. Models that write flawless code fall apart navigating a complex medical workflow. The bottleneck isn't architecture. It isn't compute. It's data.

The piece published this week arguing for a dedicated AI data lab (DataLab at Protege) is worth reading carefully. Not because the prescription is complete, but because it names the right problem and reveals exactly where the solution has to go further.

We build data infrastructure at Inflectiv. We have 7,700 users, 6,000+ datasets, and 4,600 active agents running on our platform. I've spent more time than I'd like staring at the gap between data that exists and data that AI can actually use.

The diagnosis is correct. The prescription misses something fundamental. The real gap isn't research capacity. It's the incentive structure.

The a16z piece makes a striking point: 419 terabytes of web data have been scraped. The estimated volume of all data in existence is 175 zettabytes.
Source: a16z (accessed on the web, 11th March, 2026)

Public data is effectively exhausted. The intelligence AI needs is trapped everywhere else: in private systems, operational workflows, domain expertise, and physical sensors, scattered across formats (PDF, DOCX, XML, JSON, …).

But here's what a research institution can't solve: that data won't come out through scientific rigor alone. The people who hold it - organizations, domain experts, individual contributors - have no structural reason to release it. A lab can build the methodology to use the data once it exists. It cannot manufacture the economic incentive for anyone to contribute it.

This is a different kind of bottleneck than the one DataLab is designed to solve. It's not a capacity problem or an attention problem or a translation problem. It's a coordination problem. And coordination problems at scale have historically been solved not by building better institutions - but by building better markets.

Data hoarding is rational. Until you make contributing more rational.

Consider why the world's intelligence is actually trapped. It isn't primarily because nobody has organized it. It's because the people who hold it have no reliable mechanism to capture value when they release it. A few real examples:
- A compliance team at a financial institution has spent years building proprietary signal.
- A robotics researcher has accumulated sensor data from thousands of operational hours.
- A security firm has mapped threat intelligence nobody else has seen.
They don't publish it - not because they're secretive by nature, but because publishing it, under current infrastructure, means giving it away permanently: no compensation, no attribution, and no visibility into how it's used.

The a16z piece notes that better data beats better algorithms and cites the history of AI to prove it. AlexNet needed ImageNet, and the LLM paradigm needed the internet. What it doesn't address is the economic structure that made those datasets possible. ImageNet was built with grant funding and graduate students. The internet was built by billions of people with no expectation of compensation. Neither model scales to the next layer of intelligence that AI actually needs. The proprietary, fragmented, domain-specific data that determines AI's frontier capabilities won't come out of goodwill or grant cycles. It will come out when contributing it is more economically rational than hoarding it.

There's a third supply side nobody is talking about. The data discussion usually runs on two axes: human-generated data and synthetic data. The a16z framing stays largely in that space: real-world human activity data, proprietary organizational knowledge, multimodal inputs from lived experience. But something new is happening that changes the picture. AI agents are now generating intelligence at scale. On Inflectiv, we crossed 4,600 active agents. With our v2.1 Self-Learning API (releasing in the 2nd week of March), those agents don't just consume datasets - they write back to them. A few examples:
- A market intelligence agent monitoring TradFi or DeFi sentiment builds a proprietary dataset that grows more valuable every day.
- A compliance bot tracking regulatory changes accumulates a knowledge base that no human team could maintain.
- A research agent scanning academic literature produces structured signal that didn't exist before it started running.
This isn't a replacement for human-generated data; it's additive. Agents don't observe the world the way humans do. But they can process what they observe into structured, queryable, provenance-tagged intelligence at a speed and scale that humans cannot. The next hundred ImageNets aren't going to be assembled by graduate students. They're going to be generated continuously by agents doing their jobs - if the infrastructure exists to capture and govern what they produce.

What a data economy actually requires. A data lab solves the supply-quality problem. It doesn't solve the supply-incentive problem or the supply-scale problem. Closing the data gap requires solving both. The infrastructure for a functioning data economy needs a few things that don't currently exist in a coherent stack:

Provenance → you need to know what something is, where it came from, and what agent or human produced it.
Economics → contributors need to capture value every time their intelligence is queried, not just when they initially release it.
Governance → as agents write to production datasets at scale, you need security, credentialing, and audit trails that don't currently exist.
Liquidity → data needs to move from contributors to consumers autonomously, without human intermediaries at every transaction.

The a16z piece ends by noting that DataLab is only the beginning of what's needed and that the field requires an entire ecosystem of data labs. That's true - and the ecosystem also requires the economic infrastructure underneath the labs. The layer that makes contributing data more rational than hoarding it. The layer that means agent-generated intelligence doesn't evaporate when the session ends.

Better data beats better algorithms. Better economics beats better data. The history of ML says better data beats better algorithms, and every AI breakthrough has depended on the right data existing before anyone knew how to use it.
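To make the four requirements concrete, here is a minimal sketch of what a provenance-tagged, usage-metered dataset record could look like. This is a hypothetical illustration, not Inflectiv's actual schema: the field names, the `record_query` helper, and the per-query payout logic are all assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    """One contribution to a shared dataset (hypothetical schema)."""
    content: str            # the contributed intelligence itself
    contributor_id: str     # agent or human that produced it (provenance)
    source: str             # where it came from (provenance)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    query_count: int = 0    # how often this record has been consumed

    def record_query(self, price_per_query: float) -> float:
        """Meter one query; return the payout owed to the contributor (economics)."""
        self.query_count += 1
        return price_per_query


# Value accrues to the contributor every time the record is queried,
# not only at the moment of release.
rec = DatasetRecord(
    content="EU AI Act update, Art. 6",
    contributor_id="agent-42",
    source="regulatory-feed",
)
earned = sum(rec.record_query(price_per_query=0.01) for _ in range(3))
```

Governance and liquidity would sit on top of a record like this: credentialing decides who may call `record_query`, and an autonomous marketplace routes the payouts without a human in the loop.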
But data doesn't appear because researchers need it. It appears because someone builds the infrastructure that makes releasing it more valuable than keeping it private. The data economy the AI field actually needs isn't going to be assembled by any single institution, no matter how well-funded or rigorous. It's going to be assembled by millions of contributors, human and agent, but only when the economic incentive to contribute finally exceeds the cost of release.

The compute layer has Nvidia. The model layer has OpenAI, Anthropic, and Google. The data layer needs more than one data lab. It needs a market. That's what we're building at inflectiv.ai
AI doesn't struggle because models are weak. It struggles because the intelligence those models need is messy, hidden, or inaccessible. This week, we focused on the layer between raw data and agents: the infrastructure that turns scattered knowledge into something machines can actually use. Here's what we shared.
Accessible AI Infrastructure
Getting started with AI infrastructure shouldn't require a large budget or complex setup. Inflectiv keeps the entry point simple: free credits every month, access to datasets and agents, and the flexibility to upgrade only when you actually need it.

The World Is Leaking Alpha
Across industries, valuable signals already exist in operational data: shipping logs, energy infrastructure, agriculture metrics, labor markets, and more. The issue isn't that intelligence doesn't exist; it's that it's trapped in messy formats that markets and AI agents can't consume. The real opportunity lies in structuring that intelligence so it becomes usable.

Building with Walrus
We're excited to be working alongside Walrus Protocol on the infrastructure layer that agents depend on. Reliable intelligence systems require storage and data architecture designed for machine access from the ground up.

The Real Data Moat
Owning raw data isn't enough anymore. The companies pulling ahead are the ones turning that data into structured, agent-readable intelligence that improves with every use. The advantage compounds when infrastructure and datasets work together.

Vertical AI's Seeing Problem
Most vertical AI products fail not because the models can't reason, but because they can't access clean, structured inputs. The strongest companies in healthcare, legal, and finance are building intelligence layers underneath their AI systems, turning messy domain knowledge into structured assets that improve over time.

Builders in the Room
We joined the UK AI Agent Hackathon at Imperial College alongside OpenClaw. Builders, researchers, and founders came together to experiment with what the next generation of AI agents could look like.

AI Needs Context
During the Founders Show AMA, David shared a key point: the future of AI won't be defined by more compute or bigger models. What matters is context: the intelligence systems can access when they make decisions.

Something Is Coming
A small teaser dropped this week, hinting that something new is on the way. Not much longer now.
The conversation around AI keeps focusing on models and compute. But the real shift is happening underneath: the infrastructure that turns raw data into structured intelligence. That's the layer we're building.
Vertical AI has a seeing problem, not a thinking problem
Bessemer's new guide is one of the clearest frameworks written on Vertical AI. But it's missing a chapter: the one that explains why most vertical AI products fail before they ever get the chance to prove their ROI. Bessemer Venture Partners just published a guide for early-stage Vertical AI founders here. It's excellent. The "Good, Better, Best" frameworks are genuinely useful. The progressive-delegation model is how the best teams I've seen actually operate. The insight that Vertical AI competes for labor budgets, not IT budgets, reframes the entire market opportunity.
Every industry is leaking alpha. Nobody has built the pipe to capture it.
The world's best trading signals aren't on a terminal. They're buried in the day-to-day knowledge of people who don't think of themselves as data providers. Here's a pattern that repeats across every major industry on earth. Somewhere inside it - in databases, domain reports, procurement ledgers, operational rhythms - there is information that predicts what happens next. It isn't hidden. It isn't secret. It's just messy, siloed, and completely unstructured. It never reaches markets in a usable form. So it sits there. Leaking value into the void.
✔️ 50 free credits every month ✔️ Datasets, agents, marketplace, all yours ✔️ Upgrade when YOU'RE ready, not when we say so ✔️ Pricing that finally makes sense
This week we focused on something most conversations about AI ignore.
There is no shortage of data. There is a shortage of structured, accessible intelligence that agents can actually use. From pricing to hallucinations and why UX moats are disappearing, here's what we covered.
The Missing Structure
There are hundreds of billions of terabytes of data in the world, but almost none of it is formatted so machines can reliably query, reuse, or reason over it. The problem isn't volume. It's structure. That's the gap Inflectiv is built to solve.
This week at Inflectiv: from API keys to agent brains
🔑 API Keys Made Simple
Creating an API key sounds technical. It isn't. We showed how a single click turns your dataset into something agents can query instantly. No friction, no complexity, just direct access to structured intelligence. If you're building agents, this is the entry point. Learn how to create an API key
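For a sense of what an authenticated dataset query could look like once you have a key, here is a minimal sketch. Everything specific in it is a placeholder assumption, not Inflectiv's documented API: the base URL, the endpoint path, the key format, and the request body shape are all invented for illustration.

```python
import json
import urllib.request

API_KEY = "ifv_xxxxxxxx"  # placeholder key, hypothetical format
BASE_URL = "https://api.inflectiv.example/v1"  # placeholder URL, not the real endpoint


def build_query(dataset_id: str, question: str) -> urllib.request.Request:
    """Build an authenticated POST against a dataset (request shape is assumed)."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{dataset_id}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Construct (but don't send) a query an agent might make.
req = build_query("demo-dataset", "What changed in EU AI regulation this week?")
```

The point of the sketch is the shape of the interaction: one key, one dataset ID, one structured query, which is what "agents can query it instantly" means in practice.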
🎙 The AGI Reality Check
There's a lot of noise around next-generation models and intelligence. Our CEO, David, joined @FoundersShow for a candid conversation about whether agents can truly reason, how far LLMs have come, and what's hype versus substance.