What Data Teams Actually Look Like in Construction (And What It Really Means)
May 13, 2026

There is a version of construction data strategy that lives in conference presentations. It involves a mature data team, a clean lakehouse, well-governed pipelines, and AI agents running smoothly across every department.
Then there is the version that exists on the ground at general contractors and specialty contractors all over the country.
Having worked with general contractors and specialty contractors across a wide range of sizes, our team at Kroo has a clear picture of what construction data teams actually look like, what infrastructure they are running, and where the gaps tend to appear. This article describes what we see.
The five team structures we see most often
No dedicated data person. This is still the most common situation at general contractors and specialty contractors below approximately $300 million in annual revenue, and it is not uncommon at much larger companies either. Data work gets done by whoever has time. The controller runs custom ERP reports. The project engineer exports to Excel and builds custom reports. The IT person manages software licenses and is tasked with building Power BI dashboards on the side, but they are not a data professional. Nobody owns the data infrastructure, or any of the rest of the data workload, because nobody was hired to do that.
This does not mean the company has no useful data. It means the useful data is scattered, inconsistent, and not accessible to the people who need it without significant manual work.
A single data person. The title varies: data engineer, data analyst, an IT manager who grew into the role (sometimes forced into it), or a PM who pivoted toward analytics. This person does everything. They build the data pipelines by hand (or export data manually and stitch it together), maintain the Power BI dashboards, write the SQL views (if they know SQL), field data requests from executives, and manage the vendor relationships for every data-related tool in the stack.
This person is a single point of failure. When they are on vacation, nothing gets built. When they are overwhelmed, backlogs grow. When they leave, much of the institutional knowledge about how the data infrastructure works walks out with them. The companies that have found this person are ahead of those that have not. But they are often running at their limit. And leadership is often not aware of it.
A small team of two to three. Usually one technically strong data engineer paired with one or two people who are analytically capable but not true data engineers. The data engineer builds and maintains the pipelines and the data warehouse. The analysts build dashboards, handle ad-hoc requests, and work with business users and executives to understand what they need. This is the configuration that starts to feel sustainable. Coverage is better. Knowledge is shared. The backlog is still long, but it moves.
A fractional or outsourced model. Some companies, particularly in the $200 to $600 million range, use a fractional CIO or an outsourced IT advisory firm for the strategic advisory work, combined with a vendor relationship like Kroo for the actual construction data infrastructure. The fractional advisor does not build anything hands-on, but they will evaluate tools, make vendor recommendations, and help leadership understand what they are looking at. The vendor provides the data connectors, the data warehouse, and the ongoing maintenance. This model works well when the company does not have the volume of data work to justify a full-time senior hire but needs more guidance than a junior hire can provide.
An enterprise data team. Especially at general contractors and specialty contractors above roughly $1 billion in revenue, you start to see dedicated data engineering teams, sometimes led by a VP or director of data or analytics. These teams have their own roadmaps, their own infrastructure budgets, and sometimes their own internal products. The challenge at this level is often not capability but governance. Multiple teams are building independently. Standards drift. The same data gets modeled differently in different parts of the organization. Politics slow down decisions that should be straightforward.
The data tech stack we most commonly see
ERPs. The ERP is the foundation of financial and job cost data at general contractors and specialty contractors. The most common ones we see are Sage 300, Sage Intacct, Viewpoint Vista, Viewpoint Spectrum, and CMiC. Each system has different API quality and different quirks when it comes to reliable data extraction. Sage 300 is particularly common at smaller and mid-sized contractors, and because it is typically installed on-premise, pulling its data reliably for business intelligence takes extra work. CMiC Cloud and Viewpoint Vista tend to be more common at larger contractors and offer better data extraction capabilities. Sage Intacct is a cloud-only ERP that many mid-sized and larger contractors are migrating toward, although CMiC remains a strong choice at the larger end.
Project management software. There are two main players in this space today: Procore and Autodesk Construction Cloud. CMiC also appears frequently as PM software, with the benefit that a company can use CMiC for both ERP and PM. Both the Procore and Autodesk APIs are well-documented and relatively straightforward to work with.
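As a rough illustration of what "straightforward to work with" means here, pulling the project list from Procore is a single authenticated REST call. Below is a minimal Python sketch, assuming you have already completed Procore's OAuth 2.0 flow; the endpoint path follows Procore's public REST documentation, but treat the exact fields as assumptions and verify against the current docs.

```python
import requests

ACCESS_TOKEN = "..."  # obtained via Procore's OAuth 2.0 flow (elided here)
COMPANY_ID = 12345    # hypothetical company ID

# List projects visible to the authenticated user for one company.
resp = requests.get(
    "https://api.procore.com/rest/v1.0/projects",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"company_id": COMPANY_ID},
    timeout=30,
)
resp.raise_for_status()

for project in resp.json():
    print(project["id"], project["name"])
```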
Scheduling software. Oracle Primavera P6 is the standard for larger and more complex projects, particularly in heavy civil and industrial work, and Oracle offers both on-prem and cloud-based versions. Microsoft Project is also used across the industry. Phoenix CPM typically shows up at contractors doing infrastructure work.
Data warehousing providers. This is where the most variation exists. The most common choices we see are Azure SQL Server (often chosen because the company is already on the Microsoft stack and Azure Active Directory integration is straightforward), Snowflake (more common at technically sophisticated contractors that have built their own data infrastructure), Microsoft Fabric (Microsoft's newer unified analytics platform, adopted by some teams as a replacement for older Azure setups), Databricks (used at larger, more engineering-forward organizations, particularly those with machine learning ambitions), and Google BigQuery (seen at companies that use Google Workspace heavily or have a history with Google Cloud). Azure SQL Server is the most common overall because it is the path of least resistance for a Microsoft-native organization. Snowflake is the most common among teams that have made a deliberate architecture choice.
Business intelligence tools. Power BI is used by the large majority of construction companies we work with. The Microsoft licensing model, the familiarity of Excel users with the interface, and the native integration with Azure make it the default choice. Tableau is used at some larger organizations, particularly those that made the investment before Power BI became competitive. Looker is uncommon but present at a few larger GCs.
AI tools. Most companies we talk to have purchased one or more AI platform licenses. ChatGPT Enterprise is the most common. Microsoft Copilot is growing quickly because it comes bundled with Microsoft 365 licenses and the procurement path is frictionless. Google Gemini is being adopted at some companies, particularly those already on Google Workspace. Claude from Anthropic is also growing in popularity. The pattern we see consistently is that the license purchase came before the use case was defined. Companies have purchased dozens of licensed seats, and most people are currently using AI tools for writing and summarizing. Interestingly, almost nobody is using them on top of their construction data. Kroo helps construction teams achieve that with Kris.
Pipeline and orchestration tools. This is where the largest difference exists between companies that have built their own infrastructure and companies that are earlier in the journey. Self-built infrastructure at technically sophisticated contractors uses AWS or Azure managed services for scheduling and running pipelines, Python scripts for extraction and transformation, and either dbt or custom SQL for transformation logic. Companies earlier in their journey rely on third-party connectors like Kroo for pipeline management, which reduces the engineering burden significantly. The tradeoff is control versus maintenance overhead.
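To make that tradeoff concrete, a self-built extraction job is often just a scheduled Python script: call a source system's API, land the raw rows in a warehouse staging table, and let dbt or SQL handle transformation downstream. Here is a minimal sketch; the endpoint, credentials, and table names are hypothetical.

```python
# Minimal sketch of a self-built extraction job. Endpoint, credentials,
# and table names are hypothetical placeholders.
import requests
import sqlalchemy as sa

API_URL = "https://erp.example.com/api/job_costs"  # hypothetical source endpoint
engine = sa.create_engine("snowflake://...")       # connection string elided; requires the snowflake-sqlalchemy dialect

def extract_job_costs() -> list[dict]:
    """Fetch raw job cost rows from the source system."""
    resp = requests.get(API_URL, headers={"Authorization": "Bearer ..."}, timeout=60)
    resp.raise_for_status()
    return resp.json()

def load(rows: list[dict]) -> None:
    """Append raw rows into a staging table; transformation happens downstream (e.g., in dbt)."""
    with engine.begin() as conn:
        conn.execute(
            sa.text(
                "INSERT INTO raw.job_costs (job_id, cost_code, amount, posted_at) "
                "VALUES (:job_id, :cost_code, :amount, :posted_at)"
            ),
            rows,
        )

if __name__ == "__main__":
    load(extract_job_costs())  # typically run on a schedule (cron, Azure Functions, etc.)
```

Multiply this by every source system, every schema change, and every API quirk, and the maintenance side of the control-versus-overhead tradeoff becomes obvious.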
Common secondary systems that often get overlooked
Beyond the core stack, there are several systems that show up repeatedly in conversations about data integration but are frequently left out of the data infrastructure.
Telematics and equipment tracking. Tools like Tenna are common for tracking equipment location, utilization, and maintenance. This data is almost never integrated with the financial data warehouse, which means equipment cost analysis is done in a silo.
Workforce management. Tools like Bridgit and Rivet are common for tracking and managing workforce requirements. This data becomes far more useful when analyzed alongside labor costs and other financial data from the ERP.
Material management software. Tools like Kojo and Remarcable are common for managing material procurement and related workflows. The same applies here: its value multiplies when joined with material costs and other financial data from the ERP.
Safety platforms. Tools like Hammertech, Highwire, and others track safety incidents, compliance, and subcontractor prequalification. This data is usually managed separately from project and financial data, which makes it difficult to connect safety metrics to project performance.
Estimating software. Most contractors use some combination of spreadsheets and specialized estimating tools like Destini. Very few have their historical estimate data in a form that can be analyzed systematically. This is one of the most significant untapped data assets in the industry. New players like Ediphi are looking to disrupt this part of the construction industry.
Subcontractor prequalification and procurement. BuildingConnected/TradeTapp and Compass by Bespoke Metrics are commonly used, but the data from these platforms rarely makes it into the core data infrastructure.
CRM. Most contractors either do not have a formal CRM or use spreadsheets and basic tools for opportunity tracking. Typical CRM software in the construction industry includes Unanet Cosential and Salesforce, although new entrants like ProjectMark, a construction-specific CRM, are shaking up the space. The GCs that do have a CRM are almost never connecting it to the operational data that would make it more powerful. This is a widely missed opportunity for contractors looking to make the most of their data.
The size mismatch problem
Here is the pattern we see most often with mid-sized general contractors and specialty contractors (let's define this as the ~$300 million to $1 billion size range).
The company has grown to the point where the ERP reports are no longer sufficient. Leadership wants dashboards and better business intelligence. Project teams want better visibility so they can manage jobs proactively. The CFO wants improved cash flow forecasting and visibility into leading indicators. The operations team wants to see early warning signals across the portfolio.
The company does not yet have a dedicated data team. The IT person manages the network, the software licenses, and the helpdesk tickets. The controller is capable with data but is fully occupied with month-end closing and reporting. The PE who is good with Excel is also a full-time project engineer.
Nobody has the time to build AND maintain the underlying data infrastructure, and the company does not yet have a clear enough picture of what it would cost or what it would take to make the case for a hire.
The result is a kind of productive frustration. Everyone knows the data is there. Everyone knows better analytics would help. But the organizational capacity to build it does not yet exist.
What happens when there is only one data person
The single data person situation deserves its own discussion because it is so common and so fragile. Especially in this industry.
The person is typically excellent. They have figured out how to pull data from multiple systems, built SQL-based views that no one else understands fully, and created PowerBI dashboards that executives have come to rely on. They have institutional knowledge about where the data quality problems are, what the edge cases mean, and which numbers to trust and which ones to treat with skepticism.
When this person is overwhelmed, which is almost always, requests pile up. An executive needs a new dashboard. The controller wants a different cut of the AR aging. The operations team wants different variations of a project-level dashboard. Each of these is a reasonable request that takes hours or days to fulfill and crowds out the maintenance and improvement work that keeps everything else running.
When this person leaves, the institutional knowledge goes with them. The dashboards still run, but nobody knows exactly how. The data quality issues that were being quietly managed by this person start to surface. The requests that were being fulfilled informally stop getting done. The company discovers how much it was relying on one person's knowledge.
The companies that navigate this situation best are the ones that have invested in documentation and in tooling that makes the infrastructure more readable and maintainable by someone other than the person who built it. dbt for transformations, data catalogs with annotations, and runbooks for pipeline maintenance all reduce the bus factor of a one-person data team.
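As a small illustration of what transferable work looks like, the difference is often just whether the business rules live only in one person's head or next to the logic itself. A hypothetical sketch; the rules in the docstring are invented for illustration:

```python
# Hypothetical illustration: the judgment calls that usually live in one
# person's head are written down next to the logic that applies them.
import pandas as pd

def committed_cost(df: pd.DataFrame) -> pd.Series:
    """Committed cost per job.

    Runbook notes for whoever maintains this next (all rules hypothetical):
    - Source: raw.job_costs, refreshed nightly by the ERP extract.
    - Voided POs arrive with status 'V'; they are intentionally excluded
      rather than netted, per the controller.
    - Cost codes beginning '99' are internal allocations, not commitments.
    """
    keep = (df["status"] != "V") & ~df["cost_code"].str.startswith("99")
    return df.loc[keep].groupby("job_id")["amount"].sum()
```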
The companies that navigate it worst are the ones that treated the data person as a service provider rather than investing in making their work transferable.
What the most sophisticated GC data teams look like
At the upper end of the market, a handful of GCs have built data infrastructure that is genuinely impressive.
They have a medallion architecture in a cloud-based data warehouse, with raw data in the bronze layer, cleaned and standardized data in the silver layer, and business-ready views in the gold layer. They have automated data pipelines running on scheduled infrastructure with alerting when something fails. They have role-level permissions that let each team see their relevant data without accessing data they should not.
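To make the layering concrete, here is a minimal sketch of bronze-to-silver-to-gold promotion. In practice this logic usually lives in dbt models or warehouse SQL rather than a pandas script, and the table names and cleaning rules here are hypothetical.

```python
# Minimal sketch of medallion-style promotion: raw rows (bronze) are
# cleaned and standardized (silver), then aggregated into a business-ready
# view (gold). Column names and cleaning rules are hypothetical.
import pandas as pd

def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Clean and standardize raw ERP rows."""
    silver = bronze.dropna(subset=["job_id", "amount"]).copy()
    silver["cost_code"] = silver["cost_code"].str.strip().str.upper()
    silver["posted_at"] = pd.to_datetime(silver["posted_at"])
    return silver

def to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Business-ready view: monthly cost by job."""
    return (
        silver.groupby(["job_id", silver["posted_at"].dt.to_period("M")])["amount"]
        .sum()
        .rename("monthly_cost")
        .reset_index()
    )
```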
They have started to layer AI on top of this infrastructure. Agents that can answer natural language questions about project financials. Scheduled reports that flag anomalies and land in the right inbox automatically. Workflows that connect data from multiple source systems to support decisions that previously required hours of manual research. And even with this setup, these teams still partner with Kroo to streamline data workflows and optimize their use of internal resources.
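One of those scheduled workflows can be as simple as flagging jobs whose cost variance crosses a threshold and routing the list to the right inbox. A hypothetical sketch; the column names and the 10% threshold are assumptions, not a prescription:

```python
# Hypothetical scheduled anomaly check: flag jobs whose actual cost
# exceeds budget by more than a threshold. Column names and the
# default threshold are assumptions.
import pandas as pd

def flag_overruns(jobs: pd.DataFrame, threshold: float = 0.10) -> pd.DataFrame:
    """Return jobs where (actual - budget) / budget exceeds the threshold."""
    variance = (jobs["actual_cost"] - jobs["budget"]) / jobs["budget"]
    return jobs.assign(variance=variance).loc[variance > threshold]

# In production this would run on a schedule, query the gold layer,
# and route the result to the right inbox or chat channel.
```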
The pattern is consistent: the AI tools came after the data infrastructure was solid. The teams that tried it in the other order, buying AI tools before fixing the data, have not gotten the results they hoped for.
The other thing that is consistent about the best data teams in construction: they understand when to buy and when to build their data infrastructure. They pay for pre-built data connectors for the common systems where the development work is not differentiated, build custom pipelines for the pieces that are specific to their business, and make deliberate choices about which parts of the stack to own versus outsource.
Why this all matters
The construction industry is in the middle of a significant shift in how its underlying data is used operationally. General contractors and specialty contractors are sitting on a goldmine of data, and our mission at Kroo is to help the industry tap into it.
The companies that are building the right data infrastructure now, with the right team structures to maintain and use it, are going to have meaningful advantages over the next decade.
This is not just because having data is interesting. Decisions get better when you have good data. Project selection, risk assessment, cost forecasting, cash management, workforce planning, and subcontractor qualification can all be data-driven decisions that directly affect margin and company success.
The companies that figure this out first are not necessarily the largest ones or the ones with the biggest technology budgets. They are the ones that have been honest about where their data actually is, made the organizational investment to bring it together, and built the habit of using it.
That process starts with understanding where you actually are. Which of the five team structures describes your company right now? What does your current stack actually look like? Where are the gaps between the data you have and the data you can actually use?
If you are a general contractor or specialty contractor focused on data and AI initiatives, these are questions worth answering.
Want to chat about your data stack or team structure? Or learn about how Kroo can help?
Request a Demo