5 Comments
Sebastian Pokuciński's avatar

Hi Adam!

Great post, thanks for sharing.

My question on this topic is: how much did it cost to run all the agents, environments, auto-configurations, etc.?

AI-driven development replaces the work of a programmer with the work of an agent, and nowadays agents are not free. Are you able to compare (in percentages, maybe?) how much you saved in costs? The time savings seem obvious, but I am curious whether the token costs are bearable.

Adam Polak's avatar

Hey Sebastian!

Actually, we have all of that data :)

So far, the TCO of AI in that project is less than $200 (roughly $60-70 a month). However, there is a catch here...

#1 - We are using GitHub Copilot, and it might sound insignificant, but in reality it impacts our spending a lot :) We don't pay for "tokens" but for something called "Premium Requests," which is not bound directly to tokens, hence we pay less than we would with Cursor or Claude Code (at some point we compared, and the same output would probably cost us 3-4 times more in Cursor and 2-3 times more in Claude Code).
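A quick back-of-the-envelope sketch of what those multiples mean per month (purely illustrative numbers derived from the figures above, not measured billing data):

```typescript
// Illustrative cost comparison, assuming Copilot at ~$60-70/month as the
// baseline, Claude Code at roughly 2-3x, and Cursor at roughly 3-4x.
const copilotMonthly = 65; // midpoint of the quoted $60-70/month

const estimates: Record<string, number | [number, number]> = {
  "GitHub Copilot": copilotMonthly,
  "Claude Code (est. 2-3x)": [copilotMonthly * 2, copilotMonthly * 3],
  "Cursor (est. 3-4x)": [copilotMonthly * 3, copilotMonthly * 4],
};

for (const [tool, cost] of Object.entries(estimates)) {
  const label = Array.isArray(cost) ? `$${cost[0]}-$${cost[1]}` : `$${cost}`;
  console.log(`${tool}: ~${label} per month`);
}
```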

#2 - The default Copilot settings/agents are not on par with the competition (that's the trade-off for it being relatively cheap), hence we have our own set of agents/skills, etc. - https://github.com/TheSoftwareHouse/copilot-collections

In general, if we were to do the same thing with a usual team, it would be significantly more expensive - probably 2-3 times. However, to reach this point, we also had to invest a lot into building a workflow that doesn't hallucinate as much (it always will to some degree) and delivers quality we can be fully confident in.

The harsh reality - the team behind Copilot Collections has 3-5 members constantly adding features, improving DX, etc.

Sebastian Pokuciński's avatar

Nice! I expected waaaay more. But the crucial thing you mention is the mindset and the preparation of the whole workflow. I have the feeling it will take a lot of time and effort for this 'new era' of development to actually become the default one.

An updated version of 'trash in - trash out': 'trash codebase in - trash and hallucinated AI output'.

About #1 - I fully agree, the Copilot-based pricing is much better than the typical API-key approach. I observed the same thing: a separate Claude Code CLI burned through the daily and weekly limits in a couple of hours, while Copilot in VS Code works fluently for much longer without any issues and uses significantly fewer 'premium requests', even with the 3x multiplier.

#2 - I'll have to give it a try.

Enterprise Tech Delivery's avatar

Really interesting read - I love the experimental, PoC-type approach to the work. I wonder to what extent small batch sizes coupled with smaller stories would reduce the review bottleneck, and whether speeding up the feedback loop on smaller pieces could create a predictable cycle (i.e. stories small enough to code in 30 minutes, with 30 minutes allotted for code review). Thanks for sharing your experience.

Adam Polak's avatar

In theory, a smaller task would make code review easier, but in our case the overall QA process was also a bottleneck - for example, QA testing on the environment.

With more, smaller tasks (each branched separately), you would need a good preview-environment strategy to handle that efficiently, so sometimes it is better to merge a few features first.

There is one more thing we found with the AI-Native workflow: the price of having a repeatable flow.

No matter what size the task is, some elements will always be part of the process - context building, code analysis, E2E implementation, QA, etc. There is a fixed cost of the AI preparing to implement the task, regardless of its size.

Hence, there is a balancing game - if the task is too small, the majority of the time is consumed by the preparation and quality-review phases. If the task is too big, the code review might take too long.
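To make the trade-off concrete, here is a tiny sketch of that balance (the overhead and review figures are hypothetical, chosen only to show the shape of the curve):

```typescript
// Every task pays a roughly fixed "preparation" cost (context building,
// code analysis, E2E setup, QA), while review effort grows with task size.
const fixedOverheadMins = 45;       // assumed per-task preparation + QA overhead
const reviewMinsPerStoryPoint = 20; // assumed review effort per unit of size

function overheadShare(taskSizeInPoints: number): number {
  const review = taskSizeInPoints * reviewMinsPerStoryPoint;
  return fixedOverheadMins / (fixedOverheadMins + review);
}

for (const size of [1, 2, 5, 8]) {
  console.log(`size ${size}: ~${(overheadShare(size) * 100).toFixed(0)}% of the time is fixed overhead`);
}
// Tiny tasks are dominated by the fixed overhead; very large tasks shift the
// cost into review time instead, so each project has its own sweet spot.
```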

You have to find that balance project by project :)