I recently came across a Stack Overflow thread discussing the ideal QA-to-developer ratio for software development teams. According to the commenters, teams range from having no manual testing at all to a 1:1 developer-to-QA ratio. My first thought was:

"Can we find a way to estimate how many QA engineers are needed based on a set of criteria that can be measured in any software project?"

After scanning the thread and reviewing a few other resources, I noticed patterns in how people justified more or less QA. Here are the variables I extracted:

Number of Developers - Self-explanatory

Number of lines of code - A naïve way to estimate code complexity and application size

Danger of Failure - How nasty would a bug be if introduced? A frustrating day at the office or lives lost?

Team Maturity - How experienced is the team, both with each other and with software development? A group of new grads, or a seasoned team that has worked together for years?

Degree of Change in Codebase - How volatile is the development effort? Is the product in beta and still has many moving pieces, or is it a mature product with predictable development?

Use of Existing Libraries - If you're building an application that relies on third-party code, you won't need to invest as much time in testing, because the codebases you're leveraging have (hopefully) already been tested.

Amount of Unit Testing - How much unit test coverage is there? None? 100%?

I experimented with these parameters in Google Sheets, using RANDBETWEEN() to generate values like Lines of Code and Number of Developers. Then I used my judgment to label the number of QA engineers each hypothetical project would need.

CSV output of manual classification of QA engineers needed
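For readers who'd rather script this than use a spreadsheet, here's a minimal Python sketch of the same idea. The variable names and value ranges below are my assumptions for illustration, not the exact ones from my sheet:

```python
import random

# Hypothetical ranges for each project variable (the ranges in my
# actual sheet may have differed).
FEATURES = {
    "num_developers": (1, 50),
    "lines_of_code_k": (1, 500),       # thousands of lines
    "danger_of_failure": (1, 5),       # 1 = annoyance, 5 = lives at risk
    "team_maturity": (1, 5),
    "degree_of_change": (1, 5),
    "uses_existing_libraries": (0, 1),
    "unit_test_coverage_pct": (0, 100),
}

def random_row(rng=random):
    """Mimic Google Sheets RANDBETWEEN() for one project row."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in FEATURES.items()}

# ~200 rows, ready to be hand-labeled with a QA-engineer count.
rows = [random_row() for _ in range(200)]
```

Each generated row then gets a manual label, exactly as I did in the sheet.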

With ~200 rows manually classified, I brought the idea to my friend Aviv, who thought we could fit a model to the dataset to predict the number of QA engineers needed for any situation. After importing the data into a Jupyter notebook, he trained a model using TensorFlow and exported the results to CSV.
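Aviv's actual notebook used TensorFlow; to show the shape of the approach without the full setup, here's a sketch of the same regression idea using scikit-learn. The data and labeling rule are synthetic stand-ins, not our real dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the ~200 hand-labeled rows.
rng = np.random.default_rng(0)
X = rng.integers(1, 50, size=(200, 7)).astype(float)  # 7 project variables
y = 0.1 * X[:, 0] + 0.5 * X[:, 2] + 0.3 * X[:, 4]     # toy labeling rule

# Fit a regression model, then predict QA head count for a new project.
model = LinearRegression().fit(X, y)
new_project = rng.integers(1, 50, size=(1, 7)).astype(float)
qa_needed = model.predict(new_project)[0]
print(round(qa_needed, 1))
```

The real notebook's model was more involved, but the workflow is the same: features in, a single "QA engineers needed" number out.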

Here's a pair plot of the model's output. You can see the full report included in the notebook linked above.

TensorFlow Model Output

The model correctly discovered that when I manually labeled the Number of QA Engineers needed, I mainly leaned on Number of Developers, Danger of Failure, and Degree of Change in Codebase.
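One way to see this kind of effect is with feature importances from a tree-based model. Below is a hedged sketch on toy data where, by construction, only columns 0, 2, and 4 drive the label (the column order and labeling rule are assumptions for illustration, not our dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: only columns 0, 2, and 4 influence the label,
# mirroring the three variables the model keyed on.
rng = np.random.default_rng(1)
X = rng.integers(1, 50, size=(200, 7)).astype(float)
y = X[:, 0] + 2 * X[:, 2] + X[:, 4]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Rank features from most to least important; the three
# informative columns should rank highest.
ranked = np.argsort(model.feature_importances_)[::-1]
print(ranked[:3])
```

If the hand labels really did hinge on three variables, a check like this will surface them, which matched what the pair plot showed.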

Conclusions

This was an initial attempt to explore whether QA engineer hiring decisions could be data-driven. I wouldn't put too much weight on the final model, but it was a fun learning exercise! If I took this work further, I would dive deeper into the variables and devise a way to weight each one appropriately. I would also consider the whole QA testing strategy, including hiring SDETs (Software Development Engineers in Test), offshore QA resources, and other third-party testing teams.

What do you think? Let me know in the comments 😁

About John Solly

I am a Senior Software Engineer with a focus on geospatial applications, based in the Columbus, OH metropolitan area. This blog is where I delve into the intricacies of GIS (Geographic Information Systems), offering deep dives into different components of the geospatial technology stack. For those who share a passion for GIS and its applications, you've found a spot to explore and learn.

Interested in collaborating or learning more about my work? Take a look at my portfolio for a showcase of my projects and expertise.
