Thoughts and Practices on Building a Data Science Team

The recent rise of ChatGPT reminded me of my short but valuable experience building a data science team in a large corporation, so I thought a blog post about it would be both useful and on trend 😎.

A few years ago I got a rare opportunity to look after a data science team that was just starting to grow in the company. The data science director at the time had left unexpectedly, and my boss asked me to temporarily carry on the mission until a replacement was found. I was quite excited about it (thanks boss! 😂), as this is an area that can potentially bring great value to the company, and it is a privilege to help build “the A team”.

Staffing: Train or Hire?

The first question we asked ourselves was: do we hire externally or train our current employees? The answer is obviously both! As the company did not have much data science competency at the time, injecting some external experience seemed an efficient way to get things going. However, there are things to be aware of: 1) a company with a strong culture is not necessarily good at “integrating” experienced professional hires; 2) domain knowledge is crucial to make sure the data science skills are actually applied to the “right problem” with the “right focus” and the “right methodologies” (we could look for hires who already have the right domain background, but they are not easy to find, as the whole industry is in demand of such talent). So we spent some energy making sure there was a structure to integrate them into the company, with proper checkpoints to follow through (although this is not specific to experienced data science hires). This worked out well: a few of them got up to speed quickly and later proved to be very effective in their roles.

With regard to training existing employees, the good news was that lots of people wanted to learn data science and get into this space, and they were mostly curious (an extremely critical quality) and smart engineers. Their existing domain knowledge also played a key role in helping to integrate the external hires. The only caveat is to be patient: training takes time and investment, and you cannot rush it or save on it. In the end, we had a team with complementary skills who worked well together.

Organization: Centralize or Embed?

Next, how should we place and organize them in the company so that their expertise can be best leveraged and grown? As we were just starting to build the team, we did not have many data scientists. We decided to centralize them first, with the following considerations: 1) provide proper leadership and support/spotlight from upper management; 2) serve as a flexible, shared resource across the much larger software organization; 3) form a critical mass and enable more effective knowledge sharing so the team can grow over time.

Of course, this setup comes with some downsides that we had to be aware of and manage properly. Firstly, a central team can be seen by the rest of the organization as “that special team”, and it can sometimes be hard to gain collaboration from the project teams. Secondly, it can be detached from the day-to-day of the projects, so project teams may think it does not understand “the problem to solve”, which puts its credibility in jeopardy. In reality, however, since there were so few data scientists to start with and there was a strong desire from the organization to grow this competency, in a sense they were special, and their energy and expertise needed to focus on a few critical projects anyway, so they could not cover every project.

The practice to overcome these challenges is to make sure that a) the team works very closely with a few selected projects where we see data science can bring lots of value; b) as the team grows, we build the right processes and tools to gradually extend its impact to more and more projects; c) the team spreads its data science knowledge to the rest of the organization. In hindsight, we did well in a few areas (e.g. defining the processes, focusing on a few key projects) and could have done more in others (e.g. providing training and expertise to the rest of the organization).

When I look at where we are after more than 5 years, a few things are evident: 1) the team has grown quite a bit compared to where we started, and has also suffered quite a bit from attrition (it is a hot market for such talent); 2) there is an established flow for how this team provides value to the rest of the organization, and its credibility has been partially established; 3) project teams have started to grow their own competency in this space with the help of the central team. At some point, this competency will become like any other software competency we have and will be fully part of the project teams, and the centralized team will no longer be needed. I guess that is the destination.
