A year as an engineering director: 2023 in review.

2023 started off with a calm transition from senior engineering manager into engineering director, but once it started rolling, boy did it escalate quickly.

I closed out 2023 by reviewing my notes and schedule throughout the year. I’ve pulled from those a handful of accomplishments and focus areas which illustrate what roles such as mine look like in a macro sense.

As I went through it, I noticed a sort of narrative arc in it. In January 2023, I was made an engineering director, and it’s cool to see my year break into two major chapters. The first deals with moving away from my role as a senior engineering manager while the second attends to the business of inhabiting the role of a director.

Transitioning away from Senior Engineering Manager.

January.

In January, my focus was largely on shoring up my team. With my promotion, I saw a need to move into a new layer of abstraction in the organization, which in turn meant adding structure and culture to the team so I could turn my attention away from it without worry.

I did this in several ways:

I started codifying the team’s culture into documents that articulated how and why I had made the decisions and designs I had. This gave them something to reference and explicitly work into their processes so that as they continued to evolve, they could stay true to what I’d intended (for as long as my past intentions matter against their realities, anyway).
I enacted a plan to improve the quality of the team’s documentation by writing out a prescription of what it needed to look like and then capturing tasks for closing the gaps. Over the upcoming months, we would use the on-call process to review these gaps and strengthen their docs.
One of our most senior team members, Catalina, was the sole expert in one of our most important systems, so we worked on a succession plan to identify and make progress on how to distribute those responsibilities better. (This is going to be very helpful for supporting her actual succession later on in the year.)
There were performance issues on the team at the time, which led us to improve our skills in supporting others while also building a stronger shared understanding of the concrete requirements of being on the team.

I also sent some holiday gifts to the team. To further reinforce the themes at the time, I sent some Lamy Safari fountain pens to everyone, accompanied by a demand that they dedicate more time to writing.

This seems unimportant at this point in the story, but remember it for later: in February, we held a Data Summit to bring together members and leaders across the org to discuss our data approach and team structures. It founded our new data organization, which we’d been rather quickly building out over the previous few months.

February.

I started February with a pretty intense urge to build something. Realizing some blind spots in my approach, I eventually decided to use Elixir and Phoenix to build an analytics platform off of our data in GitLab. This automation helped me assist the team with a number of hygiene, performance, and quality issues.

To further improve quality and sustainability on the team, we also launched our on-call process. This started mostly as a path to clean things up, add more monitoring to things, and make sure that other team members had enough of a starting place for understanding the details of the systems they owned.

In light of some issues we were having in achieving our product goals, I started finding oblique ways to interrogate our approach to planning and prioritization. Using Liminal Thinking and Thinking in Systems as guides, we changed parts of our approach fundamentally to take more authority and creativity over how we chose to solve our problems. I used this same approach to help other leaders in our organization to answer some fundamental questions about why teams at large did not seem to be able to achieve what was expected.

This month we also kicked off our new hiring process. I’ve written more about how our hiring panel works here.

March.

In March, I traveled with my merry band of friends to Tahiti and Moorea. It was a beautiful trip. The house we stayed in was right on the beach in Moorea, maybe 40 feet away from calm clear ocean water. In the distance, about half a mile away, we could see and hear the violence of the open ocean crashing against the reef. The whole thing was quite an experience.

Back in the office, we completed our reading of An Elegant Puzzle. This has been the most influential book on my work process to date. Accordingly, this also led to our creation of a fairly comprehensive engineering management training program. We never did launch it, but the outline alone is helpful for aligning EMs with the full range of topics in which we need them to be experienced.

Our Salesforce Operations team had been reporting through me for several months at this point. These activities helped me support their EM better by showing us what areas to put more focus onto in order to continue their journey in increasing their effectiveness. We realized we had some specialization gaps on the team and worked with the organization to shift some focus from the QA and product management teams to support this team.

April.

In April, I began reading Jade Rubick’s excellent blog, which led me to the concept of EM support teams. To further our interests in providing better support for our EMs, I made an attempt at standing these up at Nav, which has met with mixed results and is worth putting more energy into.

I began my obsession with YubiKeys and personal information security this month. I might have been a bit late on this, though: the world seems to be moving away from needing this technology in favor of passkeys. I’m still into it though.

I began preparing my own succession plan this month by training a new EM for the team I had directly reporting to me.

May.

The decision to hand off that team to a new EM was driven by a few factors:

I knew I was under-serving the EM and members of the SF Ops team by having another engineering team absorb so much of my focus
I knew that the organization was changing and I’d have more responsibilities to attend to rather quickly

The latter came true rather quickly. In May, the Data Engineering team started reporting through me after a reorg of our data teams. We needed them to be more attuned to the engineering organization and less siloed within the data teams’ priorities.

June.

Completing an effort I’d started in January, we presented our official version of operating principles for the team which I was handing off. I’ll have to write about this elsewhere, but a strong sense of principles is key to anything that requires a high degree of effectiveness.

After some discussion with my peers in other companies, I realized and resolved a blind spot in how I was supporting the engineers of this team. We didn’t have brag docs! We now have a review at the end of each month for people to take a moment to think through their efforts and capture them. We do this together, which allows them to discuss what went on and to remind each other of the great things they’ve done for one another. Capturing your accomplishments tends to be much more about your approach than the results of the project.

With all that done, and a few months of training under the belt, we finally migrated the team under Nick’s leadership as a brand new engineering manager.

Not a moment too early, either. I’ve had very little exposure to data engineering in my career, so getting to know that team and its pressures demanded a lot of attention. To further compound that fact, we decided as an organization that we needed to replace our existing data stack.

I set to work acquiring contracts for Snowflake, Fivetran, and dbt so we could migrate away from Redshift and home-rolled scripting. This was my first exposure to vendor acquisition.

I’d been tinkering with that analytics platform I’d started building earlier in the year at this point and most of my problems were starting to point to a need to have a platform for managing knowledge about an organization. Associating people with teams, teams with teams, and people with people is actually very complex when you start tracking that stuff over time. Just modeling it is hard. When you need to build and populate the historical data into it, though? Forget about it. I had to start building a whole new suite of tools just to perform and record my archaeological expedition into the history of the org. So that became my new hobby.
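To give a feel for why tracking this over time gets tricky, here is a minimal sketch in Python of one way to model it: each person-to-team association becomes a record with a start and an optional end date. This is not the platform I built (that was in Elixir); the `Membership` class, the helper functions, and the people and team names are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Membership:
    """One person on one team over a half-open date range [start, end)."""
    person: str
    team: str
    start: date
    end: Optional[date] = None  # None means the membership is still active

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)


def team_on(memberships: list[Membership], person: str, day: date) -> Optional[str]:
    """Return the team a person belonged to on a given day, if any."""
    for m in memberships:
        if m.person == person and m.active_on(day):
            return m.team
    return None


def roster_on(memberships: list[Membership], team: str, day: date) -> list[str]:
    """Return the sorted names of everyone on a team on a given day."""
    return sorted(m.person for m in memberships if m.team == team and m.active_on(day))


# Hypothetical history: "alex" changes teams mid-year, "sam" joins in March.
history = [
    Membership("alex", "payments", date(2023, 1, 1), date(2023, 6, 1)),
    Membership("alex", "data-eng", date(2023, 6, 1)),
    Membership("sam", "payments", date(2023, 3, 1)),
]

print(team_on(history, "alex", date(2023, 5, 31)))
print(roster_on(history, "payments", date(2023, 7, 1)))
```

Even this toy version forces decisions (half-open ranges, what an open-ended membership means), and it only covers people and teams. Team-to-team and person-to-person relationships would each need the same treatment.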

Transitioning into Engineering Director.

With three engineering managers reporting to me and no ICs, I found myself in a position to change my attention to Engineering at large. I made a point to set boundaries for how far of a team’s process and work I’d be willing to oversee, and started instead looking at the larger issues between teams, process, and culture within the organization.

July.

As I started working with the data engineering team, we quickly found ourselves in a spot where I needed to start training an interim EM.

We were also in a pinch as an organization, so I helped set up an emergency SWAT team sort of affair as an onsite in our Draper office. We called it “Code Yellow”. It was a pretty fun experience: I’ve rarely seen teams collaborate as dynamically and quickly as they did during it. It’s rather powerful to get a bunch of people with varied skills together, declare a specific goal, and give them complete freedom to self-organize and work toward that goal.

August.

August was mostly business-as-usual… aside from the layoff. I was also thrilled to get to move one of our most experienced ICs, Catalina, back into my direct reports. I’d been feeling a little too helpless about how difficult so many of our cross-team problems were to solve, and realized this reporting pattern would let us effect important change without having to deal with the bureaucracies of organizing multiple teams toward an effort.

September.

In September, we held another Data Summit. New data leader, new data organization. I think we’ve gotten it right this time. I don’t think there’s any particular failure mode at play that has led us to hold three data reorgs in nine months. I think that our organization of ~250 people is at exactly the right size for our data needs to be getting out of hand in comparison to the proportionally small group of data specialists that is reasonable for us to employ. These constraints just combine particularly awkwardly!

Not directly related, another engineering leader left the organization and we had to move several teams under new leadership. In this change, I picked up a team that is responsible for a key part of our data platform, which we call the Business Identities team.

Now with four EMs and a staff engineer reporting to me, I decided it was probably a little past due to begin holding my own staff meetings.

Entirely unrelated to my earlier interests in the year, we also started discussions with a work analytics platform called Jellyfish. They pull data from GitLab, Jira, Confluence, and Google Calendar to shed light on all sorts of interesting things. I sorta lost interest in building my own platform for this activity since theirs is so good at helping EMs better understand their teams.

Back during Code Yellow, we had a side conversation amongst engineering leaders about the gradual decay of our API abstractions—our systems were leaking way too much of their responsibilities into each other and it was slowing us down. To resolve this, I helped found our “API Working Group”. For us, a working group is a temporary team of volunteers who are tasked with solving a specific problem. We’ve taken our approach for building these from Will Larson’s advice on scaling technical consistency. The API WG’s scope was to identify the responsibilities and boundaries for core systems so we could begin reconciling discordant responsibilities in a central place rather than requiring client systems to make their own choices about integrations and priorities.

October.

In October we started helping my staff engineer fulfill her increased scope of responsibility by cross-training her into the data engineering team.

We were also seeing that our hiring process was stumbling on the abstractions we’d injected into the process with the hiring bar we’d started at the beginning of the year, so I helped build a training discussion for our interviewers so they could experience what our hiring barometers see during the hiring bar discussions. I produced an example set of interviewer feedback submissions using ChatGPT and we used that to conduct a fake panel review on an imaginary candidate. It was a very illuminating discussion!

As I got deeper into the data stack through my efforts with the data engineering and business identities teams, I founded our Data Governance Board to address some of the problems I was uncovering. This is wholly new territory for me but I’m excited to get to support the org in this way. It’s been difficult to make and understand decisions about how to manage data in the past.

November.

As part of the data governance process I’ve been trying to establish, we need a robust approach to describing our data and policies. Prior data leaders had gone through the terrific trouble of getting a DataHub instance running internally, but we hadn’t adopted it widely yet. I wanted to start leaning heavily on it, but we were considerably constrained in how we could support its underlying technology. It’s quite complex. I started down a path of getting a SaaS version which is operated by Acryl, but the timing ended up being poor and vendor discussions are currently stalled.

We officially stood up, launched, and trained on our Jellyfish instance, though.

My staff engineer and I began working this month on some of the key stuff that we’d intended for when we joined forces. Some parts of the primary platform are just completely broken in ways that no one team can confidently fix and consequently can’t prioritize. The perfect sort of work for two highly tenured engineers to take on.

December.

Okay, one last data-related reorg and we can be done with that, right? Early in December I consolidated our machine learning engineers into a team that I’m leading directly, which I’ve termed Machine Learning Operations. ML Ops is going to focus on making decisions that scale across our product engineers easily to support their integration of machine learning models into production systems. You can learn more about my strategy in this writeup.

In an attempt to solve a number of organizational issues explored above, I introduced two engineering design documents to the leaders across engineering:

APIs must surface domain-level, not implementation-level, concepts and clients must endeavor to couple only to domain concepts
Teams must own complete domain-level problems and concomitant data, not just the solutions that they prefer or have built
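As a rough illustration of the first principle, here is a minimal Python sketch of an API that surfaces a domain concept while keeping its storage layout private. Everything in it is hypothetical: the `Business` type, the `BusinessDirectory` interface, and the `nm_legal` column name are invented for the example and are not taken from our systems.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Business:
    """Domain-level concept that clients are allowed to couple to."""
    business_id: str
    legal_name: str


class BusinessDirectory(Protocol):
    """Domain-level API: speaks only in terms of businesses."""
    def find_business(self, business_id: str) -> Optional[Business]: ...


class InMemoryBusinessDirectory:
    """One possible backend. Its row layout is an implementation detail."""

    def __init__(self, rows: dict[str, dict[str, str]]) -> None:
        self._rows = rows  # hypothetical storage shape with cryptic column names

    def find_business(self, business_id: str) -> Optional[Business]:
        row = self._rows.get(business_id)
        if row is None:
            return None
        # Translate the storage shape into the domain concept at the boundary.
        return Business(business_id=business_id, legal_name=row["nm_legal"])


def describe(directory: BusinessDirectory, business_id: str) -> str:
    """Client code: depends on the domain API, never on column names."""
    business = directory.find_business(business_id)
    return business.legal_name if business else "unknown business"


directory = InMemoryBusinessDirectory({"b-1": {"nm_legal": "Acme LLC"}})
print(describe(directory, "b-1"))
```

Because `describe` only knows about `Business`, the owning team could rename `nm_legal` or swap the backend entirely without touching any client.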

Between these two things, we should be able to create salient and robust abstractions across our sociotechnical systems which allow a significant increase in creativity and ownership within teams. It’ll be a tough road to actually execute on any of this stuff, but we must. When we come back to this at the end of 2024, I expect to see that I’ve put a lot more work into establishing these things and moving our attention onto other things more worthy of our time.