Distributed development within the Samba project


In its more than thirty-year history, the Samba team has repeatedly faced the challenge of uniting commercial interests, some of them even conflicting, with open source interests in one project. What methods make this work at all?

by Volker Lendecke

This article was originally published in German in iX 6/2022 "Open source project work", p. 60 (article on heise.de).

The open source project Samba is considered the standard suite of tools for interoperability between Windows and Linux/Unix. This makes for a huge market of potential clients, which is interesting to developers both commercially and personally. However, it also means that Samba has many contributors who need to agree among themselves on the direction of the project.

A short anecdote illustrates what holds the project together to this day: in 1991, Andrew Tridgell wanted his MS-DOS computer to exchange files simultaneously with a server running DEC Pathworks and with a Sun workstation. The Pathworks and PC-NFS clients each brought their own, mutually incompatible TCP/IP stack, so only one of the two could be used at a time. He therefore wrote a server process for the Sun machine that emulated the Ultrix Pathworks server. What Tridgell didn't realize was that the Pathworks server was descended from OS/2 LAN Manager, which would later provide file sharing in Windows. As a result, Samba, which did not yet have that name, was compatible with a great many clients right from the start.

It is to Andrew Tridgell's credit that he opened up his "Server 1.0" to a community. With the Samba team, he created a structure that allows other interested parties to participate in the project as equals. Andrew has since retired from active development, but the Samba team continues to function. My personal history with Samba began when I was able to replace the NetWare 3 server in my parents' business with something I had compiled myself.

How the Samba team works today

Apart from developing new features, typical work on Samba mostly consists of fixing bugs. This happens through the community channels: bugzilla.samba.org, which is open to everyone, and the Samba mailing lists. These channels rely entirely on voluntary contributions, which works with mixed success. Obvious, easy-to-fix bugs are usually resolved quickly. But when things get more complicated, community bugs often don't get the attention users would like.

Then money enters the equation: a look at Samba's contributors on GitLab shows that employees of Catalyst, IBM (Red Hat), SerNet, and SUSE (in alphabetical order) contribute the majority of patches to the project. These companies all have a commercial interest in Samba, and some of them compete with each other. Google, employer of prominent Samba contributor Jeremy Allison, is a special case: IBM's commercial interest in Samba can be justified much more directly through its Red Hat Enterprise Linux product than any benefit Google as a company derives from its involvement.

The four companies mentioned above offer paid support for problems with Samba. Other companies can also register at www.samba.org under the menu item "Support Samba". If a customer reports a problem with Samba to one of the companies listed there, a developer gets to work on it. Since Samba is very complex and multifaceted, no single developer can keep track of everything. This leads to the central question this article examines: communication within the team. As a developer, whom do you ask when you get stuck on a problem, and through which channels?

30 years of growing complexity

The complexity of Samba stems from the fact that, since its early beginnings as a pure SMB server, it has grown over the past 30 years into a family of components with their own implementations of almost every protocol that matters in the Active Directory world. From DNS, LDAP and Kerberos through SMB1/2/3 to MS-RPC with its dozens of sub-protocols, Samba serves clients with its own servers. Not all of these servers can come close to rivaling the alternatives in performance or scalability.

For example, a well-tuned OpenLDAP server runs circles around Samba's LDAP server in terms of memory requirements and requests per second. However, compatibility with existing clients is the most important factor in these "make or buy" decisions. Microsoft AD adheres closely to the RFCs for standard protocols such as LDAP and Kerberos, but also takes full advantage of these protocols' extensibility.

When working with the code, the commands git blame and git whatchanged help find out whom to ask about a particular problem. For each line of code, git blame shows who last changed it, and the complete version history is available for each file. This lets developers see exactly who the experts on a particular area of the code are.
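Part of this lookup can be automated. The following Python sketch tallies, for one file, how many lines each author last changed, as a rough pointer to whom to ask. It assumes the machine-readable output of git blame --porcelain, in which a commit's author header appears only the first time that commit shows up; the function names are made up for this illustration and are not part of any Samba tooling.

```python
import re
import subprocess
import sys
from collections import Counter

# Header of each blamed line: "<sha> <orig-line> <final-line> [<lines-in-group>]"
SHA_LINE = re.compile(r"^([0-9a-f]{40}|[0-9a-f]{64}) \d+ \d+")


def count_lines_by_author(porcelain: str) -> Counter:
    """Tally how many lines each author last changed, from git blame --porcelain output."""
    authors: dict[str, str] = {}  # commit sha -> author name (printed once per commit)
    counts: Counter = Counter()
    current = None
    for line in porcelain.split("\n"):
        if line.startswith("\t"):
            # The blamed source line itself: credit it to the current commit's author.
            counts[authors.get(current, "unknown")] += 1
        elif (match := SHA_LINE.match(line)) is not None:
            current = match.group(1)
        elif line.startswith("author ") and current is not None:
            authors[current] = line[len("author "):]
    return counts


def blame_authors(path: str, repo: str = ".") -> Counter:
    """Run git blame on one file and return the per-author line counts."""
    result = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain", "--", path],
        capture_output=True, text=True, check=True,
    )
    return count_lines_by_author(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    for author, lines in blame_authors(sys.argv[1]).most_common(5):
        print(f"{lines:6d}  {author}")
```

Called with a file path inside a Samba checkout, the script prints the five authors with the most surviving lines. Such counts only show who touched the code last, not why it looks the way it does.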

From this microscopic view, however, it is often not clear why a piece of code works the way it does. Is a seemingly incomprehensible piece of code as complex as it is for good reasons, or is the complexity accidental? The question "Is this art or garbage?" arises in far more places in Samba than outsiders might think.

Who knows?

To discuss such design issues, developers need to talk. For insiders, it's fairly easy to find just the right person: the Samba team is a rather old, well-connected club. Of last year's top 20 contributors according to openhub.net, 16 have been active for more than five years, 11 for more than ten years, and three for more than 20 years.

Before the COVID-19 pandemic made it impossible, much of the Samba team (whoever had time) met in person twice a year: in spring at SambaXP in Göttingen, Germany, and in fall at the SNIA SDC in Santa Clara, California. On these occasions, the team discussed the few matters it has to decide as an organization within the Software Freedom Conservancy (SFC), and people got to know each other better at Pedro's "Chips and Salsa" within sight of Intel. When people know each other personally, purely electronic communication is much less likely to lead to misunderstandings. Smileys can in no way replace a genuine smile or a startled look; at best they can complement them.

SambaXP in Göttingen, one of the Samba team's in-person meetings (picture from 2012).

Thus, those who have been working on Samba for many years, and who have also followed the work of their teammates, know who is the right person to talk to for which area of expertise. Still, you usually start by talking to colleagues within the company with whom you have regular team meetings anyway.

The choice of communication channel is flexible: whatever is available at the time gets used, whether e-mail, chat, telephone or video conferencing. For chat, the team used to rely on IRC, on freenode or its own IRC server; today it uses Matrix. For direct phone calls, it is quite convenient that most Samba developers are concentrated in two hotspots, Australia/New Zealand and Germany, each spanning only neighboring time zones.

Changes to Samba

From a technical perspective, you become a member of the Samba team when you get an account on the server that performs continuous integration (CI) and, after a successful CI run, commits the patches to the master repository.

Nowadays, you would probably use GitLab for this. But Samba has been practicing CI since before the term existed: what is now called CI went by the names autobuild and build farm at Samba. The autobuild system still exists today and forms the basis for the CI pipelines on GitLab, which have replaced the build farm.

Every patch in Samba must be reviewed by two members of the Samba team. If a team member writes a patch, the author counts as one of the two, so they must find one other member to add a "Reviewed-by:". In principle, the same rule applies to contributions from non-members: here, too, two team members must add their "Reviewed-by:".
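The review rule can be expressed as a small check. The following Python sketch is a hypothetical illustration of the rule as stated here, not Samba's actual tooling: it assumes reviewers are identified by the e-mail address in their "Reviewed-by:" trailer, uses an invented team roster, and counts a team-member author as one of the two required approvals.

```python
from dataclasses import dataclass

# Hypothetical roster of team members, identified by e-mail address.
TEAM = {"alice@samba.example", "bob@samba.example", "carol@samba.example"}
REQUIRED_APPROVALS = 2


@dataclass
class Patch:
    author: str   # e-mail address of the patch author
    message: str  # full commit message including trailers


def trailer_emails(message: str, key: str) -> set[str]:
    """Collect addresses from trailers like 'Reviewed-by: Name <addr>'."""
    emails = set()
    prefix = key.lower() + ":"
    for raw in message.splitlines():
        line = raw.strip()
        if line.lower().startswith(prefix) and "<" in line and line.endswith(">"):
            emails.add(line[line.rindex("<") + 1 : -1].lower())
    return emails


def has_enough_reviews(patch: Patch, team: set[str] = TEAM) -> bool:
    """True if at least two distinct team members vouch for the patch."""
    approvals = {e for e in trailer_emails(patch.message, "Reviewed-by") if e in team}
    author = patch.author.lower()
    if author in team:
        # A team member's own patch counts as the first approval.
        approvals.add(author)
    return len(approvals) >= REQUIRED_APPROVALS
```

Collecting approvals in a set means a team member who adds a "Reviewed-by:" to their own patch is still counted only once.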

Until a few years ago, a patch to Samba took a path similar to one for the Linux kernel: it was sent by mail to the samba-technical list, publicly commented on and reviewed there, and then submitted to the Samba team's autobuild system. Except for very simple patches, however, it is virtually impossible these days to get a patch through the many thousands of tests in the autobuild system on the first try. For the Samba team, this means that pushing a patch from an external contributor is potentially a lot of work, because at the very least someone has to relay the autobuild errors to whoever is developing the patch.

Requiring external contributors to do their own autobuild run isn't easy either: the Samba team's autobuild server, with 32 cores and 128 GB of RAM, takes up to two hours for a complete cycle. While that is not a huge demand by today's server standards, it is significantly more than a typical laptop offers.

Another disadvantage of the mailing-list approach is that patches can be forgotten if no one actively takes care of them. After all, there is no central list of patches waiting for comments. Having to keep pushing for attention to their own work can be frustrating and off-putting for new contributors.

New ways via GitLab

To take work off the Samba team's plate and lower the hurdles for external contributors, the team started building a presence on GitHub in order to use its CI infrastructure. Concerns about relying on non-free infrastructure then shifted the focus to GitLab.

So if someone wants to contribute patches to Samba these days, the way to go is a fork on GitLab. However, since Samba's CI pipeline is far too large for free accounts, the team offers contributors the option of pushing their work to a repository owned by the Samba team. The team can grant access to this repository relatively freely because the master repository is not on GitLab but on the internal CI server, to which only Samba team members have access.

Once the pipeline has passed after a push, a merge request is created and then reviewed and commented on. As of March 2022, however, there is no automated mechanism yet for GitLab to push directly to the Samba master repository; a Samba team member still has to perform this step manually.

Conflict discussion and resolution

Of course, Samba has not gone through its three decades of development without conflict either. If one had to name a single mechanism for resolving conflicts, it would be this: working code trumps everything. Samba has an enormous amount of working code, and the hurdle for improving or even replacing it can be extremely high. Two conflicts in the evolution of Samba serve as examples: Samba TNG and Samba 3/4.

Samba TNG was an attempt to align Samba's infrastructure based on Distributed Computing Environment / Remote Procedure Calls (DCE-RPC) with the internal structure of Windows. A Windows domain member talks to its domain controller using the DCE-RPC protocol family. DCE-RPC is a competitor to Sun's ONC-RPC (Open Network Computing), on which NFS and NIS are based, and originally had nothing at all to do with SMB. To implement a user database analogous to the NetWare Bindery, Microsoft needed a flexible framework for easily transferring complex structures over a network. DCE-RPC was a good choice because it was not tied to TCP/IP as its transport protocol and could just as well run over IPX, or over SMB via NetBEUI. So in order to become a member of a Windows domain, let alone implement a Windows domain controller, Samba had to implement DCE-RPC.
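The transport independence that made DCE-RPC attractive can be illustrated with a short Python sketch. The marshalling function is only loosely modeled on NDR, the DCE-RPC data representation (little-endian integers as Windows uses them, a wide string preceded by maximum count, offset and actual count, padding to 4-byte boundaries), and it ignores NDR's pointer rules. The echo transport is a hypothetical stand-in for a real network; the point is that the calling code only needs send and recv, whichever protocol carries the bytes.

```python
import struct
from typing import Protocol


def marshal_user(rid: int, name: str) -> bytes:
    """Encode {uint32 rid; wide string name} in a simplified, NDR-like layout."""
    buf = bytearray(struct.pack("<I", rid))
    chars = (name + "\x00").encode("utf-16-le")  # the string length includes the NUL
    count = len(chars) // 2                      # length in UTF-16 code units
    buf += struct.pack("<III", count, 0, count)  # max_count, offset, actual_count
    buf += chars
    buf += b"\x00" * (-len(buf) % 4)             # pad to a 4-byte boundary
    return bytes(buf)


class Transport(Protocol):
    """All the RPC layer needs from a transport: move bytes back and forth."""
    def send(self, data: bytes) -> None: ...
    def recv(self) -> bytes: ...


class EchoTransport:
    """Hypothetical stand-in for TCP/IP, IPX or SMB: returns what was sent."""
    def __init__(self, label: str) -> None:
        self.label = label
        self._pending: list[bytes] = []

    def send(self, data: bytes) -> None:
        self._pending.append(data)

    def recv(self) -> bytes:
        return self._pending.pop(0)


def roundtrip(transport: Transport, rid: int, name: str) -> bytes:
    """Marshal a request and push it through whatever transport is given."""
    transport.send(marshal_user(rid, name))
    return transport.recv()
```

The labels ncacn_np (SMB named pipes) and ncacn_ip_tcp (TCP/IP) are DCE-RPC protocol sequence names for two such transports; here they only serve as tags, and the same marshalled bytes pass through either one unchanged.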

One of the driving forces behind the development of DCE-RPC in Samba was Luke Kenneth Casson Leighton, who even wrote a book about it (see "Source"). His work was instrumental in enabling Samba to participate in Windows domains both as a member and as a domain controller.

However, he was unable to convince the rest of the Samba team of some of his ideas because they were too far ahead of their time. This was back when forking a project took more than a click on a GitHub page. That is how Samba TNG came about, which Luke and a few fellow contributors maintained for a few years. In the end, Samba TNG could not muster enough resources and was discontinued.

Luke had the right ideas, but they were very radical for their time, and Samba was already functioning as a domain controller. That the architecture of Samba's RPC services was completely "wrong", i.e. different from Windows, mattered less in the evaluation than the fact that the code worked reasonably well. It was not until version 4.16, more than 20 years later, that the Samba team implemented the main ideas of Samba TNG: providing MS-RPC servers as separate programs that communicate via sockets.
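The architecture mentioned last, RPC servers as separate programs reached via sockets, can be sketched in a few lines of Python. To keep the example self-contained, threads connected by socket.socketpair() stand in for separate processes and Unix domain sockets, and the length-prefixed framing is an assumption made for this illustration; it does not reflect the actual protocol between Samba's daemons.

```python
import socket
import struct
import threading


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message framed with a 4-byte little-endian length prefix."""
    sock.sendall(struct.pack("<I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        data += chunk
    return data


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def rpc_helper(interface: str, sock: socket.socket) -> None:
    """Hypothetical server for one RPC interface; a separate program in the real design."""
    with sock:
        while True:
            try:
                request = recv_msg(sock)
            except ConnectionError:
                return
            send_msg(sock, interface.encode() + b":" + request)


class Dispatcher:
    """Front end that forwards each request to the helper for its interface."""

    def __init__(self, interfaces: list[str]) -> None:
        self._sockets: dict[str, socket.socket] = {}
        for name in interfaces:
            front, back = socket.socketpair()
            threading.Thread(target=rpc_helper, args=(name, back), daemon=True).start()
            self._sockets[name] = front

    def forward(self, interface: str, request: bytes) -> bytes:
        sock = self._sockets[interface]
        send_msg(sock, request)
        return recv_msg(sock)

    def close(self) -> None:
        for sock in self._sockets.values():
            sock.close()
```

The front end knows nothing about the interfaces beyond their names (samr and lsarpc are real MS-RPC interface names, used here only as keys), so each helper can be developed, restarted or replaced on its own.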

A second SMB server

The second major development that ended up at a dead end was Andrew Tridgell's attempt to rewrite the SMB server from scratch. After years of working on Samba, he began developing a new SMB server with an improved architecture based on his experience with the SMB protocol: one that works asynchronously and, above all, implements the SMB protocol correctly. Because SMB1, still current at the time, is not satisfactorily documented to this day (and probably never will be), he put great effort into writing tests. They checked every conceivable aspect of the protocol against existing Windows servers, and he then implemented the observed behavior exactly the same way in his new server.
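The testing approach described here, replaying the same requests against a reference implementation and a new one and comparing the results, is often called differential testing. The following Python sketch shows the idea with toy "servers"; the case-insensitive file name lookup is only an illustrative behavior, and the status strings borrow NTSTATUS names. In Samba itself, this role is played by the smbtorture test suite, which sends real SMB requests over the network.

```python
from typing import Callable

# A "server" is modeled as a function from request to response.
Server = Callable[[str], str]

# Toy file system shared by both implementations.
FILES = {"foo.txt"}


def windows_like(name: str) -> str:
    """Toy model of the reference server: file names are looked up case-insensitively."""
    return "STATUS_SUCCESS" if name.lower() in FILES else "STATUS_OBJECT_NAME_NOT_FOUND"


def naive_server(name: str) -> str:
    """A first attempt that compares file names case-sensitively."""
    return "STATUS_SUCCESS" if name in FILES else "STATUS_OBJECT_NAME_NOT_FOUND"


def differential_test(requests: list[str], reference: Server,
                      candidate: Server) -> list[tuple[str, str, str]]:
    """Replay each request against both servers and collect every disagreement."""
    mismatches = []
    for request in requests:
        expected, actual = reference(request), candidate(request)
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches


if __name__ == "__main__":
    reqs = ["foo.txt", "FOO.TXT", "bar"]
    for req, want, got in differential_test(reqs, windows_like, naive_server):
        print(f"{req}: expected {want}, got {got}")
```

Run as a script, it reports the single disagreement: FOO.TXT: expected STATUS_SUCCESS, got STATUS_OBJECT_NAME_NOT_FOUND. Each such mismatch is a behavior the new implementation has to copy.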

Andrew's developments led to a working second implementation of an SMB server, which is still present in Samba source code today, built into every developer build, and used in tests. However, this second implementation has not managed to completely replace the existing code, despite Andrew's attempts to convince the rest of the Samba team that the architecture is undoubtedly much better.

Once again, the reason is that the existing code worked reasonably well and too many users depended on it. So here, too, working code is paramount.

Over the years, Andrew's focus shifted from building a correct SMB server to building a complete test suite for the SMB protocol. As part of that work, he also decoded and implemented the SMB2 protocol introduced with Windows Vista. Some time later, he and others developed their own DNS and LDAP servers and combined them with Heimdal's Kerberos Key Distribution Center (KDC) to create an Active Directory domain controller. This AD domain controller worked as such, but it was architecturally tightly coupled to his new SMB server. The SMB server faction of the Samba team considered it too risky to replace the working smbd. So the team merged the two development branches back together to present a cohesive product.

Conclusion

Samba, while a very complex project, has a very simple focus: compatibility with existing clients and servers. This makes the question of right or wrong code easy to answer: does it work with Windows, macOS and Linux clients? Within this framework, there is plenty of room for development, even if it sometimes leads to a dead end.

Source

  • Luke Kenneth Casson Leighton; DCE/RPC Over SMB: Samba and Windows NT Domain Internals; Macmillan Technical, 2000