I am a Unix/Linux infrastructure engineer, a university-educated sysadmin of the old lore. In my 25+ years of sysadmining and devopsing I have never seen a case where the craft was isolated from business decisions, service considerations, efficiency in delivery or developer enablement. The job included all of the above, quite a bit of coding and then some. We old-school sysadmins would take all the ingredients and GLUE them together into a cohesive, coherent service, all the while working with open source tools because the tooling budget was almost always nil. This process of creation is what I call GlueOps, a holistic view of the modern IT landscape.
Never was a decision made on its technological merit alone. It has always been: “do it for free, or as close as you can get”. This is not a limitation, it is the springboard of innovation. The greatest boon to humanity in the last (20th) century has been open source software. Millions of spectacular minds spent significant amounts of time collaborating to produce incredible software. The GNU project and the Apache and Linux foundations, to name a few, and a host of universities published code that created the Internet boom as we now know it. Some were duds (Gopher), some remained in the stone age but are still useful (FTP) and some bloomed into thousands of viable systems (Linux).
In GlueOps the end target is the complete service to be offered to the client, be it one internal to the organization or, more traditionally, an external one. The first step, even before deciding on the infrastructure to be used (cloud, physical, hybrid), is the selection of software components. For example, if there is a need for a centralized authentication system, will one choose LDAP or something else? One weighs the merits of each choice and goes shopping, so to speak, for the open source package that best fits the needs. A target of 90% fit is a very viable one. If one can find a piece of software that covers 90% of the needs, one can always add another 5% via custom coding and cover just about every base. 100% is unattainable, as the effort-to-target curve is asymptotic at 100%: it will reach it only at the heat death of the universe.
GlueOps puts particular emphasis on resiliency of services, so the tooling deployed must always be able to operate in a redundant, highly available and, if possible, load-balanced mode. Having systems lying about waiting to be used in case of an emergency is considered inelegant, unless one is building a disaster recovery site. So tools like HAProxy and keepalived are essential building blocks of GlueOps.
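To make the idea concrete, here is a minimal sketch of what "load balanced with no idle spares" means in practice: every backend takes traffic in turn, and one that fails its health check is simply skipped. This is not how HAProxy or keepalived are implemented; the class name RoundRobinPool, the helper tcp_alive and the addresses in the usage note are hypothetical and exist only for illustration.

```python
import itertools
import socket


def tcp_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobinPool:
    """Rotate over backends, skipping any that fail the health check.

    Illustrative only: a real balancer checks health in the background
    instead of on every request.
    """

    def __init__(self, backends, check=tcp_alive):
        self.backends = list(backends)
        self.check = check
        self._ring = itertools.cycle(self.backends)

    def next_backend(self):
        # Try each backend at most once per call, so a fully dead pool
        # raises instead of looping forever.
        for _ in range(len(self.backends)):
            host, port = next(self._ring)
            if self.check(host, port):
                return host, port
        raise RuntimeError("no healthy backend available")
```

With three hypothetical backends such as ("10.0.0.1", 80), ("10.0.0.2", 80) and ("10.0.0.3", 80), repeated calls to next_backend() hand out the healthy ones in turn. All machines work all the time, and losing one only shrinks the rotation.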
The next level of GlueOps covers the more traditional aspects of IT in general: databases, VPNs, web servers, accelerators and so on. The methodology of choice and application stays the same.
The religious aspect of GlueOps
It always surprises me that after so much code has been written there are still little nooks and crannies that have not been touched. To fill in these nooks an admin will invariably need to author one or more tools, and that leads us to religion: there are too many computer languages, and fights start all too easily among the proponents of each paradigm. These fights have been fought with a ferocity that would have made ISIS fighters green with envy.
GlueOps emphasizes versatility. Admins should be versatile enough to use the best tool for the job, keeping in mind the first tenet: look for existing open code. As code could be written in any language, a master GlueOp should be able to at least read and understand most of them. What is an absolute necessity is to be competent at shell scripting (bash) and in at least one more scripting language like Perl or Python. The final choice of scripting language depends on the availability of libraries for common tasks. Perl is a case in point: CPAN (the Comprehensive Perl Archive Network) long made it the default glue language, but the language struggled to attract new developers, and as the library momentum moved elsewhere much of that work shifted to the new kid on the block: Python. So staying relevant and re-educating oneself is a cornerstone of GlueOps.
At the wizard level of GlueOps one has to be a master C programmer. So many of the scripting libraries are but a shell around C library functions. Understanding C is therefore a long step towards understanding how the Linux kernel operates, and that is wizard-fu!
I will not spend much time on automation. It is absolutely essential when one deals with a swarm of machines. It not only helps in day-to-day operations, it also makes work repeatable and at least semi-documented. One can choose SaltStack or Ansible or both, but one absolutely must use something. If for nothing else, just to have a quick look at how the systems were set up and are expected to operate.
Observability and measurements
The cornerstone of engineering is measurement. In fantasy lore, if one knows the true name of a dragon one can control it. The counterpart in engineering lore is: if it cannot be measured, it cannot be controlled. Therefore every project should have its own observability subproject. Thankfully there are great tools out now that are practically free: Prometheus, InfluxDB or the ELK stack for collecting and storing metrics and logs, and tools like Grafana for visualization and alerting. Without these packages to measure the services one is blind, and that is a bad thing in the business world.
One will have the usual graphing of usage and alerts on timeouts and so on, but measuring will also give you further insights into your systems and applications. Suppose, for example, that you see a flat system load line fixed at 1 on a two-core system. This almost certainly means that a process is wedged and needs to be restarted, even though the service as a whole is fine. Even an unexplained regular I/O spike should be investigated. In the engineering world there is no room for twilight.
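The flat load line can be turned into a simple check. The sketch below is one possible heuristic, not a feature of any monitoring package named above: the function name looks_wedged, the tolerance and the minimum sample count are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def looks_wedged(load_samples, cores, tolerance=0.05, min_samples=30):
    """Return True when the load average sits flat on a whole number.

    load_samples: recent load-average readings, oldest first.
    cores: number of CPU cores on the machine.
    tolerance and min_samples are illustrative defaults, not tuned values.
    """
    if len(load_samples) < min_samples:
        return False  # not enough history to call the line "flat"
    avg = mean(load_samples)
    nearest = round(avg)
    # Flat: the samples barely move around their average.
    flat = pstdev(load_samples) <= tolerance
    # Pinned: the average sits on a whole number of busy cores,
    # at least one and no more than the machine has.
    pinned = 1 <= nearest <= cores and abs(avg - nearest) <= tolerance
    return flat and pinned
```

On a Unix system the samples could be collected by reading os.getloadavg()[0] once a minute. A normal workload wobbles, so a line that stays glued to 1.00 for half an hour is worth a look.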
Of course, without the application proper there is no service. GlueOps has an important role to play there too: guidance. Developers often forget that we live in a distributed world and that they must always keep in mind resiliency, load balancing and the rest, as stated above. A GlueOp who is worth his salary should be able to sit down with these cats and explain networking and systems in such a manner that they understand the needs of the service as a whole. The conversation should preferably finish without murdering any hotheaded fools. I am not being cute here, some fights have been bloody indeed. A case in point is explaining how a global lock in a network filesystem degrades performance across the cluster, but that is another story, to be recited at some other time while quaffing bottles of bourbon.
So what do GlueOps do?
Considering all the above aspects of the work, one can easily see that every single one of them is necessary. Take any one of them lightly and the service as a whole becomes brittle and unstable. The admin has to glue every piece together in such a fashion that the total comes alive. GlueOps do this: they glue together code, infrastructure, management and observability platforms to create business-nourishing services. They are go-to people, problem solvers, engineers, coaches. Companies should stop trying to limit their job description to “systems operator” or “create a continuous integration pipeline”. This is far too limiting for someone who is called to keep money-making services running 24x7x365.