Wednesday, October 22, 2008

Timeless Software

Sitting in the newly refurbished cabin of a Lufthansa 747, I cannot help but marvel at the continuous evolution of this beautiful plane. First released in the 60s, before I was born, this machine is so fundamentally different now: modern cabin, modern cockpit, new communication systems, navigation systems, and engines. And yet it is essentially the same as when it was first born. The same principles of flight, the same reliability, the same optimizations around the essentials of travel requirements, fuel consumption, and maintenance.

As we at SAP have learned over the years, 36 years after delivering our first packaged application, successful large scale enterprise software follows essentially the same lineage. It solves fundamental problems that businesses face every day, over generations of business change and of technological change and, in doing so, it continuously evolves in a constant cycle of renovation. I call this Timeless Software, and want to write here about what some of its fundamental characteristics are, and how it will help define our software for the next several generations of changes to come.

Where We Are

SAP’s software today covers a massive breadth of business activities. Functionality in the Business Suite covers a large spectrum of business processes, from finance and human resources to sales and service, from planning and procurement to manufacturing and logistics, from managing supply chains to managing business strategy, decision-making and compliance, and others. In addition, its functionality spans variations on these processes across 100+ countries and 25+ industries. Despite this massive reach, customers expect a fundamental degree of coherence, stability, reliability and integration across the various elements of such software. The expectation of stability, given the mission-critical nature of many of these business processes, coupled with the fundamental ways in which businesses deploy and use the software to mirror their own business and its uniquenesses, means that our software and our relationships with customers are very long-lived and often last decades. Over this long lifespan, customers simultaneously expect the software to contribute to their two fundamental metrics:

  1. Costs, by ensuring that the software is integrated and comprehensive, and easy to reliably operate and cheaply administer, and
  2. Growth, by ensuring that the software addresses differentiated areas and is easy to evolve, change, and integrate into others as necessary

So this, then, is the essential duality that our customers expect from their IT landscape: Deliver operational efficiency via coherence and stability, while enabling business growth and managing change necessary to survive and grow. And this becomes our prime requirement: Enable evolution of our software without disruption; provide a large breadth of stable functionality, over generations of change. And it is around this requirement that we seek to design and architect the evolution of our software.

What is the nature of this change dynamic? Business requirements change all the time; markets evolve, circumstances governing customers’ purchase of products change constantly, businesses are bought and sold, regulations change, and just the day-to-day challenges of competing require a constantly shifting and evolving IT landscape. But change occurs at other layers as well. People’s behavior evolves constantly. There are now millions of BlackBerry-carrying business users worldwide, who carry out quite a bit of their work outside the office. This year we estimate that nearly a billion people worldwide will conduct some business activity on a mobile device. The technological layers change as well. Every year we see roughly two new major UI paradigms. Just in the last 3 years, we have seen the iPhone, Google’s work on Google Maps and highly interactive web applications enabled by AJAX, Adobe’s work with AIR, Microsoft’s work on Silverlight and others. Even programming languages, and programming models around them, continuously evolve. Roughly every 10 years a major new language emerges, and minor ones every 3 years or so, well within the lifecycle of large scale applications. And programming models and developer communities emerge around these. The language Ruby, for example, is thought to have reached a million programmers faster than any other language ever. The three key infrastructural building blocks (processors, network and memory) evolve continuously and often non-linearly as well. And this evolution sometimes enables, or requires, new architectural paradigms. For instance, cheap main-memory and elastic farms of simple servers have enabled fundamentally new ways of analyzing large amounts of data. Similarly, multi-core processors require rethinking application programming to better utilize their parallelism or risk slowing down.
So large scale software, over its lifetime, is subjected to continuous change: business change, as well as change across all the technology layers that it inhabits.

As I look to the future, evolving our products for the next generation, this becomes our essential challenge: How do we build applications to serve the needs of every user, and every activity, in every business around the world? And how do we do so effectively, efficiently and with maximum coherence? And how do we evolve these applications, their ongoing change, consumption, delivery and integration, as well as their connection to the present, across generations of change? How do we deliver software that is always reliable, and yet always modern? In other words, how do we build timeless software?

Enterprise applications are built using a collection of programming models and languages that describe their content, are executed using a set of corresponding containers or run-time, and continuously change over their lifetime. My sense is these three constructs form the essence around which we need to understand Timeless Software, its characteristics and how we build it:

- Content, i.e. the application content, the UI content, the integration content, to represent and serve the activities of users
- Containers, i.e. the runtime(s) that this content inhabits, and
- Change, i.e. the ongoing operation and evolution of both the content and the containers over the lifecycle of a solution while maintaining a continuous link with the past

There are other aspects, to be sure, but these are the three basic ones and I want to share my view on their evolution next.

The Evolution of Content Creation

Enterprise systems cannot rely long-term on any one programming language. Alan Kay once observed that there is a major new language roughly every 10 years and several minor ones in the interim. So over its life span, a major enterprise system sees adoption curves of several languages. Just in the last several years we have seen very rapid adoption of .NET languages, Ruby, Python, Perl, PHP, JavaScript, and others. Perhaps even more interestingly, programming models emerge around these languages, and often the success of a programming model, e.g. JEE or Ruby on Rails, brings with it a large community of programmers, and drives the adoption of the language and an explosion of software artifacts around it.

But lots of languages and dialects also exist for other reasons: There are many different domains and problem characteristics within enterprise systems and for each domain, unique combinations of syntax, tooling conveniences and programming models emerge over time. From Jon Bentley’s “little languages” to the modern-day notion of “domain specific languages”, there are many variations in essentially the same exercise: expressing meaning in convenient, specialized ways. There are programming models and domain-specific languages around User Interfaces, for instance. Data has lots of variations too: modeling and querying business data, languages for reporting and analytics, for search (as Google showed with their MapReduce programming model), for managing XML-based or other hierarchical data, and others. Describing flows, events, rules, software lifecycle, and other aspects each bring their own variations, and the same thing happens in specific application areas and in particular industries. Over time, with successful adoption, these abstractions and conveniences increase. Our own ABAP, for instance, saw several programming models integrated within a general purpose language: abstractions and extensions for data access, for reporting, for UI, even object-oriented programming within ABAP, in the form of ABAP Objects. Java, similarly, grew over the years in lots of domains and ultimately the JSR institution served to systematize the inclusion of extensions and programming models within the language. And there are similar examples in other domains, in hardware design for instance. Even cliques of teenagers invent their unique DSLs for texting.

Another key source of diversity in programming stems from the nature of the programmers. Programmers bring different degrees of training/understanding in computer science concepts, in business, and in particular domains. So languages and language constructs, as well as specific abstractions emerge for different programmer segments, be it system programmers, business analysts, administrators, or others.

This diversity is great, insofar as it enables abstractions and separation of concerns, so different classes of problems are dealt with uniquely. After all, the world does not speak one language, as any visit to the UN assembly hall would demonstrate. But the challenge is the resulting complexity that these isolations create. The various abstractions/specializations lead to islands of diverse, non-interoperable languages, language run-times and software lifecycles. Like barnacles attaching themselves to a host, these variations often lead to increased landscape complexity and dramatically higher costs of operation.

So my sense is we need an enterprise programming model that is deeply heterogeneous yet integrated. One that enables expression of meaning in a wide variety of simple and convenient ways, including ways yet to be invented, without losing coherence. One that:

1. Enables developers across lots of domains and specializations to use their native abstractions and conveniences

2. Uses a family of integrated domain-specific languages and tooling conveniences to build software artifacts with maximum efficiency and productivity

3. Has a powerful glue that binds these diverse elements together

4. Can be extended by communities and developers of various sorts in lots of different ways, and

5. Can integrate the next great languages, including languages yet to be invented, and can itself be renovated and embedded in other programming models

Some advanced development work we’ve done in our labs indicates that such an integrated design-time environment is indeed possible and can bridge a heretofore uncrossed divide: families of highly specialized DSLs that are nevertheless integrated into a coherent whole. A key piece of this puzzle is a glue that binds the various DSLs together. The glue in this case is a mechanism that takes a base language, such as Ruby, and uses capabilities such as reflection to extend the base language with the grammar of new DSLs in a seamless way. The timelessness comes from being able to add new DSLs dynamically to the base language, completely incrementally, without knowing about these in advance. We have experimented with several DSLs that plug into a glue, and the glue in turn integrates seamlessly into a base language such as Ruby or JavaScript. In a promising effort conducted by our SAP Research team, we have demonstrated how standard Ruby code can be run natively inside the ABAP language run-time, thereby achieving the benefits of both the flexibility of Ruby programming and the robust, enterprise-grade ABAP environment. I see several exciting developments ahead along these lines that will lead us to new paradigms in extremely efficient content creation without losing coherence.
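To make the idea of a glue a little more concrete, here is a minimal sketch in Python. It is not the Ruby-based mechanism described above; the `Glue` class and the two toy DSLs (`rule_dsl`, `path_dsl`) are hypothetical names invented purely for illustration. It only shows the general shape: a host that learns new DSLs at run time, through a reflection hook, without knowing about them in advance.

```python
import operator
from typing import Callable, Dict


class Glue:
    """Hypothetical glue: binds DSL names to interpreters at run time."""

    def __init__(self) -> None:
        self._dsls: Dict[str, Callable[[str, dict], object]] = {}

    def register(self, name: str, interpreter: Callable[[str, dict], object]) -> None:
        # A DSL can be added at any time; the host needs no advance knowledge.
        self._dsls[name] = interpreter

    def __getattr__(self, name: str):
        # Reflection hook: glue.<dsl>(source, **env) dispatches to the
        # interpreter registered under that name, if there is one.
        dsls = self.__dict__.get("_dsls", {})
        if name in dsls:
            return lambda source, **env: dsls[name](source, env)
        raise AttributeError(f"no DSL named {name!r} is registered")


def rule_dsl(source: str, env: dict) -> bool:
    """Toy rule DSL: '<field> <op> <number>', e.g. 'amount > 1000'."""
    field, op, literal = source.split()
    ops = {">": operator.gt, "<": operator.lt, "==": operator.eq}
    return ops[op](env[field], float(literal))


def path_dsl(source: str, env: dict) -> object:
    """Toy path DSL: dotted lookup into nested dicts, e.g. 'order.customer.name'."""
    value = env
    for key in source.split("."):
        value = value[key]
    return value


glue = Glue()
glue.register("rule", rule_dsl)
needs_approval = glue.rule("amount > 1000", amount=2500)

# A second DSL is plugged in later, incrementally.
glue.register("path", path_dsl)
customer = glue.path("order.customer.name", order={"customer": {"name": "ACME"}})
```

The design choice worth noting is that `Glue` itself never changes when a DSL is added; only the registry grows.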

The Evolution of Containers: Next runtimes

Enterprise run-times are faced with a significant challenge of optimizing the execution of the diverse and heterogeneous language landscapes described above. So if the content is to be built with maximum efficiency of expression and flexibility, then the containers need to enable maximum efficiency in execution. Our key challenge then is to bridge this divide between flexibility and optimization. In layered architectures, and with the first several years of service-oriented architectures behind us, we often take it as a maxim that the benefits of flexibility and abstraction come at the expense of optimization. That layers of abstraction, by creating an indirection, usually cost in performance. But I believe this is a false divide. Run-times need to separate meaning from optimization, and diversity in design-times need not lead to heterogeneity in run-times.

More than a decade ago, I examined one aspect of this issue in my own Ph.D. work, in looking at how meaning, specified in highly generic logic-based languages, could be executed optimally using specialized procedures that could cut through the layers of abstraction to achieve significant optimization compared to a generic logical reasoning engine. The principle underneath this is the same one: by separating meaning from optimization, a system can provide both the efficiency and generality of specification in a wide variety of specialized dialects interoperating over a common glue, and a very efficient implementation of that glue down to the lowest layer possible in the stack, across the layers of abstraction.
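As a small illustration of separating meaning from optimization, the Python sketch below keeps one declarative specification (a nested tuple) and executes it two ways: a generic interpreter that re-walks the specification on every call, and a specialized path that analyzes it once and returns a closure. The tuple format and the function names are assumptions made for this example; it is not the logic-based machinery referred to above.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(expr, env):
    """Generic engine: walks the specification on every evaluation."""
    if isinstance(expr, str):
        return env[expr]            # variable reference
    if isinstance(expr, (int, float)):
        return expr                 # numeric literal
    op, left, right = expr
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_expr(expr):
    """Specialized path: inspects the specification once, returns a closure."""
    if isinstance(expr, str):
        return lambda env: env[expr]
    if isinstance(expr, (int, float)):
        return lambda env: expr
    op, left, right = expr
    fn, lf, rf = OPS[op], compile_expr(left), compile_expr(right)
    return lambda env: fn(lf(env), rf(env))


# One meaning: price * quantity - discount
spec = ("-", ("*", "price", "quantity"), "discount")
env = {"price": 10, "quantity": 3, "discount": 5}

net_generic = interpret(spec, env)
net_specialized = compile_expr(spec)(env)
```

Both paths compute the same result from the same specification; only the execution strategy differs, which is the point of the separation.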

There are examples of this principle at work in other areas in our industry. The OSI stack implements seven very clean layers of abstraction in the network, and yet a particular switch or a router optimizes across these layers for extreme runtime efficiency. Hardware designers, similarly, use a variety of languages to specify various hardware functions, e.g. electrical behavior, logical behavior or layout, and yet when a chip is assembled out of this, it is an extremely lean, optimized implementation, baked into silicon. Purpose-built systems often can dictate their requirements to the platform layers below, whereas general-purpose systems often do not know in advance how they will be utilized, and can often be suboptimal compared to purpose-built systems, but more widely applicable.

But beyond crossing the layers of abstraction, run-times have an additional burden to overcome. In enterprise systems, we are often faced with tradeoffs in managing state across boundaries of processes and machines. There are three key building blocks in computing: networks, i.e. moving data around, processors, i.e. transforming data, and state, i.e. holding data, in memory or on a disk, etc. And different types of applications lend themselves to differing optimizations along these three dimensions. Several years ago, when dealing with some difficult challenges in advanced planning and optimization, our engineers did some pioneering work in bringing applications close together with main-memory based data management in our LiveCache technology. The result, implemented successfully in our APO product in supply-chain management, demonstrates how locality coupled with a highly purpose-built run-time offers a unique optimization on network, state and processing. More recent work in business intelligence demonstrates that when it comes to analytics, a great way to achieve performance improvements and lowered costs is to organize data by columns in memory, instead of in disk-based RDBMSes, and perform aggregation and other analytical operations on the fly on these main-memory structures. Working together with engineers from Intel, our TREX and BI teams achieved massive performance and cost improvements in our highly successful BIA product. We are now taking this work a lot further, looking at ways to bring processing and state close together elastically and on the fly, and at ways that the application design can be altered so that we can manage transactional state safely and yet achieve real-time, up-to-date analytics without expensive and time-consuming movement of data into data warehouses via ETL operations.
SAP’s founder Hasso Plattner inspired me to do an experiment we dubbed Hana, for Hasso’s new architecture (and also a beautiful place in Hawaii). Our teams, working together with the Hasso-Plattner-Institut and Stanford, demonstrated how an entirely new application architecture is possible, one that enables real-time complex analytics and aggregation, up to date with every transaction, in a way never thought possible in financial applications. By embedding language runtimes inside data management engines, we can elastically bring processing to the data, as well as vice-versa, depending on the nature of the application.
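The column-oriented idea mentioned above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how BIA or any SAP product is implemented: the sample `rows`, and the helpers `to_columns` and `sum_by`, are hypothetical. It shows only the layout change: an aggregation touches just the two columns it needs rather than every field of every row.

```python
from collections import defaultdict

# Row-oriented records, as a transactional system might hold them.
rows = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "APJ", "product": "B", "revenue": 80.0},
    {"region": "EMEA", "product": "B", "revenue": 50.0},
    {"region": "AMER", "product": "A", "revenue": 200.0},
]


def to_columns(records):
    """Pivot row records into one in-memory list per column."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def sum_by(columns, key, measure):
    """Aggregate on the fly, scanning only the key and measure columns."""
    totals = defaultdict(float)
    for group, value in zip(columns[key], columns[measure]):
        totals[group] += value
    return dict(totals)


columns = to_columns(rows)
revenue_by_region = sum_by(columns, "region", "revenue")
# revenue_by_region == {"EMEA": 170.0, "APJ": 80.0, "AMER": 200.0}
```

In a real engine the columns would typically be compressed, typed arrays rather than Python lists, but the access pattern is the same.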

Enterprise systems with broad functionality, such as the Business Suite, often need several types of these optimizations. One can think of these as elastic bands across network, state and processing. Large enterprises need transactional resiliency for core processes such as financials, manufacturing and logistics. They need analytical optimizations, à la BIA, for large-scale analytics over data. They also need LiveCache-style optimization for complex billing and pricing operations. They need to support long-running transactions for business-to-business processes that work across time zones, they need collaborative infrastructure for activities such as product design, and others. Each of these patterns consumes the underlying infrastructure (memory, network and processing) in fundamentally different ways. This breadth is one key respect in which the existing SaaS offerings fall short: they are extremely narrow in scope. Serving broad enterprise functionality off the cloud is a fundamentally different architectural challenge than taking a niche edge application, such as sales force automation or talent management, and running it off what is essentially a large-scale client-server implementation. My sense is that enterprise-ready cloud platforms will enable extremely low costs of running cloud services that have a broad footprint: transactional, analytical, long-running and others, with extreme ease of development and extensibility. We have some early promising results in these areas, but neither the current SaaS offerings, nor any other cloud platform I am aware of, can address this challenge for the foreseeable future.

So to summarize, I believe the next great run-times will implement the glue at the lowest levels possible in the stack, cutting across the layers of abstraction that make developers’ lives easy at design-time but are not needed at run-time. These run-times will flexibly enable various application-oriented optimizations across network, state and processing, and will enable execution in specialized containers or consolidated containers, in elastic, dynamically reconfigurable ways. This deployment elasticity will take virtualization several layers higher in the stack, and will open new ways for customers to combine flexibility and optimization under one unified lifecycle management, the final piece of the puzzle.

The Evolution of Change: Lifecycle Management

Perhaps the most important piece of this trichotomy is the third one: Change, i.e. managing the lifecycle of a system over the continuous change in its contents and containers. Enterprise software lives a very long time, and changes continuously over this time. Developers often do not think beyond delivery, and lifecycle management is often an afterthought, and yet this very lifecycle management is the only constant in the usually very long life of an enterprise system. It is the embodiment of the relationship that the system maintains with the customer over several generations, and it encompasses several aspects: change in functionality, change in deployment, integrating a new system with an existing one, and ongoing administration and monitoring.

One of the fundamental pre-requisites of lifecycle management is the ability to precisely describe and document existing or legacy systems. This documentation, whether it describes code, or system deployment, is a critical link across a system’s life. ABAP systems have well-defined constructs for change management, software logistics, versioning, archiving, etc., as well as metadata for describing code artifacts that makes it easier to manage change.

Consuming legacy software often means understanding what is on the “inside”. Well-defined wrappers, or descriptors, of software can help with this. But it is also often necessary to carve well-defined boundaries, or interfaces, in legacy code. Such firelaning, which has long been a practice in operating systems to evolve code non-disruptively, is essential to managing code’s evolution over the long haul. Service oriented architectures are a step in this direction, but having legacy code function side-by-side with “new” code often requires going far beyond what the SOA institution has considered so far. It requires having data interoperability, especially for master data; enabling projections, joins and other complex operations on legacy code; having lifecycle, identity, security, and versioning related information about the legacy code; having policies in place to manage run-time behavior; and other aspects. Most of these steps today are manual, and enterprises pay significant integration costs over a system’s lifetime to manage these. Over time I see this getting significantly better. But it starts with provisioning, or enabling, existing code to behave in this manner, carving nature at her joints, as Alan Kay once told me the Greeks would say. I also see incumbents with an existing enterprise footprint as having a significant advantage in getting here. It is often far easier to carve a lane out of existing code than it is to replace it.
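A wrapper with a descriptor, as discussed above, might look roughly like the following Python sketch. Everything here is hypothetical and chosen for illustration: `legacy_calc` stands in for an existing routine with cryptic arguments and status codes, and `Descriptor` and `PricingService` are invented names for the carved interface and its machine-readable description. The legacy routine itself is left untouched.

```python
from dataclasses import dataclass


def legacy_calc(matnr: str, menge: int, flags: str) -> tuple:
    """Stand-in for legacy code: returns (status code, price in cents)."""
    if menge <= 0:
        return (4, 0)
    unit_cents = 1250 if matnr.startswith("A") else 900
    discount_pct = 10 if "D" in flags else 0
    return (0, unit_cents * menge * (100 - discount_pct) // 100)


@dataclass(frozen=True)
class Descriptor:
    """Machine-readable description of a carved interface."""
    name: str
    version: str
    inputs: tuple
    output: str


class PricingService:
    """The lane carved out of the legacy code: a stable, documented boundary."""

    descriptor = Descriptor(
        name="pricing.quote",
        version="1.0",
        inputs=("material", "quantity", "discounted"),
        output="price_cents",
    )

    def quote(self, material: str, quantity: int, discounted: bool = False) -> int:
        # Translate the clean interface into the legacy calling convention.
        status, cents = legacy_calc(material, quantity, "D" if discounted else "")
        if status != 0:
            raise ValueError(f"legacy pricing failed with status {status}")
        return cents


service = PricingService()
list_price = service.quote("A-100", 4)            # 5000 cents
discounted_price = service.quote("A-100", 4, True)  # 4500 cents
```

New code depends only on `PricingService` and its descriptor, so the legacy routine can later be renovated or replaced behind the same boundary.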

Great lifecycle management is the essential change management mechanism. My sense is, next generation lifecycle management will enable systems that can easily be tried, consumed, extended, added to, removed from, projected on, integrated with, etc. This will be achieved by enabling every artifact in a system to be measured, managed, and tested. We will see existing and legacy code being instrumented for administration, for documentation as well as for integration. This will require us to provide precise, mechanizable specification and documentation of all important aspects of the system as a key ingredient. The specification of a system’s behavior, its usage, service-levels and policies describing its operation, especially for security, access and change, will be fundamental to this. We already see efforts in this direction towards precise, mechanized specifications of system behavior and we will see more of this. SAP has already taken some steps in this direction with our enhanced enterprise support offering, which enables a business to manage the lifecycle of its system landscape across the entire business from one console.
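One simple way to picture instrumenting existing code so that it can be measured and managed is a decorator that registers each artifact with a small declared specification and counts its use. The Python sketch below is an assumption-laden toy, not a description of any SAP tooling: `managed`, `REGISTRY`, `CALLS`, `FAILURES` and the `max_items` policy are all hypothetical.

```python
import functools
from collections import Counter

REGISTRY = {}         # artifact name -> declared specification
CALLS = Counter()     # artifact name -> number of invocations
FAILURES = Counter()  # artifact name -> number of policy violations


def managed(name, owner, max_items):
    """Register a function as a managed artifact with a declared policy."""
    def wrap(fn):
        REGISTRY[name] = {"owner": owner, "max_items": max_items}

        @functools.wraps(fn)
        def inner(items):
            CALLS[name] += 1
            # Enforce the declared policy before running the original code.
            if len(items) > max_items:
                FAILURES[name] += 1
                raise ValueError(f"{name}: more than {max_items} items")
            return fn(items)

        return inner
    return wrap


@managed("invoice.total", owner="finance", max_items=3)
def invoice_total(items):
    """Existing business logic, unchanged apart from the decorator."""
    return sum(items)
```

A console could then read `REGISTRY`, `CALLS` and `FAILURES` to see which artifacts exist, who owns them, how often they run, and how often their policy is violated.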

Deep interoperability between design-times, run-times and lifecycle management will enable us to combine deployment options in ways that were not possible before. For the foreseeable future we see customers employing some parts of their processes as on-demand services, but deploying most of their processes on-premise. Our lifecycle management frames will ensure that customers can make such deployment choices flexibly.

The evolution of our products along Timeless Software

Our portfolio of products, starting with the Business Suite, including Business Objects and NetWeaver and Business ByDesign, will continually evolve along these principles of timeless software.

As the picture above illustrates, we will continue to enhance our massive yet coherent breadth of functionality, to reflect ever increasing business activities across industries, geographies, and roles. This functionality will be built and extended using an evolving programming model, often in languages that have not yet been invented. And will be deployed in new ways, in the cloud, as appliances, on-premise, and all of the above. This functionality will be exposed for wide varieties of consumption, across consumers, business user workplaces, and devices, rendered via a wide variety of specialized client-side technologies, built by SAP as well as others. And yet all of this functionality will be under the same lifecycle frame, the backbone that will support the constant evolution, and constant optimization of our landscape at our customers. Our products will therefore reflect these principles. We will continually carve new lanes, and deliver new functionality, even deep new technologies. The applications will evolve continuously, and piecewise, as nature does: bringing new things, renovating others, adding here and retiring there, and doing so without breaking its essential qualities: reliability, integrity, integration, seamless administration, change and lifecycle management. Just as every few years we humans shed most of our cells, acquire new memories and lessons, decisions and beliefs, evolve and yet stay essentially who we are, I believe it is possible for software to renovate itself completely, and yet continuously.

So as excited as I am looking ahead to the innovations on the horizon and beyond, and to the many new technologies, new capabilities, and new functionality to be delivered in our software, it is perhaps most reassuring that none of these will break the essential promises at the heart of timelessness: reliability, integrity, coherence and continuous evolution.

On that reassuring thought, it is time to press the bed button on my seat and try out the fancy new lie-flat bed to end a day that began 3 time zones away, 20 hours ago. And as I browse through the 80 movies onboard, and notice the flight monitor displaying the plane’s airspeed of 567 miles/hr, things that passengers 40 years ago couldn’t have imagined, I find myself thankful for being in the comfort of a well-engineered timeless system.