Infonomics
Plain language about Digital Leadership and  Governance of Information Technology for Executives and Directors  
The Infonomics Letter on Governance of Information Technology contains news and discussion of developments in the marketplace.

In October 2013, The Infonomics Letter evolved into its third form - appearing more frequently, with each Letter addressing only one topic.


The Infonomics Letter is now delivered in its entirety by email.  In time, it will move to a more standard blog format, with the inbuilt capability for discussion.


Discussion is now also encouraged via a new LinkedIn Group - unsurprisingly called The Infonomics Letter.  Each new edition will be posted to the group as it is released.


Since September 2008, Infonomics has used Constant Contact to manage its mailing list.


Constant Contact provides us with powerful tools for managing our mailing list and for analysing what happens with our mail.


To subscribe, just enter your email address in the box at the top right of this page.

The Infonomics Letter

Current Edition: 7 January 2014

Test or Fail - A Lesson from Myer


Welcome to the first Infonomics Letter for 2014.  I hope that it’s a great year for all of us.

In this edition we look at a major Australian retailer’s online channel failure and the questions that might have helped the CEO and Directors avoid the pain.

The opportunity to save on Waltzing with the Elephant and to pre-purchase Digital Leadership Manifesto continues until the end of the month.

There’s also a reminder of the ACS Education Across the Nation program which starts soon, delivering briefings on Digital Leadership and classes on Digital Leadership and Governance of IT in nine Australian cities during February, March and April.

Test or Fail – A Lesson from Myer

David Rice wrote in Geekonomics (published by Addison Wesley in 2008) about the contemporary IT industry’s habit of releasing to market software that has not been adequately tested.  Chapter 2 is titled “Six Billion Crash Test Dummies: Irrational Innovation and Perverse Incentives”.  In a nutshell, he argues that competitive pressure combined with market demand for the newest, greatest technology forces technology developers to release to market products that in some cases are not even complete, let alone robustly tested.  We see the consequences in myriad technology failures, many with high visibility and potentially serious consequences, such as those involving exposure of sensitive personal information.

But nowadays we are seeing new pressure to reinforce the importance of proper testing.  Empowered consumers vent their frustration with poor performance and the press gives substantial attention to major failures.  Organisations which deliver substandard systems suffer reputation damage and are increasingly at risk of losing customers.

In the week following Christmas 2013, one major Australian retailer, Myer Holdings Ltd, provided an interesting illustration of what happens to organisations that, however unwittingly, use the crash test dummy technique when engaging with their online customers.  In parallel with extensive discussion of the problem across the general, business and IT specialist press, members of the Australian Institute of Company Directors LinkedIn community began a discussion on “Myer Boxing Day website crash... what does it say about IT Governance maturity in Australia?”.

There are many dimensions to the question and many layers to the problem that Myer experienced.  It’s worth exploring.

Like many Australian retailers, Myer has been a late arrival at the online retail sales party.  CEO Bernie Brookes, who began his tenure in 2006, is reported in the Australian Financial Review (Myer’s Brookes admits online errors, 24 July 2013) as acknowledging that the company should have done more to be ready for the rise of online retail – a phenomenon in which Australians have become significant personal importers buying from offshore websites.  But by the end of 2013, Myer was gaining a toehold in the online space, with about 1% of turnover coming from its online channel.  It expects the online channel to grow to 10% of turnover within 5 years.

In common with many domestic retailers, Myer exploited the freedom of the online world to jump the gun for its traditional Boxing Day Sale, opening the online store to the annual clearance bargains on Christmas Day.  It should have been a bumper experience for the company and the customers.  But something was wrong and the Myer website ground to a halt.  Messages from frustrated Myer customers posted on Twitter, Whirlpool and Facebook indicate that the system was effectively useless by 6:30 pm on Christmas Day.  Unable to resolve the problem quickly, Myer closed its online site, and it stayed down for more than a week – returning in the early hours of January 2.

Reliance on crash test dummies?

It’s not uncommon to hear of websites failing under load.  We often hear of such things when online ticketing agencies release a new high-profile event, and sometimes the time between launch and failure is extremely short.  But failure of IT systems under load is not a new phenomenon.  Computer systems have long been subject to the tension between the capacity of available (and in the past, expensive) processing resources and the workload their owners seek to process.  Since the advent of multi-user computer systems, specialists and researchers have striven to understand the factors that cause systems to go into meltdown – and they have learned a great deal.  Since the early to mid 1990s, the IT industry has had available a substantial knowledge base regarding the causes of load-related failure and a great deal of knowledge of how to avoid it.

One of the most fundamental lessons from experience with the early high volume, rapid response online systems for banks and airlines in the 1980s and 1990s is the importance of effective load or stress testing.  The idea of such testing is to discover the behaviour of a given computer system when pushed to workload levels well above its design goal, and to verify that, when pushed significantly beyond its limits, the system degrades gracefully, avoiding the creation of new problems.  It’s not enough to test systems at low load levels, and it’s not enough to rely on tests that focus on functionality.  To ensure that a computer system, and therefore an online sales website, can sustain ongoing very high activity levels, there is no substitute for a carefully planned test that drives the system far beyond its design limits for an extended period.  Often, it takes more than a mere blip of peak load to drive faults to the surface.

People experienced in stress testing know that systems which work perfectly well under light load conditions can behave in bizarre and unpredictable ways under heavy load.  They know that poor coding practice and errors which cause no harm at light load can conspire to generate a rapidly escalating resource crisis inside the system, which then blocks normal system operations and brings everything to a halt.
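For readers who want to see the shape of such a test, here is a minimal sketch of a stepped load test.  It is an illustration only, under stated assumptions: the names stepped_load_test, run_step and fake_checkout are hypothetical, and fake_checkout is a stand-in that simply sleeps, so it will not degrade the way a real system would.  A real exercise would aim the same idea at a production-like test environment, at far higher load, and would hold each step for an extended period.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(target, concurrency, requests):
    """Call `target` `requests` times using `concurrency` workers.

    Returns a list of (latency_seconds, succeeded) tuples.
    """
    def timed_call(_):
        start = time.perf_counter()
        succeeded = True
        try:
            target()
        except Exception:
            succeeded = False
        return time.perf_counter() - start, succeeded

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(requests)))


def stepped_load_test(target, steps, requests_per_step):
    """Raise concurrency step by step and summarise each step."""
    report = []
    for concurrency in steps:
        results = run_step(target, concurrency, requests_per_step)
        latencies = sorted(latency for latency, _ in results)
        failures = sum(1 for _, succeeded in results if not succeeded)
        # Rough 95th percentile: the value 95% of the way through the sorted list.
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        report.append(
            {"concurrency": concurrency, "p95_seconds": p95, "failures": failures}
        )
    return report


def fake_checkout():
    """Hypothetical stand-in for one customer transaction."""
    time.sleep(0.001)


report = stepped_load_test(fake_checkout, steps=[1, 4, 16], requests_per_step=32)
for row in report:
    print(row)
```

The point to take from the sketch is the report, not the code: response time and failure count at each load level, continuing well past the design goal, so that the level at which service becomes unacceptable is measured rather than assumed.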

To look at this through a different lens, think about the crash testing that car manufacturers undertake.  To understand what happens to a new car design in a major collision requires that not just one, but several examples of the car are destroyed by deliberately crashing them.  Although computerised simulations are now sophisticated enough to give a very good idea of what will happen in a major crash, the certainty necessary for gaining type approval still requires the conduct and subsequent analysis of destructive testing.  As every automotive engineer and executive knows, it’s insufficient to conduct a low-speed “fender bender” and try to extrapolate from this what happens when the same vehicle ploughs into a large tree at 100 km/hour.  This is the source of the aforementioned analogy by Rice – we have a tendency to avoid rigorous testing of software systems, and instead throw the systems into operation with just cursory testing and a hope that the customers only discover minor problems.

It took a while for clarity to emerge about what went wrong with the Myer system.  For days, all we heard was gobbledegook about “a breakdown in communication between software and the site's services”.  We also heard that “an application (was) ‘not talking’ with the server, therefore causing the web pages to time out”.  It wasn’t until a report in The Australian (How a tech glitch stole Christmas, January 3) that we had a coherent, if sketchy, explanation.  In many high volume systems, the system generates a “thread” for each transaction, which contains all of the unique data for that transaction.  For some reason, the Myer system was either creating redundant threads, or it was not deleting threads when they were no longer required.  The system became swamped trying to manage thousands, or perhaps tens of thousands, of redundant threads, and could not give enough time to the threads that still had real customers waiting for service.

To pitch a crude metaphor in pure retail terms, it’s like having to use one new cash register for each customer and not subsequently being able to use that cash register again.  Clearly, it’s easy to imagine that selling anything at all would rapidly become impossible!
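The same effect can be shown with a toy simulation.  This is a sketch under stated assumptions and not a reconstruction of Myer’s system: the function name, the arrival rate and the service time are all hypothetical, and a “thread” here is nothing more than an entry in a list.

```python
def simulate(ticks, arrivals_per_tick, service_ticks, release_finished):
    """Return the number of live 'threads' at the end of each tick.

    Every transaction gets one thread when it arrives and finishes after
    `service_ticks` ticks.  When `release_finished` is False, finished
    threads are never cleaned up, which is the fault described in the text.
    """
    live = []      # finish time of every thread the system is still tracking
    history = []
    for now in range(ticks):
        # New customers arrive; each transaction gets its own thread.
        live.extend([now + service_ticks] * arrivals_per_tick)
        if release_finished:
            # Healthy behaviour: discard threads whose work is done.
            live = [finish for finish in live if finish > now]
        history.append(len(live))
    return history


healthy = simulate(ticks=10, arrivals_per_tick=100, service_ticks=2,
                   release_finished=True)
leaky = simulate(ticks=10, arrivals_per_tick=100, service_ticks=2,
                 release_finished=False)
print("healthy:", healthy)
print("leaky:  ", leaky)
```

In the healthy run the number of threads levels off as soon as old transactions complete.  In the leaky run it climbs without limit for as long as customers keep arriving, which is why a fault of this kind may only show itself under sustained load.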

Discovering such problems is exactly why prudent organisations undertake stress testing of critical systems.  When customer demand becomes the stress test, it’s too late to analyse the cause (which may indeed require more high stress operation to enable recognition of the trigger and root cause).  While only those on the inside know for sure, there’s an old adage that says: “if it walks like a duck, and quacks like a duck, it’s probably a duck”!  Well, now that some minimal technical information has been provided, Myer’s online system meltdown has all the hallmarks of a system that was never stress tested well beyond the reasonably expected workload, leaving the customers to become its crash test dummies.

Myer should have been even more diligent in making sure that the system could handle very high sustained workload given its experience at the half-year sale, back in June 2013.  Then the online store crashed only half an hour after the sale began, with the cause again being attributed to heavy traffic.  One wonders if the response to that failure was not to diligently stress test and find the cause, but to try papering over the cracks with increased processing horsepower.

Something else that pioneers learned from those early high volume high performance banking and airline systems was that adding horsepower does not necessarily mean that capacity increases in proportion, or that failure thresholds are pushed to levels that cannot be attained in even heavy workload periods.  Sometimes, as may have been the case this time (Brookes is reported to have said that the problem was not the result of insufficient capacity), problems will manifest in ways that do not appear to be taxing the resources, but at the same time are substantially impeding the flow of business.

At the end of the day, Myer, like many other organisations, may be learning the hard way that one must robustly test systems and prove that they can do the job reliably, rather than relying on projections derived from low-stress exercises and theoretical calculations.

Moreover, Myer, like many organisations in the past and present, must deal with the reality that workload for online retail is escalating at a very rapid rate.  Reports indicate that, for those retailers that are offering a stable service, the volume for 2013 is double that of 2012.  Myer itself forecasts that online sales will grow from 1% of turnover in 2013 to 10% within 5 years, which implies an increase of close to 60% in online volume each year.  Such growth rates rapidly erode safety margins between forecast peak load and the point at which systems start behaving badly.  It will clearly be necessary for Myer to have access to a regular increase in horsepower for its online system – and every time the horsepower is increased, it will be essential that stress tests are conducted to ensure that the horsepower actually delivers!
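The arithmetic behind that growth figure is simple compound growth, and a short sketch makes it easy to check.  The function name is hypothetical, and the calculation assumes that total turnover stays flat, so that growth in the online share equals growth in online volume.  If total turnover also grows, online volume must grow faster still.

```python
def implied_annual_growth(start_share, end_share, years):
    """Compound annual growth rate needed to move from start_share to end_share."""
    return (end_share / start_share) ** (1 / years) - 1


# Myer's stated forecast: from 1% of turnover to 10% of turnover within 5 years.
rate = implied_annual_growth(start_share=0.01, end_share=0.10, years=5)
print(f"Implied growth in online share: {rate:.1%} per year")
```

The result is a little under 60% a year, sustained for five consecutive years.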

Customer reaction: the customer messages are critical.

For Myer, the people who attempted to purchase clearance sale bargains online have been the crash test dummies.  But these dummies are not the silent types who can be recycled to load up the system and repeat the experience.  Rather, they are vocal and digitally empowered real people who are prone to venting their frustrations and saying what they think.

It’s customary these days for people who are not happy with the service they get from an organisation to air their frustration on social media.  Myer copped a considerable broadside over this incident.  But while there is no way that an organisation can stop its customers complaining, there are many things that can be done to minimise the damage.

Brookes did what every customer-savvy CEO should do.  From the outset he accepted that Myer had got it wrong, and he apologised.  He also made promises that customers would not be disadvantaged by their inability to access start-of-sale uber-bargains, and he waived delivery fees.  He is reported as saying “I know perfectly well that this is not good for our reputation in trying to compete online”.

From a fairly early stage, the Myer website gave a clear message that online shopping was unavailable due to “technical difficulties”.  The social media team made a promise to get the site up again as quickly as possible, though few might have imagined that resolution would take a solid week.

By taking a week and saying that they would not reinstate the site until it was working properly, Myer created a new level of expectation with its customers – that the website would actually work, and that the promised compensation arrangements would be in place.  During the period the website was down, Myer also urged customers to go to traditional bricks and mortar stores.

Beyond these measures to reset customer expectation, there has been little apparent discussion of how well Myer has managed its dialogue with unhappy customers.  We do know that, when the site eventually returned to service, the special deals weren’t there – customers were instead told that the “Stocktake launch deals are coming very soon”.  That may have been a significant error.  Customers were given an expectation that was fulfilled late, and there seems to have been less information available than necessary to enable customers to reap the reward for loyalty.

A brief perusal of social media also reveals that, while Myer resumed service on January 2nd, more than a few customers remain unhappy with their Myer Online shopping experience.  There are numerous complaints of poor performance and shopping carts (bags) mysteriously becoming empty.  On January 6th, one customer posted on Facebook: “Well done MYER. Not only did half of your customers not get sale items because your website is not a priority to you. But when we called up ''customer service'' (A term I use very lightly) we get spoken to like crap and pretty such shoved off. Way to leave a bad taste in everyone's mouth”.

In the retail space, there’s also the challenge of the differences between online and in-store transactions.  Another Facebook comment reads: “I tried buying some Essteele pans this morning and unfortunately your website did not allow me to checkout. So I went into two different stores today trying to get them to match the online prices (about 10% lower) and was told that your "policy" did not allow them to do so”.

While the comments posted about continuing problems days after the service was restored should signal some alarm for Myer, the perhaps deeper question is how well its longer-established business elements are geared for responding to online problems.  While the online shopping system was unavailable, a static web page directed customers seeking more information to a service centre which is open only during store opening hours!  What did customers attempting to obtain assistance after-hours think of that?  One also wonders what messages were given to customers who did call.  Does Myer have a well-prepared and tested plan for ensuring that the call centre maximises customer service and retention when things go wrong with the website and other IT systems, or does it follow in the wake of organisations like British Gas, whose call centre operators insisted that the system was right and that customers were wrong as they complained about domestic gas bills exceeding a million pounds?

Having a functioning web site is not the end of the game for an organisation that is serious about being an online retailer.  To be successful, the organisation must design its total business system for the online space, and that means ground-up rethinking of the entire customer experience to ensure that everything works in harmony.  It doesn’t mean that online has to be configured as a separate business – rather it means, as would be implied by the omni-channel strategy that Myer claims to have adopted, that all the components in the customer experience (and the rest of the business systems) work as effectively for the online space as they do for the traditional bricks and mortar, mail order and telephone order channels.

Perspective for Directors on Governance of IT

The ISO 38500 standard for governance of IT lists performance as one of the six principles for good governance of IT.  The performance principle requires that IT is fit for purpose in supporting the organization, providing the services, levels of service and service quality required to meet current and future business requirements.  Clearly, Myer Online did not meet current business requirements and was not fit for purpose.

But how can the average company director head off the embarrassment that Myer has experienced?  We can’t expect directors to be out there testing systems, and we can’t expect them to have the in-depth technical knowledge to directly assess the testing arrangements.

Actually, directors don’t need to have any of that detailed knowledge.  There are many probing questions that directors who have no specific technology skills can ask to discover whether management is on the ball, and to gain comfort in the systems on which the business relies.

What is the proven sustained peak workload at which the system can operate before customers experience an unacceptable reduction in service?

The only way that management can truthfully answer this question is to have conducted stress tests that give a clear answer.  It’s the one question beyond any other that gives confidence, but only if it is answered in a way that confirms a robust method of determination. 

When exploring this question, directors should look for and explore information about how the sustainable peak workload was determined.  By proven, we mean that the workload is determined with a basis of proof that will stand up to independent assessment.  It’s difficult to imagine that proof would not involve robust, repeatable testing that provides consistent data over multiple test runs.  Some organisations operate on a scale that means running stress tests on the same scale is massively expensive and impractical – but that doesn’t preclude development of a robust method for calculating peak sustainable production workload.  Management should be able to explain clearly how test results are scaled up to generate production scale results.  Many organisations that do undertake stress testing do so with small-scale models of the production systems, and directors need to be careful of claims that systems scale up with 100% efficiency.  Often, overheads associated with scale mean that only part of the additional horsepower translates into additional workload capacity.  Management should explain exactly how the testing environment compares to that intended for full scale business operation, and show how testing proves that the scaled-up peak workload and performance are valid.
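One well-known way to express the point about scale overheads is Neil Gunther’s Universal Scalability Law, which I introduce here purely as an illustration.  The sketch below assumes that law applies and uses made-up values for the two overhead parameters.  In practice those values would have to be fitted to measurements from real test runs, which is exactly the evidence directors should ask to see.

```python
def relative_capacity(units, contention, coherency):
    """Universal Scalability Law: capacity of `units` servers relative to one.

    `contention` models work that must queue for a shared resource and
    `coherency` models the cost of keeping the units consistent with each
    other.  Both values used below are hypothetical.
    """
    return units / (1 + contention * (units - 1) + coherency * units * (units - 1))


for units in (1, 2, 4, 8, 16, 32):
    capacity = relative_capacity(units, contention=0.05, coherency=0.001)
    print(f"{units:>2} units -> {capacity:5.2f}x capacity "
          f"({capacity / units:.0%} of perfect scaling)")
```

With these illustrative parameters, each doubling of horsepower buys noticeably less than double the capacity, and the shortfall widens as the system grows.  A claim that test results scale up with 100% efficiency amounts to a claim that both overheads are zero.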

What is the headroom, in terms of business transactions, between forecast workload and the proven peak sustainable workload of the system, for the duration of the planning period?

Knowing the capacity of your system is one thing – knowing the likely workload is entirely another.  Look here for evidence that management has a suitable technique for calculating the likely workload.  As more and more customers and competitors switch to online shopping, the behaviour of customers may change.  As the online shopping experience becomes more sophisticated, the amount of computing power and other resources required to process the shopping experience may vary. While the mix of online and in-store shopping will certainly be changing in percentage terms, the overall volume of customer activity will also be changing.  Directors should be convinced that management has an effective method of continuously revising workload projections, and reconfirming that there is enough headroom.
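A headroom projection of this kind can be laid out in a few lines.  The sketch below is an illustration only: the function name is hypothetical, and every figure in it (a proven peak of 5,000 transactions per hour, a current forecast peak of 1,000 and growth of 60% a year) is invented for the example, not Myer data.

```python
def headroom_by_year(proven_peak, forecast_peak_now, annual_growth, years):
    """Return (year, forecast peak, headroom) for each year of the planning period.

    Headroom is the proven peak sustainable workload minus the forecast peak,
    in business transactions per hour.  Negative headroom means the forecast
    exceeds what the system has been proven to sustain.
    """
    rows = []
    forecast = forecast_peak_now
    for year in range(years + 1):
        rows.append((year, forecast, proven_peak - forecast))
        forecast *= 1 + annual_growth
    return rows


rows = headroom_by_year(proven_peak=5000, forecast_peak_now=1000,
                        annual_growth=0.6, years=5)
for year, forecast, headroom in rows:
    status = "OK" if headroom > 0 else "EXCEEDED"
    print(f"Year {year}: forecast {forecast:8.0f}  headroom {headroom:8.0f}  {status}")
```

In this invented case a system with five times the capacity it needs today runs out of headroom in its fourth year, which is why the projection has to be revisited continuously and not just once at approval time.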

Assume that the unthinkable happens, and the online shopping service does fail.  What are the actions that we will take to deal with such an event, and specifically, how will we manage the reaction of customers and the broader market?

There is no doubt about this one.  Online shopping systems and other online customer related systems will fail.  It may not be a problem with any of the technology that your managers or suppliers control, but there are many other potential points of failure for web-enabled systems.  You need to be sure that your managers have a clear, workable, and preferably well-rehearsed plan for dealing with the fallout.  Among the things that should be addressed in the plan are:

·       Promptly and efficiently fielding complaints and other messages from customers, which may arrive by email or any of several social media systems;

·       Keeping customers, the media and others with a need or desire to know updated on the situation and the action that is being taken, with pragmatic estimates for resumption of service when possible.  The plans for informing customers should include a wide array of channels through which a customer can be kept informed, such as an opt-in email, a text message, or messages dispatched via social media;

·       Providing alternative, and as near-as-possible equivalent customer service via alternative channels;

·       Protecting your market share against competitors, who may seize the opportunity to make a predatory attack while your systems are not working.

Who is the business executive responsible for operation of the online store, and how does that executive maintain the store’s performance in operational, financial and customer service terms?

Most department stores have a store manager whose job it is to ensure that the store operates smoothly, and is well aligned to local customer expectations and behaviour.  What about the online store?  Does it also have an experienced retail store manager who ensures that it is meeting expectations and manages its growth?

What is the real customer experience of our online store?

Above all others, this is the question that should be answered by the business executive responsible for the online store.  No matter how much testing is carried out, the ultimate measure of performance in a 21st century online shopping system is what each customer experiences.  Retailers have long known that customers are much more likely to speak out about a poor experience, and the online world is no different.  It’s not enough to simply wait for customer complaints to reach a crescendo: means should be in place to understand exactly what customers are experiencing and detect problems before a crisis develops.  Directors should ask management to explain how the customer experience is monitored and managed, and management should provide ongoing reporting of the customer experience as part of a comprehensive approach to oversight of the developing and evolving online channels.

And from time to time, it would not hurt for directors to take on the role of customer and actually use the online channel themselves, rather than making an in-store visit.

Discounts on Waltzing with the Elephant and the forthcoming Digital Leadership Manifesto

As announced in the 20 December Infonomics Letter, I’m working on a new book, titled Digital Leadership Manifesto.  From its launch date, expected to be at the end of January, the price for the new book in electronic format will be AU$40, plus GST for buyers in Australia.  To give it the best possible launch, I am offering a pre-release purchase option for AU$30 plus GST.  To take advantage of the offer, click on the link here or go to the Digital Leadership Manifesto page on the Infonomics website.

To complement the offer, I’ve reduced the price of Waltzing with the Elephant to AU$50 (plus GST if applicable) for the holiday season – until 31 January 2014!

Education program with the ACS in 1Q14

I’m delighted to be returning to the Australian Computer Society’s Education Across the Nation (EdXN) program with an extended program on Digital Leadership and Governance of IT.

The core of the EdXN program is a briefing for ACS members and guests on topical issues in Digital Leadership and Digital Transformation.   The briefing runs for an hour, followed by time for questions and opportunities for networking.

The EdXN briefing is supported by an upgraded two day class on Digital Leadership and Governance of ICT using ISO 38500.  Based on the proven Infonomics ISO 38500 Foundation class, this event adds new perspective focused on how organisations and whole economies undergo digital transformation.

Both the briefing and the upgraded class draw on Digital Leadership Manifesto – my new book scheduled for release at the end of January 2014.

The program is locked in and ACS branches around the nation will, soon after re-opening for the new year, begin promotion and registration for their local events.  The full program is:

City               Class                  Briefing
Perth (WA)         24 – 25 February       25 February
Darwin (NT)        27 – 28 February       26 February
Canberra (ACT)     3 – 4 March            4 March
Adelaide (SA)      6 – 7 March            5 March
Brisbane (Qld)     17 – 18 March          18 March
Toowoomba (Qld)    Refer to Brisbane      19 March
Hobart (Tas)       27 – 28 March          26 March
Sydney (NSW)       31 March – 1 April     31 March
Melbourne (Vic)    3 – 4 April            2 April

Click through to the Infonomics Events pages for a detailed description of the briefing and class.  These descriptions will also soon be available on the ACS Events pages, along with registration and pricing details.

These events are not just for ICT professionals.  Digital Transformation affects everybody – a fact perhaps best exemplified by the penetration of smart phones into the general population and by the turmoil in several sectors of the economy as some organisations adjust, and others fail to adjust, to the new realities of life in the digital era.  These events are entirely relevant to everybody who works in a technology-enabled or technology-dependent organisation, and I know that the ACS will welcome participation from people in many occupations.

Once again, I wish you all the best for the new year, and look forward to your support in 2014.

 

Mark Toomey

7 January 2014.
