Content is the future, but what is the future of content?

In this episode, Chris Willis sits down with content futurist Michael Iantosca, who shares his insights on what’s cooking in the space and delves into how you can leverage the power of content reuse. Michael is the Senior Director of Content Platforms at Avalara and one of the founders of DITA, the Darwin Information Typing Architecture.

Michael shares why finding common terminology is key for a strong foundation and gets into the benefits of moving to a more non-book-oriented, component-oriented, extensible model. Catch more valuable insights by tuning in to this episode.

On the show is Mike Iantosca. Mike is the Senior Director of Content Platforms at Avalara. He’s a content futurist. Twenty years at IBM. He’s building the future of technical content creation and delivery right now as we speak. We’re going to talk about it here. Let’s sit back and get some insight from the flock.

Mike Iantosca, welcome to the show. Mike and I know each other. This is going to be interesting. This is what we would classify as a very special episode. Mike Iantosca is a lot of different things, but just to give you some background on who this man is: I’m sure you’re familiar with DITA if you’re tuning in to this show. Mike’s a founder of DITA. You’re a pioneer, I think you could say.

You didn’t stop there. You continue even post-retirement and return to creating new paradigms of content creation. Take me back through the process or your role in creating DITA, and then how you leveraged that approach to get to where you are now.

Thank you for having me, Chris. Content has been my life for 40 years. It’s a joy. I can’t believe I made a career and got paid to have so much fun every day. I started at IBM back in the 1980 to 1981 timeframe. One of my mentors back then was Charles Goldfarb, who invented the Generalized Markup Language, GML, as part of the trio of Goldfarb, Mosher and Lorie. That’s where GML was birthed and where SGML came from.

Working with Charles and working with the technology, I was brought to the dark side, understanding and building solutions based on SGML and then XML. When XML evolved, it had been evolving over a number of years. It was actually jokingly called Monastic SGML at first. A lot of people were involved. When XML emerged and I was at the conference in which the spec came out, I immediately realized there was no X in XML.

It really wasn’t extensible. I was excited about it because it had a lot of promise for taking structured content to a much broader part of the industry. When I came back to IBM, I began evangelizing heavily and was able to get the IBM team to develop the first public XML parser. I was able to convince the folks that I worked with, my peers, that we should stop development on an SGML dialect that was designed for the web. It was called web doc at the time.

I formed a team of about ten people to figure out how we could take XML and make it extensible. We brought in some brilliant folks. I formed the team. I was able to sell management on it. I was a member of the team, but it wasn’t until we brought in folks like Michael Priestley, Don Day, John Hunt, and a whole bunch of others in that group of ten.

WOBI 17 | Power of Content Reuse

We spent a good two years or a year and a half developing the concept of DITA and reusable content. It was a challenge because of the mindset of moving from SGML to XML, and developing a standard and architecture, not a language, was unheard of, but we persisted. When we had this standard, we then began to shop it to different tool providers and start to recruit the different tool providers.

We believed that this was a standard that was so extensible it could become an industry on its own. It took probably about 5 or 6 years. We had to work it through the OASIS organization to get it to be a true standard. Others on my team did that. I sank into the background and built the very first platforms to support it, and began counseling other companies.

I’ve had the luxury of seeing this entire industry from its birth grow with all of the providers that now provide, buy and extend solutions with a wonderful cadre of consultants to help companies implement this amazing technology that provides exceptional reuse and flexibility for publishing any source to any channel, and take content to the next level beyond where we are now.

As you move forward with that, did you realize that this was something that was going to be much bigger than what you had initially started with? DITA is an approach that I think most major organizations now use. It’s a standard. Popping out of this group that you put together, did you envision where this would go?

We had the vision and we had the wish that it could become that. It took a lot of people to go through the industry and build alliances to get enough momentum until it became its own beast, if you will, and then it took on a life of its own. The people who were doing traditional structured content using SGML immediately realized the benefit of moving to a more non-book-oriented, more component-oriented, extensible model where we could do content typing much better and easier reuse with lower barriers of entry.

It immediately pulled in people from the experienced structured-content world. It’s been a slow but steady growth since. I would say 30% of the industry now uses DITA, which is huge in terms of numbers. It’s slow-growing, but it’s steady growth. There have been a couple of companies and users that may have started and gone to something simpler because they wanted simpler authoring.

At the time, we didn’t have the visual DITA authoring tools that now exist. Sadly, the only serious thing that they lost was the algorithmic, machine-consumable processing and automation that only something like DITA provides. They threw the baby out with the bath water when they went to things like Markdown or RST, not realizing that the future of artificial intelligence and machine learning-based content automation was demanding the algorithmically processable intelligence that DITA had.

We’ve lost very few people in the process. Most have stuck with DITA and structured content because they understand the business value proposition and the economies of doing that. They also value being able to support many different sources and many different channels without linearly growing the resources and organization that maintaining duplication would require.

It would be hard to look at the experience with DITA in the enterprises as anything but a success. I remember some time ago, sending you best wishes when you retired after decades of doing this specifically at IBM. Yet, here you are with a new title and a new company. I’m fascinated by this title. You’re the Senior Director of Content Platforms.

We think a lot in terms of content impact and the purpose of the content. A lot of businesses miss the business value of content. Content is a thing that we create because we have to write words down because people need to read words. Its value is lost on a lot of businesses. I feel like if a company has hired somebody of your caliber with a title inclusive of content platforms, it feels like maybe there’s an understanding in your business, but I don’t want to guess. Tell me a little bit about what content platforms means at Avalara.

I own the content platforms, and performance goes along with it. When we talk about performance, we’re talking about customer success at the end of the day. We’re talking about eliminating or reducing friction for the customer. We’re talking about maximizing time to value, speed of adoption, and ease of implementation.

These are essential if you are to gain new customers and increase the base of their footprints through upsell and cross-sell. If a company cannot quickly implement your solution, they’re going to go elsewhere. If the friction level is high either during or even after implementation, they’re not going to be happy with you. They’re going to be wary about trusting you to expand their footprint.

A company like Avalara is a high-growth company. It has been in the 30% to 40% range of year-over-year revenue growth, quickly achieved $750 million of revenue, and is on a trajectory to be a multinational. How do you grow from hundreds and thousands of customers to millions of customers? That’s a heck of a business proposition.

To do that, Avalara had the vision before I joined that they needed the best and the brightest people to achieve that. They put their money where their mouth is. They’ve invested in the experienced people who have been around the block. They bring in people who are also not experienced, but having that wealth of knowledge and teaming to create something from scratch is an opportunity that’s rare.

When I found out about Avalara and I found out that they had already made the decision to go to DITA, I didn’t make that decision. They had some pretty bright people that did the analysis and said, “We want to be DITA,” at least for the product content. They had basically almost nothing in terms of a platform. To me, that was giving a painter a blank canvas and every color in the world they could imagine and saying, “Here, after a lifetime of painting, go paint your masterpiece.” I couldn’t resist, Chris. I had to come off retirement and give it another round or two. It’s just too enjoyable to do.

That’s exactly right.

In just a period of sixteen or so months, we have built a world-class platform that goes live later this year. That includes a world-class migration to DITA, a world-class CMS, world-class content governance with Acrolinx that we use, a world-class content delivery platform, and a world-class taxonomy and ontology management system, including a Globalization and Terminology and Vocabulary Management System that didn’t exist when I arrived.

It’s an amazing lift in probably industry record time, but it’s not just because of what I know in my experience. It’s the teamwork and the collective experience and knowledge of the people around me that were able to make it happen because they all shared that same vision that is driven by our company’s North Star of providing simple and unified solutions for our customers.

We get a lot of pushback that thinking through the impact of content is more of a marketing department thing. Marketing is worried about conversions, so good content converts at a higher rate. I keep pushing back that content performance matters everywhere in the customer experience. As a business, we do have so much experience in tech docs and technical content in general.

If you’re telling me that the only place where performance matters is in marketing, then you’re telling me all the content that’s built through the technology organization through product doesn’t matter. You just talked about it. It’s customer success, taking out friction, maximizing time to value, and getting successful implementations.

All of those things lead to renewals, retention, and customer satisfaction, things that are measured at the top of the business. Many businesses are missing that aspect. “Content is to convert things. I’m trying to get more leads.” Cool. Great. Where is everything else coming from? What if you have unhappy customers? What if customers can’t use your product? We all know the same stories.

You’ve been to the same conferences that I have been to and heard people talk about how they’ve created content with material issues that make it impossible to use the product that the content is accompanying. It’s costing them millions of dollars in failed product launches. How do you say that that doesn’t matter?

Here you are at a company that not only gets that but is pouring resources into it. I get why you came out of retirement because this is the culmination. This is completing the model. As a founder involved in DITA, you’re not done. You’re not just sitting there and saying, “I already did this.” You’re well into the next stage of building for the future. You’ve started a guild. You have a blog of your own. You’ve created resources and conversations that are leading content creation and measurement into the future. Talk a little bit about that.

We want to take content from where it is now. We are at an inflection point in the industry. For the last 20 or 25 years, many companies have exercised the value proposition of componentized and intelligent content. We’ve got the benefits and value of proper reuse. We’ve got the benefits of progressive disclosure, minimalism, and all those wonderful things.

We use tremendous amounts of reuse. The ability to write once and publish to many different channels and many different formats without having to rewrite the content multiple times and linearly scale the resources necessary to do that. We’ve got all of those things. Any of the 30% or 40% of the companies that have gone to structured intelligent content know that. It’s more than just structured content.

You touched on the fact that it’s one thing to get the people into the funnel at the beginning of marketing, but we have study after study. Forrester and everybody else will tell you that 50% or more of the time, the people who are evaluating product purchases are going to the technical content. They’re scoping out the APIs and the functionality.

The marketing materials don’t go that deep. They have to go. You can’t treat marketing content, technical content, and enablement content as silos. They have to interlock with each other. The way we interlock that with each other is through semantics. It doesn’t matter actually that some content is in DITA, some content is in AEM, and other content is in Salesforce or on LMS or whatever.

We need to unite all that content. The way we do that is through a common lingua franca, vocabulary. Vocabularies are the foundation of the future. If we want to link all of that content together from all those different sources, combine it and deliver combinations of it in a truly personalized, not personified way, we need to have standardized vocabulary first.

We need to have a common terminology. That is the foundation. If companies haven’t started it, that’s the first place to start, but it doesn’t end there. Once we’ve got the common terminology, we then can build taxonomies. I don’t want to get into the gore of taxonomies, but taxonomies are nothing but classification labels. If you’ve been to eBay or you’ve been to Amazon, you’ve played with taxonomies.

You’ve traversed from photography down to cameras to lenses to whatever. You know that it could be used to narrow down and make content more searchable, but they have a much greater purpose now going into the future. We want to leverage artificial intelligence and machine learning for content. This componentized structured model, as well as unstructured content, can apply the vocabulary such that we can build what’s necessary to feed AI and machine learning systems through the use of knowledge graphs.
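As a rough illustration of the narrowing Mike describes, here is a minimal Python sketch of a taxonomy as nested classification labels, used to filter tagged content. The labels follow the photography example above; the content titles and the function names `narrower_labels` and `find_content` are hypothetical and exist only for this sketch.

```python
# Hypothetical taxonomy: each key is a classification label,
# each value holds the narrower labels beneath it.
TAXONOMY = {
    "Photography": {
        "Cameras": {
            "Lenses": {},
        },
    },
}

# Hypothetical content items, each tagged with one taxonomy label.
CONTENT = [
    ("Choosing a first camera", "Cameras"),
    ("Prime vs. zoom lenses", "Lenses"),
    ("Tripod basics", "Photography"),
]


def narrower_labels(tree, label):
    """Return the label plus every label beneath it, or an empty set if absent."""
    for name, children in tree.items():
        if name == label:
            found = {name}
            stack = [children]
            while stack:
                node = stack.pop()
                for child, grandchildren in node.items():
                    found.add(child)
                    stack.append(grandchildren)
            return found
        deeper = narrower_labels(children, label)
        if deeper:
            return deeper
    return set()


def find_content(label):
    """List titles tagged with the label or with anything narrower than it."""
    labels = narrower_labels(TAXONOMY, label)
    return [title for title, tag in CONTENT if tag in labels]


print(find_content("Cameras"))
```

Picking a broader label returns more content and picking a narrower one returns less, which is the same behavior you see when drilling down through categories on a shopping site.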

The knowledge graph is a topic that you might hear about at a cocktail party. Anybody talking about them sounds like they’re rocket science, or they might be string theory or something complex. It isn’t. It’s very much like those little molecule models you used to make in high school and college. All they are is that each little ball and each little circle is simply a content object, and then it’s connected to another content object because it has a relationship to it.

We can query massive arrays of these molecule models and build new applications. That’s all the knowledge graph really is, fundamentally. It’s simple actually. It’s “Mike went to Marist College and Jane went to Marist College. Guess what? Both Mike and Jane went there.” We have a relationship between us that we didn’t even know existed before. Through query language and inbound signals from the customer, we can literally move from failure mode content where we’ve been since the cave days to proactive assistive content that’s highly personalized and specific to the task they’re dealing with at the time.
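The Marist College example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular graph product works: the graph is a plain list of (subject, relationship, object) triples, and the relationship label `attended` and the function name `shared_connections` are made up for the sketch.

```python
# Each triple is (subject, relationship, object).
# The names come from the example above; "attended" is an assumed label.
TRIPLES = [
    ("Mike", "attended", "Marist College"),
    ("Jane", "attended", "Marist College"),
]


def shared_connections(triples, relationship):
    """Group subjects by object for one relationship type.

    Only objects that link two or more subjects are kept, since those
    are the relationships nobody stated explicitly.
    """
    by_object = {}
    for subject, rel, obj in triples:
        if rel == relationship:
            by_object.setdefault(obj, []).append(subject)
    return {obj: subs for obj, subs in by_object.items() if len(subs) > 1}


print(shared_connections(TRIPLES, "attended"))
```

Nothing in the data says Mike and Jane are connected. The query surfaces that link from the two separate facts, which is the point being made about discovering relationships.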

For example, let’s say I want to prove to the business that technical content contributes as much to the upsell of a product or feature as the marketing content does. If I can have the sales information and graph that, and I can graph my content corpus, I can query both and find out how much of the content they used prior to that sale. I can finally prove to the business how much my technical content contributed to the sale versus the contribution of other types of content.

That’s unheard of. You can’t do that without this kind of technology. The foundation for doing that is the common shared consistent vocabularies. The one thing I did, right away, coming into the business at Avalara was, “Where’s the common term base? There’s not one? Let’s start building one. What’s our common taxonomy? There’s not one for content. Let’s start building a common taxonomy for our content so we can label it so people can sift through it and find it much easier.”

Those are only very foundational things. We’re going to take that and build it into knowledge graphs in order to push this because the tool vendors support things like the terminology and the taxonomies today pretty well. They’re getting better but they’re not yet into the advanced semantic technologies like ontologies and knowledge graphs that are necessary to move into that new future.

We formed this guild called the Semantic Content Graph Guild. On it, we have senior members from Microsoft, Dell, Oracle, IBM, Juniper, STMicroelectronics, and the list goes on and on of people who are dabbling and doing some discovery work in this new advanced semantics world, and it’s open. We take in members who are willing to discuss and throw their ideas forth.

We have meetings every month. We publish those meetings openly in our little discussion forum, our little bulletin board. Anybody can go there and join the discussion. What I tell people is this is all wonderful because it’s making concrete what was nebulous: this talk about AI and machine learning. It’s providing the semantic fabric that these technologies have to have in order to work.

People just didn’t know how to approach it. We created what we call the Semantic Content Maturity Model. That tells you in sequence what you need to do first, second, third and fourth to build up this advanced intelligence on top of the intelligence we have in solutions like DITA. That gives you the fuel to drive the next generation of applications, the kinds of applications that are just not even thinkable now because we don’t have enough intelligence, even in structured intelligent content alone, to do that. We need to add another layer. The common lingua franca of that is the consistent vocabularies, terminology and taxonomies. Without that, you have no foundation. You can’t build a house without a foundation.

No, it would definitely be difficult. Just so that people don’t miss it, that’s the Semantic Content Graph Guild. Accompanying it is your blog, which is another place that you can go to find the Semantic Content Maturity Model. There’s a great blog article and some images that people could read if they want to gain more information on that topic. Let’s not be confused about what we’re hearing right now.

This is the future of the next generation of technology that’s going to drive the creation and delivery of content. I feel like the kickoff of this conversation is, “So you want to have a career in content? These are things you need to know about.” You can’t ignore this and go forward into a world where there’s going to be this kind of advanced personalization, where we’re going to get the right information out at the right time to the right people on the right topics. You can’t create relevance in content if you don’t have this kind of baseline to start from.

That’s absolutely accurate. You just have to build up. Those companies that stayed the course and built on top of DITA are going to have the easiest road of all, though not exclusively. Those who have chosen Markdown and RST are just not going to have the granularity that we’re going to enjoy with DITA.

For example, if I’m building a chatbot, how many chatbots are just plain old dumb decision trees that are not very effective? Versus, how many chatbots can mine a massive corpus of user assistance content? It’s almost nonexistent actually. With this model, we can do that, which means if I can label my content, I can even label pieces within topics like a set of steps. The chatbot can extract those steps, because it understands where they begin and end in the structure, and deliver precise answers.
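To make the point about extracting steps concrete, here is a minimal Python sketch using the standard library’s `xml.etree.ElementTree`. The `task`, `taskbody`, `steps`, `step` and `cmd` elements are standard DITA task markup; the sample topic, its wording, and the function name `extract_steps` are hypothetical, and a real pipeline would also need to handle reuse mechanisms such as conrefs and specialized element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA task topic, invented for this sketch.
SAMPLE_TASK = """
<task id="reset-password">
  <title>Reset a password</title>
  <taskbody>
    <context>Use this when a user is locked out.</context>
    <steps>
      <step><cmd>Open the account settings page.</cmd></step>
      <step><cmd>Select Reset password.</cmd></step>
      <step><cmd>Confirm the change.</cmd></step>
    </steps>
  </taskbody>
</task>
"""


def extract_steps(task_xml):
    """Return the command text of each step, in document order.

    The markup itself says where the steps begin and end, so no
    guessing from prose or heading levels is needed.
    """
    root = ET.fromstring(task_xml.strip())
    return [
        "".join(cmd.itertext()).strip()
        for cmd in root.iterfind("./taskbody/steps/step/cmd")
    ]


for number, text in enumerate(extract_steps(SAMPLE_TASK), start=1):
    print(f"{number}. {text}")
```

The same request against a Markdown or RST file would have to infer which numbered list is the procedure, which is the loss of granularity described above.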

That’s what people want from a chatbot. They want precise answers. Now we’re not giving them precise answers in chatbots. We’re not mining that huge corpus of help and technical content. We can’t do it unless we build out these vocabularies and these semantic technologies to enable the sort of retrieval, dynamism, and personalization that people expect from these things and that they’re still not delivering today.

If you’ve ever played with the back end of a chatbot, you’ve started this process. You take the question that you think somebody’s going to ask and hard-code what you want the answer to be. Where this has to head is the system needs to be able to understand the question and deliver the information out of this base of information that we have on hand in the business.

I feel like we’re not anywhere near that now in most businesses, but that’s the progression. Take the thing that you’ve already done, and then think about that 100 times, 1,000 times, and then beyond. Now, you’ve got a system that can actually add real value on the fly. We can’t add value on the fly in most businesses now. We’re assuming things. We’re guessing. You talked about failure mode delivery. I’m guessing where the problems are going to be. It would be better to be able to react to an actual problem.

It’s amazing because we have so much technology now that can give us the inbound signals from the user to know what they’re doing and what they’re thinking. Even if we don’t even know explicitly who they are, we know a plethora of things about them. Where they’ve been, where they’re going, even as anonymous users. We need to get away from only prescriptive construction and delivery of content.

It doesn’t disappear. You still have prescriptive organizations of collections, but we need to add to that prescriptive model the truly dynamic assembly on the fly. We need to combine that with the inbound signals, and the intelligence of the relationships of all of the information as objects to do that assembly. That’s where we’re taking advanced semantics now.

We’re not alone. There are numerous companies that are literally a year and a half into this. By the time you see this come out, I think it’s going to be another year or two until you start seeing the significant applications. Companies are going to be 3, 4 and 5 years behind. They’re going to be wondering, how did these companies get here? How did they do this?

This has been the story of Web 1.0, Web 2.0, and Web 3.0. It wasn’t a single technology that made Web 1.0 and Web 2.0 happen. It was a critical mass of technologies that somebody then capitalized on because they said, “I finally have a stack here that I could build something incredible like Facebook,” or whatever it is.

They didn’t invent Web 2.0. They capitalized on the stack of technologies that a brilliant set of technologists independently created and that culminated together. It’s exactly where we are in content and knowledge. I tend to feel that we’re moving away from information and content into knowledge. That’s the next generation of where we have to be if we are to compete.

If people want to be part of that next generation, the resources are here. Go check out the guild and the blog. There’s a community that’s specializing in this right now that’s putting in the effort, conversation, and creation around this. You need to be a part of this because this is where it’s going. Being on the cutting edge is definitely where you want to be. We have the cutting edge sitting right in front of you right now. This is a resource. If people want to get in touch with you, just to follow up on this conversation, it’s obviously the blog and the guild. Is LinkedIn a good way to find you?

It’s easy to find me on LinkedIn. I have a very large network and I treasure the partnerships in the industry. We’re open. We’re not in competition when it comes to these things that we’re working on with each other. The value-add that we each individually bring to our businesses is unique. I’m a believer in as much open source and open sharing as possible. I’m lucky and blessed with the partnerships that the industry has organically created in this particular space.

Mike, thanks for being on the show. Thanks for being a long-term friend. I would love to have you back as soon as possible. This was fantastic.

Thank you, Chris. I truly appreciate the time and the ability to share. Thank you.

About Michael Iantosca

Michael Iantosca is the Senior Director of Content Platforms at Avalara Inc. Michael spent 38 of his 40 years at IBM as a content pioneer – leading the design and development of advanced content management systems and technology that began at the very dawn of the structured content revolution in the early 80s. Dual trained as a content professional and systems engineer, he led the charge building some of the earliest content platforms based on structured content. He was also responsible for forming the XML team and was a member of the workgroup at IBM that developed DITA.
