
Carlos Evia teaches structured content authoring using DITA and similar tools at Virginia Tech.
Structured authoring offers a number of benefits, most notably easy content re-use. Content that is carefully structured as it goes into a repository can be used later in a variety of publications and applications.
Structured authoring has its roots in the technical communications field. As other fields discover the benefits of structured content, interest in the practice has grown. This led Carlos and his colleagues on the DITA Technical Committee to develop a less-technical version of the DITA standard – Lightweight DITA – that can be used by marketers and other non-technical content creators.
Carlos and I talked about:
- his duties as a professor of communication at Virginia Tech
- his transition from journalism to technical writing to academia
- his high-level take on structured authoring and structured content
- some of the standards and formats that guide structured content: XML, DITA, etc.
- the migration of DITA from technical content to other types of communication
- the emergence of Lightweight DITA as a simpler alternative to full-blown DITA
- the three formats of Lightweight DITA: XDITA (XML-based), HDITA (HTML5-based), and MDITA (Markdown-based)
- three core concepts of structured content: content reuse, single-sourcing, and content repositories
- the differences in authoring workflows and mindset between web-page-oriented CMSs like Drupal and component-oriented DITA-based systems
- how to address the challenges of writing creatively in a structured-authoring environment
- how to determine when structured authoring is the right solution for your situation
- the possible demoralizing effects of structured authoring and how they affect diversity and inclusion in the field
Carlos’s Bio
Carlos Evia is a professor of Communication at Virginia Tech, where he also conducts research for the Center in Human-Computer Interaction and serves as the faculty fellow at El Centro – Hispanic and Latinx Cultural and Community Center. Carlos is a voting member of the DITA Technical Committee and co-chair (with Michael Priestley) of the Lightweight DITA subcommittee. He is lead editor of the Lightweight DITA technical specification and author of the book Creating Intelligent Content with Lightweight DITA.
Podcast Intro Transcript
Technical communicators have worked for years with structured content. Structuring content offers many benefits, like the ability to easily re-use common content elements. But when you separate content creation and its final presentation, it can be hard for authors to visualize how their work will look when it’s published. Carlos Evia helps his students at Virginia Tech deal with these issues. He’s also working on new authoring formats that promise to deliver the benefits of structured content beyond the field of technical communication.
Interview Transcript
Larry:
Hi, everyone. Welcome to episode number 75 of the Content Strategy Insights Podcast. I’m really happy today to have with us, Carlos Evia. Carlos is a professor at Virginia Tech. A university in kind of the middle of Virginia. And Carlos teaches… Well, tell the folks a little bit about your background, Carlos. The courses that you teach, the research you do, and all the other fun stuff that’s happening at Virginia Tech.
Carlos:
Well, hi, Larry. I am a professor of communication here at Virginia Tech in the Department of Communication, soon to be renamed as the School of Communication, but that’s a conversation for another day. And I have been here for about 17 years, in a few different capacities. I did not start in communication. I started in the Department of English, but always working with technical communication, leading into this new thing. It’s just working with technical content, that is not necessarily always related to technical writing.
Larry:
Mm-hmm (affirmative). And just in that 17 years, we’ve seen a little bit of change in the way technical content, and web or digital content in general, have been handled, and you’re… I think you might be the first academic that I’ve had on the podcast, which is weird, because I’m such a nerd about research and things like that. But tell me a little bit… So you… And I love that your academic inquiry spans like the humanities, the social sciences, the technical world, the communications world, and even almost vocational in the sense, you don’t like teach tools and stuff, that you teach people how to be really effective technical communicators. Tell me a little bit about have you always been working with structured content, like with DITA and technologies like that, or did you use to just not just but teach like a writing course? How did that evolve?
Carlos:
Well, in my career, I started as a journalist a long time ago. And I worked at a newspaper where there was a guy who was the page maker and that guy had to assemble the paper. So you gave this guy your content that was written just in text, in a terminal that had a very rudimentary operating system connected to a big mainframe computer. And the workflow was that the guy would get it and format it to make it look pretty. So that’s one of the things that got me interested in the workflow of publishing content. And in academia, I have taught… I remember being at Texas Tech when I was a PhD student, and one of the classes that I took was called writing for the computer industry. So that’s kind of what I wanted to do. I wanted to write for the computer industry.
Carlos:
So I guess, my teaching has always been connected to writing with computers or about computers. And of course, when I was a grad student, I remember teaching the introductory first year composition course that kind of anybody had to teach because that was your source of income as a grad student. But it has really been a while since I have taught a class that is writing without an intense technological component. And when we first talked a couple of weeks ago, I told you that I teach other things in the Department of Communication, especially. I’ve been teaching a course on an introduction to issues of diversity and inclusion in communication. That is not necessarily related to technical content, but sometimes, I throw in an assignment that deals with working with content management system, just because that’s in my nature. So that’s what I do.
Larry:
Yeah. Well, that’s good. And you’re kind of like not the only guy, but you’re the main… Are you the department chair? Do you-
Carlos:
No, I’m not.
Larry:
Okay.
Carlos:
I’m a professor and I used to direct the program in professional and technical writing in English, but I did it for a few years and I decided that I was just going to do my teaching and research thing. And for the time being I’m just a professor. Just.
Larry:
Just. Yeah. Just a professor. A tenured professor. That’s quite an accomplishment. Congrats on that. So over your career, you’ve seen this change. We’ve gone from like back 17 years ago when you were more than that, it sounds like because this was before you were at Virginia Tech, working as a journalist and seeing a story that you created, some technically publishing oriented person take that and turn it in. We’ve come a long way since then. Nowadays, we have very sophisticated systems for handling all kinds of content across the span, from journalism to web publishing to… And I think in technical communication, has, in some ways, been at the forefront of what I think a lot of people would regard as the gold standard for modern digital publishing which would be highly atomized, very componentized, structured content that can be reused in a number of different ways so that you could take that news story you wrote.
Larry:
And in one instance, have it spit out much like the old production for a print publication. But that might also appear as a radio snippet in a broadcast someplace, or as a teaser in someplace else. And that’s one of the things you’re really involved in, is the DITA standards committee, and is in particular, the new Lightweight DITA standard. And I think, before we get to that, I want to talk just a little bit about what DITA is and how it accommodates and helps this structuring of content and making it more available for you. So can you give folks just a little, kind of an introduction to what DITA is and how it interacts with XML and kind of how it helps with modern publishing?
Carlos:
Yeah. Before we get there, let’s talk a little about structured content and the idea of structured authoring, which is a workflow in a… Like I said, when I started working in the newspaper, I was writing my content in my terminal, no format, no nothing. It was just text. And I would send that down blind to a guy who would print it and format it according to whatever standards we had. And that was all in a book, the book of styles. What is a main title supposed to look like? What fonts should you use? What color should you use? In a structured authoring environment, as it has been practiced in technical communication for about 20 years now, that is actually enforced by the computer. So it’s not the dude who is in charge of telling you what your content should look like or behave like.
Carlos:
There is a schema. There is a document that behind the scenes is seeing that you follow a very specific format. And if you don’t do it, you’re going to get slapped on the hand. And it’s not going to validate. So many of those approaches in technical communication have used a version of XML, which is a markup language. And one of the main grammars of XML used in technical communication is DITA, the Darwin Information Typing Architecture, which started at IBM as a way . . . So IBM structured all their documentation and it could be published to different channels. It could be published to print, it could be published to web because web, of course, has been around for more than 20 years. And then IBM donated DITA to OASIS, the Organization for the Advancement of Structured Information Standards, some 17, 18 years ago. And now it’s an open standard. It’s not a tool. It’s a standard pretty much like HTML and tools can be built around it.
Carlos:
And DITA-aware tools allow people to create and publish content following the structures that DITA already has built in. Now, the standard is alive. We have meetings every Tuesday at 11:00 AM, and we come up with new ideas that allow the standard to evolve and adapt to more contemporary workflows of digital publishing. And I think that DITA is actually moving beyond that neighborhood of technical publications where it started, and I have seen companies of many different sizes using DITA for all sorts of digital content that, really, have little or nothing to do with technical communication.
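The idea of a schema that rejects content when it does not follow the expected structure can be illustrated with a small sketch. This is not the DITA grammar or any DITA tool: the rule set below (a `topic` root with an `id`, containing exactly a `title` and then a `body`) is a deliberately tiny, hypothetical stand-in, and `validate_topic` is an illustrative name. It uses only Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified rule set: a topic must have an id attribute
# and contain exactly a <title> followed by a <body>. Real DITA grammars
# are far richer than this.
REQUIRED_CHILDREN = ["title", "body"]


def validate_topic(xml_text):
    """Return a list of problems; an empty list means the topic passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "topic":
        problems.append(f"root element is <{root.tag}>, expected <topic>")
    if "id" not in root.attrib:
        problems.append("missing id attribute on root element")
    children = [child.tag for child in root]
    if children != REQUIRED_CHILDREN:
        problems.append(f"expected children {REQUIRED_CHILDREN}, found {children}")
    return problems


good = '<topic id="intro"><title>Welcome</title><body><p>Hello.</p></body></topic>'
bad = '<topic><body><p>No title here.</p></body></topic>'

print(validate_topic(good))  # passes: no problems reported
print(validate_topic(bad))   # fails: no id, and no title before the body
```

In a real structured-authoring setup the rules would come from the standard's own grammar files and the checking would be done by the authoring tool, not by hand-written code like this.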
Larry:
And that’s the other thing I wanted to talk to you about and how we first connected was over the idea of Lightweight DITA, which is a more… Is it accurate to say that it’s sort of a subset of DITA? Or in terms of functionality I know it probably doesn’t work exactly that way, but basically the idea that DITA proper is based in XML, but in some parts of organizations, like, a marketing person is not going to be comfortable in XML but they use HTML. Or like the programmer dude is like, “No, I’ll just do it in Markdown. That’s how I work.” And so that was sort of where Lightweight DITA came from, right? Get the benefits of the DITA standard, but not obligate someone to learn XML.
Carlos:
Indeed. Lightweight DITA started as a subset of DITA in XML, probably in 2013 or around that time. Michael Priestley, who is my coauthor and he co-chairs the Lightweight DITA subcommittee with me, came up with the idea of creating a simplified version of DITA in XML that did not have all the tags that DITA has. Because if you know XML, if you know HTML, you know that there are tags with angle brackets and you kind of have to know how to use them. And DITA is very powerful, but it has a ton of those tags. And some people were like, “Well, I don’t want to learn them all because all I want to create is a simple topic that has a title and a body and probably some formatting for lists, or bold or italics”, now strong and em, but that’s another story. And that’s how it was originally conceived, as a subset of the tags that are in DITA’s XML.
Carlos:
And then we started working in this subcommittee to really shape that simplified version of DITA. And we realized, like you were saying, that in many organizations content can be easily trapped in a silo based on the standard, or even the markup language, that you and your group use. So you could be using XML, but your neighbor who is working in the cubicle, well, we don’t have cubicles anymore at this time, of course, we’ll talk about that later. Somebody who works next to you could be working in HTML because they’re in marketing and for them XML is something that they have never used and they know that it’s geeky and they don’t want to touch it. So around 2015 Lightweight DITA split and it retained that simplified version of XML, which is now called XDITA, and also we created a mapping of those basic tags of XML to HTML, and that’s called HDITA.
Carlos:
And these are like twin formats of Lightweight DITA that allow you as an author to structure your content and benefit from the known DITA capabilities of content reuse and single sourcing without having to learn XML, without having to use all the tags that are available in the main DITA standard. And even further than that, with Markdown very popular with developers, we developed a mapping of the tags of XDITA and HDITA to Markdown. And that format is now called MDITA and it gives you, with obvious limitations because Markdown by definition is supposed to be simple, it gives you as an author the possibility of creating content that will have some of the DITA functions that people have been using, like content reuse and single sourcing. And the good thing is that these formats can live together. They can all be aggregated in one single repository, and they can even…
Larry:
Sorry, at this point, we had a problem with our internet connection. We got reconnected. And then we continued by circling back to a few concepts that Carlos had just mentioned.
Larry:
I think at this point, there’s a couple of… You just used a few terms that I think are really key conceptually to the understanding a lot of this stuff. And I think you and I are using them because we’re familiar with them, but I want to maybe just back out just a little bit and talk a little bit about three concepts that just came up, the idea of reuse, the idea of a single sourcing or a single source of truth and then you just mentioned repositories. I think those are three of the kind of… Is that accurate to say that those are three of the really core important concepts that drive this whole technical implementation of how we’re treating content now?
Carlos:
I think many contemporary approaches to working with content really use those or should be using those. And first reuse, the idea of reuse is to have content components that you can call from different files or in the case of DITA topics without using copy and paste. So for example, if I’m writing an introduction to the podcast, and I say, the title of the podcast is Content Strategy Awesomeness, and I type it in many different documents because I have a collection of documents that talk about the podcast, but then I realize, “Oh, I made a mistake.” It’s not Content Strategy Awesomeness, it’s called Content Strategy Insights. So if I did not have the capabilities of reuse, I would have to go and change that in every document that I have that mentions the title of the podcast. If I use a standard like DITA or an XML grammar like DITA, and I tag everything correctly, I could have like a content variable that could change whenever I say, okay, many different files, you’re going to inherit this content component, that is title of podcast.
Carlos:
And if I change it in one place, it is going to be automatically changed in all the other places that make a reference to it. So that is the concept of content reuse. If I were working in Microsoft Word and instead of that, I was doing only copy and paste, then I would have a collection of files in which I pasted Content Strategy Awesomeness. And when I have to change the title of the podcast, I’ll have to go into every one of those and change it. And you know that I’m going to forget all the times that I copied and pasted. So that’s in very simple terms what content reuse is, I mean, in the context of the DITA standard.
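Carlos’s podcast-title example can be sketched in a few lines. This is only an illustration of reuse by reference: the `{{podcast-title}}` placeholder syntax, the `components` and `topics` dictionaries, and the `resolve` and `publish` helpers are all hypothetical names made up for this sketch. DITA has its own reference mechanisms, which are not reproduced here.

```python
import re

# Hypothetical shared components, keyed by name. In a DITA workflow these
# would live in their own files; a dictionary stands in for them here.
components = {"podcast-title": "Content Strategy Awesomeness"}

# Topics refer to the component by key instead of pasting its text.
topics = {
    "intro": "Welcome to {{podcast-title}}.",
    "about": "{{podcast-title}} is a show about content strategy.",
}

PLACEHOLDER = re.compile(r"\{\{([\w-]+)\}\}")


def resolve(text, components):
    """Replace each {{key}} placeholder with the current component text."""
    return PLACEHOLDER.sub(lambda match: components[match.group(1)], text)


def publish(topics, components):
    """Resolve every topic against the current state of the components."""
    return {name: resolve(text, components) for name, text in topics.items()}


before = publish(topics, components)

# Fix the title once, in one place; no topic is edited.
components["podcast-title"] = "Content Strategy Insights"
after = publish(topics, components)

print(before["intro"])
print(after["intro"])
print(after["about"])
```

Because the topics only hold a reference, correcting the component once corrects every published topic, which is the behavior copy and paste cannot give you.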
Carlos:
And single sourcing, or the single source of truth, as people call it now, is when you have a big source of content that you are going to use with a standard like DITA, and you’re going to have computer applications that are going to do some magic and compute behind the scenes to send that content to many different outputs, many different deliverables. You were talking about it at the beginning. You can have a collection of content elements that you can assemble and build into a PDF document that you print and physically deliver to somebody. You can also have that same collection of content components create a website that is going to go on the internet. You can use that same collection of content components to create content that will go to an app that you will use on your iPhone.
Carlos:
[whispering, to keep his Alexa from hearing her name] And you can even create something that will go to Alexa.
Carlos:
And all of that is coming from the same collection of topics, the same collection of files, which you can also filter because it doesn’t have to be that what you give me in paper for the PDF is the same as what you give me for the website. It can be that we put filters in: in the website, you’re going to give me the content for advanced users and in the PDF, you’re going to give me the content for newbie users. And I can use all of that, apply those filters and produce many different deliverables that come from the same place automatically without having to change my content. So my single source of truth, my collection of topics is there, and that doesn’t change. What changes is the processing and of course, that’s enabled by the structure that the standard allows. And that collection of topics in many contemporary approaches to content is probably going to be stored in a repository.
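The single-sourcing and filtering idea described above can be sketched as one collection of components, a filter, and two renderers. Everything here is an assumption made for illustration: the `audience` field, the `select`, `to_html`, and `to_plain_text` helpers, and the sample content are hypothetical, and plain text stands in for the printed manual. Real DITA publishing is handled by dedicated toolchains, not by code like this.

```python
import html

# Hypothetical content components. "audience" is an illustrative filter
# attribute; None means the component applies to every audience.
components = [
    {"title": "Getting started", "text": "Plug in the device.", "audience": None},
    {"title": "First steps", "text": "Press the power button.", "audience": "novice"},
    {"title": "Scripting", "text": "Use the command-line tool.", "audience": "advanced"},
]


def select(components, audience):
    """Keep shared components plus those tagged for the given audience."""
    return [c for c in components if c["audience"] in (None, audience)]


def to_html(components):
    """Render components as a simple HTML fragment for a website."""
    parts = []
    for c in components:
        parts.append(f"<h2>{html.escape(c['title'])}</h2>")
        parts.append(f"<p>{html.escape(c['text'])}</p>")
    return "\n".join(parts)


def to_plain_text(components):
    """Render components as plain text, standing in for a print manual."""
    return "\n\n".join(f"{c['title'].upper()}\n{c['text']}" for c in components)


# Same source, two deliverables: advanced content on the website,
# novice content in the manual. The source list is never modified.
website = to_html(select(components, "advanced"))
manual = to_plain_text(select(components, "novice"))

print(website)
print(manual)
```

The source collection stays untouched; only the filter and the renderer change from one deliverable to the next.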
Carlos:
And that is technology that comes from version control mainly from Git and GitHub. Your collection of topics, your collection of content components that can be complete files or snippets of files, most likely is going to live in a repository where you and people working with you and your team will have access to those. And depending on your available tools, you can go in there and edit and publish or modify, apply filters, and create different outputs. Or maybe your only job is to go in there and create content and update the content. And somebody else in another department is in charge of doing the publishing and creating the website or whatever your channels that you’re publishing are.
Larry:
Mm-hmm (affirmative). And that gets into another aspect of modern digital publishing. It’s often that that repository in the old days, and it’s still this way for many if not most people, all those kinds of repositories are associated with a content management system, which is typically like an end-to-end place where you author, you put the information and you manage it, you spit it out to a website or maybe some other formats. But increasingly those repositories are separated from their presentation and from the… There’s this whole new dis-articulation of these things, which is made possible by the fact that the content is modular and componentized. And even, like, in programming terms, it permits object-oriented programming kinds of techniques on your content. And I think that’s because I think so many people think about digital publishing as like, you download WordPress, you install it, five minutes later you publish a blog post and it goes out there.
Larry:
And you think of that as like how content is managed. And that’s a simple process and WordPress does have REST APIs and things like that. Now they do permit the back-end repository to be decoupled from the front-end presentation. But it sounds like from everything you said that DITA and XML, and just this generic, this approach to content makes it possible to do a lot more than just publish a website or spit out an email newsletter, that like I’m thinking for example of voice technology, that’s something that a lot of companies are either dealing with right now, or will be dealing with in six months, that this kind of systematic approach to content facilitates that. Are there good examples of how those benefits manifest in the real world?
Carlos:
In a web content management system, you were talking like a basic implementation of WordPress or Drupal. Your goal is to publish a web page and every file or every topic that you have in your CMS is a page. It has the structure of a page and it’s a structure that probably it was created with a very strict template. So you can have consistency in those. You can have a title, you can have a slug, you can have a little preview of what the page is going to be, and then you can have the content of the page and inside of the page, you can have links and many other things that, behind the scenes, there’s going to be some HTML most likely. But the way that you write it is most likely the way that it’s going to be presented, because in your mind, you’re thinking I’m going to create a web page.
Carlos:
So I sit down and even though I’m not seeing the final presentation, I’m typing in a couple of text boxes with the awareness that when I click publish, it’s going to go to a website. And that’s the only outcome that I’m going to. If you were using a workflow with structured content with a standard like DITA or Lightweight DITA, you don’t have to be thinking page. You can think chunk, you can think this is a component. And that’s, remember we’re talking about component content management systems. And those are content management systems that instead of giving you just the preview of this is your unit and your unit is a web page, your component is going to be something that will be up for grabs for any collection of topics, any collection of files that need that component. So your component can be tiny, can be a paragraph, can be a word, or it can be something that looks more like a page or a chapter.
Carlos:
And that’s one of the most difficult things when I introduce new students or new writers to the idea of structured authoring, the separation of content from presentation, because people are used to that: we just fire up our computers and we go to Microsoft Word or even InDesign and we create something and the way we’ll be writing it, we control the formatting, we put an emoji here and we put a nice little border and off it goes. I print it. That still works, if I only have one type of audience. If I’m going to give it to my friend, if I’m going to give it to my mom and nobody else is going to see that content ever again. But if my content can be and should be reused, because, I mean, in an organization, I work for a company that is of decent size and I’m paying people good money to create that content, why not use it more than once?
Carlos:
So the mindset of my authors is I’m not going to create a page. I’m going to create a component that is going to be awesome by itself. And when it’s called, when it’s referenced, when it becomes part of something larger it might be even more awesome, because it will be more complicated and more interesting. Now, your task as an author is to be sure that your component, tiny as it is, is well-written, has all the proper tags and metadata for context. So when somebody calls it, it is not something that is completely weird and out of place. So it’s a different way of thinking about where your content is going to end up.
Carlos:
And Jason Swarts, who is a professor at North Carolina State, has written about how authors think about the many directions in which content can go when you’re using a workflow that is based on DITA, because it’s not just like sitting down and writing something in Microsoft Word, and it’s not like sitting down and writing something that you’re going to do in WordPress knowing that it’s only going to go on one website. It’s content that could be used in many different approaches.
Larry:
Yeah. And everything you just said, points to an issue that I think it comes up for a lot of writers that they there’s some… I don’t know it’s not objection but sort of there’s a little friction around going to a structured authoring environment from this, because writing is this creative thing. But I’m sure there’s still room for creativity in structured authoring. And you coming out of an English background and into this more structured thing, how have you helped, I guess it might work both ways, so how have you helped writers see the benefits of structuring things this way and how have you helped the more technically oriented people see how to keep creativity in it?
Carlos:
Yeah. And Sarah O’Keefe, who is the CEO of Scriptorium here in North Carolina, she wrote about 10 years ago, an article that talked about whether XML is the death of creativity in technical writing, because people were complaining, “XML is going to kill the creativity because all you have is this template and there’s nothing you can do.” And she said, “Well, that’s your challenge as an author. You are now inside this specific tag, how are you going to write content that is going to be creative and that is going to properly convey a message to your audience?” So the way that I do it with my students, we tag stories and sometimes children’s stories. We tag them with DITA elements. This past semester, for example, we did the story of the three little piggies, and my students had to tag it with DITA elements in a way that you can have different versions of the story. In a version that was for kids, the wolf doesn’t eat the piggies and at the end of the story, they’re all happy and they’re friends.
Carlos:
In a version that we did for grown-ups, the wolf eats the piggies and at the end, somebody kills the wolf. I don’t remember what the story was. And that really forces the new authors, in this case my students, to think in ways that go beyond that one-directional approach of thinking: I’m going to put my story here on a word processor, and I’m going to print it because I only have one version of it and it’s going to be printed on this piece of paper and I’m going to give it to Larry as my only reader. If we start thinking about creativity as enhanced or challenged by the structure of a standard like DITA, then we can come up with very interesting ways of telling this story and presenting it with many different… It’s like, you remember those create your own story books when we were kids, and that’s what my students did.
Carlos:
And they’ve been doing that for many years and they like that activity. They like that exercise. And I think that that’s helpful too, when they start working with more serious content, they can really think about, “Okay, these could be referenced in more than one file. How am I going to structure it so I am careful with pronouns? How am I going to structure it so that I’m careful with phrases that could be translated and could be out of context?” And that’s an interesting challenge. Again, it’s not for everybody. If you work for a company in which you have only one product, and it has only a tiny piece of paper as a manual, and you only deliver it in print, don’t go for a sophisticated workflow that involves structured authoring. For the love of monkeys, just use Microsoft Word and be happy.
Carlos:
But if you start thinking, “Oh, we can go online. Oh, we can still have the print manual, or we can develop something that is going to be the voice.” Well, that’s when you have to start thinking, do I really want to write my content three times and do a lot of copy and pasting that then it’s going to be very difficult to update. Do I really want to pay three authors when I could be paying only one and pay somebody to help me develop this infrastructure? So that’s when you have to start thinking about it. Or if your content is going to be translated to many different languages. Yeah.
Larry:
Yeah, that’s a whole other thing. That’s one of the generic benefits of this approach that having single sourcing makes translation and localization so much easier. But what you just said too reminds me, I know a lot of creative people who really relish the challenge of being constrained and constricted, whether it’s a budget or a schedule or a structured authoring format. So that’s good. Carlos, I just noticed we’re coming up on time. But I always like to give my guests a chance. Is there anything last, anything that’s come up today that we didn’t get a chance to elaborate on, or that is just on your mind about XML or DITA or Lightweight DITA?
Carlos:
Well, recently we have been hearing about… Many, many years ago, some people started talking about, “Oh, XML is too cumbersome. We’re not going to use XML anymore. We’re going to use Markdown. Okay. We’re going to use Markdown.” And Lightweight DITA now allows you to create content in Markdown. But now there’s even, “No, that’s too cumbersome. If you want to structure content, start using JSON or start using something simpler that will need something more sophisticated to produce a deliverable that a human being would actually consume.” So I wonder if we are creating a complicated structure that keeps the authors removed from the deliverables that they’re going to produce, and that could be demoralizing.
Carlos:
That can also affect issues of diversity and inclusion in the field because in computer science, for years, there has been talk about, there are some professions and there are some fields that tend to benefit probably more men who have time to play with computers since they’re children and not women or minorities who don’t have computers. And the first time that they see a computer is when they go to high school. So of course, if you have had a computer in your life, and if you’re good at programming and you see a new JavaScript framework, you immediately start tinkering with it. Well, you can take whatever content somebody else, or you, have created in JSON or XML or Markdown and you build a fancy website. And there are so many JavaScript frameworks for developing websites.
Carlos:
But if you’re not exposed to computers at that intensity, and you’re starting to write, and you’re like, “Hey, I created this awesome repository of content. Now, what do I do with it?” Or now you have to learn the new thing and you’re like, “Oh.” So I wonder, as we start exploring different approaches to publishing content, if we really need to keep in mind that the authors need to have that instant gratification of seeing their content published. So if we develop publishing chains that are entirely too complicated and depend on people with PhDs in computer science, we’re probably not advancing the field as a whole. And we’re just making some people make a lot of money while those who create the content are just like, “Oh yeah, just kind of ignore the content, not that important.” So, something to think about.
Larry:
Well, thanks so much, Carlos. One last thing, I want to make sure we get you, what’s the best way for people to keep in touch with you, if they’d like to follow you on social media or connect with you, what’s the best way to keep in touch with Carlos Evia?
Carlos:
I’m on Twitter, @CarlosEvia on Twitter. You might notice that I don’t tweet a lot, but I’m very active and I use it to communicate and I retweet things about Mexican politics and I sometimes get in trouble, but that’s what I do. And I’m also very easy to find here at Virginia Tech, if you go to the Department of Communication at Virginia Tech, you can find me there.
Larry:
Right. Well, thanks so much, Carlos. I really enjoyed the conversation.
Carlos:
Thank you very much, man.