
Andrea Volpini creates products that help search engines and other computers find your web content.
Adding semantic meaning to your content helps artificial intelligence agents and other computers understand how your information relates to other content on the web.
Revealing these relationships helps both computers and the human users who rely on them find and use your content.
Andrea and I talked about:
- WordLift’s origins as an advanced SEO tool
- the importance of knowledge graphs for the semantic web
- how “triples” make up the semantic web
- how the semantic web is a “web of meanings” connected by links
- his origins in the semantic web and how they led him into SEO
- how knowledge graphs might eventually permit alternatives to Google to arise
- the shift from web pages to more atomic entities
- how building knowledge graphs both feeds Google the information it craves and may also be planting the seeds for Google alternatives to arise
- how structured data helps search engines like Google understand the relationships that link your content to a user’s search intent
- the fact that currently the main consumers of the data in knowledge graphs are the big tech companies
- the role of Natural Language Processing (NLP) in creating knowledge graphs
- the role of semantic markup in showing the value of your content
- schemas and the schema.org linked-data vocabulary project
- the importance of an underlying content model
- the limitations of schema.org in describing some domains
- a project that he’s working on to infer content structure by examining collections of previously unstructured content
- the importance of building your own knowledge graph before Google builds it for you
Andrea’s Bio
Andrea Volpini is an Internet Entrepreneur and CEO of WordLift and Insideout10 with 20+ years of world-class experience in online strategies, digital media, and SEO. In 2013 Andrea co-founded Redlink, a commercial spin-off focusing on semantic content enrichment, artificial intelligence, and search.
Video
Here’s the video version of our conversation:
Podcast Intro Transcript
The World Wide Web has always been about connections. First it was simple links connecting web pages. Nowadays it’s knowledge graphs connecting repositories of semantically described content entities. Yep. That’s a mouthful. Andrea Volpini can help you understand these concepts. Andrea has been working with semantic web content for years. He structures content to add meaning that shows computers – including search engines like Google – what the information you publish means and how it relates to other content on the web.
Interview Transcript
Larry:
Hi, everyone. Welcome to episode number 79 of the Content Strategy Insights podcast. I’m really happy today to have with us Andrea Volpini. Andrea is the founder and the CEO of a company called WordLift. We’re going to talk about the kinds of activities that WordLift does, but first of all, welcome to the show, Andrea. Tell the folks a little bit about WordLift and what you do there.
Andrea:
Thanks, Larry. I’m really excited to be on the show today. WordLift originally started as a plugin for WordPress, and now it’s evolving outside of WordPress and helps people create knowledge graphs. The purpose of these knowledge graphs is to improve content visibility in search engines like Google. So in a way you can think of us as an advanced SEO tool.
Larry:
Got it. I think a common way that people come to structured data and knowledge graphs is through a concern about being found on the web. But there’s a whole other aspect to it as well. Tim Berners-Lee gave a famous TED Talk in 2009, I think, about his vision for the evolution of the web and how it would be more connected than it is, so that rather than having siloed databases and content repositories, you could share things more openly. That gets into what your technology really helps with: getting on the open web. And there’s a very closely related idea, the semantic web. Could you talk a little bit about the open web and the semantic web, and how they fit together?
Andrea:
In Tim Berners-Lee’s world, I would say, “Raw data now.” The presentation that you mentioned ended with this mantra about releasing raw data. If you think about it, it’s still very valuable as a statement right now, especially because artificial intelligence now relies so deeply on semantically rich data. So what is the semantic web? That’s where we have to start. In simple terms, the semantic web is really the connection between a document and a database. Traditionally, it’s been seen as something very complex, but in reality, the semantic web is just the simplest way in which you can encode knowledge for a computer to understand it. How do you encode information? How can you describe the fact that Andrea is talking with Larry on a podcast? How can a machine understand this?
Andrea:
The simplest way is that we do it with triples. So we have Andrea as the subject and Larry as the object, and then we connect them, because they’re having this interview today. We generate triples as we describe the world that we live in, in a way that the machine can understand. That’s the semantic web. That’s it. Simple as that.
Larry:
Yeah. And that’s typically diagrammed as two little nodes: Andrea, a little dot, Larry, a little dot, and then a line connecting them. That’s what the web is all about, connections. The thing that really drives the web is links. In fact, when you build these triples, each of those entities has a uniform resource identifier (URI) that says what it is, so that you know that this is an entity, that’s an entity, and here’s how they’re connected. Knowledge graphs are just that happening on a huge scale, right? Is that…?
Andrea:
Right. If you think about it, what is carrying the information? What really conveys the information in the diagram you described is the relationship, right? The relationship, meaning the arc that connects the nodes, is what creates the information needed to describe the fact that we’re having this podcast. So the semantic web is about creating a different web than the web of pages that we are used to. It’s a web of data. It’s a web of meanings. In this web of meanings, we just build relationships. The reason we do this is that we want machines to understand the content that we care about. That’s why we build these links.
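The subject, arc, and object idea described above can be sketched in a few lines of Python. This is a minimal illustration, not how WordLift or any RDF store works: it assumes plain tuples instead of an RDF library, and the `http://example.org/` namespace and the predicates `talksWith`, `guestOf`, and `hostOf` are hypothetical names invented for the example.

```python
from typing import List, NamedTuple

# Hypothetical namespace; a real graph would reuse established URIs.
EX = "http://example.org/"

class Triple(NamedTuple):
    subject: str    # URI of the node the statement is about
    predicate: str  # URI of the relationship (the arc between nodes)
    obj: str        # URI of the node the arc points to

GRAPH: List[Triple] = [
    Triple(EX + "Andrea", EX + "talksWith", EX + "Larry"),
    Triple(EX + "Andrea", EX + "guestOf", EX + "Episode79"),
    Triple(EX + "Larry", EX + "hostOf", EX + "Episode79"),
]

def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Follow every arc labelled `predicate` that leaves `subject`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]

def predicates_between(graph: List[Triple], a: str, b: str) -> List[str]:
    """Return the relationships that connect node `a` to node `b`."""
    return [t.predicate for t in graph if t.subject == a and t.obj == b]

print(objects(GRAPH, EX + "Andrea", EX + "talksWith"))
# ['http://example.org/Larry']
print(predicates_between(GRAPH, EX + "Andrea", EX + "Larry"))
# ['http://example.org/talksWith']
```

The two nodes on their own say nothing; only the predicate on the arc between them tells a program that Andrea and Larry are in conversation, which is the point about relationships carrying the information.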
Larry:
Right. And it’s not just the links within your website, which help people understand how you organize your content; the real power is in the links to other websites and to services like Google that return things like search results. For example, one of the things I think SEOs are really concerned with now is showing up in featured snippets, in knowledge graph displays like knowledge panels, or in the various other new things that Google does. Can you talk a little bit about how behaving in a semantic way on the web helps you get found more easily?
Andrea:
Simple as that. I mean, you can just search for my name, Andrea Volpini, on Google, and you can see that what Google shows for Andrea Volpini is composed of different elements, and most of these elements draw on data from the Google Knowledge Graph. By publishing data as linked data, I’m feeding the way in which Google sees me. And that’s really the way we do marketing these days: by creating data and linking data. Now, there is a problem with this approach. Of course, if we only look at creating data as something that we do for Google, then we limit ourselves, and the scope of the entire semantic web starts to sound limited.
Andrea:
I had to deal with this because I started from the semantic web, and then all of a sudden I realized that I was building an SEO tool. We discovered it by chance. I never thought about SEO as something that would relate to semantic web technology and knowledge graphs. But all of a sudden it became evident, as Google was relying more and more on the semantic technology stack. So now we have this reality where a tool like WordLift, which creates knowledge graphs, is an SEO tool, which is totally okay for me and works well. But is this the end of it? Probably not.
Larry:
Right. Because one possible consequence of all of this is that the more people create open, accessible data, the more other people might figure out ways to connect it better than Google does. Right now we’re doing it just to curry favor with Google, but there’s a nice, delicious potential irony in there: that very mechanism might give others the opportunity to use these graphs for a different intent and purpose.
Andrea:
That’s correct. The gatekeeper is Google in the world of web pages, and it is still Google in the web of data today, but we can see that as artificial intelligence gets democratized, more solutions will come forward and might provide different gatekeeping services for users. Right now there is a very interesting debate about the role of Google in schema.org and in the development of the semantic web. My personal opinion is that Google has helped bring this forward, because there was no other way around it. But right now, doing marketing using data means that you are creating a different way to connect with your audience. That can be through the Google Assistant, but it can also be through Alexa or through Bing. There are different interfaces already among the big tech companies, and a lot more are coming from God knows which companies.
Larry:
Right. Well, you just hit on something that I think is one of the underlying behavioral shifts on the web that has driven a lot of this interest in structured data and knowledge graphs. First, 10 or 12 years ago, it was mobile and the need to respond to it, so responsive web design and other technologies and practices arose from that. But with the rise of voice, the whole internet of things, and all kinds of gadgets and devices asking the internet for information, we have this whole new thing. Anyhow, can you talk a little bit about how this shift away from pages toward entities and other ways of looking at information helps with things like voice search and voice retrieval?
Andrea:
Yeah. Well, once again, I’ll use myself as an example, not because I think I’m interesting, but quite the opposite: I think I’m a very common person, and there are other, better-known Andrea Volpinis on the web. By providing more data and creating entities that connect to one another, we give machines ways to interact with our content. So you can go ask Google Assistant, “Who is the CEO of WordLift?” and you will get the response, “The CEO of WordLift is Andrea Volpini.” Will it display a link to my page? Will I get traffic, or will Google just give the response to the user? Does it really matter in the business model of tomorrow?
Andrea:
Because in the business model of today, it’s a big issue, right? If Google is capturing these clicks, it’s a big issue. But is Google really capturing all of them? My personal experience is that, yes, on a general scale, there is less organic opportunity. That’s a fact. It’s becoming more complicated, because the machine is becoming more intelligent. If before we had maybe 20 search engine result pages where we could compete with our content, now maybe we have two, because the machine understands synonyms and the intent behind different queries. And if the intent is the same, the results might be similar.
Andrea:
So fewer results are being shown, and the organic opportunities are actually fewer. But if I look at people who are building data like us, and of course there is an entire industry working in this direction, you see that these clicks are actually improving year over year. We are talking about triple-digit increases in a market where there are 9% fewer organic opportunities. So how is this possible? The more data we provide to the machine, the more opportunities we have to create an interaction. And then, of course, there is a dialogue that needs to happen between the different parties.
Andrea:
Do I want to share this data with Google? Will it be beneficial for me to do it, or is it better to keep it to myself? Because open data doesn’t mean free data, right? We can define the boundaries of what we want to give as part of the open data web and what we want to keep as premium content that users might see behind a subscription, whatever content model we are running. So I think we are really in an evolving scenario where the business model is becoming more intriguing than just clicks and sessions.
Larry:
Right. It’s much more subtle. One of the things that drives that is a term you just mentioned, and I want you to elaborate on it a little, because I’m not sure all of my listeners will know it: the notion of user intent, and satisfying a search intent or whatever intent lies behind a user’s behavior. Can you talk a little bit about how structured data and attaching semantic meaning to information help satisfy that intent?
Andrea:
Right. So imagine that you are looking for the query we discussed before, the CEO of WordLift. You don’t know the name; you just know the company. So you type, “CEO of WordLift.” It is only through structured data that Google is able to disambiguate a query like that as the intent of someone looking for the page of Andrea Volpini, the CEO of WordLift. Google can understand the intent behind the query because we are feeding it a lot of data that helps it properly disambiguate the entities behind the query. So a query like “CEO of WordLift” will be augmented by the search engine with what is called a synthetic query: a query that the machine creates in order to improve the results by generating the terms to be searched.
Andrea:
So Google will create a synthetic query for Andrea Volpini when the user searches for “CEO WordLift.” And that’s only possible because I’m creating the relationship between WordLift and Andrea Volpini, and the relationship is CEO. This arc connecting my company with my entity is what gives Google the ability to properly disambiguate a query like “CEO of WordLift” to the intent “Andrea Volpini,” and to feed data from its index that is related to the right Andrea Volpini.
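Here is a toy Python sketch of the disambiguation described above: a single hypothetical `ceoOf` arc lets a program rewrite “CEO of WordLift” into the entity name “Andrea Volpini”. It only illustrates following a relationship in a graph; it is not a description of how Google actually generates synthetic queries, and the URIs, the `ceoOf` predicate, and the regular expression are all assumptions made up for the example.

```python
import re
from typing import Optional

# Hypothetical namespace and predicate, for illustration only.
EX = "http://example.org/"
CEO_OF = EX + "ceoOf"

# Human-readable labels for each entity URI.
LABELS = {
    EX + "AndreaVolpini": "Andrea Volpini",
    EX + "WordLift": "WordLift",
}

# The single arc that makes the query answerable: person --ceoOf--> company.
TRIPLES = [(EX + "AndreaVolpini", CEO_OF, EX + "WordLift")]

def entity_by_label(label: str) -> Optional[str]:
    """Look up an entity URI by its (case-insensitive) label."""
    for uri, name in LABELS.items():
        if name.lower() == label.lower():
            return uri
    return None

def synthetic_query(query: str) -> Optional[str]:
    """Rewrite 'CEO of <company>' or 'CEO <company>' into the CEO's name."""
    match = re.fullmatch(r"\s*ceo\s+(?:of\s+)?(.+?)\s*", query, flags=re.IGNORECASE)
    if match is None:
        return None
    company = entity_by_label(match.group(1))
    if company is None:
        return None
    # Follow the ceoOf arc backwards from the company to the person.
    for subject, predicate, obj in TRIPLES:
        if predicate == CEO_OF and obj == company:
            return LABELS[subject]
    return None

print(synthetic_query("CEO of WordLift"))  # Andrea Volpini
print(synthetic_query("CEO WordLift"))     # Andrea Volpini
```

Without the `ceoOf` triple, both queries would return `None`: the company name alone gives the program nothing to connect it to a person.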
Larry:
Right. And that’s the value of the service that Google offers, but it seems like a lot of other people could offer that too. I assume there’s a lot of machine learning and artificial intelligence at Google that grows from all of this data we’re feeding it. But with the open web, other people can access this data as well. Are there any other players out there now, maybe the beginning of some kind of a breach in Google’s dominance in controlling this kind of information?
Andrea:
I think at this point we’re still in a phase where we need to produce data, right? The data consumers are still very limited to the big tech companies. That’s a fact. We have to deal with the fact that the consumers of this data are still large organizations. But things are changing very fast. The named entity recognition that WordLift uses is based on the data that is linked in DBpedia and Wikidata. So really any company can now do natural language processing for whatever purpose using this data. And that has a phenomenal impact, if you think about it. Because if I make my data available in Wikidata, to let people know that, let’s say, I’m managing a company or I run an SEO software tool, I am giving a lot of people the ability to understand everything that relates to my knowledge domain, whether it’s small or large.
Larry:
Yep. You just used the term natural language processing. Could you talk just a little bit to define that, and about how Google and others use natural language processing? Because that’s one of the things that helps disambiguate your name and your title and all that kind of stuff, right?
Andrea:
Right. Well, it also helps create the triples that you need to transform a document into a graph that a machine can understand. The role of natural language processing is revolutionary in that it allows the machine to understand the unstructured content that we have published online. It’s a true revolution, because I can read all the tweets about you and understand the topics that you’re discussing. But I can also read a webpage about my company and understand what the company does. And well, Google is doing this, yes. But can we all do it? We can.
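To illustrate what transforming a document into a graph can mean, here is a deliberately naive Python sketch that pulls triples out of text with a single regular expression. Real NLP pipelines use trained named entity recognition and entity linking (the conversation mentions DBpedia and Wikidata as sources); the pattern and the `ceoOf` and `founderOf` predicate names here are hypothetical and only match sentences of the exact form “X is the CEO of Y.”

```python
import re
from typing import List, Tuple

# A capitalised name: one or more capitalised words, e.g. "Andrea Volpini".
NAME = r"[A-Z]\w*(?:\s[A-Z]\w*)*"

# Hypothetical pattern covering only "<Name> is the CEO|founder of <Name>".
PATTERN = re.compile(
    rf"(?P<subj>{NAME}) is the (?P<role>CEO|founder) of (?P<obj>{NAME})"
)

def extract_triples(text: str) -> List[Tuple[str, str, str]]:
    """Turn matching sentences into (subject, predicate, object) triples."""
    triples = []
    for m in PATTERN.finditer(text):
        predicate = m.group("role").lower() + "Of"  # "ceoOf" or "founderOf"
        triples.append((m.group("subj"), predicate, m.group("obj")))
    return triples

doc = "Andrea Volpini is the CEO of WordLift. He has worked on the semantic web for years."
print(extract_triples(doc))
# [('Andrea Volpini', 'ceoOf', 'WordLift')]
```

A pattern this rigid misses almost every real sentence, which is exactly why statistical NLP, backed by large open datasets, matters for building graphs from unstructured content.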
Andrea:
It’s super accessible, and it’s becoming more accessible as we have more data. We’re still in a phase where it’s hard to say who’s going to be the next gatekeeper. I don’t have an answer for that now, but I can see that there is momentum now in building and creating data. And I’m not afraid of the fact that Google is taking most of the advantage of this data, as long as publishers are in control. The problem I have is that sometimes publishers don’t have control of their data. They just publish webpages, and then someone arrives, takes this unstructured data, builds the graph for them, and then resells the metadata one way or another. That creates an asymmetric situation. Now, Google is also asymmetric in this business model, but at least it has empowered this revolution. So, yeah-
Larry:
I think that’s an important point for people who are listening who are new to this whole idea: “If you don’t create a knowledge graph about your content, somebody else will, and they’ll benefit from it.” That gets into another notion. You might’ve heard a lot of people talking about content as an asset, and it is; you have the content, but equally, if not more, important is the metadata about that content. Is that a correct way of looking at it?
Andrea:
Now, if you think about it, if Google hadn’t played such a significant role in the evolution of schema.org markup, it would be very difficult today to prove the ROI of what you just said. Right now I can go to a client and say, “Hey, your content is an asset,” and I can measure the business impact of the content they are producing because of Google. Now, I don’t want to limit myself to Google, because of course I can use this graph and then start measuring other things. When we build a knowledge graph, the first thing we show, the first level of return on the investment, is the impact on SEO and search. We can measure clicks, and we can measure improvements in click-through rates. We can measure findability in terms of more queries, or even better queries, being captured by the content.
Andrea:
So you might even get less traffic with this data, but traffic that converts better. That would be the first layer of the return on the investment. Then I will start to say, “Hey, what can we do with this data? Can we build a recommendation system using our knowledge graph that would allow people to stay longer on our site and go back to Google less often? Can we do that?” Because that’s the next step of what we do: creating engagement inside the site, because I can drive traffic to your site with a very broad keyword, right?
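The recommendation idea can be sketched, under simplifying assumptions, as ranking articles by how many knowledge graph entities they share with the one a reader is on. The article slugs and entity tags below are hypothetical, and Jaccard similarity is just one simple overlap measure, not necessarily what WordLift uses.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of entities two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(current: str, articles: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Rank other articles by entity overlap with `current`; drop zero-overlap ones."""
    seen = articles[current]
    scored = [(jaccard(seen, ents), aid) for aid, ents in articles.items() if aid != current]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by slug
    return [aid for score, aid in scored[:k] if score > 0]

# Hypothetical articles, each tagged with entities from a knowledge graph.
ARTICLES: Dict[str, Set[str]] = {
    "what-is-a-knowledge-graph": {"knowledge graph", "semantic web", "linked data"},
    "schema-org-basics": {"schema.org", "linked data", "structured data"},
    "triples-explained": {"semantic web", "triple", "knowledge graph"},
    "podcast-gear": {"microphone", "audio"},
}

print(recommend("what-is-a-knowledge-graph", ARTICLES))
# ['triples-explained', 'schema-org-basics']
```

With this sample data, `triples-explained` ranks first (two shared entities out of four distinct ones), `schema-org-basics` second (one out of five), and the unrelated `podcast-gear` scores zero and is dropped.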
Andrea:
I can even buy some traffic from other sources; Google does that. Why shouldn’t a publisher buy traffic on a broad term that doesn’t cost much? But does the publisher have the capabilities and the knowledge graph required to drive the user experience from that point on and let users find what they’re looking for? That’s the point. That’s the point of semantic data.
Larry:
I remember, I don’t know, 10 or 15 years ago, having this insight that SEO and UX were the same thing: that they were relentlessly user-focused practices designed to satisfy user intent, to give people what they want, and to help them accomplish the tasks they want to do or learn what they want to learn. And that ties into what you just said. But you’ve mentioned schema.org several times now. “Schema” is a generic term for a way to organize stuff, I guess, and schema.org is a particular open source project led by Google, or led by a consortium with Google as a major contributor. Can you talk a little bit about schemas and schema.org?
Andrea:
Well, schema is a generic term for describing the way in which data is organized, but schema.org is what’s called a linked data vocabulary. It’s a vocabulary expressed in the format of linked data that allows machines to describe the things of the world: this is a person, that is a person, and whatnot. It’s a way in which we can unambiguously label things, okay? Schema.org is an initiative. It’s open source, it runs on GitHub, and it has a W3C community group. But yes, search engines have a major role, and among the search engines, Google has invested the most, because it’s driving the AI-first experience that Google has been investing in over the last few years. After mobile first, as you said, we are now in artificial-intelligence-first mode.
Andrea:
And so the investment in data is the result of this change in the user experience. The user interface now is the AI. So how do you feed the AI with structured data or semantically rich data? Schema.org is the language that we use for conveying these meanings. Now, in the world of the semantic web there are hundreds of different languages, vocabularies, and ontologies that we can use to describe the things that we care [about]. Schema is only one of these hundreds of vocabularies, but it’s the one that has been most successful, because it’s the one used by commercial search engines, and Google in particular. So is there something wrong with that? I don’t think so.
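For readers who haven’t seen schema.org markup, here is a minimal Python sketch that builds a JSON-LD description of a person who works for an organization, using the schema.org types `Person` and `Organization` and the properties `name`, `jobTitle`, `worksFor`, and `url`. The names come from this conversation; the `url` is a placeholder `example.com` address, the helper function is hypothetical, and a real page would usually add more properties.

```python
import json

def person_jsonld(name: str, job_title: str, org_name: str, org_url: str) -> dict:
    """Build a schema.org Person description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        # worksFor links the Person node to an Organization node:
        # this is the "arc" a search engine can follow.
        "worksFor": {
            "@type": "Organization",
            "name": org_name,
            "url": org_url,
        },
    }

markup = person_jsonld("Andrea Volpini", "CEO", "WordLift", "https://www.example.com/")
# JSON-LD is typically embedded in a page inside
# <script type="application/ld+json"> ... </script>
print(json.dumps(markup, indent=2))
```

Because the vocabulary is shared, any consumer that understands schema.org, not just one search engine, can read the same `worksFor` relationship from this markup.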
Larry:
No. It’s interesting, and I’m curious about the relationship between schema.org and the more commercial ontologies and taxonomies that are standardized and specified by ANSI, ISO, and other standards bodies like that. How does schema.org interact with those kinds of organizing schemes?
Andrea:
Well, the idea of structured content is very broad, right? You can use whatever XML format you like to organize the content inside your CMS, and there are plenty of standards for creating a content model that works for your content. That, I think, is where everything starts. You start by organizing content and creating a content model that is effective for the work you’re doing as a publisher. Now, schema is, if you want, an output format for expressing this content model to agents outside of your site. It’s a shared vocabulary that allows you to express the content model, which is made of different elements connected together, to the outer world. Now, in the way we approach it, we say, “Hey, why don’t we use it to structure the content? Why don’t we bring it back? Not just see it as an output, but see it as an input.”
Andrea:
So we start creating the content model using schema. Now, this has several limitations, because of course schema is meant to describe things that are commercially relevant for an organization like Google. As of today, I think it’s very hard to describe the life of your pet, which might be extremely relevant, but has very little commercial importance for a search engine. You can, of course, describe the different breeds of a pet, but the pet itself is not well curated in the vocabulary. There are additional vocabularies that you can bring into your project in order to better describe the life of your pet. Business-wise, schema is a very simplified way in which we can describe the world.
Andrea:
It is limited by the fact that Google plays such a dominant role, but it’s also up to us to come to the board and say, “Hey, I want to have this ontology because this is my business case, and it’s going to be interesting not just for me, but for everyone.” Web scale, of course, is important. And we have to create a democratic approach to this; the community has to be open, and so on and so forth. So schema is an output, but you can also see it as an input.
Larry:
Got it. Yeah. In fact, there’s the whole world of technical communications and content strategy, with structured authoring, where they’re concerned with highly structured content going in for the purpose of reuse in a CCMS. Anyhow, that’s a whole other conversation. But it’s important to make sure that people get that there are a number of different ways the term structured content is used. But hey, Andrea, I noticed we’re already coming up close-
Andrea:
But we are doing-
Larry:
Oh, go ahead. Yep.
Andrea:
Well, we’re doing an experiment with Simple [A] and Cruce Saunders right now where we’re trying to infer the content model from unstructured documents, because the content strategy approach is always to start from the top, whereas the approach of someone like us, using NLP, is to start from the bottom: to look at the unstructured content that we have and try to infer the type of structure that can be derived from the content produced over the years. So there are a lot of connecting dots between content structure and strategy, schema markup, and SEO in general.
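A bottom-up pass like the one described above might start, in its simplest form, by counting which entity types keep recurring across a collection of documents and proposing those as candidate fields for a content model. The sketch below assumes the entity types have already been extracted (for example by an NLP step); the `infer_content_model` helper, the `min_share` threshold, and the sample data are hypothetical, and this is not the method used in the project mentioned.

```python
from collections import Counter
from typing import List

def infer_content_model(docs: List[List[str]], min_share: float = 0.5) -> List[str]:
    """Propose candidate fields: entity types found in at least `min_share` of docs."""
    counts: Counter = Counter()
    for entity_types in docs:
        counts.update(set(entity_types))  # count each type once per document
    threshold = min_share * len(docs)
    return sorted(t for t, n in counts.items() if n >= threshold)

# Hypothetical entity types already extracted from four unstructured articles.
DOCS = [
    ["Person", "Organization", "Place"],
    ["Person", "Organization"],
    ["Person", "Event"],
    ["Organization", "Place", "Place"],
]

print(infer_content_model(DOCS))
# ['Organization', 'Person', 'Place']
```

Raising `min_share` makes the proposed model stricter: at 0.75 only `Organization` and `Person` survive, since `Place` appears in just two of the four documents.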
Larry:
Interesting. Well, I’d love to… That work you’re doing with Cruce, will that be published for general consumption, or is that just an internal project you’re working on?
Andrea:
It’s really an internal project at the moment, but let’s see where we get. I think it’s interesting because he has this very deep top-down approach, and I’m learning a lot from him about how we structure content and what it really means to structure content. Whereas I have this very strong focus on what the machine needs to understand in order to improve findability. What is required for this asset to be found by different queries that can convert? So my approach is super simple. I have to think like a search engine; I have to think, “What data does the machine need to provide more valuable traffic to this publisher?”
Larry:
Got it. Hey Andrea, I noticed we’re coming up on time, and I always like to give my guests a chance before we wrap up. Is there anything that’s come up today that you want to elaborate on, or anything that’s on your mind about the semantic web or the open web?
Andrea:
I think one concept for all of this is: build your knowledge graph. If the motto in the beginning was “raw data now,” and we started with this, I think “personal knowledge graph” is the motto that everyone should think through.
Larry:
Oh, that’s good. And I just want to… If I remember correctly, I think he actually said, “Demand raw data now.” He was pretty vociferous about it. And I’m guessing you would have a similar urgency about people building their own knowledge graphs.
Andrea:
Totally.
Larry:
Right. Great.
Andrea:
Build your graph. Don’t wait for Google to build the graph about you. Build the graph before Google does.
Larry:
Great. Well, thanks so much Andrea. I really appreciate the conversation.
Andrea:
Awesome. Thanks to you. Thanks to you.