Auditing content should be a part of your SEO game plan. Here is what you need to know about it.
Dave: Yeah. Do you want some background on this? Okay. So, you know, I’ve told you about Hike and how we use their tools. I go to some of their webinars, and they were talking about doing this kind of thing: looking at the analytics and looking at your blog posts. If something has been there for a while and isn’t getting any traffic, you’re better off removing it, redirecting it and combining it with another post, or expanding on it to make it more useful. Otherwise, it’s just taking up space on your site and not really doing much. I think their perspective is that you can get more use out of it if you combine it, expand on it, or do something else with it. Okay, well, that makes sense, right? It does make sense. So, is that something that would make sense for a client, like every six months or every year? Or what do you think?
David: So, first of all, it depends on how much content is on the client’s site. If you’re just getting started and there’s not a lot of content on the site, it’s probably not worth it, right? It’s probably better to spend the limited time you have producing new content.
Dave: Yeah. Yeah. They’re coming from the perspective that you already have content and you’re regularly putting more out there, and…
David: Right. I think that’s where people get confused. They think, oh, I have a 10-page website, so I need a content audit. Well, not yet, right? But if you have a thousand or 10,000 posts, yeah, you probably need to do a content audit. And this idea predates the Helpful Content update that happened a couple of weeks ago. Even before that update, Google was saying that the quality of content on your site, even an individual blog post or a few individual blog posts, can affect how Google views the quality of your site overall. So, in light of that, if you have a few half-written, half-baked, thin-content articles that clearly weren’t helpful, and you have enough of them (Google is obviously not going to say what the threshold is), they might reflect poorly on your site, and we need to address that, right? Looking at the search engine traffic to those pages is one indicator. Open up the data and make sure the date range covers at least three to six months. If a post has been on the site for a while and has gotten zero traffic over that period, and by traffic I mean Google organic traffic, then clearly Google does not like that content for some reason. They’re not sending people to it. At that point, you need to ask, okay, do we need to do something with this? Another place you might look is in Search Console under, oh, they just changed the name of it, the site crawl report. Let me see if I can find the name.
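Here is a minimal Python sketch of the traffic check David describes. It assumes you have exported each post’s URL, publish date, and Google organic sessions for the review window into a list. The `PageStats` fields and the 180-day default (roughly six months) are illustrative assumptions. They are not part of any analytics tool’s API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PageStats:
    url: str
    published: date
    organic_sessions: int  # Google organic sessions during the review window


def pages_to_audit(pages, today, window_days=180):
    """Return URLs that existed for the whole window yet received zero organic sessions."""
    cutoff = today - timedelta(days=window_days)
    return [
        p.url
        for p in pages
        if p.published <= cutoff and p.organic_sessions == 0
    ]


if __name__ == "__main__":
    pages = [
        PageStats("/blog/galaxy-giveaway/", date(2012, 5, 1), 0),
        PageStats("/blog/brand-new-post/", date(2022, 9, 15), 0),
        PageStats("/blog/popular-guide/", date(2019, 1, 10), 420),
    ]
    print(pages_to_audit(pages, today=date(2022, 10, 1)))  # ['/blog/galaxy-giveaway/']
```

The sketch skips posts newer than the window. A page that hasn’t existed for the whole period can’t fairly be judged on zero traffic.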
Dave: Oh, yeah.
David: I need to update the Game Plan because I’m talking about the old name. Under Page indexing, you can look at pages that are not indexed. You want to look at two things. The first is “Crawled, currently not indexed,” meaning Google has actually looked at the page and decided not to index it. The second is “Discovered, currently not indexed,” meaning Google knows the page is there but hasn’t bothered to crawl it yet. Of the two, “Discovered, currently not indexed” suggests Google thinks the page isn’t even worth looking at. That might be caused by a lack of internal links to it. Maybe it’s really, really deep in your website, like one of your very first blog posts, and it takes Google so long to get to it that it doesn’t bother. “Crawled, currently not indexed,” on the other hand, means Google has actually taken a look at the page and decided it isn’t worthy of being in the index. A page that isn’t indexed won’t get any Google traffic. But the reverse isn’t always true. If you’re just looking at Google traffic in Google Analytics or something like that, a page with no traffic might cover something helpful and unique that, frankly, no one else is interested in. Google might have it in the index, but no one is searching for it, so Google’s not serving it to anybody. For instance, I have a client that has been around for decades. They’ve got a bunch of blog posts about giveaways for cell phones that no longer exist. Sign up, and we’ll give you a Galaxy version one, right? Okay. That’s just not helpful content anymore. So, what we want to do is… Actually, let me just do something here real quick. We have a lot of content on Curious Ants because every time we do this, we record it and also put the video on the site. So, let me share my screen. Okay, so here’s… Oh boy, look. Woo, something’s going wrong. We’ll have to check into that, right? But right now, we’re not going to talk about that.
We’re going to look at this. So, here’s…
Dave: Actually, if we have time, I would like to look into that. That would be very interesting.
David: I don’t know if I want that to be public data.
Dave: Okay, well then, we can turn the recording off.
David: Okay. So, here’s the page indexing report. Google now shows the pages that are indexed and the pages that are not indexed. All we look at is not indexed, so we turn the other one off. It tells us why. Page with redirect: okay, that makes sense. Not a problem. We don’t want pages with redirects indexed; that’s why we put the redirects in there. So that means the redirects are working. We might review that list to make sure we’re not accidentally redirecting something. Blocked by robots.txt: great, not indexed. That’s exactly what we want. We might look at this list just to make sure there’s nothing robotsed out that we don’t want. Robotsed out? But great. Excluded by ‘noindex’ tag: okay, great. We don’t want those indexed by Google, either. That’s why we put a noindex tag on them, right? That’s doing what it’s supposed to do. Alternate page with proper canonical tag: I’m foggy about what that means, so let’s look at this. Oh, okay. Okay. So, these pages are not indexed and not served by Google. There are some with campaign URLs and… Okay, great. I don’t want this URL with the UTM codes to be indexed, and the canonical tag is telling Google: don’t index that version, look at this version. I can inspect it and see why. This is the canonical tag, even though the URL is that. Right? So, that’s doing what I want it to do.
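The canonical tag in the “Alternate page with proper canonical tag” group tells Google which version of a UTM-tagged URL to index. Here is a rough illustration of that mapping: a hypothetical Python helper that derives the clean URL by dropping `utm_*` parameters. It can be handy for spot-checking an exported list against the canonical you expect. It is not how WordPress or any SEO plugin actually generates the tag.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop utm_* campaign parameters (and any #fragment), keeping other query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://example.com/post/?utm_source=newsletter&utm_medium=email"))
# https://example.com/post/
print(canonical_url("https://example.com/post/?page=2&utm_campaign=fall"))
# https://example.com/post/?page=2
```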
David: So, that’s actually good. Okay, what else? Soft 404. There are any number of reasons a page could be a soft 404. For this page, Google is saying, we don’t think this page really exists, so we’re going to treat it like a 404. Maybe it’s a page with really, really thin content, like a navigational page or something like that. So, we can look at it and see what it is. Oh, a feed page. See how it’s got “feed” in the URL? Yeah, I don’t want Google to index the feed pages. That’s a WordPress thing; it’s actually for the comments. Next, here are the pages not found. Okay. Now I need to check those and decide whether I need to add redirects or something. Okay. So, here we go: Discovered, currently not indexed, and Crawled, currently not indexed. Google knows 159 of these pages exist, but it hasn’t bothered to look at them. Here are 68 pages that it’s crawled and decided aren’t worth putting in the index. So, what I should do is a content audit: go through these pages and ask myself why.
Tricia: One thing to add. Something made me look at this report on a client’s site. This is the client I’m only doing a Google Business Profile for; an agency is working on the rest. Something brought me to this page. I’m not sure if it was crawled or discovered. Maybe it was discovered. I don’t know. Anyway, it helped me find that the site had just recently been hacked, so I could notify the agency to resolve it before they had a problem.
Dave: That’s good.
Tricia: That just happened this week, so I wanted to share it. This report can also tell you that, and it shows weird things, like strange links that are on your site.
David: That’s why, as a part of the process on Curious Ants, I recommend everybody check this on their site once a week. Just look at it, and make sure it’s doing what you expect.
David: That’s a really great case of finding it. You can also potentially find things under Security & Manual Actions down here, but sometimes you might find it here first. Okay. So, here’s Crawled, currently not indexed. The first two are feed pages. Fine, I don’t want the feeds indexed. Here’s one: a particular blog post about setting up Iubenda. There are a couple of links to this one from offsite and one from onsite. It’s in the sitemap, but Google has decided not to index it. I could request indexing. But we did a little test a while back where we each found a page like this, requested indexing, and realized it didn’t help. People say, oh, submit your site to Google. Yeah. Okay, well… Since I haven’t changed this page, there’s no reason to request indexing. But what I can do is pull it up and take a look. Any idea why Google is not indexing this page?
Tricia: Is it because there isn’t much text on this page, just the video?
Dave: There’s basically no content, from Google’s perspective.
David: Right. Right. This is probably because I had the AutoNations, the automated transcripts, and then removed them in some cases. I probably just didn’t take the time to put this transcript back up.
David: Right. So, what’s happened is that Google has deindexed the page.
Dave: So, it seems like you go through all of these, set them aside, and then rank them. Which one makes the most sense for me to add content to and then request reindexing?
David: Right. And within here, we can go back a step, and we can export it.
Dave: Oh, okay.
David: Right? This makes it a more scalable project.
Dave: Oh yeah.
Tricia: I’m sorry. I was looking at something else. Say that again.
David: I’m on the crawled pages that are not indexed.
David: Export them into your favorite spreadsheet. Then you can go through and say, I don’t want the feed indexed, so that’s correct. Not surprised this page isn’t indexed, because there’s nothing on it. Feed page, feed page, feed page. For this page, we can actually skip a step: we just hit that button and get to look at the page.
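For a long export, a few lines of Python can pre-sort the rows before you review them by hand. This is a minimal sketch. It assumes the export is a CSV with a `URL` column, so check your file’s actual header. It treats anything ending in `/feed/` as an expected WordPress feed URL and leaves everything else for a human to look at.

```python
import csv
import io
from urllib.parse import urlsplit


def is_feed(url: str) -> bool:
    """WordPress feed URLs end in /feed/, including per-post comment feeds."""
    return urlsplit(url).path.rstrip("/").endswith("/feed")


def triage(csv_text: str, url_column: str = "URL"):
    """Split exported rows into expected (feed) URLs and URLs that need a human look."""
    expected, review = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row[url_column].strip()
        (expected if is_feed(url) else review).append(url)
    return expected, review


sample = (
    "URL,Last crawled\n"
    "https://example.com/blog/post-a/feed/,2022-09-01\n"
    "https://example.com/what-is-iubenda/,2022-09-02\n"
)
expected, review = triage(sample)
print(review)  # ['https://example.com/what-is-iubenda/']
```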
Dave: Quick question, then. Is there any way to mark it in Search Console to say, hey, this is okay, I don’t want this indexed, so you don’t have to look at it again? Otherwise, it just appears all the time, right?
David: That’s possible. One of the advantages of checking Search Console once a week is that you’ll kind of start to see the stuff that’s like, okay, that’s what I expect to see.
David: If you check it infrequently, you might find yourself going down huge rabbit holes. But the goal of checking every week is to catch up, so that eventually the weekly check is like, all right, no surprises here.
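One way to reach that “no surprises” weekly check is to diff this week’s export against last week’s, so only new entries need attention. Here is a minimal sketch. It assumes each export has already been read into a list of URLs, for example with a CSV reader. The function name is illustrative.

```python
def weekly_changes(last_week, this_week):
    """Compare two exports of the same not-indexed report (iterables of URLs)."""
    previous, current = set(last_week), set(this_week)
    return {
        "new": sorted(current - previous),       # surprises worth a look
        "resolved": sorted(previous - current),  # dropped off the report since last week
    }


last_week = ["https://example.com/feed/", "https://example.com/old-giveaway/"]
this_week = ["https://example.com/feed/", "https://example.com/what-is-iubenda/"]
print(weekly_changes(last_week, this_week))
# {'new': ['https://example.com/what-is-iubenda/'], 'resolved': ['https://example.com/old-giveaway/']}
```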
David: There are a bunch of pages that we’re either planning to improve or have decided to remove, right? But this one is disappointing. This is a whole blog post I took the time to write.
David: So, that’s disappointing. Why is this not indexed? Well, I need to look at it. I’d probably want to run it through a tool like Grammarly to check for plagiarism, even though I wrote this myself.
David: I know I didn’t plagiarize, but I want to do a deep dive and see whether it’s a little bit short. Maybe that’s part of the problem. Apply some of your usual practices. Maybe it’s not really contributing anything new to the conversation. But yeah, I took the time to write that, and Google didn’t index it.
Dave: So, in a case like this, it’s got valuable information. Maybe it’s not enough, or maybe it has nothing new to say. But it might still make sense to keep it to round out your expertise, so that when people come in manually and scroll around, they’ll see it.
David: Well, take “What is Iubenda, and how to set it up?” What if one of you comes here thinking, I need help setting up Iubenda, and you do a search, and it doesn’t come up? It’s useful to people.
Dave: Try just Iubenda.
David: There it is. So, it’s still helpful to users of Curious Ants. I don’t necessarily want people offsite to come to this site to set up Iubenda. Iubenda has a whole site that explains that.
David: Right. So, that’s where I would start. Pages that aren’t indexed aren’t getting any Google traffic, right? So if I used Google Analytics to decide what needs to be audited, these pages would show up there too, because they’re clearly not indexed. But here, Google is telling me directly, and it gives several reasons. I can learn more about why Google didn’t index a page by clicking this button and then clicking Learn more.
Tricia: I’m looking at one of my client’s Google Search Console right now. The client just emailed me today saying they’d noticed a drop in their traffic. When I look here, there were basically two identical pages, except for the city. I only knew because I had the URL for it. Otherwise, I don’t think I would’ve seen it on their site unless I was specifically looking for it. This was a good, easy place to spot it.
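Pages that are identical apart from a city name are easy to miss by eye. Here is a rough Python check, assuming you already have each page’s body text. It masks the city names and then compares the pages word by word with `difflib.SequenceMatcher`. A score near 1.0 flags a likely duplicate. Any cutoff you pick, such as 0.9, is your own judgment call and not a Google threshold.

```python
import difflib
import re


def similarity_ignoring(text_a: str, text_b: str, varying_terms) -> float:
    """Word-level similarity in [0, 1] after masking terms expected to differ (e.g. city names)."""

    def normalize(text):
        for term in varying_terms:
            text = re.sub(re.escape(term), "{city}", text, flags=re.IGNORECASE)
        return text.lower().split()

    return difflib.SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()


denver = "Roof repair in Denver. Call our Denver team today."
boulder = "Roof repair in Boulder. Call our Boulder team today."
print(similarity_ignoring(denver, boulder, ["Denver", "Boulder"]))  # 1.0
```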
David: So, here’s another way you could do this. Let’s say you notice in Google Analytics that a page isn’t getting traffic. You can go to Search Console, and you can paste that URL up here. And it will tell you what it thinks about that URL.
David: As long as you have Search Console access for that site, you can paste the URL into URL Inspection, and it tells you the page is not indexed: Crawled, currently not indexed. It tells you how Google knows the page exists: there’s a referring page, and there’s a sitemap. This is the last time it was crawled, three days ago. It was crawled as Googlebot smartphone; we know Google crawls most content as a smartphone. No problem fetching the page. Indexing is allowed. And honestly, Google just didn’t care enough to index it as it is. Then we can go see the other pages. You can put in any individual page, and it’ll tell you what it thinks. Let’s just do the homepage. We can put the homepage in. All right. That’s crawled. Thank God.
David: Right? Page is indexed. This part is new: there’s a video on the page, but it’s not finding it.
Tricia: What kind of video? Oh, it’s a Vimeo.
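The same checks are available programmatically through the Search Console URL Inspection API. The sketch below assumes you have already called the API and have the JSON response as a Python dict; authentication and the request itself are left out. The field names follow the API’s documented `indexStatusResult` object as we understand it, so verify them against the current reference before relying on this. The sample response is made up.

```python
def summarize_inspection(response: dict) -> dict:
    """Pick out the fields discussed above from a URL Inspection API response body."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "coverage": status.get("coverageState"),
        "last_crawl": status.get("lastCrawlTime"),
        "crawled_as": status.get("crawledAs"),
        "page_fetch": status.get("pageFetchState"),
        "indexing": status.get("indexingState"),
        "sitemaps": status.get("sitemap", []),
        "referring_urls": status.get("referringUrls", []),
    }


def only_self_referral(summary: dict, url: str) -> bool:
    """True if no referring page other than the URL itself is listed."""
    return all(ref == url for ref in summary["referring_urls"])


# Trimmed, made-up example shaped like an API response.
sample = {
    "inspectionResult": {
        "indexStatusResult": {
            "coverageState": "Crawled - currently not indexed",
            "crawledAs": "MOBILE",
            "sitemap": ["https://example.com/sitemap.xml"],
            "referringUrls": ["https://example.com/what-is-iubenda/"],
        }
    }
}
summary = summarize_inspection(sample)
print(summary["coverage"])  # Crawled - currently not indexed
print(only_self_referral(summary, "https://example.com/what-is-iubenda/"))  # True
```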
Dave: So, one of the things I’m looking at, and Bryan Valentino, you can chime in: one of our sites is a shopping site, and there’s a category page with just nine or ten products on it. There’s hardly any content, just a sentence about each product. That particular page is Crawled, currently not indexed. I don’t know if it’s worthwhile doing anything with it. But I guess that comes back to what we were talking about a couple of weeks ago: looking at the category pages and seeing whether they’re worth it.
Dave: So, if you run a shop with a whole bunch of different categories, there could be a lot of them in here. And that’s potentially okay.
David: Yeah. There’s nothing bad about having crawled, not indexed pages. They’re just not helping you.
David: The downside is that those pages could be helping you. So, use this as an opportunity to decide which pages you should work harder to improve. Most of the time, you go to a page and you’re like, oh, well, that’s clearly why it isn’t indexed. There’s no content, just a video.
David: Okay. Not surprised. Do I want it to be indexed? Well, then, I need to take some time to put some effort into that.
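To triage “no content, just a video” pages at scale, you can count the visible words in each page’s HTML. Here is a minimal sketch using Python’s built-in `html.parser`. The 300-word cutoff is an arbitrary illustration, not a Google rule. The count also includes navigation and footer text, so treat it as a rough signal only.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts words in text nodes, skipping the contents of script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    parser = _TextCounter()
    parser.feed(html)
    parser.close()
    return parser.words


def looks_thin(html: str, min_words: int = 300) -> bool:
    return visible_word_count(html) < min_words


page = (
    "<html><body><h1>What is Iubenda</h1>"
    "<script>var player = 1;</script>"
    "<iframe src='https://player.vimeo.com/video/123'></iframe>"
    "</body></html>"
)
print(visible_word_count(page), looks_thin(page))  # 3 True
```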
David: Sometimes, you go to a page, and you’re like, what the heck? I worked hard on this.
David: Right? That was the case with this one. I wrote this myself, and I think it’s a legit post. So, we can use this report to help us figure out what’s going on. First of all, there are no links anywhere on the site to this page. Google only found it from the sitemap. The only referring page it lists is the page itself.
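Pages in the same situation are listed in the sitemap but linked from nowhere else. Finding them comes down to a set difference. This sketch assumes you already have the sitemap XML and a map of each crawled page to the URLs it links to, from a site crawler for instance. `orphan_pages` and the `outlinks` structure are illustrative names. The sketch handles a plain `<urlset>` sitemap, not a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set:
    """All <loc> values from a standard <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def orphan_pages(sitemap_xml: str, outlinks: dict) -> list:
    """Sitemap URLs that no other page links to; outlinks maps page URL -> linked URLs."""
    linked = set()
    for page, targets in outlinks.items():
        linked.update(t for t in targets if t != page)  # a page linking to itself doesn't count
    return sorted(sitemap_urls(sitemap_xml) - linked)


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/what-is-iubenda/</loc></url>"
    "</urlset>"
)
outlinks = {
    "https://example.com/": {"https://example.com/blog/"},
    "https://example.com/blog/": {"https://example.com/"},
    "https://example.com/what-is-iubenda/": {"https://example.com/what-is-iubenda/"},
}
print(orphan_pages(sitemap, outlinks))  # ['https://example.com/what-is-iubenda/']
```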
Tricia: So maybe some internal linking?
David: I might say, maybe I should add some links, right? I know I wrote it uniquely for my site. I know I didn’t borrow it from anybody else or do any funny business, like substituting SEO for PPC. If you’re not confident your clients haven’t done that, you could just pull a sentence off the page and search for it. Okay, let’s see. Go to Google and put it in quotation marks… Okay, well, no one’s used that sentence in the history of the internet. It also confirms the page isn’t indexed, because Google didn’t even find it on this site. So at least I know the copy probably hasn’t been copied somewhere else. This might not be a content problem so much as a page buried very deep in the site with no internal links.
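If you have many pages to spot-check, building each quoted search by hand gets tedious. This small, hypothetical helper builds the exact-phrase Google search link for a sentence you paste in. It only constructs the URL for you to open in a browser. It doesn’t fetch or scrape any results.

```python
from urllib.parse import urlencode


def exact_match_search_url(sentence: str) -> str:
    """Google search URL for a sentence wrapped in quotation marks (exact-phrase search)."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{sentence.strip()}"'})


print(exact_match_search_url("no one has used this sentence"))
# https://www.google.com/search?q=%22no+one+has+used+this+sentence%22
```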
Tricia: Yeah. People can’t find it, either.
David: Yeah. Well, Google doesn’t find it. Google uses internal links to determine how important a page is. Are there internal links to this page? No. Google says, no internal links? Well, it must not be very important, so don’t serve it up. So, that might be something I can do with this page: just go throw in a couple of internal links to it.
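To see which pages have few or no internal links pointing at them, you can count inbound links across your pages’ HTML. Here is a minimal standard-library sketch. It assumes `pages` maps each URL to its HTML, from a crawl or a local export. Each linking page counts once per target, and the sketch ignores self-links and links to other hosts. It doesn’t treat `rel="nofollow"` specially, and it won’t see links added by JavaScript.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def inbound_internal_links(pages: dict) -> Counter:
    """Count how many other pages on the same host link to each URL."""
    counts = Counter()
    for page_url, html in pages.items():
        collector = _LinkCollector()
        collector.feed(html)
        host = urlsplit(page_url).netloc
        targets = set()
        for href in collector.hrefs:
            absolute, _fragment = urldefrag(urljoin(page_url, href))
            if urlsplit(absolute).netloc == host and absolute != page_url:
                targets.add(absolute)
        counts.update(targets)  # one count per linking page, however many times it links
    return counts


pages = {
    "https://example.com/": '<a href="/blog/">Blog</a> <a href="https://other.example/">Elsewhere</a>',
    "https://example.com/blog/": '<a href="/">Home</a> <a href="/blog/#top">Top</a>',
    "https://example.com/what-is-iubenda/": "<p>A post with no links in or out.</p>",
}
counts = inbound_internal_links(pages)
print(counts["https://example.com/blog/"], counts["https://example.com/what-is-iubenda/"])  # 1 0
```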