How Do You Troubleshoot Indexing Problems?

Indexing your pages is important for visibility. Here’s how to troubleshoot indexing problems when they arise.

Video Transcript:

Onawa: I have one site that I’m having issues with. I didn’t build it, but I manage it now. And Search Console hated it and wouldn’t index it. It finally did.

David: When you say Search Console indexed it, what metric were you looking at?

Onawa: Just pages, just indexing altogether. It finally did the home page. When I ask it to recrawl a secondary page, it says it’s not available. It says it cannot be indexed due to a sitewide issue. But it is doing the homepage.

David: Okay. Oh, this is going to be a ton of fun.

Onawa: It hasn’t been so far.

David: No. Yeah. I bet it hasn’t. Okay. So, I’m going to go ahead and share my screen. And we’ll pretend this issue is on my site rather than your client’s site. So, here’s the overview report when you log in to Search Console. So, I would not use performance to determine indexing. I would use indexing. Right? I just want to confirm that’s what you’re looking at. Okay. So, you can go to full report, and it’ll tell you why pages aren’t indexed. Do you see anything there?

Onawa: It just says zero not indexed, one indexed.

David: Okay. And it’s just the home page that is indexed?

Onawa: Yeah.

David: Okay.

Tricia: But it says not indexed is zero?

David: Okay. Here’s how I would work through this problem. The first thing I would do is I would go to view your robots.txt file.

Tricia: That was going to be my first place to go. Oh, I’m excited. See, I’m learning.

David: Can you view the robots.txt file on this site?

Onawa: Yes. I’m 100% certain I can, but where it’s going to be viewable is…

David: So, all you have to do is, after the homepage, type robots.txt. And if it’s not there, then…

Onawa: It’s there.

David: Can you copy and paste the content of that file into the chat, please?

Onawa: Yeah. I think I just changed this yesterday.

David: Let’s just see what it says. Okay, user agent, everything. Disallow WP admin. Okay, so what this is telling us is any user agent can see everything but WP admin. Okay. Now, I might be a little foggy on this, but… I’m going to stop sharing for a second. I don’t think Google follows the disallow. No, wait, no. Okay. Yes, it follows the disallow. It doesn’t follow the allow. So, you’re telling Google or anybody they just can’t look at this page, but they can look at anything else. So, what we’ve determined is that we’re not preventing Google from viewing anything but, in this case, the WordPress admin section of the site. So, that shouldn’t cause any problems.
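The rules David reads aloud can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; the exact path `/wp-admin/` and the `example.com` URLs are assumptions, since the actual file is not shown in the recording. Python's parser does not match Google's rule handling in every detail, so treat the result as a rough check rather than a statement of what Googlebot will do.

```python
from urllib.robotparser import RobotFileParser

# Rules as described in the conversation; the exact path format is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for url in (
        "https://example.com/about/",
        "https://example.com/wp-admin/options.php",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With these rules, an ordinary page is allowed and anything under the admin path is not, which matches David's reading that the file is not what is keeping the rest of the site out.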

Onawa: I think I just changed that yesterday. So, I think it was disallowing more, probably.

David: Okay. So, let’s check this because it’s WP admin. We know this is a WordPress website, right? So, go into your WordPress. Can you log into the backend of WordPress on this site?

Onawa: Yeah.

David: Okay. Yeah, that’s fine.

Onawa: I had some sites that I didn’t have access to hosting get taken over, and it was a giant hassle.

David: Yeah, I bet.

Onawa: I’m still trying to get access to their hosting.

David: Okay. So, you’re in the backend in the admin, right? So, go over on the left and look at the settings, and then reading.

Onawa: It does not say discourage.

David: Okay. So, it’s unchecked. Okay. There you go. All right. So, what we’re learning is important. Google is not seeing it, but it’s not because we’ve forbidden Google from seeing it. Right? This search engine visibility button is the bane of many WordPress developers.
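The same setting can also be checked from outside WordPress by looking at a page's HTML. This is a rough sketch that assumes the checkbox, when ticked, results in a `<meta name="robots">` tag containing `noindex`; the exact markup varies by WordPress version, and the sample HTML below is made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag asks not to be indexed."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page sources, not taken from the site under discussion.
BLOCKED = "<html><head><meta name='robots' content='noindex, nofollow' /></head></html>"
OPEN = "<html><head><meta name='robots' content='max-image-preview:large' /></head></html>"

if __name__ == "__main__":
    print(has_noindex(BLOCKED), has_noindex(OPEN))
```

You would paste in the source of a real page (or download it first) and call `has_noindex` on it.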

Tricia: Yeah.

David: And I’ve made so much money from people who have forgotten to uncheck that I probably could have bought a car. In fact, one of my best friends who helped design Curious Ants forgot about unchecking this on a website he launched, not mine because it’s the first thing I check. It was another site. He was so embarrassed that he built a website to check externally that this is not checked, right? You could enter a URL and just check. Okay, so I’m going to share my screen again. So, we’re within the search console, right? In the very top here, with the box or with this URL inspection, you can type any URL in here. So, grab a page from your site that you think should be indexed.

Tricia: That one has your WP admin.

David: That’s okay. You can put any URL of your website here.

Tricia: Okay.

David: Okay.

Onawa: It says the URL is not on Google.

David: Right? Just like mine does, right? And in my case, it’s the admin, so I don’t want that. But it tells you a little more information. This says it’s not indexed by Google. There are no referring pages. It’s not in the sitemap. Does yours say anything like that?

Onawa: Yes, it says the same.

Tricia: Which one are you using? Are you using one with WP admin or a regular one?

Onawa: Just a regular.

David: So, we’re going to test live URL on yours, too.

Onawa: It’s saying not available to Google.

David: Okay. So, here, if you look at mine, it tells you why. This page cannot be indexed, blocked by robots.txt. That’s why Google’s not allowing this. Are you getting any information about why?

Onawa: It’s just not available due to a sitewide issue.

David: Sitewide issue.

Onawa: Yeah, that’s the top one. And then for page fetch, it says, “Failed: robots.txt unreachable.”

David: Interesting. So, you’re not getting any conditions met or anything like that?

Onawa: There are certain conditions that are met, but it just sent me to common tasks.

Tricia: Are we recording? Can she share hers?

David: We are recording, but we’re trying to troubleshoot it in a way that we don’t have to give away this information to the public. So, here’s what I want you to do. Type in your homepage. And type in your robots.txt file in the inspect URL.

Onawa: Okay.

David: And then test the live URL.

Onawa: It’s saying the URL is not available.

David: Okay. So, you notice that on my site, I typed in robots.txt, and it is available on Google. So, there is a setting that, for some reason, either on the server or could be in your security settings of this website, is preventing Google from even seeing the robots.txt file.
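One way to look for that kind of setting from the outside is to request the robots.txt file twice, once with a browser-style user agent and once with Googlebot's published user agent string, and compare the status codes. This is only a sketch: the browser string is an arbitrary example, and a firewall that filters by IP address rather than user agent will not show up this way, so a matching result does not prove Google can get in.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"  # arbitrary browser-like example


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0):
    """Return the HTTP status code, or None if the server could not be reached."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive here
    except (urllib.error.URLError, TimeoutError):
        return None  # DNS failure, refused connection, timeout


# Usage (needs network access; replace the placeholder domain with the real site):
# for ua in (BROWSER_UA, GOOGLEBOT_UA):
#     print(ua, fetch_status("https://example.com/robots.txt", ua))
```

If the browser-style request succeeds and the Googlebot-style request is refused, a security plugin or server rule keyed on the user agent is a reasonable suspect.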

Tricia: Well, I have a question because I did that same thing with robots.txt, and it says mine’s not on Google.

David: Well, did you do the live test?

Tricia: No.

David: Test it and see if Google’s even able to see it. And in this case, Google is able to see this robots.txt, but on Onawa’s site, Google is not able to see it for some reason. So, there may be some sort of setting on the server that is preventing Google from seeing the robots.txt file, which is also preventing Google from seeing anything but the home page. You don’t have access to the server yet?

Onawa: I can access any part of this particular website.

David: Okay. Do you have access to the htaccess file?

Onawa: Yeah.

David: A simple typo in the htaccess could prevent everything but the home page. That’s why I also… Sometimes WordPress websites will use something like WordFence as a security plugin. Do you know if you’re running that?

Onawa: Yeah, this predated me putting WordFence on there. I don’t know. I haven’t tried some of the other sites.

David: Okay. So, if it predated WordFence, then that’s probably not it. But if you can look at your htaccess file, I would check that for potential errors.
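When an htaccess file "has a lot on it," a quick scan for lines that commonly block visitors or crawlers can narrow the search. The sketch below flags a few well-known Apache directives; the pattern list is a hypothetical starting point, not an exhaustive or authoritative one, and a hit is only something to review (for example, a deny rule protecting a single config file is normal).

```python
import re

# Hypothetical patterns worth a second look; extend for the site at hand.
SUSPICIOUS = [
    (re.compile(r"^\s*deny\s+from\s+all", re.I), "denies all visitors (Apache 2.2 syntax)"),
    (re.compile(r"^\s*require\s+all\s+denied", re.I), "denies all visitors (Apache 2.4 syntax)"),
    (re.compile(r"HTTP_USER_AGENT.*bot", re.I), "rule keyed on a crawler user agent"),
    (re.compile(r"x-robots-tag.*noindex", re.I), "sends a noindex header"),
]


def scan_htaccess(text: str):
    """Return (line number, reason, line) for each non-comment line matching a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comments
        for pattern, reason in SUSPICIOUS:
            if pattern.search(line):
                findings.append((number, reason, line.strip()))
    return findings


# Made-up example file, not the one from the conversation.
SAMPLE = """\
# BEGIN WordPress
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
Header set X-Robots-Tag "noindex, nofollow"
# END WordPress
"""

if __name__ == "__main__":
    for number, reason, line in scan_htaccess(SAMPLE):
        print(f"line {number}: {reason}: {line}")
```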

Onawa: Okay.

David: Maybe something that only allows you to access the home page. You can view other pages on the site besides the homepage as a user, right?

Onawa: Yeah, yeah.

David: But Google can’t. But also, Google can’t even access your robots.txt file. Now, what’s interesting is that the problem isn’t that Google is not allowed to see the robots.txt file because Google does not require you to have a robots.txt file. But whatever problem is preventing Google from seeing the robots.txt file might also be preventing Google from getting to the rest of the site. So, I’m not saying open up the robots.txt file. I’m saying whatever is holding that back is probably the problem. And a server setting, a security setting, and an htaccess setting are where I would check. And I wish Dave were here because he might have some other ideas. But we’ve ruled out the big ones. And we’ve also ruled out some sort of technical… Well, we haven’t ruled it out, but it’s probably not some sort of technical problem where Google can’t read links to the site. Like, I have a client that just built a site in a JavaScript framework. And we realized that when Google crawls the site, all it sees is white space because Google can’t read the JavaScript framework. So, it’s absolutely unindexable, except for the home page. That’s probably not what’s going on here.
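The difference between a missing robots.txt file and an unreachable one can be written down as a small lookup. The mapping below reflects Google's published guidance as best understood here (a missing file means no restrictions, while server errors and network failures are treated as if the whole site were disallowed for a time); treat it as an assumption and check Google's current documentation before relying on it.

```python
def robots_fetch_outcome(status):
    """Describe roughly how a crawler following Google's guidance reacts to a
    robots.txt response. `status` is an HTTP status code, or None when the
    request failed at the network level (DNS error, timeout, reset)."""
    if status is None:
        return "treated like a server error: crawling paused"
    if 200 <= status < 300:
        return "file is read and its rules are applied"
    if status == 429 or 500 <= status < 600:
        return "treated like a server error: crawling paused"
    if 400 <= status < 500:
        return "treated as no robots.txt: no crawl restrictions"
    return "other response (for example a redirect): depends on where it leads"


if __name__ == "__main__":
    for code in (200, 404, 403, 429, 503, None):
        print(code, "->", robots_fetch_outcome(code))
```

Under this reading, a plain 404 for robots.txt would not explain a "sitewide issue," but a 5xx response or a dropped connection would.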

Onawa: Yeah, it’s not a new site by any means. I just kind of took over the management, and I’ve been sort of the one to look out for a lot of SEO-type stuff.

David: Yeah. So, here’s another test I would do on an interior page. The Google mobile-friendly test. Okay. We’re going to enter an interior page into this tool. Now, what’s really great about this tool is not only will it tell you if the page is mobile friendly, which isn’t really our issue here, but this gives you a screenshot of what that page might look like. Now, in this case, it’s not available. So, we’re going to actually go to a page that’s available and make it do a real test. So, we’re going to let it do its thing. It’s going to take apparently longer than I thought to do its thing. And we’re still waiting for it to do its thing. So, a couple of years ago, John Mueller from Google recommended that you use this tool to determine whether Google could even read your page, which is outside the intended use of this tool. This is for testing mobile friendliness, making sure you have a responsive website, basically. And so, how do you do this? We look at this page, and it’s mobile-friendly. Okay, that’s fine. That’s not really our question. We want to view the tested page. And when we view the tested page, this is the copy that Google has downloaded of this page. So, we see something really important. We see HTML. Right? And theoretically, we can go to this page, and we can go to a snippet, members of the colony. I can find this. It’s in the meta. But it’s also in the paragraph. So, I know that Google is reading it. But I can also look at the screenshot. And this is what I really want to see. If Google sees something different, this is what Google is able to access. So, if you put one of those interior URLs in here, and it says this page is usable on mobile, which it might not, because, remember, the first page that I entered was a backend membership page. You had to be logged in to view that page.
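The "is it in the meta, and is it also in the paragraph?" check can be repeated on any HTML you copy out of the tested-page view. This is a minimal sketch with made-up sample markup; it reports whether a phrase appears in the meta description, whether it appears in text outside `<script>` and `<style>` tags, and whether the page has no such text at all (the "white space" case of a JavaScript-only shell).

```python
from html.parser import HTMLParser


class TextAndMeta(HTMLParser):
    """Collect the meta description and all text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.meta_description = ""
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = attr.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def where_is_phrase(html: str, phrase: str) -> dict:
    parser = TextAndMeta()
    parser.feed(html)
    text = " ".join(" ".join(parser.text_parts).split()).lower()
    needle = phrase.lower()
    return {
        "in_meta_description": needle in parser.meta_description.lower(),
        "in_page_text": needle in text,
        "page_text_is_empty": not text,
    }


# Hypothetical samples: a normally rendered page and an empty JavaScript shell.
RENDERED = ('<html><head><meta name="description" content="Meet the members of the colony.">'
            '</head><body><p>Meet the members of the colony.</p></body></html>')
JS_SHELL = ('<html><head><script>var app = "members of the colony";</script></head>'
            '<body><div id="root"></div></body></html>')

if __name__ == "__main__":
    print(where_is_phrase(RENDERED, "members of the colony"))
    print(where_is_phrase(JS_SHELL, "members of the colony"))
```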

Onawa: Yeah. It told me no for this one. I did get the htaccess file, and it’s got a lot on it.

David: Right. The modern web does not require me to use htaccess as much as I used to. So, I’m not as skilled at htaccess as I once would have been. I would troubleshoot that first. And this goes back to, like, literally, my first week working in SEO at an agency. My boss gave me an htaccess file and said, “Upload this to the site.” And I said, “Wait, there’s an error.” He goes, “Shut up. You don’t know what you’re doing. Upload this to the site.” I did it, and six months later, that website was kicked out of Google. Not for any violation, but because basically the rule said, don’t index anything. I wish I had taken a harder stance on that because even one week into SEO, I knew more than my boss. Yeah. That says something about the SEO industry. But that’s where I would check first. Keep using these tools. The URL inspection. You can just refresh and test it again. Right? And just keep testing and see what you find.

Onawa: Okay.

David: That’s where I would start. Does that help you?

Onawa: Yeah, yeah.
