Why Does Robots.txt Return a 403 Error, and How Can You Fix It?
Learn why your robots.txt file might return a 403 error, what it means for SEO tools, and how to fix it.
David: You’re asking a question about a robots.txt file and a header code.
Tricia: Yes, so one of the things I know we previously talked about is that Ahrefs has a free thing where we can go in, and it’ll give us a little bit of a report. So, that’s where I saw it, and basically, it’s saying that the error is that the robots.txt file is not accessible. It’s 403 forbidden.
David: Yes.
Tricia: So first, what does that mean? And second, how do I fix it?
David: Okay, good. So, Ahrefs is a big paid tool, but it’s also a really good free tool.
Tricia: Yeah, this is the free tool.
David: As with any third-party tool, we have to understand that it’s third-party data. It’s not necessarily what Google’s thinking.
Tricia: Okay.
David: So, what we can do is put that into Search Console and see if Google is interpreting the same 403.
Tricia: Okay.
David: So, remember there are certain header codes that are sent to browsers, or in this case, search bots, whenever a page is requested. What we typically want to see is a 200 header code, meaning yay, the page exists, and you’re allowed here. Sometimes it’s a 300-something code, which means the page ain’t there anymore; it’s over here. A 400-something code means the page ain’t there, or you’re not allowed to see it. A 500-something code means there’s a server error; the server has a problem and can’t let you in. So, a 403 means the page may or may not exist, but if it does, you’re not allowed to see it.
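The status-code families David walks through can be sketched as a small helper. This is purely illustrative; the function name and wording are made up for this example, not part of any library:

```python
# Hypothetical helper summarizing HTTP status-code families as described above.

def describe_status(code: int) -> str:
    """Return a plain-English summary of an HTTP status code."""
    if code == 403:
        return "forbidden: the page may or may not exist, but you're not allowed to see it"
    if code == 404:
        return "not found: the page ain't there"
    if 200 <= code < 300:
        return "success: the page exists and you're allowed here"
    if 300 <= code < 400:
        return "redirect: the page ain't there anymore; it's over here"
    if 400 <= code < 500:
        return "client error: the page ain't there, or you're not allowed to see it"
    if 500 <= code < 600:
        return "server error: the server has a problem"
    return "unknown"

print(describe_status(200))
print(describe_status(403))
```

Note that the specific codes (403, 404) are checked before the general 4xx range, since they carry a more precise meaning.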
Tricia: Okay.
David: So that’s a 403 as opposed to a 404, which I think most of you’ve heard about, which is the page isn’t there. I’m speaking in SEO language. “Page ain’t there” is a technical term. So, my Kansas is showing.
Tricia: So, I went to Google Search Console, and I put it in, and it’s not giving me any codes, but it says the URL is not on Google.
David: The first thing I would do, and I actually went and checked this before we got on the call, is look inside the robots.txt file, which is as simple as requesting that URL in your browser. For instance, Shopify’s default robots.txt file says the Ahrefs bot is not allowed to crawl the site, so Ahrefs can’t scan a Shopify site unless you allow it to. But that produces a different error. It wouldn’t give you a 403; it would just say it’s not able to crawl the site, not allowed. So, that’s the first thing I checked, just to confirm. A disallow rule in robots.txt isn’t going to send a 403 error; it would basically tell the bot it’s not allowed to look at the site.
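For reference, a robots.txt rule that blocks one crawler while allowing everyone else looks roughly like this. This is a generic example, not Shopify’s actual file:

```text
# Block AhrefsBot from the entire site
User-agent: AhrefsBot
Disallow: /

# Allow all other crawlers everywhere
User-agent: *
Disallow:
```

As David notes, a rule like this makes Ahrefs report that it isn’t allowed to crawl; it does not produce a 403, because the bot can still read the file itself.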
Tricia: Right.
David: Okay. So, what this means is that something in the server, or maybe even DNS or Cloudflare, is probably preventing the Ahrefs bot from getting to the robots.txt file. Cloudflare has all kinds of bot protection services, which is one of its values, but it can be really aggressive. I mean, we all know sometimes web developers hate marketing people, and they hate SEO, and they think, “Oh, I’m not going to allow any SEO bots to come to my site.” Well, that hamstrings us, because we want the data to make the best decisions. So, what’s probably happening is that the bot protection in something like Cloudflare, or potentially at the host, is not allowing the Ahrefs bot to get to the robots.txt file.
Tricia: Okay.
David: But it’s not necessarily preventing Google from getting to the robots.txt file. Let me pull up something really quick. There used to be a tool in Search Console that would let you check your robots.txt file; they moved it, but Bing Webmaster Tools has one. So, you can use the robots.txt checker in Bing Webmaster Tools, and that will show you whether or not Bing has access to it. If Bing doesn’t have access either, then it could be a broader issue, some sort of server configuration that’s blocking access to the file itself. I don’t know what it might be.
Tricia: Okay.
David: My intuition is telling me that something like Cloudflare is preventing the Ahrefs bot from accessing the robots.txt file.
Tricia: Okay.
David: This is the problem: you want to control the Ahrefs bot through robots.txt because it’s an honorable bot. It will honor the robots.txt. But by blocking it at the server level, you’re cutting it off from the very file that would tell it what it is and isn’t allowed to crawl.
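The “honorable bot” behavior David describes can be demonstrated without any network access, since Python’s standard library ships a robots.txt parser. The rules below mirror the generic block-one-bot example; a well-behaved crawler performs this check before fetching any URL:

```python
# Sketch of how a well-behaved crawler consults robots.txt, using Python's
# standard-library parser. The rules here are a made-up example.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An honorable bot asks for permission before fetching a URL.
print(parser.can_fetch("AhrefsBot", "/products/"))  # blocked by the first rule
print(parser.can_fetch("Googlebot", "/products/"))  # allowed by the wildcard rule
```

If the bot can never download robots.txt in the first place (because the server returns a 403), this permission check is impossible, which is exactly the irony being described.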
Tricia: Yeah.
David: Right? So, in other words, this is a problem for third-party tools like the Ahrefs bot, not necessarily for the site itself; I can access it. You could go to some of the redirect checkers, like Redirect Detective, try it in there, and see what happens. However, depending on how aggressively the rules are set, it might throw a 403 error there, too, because the server might not allow that bot to access it either.
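You can also reproduce this check yourself by requesting the robots.txt file with different User-Agent strings and comparing the status codes. This is a sketch using only Python’s standard library; the URL and the user-agent strings are placeholders you would swap for your own site and the bots you care about:

```python
# Diagnostic sketch: does the server return different status codes for
# different User-Agent strings when fetching robots.txt?
import urllib.request
from urllib.error import HTTPError

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code for the URL under the given User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code

if __name__ == "__main__":
    url = "https://example.com/robots.txt"  # placeholder: your site here
    for ua in ("Mozilla/5.0", "AhrefsBot/7.0"):  # placeholder UA strings
        print(ua, fetch_status(url, ua))
```

If a browser-like User-Agent gets a 200 while a bot-like one gets a 403, that points to user-agent filtering at the server or a service like Cloudflare, which matches David’s intuition above.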
Tricia: Yeah, okay.
David: This sounds to me like somebody on the development team or the IT team has been overly aggressive, or maybe they’ve had problems in the past with web spam or something, and they’ve turned on too many features. So, this leaves you to either say, well, we’re not going to use Ahrefs, or ask them to allow Ahrefs while still blocking other tools. Ironically, developers who do this aren’t really helping. They think they are, but these bots aren’t the problem. It’s the bots without ethics that cause overload problems and things like that. Bots like this are really well written; they’re very careful not to overwhelm your system, and they’re managed by credible companies. It’s the nefarious ones you want to prevent, but unfortunately, those are harder to block because there are so many of them out there.
Tricia: Okay.
David: That’s probably what’s going on. I’d confirm it with Bing Webmaster Tools, because, obviously, we at least want Bing to have access. That’s where I would look.
Tricia: Okay, perfect. I’ll check this.
SEO seems hard: you have to keep up with all the changes and weed through contradictory advice. This is frustrating and overwhelming. Curious Ants will teach you SEO while bringing your website more traffic and customers, because you’ll learn SEO while doing it.