

  • Anthony Stevens

SEO Task #2 Completed: Remove Duplicate URLs

Blogging, SEO, Software

Hot on the heels of my SEO Task #1 post on permalinks, knowledgeable commenter Vanessa Fox said:

You want exactly one URL to every post. Pages can have multiple URLs pointing to them. You want to avoid [this] by making sure there’s a unique URL for a page and that any other potential URLs for that page 301 redirect.

In my case the following two URLs pointed to the exact same content:

http://thepursuitofalife.com/2008/11/16/two-great-pictures/
and
http://thepursuitofalife.com/two-great-pictures/

I had enabled permalinks in WordPress, with a custom structure of /%postname%/. For some reason, though, the year/month/day/postname/ version of the URL still resolved to the same content as /postname/ rather than issuing a real redirect.
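To make the duplicate-URL idea concrete, here is a minimal sketch of how a dated path maps to its canonical /postname/ form. The function name canonical_path is hypothetical and not part of the original script, and the pattern is anchored to the start of the path, which is an assumption of this sketch.

```php
<?php
// Hypothetical helper (not from the original post): returns the canonical
// /postname/ path for a dated /YYYY/MM/DD/postname/ path, or null when the
// path carries no date prefix and therefore needs no redirect.
function canonical_path(string $path): ?string
{
    // Assumption: the date prefix sits at the very start of the path.
    $pattern = '#^/\d{4}/\d{2}/\d{2}(?P<title>/.+)$#';
    if (preg_match($pattern, $path, $groups) === 1) {
        return $groups['title'];
    }
    return null;
}

// Usage: only the dated URL yields a redirect target.
foreach (['/2008/11/16/two-great-pictures/', '/two-great-pictures/'] as $path) {
    $target = canonical_path($path);
    echo $path, ' => ', $target === null ? '(already canonical)' : $target, "\n";
}
```

Returning null for an already-canonical path matters: a handler that redirects only when a target is returned cannot redirect a URL to itself.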

The following custom 404 PHP script fixed the problem (whitespace removed for HTML purposes):

<?php
// The custom 404 handler receives the originally requested URL in the query string.
$qs = $_SERVER['QUERY_STRING'];
$pos = strrpos($qs, '://');
$found = 0;
// Find the first slash after the host name; everything from there on is the path.
$pos = strpos($qs, '/', $pos + 4);
$uri = substr($qs, $pos);
// Match /YYYY/MM/DD/ followed by the post name, captured as 'title'.
$pattern = '/(\d{4})\/(\d{2})\/(\d{2})(?P<title>(\/.*))/';
if (preg_match($pattern, $uri, $groups)) {
$uri = $groups['title'];
$found = 1;
}
if ($found == 1) {
// Dated URL: send a permanent redirect to the /postname/ version.
$host = $_SERVER['HTTP_HOST'];
$host = 'http://' . $host;
$uri = $host . $uri;
$loc = 'Location: ' . $uri;
header( "HTTP/1.1 301 Moved Permanently" );
header( "Status: 301 Moved Permanently" );
header( $loc ) ;
exit(0); // This is optional but suggested, to avoid any accidental output
} else {
// Already in /postname/ form: hand the request to WordPress.
$_SERVER['REQUEST_URI'] = $uri;
$_SERVER['PATH_INFO'] = $_SERVER['REQUEST_URI'];
include('index.php');
}
?>

Originally I hadn’t put in the if ($found == 1) part, but without it Firefox told me I was in an endless loop and couldn’t complete the request. Without that check, the handler redirected every request, including the already-correct /postname/ URL, which then landed in the same handler again.

Now the incoming URL for the year/month/day/postname version has a 301 redirect to the “correct” version.

Fairly pleased so far.

SEO Task #1 Completed: Permalinks

Nov 19, 2008

Blogging

I have a mini-goal to apply the best SEO techniques to my blog over the next month. Toward that end, I’ll be reading everything that Vanessa Fox has written on the subject, in addition to searching archives of various SEO-related blogs.

The first thing I did was set up permalinks, initially using this method by Jeff Starr (http://perishablepress.com/press/2008/02/06/permalink-evolution-customize-and-optimize-your-dated-wordpress-permalinks/). It’s written for an Apache installation, so the .htaccess items don’t apply to my IIS installation, but I found a dead-simple custom 404 solution here (http://www.keyboardface.com/archives/2007/09/07/update-for-wordpress-permalinks-on-iis/), which unfortunately didn’t work. After a little investigation, I put together this PHP script that took Starr’s script as inspiration, but which deals properly with the year/month/day link style from my old wordpress.com blog:

<?php
$qs = $_SERVER['QUERY_STRING'];
$pos = strrpos($qs, '://');
$pos = strpos($qs, '/', $pos + 4);
$uri = substr($qs, $pos);
$pattern = '/(\d{4})\/(\d{2})\/(\d{2})(?P<title>(\/.*))/';
if (preg_match($pattern, $uri, $groups)) {
$uri = $groups['title'];
}
$_SERVER['REQUEST_URI'] = $uri;
$_SERVER['PATH_INFO'] = $_SERVER['REQUEST_URI'];
include('index.php');

Now, you can visit URLs like:

http://thepursuitofalife.com/two-great-pictures/

instead of

http://thepursuitofalife.com/2008/11/16/two-great-pictures/

which should ramp up my search engine rankings.

I'm still having one problem with URLs that Google has already indexed that look like this:

http://thepursuitofalife.com/index.php/2008/11/14/some-post/

The index.php file resolves, so I can't use my 404 handler described above. I'll need to dig into the WP default path resolution code and figure out what to do there.
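The first post (SEO Task #1) leaves the /index.php/2008/11/14/some-post/ style of URL unsolved. As a rough sketch of one possible direction, and not something the original posts implement or test, the same path clean-up could be written as a pure function that strips both the index.php segment and the date prefix. The name normalize_path is hypothetical, and where such a check would run (for example, before WordPress resolves the request) is an assumption that would need verifying against the actual installation.

```php
<?php
// Hypothetical helper (not from the original posts): removes a leading
// /index.php segment and a leading /YYYY/MM/DD date prefix from a path.
function normalize_path(string $path): string
{
    // Drop "/index.php" only when more path follows it.
    $path = preg_replace('#^/index\.php(?=/)#', '', $path);
    // Drop the date prefix only when a post name follows it.
    return preg_replace('#^/\d{4}/\d{2}/\d{2}(?=/.)#', '', $path);
}

// Usage: redirect only when normalization actually changed the path,
// which avoids redirecting a canonical URL to itself.
$requested = '/index.php/2008/11/14/some-post/';
$target = normalize_path($requested);
if ($target !== $requested) {
    echo 'Would 301 redirect to ', $target, "\n";
}
```

Comparing the normalized path with the requested one plays the same role as the $found flag in the 404 handler: no change means no redirect.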