Note: This article was written in late 2014, and whilst a lot of the commands and advise in this article are still relevant, a couple of links might be to an old ‘end of line’ version of software
Plagiarism Guard’s footer has a ‘Blog Feed’ section which lists the five most recent blog posts from its WordPress powered blog. This works by parsing the blog’s RSS feed (powered by WordPress). However I found that simply parsing the feed and finding the latest 5 posts added anything from 500 ms to 1000 ms onto a page load, which was too much. To overcome this I could have used Django’s caching framework (which supports template-level caching). However since I potentially needed to access the blog post data elsewhere in my app, I decided to periodically save the blog post data to the database and read the data from here instead. This approach has two benefits:
- The process which updates the blog post data isn’t dependent on a user loading a page (as mentioned below, it’s handled via a Django management command triggered via a cron job)
- Reading the top N results from the database is a lot quicker (<5 ms)
The below lists the code I used to achieve this feature.
Database Model/Entity
In models.py I added the following:
class RecentBlogPosts(models.Model):
title = models.CharField(max_length=200)
link = models.CharField(max_length=2048)
desc = models.TextField(null=True, blank=True)
date = models.DateTimeField()
This could obviously be extended to save more data e.g. the blog post/description, but I didn’t personally need this extra data.
Django Custom Management Command
So that I could make the parsing of the blog feed (and saving to the DB) flexible, I went down the route of adding a custom management command within my Django app. So I added a ‘management’ folder within my app, and a blank _init_.py file within this folder. Then I added a ‘commands’ subfolder and another _init_.py file within this subfolder. The actual feed parsing uses Feedparser, which can be installed via pip as normal:
pip install feedparser
Finally I added recent_blog_posts.py within ‘commands’:
import feedparser
from time import mktime
from datetime import datetime
from django.core.management.base import BaseCommand
from plag.models import RecentBlogPosts
class Command(BaseCommand):
args = ''
help = 'Gets N recent blog posts. Better than parsing the list every page load.'
def handle(self, arg, **options):
num_blog_posts = int(arg)
feed = feedparser.parse('https://www.plagiarismguard.com/blog/feed/')
for blog in RecentBlogPosts.objects.all():
blog.delete()
loop_max = num_blog_posts if len(feed['entries']) > num_blog_posts else len(feed['entries'])
for i in range(0, loop_max):
if feed['entries'][i]:
blog_post = RecentBlogPosts()
blog_post.title = feed['entries'][i].title
blog_post.link = feed['entries'][i].link
blog_post.desc = feed['entries'][i].description
blog_post.date = datetime.fromtimestamp(mktime(feed['entries'][i].published_parsed))
blog_post.save()
Change the URL as needed but the code should be fairly self explanatory – it uses Feedparser to parse the URL, then it iterates over the top N (as passed into recent_blog_posts) results, saving them to the database model we added earlier.
Showing the ‘Blog Feed’
I added a custom tag into Django so that my template(s) simply had to reference the tag to retrieve the blog data (instead of e.g. adding a global context or template variable). So inside the app I added a ‘templatetags’ folder, with a blank _init_.py file and a custom_tags.py file which contains the following:
from django import template
from plag.models import RecentBlogPosts
register = template.Library()
@register.filter
def blog_comments(number_of_results):
return RecentBlogPosts.objects.order_by('-date')[:number_of_results]
Then within the template where the blog feed should be displayed, you simply need to load the custom tags at the beginning:
{% load custom_tags %}
And then the following code displays the blog feed:
<h2>Blog Feed</h2>
{% with blog_posts=5|blog_comments %}
<ul>
{% for post in blog_posts %}
<li>{{ post.date|date }} - <a href="{{ post.link }}">{{ post.title }}</a></li>
{% endfor %}
</ul>
{% endwith %}
Scheduling the Custom Management Tag
To schedule in the recent_blog_posts Django management command (which refreshes the recent blog data), I added the following cron job:
*/30 * * * * /usr/bin/python3 /home/plagiarismguard/manage.py recent_blog_posts 5
This retrieves the first 5 results – but this can be tweaked as needed. Since the template tag also supports fetching a certain number of results, there’s no specific reason for recent_blog_posts to also have this restriction. I knew I would only need the top N results so it made sense to add this argument, but you can easily change recent_blog_posts.py to subclass NoArgsCommand instead, and then the cron tab argument can be ignored.