Tristan Perry

How To: Add A Django RSS Feed Reader

Plagiarism Guard’s footer has a ‘Blog Feed’ section which lists the five most recent posts from its WordPress-powered blog. This works by parsing the blog’s RSS feed. However, I found that simply parsing the feed and finding the latest five posts added anywhere from 500 ms to 1000 ms to a page load, which was too much. To overcome this I could have used Django’s caching framework (which supports template fragment caching). However, since I might need to access the blog post data elsewhere in my app, I decided to periodically save it to the database and read it from there instead. This approach has two benefits:

  • The process which updates the blog post data isn’t dependent on a user loading a page (as described below, it’s handled by a Django management command triggered by a cron job)
  • Reading the top N results from the database is a lot quicker (<5 ms)

The code I used to achieve this is listed below.

This article was written in late 2014, and whilst much of the advice and many of the commands in it are still relevant, a couple of links might point to an old, end-of-life version of the software.

Database Model/Entity

In models.py I added the following:

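from django.db import models


class RecentBlogPosts(models.Model):
    title = models.CharField(max_length=200)
    link = models.CharField(max_length=2048)  # URL of the post
    desc = models.TextField(null=True, blank=True)  # description/excerpt from the feed
    date = models.DateTimeField()  # publish date, used to order the posts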

This could be extended to save more data (e.g. the full blog post content), but I didn’t personally need it.

Django Custom Management Command

To keep the parsing of the blog feed (and saving to the DB) flexible, I implemented it as a custom management command within my Django app. I added a ‘management’ folder within my app containing a blank __init__.py file, then a ‘commands’ subfolder containing another blank __init__.py file. The actual feed parsing uses Feedparser, which can be installed via pip as normal:

pip install feedparser
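Django’s startapp doesn’t create the management folders, so they have to be added by hand. For reference, here is a minimal sketch of creating that layout with pathlib; the helper name scaffold_management_package is hypothetical and not part of the original project.

```python
import tempfile
from pathlib import Path


def scaffold_management_package(app_dir):
    """Create <app>/management/commands/ with an __init__.py in each folder."""
    commands_dir = Path(app_dir) / "management" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    # Empty __init__.py files mark both folders as Python packages
    (commands_dir.parent / "__init__.py").touch()
    (commands_dir / "__init__.py").touch()
    return commands_dir


# Demo in a throwaway directory; in a real project app_dir would be
# the app folder (e.g. plag/)
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_management_package(Path(tmp) / "plag")
    print(sorted(str(p.relative_to(tmp)) for p in Path(tmp).rglob("__init__.py")))
```

The command module itself (recent_blog_posts.py) then goes inside the returned commands folder.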

Finally I added recent_blog_posts.py within ‘commands’:

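import feedparser

from datetime import datetime, timezone

from django.core.management.base import BaseCommand
from django.db import transaction

from plag.models import RecentBlogPosts

class Command(BaseCommand):
    help = 'Gets N recent blog posts. Better than parsing the feed on every page load.'

    def add_arguments(self, parser):
        # Number of posts to keep, e.g. "manage.py recent_blog_posts 5"
        parser.add_argument('num_blog_posts', type=int)

    def handle(self, *args, **options):
        num_blog_posts = options['num_blog_posts']

        feed = feedparser.parse('https://www.plagiarismguard.com/blog/feed/')
        # Slicing copes with feeds holding fewer than N entries
        entries = feed.entries[:num_blog_posts]

        if not entries:
            # Fetch or parse failed: keep the posts we already have
            self.stderr.write('No feed entries found; leaving existing posts in place.')
            return

        # Swap old rows for new ones in one transaction, so a page load
        # never sees a half-refreshed (or empty) table
        with transaction.atomic():
            RecentBlogPosts.objects.all().delete()
            for entry in entries:
                published = entry.get('published_parsed')
                if published is None:
                    continue  # skip entries without a publish date
                RecentBlogPosts.objects.create(
                    title=entry.title,
                    link=entry.link,
                    desc=entry.get('description'),
                    # feedparser returns published_parsed in UTC; this builds
                    # an aware datetime (assumes USE_TZ = True in settings.py)
                    date=datetime(*published[:6], tzinfo=timezone.utc),
                )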

Change the feed URL as needed. The code should be fairly self-explanatory: it uses Feedparser to parse the feed, clears out the previously stored posts, then saves the top N entries (N being the number passed to recent_blog_posts) to the database model we added earlier.
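The date conversion deserves a closer look. Feedparser normalises parsed dates to UTC, so published_parsed is a UTC time.struct_time, and it is safest to convert it with calendar.timegm (which reads the tuple as UTC) rather than time.mktime (which reads it as local time). A minimal stdlib sketch follows; the helper name and the sample date are hypothetical.

```python
import calendar
import time
from datetime import datetime, timezone


def utc_struct_time_to_datetime(parsed: time.struct_time) -> datetime:
    """Turn a UTC struct_time (like feedparser's published_parsed)
    into a timezone-aware datetime."""
    # calendar.timegm() reads the tuple as UTC; time.mktime() would read it
    # as local time, so the timestamp would be off by the server's UTC offset
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


# A made-up publish date standing in for entry.published_parsed
published = time.strptime("2014-11-05 09:30:00", "%Y-%m-%d %H:%M:%S")
print(utc_struct_time_to_datetime(published).isoformat())
# 2014-11-05T09:30:00+00:00
```

datetime(*parsed[:6], tzinfo=timezone.utc) gives the same result without the round trip through a timestamp.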

Showing the ‘Blog Feed’

I added a custom template filter so that my template(s) simply had to reference it to retrieve the blog data (instead of, e.g., adding a global context or template variable). Inside the app I added a ‘templatetags’ folder containing a blank __init__.py file and a custom_tags.py file with the following:

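from django import template

from plag.models import RecentBlogPosts

register = template.Library()

@register.filter
def blog_comments(number_of_results):
    # Despite the name, this returns the N most recent blog posts (newest
    # first), used in a template as e.g. {% with blog_posts=5|blog_comments %}
    return RecentBlogPosts.objects.order_by('-date')[:number_of_results]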
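To show why this read is cheap, here is a rough sketch of the kind of SQL the filter’s order_by('-date')[:number_of_results] query boils down to: a single ORDER BY ... DESC LIMIT query against a small table. It uses Python’s built-in sqlite3 module as a stand-in for Django’s ORM; the table name and rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the table behind RecentBlogPosts; the table
# name and rows are made up for illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    # "desc" is quoted because DESC is an SQL keyword (Django quotes
    # column names for you)
    'CREATE TABLE recent_blog_posts ('
    'id INTEGER PRIMARY KEY, title TEXT, link TEXT, "desc" TEXT, date TEXT)'
)
conn.executemany(
    "INSERT INTO recent_blog_posts (title, link, date) VALUES (?, ?, ?)",
    [
        ("Post A", "https://example.com/a", "2014-10-01 10:00:00"),
        ("Post C", "https://example.com/c", "2014-12-01 10:00:00"),
        ("Post B", "https://example.com/b", "2014-11-01 10:00:00"),
    ],
)


def latest_posts(conn, number_of_results):
    # Roughly what order_by('-date')[:number_of_results] asks the database for
    return conn.execute(
        "SELECT title, link, date FROM recent_blog_posts "
        "ORDER BY date DESC LIMIT ?",
        (number_of_results,),
    ).fetchall()


print([title for title, _, _ in latest_posts(conn, 2)])  # ['Post C', 'Post B']
```

Because this is one local query over a handful of rows, it avoids the network fetch and XML parsing that made the original per-request approach slow.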

Then within the template where the blog feed should be displayed, you simply need to load the custom tags at the beginning:

{% load custom_tags %}

And then the following code displays the blog feed:

<h2>Blog Feed</h2>
{% with blog_posts=5|blog_comments %}
    <ul>
        {% for post in blog_posts %}
            <li>{{ post.date|date }} - <a href="{{ post.link }}">{{ post.title }}</a></li>
        {% endfor %}
    </ul>
{% endwith %}

Scheduling the Custom Management Command

To schedule the recent_blog_posts management command (which refreshes the recent blog data), I added the following cron job, which runs it every 30 minutes:

*/30 * * * * /usr/bin/python3 /home/plagiarismguard/manage.py recent_blog_posts 5

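This stores the first five entries in the feed (normally the five most recent posts), but the number can be tweaked as needed. Since the template filter also limits the number of results, there’s no specific reason for recent_blog_posts to also have this restriction. I knew I would only need the top N results, so it made sense to add this argument, but you can easily change recent_blog_posts.py to take no arguments and then drop the argument from the crontab entry. Older Django versions provided a NoArgsCommand base class for this; it was deprecated in Django 1.8 and removed in 1.10, so in current versions simply subclass BaseCommand without declaring any positional arguments.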
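A middle ground is to keep the argument but make it optional, so the crontab entry can leave it out. In a command that declares its argument in add_arguments(), Django passes in a subclass of argparse.ArgumentParser, so the relevant add_argument() call can be tried out with the standard library alone. The default of 5 below is an assumption chosen to match the footer.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="recent_blog_posts")
    # The same call would go inside the command's add_arguments(parser);
    # nargs="?" makes the positional optional, and default=5 is an assumed
    # fallback for when the crontab entry omits it
    parser.add_argument("num_blog_posts", type=int, nargs="?", default=5)
    return parser


print(build_parser().parse_args([]).num_blog_posts)      # 5
print(build_parser().parse_args(["10"]).num_blog_posts)  # 10
```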