As one of the potential solutions identified by the community to help institutions expand their online course offerings, the Scaling up online learning project has been exploring whether a directory of UK online courses would be viable and useful. As a very first exploration of this, Michael Webb, Jisc’s Director of technology and analytics, has put together a prototype directory, and I caught up with him to ask about the process and the outcomes.
How did you go about putting the prototype together?
The working group of the Scaling Up Online Learning project provided a list of webpages containing online course listings, and this was used as the starting point for identifying the content to populate the directory. Michael analysed these pages and devised a search strategy that combined custom Google searches, a custom page crawl and ‘scraping’ of course details.
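To make the ‘custom Google searches’ step more concrete, here is a minimal Python sketch of how per-institution queries might be built for Google's Custom Search JSON API. The post does not give the actual queries, so the site-restricted query form, the reuse of the phrases “online” and “distance learning”, and the example domains are assumptions; `YOUR_API_KEY` and `YOUR_ENGINE_ID` are placeholders for real credentials. The code only builds request URLs and does not call the API.

```python
from urllib.parse import urlencode

# Endpoint of Google's Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Assumed search phrases (the post lists these for the page-level check).
PHRASES = ("online", "distance learning")


def build_query(domain: str) -> str:
    """Restrict results to one institution's site and require either phrase."""
    either_phrase = " OR ".join(f'"{p}"' for p in PHRASES)
    return f"site:{domain} ({either_phrase})"


def build_request_url(domain: str, api_key: str, engine_id: str, start: int = 1) -> str:
    """Build the request URL for one page of results.

    `start` is the 1-based index of the first result to return.
    """
    params = {"key": api_key, "cx": engine_id, "q": build_query(domain), "start": start}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical domains; in practice these would come from the working group's list.
    for domain in ("uni.ac.uk", "example.ac.uk"):
        print(build_request_url(domain, "YOUR_API_KEY", "YOUR_ENGINE_ID"))
```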
Firstly, Michael had to find a way to determine whether or not a page was about an online course, and he took two approaches to achieve this:
- Approach 1: a search within an institution’s online course pages
  - The search starts at a given institution’s URL, under a common path (e.g. http://course.uni.ac.uk/online)
  - It searches for the words “online” or “distance learning”
  - It checks the page title to see whether it looks like a course and, if it does, tries to find basic course information
- Approach 2: a listing page plus a custom crawler
  - The search goes to a single page that lists all of a given institution’s online courses
  - A custom crawler then checks that page for links that look like course pages, follows any links that fit and tries to find basic course information on each
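The two approaches can be sketched in Python using only the standard library. This is a minimal illustration, not the prototype's code: `approach_one` checks a page that has already been fetched, and `approach_two` returns the same-site links on a listing page that a crawler would visit next. The post does not say exactly how a title or link was judged to ‘look like a course’, so that test is passed in as a function (the lambda in the usage is a placeholder), and fetching pages over the network is left out.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse

# Phrases the first approach looks for (from the post).
ONLINE_PHRASES = ("online", "distance learning")


class PageParser(HTMLParser):
    """Collect a page's <title>, all of its text, and (absolute URL, link text) pairs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts: list[str] = []
        self.links: list[tuple[str, str]] = []
        self._in_title = False
        self._href: Optional[str] = None
        self._link_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._link_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._link_text).split())
            self.links.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._link_text.append(data)


def parse_page(url: str, html: str) -> PageParser:
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return parser


def approach_one(url: str, html: str, prefix: str,
                 looks_like_course: Callable[[str], bool]) -> bool:
    """Approach 1: the page is under the common URL prefix, mentions an online
    phrase, and its title looks like a course."""
    if not url.startswith(prefix):
        return False
    page = parse_page(url, html)
    text = " ".join(page.text_parts).lower()
    return any(p in text for p in ONLINE_PHRASES) and looks_like_course(page.title.strip())


def approach_two(listing_url: str, listing_html: str,
                 looks_like_course: Callable[[str], bool]) -> list[str]:
    """Approach 2: from a single listing page, return same-site links whose text
    looks like a course, i.e. the pages a crawler would visit next."""
    page = parse_page(listing_url, listing_html)
    host = urlparse(listing_url).netloc
    return [link for link, text in page.links
            if urlparse(link).netloc == host and looks_like_course(text)]


if __name__ == "__main__":
    listing = (
        "<html><head><title>Online courses</title></head><body>"
        '<a href="/online/msc-data-science">MSc Data Science</a>'
        '<a href="https://other.example.org/msc">MSc elsewhere</a>'
        '<a href="/contact">Contact us</a>'
        "</body></html>"
    )
    # Placeholder course test, for illustration only.
    print(approach_two("http://course.uni.ac.uk/online/", listing, lambda t: "MSc" in t))
```

Keeping the course test as a parameter means the same crawling code can be reused while the definition of ‘a course’ is refined.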
So what counts as a course? For the purpose of the prototype, the search looks for the words ‘Online’, ‘MSc’, ‘BSc’, ‘MA’, ‘BA’, ‘MRes’, ‘PhD’, ‘MBS’ and ‘LLM’. This does not produce perfect results, but it does a pretty good job of capturing many of the courses on the university webpages the prototype searches; in total, the prototype directory found around 500 courses.
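A minimal Python sketch of this keyword test, assuming whole-word matching, with degree abbreviations matched case-sensitively and ‘online’ matched in any case; the post does not say how matches were actually made. The example titles are illustrative, apart from “Introduction to British History: 900-1350”, which comes from the challenges listed later in this post. Together they show both ways a keyword test can go wrong: a genuine course with none of the keywords is missed, and a non-course page that happens to contain ‘Online’ is picked up.

```python
import re

# Words the prototype treated as signs that a page is about a course (from the post).
COURSE_WORDS = {"Online", "MSc", "BSc", "MA", "BA", "MRes", "PhD", "MBS", "LLM"}


def looks_like_course(title: str) -> bool:
    """True if any whole word of the title is one of COURSE_WORDS.

    Degree abbreviations are matched case-sensitively, so 'ma' or the 'Ma' in
    'Marketing' is not read as 'MA'; 'online' is matched in any case.
    Both choices are assumptions for this sketch.
    """
    for word in re.findall(r"[A-Za-z]+", title):
        if word in COURSE_WORDS or word.lower() == "online":
            return True
    return False


if __name__ == "__main__":
    titles = [
        "MSc Data Science (Online)",
        "LLM International Law",
        "Marketing Foundations",                      # 'Ma' inside a word: no match
        "Introduction to British History: 900-1350",  # a real course, no keyword: missed
        "Online library services",                    # not a course, still matched
    ]
    for t in titles:
        print(f"{looks_like_course(t)!s:5}  {t}")
```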
Once courses were found, they were added to the database, and for each course an attempt was made to collate basic course information.
This produced more variable results: it was just under 50% successful. The information wasn’t impossible to find, but it was pretty hard, and success depended on how each university had structured its site and pages. The process has demonstrated the need to be realistic about how much information it is possible to extract and whether it is worth pursuing.
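As a sketch of the storage step, the Python below keeps courses in an SQLite table and measures how often the basic information was found. The table layout is hypothetical, and because the post does not list which basic fields the prototype collected, `duration` and `fees` are stand-ins chosen only for illustration; the example rows are made up, not data from the prototype.

```python
import sqlite3

SCHEMA = """
CREATE TABLE course (
    id          INTEGER PRIMARY KEY,
    institution TEXT NOT NULL,
    title       TEXT NOT NULL,
    url         TEXT NOT NULL UNIQUE,
    -- Hypothetical 'basic information' fields; NULL when extraction failed.
    duration    TEXT,
    fees        TEXT
)
"""


def add_course(db: sqlite3.Connection, institution: str, title: str, url: str,
               duration: str = None, fees: str = None) -> None:
    """Store a course; a URL that is already stored is silently skipped."""
    db.execute(
        "INSERT OR IGNORE INTO course (institution, title, url, duration, fees) "
        "VALUES (?, ?, ?, ?, ?)",
        (institution, title, url, duration, fees),
    )


def extraction_success_rate(db: sqlite3.Connection) -> float:
    """Share of stored courses for which every basic-information field was found."""
    total, complete = db.execute(
        "SELECT COUNT(*), "
        "COALESCE(SUM(duration IS NOT NULL AND fees IS NOT NULL), 0) FROM course"
    ).fetchone()
    return complete / total if total else 0.0


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    # Made-up example rows.
    add_course(db, "Example University", "MSc Data Science",
               "http://course.uni.ac.uk/online/msc-ds", "2 years", "£9,000")
    add_course(db, "Example University", "MA History",
               "http://course.uni.ac.uk/online/ma-history", duration="1 year")
    add_course(db, "Example University", "BA Law",
               "http://course.uni.ac.uk/online/ba-law")
    print(f"{extraction_success_rate(db):.0%}")  # 1 of 3 rows has both fields
```

Using the course URL as a unique key means a page reached by both search approaches, or on a repeat crawl, is stored only once.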
The main challenges encountered were:
- There is no straightforward way of determining that a page is about a course. For example, a non-credit-bearing course may be listed simply as “Introduction to British History: 900-1350”, with none of the keywords that would flag it as a course
- Course listing pages mix online courses with other types of course (e.g. part-time), so it is not clear to a search engine which is which
- Some links to online courses go to a general course page with no mention of online delivery. In such cases, neither a search engine nor a human would be able to identify the online course(s)
- Getting course details other than the course title
- False results (non-course information is returned in the search results)
Things that could help improve the results include:
- Human intervention (e.g. a moderator)
- An XCRI-CAP feed, i.e. a machine-readable version of the data. XCRI-CAP is the UK standard for describing course marketing information: it shows how to structure the information, defines and names the data components, and specifies the types of data permitted within each component
- University course descriptions or site designs that make it clear which courses are ‘online’ or ‘distance learning’, so that they can be captured during searches
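To illustrate why a machine-readable XCRI-CAP feed would help, this Python sketch reads a small, made-up feed and keeps the courses whose attendance mode mentions online or distance delivery. The element names (`course`, `presentation`, `attendanceMode`, `dc:title`) and namespace URIs reflect our understanding of XCRI-CAP 1.2 and should be checked against the specification; the sample is a simplified fragment, not a complete, valid catalogue, and the matched labels in `ONLINE_MODES` are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as we understand XCRI-CAP 1.2 uses them (verify against the spec).
NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Substrings of attendance-mode labels treated as online/distance delivery (assumed).
ONLINE_MODES = ("online", "distance")

# Simplified, made-up feed fragment for illustration only.
SAMPLE = """<catalog xmlns="http://xcri.org/profiles/1.2/catalog"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <provider>
    <dc:title>Example University</dc:title>
    <course>
      <dc:title>MSc Data Science</dc:title>
      <presentation><attendanceMode>Distance without attendance</attendanceMode></presentation>
    </course>
    <course>
      <dc:title>BA History</dc:title>
      <presentation><attendanceMode>Campus</attendanceMode></presentation>
    </course>
  </provider>
</catalog>"""


def online_courses(feed_xml: str) -> list[str]:
    """Return titles of courses with at least one online/distance presentation."""
    root = ET.fromstring(feed_xml)
    titles = []
    for course in root.iterfind(".//xcri:course", NS):
        modes = [
            (mode.text or "").lower()
            for mode in course.iterfind("xcri:presentation/xcri:attendanceMode", NS)
        ]
        if any(word in mode for mode in modes for word in ONLINE_MODES):
            # Only the course's own dc:title child, not the provider's.
            titles.append(course.findtext("dc:title", default="", namespaces=NS).strip())
    return titles


if __name__ == "__main__":
    print(online_courses(SAMPLE))
```

With structured data like this, deciding which courses are online becomes a lookup rather than a guess based on page titles.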
In March, a workshop was held with the project Working Group members who had expressed a particular interest in this area of the Scaling up Online Learning project, to demonstrate the prototype and share these initial findings. As well as providing valuable feedback throughout the workshop, the members agreed that it would be worthwhile developing the directory a little further to help explore its viability in the marketplace. Their feedback will also feed into a broader service design review before final decisions are made on the future of the directory.
If you would like to take a look at the directory yourself, you can view it here: http://suol.azurewebsites.net/index.php