Have you ever heard of youtube-dl?
It is a command-line application that lets you download videos (and other media) from YouTube and from more than 700 other supported websites. It is actively maintained by a group of developers, and because its source code lives in a public repository on GitHub, anyone is free to contribute.
Usage
The application only requires Python to be installed on your system and is quite simple to use. You open a terminal and pass the URL of the video you want to download to the application as follows:
youtube-dl <VIDEO_URL>
The application also offers options for extracting audio only, bypassing geographical restrictions, filtering playlists and so on. If you’re not very experienced with terminals, don’t worry; there are also applications on GitHub that offer a friendlier user interface, such as youtube-dlG.
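To make the options above a bit more concrete, here is a minimal sketch that assembles a youtube-dl command line in Python. The helper build_command and the example.com URL are made up for illustration; the flags themselves (--extract-audio, --geo-bypass, --playlist-start, --playlist-end) are regular youtube-dl options. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_command(url, audio_only=False, geo_bypass=False,
                  playlist_start=None, playlist_end=None):
    """Build a youtube-dl argument list (hypothetical helper, not part of youtube-dl)."""
    cmd = ["youtube-dl"]
    if audio_only:
        # Keep only the audio track (youtube-dl needs ffmpeg or avconv for this)
        cmd.append("--extract-audio")
    if geo_bypass:
        # Try to work around geographical restrictions
        cmd.append("--geo-bypass")
    if playlist_start is not None:
        # First playlist item to download (1-based)
        cmd += ["--playlist-start", str(playlist_start)]
    if playlist_end is not None:
        # Last playlist item to download
        cmd += ["--playlist-end", str(playlist_end)]
    cmd.append(url)
    return cmd


# Placeholder URL; replace it with a real playlist or video URL
command = build_command("https://example.com/playlist",
                        audio_only=True, playlist_start=1, playlist_end=3)
print(" ".join(shlex.quote(part) for part in command))
```

If you want to execute the command from Python, you could pass the returned list to subprocess.run, assuming youtube-dl is on your PATH.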
Youtube-dl architecture in DESOSA 2016
If you are interested in finding out more about youtube-dl and its software architecture you may want to check out the DESOSA: 2016 (Delft Students on Software Architecture: 2016) book. In a group of four we contributed to this book by writing the chapter on youtube-dl. In this chapter you can find out all about the application, its developers, its features and architecture. You can also check out chapters on other interesting software systems such as Atom, GitLab and Ruby on Rails in this book to learn more about them. Enjoy reading!
Creating your own extractor for youtube-dl
Is a site you want to download from not supported? Then you might want to add support yourself by writing your own extractor for that site in youtube-dl. In most cases this is not that difficult, although it depends on the site, of course.
In this post I will give you a quick summary of how you can add support for a new site. The first thing you want to do is fork the youtube-dl repository on GitHub. Let’s say the site you want to support is called ‘my_video_site’; then you create a new file called my_video_site.py in the youtube_dl/extractor directory of your fork. Your extractor implementation in the my_video_site.py file will need to look something like the following:
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor


class MyVideoSiteIE(InfoExtractor):
    IE_NAME = 'MyVideoSite'  # Here you put the name of your extractor
    _VALID_URL = r'<REGEXP>'  # Here you define a reg. exp. matching valid URLs

    # You define your tests in the _TESTS variable as a list
    # of dictionaries representing the tests that each consist of
    # an example URL and the expected outputs of the extractor
    _TESTS = [{
        'url': '<EXAMPLE_URL>',  # An example URL that could be passed
        'info_dict': {  # Dictionary expected from _real_extract
            'id': '<VIDEO_ID>',
            'url': '<FILE_URL>',
            'ext': '<FILE_EXTENSION>',
            'title': '<VIDEO_TITLE>',
        },
    }]

    def _real_extract(self, url):
        # In this function you do the actual extraction of data needed to download
        # the file, which the function will return in the form of a dictionary

        # The video ID is taken from the named group 'id' in _VALID_URL
        video_id = self._match_id(url)
        # Download the page so that it can be searched for the video data
        webpage = self._download_webpage(url, video_id)

        # <Your extraction code here>
        # It should define video_formats and video_thumbnails

        # Return the dictionary with the information about the video to download
        return {
            'id': video_id,
            'title': self._og_search_title(webpage),
            'formats': video_formats,
            'thumbnails': video_thumbnails,
        }
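To give an idea of what the _VALID_URL and _real_extract parts of the template boil down to, here is a small standalone sketch that uses only Python's re module. The site myvideosite.example, its URL scheme and the page snippet are all invented for illustration; in a real extractor you would rely on the InfoExtractor helpers (such as _match_id, _download_webpage and _og_search_title) rather than on hand-written functions like these.

```python
import re

# Hypothetical URL scheme; the named group 'id' captures the video ID
_VALID_URL = r'https?://(?:www\.)?myvideosite\.example/watch/(?P<id>[0-9a-z]+)'

# Made-up page snippet standing in for the downloaded webpage
SAMPLE_WEBPAGE = (
    '<meta property="og:title" content="My first video">\n'
    '<video src="https://cdn.myvideosite.example/abc123.mp4"></video>'
)


def match_id(url):
    """Return the video ID from the URL, or None if the URL does not match."""
    match = re.match(_VALID_URL, url)
    return match.group('id') if match else None


def extract_info(url, webpage):
    """Collect the basic fields an extractor would return for a single file."""
    title = re.search(
        r'<meta property="og:title" content="([^"]+)"', webpage).group(1)
    file_url = re.search(r'<video[^>]+src="([^"]+)"', webpage).group(1)
    return {
        'id': match_id(url),
        'title': title,
        'url': file_url,
        # Take the extension from the end of the file URL
        'ext': file_url.rsplit('.', 1)[-1],
    }


info = extract_info('https://www.myvideosite.example/watch/abc123', SAMPLE_WEBPAGE)
print(info)
```

The returned dictionary has the same shape as the info_dict in the _TESTS entry of the template, which is what the test compares against.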
When your extractor is ready, you also need to register it by adding the following import to the extractors.py file in the same directory:
from .my_video_site import MyVideoSiteIE
And that’s it! Now you should be able to download videos from the site for which you just added an extractor. If you think your extractor may be useful to others you can file a pull request to the youtube-dl repository as well.
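As for why the import in extractors.py matters: an extractor can only be picked for a URL if youtube-dl knows about it. The sketch below is a simplified model of that selection step, not youtube-dl's actual code. ToyExtractor, REGISTERED and pick_extractor are hypothetical names, and the site URL is again made up; the suitable method mirrors the classmethod of the same name on InfoExtractor.

```python
import re


class ToyExtractor(object):
    """Stand-in for InfoExtractor that only knows how to check URLs."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # An extractor is suitable when its _VALID_URL matches the URL
        return re.match(cls._VALID_URL, url) is not None


class MyVideoSiteIE(ToyExtractor):
    _VALID_URL = r'https?://(?:www\.)?myvideosite\.example/watch/(?P<id>[0-9a-z]+)'


class GenericIE(ToyExtractor):
    # Fallback that accepts any URL, so it has to come last
    _VALID_URL = r'.*'


# Only extractors in this list can ever be selected
REGISTERED = [MyVideoSiteIE, GenericIE]


def pick_extractor(url, extractors=REGISTERED):
    """Return the first registered extractor whose pattern matches the URL."""
    for extractor in extractors:
        if extractor.suitable(url):
            return extractor
    return None


url = 'https://myvideosite.example/watch/42'
print(pick_extractor(url).__name__)
print(pick_extractor(url, extractors=[GenericIE]).__name__)
```

In this model, leaving MyVideoSiteIE out of the list means the URL falls through to the generic fallback, which is roughly what happens when the import is missing.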
Cheers!