
Downloading Files From Filetype Fields?

I am looking for a way to download files from different pages and store them in a particular folder on my local machine. I am using Python 2.7. See the field below:

Solution 1:

As @JohnZwinck suggested, you can use urllib.urlretrieve together with the re module to build a list of links on a given page and download each file. Below is an example.

#!/usr/bin/python

"""
This script scrapes a page and downloads the files referenced by its
anchor links.
"""


# Imports

import os, re, sys
import urllib, urllib2

# Config
base_url = "http://www.google.com/"
destination_directory = "downloads"


def _usage():
    """
    This method simply prints out the Usage information.
    """

    print "USAGE: %s <url>" % sys.argv[0]


def _create_url_list(url):
    """
    Create a list of download URLs from the anchor links found on the
    page at the given URL.
    """

    raw_data = urllib2.urlopen(url).read()
    raw_list = re.findall('<a style="display:inline; position:relative;" href="(.+?)"', raw_data)
    url_list = [base_url + x for x in raw_list]
    return url_list


def _get_file_name(url):
    """
    This method will return the filename extracted from a passed URL
    """

    return url.split('/')[-1]


def _download_file(url, filename):
    """
    Given a URL and a filename, this method saves the file locally in the
    destination_directory path.
    """
    if not os.path.exists(destination_directory):
        print 'Directory [%s] does not exist, creating it...' % destination_directory
        os.makedirs(destination_directory)
    try:
        print 'Downloading file [%s]' % filename
        urllib.urlretrieve(url, os.path.join(destination_directory, filename))
    except IOError:
        print 'Error downloading file [%s]' % filename


def _download_all(main_url):
    """
    Given the main URL, this method downloads each linked file into the
    destination directory.
    """

    url_list = _create_url_list(main_url)
    for url in url_list:
        _download_file(url, _get_file_name(url))


def main(argv):
    """
    This is the script's launcher method.
    """

    if len(argv) != 1:
        _usage()
        sys.exit(1)
    _download_all(argv[0])
    print 'Finished Downloading.'


if __name__ == '__main__':
    main(sys.argv[1:])

You can change base_url and destination_directory to suit your needs, then save the script as download.py. From the terminal, run it like

python download.py http://www.example.com/?page=1
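To see what the regular expression in _create_url_list actually matches, here is a small self-contained sketch. The HTML snippet and filenames are made up for illustration; note that only anchors carrying that exact style attribute will match, so you will likely need to adjust the pattern for your page.

```python
import re

# Made-up page source: one styled anchor and one plain anchor.
raw_data = (
    '<a style="display:inline; position:relative;" href="files/report.pdf">Report</a>'
    '<a href="about.html">About</a>'
)

# Same pattern the script uses; the capture group grabs the href value.
pattern = '<a style="display:inline; position:relative;" href="(.+?)"'
links = re.findall(pattern, raw_data)
print(links)  # ['files/report.pdf']
```

The plain anchor is skipped because the pattern requires the literal style attribute before href.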

Solution 2:

We can't know what service you got that first image from, but we'll assume it's on a website of some kind, probably one internal to your company.

The easiest thing to try is urllib.urlretrieve, which fetches a file given its URL. You may be able to do this if you can right-click the link on that page, copy the URL, and paste it into your code.
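A minimal sketch of that approach, written so the import works on both Python 2 and 3. The URL here is a placeholder, not a real endpoint; the fetch itself is commented out since it assumes a direct, unauthenticated file link.

```python
import os

try:
    from urllib import urlretrieve          # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

# Placeholder URL -- paste the one you copied from the page here.
url = "http://www.example.com/files/report.pdf"
local_name = os.path.basename(url)  # last path segment becomes the filename

# Uncomment to actually fetch the file:
# urlretrieve(url, local_name)
print(local_name)  # report.pdf
```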

However, that may not work, for example if there is complex authentication required before accessing that page. You might need to write Python code that actually does the login (as if the user were controlling it, typing a password). If you get that far, you should post that as a separate question.
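If the page only needs HTTP Basic authentication (rather than a form-based login), a sketch like the following may be enough. The hostname and credentials are hypothetical; a cookie-based login form would need a different approach, such as a cookie-aware opener that posts the form fields.

```python
try:
    # Python 2
    from urllib2 import (HTTPBasicAuthHandler,
                         HTTPPasswordMgrWithDefaultRealm, build_opener)
except ImportError:
    # Python 3
    from urllib.request import (HTTPBasicAuthHandler,
                                HTTPPasswordMgrWithDefaultRealm, build_opener)

# Hypothetical internal host and credentials -- replace with your own.
password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://intranet.example.com/", "user", "secret")

# This opener sends the credentials when challenged by the server.
opener = build_opener(HTTPBasicAuthHandler(password_mgr))
# opener.open("http://intranet.example.com/page") would perform the request.
```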
