Write a short note on working with the web using shell scripts.

Mumbai University > Information Technology > Sem 5 > Open Source Technology

Marks: 5M

Year: Dec 2015


Working with the web from a shell script typically involves:

  1. Working with curl
  2. Downloading a web page as a formatted text file
  3. Parsing data
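
As a minimal sketch of the parsing step, the two helper functions below pull the page title and the link targets out of HTML read from standard input. The names extract_title and extract_links are hypothetical, and the patterns assume simple markup with lowercase tags and double-quoted attributes; real pages may need a proper HTML parser.

```shell
#!/bin/sh
# Hypothetical helpers: parse HTML supplied on standard input.

# Print the text between <title> and </title>.
extract_title() {
    # Join all lines first so a title split across lines still matches.
    tr '\n' ' ' | sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p'
}

# Print the value of every href="..." attribute, one per line.
extract_links() {
    grep -o 'href="[^"]*"' | cut -d '"' -f 2
}
```

For example, curl -s example.com | extract_title would feed a downloaded page into the first helper.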

Working with CURL Command:

Transferring data from one place to another is one of the main tasks performed by computers connected to a network. Many GUI tools can send and receive data, but when you are working on a console with only command-line tools available, curl is one of the most practical choices.

  • curl is an easy-to-use command-line tool for sending and receiving files, and it supports almost all major protocols in use (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP).
  • It can be used inside shell scripts with ease.
  • It supports pausing and resuming downloads.
  • It has a large number of command-line options for various tasks (around 120 when this answer was written; newer releases have more).
  • It runs on all major operating systems (more than 40).
  • It supports cookies, forms and SSL.
  • Both the curl command-line tool and the libcurl library are open source, so they can be used in your own programs.
  • It supports configuration files.
  • It can perform multiple uploads with a single command.
  • It provides a progress bar, rate limiting, and download time details.
  • It supports IPv6.
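
A few of the features listed above can be combined in one call. The sketch below is only an illustration: the function name resumable_download and its default rate are assumptions, while -C - (resume), --limit-rate and -O are standard curl options.

```shell
#!/bin/sh
# Hypothetical wrapper: download a URL into the current directory,
# resuming a partial file if one exists and capping the transfer rate.
resumable_download() {
    url=$1
    rate=${2:-200k}   # assumed default limit; curl accepts suffixes such as k and m

    # -C -          continue from where an earlier partial download stopped
    # --limit-rate  cap the transfer speed
    # -O            save under the file name taken from the URL
    # -sS -f        stay quiet, but report errors and fail on HTTP errors
    curl -sS -f -C - --limit-rate "$rate" -O "$url"
}
```

Running the function again after an interrupted transfer should pick up from the bytes already saved, provided the server supports resuming.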

By default, curl prints the entire output to your console. A nice feature of curl is that it guesses the protocol from the host name when the URL is given without a scheme. For example, for a host named ftp.example.com, curl uses FTP to fetch the data. If curl cannot guess the protocol, it defaults to HTTP.

root@ubuntu1:~# curl example.com

The above command prints the entire HTTP response body for example.com. Here too curl tried to guess the protocol, but because the host name gave no hint, it defaulted to HTTP.
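
The guessing rule described above can be mimicked in a few lines of shell. This is only an approximation of curl's behaviour for URLs given without a scheme, and guess_scheme is a hypothetical helper name, not part of curl.

```shell
#!/bin/sh
# Hypothetical helper: approximate how curl picks a protocol when the
# URL has no scheme, based on the prefix of the host name.
guess_scheme() {
    # Lower-case the host so that FTP.example.com is treated like ftp.example.com.
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $host in
        ftp.*)  echo ftp ;;
        dict.*) echo dict ;;
        ldap.*) echo ldap ;;
        imap.*) echo imap ;;
        smtp.*) echo smtp ;;
        pop3.*) echo pop3 ;;
        *)      echo http ;;   # no recognisable prefix: fall back to HTTP
    esac
}

guess_scheme ftp.example.com   # prints: ftp
guess_scheme example.com       # prints: http
```

Writing the scheme explicitly, as in curl https://example.com, avoids relying on the guess.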

The previous command does not save the HTML output; it only prints it to the console. To save the output to a file, you can either use shell redirection or curl's -o option.

root@ubuntu1:~# curl example.com > example.html

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1270  100  1270    0     0   2645      0 --:--:-- --:--:-- --:--:--  5852

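The above command saves the output to the file example.html. The statistics shown are curl's progress meter, which appears on the console when the output is redirected.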
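
As a minimal sketch of the -o alternative mentioned above, the wrapper below saves a page without relying on redirection. The function name save_page is an assumption; -o, -s, -S and -f are standard curl options.

```shell
#!/bin/sh
# Hypothetical helper: save the content at a URL into a named file.
save_page() {
    url=$1
    outfile=$2
    # -o FILE  write the response body to FILE instead of the console
    # -s       hide the progress meter
    # -S       still print an error message if the transfer fails
    # -f       return a non-zero exit status on HTTP errors (4xx/5xx)
    curl -sS -f -o "$outfile" "$url"
}
```

Calling save_page example.com example.html should give the same file as the redirection example, without the progress meter on the console.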

Downloading Multiple Files Using a Single curl Command:

curl can download several files with a single command by repeating the -O option once per URL; by default the files are fetched one after another. An important point is that curl reuses the same TCP connection for these downloads where possible, which improves download speed. Establishing a TCP connection to a server requires an exchange of packets known as the three-way handshake. To avoid repeating that handshake, curl tries to use one connection for all downloads from the same server requested in a single command.
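
A single multi-file command of this kind can be assembled as sketched below. The function name download_all is hypothetical; it simply places -O in front of every URL it is given, so that all transfers go through one curl invocation.

```shell
#!/bin/sh
# Hypothetical helper: fetch every URL given as an argument with ONE curl
# command, so connections to the same server can be reused.
download_all() {
    [ "$#" -gt 0 ] || return 1

    # Rewrite the argument list from "url1 url2 ..." to "-O url1 -O url2 ...".
    # The loop list is expanded once at the start, so changing the
    # positional parameters inside the loop is safe.
    for url in "$@"; do
        set -- "$@" -O "$url"   # append the -O/URL pair
        shift                   # drop the original, unprefixed URL
    done

    # -O saves each file under the name taken from its URL.
    curl -sS -f "$@"
}
```

For instance, download_all followed by two URLs on the same host would run the equivalent of curl -O URL1 -O URL2 and save both files in the current directory.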
