Found an O'Reilly CD Bookshelf, with a download script

li-jiahuan

UID: 22298
Posts: 1
Points: 2
Online time: 10 minutes

#1 li-jiahuan posted on 2006-02-01 04:22

http://www.chinalinuxpub.com/doc/oreillybookself/

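A while ago, while searching the web for Perl documentation,
I came across an O'Reilly CD Bookshelf.
It contains quite a few real classics.
For reasons I don't know, and by some means I can't work out,
the site is set up to refuse non-interactive download programs
(not a very precise way of putting it, but try a download yourself and you will see what I mean).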

Code:

[No.505 04:03:08 ~]$ wget -r http://www.chinalinuxpub.com/doc/oreillybookself/perl/perlnut/index.htm
--04:03:29-- http://www.chinalinuxpub.com/doc/oreillybookself/perl/perlnut/index.htm
=> `www.chinalinuxpub.com/doc/oreillybookself/perl/perlnut/index.htm'
Resolving www.chinalinuxpub.com... 210.82.89.226
Connecting to www.chinalinuxpub.com|210.82.89.226|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
04:03:30 ERROR 403: Forbidden.

FINISHED --04:03:30--
Downloaded: 0 bytes in 0 files
[No.506 04:03:30 ~]$

After many attempts,
I found, to my surprise, that curl can download these books.
I spent half an evening writing a rough script to do it.
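
The post does not say why wget gets a 403 while curl gets through. One common cause, and only an assumption here since nothing above confirms it, is that the server filters on the User-Agent header. A minimal sketch for checking that guess follows; `http_status` and `compare_agents` are hypothetical helper names, and the two agent strings are illustrative values, not ones taken from the post.

```shell
#!/bin/bash
# Sketch only: assumes the 403 depends on the User-Agent header,
# which the original post does not confirm.

# http_status UA URL: print the HTTP status code the server returns
# when the request carries the given User-Agent string.
http_status() {
    local ua=$1 url=$2
    curl -s -o /dev/null -A "$ua" -w '%{http_code}\n' "$url"
}

# compare_agents URL: print the status seen by a wget-like and a
# curl-like agent string (both strings are illustrative).
compare_agents() {
    local url=$1 ua
    for ua in "Wget/1.10" "curl/7.15"; do
        printf '%-12s %s\n' "$ua" "$(http_status "$ua" "$url")"
    done
}
```

Calling `compare_agents` with the `index.htm` URL from the wget log would show whether the two agent strings get different status codes. If they do, running wget with its `-U` (`--user-agent`) option might be enough; if they do not, the block is based on something else.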

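Note:
The script below mainly targets the Perl books
http://www.chinalinuxpub.com/doc/oreillybookself/perl/index.html
(Perl material is mostly what I was after),
and I did not write the part that downloads the Examples.
Would some kind soul please write it...
Some of the non-Perl books can also be downloaded with this script,
but it may need to be modified accordingly.
I leave that to you.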

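Note 2:
If downloading e-books from that site leads to copyright infringement or similar headaches,
I take no responsibility for sorting them out.
Rights holders, please notify me or linuxeden.com
and we will delete this post immediately.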

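I spent only a little time on the script below,
and my skills are limited,
so it is probably not much to look at.
If it bothers anyone,
please help improve it.
Thanks...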

#2 li-jiahuan posted on 2006-02-01 04:31

Script usage:
./down.sh dir_to_save book_link
Do not include index.htm in book_link.
For example, to download http://www.chinalinuxpub.com/doc ... erl/learn/index.htm
into prog_perl:
./down.sh prog_perl http://www.chinalinuxpub.com/doc/oreillybookself/perl/learn/
If you run into any problems,
Just Read The ****ing Codes
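
The rule about leaving index.htm out of book_link could also be enforced by the script itself. Here is a minimal sketch, assuming a helper named `normalize_link` (hypothetical, not part of down.sh) that accepts either form of the link and prints the bare book directory.

```shell
#!/bin/bash
# normalize_link URL: print URL without a trailing /index.htm or
# /index.html and without trailing slashes.
# (Hypothetical helper for illustration; down.sh does not contain it.)
normalize_link() {
    local link=$1
    link=${link%/index.html}
    link=${link%/index.htm}
    # strip any number of trailing slashes
    while [ "${link%/}" != "$link" ]; do
        link=${link%/}
    done
    printf '%s\n' "$link"
}
```

Inside the script this would be used as `link=$(normalize_link "$2")`, so that a link pasted straight from the browser works as well.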

Code:

[No.616 04:24:22 Downloads]$ cat down.sh
#! /bin/bash
#
# author: li-jiahuan@sohu.com
# date  : 06/02/01
# usage : ./down.sh dir_to_save book_link

# set -n

# default book, used when no book_link argument is given
link="http://210.82.89.226/doc/oreillybookself/perl/learn"

# ${para-default}, ${para:-default}
dir=${1:-$PWD}
link=${2:-$link}
# drop a trailing slash so the URLs built below have no "//"
link=${link%/}
# build the index URL only after book_link has been applied
index=${link}/index.htm
mkdir -p "$dir"
cd "$dir" || { echo "Can not cd to $dir"; exit 1; }

# download the index page
[[ -f "index.htm" ]] || curl "$index" > index.htm

# web="http://210.82.89.226/doc/oreillybookself/perl/learn/prf1_01.htm"
# download the Preface
for i in {1..5};do
      for j in $(seq -w 1 15);do
            web=${link}/prf${i}_${j}.htm
            preface=${web##*/}
            [[ -e $preface ]] && continue
            echo -e "\nDownloading $web ..."
            curl "$web" > "$preface"
            # a missing page ends this preface; go on to the next one
            if grep -q "was not found on this server" "$preface" ;then
                  rm "$preface"
                  break
            fi
      done
done

# web="http://210.82.89.226/doc/oreillybookself/perl/learn/ch01_01.htm"

# download chapters
for i in $(seq -w 1 19);do
      for j in $(seq -w 1 50);do
            web="${link}/ch${i}_${j}.htm"
            chapter=${web##*/}
            [[ -e $chapter ]] && continue
            echo -e "\nDownloading $web ..."
            curl "$web" > "$chapter"
            # a missing page ends this chapter; go on to the next one
            if grep -q "was not found on this server" "$chapter" ;then
                  rm "$chapter"
                  break
            fi
      done
done

# web="http://210.82.89.226/doc/oreillybookself/perl/learn/appa_01.htm"
# download Appendix

for i in {a..z};do
      for j in $(seq -w 1 50);do
            web="${link}/app${i}_${j}.htm"
            appendix=${web##*/}
            [[ -e $appendix ]] && continue
            echo -e "\nDownloading $web ..."
            curl "$web" > "$appendix"
            # a missing page ends this appendix; go on to the next one
            if grep -q "was not found on this server" "$appendix" ;then
                  rm "$appendix"
                  break
            fi
      done
done

# web=http://210.82.89.226/doc/oreillybookself/perl/learn/index/idx_a.htm
# download index
link=${link}/index
mkdir -p index
cd index || exit 1
[[ -f idx_0.htm ]] || curl "$link/idx_0.htm" > idx_0.htm
for j in {a..z};do
      web="${link}/idx_${j}.htm"
      idx=${web##*/}
      [[ -e $idx ]] && continue
      echo -e "\nDownloading $web ..."
      curl "$web" > "$idx"
      # no break here: a missing letter says nothing about the later ones
      if grep -q "was not found on this server" "$idx" ;then
            rm "$idx"
#            break
      fi
done

[No.617 04:24:25 Downloads]$
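
The preface, chapter, and appendix loops in down.sh all repeat the same fetch-until-missing pattern, so they could be folded into one function. The sketch below is one possible way to do that, not the author's version. It swaps the grep on the page body for `curl -f`, which assumes the server answers missing pages with an HTTP error status (not verified against this site). `fetch_page` and `fetch_series` are hypothetical names.

```shell
#!/bin/bash
# Sketch: one function for the fetch-until-missing loop that down.sh
# repeats for prefaces, chapters and appendices.
# Assumption: missing pages come back with an HTTP error status,
# so "curl -f" exits non-zero for them.

# fetch_page URL FILE: download URL into FILE; non-zero status on failure.
fetch_page() {
    curl -f -s -o "$2" "$1"
}

# fetch_series BASE PREFIX MAX: fetch BASE/PREFIX_01.htm, PREFIX_02.htm, ...
# up to MAX, skipping files that already exist and stopping at the
# first page that fails. Prints how many pages are present locally.
fetch_series() {
    local base=$1 prefix=$2 max=$3 n=0 j file
    for j in $(seq -w 1 "$max"); do
        file=${prefix}_${j}.htm
        if [ ! -e "$file" ]; then
            echo "Downloading $base/$file ..." >&2
            if ! fetch_page "$base/$file" "$file"; then
                rm -f "$file"      # remove any partial file
                break
            fi
        fi
        n=$((n + 1))
    done
    echo "$n"
}
```

With this in place the chapter loop would shrink to something like `for i in $(seq -w 1 19); do fetch_series "$link" "ch$i" 50; done`, and the preface and appendix loops would follow the same shape with `prf$i` and `app$i` as the prefix.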