Scraping CSDN forum data: Apache response problem
tbxiong posted on 2008-11-19 00:37
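First of all, I hope everyone here will forgive this little spider.

My dorm has no broadband, so when I run into a hard problem I have nothing to look it up in. The only thing I could think of was to scrape the data myself. I have already collected a lot, but I am still not satisfied.

Here is the problem. My scraper runs on PHP + MySQL + Apache. When I scrape ordinary technical sites, the problem is not very noticeable. But when I scrape a large forum such as the CSDN forum, after about 4 hours all the threads stop and all the HTTP requests drop, as if everything were paused. Apache itself does not die, though, and it keeps responding to new requests.

I started at 5:30, and when I checked the next day only a little over 8,000 posts had been collected. The most recent insert time in the database was around 9 o'clock, which means the program stopped at about that point. I tried twice and the timing was about the same both times. Can it really only run for about 4 hours?

That same program can scrape almost all of the technical articles on chinaz.com (without images).

1. One friend says it is a low-level Windows limitation.
2. Another says the remote server refused my requests, so the connections were dropped.

I get the same problem when scraping other sites that have more than 200 pages, so I have ruled out the second explanation.

My own guess is that the problem is on the Apache side. My httpd.conf is basically configured like this: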
# This is the main Apache server configuration file. It contains the
# configuration directives that give the server its instructions.
# See <URL:http://httpd.apache.org/docs/2.0/> for detailed information about
# the directives.
#
# Do NOT simply read the instructions in here without understanding
# what they do. They're here only as hints or reminders. If you are unsure
# consult the online docs. You have been warned.
#
# The configuration directives are grouped into three basic sections:
# 1. Directives that control the operation of the Apache server process as a
# whole (the 'global environment').
# 2. Directives that define the parameters of the 'main' or 'default' server,
# which responds to requests that aren't handled by a virtual host.
# These directives also provide default values for the settings
# of all virtual hosts.
# 3. Settings for virtual hosts, which allow Web requests to be sent to
# different IP addresses or hostnames and have them handled by the
# same Apache server process.
#
# Configuration and logfile names: If the filenames you specify for many
# of the server's control files begin with "/" (or "drive:/" for Win32), the
# server will use that explicit path. If the filenames do *not* begin
# with "/", the value of ServerRoot is prepended -- so "logs/foo.log"
# with ServerRoot set to "@@ServerRoot@@" will be interpreted by the
# server as "./logs/foo.log".
#
# NOTE: Where filenames are specified, you must use forward slashes
# instead of backslashes (e.g., "c:/apache" instead of "c:\apache").
# If a drive letter is omitted, the drive on which Apache.exe is located
# will be used by default. It is recommended that you always supply
# an explicit drive letter in absolute paths, however, to avoid
# confusion.
#

### Section 1: Global Environment
#
# The directives in this section affect the overall operation of Apache,
# such as the number of concurrent requests it can handle or where it
# can find its configuration files.
#

#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# NOTE! If you intend to place this on an NFS (or otherwise network)
# mounted filesystem then please read the LockFile documentation (available
# at <URL:http://httpd.apache.org/docs/2.0/mod/mpm_common.html#lockfile>);
# you will save yourself a lot of trouble.
#
# Do NOT add a slash at the end of the directory path.
#
ServerRoot "."

#
# ScoreBoardFile: File used to store internal server process information.
# If unspecified (the default), the scoreboard will be stored in an
# anonymous shared memory segment, and will be unavailable to third-party
# applications.
# If specified, ensure that no two invocations of Apache share the same
# scoreboard file. The scoreboard file MUST BE STORED ON A LOCAL DISK.
#
#ScoreBoardFile logs/apache_runtime_status

#
# PidFile: The file in which the server should record its process
# identification number when it starts.
#
PidFile logs/httpd.pid

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15

##
## Server-Pool Size Regulation (MPM specific)
##

# WinNT MPM
# ThreadsPerChild: constant number of worker threads in the server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_winnt.c>
ThreadsPerChild 250
MaxRequestsPerChild 5000
</IfModule>
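
Most of the httpd.conf above is comments, so the settings that are actually in effect are easy to miss. As a rough aid for reading it, here is a minimal Python sketch that keeps only the non-comment, non-blank lines of an Apache-style config. The function name `active_directives` and the abbreviated `sample` string are hypothetical and used only for illustration; the values in `sample` are copied from the config posted above.

```python
def active_directives(conf_text):
    """Return the non-comment, non-blank lines of an Apache-style config."""
    result = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Lines whose first non-blank character is '#' are comments.
        if stripped and not stripped.startswith("#"):
            result.append(stripped)
    return result


# Abbreviated sample (hypothetical excerpt) using values from the post.
sample = """\
# Timeout: The number of seconds before receives and sends time out.
Timeout 300
#ScoreBoardFile logs/apache_runtime_status
KeepAlive On
<IfModule mpm_winnt.c>
ThreadsPerChild 250
MaxRequestsPerChild 5000
</IfModule>
"""

if __name__ == "__main__":
    for directive in active_directives(sample):
        print(directive)
```

Run against the full file, this would leave only the handful of active lines (ServerRoot, PidFile, Timeout, the KeepAlive settings, and the mpm_winnt block), which are the ones worth checking first.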