A search engine is an information retrieval system designed to help find information stored on a computer system. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.
While researchers and developers take a broader view of IR systems, consumers think of them more in terms of what they want the systems to do — namely search the Web, or an intranet, or a database.
Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR system comprises four essential modules:
* A document processor
* A query processor
* A search and matching function
* A ranking capability
While users focus on “search,” the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
Read detailed abt this on – www.infotoday.com/searcher/may01/liddy.htm
The term “search engine” is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Crawler-based search engines, such as Google, create their listings automatically. They “crawl” or “spider” the web, then people search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
A human-powered directory, such as the Open Directory, depends on humans for its listings.
You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory.
The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
“Hybrid Search Engines” Or Mixed Results
In the web’s early days, it used to be that a search engine either presented crawler-based results or human-powered listings. Today, it extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listings over another.
For example, MSN Search is more likely to present human-powered listings from LookSmart.
However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries.
The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being “spidered” or “crawled.” The spider returns to the site on a regular basis, such as every month or two, to look for changes.
Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been “spidered” but not yet “indexed.” Until it is indexed — added to the index — it is not available to those searching with the search engine.
Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.
Major Search Engines: The Same, But Different
All crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results. Some of the significant differences between the major crawler-based search engines are summarized on the Search Engine Features Page.
How Search Engines Rank Web Pages
its too big to post so check the link..
searchenginewatch.com/showPage.html?page=2167961
ok now i think its enough on intro part.. now posting links to refer..
Nice animated explaining How SEARCH ENGINE works..
http://www.learnthenet.com/ENGLISH/animate/search.html
Read what wikipidea says about SEARCH ENGINE and WEB SEARCH ENGINE..
http://en.wikipedia.org/wiki/Search_engine_%28computing%29
http://en.wikipedia.org/wiki/Web_search_engine
A very well posted difference between Google, Yahoo and Ask.. also explaining major technical related stuff associated with SEARCH ENGINE..
also look for Recommended things in the site while making or using a SEARCH ENGINE..
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
A very detailed tutorial.. and specifically abt SPIDERS
http://www.webreference.com/content/search/how.html
http://www.monash.com/spidap4.html
A step by step understanding
1. Introduction to How Internet Search Engines Work
2. Looking at the Web
3. Building the Index
4. Building a Search
5. Future Search
6. Lots More Information
7. See all Internet Basics articles
http://computer.howstuffworks.com/search-engine.htm
That’s all.. I hope all of u can understand SEARCH option in better way now.. use it
The Search Engines ability to find corrections is with the help of a function called “levenshtein”.
The function can take two strings as parameter and calculates just the number of insert, replace and delete operations needed to transform one string to another.
soundex and metaphone are two other similar functions.
Google Hacking-Use Google as a warez search engine a.k.a Get Free Stuff!
1.Go to www.google.com
2.In the Search Bar type in:“intitle:index of” and then type in the keyword for whatever you are looking for.
So for example if I want to find some linkin park songs I would type in this:
“intitle:index of” LINKINK PARK(OR SONG NAME) MP3(YES WE CAN ADD EXTENSION ALSO)
View Security Cameras Worldwide. – This will let you hack into random live scurity cameras all over the world and operate them.
1.Go to www.google.com
2.In the search bar type in:inurl:”viewerframe?mode refresh”
3.Then go to any of the search results and boom here is your camera.
Searching crack
for searching crack u can use following command
crack:name of the software whose crack u want to download
eg-.suppose i want to download crack of nfs most wanted then i type
crack:need for speed most wanted
Searching serials
just type software/game name and after that type 94fbr
eg.-windvd 8 94fbr
how it is working
actually 94fbr a key for windows 2003 which Microsoft distribute free in very big amount
so when we hit 94fbr Google relates it to serial no. and return serial no. of specify software.
POWERED BY HIT JAMMER 1.0!
Allows you to gain access to the admin panel of site running
hitjamer 1.0 script.
1.Go to google.com
2.In the search bar type in:POWERED BY HIT JAMMER 1.0!
3.Look for webpages that have POWERED BY HIT JAMMER 1.0! slogan on the bottom of the main page
4.Once you found the page that has POWERED BY HIT JAMMER 1.0! slogan on the bottom replace the url with this
one:www.nameofsite.com/admin/admin.php <— That should take you to the admin panel.
Example: If the site’s name is www.uber1337xxoxox.com/index.php/ifuckgoats/lol
then just replace it with this www.uber1337xxoxox.com/admin/admin.php
FTP HACK
This hack shows all info (including username and passwords) for websites running on ws_ftp sofware.
1.Go to google.com
2.In searchbar type in: intitle:index of ws_ftp.ini
3.Go to any result page and find a file called ws_ftp.ini (press ctrl+f for autosearch)
After you found ws_ftp, click on it and it will give you a whole bunch of private stuff.Look for something along the lines
of PWD=blahblahblah.That’s your password.It’s encrypted.So use an MD5 hash cracker or johntheripper to crack the
password.
how to search google for RAPIDSHARE links
If you wanna find some apps, files etc on rapidshare.de via google, do the following.
Paste this into the google search window (not the adress bar):
site:rapidshare.de -filetype:zip OR rar daterange:2453402-2453412
this searches the site rapidshare.de for any file that is rar or zip, and
has been indexed between 1-11 February.
dvd site:rapidshare.de -filetype:zip OR rar daterange:2453402-2453412
This is the same search but it specifically searches for “dvd” with the same
search criteria, so any app posted with the word dvd in it will be found.
SOME IMPORTANT SEARCH STRING
Try these search string in different way
intitle:”Index of” passwords modified
allinurl:auth_user_file.txt
“access denied for user” “using password”
“A syntax error has occurred” filetype:ihtml
allinurl: admin mdb
“ORA-00921: unexpected end of SQL command”
inurl:passlist.txt
“Index of /backup”
“Chatologica MetaSearch” “stack tracking:”
“parent directory ” /appz/ -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
“parent directory ” DVDRip -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
“parent directory “Xvid -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
“parent directory ” Gamez -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
“parent directory ” MP3 -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
“parent directory ” Name of Singer or album -xxx -html -htm -php -shtml -opendivx -md5 -md5sums
Notice that I am only changing the word after the parent directory, change it to what you want and you will get a lot of stuff.
nurl:microsoft filetype:iso
You can change the string to watever you want, ex. microsoft to adobe, iso to zip etc…
“# -FrontPage-” inurl:service.pwd
Frontpage passwords.. very nice clean search results listing !!
“AutoCreate=TRUE password=*”
This searches the password for “Website Access Analyzer”, a Japanese software that creates webstatistics. For those who can read Japanese, check out the author’s site at: coara.or.jp/~passy/ [coara.or.jp/~passy/]
“http://*:*@www” domainname
This is a query to get inline passwords from search engines (not just Google), you must type in the query followed with the the domain name without the .com or .net
“sets mode: +k”
This search reveals channel keys (passwords) on IRC as revealed from IRC chat logs.
allinurl: admin mdb
Not all of these pages are administrator’s access databases containing usernames, passwords and other sensitive information, but many are!
allinurl:auth_user_file.txt
DCForum’s password file. This file gives a list of (crackable) passwords, usernames and email addresses for DCForum and for DCShop (a shopping cart program(!!!).
intitle:”Index of” config.php
This search brings up sites with “config.php” files. To skip the technical discussion, this configuration file contains both a username and a password for an SQL database. Most sites with forums run a PHP message base. This file gives you the keys to that forum, including FULL ADMIN access to the database.
eggdrop filetype:user user
These are eggdrop config files. Avoiding a full-blown descussion about eggdrops and IRC bots, suffice it to say that this file contains usernames and passwords for IRC users
To locate proxy servers
try these queries:
inurl:”nph-proxy.cgi” “Start browsing”
or
“this proxy is working fine!” “enter *” “URL***” * visit
These queries locate online public proxy servers that can be used for
testing purposes.
Calculator
This trick is extra cool: You can use the blank Google search box as a calculator. Just enter an equation, like 2+2, and then press Enter to have Google tell you 2+2=4. For multiplication, use the asterisk (*), like this: 2*3. For division, use the slash (/), like this: 10/3. You can also use the search box to perform unit conversions, like this: 5 kilometers in miles or how many teaspoons in a cup? For a chart listing of units of measure Google can convert, The calculator works for simple equations and for some seriously complex operations, too, like logarithms and trigonometric functions. You can find a rundown of all its capabilities at. And if you know what a physical constant is or the phrase “base of the natural system of logarithms”
Changing the Number of Results
In the middle of a Google results URL, you can sometimes find num=, which tells you the number of search results Google gives you per page of results. You can temporarily change the number of results to anything from 1 to 100 simply by altering the number in the URL and then pressing Enter. Most of the time, search results are easiest to read when you’ve got 10, 20, or 30 per page . But this trick is a quick way to amp up the number of results on a page for the rare times when you want to review a lot of them at once or compare results 1 and 100 on one page.
Unsafe searching.
The SafeSearch filter tells Google to remove potentially offensive links from your results. The problem is, sometimes the filter gets carried away and removes things you need . To make sure the filter is off, add this to the end of your URL: &safe=off. To make sure it’s on, add &safe=on.
Google queries for locating various Web servers
“Apache/1.3.28 Server at” intitle:index.of
Apache 1.3.28
“Apache/2.0 Server at” intitle:index.of
Apache 2.0
“Apache/* Server at” intitle:index.of
any version of Apache
“Microsoft-IIS/4.0 Server at” intitle:index.of
Microsoft Internet Information Services 4.0
“Microsoft-IIS/5.0 Server at” intitle:index.of
Microsoft Internet Information Services 5.0
“Microsoft-IIS/6.0 Server at” intitle:index.of
Microsoft Internet Information Services 6.0
“Microsoft-IIS/* Server at” intitle:index.of
any version of Microsoft Internet Information Services
“Oracle HTTP Server/* Server at” intitle:index.of
any version of Oracle HTTP Server
“IBM _ HTTP _ Server/* * Server at” intitle:index.of
any version of IBM HTTP Server
“Netscape/* Server at” intitle:index.of
any version of Netscape Server
“Red Hat Secure/*” intitle:index.of
any version of the Red Hat Secure server
“HP Apache-based Web Server/*” intitle:index.of
any version of the HP server
Queries for discovering standard post-installation
intitle:”Test Page for Apache Installation” “You are free”
Apache 1.2.6
intitle:”Test Page for Apache Installation” “It worked!” “this Web site!”
Apache 1.3.0 – 1.3.9
intitle:”Test Page for Apache Installation” “Seeing this instead”
Apache 1.3.11 – 1.3.33, 2.0
intitle:”Test Page for the SSL/TLS-aware Apache Installation” “Hey, it worked!”
Apache SSL/TLS
intitle:”Test Page for the Apache Web Server on Red Hat Linux”
Apache on Red Hat
intitle:”Test Page for the Apache Http Server on Fedora Core”
Apache on Fedora
intitle:”Welcome to Your New Home Page!”
Debian Apache on Debian
intitle:”Welcome to IIS 4.0!”
IIS 4.0
intitle:”Welcome to Windows 2000 Internet Services”
IIS 5.0
intitle:”Welcome to Windows XP Server Internet Services”
IIS 6.0
Querying for application-generated system reports
“Generated by phpSystem”
operating system type and version, hardware configuration, logged users, open connections, free memory and disk space, mount points
“This summary was generated by wwwstat”
web server statistics, system file structure
“These statistics were produced by getstats”
web server statistics, system file structure
“This report was generated by WebLog”
web server statistics, system file structure
intext:”Tobias Oetiker” “traffic analysis”
system performance statistics as MRTG charts, network configuration
intitle:”Apache::Status” (inurl:server-status | inurl:status.html | inurl:apache.html)
server version, operating system type, child process list, current connections
intitle:”ASP Stats Generator *.*” “ASP Stats Generator” “2003-2004 weppos”
web server activity, lots of visitor information
intitle:”Multimon UPS status page”
UPS device performance statistics
intitle:”statistics of” “advanced web statistics”
web server statistics, visitor information
intitle:”System Statistics” +”System and Network Information Center”
system performance statistics as MRTG charts, hardware configuration, running services
intitle:”Usage Statistics for” “Generated by Webalizer”
web server statistics, visitor information, system file structure
intitle:”Web Server Statistics for ****”
web server statistics, visitor information
nurl:”/axs/ax-admin.pl” -script
web server statistics, visitor information
inurl:”/cricket/grapher.cgi”
MRTG charts of network interface performance
inurl:server-info “Apache Server Information”
web server version and configuration, operating system type, system file structure
“Output produced by SysWatch *”
operating system type and version, logged users, free memory and disk space, mount points, running processes, system logs
sql injection dorks
allinurl: \”index php go buy\”
allinurl: \”index.php?go=sell\”
allinurl: \”index php go linkdir\”
allinurl: \”index.php?go=resource_center\”
allinurl: \”resource_center.html\”
allinurl: \”index.php?go=properties\”
allinurl: \”index.php?go=register\”
dork for finding admin page
admin1.php
admin1.html
admin2.php
admin2.html
yonetim.php
yonetim.html
yonetici.php
yonetici.html
adm/
admin/
admin/account.php
admin/account.html
admin/index.php
admin/index.html
admin/login.php
admin/login.html
admin/home.php
admin/controlpanel.html
admin/controlpanel.php
admin.php
admin.html
admin/cp.php
admin/cp.html
cp.php
cp.html
administrator/
administrator/index.html
administrator/index.php
administrator/login.html
administrator/login.php
administrator/account.html
administrator/account.php
administrator.php
administrator.html
login.html
modelsearch/login.php
moderator.php
moderator.html
moderator/login.php
moderator/login.html
moderator/admin.php
moderator/admin.html
moderator/
account.php
account.html
controlpanel/
controlpanel.php
controlpanel.html
admincontrol.php
admincontrol.html
adminpanel.php
adminpanel.html
admin1.asp
admin2.asp
yonetim.asp
yonetici.asp
admin/account.asp
admin/index.asp
admin/login.asp
admin/home.asp
admin/controlpanel.asp
admin.asp
admin/cp.asp
cp.asp
administrator/index.asp
administrator/login.asp
administrator/account.asp
administrator.asp
login.asp
modelsearch/login.asp
moderator.asp
moderator/login.asp
moderator/admin.asp
account.asp
controlpanel.asp
admincontrol.asp
adminpanel.asp
fileadmin/
fileadmin.php
fileadmin.asp
fileadmin.html
administration/
administration.php
administration.html
sysadmin.php
sysadmin.html
phpmyadmin/
myadmin/
sysadmin.asp
sysadmin/
ur-admin.asp
ur-admin.php
ur-admin.html
ur-admin/
Server.php
Server.html
Server.asp
Server/
wp-admin/
administr8.php
administr8.html
administr8/
administr8.asp
webadmin/
webadmin.php
webadmin.asp
webadmin.html
administratie/
admins/
admins.php
admins.asp
admins.html
administrivia/
Database_Administration/
WebAdmin/
useradmin/
sysadmins/
admin1/
system-administration/
administrators/
pgadmin/
directadmin/
staradmin/
ServerAdministrator/
SysAdmin/
administer/
LiveUser_Admin/
sys-admin/
typo3/
panel/
cpanel/
cPanel/
cpanel_file/
platz_login/
rcLogin/
blogindex/
formslogin/
autologin/
support_login/
meta_login/
manuallogin/
simpleLogin/
loginflat/
utility_login/
showlogin/
memlogin/
members/
login-redirect/
sub-login/
wp-login/
login1/
dir-login/
login_db/
xlogin/
smblogin/
customer_login/
UserLogin/
login-us/
acct_login/
admin_area/
bigadmin/
project-admins/
phppgadmin/
pureadmin/
sql-admin/
radmind/
openvpnadmin/
wizmysqladmin/
vadmind/
ezsqliteadmin/
hpwebjetadmin/
newsadmin/
adminpro/
Lotus_Domino_Admin/
bbadmin/
vmailadmin/
Indy_admin/
ccp14admin/
irc-macadmin/
banneradmin/
sshadmin/
phpldapadmin/
macadmin/
administratoraccounts/
admin4_account/
admin4_colon/
radmind-1/
Super-Admin/
AdminTools/
cmsadmin/
SysAdmin2/
globes_admin/
cadmins/
phpSQLiteAdmin/
navSiteAdmin/
server_admin_small/
logo_sysadmin/
server/
database_administration/
power_user/
system_administration/
ss_vms_admin_sm/
Error message queries
“A syntax error has occurred”filetype:ihtml
Informix database errors, potentially containing function names, filenames, file structure information, pieces of SQL code and passwords
“Access denied for user” “Using password”
authorisation errors, potentially containing user names, function names, file structure information and pieces of SQL code
“The script whose uid is ” “is not allowed to access”
access-related PHP errors, potentially containing filenames, function names and file structure information
“ORA-00921: unexpected end of SQL command”
Oracle database errors, potentially containing filenames, function names and file structure information
“error found handling the request” cocoon filetype:xml
Cocoon errors, potentially containing Cocoon version information, filenames, function names and file structure information
“Invision Power Board Database Error”
Invision Power Board bulletin board errors, potentially containing function names, filenames, file structure information and piece of SQL code
“Warning: mysql _ query()” “invalid query”
MySQL database errors, potentially containing user names, function names, filenames and file structure information
“Error Message : Error loading required libraries.”
CGI script errors, potentially containing information about operating system and program versions, user names, filenames and file structure information
“#mysql dump” filetype:sql
MySQL database errors, potentially containing information about database structure and contents
Dork for locating passwords
http://*:*@www” site
passwords for site, stored as the string “http://username:password@www…”
filetype:bak inurl:”htaccess|passwd|shadow|ht users”
file backups, potentially containing user names and passwords
filetype:mdb inurl:”account|users|admin|admin istrators|passwd|password”
mdb files, potentially containing password information
intitle:”Index of” pwd.db
pwd.db files, potentially containing user names and encrypted passwords
inurl:admin inurl:backup intitle:index.of
directories whose names contain the words admin and backup
“Index of/” “Parent Directory” “WS _ FTP.ini”
filetype:ini WS _ FTP PWD
WS_FTP configuration files, potentially containing FTP server access passwords
ext:pwd inurl:(service|authors|administrators |users) “# -FrontPage-”
files containing Microsoft FrontPage passwords
filetype:sql (“passwd values ****” | “password values ****” | “pass values ****” )
files containing SQL code and passwords inserted into a database
intitle:index.of trillian.ini
configuration files for the Trillian IM
eggdrop filetype:user
user configuration files for the Eggdrop ircbot
filetype:conf slapd.conf
configuration files for OpenLDAP
inurl:”wvdial.conf” intext:”password”
configuration files for WV Dial
ext:ini eudora.ini
configuration files for the Eudora mail client
filetype:mdb inurl:users.mdb
Microsoft Access files, potentially containing user account information
Searching for personal data and confidential documents
filetype:xls inurl:”email.xls”
email.xls files, potentially containing contact information
“phone * * *” “address *” “e-mail” intitle: “curriculum vitae”
CVs
“not for distribution”
confidential documents containing the confidential clause
buddylist.blt
AIM contacts list
intitle:index.of mystuff.xml
Trillian IM contacts list
filetype:ctt “msn”
MSN contacts list
filetype:QDF
QDF database files for the Quicken financial application
intitle:index.of finances.xls
finances.xls files, potentially containing information on bank accounts, financial summaries and credit card numbers
intitle:”Index Of” -inurl:maillog maillog size
maillog files, potentially containing e-mail
Network Vulnerability Assessment Report”
“Host Vulnerability Summary Report”
filetype:pdf “Assessment Report”
“This file was generated by Nessus”
reports for network security scans, penetration tests etc
dork for locating network devices
“Copyright (c) Tektronix, Inc.” “printer status”
PhaserLink printers
inurl:”printer/main.html” intext:”settings”
Brother HL printers
intitle:”Dell Laser Printer” ews
Dell printers with EWS technology
intext:centreware inurl:status
Xerox Phaser 4500/6250/8200/8400 printers
inurl:hp/device/this.LCDispatcher
HP printers
intitle:liveapplet inurl:LvAppl
Canon Webview webcams
intitle:”EvoCam” inurl:”webcam.html”
Evocam webcams
inurl:”ViewerFrame?Mode=”
Panasonic Network Camera webcams
(intext:”MOBOTIX M1″ | intext:”MOBOTIX M10″) intext:”Open Menu” Shift-Reload
Mobotix webcams
inurl:indexFrame.shtml Axis
Axis webcams
intitle:”my webcamXP server!” inurl:”:8080″
webcams accessible via WebcamXP Server
allintitle:Brains, Corp.
camera webcams accessible via mmEye
intitle:”active webcam page”
USB webcams