- Businesses use data to derive information that is critical to their day-to-day operations.
- Storage is a repository that enables users to store & retrieve this digital data.
- Data is a collection of raw facts from which conclusions may be drawn.
- Data in the form of 0s & 1s is called digital data & is accessible to the user only after it is processed by a computer.
- With the advancement of computer & communication technologies, the rate of data generation & sharing has increased exponentially.
Types of data:
- Data can be classified as structured or unstructured based on how it is stored & managed.
- Data created by individuals or businesses must be stored so that it is easily accessible for further processing.
- In a computing environment, devices designed for storing data are termed storage devices or simply storage.
- The type of storage used varies based on the type of data & the rate at which it is created & used.
Information Storage Systems:
- They are built by taking the basic capability of a storage device, such as an HDD, & adding hardware & software to obtain a high-performing, reliable & easily managed system.
Information Retrieval Systems:
An information retrieval system is a system with a user interface that allows the user to create, search & modify the data stored in a storage network.
This is typically a peer-to-peer network which is operated & maintained by private organisations; however, access rights are provided to the public.
Access is possible via the Internet from outside the organisation & via an intranet from within the organisation.
Information storage & information retrieval (IR) are two sides of the same coin.
If a person is able to search for the required information, then that information must already have been stored in some format. The information is often presented to people as text, images, audio or video, which makes it difficult to obtain clear & precise answers to the varied questions that users may have. Searching for a document involves a collection of information, & the search may be easy or complicated depending on how the collection is organised.
Almost all of the IR systems fielded today are either Boolean IR systems for major document collections or text pattern search systems for handling small document collections (for example, personal collections of files).
Text pattern search queries are strings or regular expressions. The grep family of tools in the UNIX environment is a well-known example of text pattern search.
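A text pattern search over a small collection can be sketched as follows. This is a minimal illustration using Python's `re` module rather than grep itself; the function name `pattern_search` & the sample documents are assumptions made for the example.

```python
import re

def pattern_search(pattern, documents):
    """Return (doc_id, line_number, line) for every line that matches the regex."""
    regex = re.compile(pattern)
    hits = []
    for doc_id, text in documents.items():
        # Scan line by line, in the same spirit as a grep-style tool.
        for line_number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((doc_id, line_number, line))
    return hits

# Hypothetical personal collection of files (contents held in memory).
documents = {
    "notes.txt": "Storage devices hold data.\nAn HDD is a storage device.",
    "todo.txt": "Back up the HDD.\nRead about inverted files.",
}

# A plain string query and a regular expression query.
string_hits = pattern_search(r"HDD", documents)
regex_hits = pattern_search(r"^Read\b", documents)
```

Every query scans the full text of every document, which is acceptable for small collections but is one reason larger collections tend to rely on an index instead.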
In a Boolean IR system, documents are represented by sets of keywords, usually stored in an inverted file. An inverted file is a list of keywords & the identifiers of the documents in which they occur. Boolean queries are keywords connected with Boolean logical operators (AND, OR, NOT).
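The inverted file & Boolean queries described above can be sketched with Python sets. This is a simplified illustration: the function name `build_inverted_file`, the whitespace tokenisation & the sample documents are assumptions, & a real system would also handle stop words, stemming & on-disk storage.

```python
from collections import defaultdict

def build_inverted_file(documents):
    """Map each keyword to the set of document identifiers in which it occurs."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for keyword in text.lower().split():
            inverted[keyword].add(doc_id)
    return inverted

# Hypothetical document collection keyed by document identifier.
documents = {
    1: "storage devices store digital data",
    2: "retrieval systems search stored data",
    3: "storage and retrieval systems",
}

inverted = build_inverted_file(documents)

def postings(keyword):
    """Return the documents containing the keyword (empty set if none)."""
    return inverted.get(keyword, set())

# Boolean operators map directly onto set operations.
and_result = postings("storage") & postings("data")           # storage AND data
or_result = postings("storage") | postings("retrieval")       # storage OR retrieval
not_result = postings("storage") - postings("retrieval")      # storage AND NOT retrieval
```

Because each keyword lookup returns a ready-made set of identifiers, a query is answered without rereading the documents themselves.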
Conceptual models focus on enhancing the performance of IR systems by using information about the statistical distribution of terms.
Statistical models, such as vector space, probabilistic or clustering models, use the statistical distribution of terms, so that every document in the retrieved collection is assigned a probability of relevance.
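As one illustration of a statistical model, the sketch below ranks documents with a basic vector space approach: each text becomes a term-frequency vector & documents are ordered by cosine similarity to the query. The function names, the raw term-frequency weighting & the sample documents are assumptions for the example; note that a cosine score is a relevance score rather than a true probability, which a probabilistic model would estimate differently.

```python
import math
from collections import Counter

def tf_vector(text):
    """Represent a text as a term-frequency vector (term -> count)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = tf_vector(query)
    scored = [(doc_id, cosine(q, tf_vector(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical document collection.
documents = {
    "d1": "storage devices store data",
    "d2": "retrieval systems search stored data",
    "d3": "boolean queries use logical operators",
}

ranking = rank("data storage", documents)
```

Unlike a Boolean query, which returns an unordered set of matching documents, this kind of model gives every document a score, so partial matches such as `d2` still appear in the ranking.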