# Data Structure - know all about it

**Data:** - The term data simply refers to a value or a set of values These values may represent some observation from an experiment, some figures collected during some survey ( such as census, exit polls, etc. ), or marks obtained by a student in an examination, etc.

**Data Item:** - A data item refers to a single unit of values. For example, each subject's roll number name, date of birth; age, address and marks are data items.

## Data items can be divided into 2 types.

**Group Items:** - Data items that can be divided into sub-items are called Group items. For example, an address is a group item as it is divided into sub-items such as house number, street number, locality, city, and pin code. etc. Likewise, a date can be divided into the day, month and year, and a name can be divided into first and last names.

**Elementary Items:** - Data items that can not be divided into sub-items are called elementary items. For example, roll number, marks, city, Pincode, etc. are normally treated as elementary items because they can not be further divided.

**Entity:** Entity is something that has certain attributes or properties, which may be assigned values. For example, the student is an entity.

**Field:** The attributes or properties of the entity are called a field The value assigned to these fields may be either numeric or non-numeric. For example, The possible attributes for a student can be roll number, name, date of birth, gender and class. The possible values for these attributes can be 12345, VIKAS, 09/09/1985, M 9.

**Record:** - A record is a collection of related fields for a given entity set.

**File:** - File is a collection of related records. For example, a file containing records of all students in class, a file containing records of all employees of an organization In fact, a file represents an entity set.

**Key:** - A key is a data item in a record that takes unique values and can be used to distinguish a record from other records it may happen that more than one dàta item has unique values. In that case, there exist multiple keys of a time, we may be using only one data item as a key. called the primary key, depending on the problem in hand The other key ( s ) are known as alternate key ( s ).

In some cases, there is no field that has unique values. Then a combination of some fields can be used to form a key, Such a key is known as a composite key.

in the worst case, if there is no possibility of forming a key from within the record, then an extra data item can be added to the record that can be used as a key.

**Information:** The terms data and information have been used to mean the same thing. But actually, information is more than just data in simple terms, information is processed data Data is just a collection of values ( raw data ), from which no conclusion can be drawn Thus data as such is not useful for decision making When the data is processed by applying certain rules, newly generated data becomes information This newly generated data ( information ) conveys some meaning and hence can be used for decision making.

**Data Type:** - A data type is a collection of values and a set of operations that act on those values. Whether our program is dealing with pre-defined data types or user - defines types, these two aspects must be considered

- Set of values.
- Set of operations

For example, consider an integer data type which consists of valued ( MININT -3 , -2 , -1 , 0 . 1 , 2 , 3 , MAXINT ) , where MININT and MAXINT are the smallest and largest integers that can be represented by an integer type on a particular computer The operations on integers include the arithmetic operation of addition ( + ) , subtraction ( - ) , multiplication ( ) and division ( / ) . There are also operations that test for equality/inequality and the operation that assign an integer value to a variable.

### Data Types can be divided into 3 types

**Primitive Data Type** - A primitive data type is a data type that is predefined The predefined data type is also known as a built-in data type These primitive data types may be different for different programming languages. For example, the C programming language provides built-in support for integers (int, long), reals (float, double and long double) and characters (char).

**User-defined data type** - When a primitive data type does not suit the requirement in a particular case then we can create our own types. These types are called user-defined data types.

**Abstract Data Type-** ADT is a useful tool for specifying the logical properties of a data type A data type is a collection of values and a set of operations on those values. That collection and those operations form a mathematical construct that may be implemented using a particular hardware or software data structure. The term " abstract data type refers to the basic mathematical concept that defines the data type.

When we define an abstract data type as a mathematical concept, we are not concerned with space or time efficiency. Those are implementation issues The definition of an ADT is not concerned with implementation details at all. Implementing a particular ADT on a particular piece of hardware or using a particular software system may not even be possible. ADT is a useful guideline for implementers and a useful tool for programmers who wish to use the data type correctly

### Data structure

Data may be organized in many different ways the logical or mathematical model organization of data is called data structures. The choice of a particular data structure depends on the following considerations.

- It must be able to represent the relationship of the data in the real world.
- It must be simple enough so that it can be processed efficiently as and when necessary.

#### The study of data structures includes

- The logical description of the data structure.
- implementation of the data structure.

Quantitative analysis of the data structure. This analysis includes determining the amount of memory needed to store the data structure also called space complexity ) and the time required to process it. ( also called time complexity ).

#### Performance Analysis

**TIME COMPLEXITY** - This measure is used to estimate the running time of the algorithm in terms of the size of input data. For this purpose, a popular notation called big ' O ' notation is NY used 2. SPACE COMPLEXITY: This measure is used to define extra space consumed by the algorithm except input data. This is also measured in terms of input size and big ' O ' notation is also popular in this case as well.

To calculate Time and Space Complexity we use

**Big Oh notation** - The function f ( n ) = O ( g ( n ) ) ( read as ' f of n is big oh of g of n ' ) if and only if there exist positive constants c and m such that f ( n ) < = c'g ( n ) for all n , n > = m . 3.

Example

The function 3n + 2 = O ( n ) as 3n + 2 < = 4n for all n > = 2

The function 6 * 2 " + n² = O ( 2 ) as 6 * 2 + n² < = 7 * 2 " for all n > = 4 .

We write O ( 1 ) to mean a computing time that is a constant. O ( n ) is a linear, O ( n ) is called quadratic, O ( n ) is called cubigend O ( 2 " ) is called exponential. If an algorithm takes time O ( log N ), it is faster as compared to O ( n ).

**Omega notation** - The function f ( n ) = ( g ( n ) ) ( read as " f of reis omega of g of n " ), if and only if there exist positive constants c and m such that f ( n ) > = c'g ( n ) for all n, nom.

Example

The function 3n + 2 = 2 ( n ) as 3n + 2 > = 3n for all n > = 1 .

The function 6 * 2 " + n² = 12 ( 2 " ) as 6 * 2 " + n² > = 6 * 2 " for all >> # 1

**Theta notation** - The function f ( n ) = ( g ( n ) ) ( read as " f of n is theta of g of n ' ) . if and only if exist positive constants c and m such that that c * g ( n ) < = f ( n ) < = c2 * g ( n ) for all n , nam .

Example

The function 3n + 2 = 0 ( n ) as 3n + 2 > = 3n for all n > = 1 and 3n + 2 < = 4n for all n > = 2

The function 6 * 2 " + n² = ( 2 " ) . 3 18012 1

#### Counting the number of steps

We can determine the number of steps needed by a program to solve a particular problem instance in one of two ways. In the first method, we introduce a new variable, count, into the program This is a global variable with an initial value of 0 Statements to increment the count by the appropriate amount are introduced into the program. This is done so that each time a statement in the original program is executed, the count is incremented by the step count of that statement

Algorithm Sum ( a, n )

{

S = 0

count = count + 1 [ count is global it is initially zero . ]

for 1 : = 1 to n do

{

count = count + 1

S : = s + a [ l ]

count = count + 1 [ for s]

}

count: = count + 1[for last time of for]

count: = count + 1[for the return]

Return s

}

#### Cases To Consider During Analysis

Choosing the input to consider when analyzing an algorithm can have a significant impact on how an algorithm will perform. For example, if the input list is already sorted some sorting algorithms will perform very well but other sorting algorithms may perform very poorly The opposite may be true if the list is randomly arranged instead o sorted Hence multiple input sets must be considered while analyzing an algorithm. These include the following

**Best Case Input**- This represents the input set that allows an algorithm to perform most quickly With this input the algorithm takes the shortest time to execute as it causes the algorithms to do the least amount of work For example on the searching algorithm the best case would be if the value we are searching for is found in the Mth location that the search algorithm checks As a result. this algorithm would need only one comparison irrespective of the complexity of the algorithm No matter how large is the input, searching in a best case will result in a constant time of 1 Since the best case for an algorithm would usually be a very small and frequently constant value, a best case analysis is often not done.**Worst Case Input**- This represents the input set that allows an algorithm to perform most slowly Worst case is an important analysis because it gives us an idea of the most time an algorithm will ever take Worst case analysis requires that we identify the input values that cause an algorithm to do the most work For example, for a searching algorithm, the worst case is one where the value is in the last place we check or is not in the list. This could involve comparing the key to each list value for a total of N comparisons**Average Case Input**- This represents the input set that allows an algorithm to deliver an average performance Doing Average - case analysis is a four-step process. These steps are as under

Determine the number of different groups into which all possible input sets can be divided Determine the probability that the input will come from each of these groups

### List of data structures

LINEAR DATA STRUCTURES: - The elements form a sequence i.e. linear list.

- ARRAYS
- LINK LISTS
- STACKS
- QUEUES
- HASH TABLE

NON-LINEAR DATA STRUCTURES: - The elements do not form a sequence.

- TREES
- GRAPHS

### Operation on Data Structures

**Traversings**- To access each and every element of data structure exactly one's is called traversal of DS.**Insertion**- To add one more data element in the data structure at a particular position is called insertion operation.**Deletion**- To remove an element from a data structure is called deletion Delotion should be performed in such a way fat the remaining elements should not be in an inconsistent state.**Searching**- To find out the location of a particular data element in the data structure is called a searching operation.**Sorting**- To arrange all the data of data structures in a specific order is known as sorting of data.**Merging**- To combine two data structures into a single one is called the merging of data structures.