1. Abstract and System defined data types
- Data type
- Primitive data type
- Abstract data type
- Polymorphic data type
2. Representation, Primitive data structures
- Data structures
- Linear data structures
- Arrays
- Linear
- Two dimensional
- Linked List
- Linear
- Doubly
- Circular
- Stacks
- Queues
- Arrays
- Non-linear data structures
- Trees
- Binary trees
- Binary search trees
- Heaps
- Graphs
- Hash tables
- Trees
- Linear data structures
- Common operations on Data structures
- Traversal
- Searching
- Insertion
- Deletion
- Sorting
- Merging
Data type: A type stands for a set of values. A set of values that correspond to a specific type, say a set of integers belong to the type integer, are collectively referred by using a data type. A data type specifies the type of data required to be stored in a variable.
In ‘C’, variables must be declared before their use and must be declared with a relevant data type, which specifies the range of values that may be stored in the variable of that data type. In ‘C’, data type specifies the amount of memory to be allocated. For example: char allocates 1 byte, int – 2bytes, float – 4bytes, double – 8 bytes and long double – 10bytes on a specific machine. Data types may be Primitive (fundamental), Abstract or Polymorphic data types.
The following are the characteristics of data types:
- A domain is defined – the set of all possible values.
- A set of allowable operations on those values is also defined.
- A value is atomic within the domain.
Primitive Data Type: A primitive data type say char, int, float, double is the data type that is used to refer to a single value such as integer, float, character etc. ‘C’ provides above four basic data types. A variable in ‘C’ may be declared to be of primitive data type, such as int x; declares a variable x of type integer and allocates 2 bytes of memory for storing an integer value.
Abstract Data Type: ADT may be defined as a set of data values and associated operations that are precisely specified independent of any particular implementation. Thus an Abstract Data Type is an organized collection of information and a set of operations used to manage that information. The set of operations defines the interface of the ADT. As long as the ADT fulfills the conditions of the interface, it doesn’t really matter how the ADT is implemented. Since, in ADT, the data values and operations are defined with mathematical precision, rather than as an implementation in a computer language, we may reason about effects of the operations, relations to other abstract data types whether a program implements the data type etc. One of the simplest abstract data type is the stack data type for which functions might be provided to create an empty stack, to push values onto a stack and to pop values from a stack.
The basic difference between abstract data type (ADT) and concrete data type is that the latter allow us to look at the concrete representation, whereas the former hide the representation from us. An ADT may be pure ADT or Updatable ADT. A pure ADT is one where all operations are pure functions. This means that operations have no side effects. In particular, they do not modify or update there input arguments. They just use these arguments to generate output, which are fresh values of ADT (or of other types). Most concrete types are pure. For example, no operation on integers actually modifies an integer. Instead, all the operations like ‘+’ produce fresh outputs.
An updatable ADT is one where some operations actually change values of the ADT. For example, suppose we had an operation called ‘pop’ that took a stack as an argument and modified it. (“in place”, “destructively”) by removing the highest priority item. This operation would be considered impure and the whole ADT would then be impure also. An ADT may be user defined ADT.
We know that an Abstract Data Type is a data type that satisfies the following two conditions:
1. The representation, or definition, of the type and the operations are contained in a single syntactic unit.
2. The representation of objects of the type is hidden from the program units that use the type, so only direct operations possible on those objects are those provided in the type’s definition.
A user defined Abstract Data Type should provide:
1. A type definition that allows program units to declare variables of the type, but hides the representation of these variables.
2. A set of operations for manipulating objects of the type.
An example of a user defined abstract data type is structure. ‘C’ provides four basic types: int, char, float and double. However, ‘C’ also provides the programmer with the ability to define his/her own types. Structure is one such example. A structure is an aggregate of different parts, where each part is of some existing type.
struct abc
{int x;
float y;
};
The above structure definition does not create any variables, rather it creates a new type. Variables of this type may be created in a similar way to variables of a built in type.
struct abc a;
The typedef keyword allows us to create new type names for our new types.
For example:
typedef struct abc AB;
where AB is a new type name that can now be used to create new types.
AB b;
Polymorphic Data Types: A polymorphic data type is a data type, which can be transformed to any distinct data type as required. An item of polymorphic data type is created by the use of a generic pointer. A generic pointer is simply a byte address without an associated type. In other words, a generic pointer does not point to an object of a specific type; it just points to some location in the memory of the computer. In ANSI C, a generic pointer is declared as a void pointer. It is only when the actual operations must be performed on the data that generic pointers are cast to pointers of specific types and de-referenced.
Data Structures: The following are the characteristic features of data structures:
1. It contains component data items, which may be atomic or another data structure (still a domain).
2. A set of operations on one or more of the component items.
3. Defines rules as to how components relates to each other and to the structure as a whole (assertions).
A data structure may be static or dynamic. A static data structure has a fixed size. This meaning is different from the meaning of static modifier. Arrays are static; once we define the number of elements it can hold, the number doesn’t change. A dynamic data structure grows and shrinks at execution time as required by its contents. A dynamic data structure is implemented using links.
Data structures may further be categorized into linear data structures and non-linear data structures. In linear data structures every component has a unique predecessor and successor, except first and last elements, whereas in case of non-linear data structures, no such restriction is there as elements may be arranged in any desired fashion restricted by the way we use to represent such types.
Sequential Access vs. Direct Access: In sequential access, to access the n-th element, you must access the preceding (n-1) elements. In direct (random) access, any element can be accessed without accessing its predecessor or successor.
Linear Data structures include arrays, structures, linked lists, stacks and queues. Non-linear data structures include trees, binary trees, graphs and digraphs.
1. Arrays: An array is said to be
- a collection of a fixed number of component values, each of which are of the same type, atomic and structured.
- a set of index (subscript) values that are non-negative integers (consecutive).
- there is a 1-1 relationship between such index and an array component (element). Index 0 selects the first element, 1 the second and so on.
Difference in Array and Structure:
- An array is a homogenous structure, whereas structure are heterogenous structures.
- A component of an array is accessed by its positions in the array, where a component of a structure is accessed by an identifier (the name).
- Because array components are accessed by position, an array is a structured composite type.
An array may be one dimensional or two dimensional. A one-dimensional array contains elements row-wise, from representation point of view. It contains only one subscript. The subscript value begins with zero. A two dimensional array contains two subscripts, one for row and the other for column. Thus a two dimensional array may be considered as a tabular representation of elements where elements are arranged in rows and columns. Two subscripts are used combinedly one for row position and other for column position, to refer to an element in a 2D array.
2. Linked List: A linked list is a linear list of elements linked/joined together with links, where link is maintained by keeping a reference/address of the next element. The address of first element is sustained for traversal or other operations to be performed on the list. A linked list is a dynamic structure. It may grow or shrink dynamically. A linked list may be singly linked list, doubly linked list, circular linked list or doubly circular linked list. Header lists may also be created that contains a header node.
A singly linked list is unidirectional since elements are added up or removed from left to right. A doubly linked list is an improved data structure form of singly linked list, since it contains both forward and backward links/reference. Thus, traversal may easily be done in either of the direction. A circular linked list is one in which the last node points to the first node, thereby making a loop or circle. Circular lists are useful in situations where processing is required to be done repetitively in a loop. A doubly circular linked list is that which maintains forward as well as backward reference and also contains loop by keeping the address of first node in last and last node in first.
All these list categories have their own merits and demerits depending upon the situation in which they are implemented.
3. Stacks: A stack ADT is also linear like a list or a queue. In a stack, items are added and removed from only one end of the stack, called the top of the stack. It is therefore a LIFO (Last In First Out) data structure. Examples of stacks are – a stack of plates in a cupboard, a stack of files in the drawer of a table. Stacks are often drawn vertically. Some operations that may be performed on stack are:
- Push: adds an item to the top of the stack
- Pop: removes an item from the top of the stack
- Peek(top): retrieves the top item without removing it
- Empty: returns true if the stack is empty, false otherwise
- enqueue: adds an item to the rear of the queue
- dequeue (or serve): removes an item from the front of the queue
- empty: returns true if the queue is empty, false otherwise
Queues often are helpful in simulations or any situation in which items get backed up while awaiting processing. A queue can be represented by an array or using a linked list. A linked list representation is more efficient if the references point from the front toward the rear of the queue.
Non-Linear Data Structures:
1. Trees: A tree is a non-linear data structure that consists of a root node and potentially many levels of additional nodes that form a hierarchy. Nodes that have no children are called leaf nodes.
(i) Binary Trees: A binary tree is defined recursively. Either it is empty (the base case) or it consists of a root and two subtrees, each of which is a binary tree. Binary tree and trees are represented using references as dynamic links, though it is possible to used fixed representation like arrays.
(ii) Binary Search Trees: A Binary Search Tree is a binary tree that satisfies the following conditions:
- All values less than the key value (root) form a part of left subtree.
- All values greater than the key value (root) form a part of right subtree.
- The left and right subtrees are also binary trees.
(iii) Heap: Heap is a complete binary tree with n nodes such that:
- the root node contains the largest element
- both the left and the right sub-trees of the root are heaps once again
2. Graphs: A graph is a non-linear data structure. Unlike a tree or binary tree, a graph does not have a root. Any node in a graph can be connected to any other node by an edge. In a directed graph or digraph, each edge has a specific direction. Edges with direction sometimes are called arcs.
Both graphs and digraphs can be represented using dynamic links or using arrays. As always, the representation should facilitate the intended operations and make them convenient to implement.
3. Hash Tables: A hash table is used to represent the hash values that are generated using a hash function. Generally, the size of the hash table is determined in terms of the data elements it is going to sustain in itself.
Common operations on Data Structures:
- Traversal
- Searching
- Insertion
- Deletion
- Sorting
- Merging