Wednesday, 7 August 2013

Introduction to Database View


Summary: in this tutorial, you will learn about a new database object called database view. We will discuss about the advantages and disadvantages of using database views.
A database view is a virtual table or logical table which is defined as a SQL SELECT query with joins. Because a database view is similar to a database table, which consists of rows and columns, so you can query data against it. Most database management systems, including MySQL, allows you to update data in the underlying tables through the database view with some prerequisites.
A database view is dynamic because it is not related to the physical schema. The database systemstores database views as a SQL SELECT statement with joins. When the data of the tables changes, the view reflects that changes as well.
Advantages of database view
The following are advantages of using database views.
  • A database view allows you to simplify complex queries: a database view is defined by an SQL statement that associates with many underlying tables. You can use database view to hide the complexity of underlying tables to the end-users and external applications. Through a database view, you only have to use simple SQL statements instead of complex ones with many joins.
  • A database view helps limit data access to specific users. You may not want a subset of sensitive data can be queryable by all users. You can use database views to expose only non-sensitive data to a specific group of users.
  • A database view provides extra security layer. Security is a vital part of any relational database management system. Database views provides extra security for a database management system. A database view allows you to create only read-only view to expose read-only data to specific users. Users can only retrieve data in read-only view but cannot update it.
  • A database view enables computed columns. A database table should not have calculated columns however a database view should. Suppose in the orderDetails table you havequantityOrder (the number of ordered products) and priceEach (price per product item) columns. However the orderDetails table does not have computed column to store total sales for each line item of the order. If it has, the database schema would not be a good design. In this case, you can create a computed column named totalwhich is a product of quantityOrderand priceEach to store the computed result. When you query data from the database view, the data of the computed column is calculated on fly.
  • Database view enables backward compatibility. Suppose you have a central database, which many applications are using it. One day you decided to redesign the database to adapt with the new business requirements. You remove some tables and create several new tables, and you don’t want the changes affect other applications. In this scenario, you can create database views with the same schema as the legacy tables that you’ve removed.
Disadvantages of database view
Besides the advantages above, there are several disadvantages of using database views:
  • Performance: querying data from a database view can be slow especially if the view is created based on other views.
  • Tables dependency: you create view based on underlying tables of the a database. Whenever you change the structure of those tables that view associates with, you have to change the view as well.


Tuesday, 6 August 2013

Data Cleansing


Also referred to as data scrubbing, the act of detecting and removing and/or correcting a database’s dirty data(i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly). The goal of data cleansing is not just to clean up the data in a database but also to bring consistency to different sets of data that have been merged from separate databases. Sophisticated software applications are available to clean a database’s data using algorithms, rules and look-up tables, a task that was once done manually and therefore still subject to human error.

The process of removing inaccurate and historical data from operational systems to use in a data warehouse.

The process to correct data errors in a collection of data in order to bring the level of quality to an acceptable level to meet the information customers' needs.

means a removing errors and inconsistencies from data being imported into a data warehouse.

A processing step during which missing or inaccurate data is replaced with valid values.



Deduping:


'DeDuping' is the process of removing duplicates in Customer and Address records in a Database or Spreadsheet.