The demand for data storage grows continuously. New websites emerge at an increasing pace and generate enormous volumes of data, which require a matching increase in storage capacity. Not all data, however, needs to be stored and accessed with the same priority; assigning data to storage of different speed and cost according to that priority is known as the tiered storage approach. Placing data optimally across these tiers is an ongoing process, and it is typically automated to remove the burden of manual handling. Data centers rely on this storage optimization to meet users' access requirements alongside the ever-growing need for capacity.
Automated storage tiering
Automated storage tiering actively moves data between different storage types. Less frequently accessed data is moved to Serial Advanced Technology Attachment (SATA) disk drives, while frequently accessed data is placed on faster media such as Serial Attached SCSI (SAS) drives or solid-state drives, often configured as a Redundant Array of Independent Disks (RAID). The whole process is handled by storage management software, so the user gets optimal retrieval times for the most frequently used data.
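The policy can be sketched as a simple rule over access frequency. This is a minimal illustration, not a real storage product's API: the tier names, threshold, and `StoredObject` class are assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative tiering sketch: objects accessed often enough are promoted
# to fast SSD storage; cold objects stay on (or fall back to) SATA disks.
HOT_THRESHOLD = 100  # assumed promotion threshold, chosen for the example

@dataclass
class StoredObject:
    name: str
    access_count: int = 0
    tier: str = "sata"  # cold tier by default

def retier(obj: StoredObject) -> None:
    """Assign a tier based on observed access frequency."""
    obj.tier = "ssd" if obj.access_count >= HOT_THRESHOLD else "sata"

objects = [StoredObject("logs.tar", 3), StoredObject("index.db", 250)]
for o in objects:
    retier(o)
print([(o.name, o.tier) for o in objects])
```

A real tiering engine would run such a pass periodically and also demote objects whose access counts decay, but the core decision is this frequency comparison.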
Data deduplication
Social networking sites such as LinkedIn, Facebook, Twitter, and Instagram often store the same data over and over again. Data deduplication detects these repeated patterns and eliminates the repetitions, reducing the data to a single instance; each new reference to the same data simply points to that single physical copy. The storage space previously wasted on duplicates is recovered, so less capacity suffices for the same quantity of data.
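The single-instance idea can be shown with content hashing: each chunk is stored once under its hash, and repeats become references to that key. This is a toy sketch (the tiny chunk size and the `dedup_store` helper are assumptions for illustration); real systems use larger, often variable-size chunks.

```python
import hashlib

# Toy block-level deduplication: unique chunks are stored once, keyed by
# content hash; the original data is an ordered list of references.
def dedup_store(data: bytes, chunk_size: int = 4):
    store = {}  # hash -> single physical copy of the chunk
    refs = []   # ordered references that reconstruct the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # keep only the first physical copy
        refs.append(key)
    return store, refs

store, refs = dedup_store(b"ABCDABCDABCDEFGH")
restored = b"".join(store[k] for k in refs)
print(len(refs), len(store))  # 4 references, but only 2 unique chunks stored
```

Here three identical "ABCD" chunks collapse into one stored copy, which is exactly the space recovery described above.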
Data compression
Data compression encodes data more compactly to reduce the space it occupies, compressing it on write and decompressing it on read. The process demands high-performance processors, otherwise it slows down data access, which is why it is not well suited to frequently accessed online data. Compressed data can be moved to other storage without first being decompressed; tools such as WinZip and WinRAR illustrate shifting data while it remains in compressed form.
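The compress-on-write, decompress-on-read round trip can be demonstrated with Python's standard `zlib` module; the sample payload is made up for the example.

```python
import zlib

# Transparent compression sketch: data is compressed before being stored
# and decompressed when read back, trading CPU time for storage space.
original = b"storage optimization " * 200   # highly repetitive sample data
compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

assert restored == original  # lossless: the round trip loses nothing
print(len(original), "bytes ->", len(compressed), "bytes on disk")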
Thin provisioning (TP)
Data centers often use thin provisioning for storage optimization, typically in virtualized storage. Unwritten or unutilized blocks are not physically allocated, so the storage is logically able to accept more data than its physical capacity would allow. The data itself is not reduced; the capacity is simply used more efficiently. Say 500 GB of capacity is allocated to a user: that full capacity is almost never used, and initially perhaps only 1-2% of it is utilized. The unused remainder can back other allocations, optimizing overall storage usage. The disk space offered to new Gmail accounts is a good example of this.
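Allocate-on-write is the heart of thin provisioning. The sketch below models it with a dictionary of blocks; the `ThinVolume` class, block size, and sizes are assumptions for illustration, not any vendor's implementation.

```python
BLOCK_SIZE = 4096  # assumed block size for the example

# Thin-provisioning sketch: the volume advertises a large logical size
# but allocates a physical block only when that block is first written.
class ThinVolume:
    def __init__(self, logical_blocks: int):
        self.logical_blocks = logical_blocks
        self.blocks = {}  # physical blocks, allocated on first write only

    def write(self, block_no: int, data: bytes) -> None:
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("write beyond logical capacity")
        self.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Unwritten blocks read back as zeros without costing any space.
        return self.blocks.get(block_no, b"\x00" * BLOCK_SIZE)

    @property
    def physical_used(self) -> int:
        return len(self.blocks)

vol = ThinVolume(logical_blocks=1_000_000)  # large logical capacity
vol.write(0, b"boot")
vol.write(42, b"data")
print(vol.physical_used)  # only the 2 written blocks are allocated
```

A user sees the full logical capacity, but the data center commits physical space only for what is actually written, which is how the unused 98-99% of an allocation can serve other users.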
Data archiving
This step consists of identifying data that is not in regular use and moving it from primary to secondary storage. Archive storage serves as long-term storage. Archiving resembles automated storage tiering, with the difference that archived data is removed from the main data stream entirely. Because the directory structure or catalogue of archived data is kept separate from the primary data, retrieval times from the primary data source are much reduced.
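An archiving pass can be sketched as moving stale entries out of the primary catalogue into a separate archive catalogue. The retention window, file names, and `archive_stale` helper are assumptions made for this example.

```python
import time

# Archiving sketch: entries not accessed within the retention window move
# from the primary catalogue to a separate archive catalogue, shrinking
# the primary index that serves everyday lookups.
RETENTION_SECONDS = 90 * 24 * 3600  # assumed 90-day retention window

def archive_stale(primary: dict, archive: dict, now: float) -> None:
    for name in list(primary):  # list() lets us pop while iterating
        last_access = primary[name]
        if now - last_access > RETENTION_SECONDS:
            archive[name] = primary.pop(name)

now = time.time()
primary = {
    "report.pdf": now - 5 * 24 * 3600,       # accessed 5 days ago
    "old_backup.tar": now - 400 * 24 * 3600,  # untouched for over a year
}
archive = {}
archive_stale(primary, archive, now)
print(sorted(primary), sorted(archive))
```

Keeping the archive catalogue separate is what distinguishes this from plain tiering: lookups against the primary catalogue no longer scan archived entries at all.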
Research and development on storage optimization methodology is continuously evolving; the five basic techniques above are discussed here as an introduction. Knowing how a data center optimizes its storage helps when choosing a hosting provider, since the performance of websites that handle huge volumes of data, measured in data retrieval time, depends on the provider's storage optimization methodology. Consulting an expert before deciding on hosting is therefore worthwhile.