This article collects the optimization techniques, performance tuning tips and guidelines needed for large database servers, big ecommerce stores, blogs, CMS systems and similar sites.
When getting ready to deploy and configure a big shopping cart like Magento, there are some general technical environment components that must be taken into consideration to create an optimized setup. These range from physical hardware selection and network throughput to the underlying open source stack, which are the key underpinnings that drive Magento's own configuration components.
We applied some of these settings to several carts and blogs and achieved a great performance boost.
This article is based on the Magento Performance White Paper: http://www.magentocommerce.com/whitepaper. Thanks to the Magento team for publishing this wonderful article.
When trying to achieve the optimal setup, please take the following into consideration:
With a large number of concurrent users, a sufficient amount of RAM is critical to handle all incoming connections. Faster, modern systems with multi-core CPUs, high front side bus speeds and fast hard drives, preferably 7200 RPM and above, will generally speed up the entire application.
Insufficient network I/O throughput and latencies in the internal network can significantly impact performance of a multi-server setup. Outbound connection latency may hurt the customers browsing the store frontend.
Magento is a PHP application that runs on the LAMP stack. Therefore, current, up-to-date and well-configured versions of the Linux kernel, Apache, MySQL and PHP will provide better performance results. Proper configuration of the web server, the database server and PHP itself is required in order to achieve optimal performance results. When upgrading MySQL, check the known-bug list for your selected version and prefer stable releases only. A beta version may cause your server and database to crash.
Magento Configuration Components:
• Proper cache backend
• Handling sessions with fast storage
• Directory structure optimization
• Flat frontend catalog
• Magento Enterprise Edition cron scripts
• Rebuilding indexes
• Admin panel separation
• Proper search type selection
• Frontend layout complexity
• Number of HTTP requests on the page
• Using parallel connections
Proper Database Configuration
Proper MySQL configuration is one of the most important aspects of configuring any application. You may be surprised to learn that optimizing the MySQL configuration can provide up to a 70% performance improvement. An incorrect configuration may result in the web server spending more time in idle loops waiting to retrieve data from the database.
As an additional note, the default MySQL installation, even in later versions, is configured to use far fewer resources than average hardware can provide.
Let’s quickly go through the most important directives in the MySQL configuration file, my.cnf, and their recommended values.
The formulas used for these calculations can be found in the template files bundled with the MySQL distribution package (my-huge.cnf, my-innodb-heavy-4G.cnf, my-large.cnf, my-medium.cnf and my-small.cnf).
MySQL performance has improved with newer versions; 5.1, for example, is said to perform up to 15 times better than 5.0. So if you are still running MySQL 5.0 or 4.0, you are wasting server resources.
• Available Memory
MySQL has several database engines, such as InnoDB and MyISAM, and each has its own requirements. Magento uses InnoDB as its primary table storage engine type. InnoDB, unlike MyISAM, can use the in-memory buffer pool to cache table indexes and data. Less disk I/O is needed to get data from hard drives when the value of the in-memory buffer pool is set higher. A general recommendation is to set this parameter up to 80% of the available RAM for a dedicated database server. In cases where both the web server and the database are running on the same machine, it is recommended to split the entire memory pool into two parts, each having its own primary assigned portion (e.g. on a single server with 6 GB RAM installed it can be split to have 2-2.5 GB used by MySQL, with the rest left for the web server).
The key parameter in this section is innodb_buffer_pool_size, which should be set to use as much available memory as possible:
Server type | innodb_buffer_pool_size
combined web and DB server, 6 GB RAM | 2-3 GB
dedicated database server, 6 GB RAM | 5 GB
dedicated database server, 12 GB RAM | 10 GB
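As a rough illustration of the sizing rule above, the following Python sketch turns installed RAM and server role into a starting value for innodb_buffer_pool_size. The helper name and the 40% share for a combined web and DB server are assumptions, picked only so the result lands inside the 2-3 GB range in the table; the 80% share for a dedicated server is the guideline quoted above. Treat the output as a value to benchmark, not a definitive setting.

```python
def suggest_buffer_pool_mb(total_ram_gb, dedicated_db_server):
    """Return a starting innodb_buffer_pool_size in MB (hypothetical helper)."""
    # 80% for a dedicated DB server follows the guideline in the text.
    # 40% for a combined web + DB server is an assumption for illustration.
    percent = 80 if dedicated_db_server else 40
    total_ram_mb = total_ram_gb * 1024
    return total_ram_mb * percent // 100


if __name__ == "__main__":
    for ram_gb, dedicated in [(6, False), (6, True), (12, True)]:
        role = "dedicated DB" if dedicated else "combined web + DB"
        size_mb = suggest_buffer_pool_mb(ram_gb, dedicated)
        print(f"{role}, {ram_gb} GB RAM: innodb_buffer_pool_size = {size_mb}M")
```

The results (roughly 2.4 GB, 4.8 GB and 9.6 GB) sit close to the rounded figures in the table; the table values remain the reference.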
Today’s servers typically have more than 1 CPU installed, with 2 or more cores each. The InnoDB engine can effectively use multiple threads to serve more concurrent connections.
innodb_thread_concurrency should be set to a value equal to or greater than 8, even for a single CPU. The recommended value is calculated with the following equation:
2*[Number of Total CPUs]+2
thread_cache_size allows for the caching of a client’s threads when a client disconnects, and to reuse them when new connections are created. The recommended value is from 8 to 64, and depends on your max_connections number.
thread_concurrency can be simply calculated as [number of CPUs] * multiplier. The multiplier value is between 2 and 4 and should be determined by testing the different values and benchmarking for the best results in your environment.
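The two formulas above can be sketched in a few lines of Python. The function names are made up for this example, and whether "CPUs" should count sockets or cores is not stated in the text, so the input is simply whatever count you decide to use.

```python
def innodb_thread_concurrency(total_cpus):
    """2 * CPUs + 2, never below the floor of 8 mentioned in the text."""
    return max(8, 2 * total_cpus + 2)


def thread_concurrency(total_cpus, multiplier=2):
    """CPUs * multiplier, where the multiplier is between 2 and 4."""
    if not 2 <= multiplier <= 4:
        raise ValueError("multiplier should be between 2 and 4")
    return total_cpus * multiplier


if __name__ == "__main__":
    for cpus in (1, 4, 8):
        print(
            f"{cpus} CPUs: innodb_thread_concurrency={innodb_thread_concurrency(cpus)}, "
            f"thread_concurrency={thread_concurrency(cpus)} to {thread_concurrency(cpus, 4)}"
        )
```

The multiplier range only gives the candidates to try; the final value still has to come from benchmarking, as noted above.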
• Built-in Caching
table_cache (deprecated in MySQL 5.1) or table_open_cache (available in MySQL 5.1 or greater) is the number of tables that can be simultaneously opened by MySQL. A value of 1024 will be sufficient for most, if not all, Magento Enterprise Edition sites.
Having the query cache enabled may result in significant speed improvements when you have a large number of identical queries, which is the case for any eCommerce application frontend. The recommended values for a Magento database server are query_cache_size 64M and query_cache_limit 2M.
A sort buffer is used for optimization of sorting in ORDER BY and GROUP BY queries. 8M is the recommended value for a Magento Enterprise Edition database.
• Slow queries logging
Logging slow queries might be useful for debugging purposes, but it should be disabled in production use.
• InnoDB storage
The InnoDB engine works with a single data storage file, which usually grows over time. It’s a good idea to have its initial size configured to be at least twice as large as the Magento database size, and innodb_autoextend_increment should be set to a fairly high value in order to avoid frequent data file extending operations.
InnoDB supports transactional operations by using transaction log files. Transaction log files are generally configured in groups of 2. The bigger the transaction log file, the less often InnoDB performs I/O operations on primary storage files. However, more time will be required to restore the database should a recovery become necessary.
Do not use multiple InnoDB table spaces unless you are sure you know the benefits in your particular hardware setup.
Apache Web Server Configuration
The most commonly used Apache configuration provides PHP support with mod_php. This Apache configuration loads a large number of modules. However, most of these modules are not necessary in order to run Magento. This becomes more relevant in a multiserver setup, where different tasks can be split on different nodes and each node has to be configured to perform its specific task the best.
The minimum required list of Apache modules is:
• mod_expires – generates content expiration and cache control headers
• mod_deflate – compresses content before it is delivered to the client
• mod_mime – associates the requested file with its type and behavior
• mod_dir – serves directory index files
• mod_rewrite – is used to support Search Engine Friendly URL’s
• mod_authz_host – is required to limit access to specific files
• mod_authz_user – might be required in a staging environment to set up password authentication, but on a live site it is not necessary
With all unused Apache modules disabled by commenting out the corresponding ‘LoadModule’ lines in httpd.conf, it is possible to cut memory consumed by Apache, which will allow more concurrent connections to be handled with the same amount of RAM.
Another important component is setting an optimal number of running Apache processes. The best method is to create the required number of Apache processes when the web server is started. This number should be calculated by measuring the amount of memory consumed by Apache under the maximum load. Pre-creating the processes is currently the best approach, because mpm_worker cannot be safely used with PHP and forking every new Apache child in mpm_prefork mode is an expensive operation.
Also note that ServerLimit and MaxClients values should be specified explicitly to prevent running out of physical memory and going into a swap file, which causes a severe breakdown of web server performance. MaxRequestsPerChild can be set to the default value (4000).
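The calculation described above can be sketched as follows. This is a minimal Python illustration, not a tuning tool: the function names and the example figures (3584 MB left for Apache, 40 MB per child under peak load) are assumptions, and setting StartServers to the same value is only one reading of the advice to create all processes at startup.

```python
def estimate_max_clients(ram_for_apache_mb, avg_child_mb):
    """Estimate how many prefork children fit in the RAM set aside for Apache."""
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    return ram_for_apache_mb // avg_child_mb


def prefork_directives(max_clients):
    """Render explicit limits, starting all children up front."""
    return "\n".join([
        f"StartServers {max_clients}",
        f"ServerLimit {max_clients}",
        f"MaxClients {max_clients}",
    ])


if __name__ == "__main__":
    # Assumed figures: measure the real per-child memory under maximum load.
    clients = estimate_max_clients(3584, 40)
    print(prefork_directives(clients))
```

Leave some headroom for the operating system and any other services when choosing the RAM figure, since the whole point is to stay out of swap.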
Under a heavy load keeping persistent connections becomes disadvantageous, thus the KeepAlive directive should always be set to off.
mod_deflate compresses content before sending it to the browser. The Magento .htaccess file already includes the necessary settings to enable the compression. Please make sure to uncomment this section in order to decrease the page load time.
Additionally, you can take advantage of eliminating directory structure scans for .htaccess files by moving all .htaccess directives into appropriate <Directory> sections of the main httpd.conf file.
In order to reduce the I/O throughput on Apache web-nodes in a multi-server setup, it is advisable to use a load balancer capable of handling all of the logging activity, instead of having the activity handled by Apache backends.
Apache and MySQL Conclusion: An optimized MySQL and Apache configuration shows 55-70% performance increases on dynamic pages (that is, most of the pages in the URL list). The homepage results are less affected, as the default Magento setup has cache enabled and the cached homepage makes fewer queries to the database. The default Apache and MySQL configuration is not able to handle higher concurrencies (100 concurrent sessions), so the results for that concurrency vary a lot between tests.
PHP is an interpreted scripting language. The process of running a PHP script includes a few steps – reading a script file from the hard drive, parsing and compiling bytecode, and finally running that bytecode.
Realpath cache configuration
Optimization of file I/O is not only limited to using faster hard drives. It is also highly recommended to increase the default realpath_cache_size and realpath_cache_ttl values in php.ini settings. Based on tests the recommended values are realpath_cache_size=32k and realpath_cache_ttl=7200 on production servers.
The process of reading PHP scripts from disk and compiling them can be eliminated by enabling PHP accelerators. PHP accelerators cache compiled bytecode, resulting in less file and system I/O. The well-known PHP accelerators eAccelerator and APC have been tested and are fully compatible with Magento. Their built-in shared memory can also be used as Magento cache storage.
To reduce the memory usage and speed up PHP performance you can enable in php.ini only the minimum set of PHP extensions required to run your application. The necessary extensions are:
• SOAP (if the API is to be used)
Conclusion: Adding a PHP accelerator provides a performance boost ranging from 42% on simple pages (homepage) to 500-600% when different PHP files are used (URL list). The APC accelerator provides good results, but in our tests eAccelerator is 15-20% more efficient.
Caching On Magento
Magento is able to cache frequently-used data utilizing different cache backends.
When installing Magento the filesystem is set to be used as the cache backend by default. Using a cache backend will always improve the performance of Magento, and while the filesystem cache is the most reliable storage with unlimited size, it will not provide the best performance.
Magento can also work with the following cache backends that provide better performance than the filesystem cache backend:
• APC – a bytecode cache for PHP that also provides shared memory storage for application data
• eAccelerator – a PHP accelerator that can also cache dynamic content
• memcached – a distributed, high-performance caching system
Please make sure that if you are using APC, eAccelerator or memcached, you configure them with enough memory to include all cache data, otherwise they may purge required cache hierarchy structures and break the cache.
Conclusion: It may be required to disable the built-in Magento cache (that is enabled after installation by default) during active development, but please make sure that caching is enabled on production sites, as a disabled cache makes the store frontend 5-6 times slower and less responsive under load.
Conclusion: The APC cache backend improves the results, which are 2-3 times better than the default filesystem cache backend. The memcached cache backend shows 10-15% better results than APC. And from the tests the eAccelerator cache backend shows the best results which are 5-10% faster than memcached.
Magento uses PHP sessions to store customer session data.
The default method is to use filesystem storage, which works well if you are using a single webserver. Its performance can be improved by configuring a tmpfs in-memory partition to avoid extra hard drive I/O activity.
In a clustered environment with multiple web-servers, the first option for handling sessions is to use a load-balancer capable of associating client requests to specific web-nodes based on client IP or client cookies. If you are in a clustered environment and not using a load-balancer capable of the above, it is necessary to share the session data between all the web-servers. Magento supports two additional session storage types that can be used in this case.
Storing session data in the database (though it is fully supported) is not recommended as it puts an additional load on the main database, and therefore requires a separate DB server to handle multiple connections efficiently under load in most cases.
memcached session storage is free of these disadvantages. memcached service can be run on one of the cluster servers to provide fast session storage for all web-nodes of the cluster. memcached session storage doesn’t show any performance improvements when used in a single-server configuration though, because of extra overhead processing compared to raw filesystem session files.
Conclusion: The default filesystem storage shows the best results on a single-server setup. The memcached session storage shows slightly different results (1-2% worse) and it can be considered an option in a clustered environment with a simple load-balancer setup. The database session storage should be used only in a clustered environment and only if the memcached storage cannot be used for some reason.
Directory Structure Optimization
Optimizing directory structure can also help fine tune Magento Enterprise Edition performance. It is highly recommended to use the Zend Framework distribution bundled within Magento Enterprise Edition as it is tweaked to significantly reduce the number of system calls required to locate a file in the directory structure. This is accomplished by commenting out all extra require_once directives within the configuration file. The additional require_once calls are not required because Magento Enterprise Edition implements its own autoload function that handles all necessary file requests on demand. Recent Magento Enterprise Edition versions (since version 1.3.x) include the Magento Enterprise Edition Compilation Module (Mage_Compiler), which provides extra optimization by placing all the files in one directory and combining the most used classes into a few single files.
Conclusion: Enabling Magento Enterprise Edition Compilation Module provides a 10-15% additional performance boost.
Flat Frontend Catalog
Starting in Magento 1.3.x the Flat Frontend Catalog module was introduced. The Flat Frontend Catalog module maintains an additional set of database tables to store catalog data in a linear format, with extra indexes that facilitate executing database queries on the frontend catalog pages.
Flat Frontend Catalog structures were implemented for both category and product data. Flat Categories are recommended for any Magento installation for improved performance, whereas the Flat Products is designed and recommended for catalogs that have over 1,000 SKU’s.
To enable one or both, first go to your administrator panel and navigate to System -> Cache Management. Under Catalog click on the Rebuild button next to Rebuild Flat Catalog Category or Rebuild Flat Catalog Product. Note: If only Flat Catalog Categories are used there is no need to rebuild the Flat Catalog Product.
Navigate to System -> Configuration, click on Catalog and select the Frontend tab. Choose Yes next to the appropriate selection of either Use Flat Catalog Category or Use Flat Catalog Product.
Conclusion: Enabling the Flat Catalog module does not add much to the homepage test results as the default homepage does not contain product listing information and the categories menu is efficiently cached by default. However, when it comes to browsing the frontend catalog the performance gain is around 2-3% on smaller catalogs and reaches 8-10% on larger catalogs with complex category structures and large numbers of products.
Admin Panel Separation
Admin panel operations are in general more resource consuming than the frontend activities. Often they require increasing PHP memory limits or having extra extensions compiled into PHP. Therefore, having a dedicated admin server can help make admin panel operation faster while not impacting the frontend configuration and performance. The separation can be done by specifying different base URL’s on a global level, and for frontend websites and stores. Each separate domain name can then be served by a separate server.
Proper Search Type Selection
Magento supports 3 search types that can be selected in configuration – LIKE, FULLTEXT, and their combination. FULLTEXT search is known to be faster and puts less load on the database.
Number of HTTP Requests Per Page
<reference>
    <action method="addJs"><script>custom_js/gallery.js</script></action>
    <action method="addJs"><script>custom_js/intro.js</script></action>
</reference>
The example layout file above will combine those 2 scripts in a single file that will be added to the page with one request to js/index.php?c=auto&f=,custom_js/gallery.js,custom_js/intro.js.
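To make the shape of that merged request concrete, here is a small Python sketch that builds the same URL from a list of script paths. It only mimics the URL format quoted above; the function name is hypothetical and this is not how Magento itself assembles the request.

```python
def merged_js_url(scripts, base="js/index.php"):
    """Build one request URL for several scripts, in the format shown in the text."""
    # Each script path is prefixed with a comma, matching the example URL.
    file_list = "".join("," + path for path in scripts)
    return f"{base}?c=auto&f={file_list}"


if __name__ == "__main__":
    print(merged_js_url(["custom_js/gallery.js", "custom_js/intro.js"]))
```

Two script tags become one HTTP request, which is the saving this section is about.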
Using Parallel Connections
You can find the detailed list of website design best practices at the Yahoo Developer Network at http://developer.yahoo.com/performance/rules.html
Media and Static Content Delivery
Though Apache is a fast and reliable web-server, there are other web-server options that are known to serve static content and media files more efficiently, consuming less memory and CPU time.
Widely used are nginx, lighttpd and tinyhttpd. These are multiplexing web-servers, which don’t have built-in scripting languages support, but can handle thousands of simultaneous connections per server.
Additional Performance Gains
Static content delivery can be improved using a caching reverse proxy, such as Squid, or an HTTP accelerator like Varnish. A reverse proxy can locally cache content received from Apache to reduce the load on the Apache backends.
Another way to reduce your server load and to get smaller network latencies is using a content delivery network (CDN). Most CDN’s support pushing media content through a simple API and can be integrated with the Magento backend quite easily.
Magento is designed to utilize benefits of running a multi-server setup in a clustered environment. Web-nodes are not limited to be of exactly the same type. There might be different nodes performing different tasks (frontend servers, static content and media servers and a separate admin panel server).
Scaling DB nodes
Magento works with a database in a manner that easily allows separating database connections for read and write activities. Each particular module can use its own connections if needed, which is fully customizable and can be easily set in app/etc/local.xml.
The following configuration snippet shows how to setup 2 separate connections to master and slave DB servers:
<dbname><![CDATA[Magento Enterprise Edition]]></dbname>
<dbname><![CDATA[Magento Enterprise Edition]]></dbname>
<initStatements>SET NAMES utf8</initStatements>
Magento can be scaled over any number of additional web servers, which allows more concurrent requests to be handled by simply introducing new web nodes as the number of page views and visitors grows. When the number of visitors grows, doubling the number of web nodes can provide up to a 60-70% performance increase.
Our results show better performance on physical dedicated servers than in a cloud environment.
Software Versions Used
• CentOS release 5.3 (Final)
• Linux 2.6.24-23-xen SMP x86_64 GNU/Linux
• mysql Ver 14.14 Distrib 5.1.36, for redhat-linux-gnu (x86_64) using readline 5.1
• PHP 5.2.10
• Apache/2.2.3
• memcached 1.2.5
• SIEGE 2.69
• Magento Enterprise Edition 22.214.171.124
If you still have any questions you are always welcome to contact us or comment.
OSI Days 2010, Asia’s largest conference on open source, will be held from 19th-21st September, 2010 at Chennai, India (more: http://osidays.com).
OSI Days 2010 is the 7th and latest conference in the rich legacy established by the Linux Asia series of conferences in India. Organised by the Forum for Open Source Initiatives in India (FOSII) and the Linux for You magazine (part of the EFY Group), OSI Days serves as the focal point for the convergence of the Open Source Community and Industry in Asia.
The conference is targeted at the policy and decision makers in a technological ecosystem: Government, Academicians, CXOs, SMEs, Developers and hardcore hackers. OSI Days 2010 will bring together over 3000 of the finest people in the open source domain to discuss and confer on varied and relevant topics including:
- Mobile: App Development, Game Development, Android, iPhone, Symbian & Others
- IT Managers / Business: Legal, Community Management, Best Practices, Marketing Strategies, Open Web / Standardization, Business Models
- Cloud Computing: Tools and Platforms, Cloudnomics, Cloud for Dummies & Others
- Government: Applications, eGovernance, Case Study, Legal
- Hardware: Infrastructure Management, Security, Semi Embedded Devices, Parallelization, Grid, Multi Core, Multi Threading, Virtualization & Others
- PHP: PHP 5 & 6, PHP Security, Frameworks, Architecture / QA & Best Practices
- Ruby on Rails
- Drupal: Best Practices, Module Development, Theme Development, Scaling/ Management/ Performance & Others
- Databases: MySQL, NoSQL, CouchDB, PostgreSQL, Ingres, SQLite & Others
- JavaScript
- Developer / Tools & Techniques
(For details, please see the conference schedule at: http://osidays.com/schedule)
The Call for Papers is open for the conference till June 15th (more: http://osidays.com/call-for-papers). We invite you to come join us in promoting open source technologies and projects by participating at the conference as speakers and contributing to the knowledge and wisdom at OSI Days 2010.
Open Source Databases
Some technologies come on the information technology landscape and stay, providing long-lasting benefits, whereas others are more of a short term fad and ultimately end up disappearing because the value they supplied was too niche oriented and/or they were quickly supplanted by another technology that is better. Recently, articles, blogs, analyst reports, and other media outlets have been noting the rise and usage of column-oriented databases in the areas of data warehousing, analytics, and other business intelligence/read-intensive situations. And on the MySQL front, there are a couple of column DB’s that are now available for you to use.
Are column-oriented databases a technology that is destined to stay and provide long-term benefits or will it be relegated to the forgotten pile of other software that came on the scene quickly and then disappeared?
Let’s look at three key questions that are consistently asked of column-oriented databases and see how the technology stacks up:
- How do column-oriented databases work?
- Do column-oriented databases really make a difference?
- What learning curve (application/database development, etc.) is involved with column-oriented databases?
How Do Column-Oriented Databases Work?
All the legacy relational databases currently being offered today were and are primarily designed to handle online transactional processing (OLTP) workloads. A transaction (e.g. an online order for a book through Amazon or another Web-based book dealer) typically maps to one or more rows in a relational database, and all traditional RDBMS designs are based on a per row paradigm. For transactional-based systems, this architecture is well-suited to handle the input of incoming data.
However, for applications that are very read intensive and selective in the information being requested, the OLTP database design isn’t a model that typically holds up well. Whereas transactions are row-based, most database queries are column-based. Inserting and deleting transactional data are well served by a row-based system, but selective queries that are only interested in a few columns of a table are handled much better by a column-oriented architecture. On average, a row-based system does 5-10x the physical I/O that a column-based database does to retrieve the same information. Taking into account that physical I/O is typically the slowest part of a query, and that an analytical query typically touches significantly more rows of data than a typical transactional database operation, the performance gap between row-oriented and column-oriented architectures oftentimes widens as the database grows.
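The I/O argument can be illustrated with a back-of-the-envelope Python sketch. It is a deliberately simplified model: the table layout, column widths and row count are invented for the example, and it ignores compression, page layout, indexes and caching, all of which affect real systems.

```python
def bytes_read_row_store(column_widths, row_count):
    """A row store reads every column of every row it scans."""
    return sum(column_widths.values()) * row_count


def bytes_read_column_store(column_widths, row_count, needed_columns):
    """A column store reads only the columns the query asks for."""
    return sum(column_widths[name] for name in needed_columns) * row_count


if __name__ == "__main__":
    # Hypothetical orders table: column name -> width in bytes.
    orders = {
        "order_id": 8, "customer_id": 8, "order_date": 8, "status": 8,
        "total": 8, "region": 8, "sku": 16, "quantity": 8, "discount": 8,
    }
    rows = 1_000_000
    # Analytical query touching two columns, e.g. total sales per day.
    needed = ["order_date", "total"]

    row_bytes = bytes_read_row_store(orders, rows)
    col_bytes = bytes_read_column_store(orders, rows, needed)
    print(f"row store:    {row_bytes:,} bytes")
    print(f"column store: {col_bytes:,} bytes")
    print(f"ratio:        {row_bytes / col_bytes:.1f}x")
```

With these assumed widths the row store scans five times as much data for the same answer; wider tables and narrower queries push the ratio higher.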
To get around their selective query inefficiencies, row-based RDBMS’s utilize indexing, horizontal partitioning, materialized views, summary tables, and parallel processing, all of which can provide benefits for intensive queries, but each comes with its own set of drawbacks as well. For example, while indexes can certainly help queries complete faster in some cases, they also require more storage, impede insert/update/delete and bulk load operations (because the indexes must be maintained as well as the underlying table), and can actually degrade performance when they become heavily fragmented. Moreover, in business intelligence/analytic environments, the ad-hoc nature of such scenarios makes it nearly impossible to predict which columns will need indexing, so tables end up either being over-indexed (which causes load and maintenance issues) or not properly indexed, and many queries end up running much slower than desired.
Those not familiar with a column-oriented database might wonder exactly what they are and what actual benefits they deliver over a legacy RDBMS. It’s important to note that, on the surface, a column-oriented database appears exactly like a traditional relational database: the logical concepts of tables and rows are the same, SQL commands are used to interact with the system, and most other RDBMS paradigms (e.g. security, backup/recovery, etc.) remain unchanged.
But, a column-oriented database specifically designed for analytics overcomes the query limitations that exist in traditional RDBMS systems by storing, managing, and querying data based on columns rather than rows. Because only the necessary columns in a query are accessed rather than entire rows, I/O activities as well as overall query response times can be reduced. In other words, if you don’t have to read an entire row to get the data you need, why do it?
The end result for column databases is the ability to interrogate and return query results against either moderate amounts of information (tens or hundreds of GB’s) or large amounts of data (1-n terabytes) in much less time than standard RDBMS systems can.
The good news for you who use MySQL is that the storage engine architecture allows column database vendors to plug their technology into MySQL and voila! You now have at your disposal a powerful alternative to other MySQL engines that can really tackle serious data needs.
Do Column-Oriented Databases Really Make a Difference?
So column databases look pretty good from a technical blueprint perspective, but do they really walk the talk in the real world? If they do, then their impact will be substantial because, in the end, the back end database used for BI or read-intensive work is the #1 overall contributor to a well running system.
The Data Warehouse Institute (TDWI) did a recent study and found that (not surprisingly…) the most important component in a business intelligence implementation was the database server itself.
Further, TDWI found that nearly half of those it polled are ready to replace their database used for business intelligence applications with another, more modern alternative. When asked what the technical reasons were for the replacement, the number one answer was the inability of the legacy RDBMS to service queries in the time needed.
Interesting information, but do column databases really have the capability to help the pain these folks talk about?
As an example of how a column-oriented database can outperform a legacy RDBMS, Calpont recently commissioned a well-known data warehouse industry expert – Bert Scalzo – to benchmark a leading row-based database (of which he has many years of experience in tuning for fast performance) against Calpont’s InfiniDB Server (Community Edition), which has as one of its core features, a column-oriented design. A Star Schema styled benchmark was conducted on two different machines to gauge performance on both mid and large-sized servers. The mid-sized server was an 8 CPU, 8GB RAM, 14 SATA 7200 RAID-0 no cache configuration, and the large server was a 16 CPU, 16GB RAM, 14 SAS 15K RPM RAID-0 with 512MB cache machine. Both were running 64-bit CentOS 5.4. The raw database size was 2TB.
As can be seen on the graphs below, various configurations were used for the row-based database; however, no matter the configuration, the column-oriented InfiniDB database consistently beat the legacy database in storage footprint, load time, and query speed:
In summary, the column database saves on storage costs, supplies faster access to new/incoming data, and runs queries much faster than its row-based competitor.
Notice also that, in addition to producing overall faster query speeds, InfiniDB supplied much better predictability in terms of query time. Whereas the row-based database produced wildly varying minimum and maximum query times over the various runs, the column database had a far tighter grouping of runs when it came to response times. This translates into much better dependability from a business standpoint in ensuring BI reports and queries meet whatever service-level agreements are imposed by business users.
Lastly, whereas the row-based database had been maxed out performance-wise in the benchmark tests, a user who wished to get even faster performance from InfiniDB could move from the Community Edition to the Enterprise Edition, which supports massive parallel processing (MPP), and cut the query times literally in half with the addition of a new node. Further, they could cut their query time in half again by adding two more nodes (for a total of four) and continue to work in MPP fashion with more nodes until they reach whatever final query times they desire.
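The halving described above corresponds to ideal linear scaling, which can be written as a one-line model. The Python sketch below is only that idealized arithmetic: the 120-second starting time is an invented figure, and real clusters lose some of the gain to coordination and data movement.

```python
def ideal_query_seconds(single_node_seconds, nodes):
    """Query time under perfect linear scaling: time divided by node count."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return single_node_seconds / nodes


if __name__ == "__main__":
    baseline = 120  # assumed single-node query time in seconds
    for nodes in (1, 2, 4):
        print(f"{nodes} node(s): {ideal_query_seconds(baseline, nodes):.0f} s")
```

Going from one node to two halves the time, and going from two to four halves it again, which matches the pattern in the paragraph above.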
In addition to better performance, the column-orientation aspect of column databases supplies a number of useful benefits to those wishing to deploy fast business intelligence databases.
First, there is no need for indexing as with traditional row-based databases. The elimination of indexing means: (1) less overall storage is consumed in column databases because indexes in legacy RDBMS’s often balloon the storage cost of a database to double or more the initial data size; (2) data load speed is increased because no indexes need to be maintained; (3) ad-hoc DML work speed is increased because no index updates are performed; (4) no indexing design or tuning work is imposed on the database IT staff.
Second, there is far less design work forced on database architects when column databases are used. The need for complicated partitioning schemes, materialized view or summary table designs, and other such work is completely removed because column databases need none of these components to achieve superior query performance.
What Learning Curve is involved with Column-Oriented Databases?
You’ll be pleased to find that the learning curve associated with moving from legacy, row-based RDBMS’s to a column database is very small if not completely non-existent. Unlike other databases that came on the scene in prior years which required either different programming paradigms (e.g. object-oriented databases) or learning new design methodologies and database access languages (e.g. OLAP databases), column databases look and handle just like standard relational databases. They use the same ANSI standard SQL language, security methods, and require no development paradigm changes.
In fact, column databases actually lessen the burden on both the development and administration staff because they do away with the need for indexing exercises, data partitioning schemes, supplementary object designs (e.g. materialized views), and other similar tasks. The ease of use factor, therefore, is greater with column databases than it is with traditional RDBMS’s. Moreover, they do not require such specialized in-house expertise to build highly-performant systems.
In the end, the answer as to why you should consider a column database over a legacy RDBMS comes down to the fact that column databases do indeed make a big impact in how data warehouses, BI databases, and read-intensive systems perform. This makes column databases a good choice for today as well as a technology whose benefits will extend many years down the road.
To download the Community Edition of InfiniDB that uses MySQL as its front end, as well as free documentation, go to http://www.infinidb.org/downloads. To obtain a trial of InfiniDB Enterprise, please visit: http://www.calpont.com/.
What’s New in MySQL 5.5
It’s been a busy year for MySQL. Perhaps you’ve heard. Here are some recent improvements to the speed, scalability, and user-friendliness of the MySQL database and the InnoDB storage engine that we think deserve their own headlines. Now is a great time to beta test the 5.5 release and give feedback to the MySQL engineering team.
Improved Performance and Scalability
- InnoDB Becomes Default Storage Engine
- MySQL sometimes gets knocked about features such as ACID-compliant transactions, foreign key support, and crash recovery. These features are strongest in the InnoDB storage engine, but MyISAM has always been the default, so new users could get the wrong impression. Starting in MySQL 5.5, InnoDB is the default storage engine, so that everyone can see this reliability and stability out of the box. As a bonus, the level of InnoDB in MySQL 5.5 is InnoDB 1.1, a rearchitected InnoDB with many performance and scalability features over and above the built-in InnoDB in 5.1 and before. (Since we are unifying the InnoDB within MySQL using the best and fastest technology, we are phasing out the “Built-In” distinction; MySQL 5.5 comes with the latest and greatest InnoDB 1.1.) Read more about the latest InnoDB enhancements below.
- Better Metadata Locking within Transactions
- If a table is referenced within a transaction, no other transaction can perform DDL such as DROP TABLE or ALTER TABLE until the first transaction commits. Previously, the lock was released at the end of a statement rather than the whole transaction. Read more about metadata locking within transactions.
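The locking behavior described above can be sketched with two client sessions. This is only an illustration: the table t1 and the column note are hypothetical names, not taken from the original article.

```sql
-- Session 1: open a transaction that references the (hypothetical) table t1.
START TRANSACTION;
SELECT COUNT(*) FROM t1;

-- Session 2: this DDL statement now waits, because session 1 still holds
-- a metadata lock on t1 until its transaction ends.
ALTER TABLE t1 ADD COLUMN note VARCHAR(32);

-- Session 1: ending the transaction releases the metadata lock,
-- after which the ALTER TABLE in session 2 can proceed.
COMMIT;
```

How long session 2 is willing to wait is governed by the lock_wait_timeout system variable, so a long-running open transaction can hold up DDL for a noticeable time.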
- Improved Performance and Scale on Win32 and Win64
- If your company uses Windows by itself or in a mixed environment, you probably want to deploy MySQL databases on Windows. To make that a reality, the MySQL team has incorporated a number of Windows-specific features for speeding up and scaling up.
- Windows API calls for much of the I/O done inside MySQL (a community contribution, hat tip to Jeremiah Gowdy).
- Ability to build engines and other plugins as DLLs on Windows.
- Network support for auto-detecting the MAC address (a community contribution, hat tip to Chris Runyan).
- Much cleanup and simplifying of threading code.
- Semi-Synchronous Replication
- This feature improves the reliability of failover, to avoid failing over to a slave that is missing some committed changes from the master. You can choose to have commits on the master node wait until at least one slave has logged the relevant events for the transaction. The “semi-synchronous” aspect is because the master does not wait for all the slaves to acknowledge, and there is a protocol to avoid the master waiting too long if the slaves fall behind. Read more about semisynchronous replication.
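A minimal sketch of enabling semisynchronous replication follows. It assumes a Linux build where the plugin libraries are named semisync_master.so and semisync_slave.so (the file extension differs on other platforms) and that ordinary replication is already configured; the timeout value is only an example.

```sql
-- On the master: load and enable the semisync master plugin.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- How long (in milliseconds) a commit waits for a slave acknowledgment
-- before the master falls back to asynchronous replication.
SET GLOBAL rpl_semi_sync_master_timeout = 10000;

-- On each slave: load and enable the semisync slave plugin,
-- then restart the I/O thread so it reconnects as a semisync slave.
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- On the master: check whether semisync is currently in effect.
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';
```

Settings made with SET GLOBAL are lost on restart, so they would normally be repeated in the server option file once the setup has been verified.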
- Replication Heartbeat
- In replication, the “heartbeat” is a message sent at regular intervals from a master node to the slave nodes. You can configure the heartbeat period. If a slave does not receive the message within the expected period, it knows that the connection to the master has failed. You can now avoid the spurious relay log rotation when the master is idle, rely on a more precise failure detection mechanism, and have an accurate estimation for seconds behind master. (This is a different feature than “Linux heartbeat”, which is a similar health-checking system for cluster nodes.) To use this feature, you issue commands like:
CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = interval_in_seconds;
SHOW STATUS LIKE 'Slave_heartbeat_period';
SHOW STATUS LIKE 'Slave_received_heartbeats';
- SIGNAL and RESIGNAL
- The SIGNAL and RESIGNAL statements allow you to implement familiar exception-handling logic in your stored procedures, stored functions, triggers, events, and database applications that call those things. SIGNAL passes execution back to an error handler, like THROW or RAISE statements in other languages. You can encode the error number, SQLSTATE value, and a message in a consistent way that can be interpreted by an error handler in the calling program. RESIGNAL lets you propagate the exception after doing some amount of error handling and cleanup yourself. With RESIGNAL, you can pass along the original error information or modify it. Read more about SIGNAL/RESIGNAL.
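As a small sketch of how SIGNAL and RESIGNAL might be combined, the two procedures below raise a custom error and then re-raise it with a modified message. The procedure names, the parameter, and the message texts are hypothetical and chosen only for illustration.

```sql
-- DELIMITER is a mysql command-line client directive, not a server statement.
DELIMITER //

-- Raises a user-defined error (SQLSTATE '45000') for invalid input.
CREATE PROCEDURE check_quantity(IN qty INT)
BEGIN
  IF qty < 0 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Quantity must not be negative';
  END IF;
END//

-- Catches that error, leaves room for cleanup, then propagates it
-- to the caller with a modified message.
CREATE PROCEDURE safe_check(IN qty INT)
BEGIN
  DECLARE EXIT HANDLER FOR SQLSTATE '45000'
  BEGIN
    -- Any cleanup work would go here.
    RESIGNAL SET MESSAGE_TEXT = 'safe_check failed: invalid quantity';
  END;
  CALL check_quantity(qty);
END//

DELIMITER ;

-- Completes silently for valid input and returns the custom error otherwise.
CALL safe_check(5);
CALL safe_check(-1);
```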
- More Partitioning Options
- With the new RANGE COLUMNS and LIST COLUMNS clauses of the CREATE TABLE statement, partitioning is now more flexible and also can optimize queries better. Instead of expressions, you specify the names of one or more columns. Both of these clauses let you partition based on DATE, DATETIME, or string values (such as CHAR or VARCHAR). Partition pruning can optimize queries on tables that use RANGE COLUMNS or LIST COLUMNS partitioning, and WHERE conditions that compare different columns and constants, such as
a = 10 AND b > 5
a < "2005-11-25" AND b = 10 AND c = 50
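To make the RANGE COLUMNS clause more concrete, here is a minimal sketch that partitions directly on a DATE column. The table name, column names, and partition boundaries are hypothetical.

```sql
-- Partition on a DATE column directly, with no wrapping expression
-- such as YEAR() or TO_DAYS().
CREATE TABLE orders_by_date (
  order_id   INT NOT NULL,
  order_date DATE NOT NULL,
  amount     DECIMAL(10,2)
)
PARTITION BY RANGE COLUMNS (order_date) (
  PARTITION p2009 VALUES LESS THAN ('2010-01-01'),
  PARTITION p2010 VALUES LESS THAN ('2011-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- The "partitions" column of the output lists the partitions the
-- optimizer expects to read, which shows whether pruning took place.
EXPLAIN PARTITIONS
SELECT * FROM orders_by_date WHERE order_date < '2009-06-01';
```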
- Performance Schema
- The Performance Schema feature involves an optional schema, named performance_schema, with tables that you can query to see intimate details of low-level MySQL performance. You can get information about performance right at that moment, or various amounts of historical performance data. You can clear the data to reset the figures, filter and format the data using WHERE clauses, and generally interact with it using all sorts of SQL goodness. Performance Schema data now also includes details about the InnoDB storage engine. Read more about Performance Schema.
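A rough sketch of querying and resetting this data is shown below. It assumes the server was started with the Performance Schema enabled, which is not the default in MySQL 5.5.

```sql
-- The ten wait events that have accumulated the most total wait time.
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_waits_summary_global_by_event_name
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Reset the accumulated figures before starting a new measurement run.
TRUNCATE TABLE performance_schema.events_waits_summary_global_by_event_name;
```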
What’s New in InnoDB
To make a long story short: it’s all about performance and scalability! To those who enjoy trying all permutations of configuration settings, we apologize in advance for making so many of these improvements take no thought or effort at all.
At this year’s MySQL Conference & Expo, you’ll hear about the InnoDB Plugin 1.0.7, the first production-ready (GA) release of the InnoDB Plugin. Most of the enhancements listed here are from InnoDB 1.1, which is part of MySQL 5.5 and thus is still in beta. Download MySQL 5.5 and try them out.
- Improved Recovery Performance
- One of InnoDB’s great strengths is its ability to reliably recover data after any type of crash that affects the database. But this cleanup and checking makes the next restart take longer. Well, cover up your sundial. Put away your hourglass. The enterprising InnoDB team has improved the algorithms involved in recovery by a huge amount; in computer science terms, it’s a better “big-O” number. Now you will need to keep your finger ready on the stopwatch to see how long recovery takes. This feature is available both in InnoDB 1.1 and the InnoDB Plugin 1.0.7. Read more about faster recovery.
- Multiple Buffer Pool Instances
- With today’s buffer pools frequently in the multi-gigabyte range, pages are constantly being read and updated by different database threads. This enhancement removes the “bottleneck” that makes all the other threads wait when one thread is updating the buffer pool. All the structures normally associated with the buffer pool can now be multiplied, such as the mutex that protects it, the LRU information, and the flush list. You control how many buffer pool instances are used; the default is still 1. This feature works best with combined buffer pool sizes of several gigabytes, where each buffer pool instance can be a gigabyte or more. Read more about multiple buffer pool instances.
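One way to check whether a given configuration follows the “a gigabyte or more per instance” guideline is sketched below. The column alias is arbitrary, and the instance count itself is set at server startup through the innodb_buffer_pool_instances option rather than changed at runtime.

```sql
-- Current number of buffer pool instances.
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';

-- Approximate size of each instance in gigabytes.
SELECT @@innodb_buffer_pool_size
       / @@innodb_buffer_pool_instances
       / (1024 * 1024 * 1024) AS gb_per_instance;
```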
- Multiple Rollback Segments
- This feature is both a performance and a scalability improvement. By dividing the single rollback segment into multiple parts, InnoDB allows concurrent transactions to create undo data (from insert, update, and delete operations) without making each other wait. A happy consequence is that the old limit of 1023 simultaneous inserting / updating / deleting transactions is now much higher, for a total of approximately 128K concurrent writer transactions. This feature does not introduce any incompatibility in the InnoDB file format, and does not require using the newer “Barracuda” file format. However, the setup within the system tablespace only takes place when the system tablespace is created, so to take advantage of this feature, you must create a new instance (not just a new table or a new database) and import the data into it. Read more about multiple rollback segments.
- Native Asynchronous I/O for Linux
- This feature enables better concurrency of I/O requests on Linux systems. With asynchronous I/O, an I/O request can be sent off and the thread servicing the query does not need to wait for the I/O to complete; that aspect is delegated to the I/O helper threads. InnoDB already supported asynchronous I/O on Windows systems. On platforms other than Windows, InnoDB internally arranged its I/O calls as if they were asynchronous (leading to the term “simulated asynchronous I/O”), but behind the scenes the query thread really would block until the request finished. Now true asynchronous I/O support (called “native asynchronous I/O” so it won’t be confused with references to “asynchronous” already in the source) is available on Linux as well as Windows. This feature requires the libaio userspace library to be installed on Linux. It comes with a configuration option innodb_use_native_aio that you can turn off in case of any startup problems related to the I/O subsystem. Read more about asynchronous I/O for Linux.
- Extended Change Buffering: Now with Delete Buffering and Purge buffering
- InnoDB uses indexes to make queries faster. Secondary indexes, those on columns other than the primary key, require work (meaning disk writes) to keep them up to date when those columns are inserted, deleted, or updated. For example, if you run the command DELETE FROM t WHERE c1 = 'something';, and you have a secondary index on column c2, what’s the rush to update that secondary index? Its contents might not be in the buffer pool, and maybe the index won’t be read for a long time.
InnoDB has had an optimization for a while now to delay disk writes for secondary index maintenance when the changes are due to inserts. This delay waits for the index contents to be read into the buffer pool for some other reason, such as a query, where the changes can be made quickly in memory and then flushed back to disk using the normal schedule for writing dirty blocks. When the changes in the buffer pool affect a group of sequential disk blocks, they can be flushed more efficiently than if the data was written piece by piece. Very clever!
In InnoDB 1.1, this technique is extended to include the different kinds of writes caused by deletes (an initial “delete marking” operation, followed later by a “purge” operation that garbage-collects all the deleted records). This optimization is under your control through the innodb_change_buffering configuration option, which has a new default of all. (We call the optimization “change buffering” rather than the old name “insert buffering”; the actual memory structure is still called the “insert buffer”.) Read more about enhanced change buffering.
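As a brief sketch, the option can be inspected and changed at runtime from any client session with sufficient privileges. The values shown are the documented settings for this option in MySQL 5.5.

```sql
-- Show the current change buffering mode.
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Buffer inserts, delete-marking, and purge operations (the 5.5 default).
SET GLOBAL innodb_change_buffering = 'all';

-- Other accepted values: 'none', 'inserts', 'deletes', 'changes', 'purges'.
-- For example, to go back to the older insert-only behavior:
SET GLOBAL innodb_change_buffering = 'inserts';
```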
The scalability improvements in InnoDB 1.1 revolve around better isolation of threads and mutex contention. These are performance-type improvements that really kick in when the database server is heavily loaded. (For those of you who are not yet experts on InnoDB performance, mutexes are in-memory structures that prevent different threads from interfering with each others’ changes to important memory areas like the buffer pool.)
- Improved Log Sys Mutex
- Previously, a single mutex protected different memory areas related to the undo and logging information. In particular, this mutex blocked access to the buffer pool, while changes were being written there by DDL operations making changes to the data dictionary. Splitting the old log_sys mutex to create a separate log_flush_order mutex means that all of this internal processing can happen with less waiting and less blocking of other operations involving the buffer pool, without any configuration needed on your part. Read more about improved log sys mutex.
- Separate Flush List Mutex
- Along the same lines, operations involving the buffer pool and the flush list previously were protected by a single mutex, which could cause unnecessary delays. (The buffer pool mutex has historically been very “hot”, so any other operation that tied up the buffer pool was adding fuel to the fire.) Now the flush list has its own mutex, reducing contention with buffer pool operations and making InnoDB faster without any configuration needed on your part. Read more about separate flush list mutex.
- Improved Purge Scheduling
- The InnoDB purge operation is a type of garbage collection that runs periodically. Previously, the purge was part of the master thread, meaning that it could block some other database operations. Now, this operation can run in its own thread, allowing for more concurrency. You can control whether the purge operation is split into its own thread with the innodb_purge_threads configuration option, which can be set to 0 (the default) or 1 (for a single separate purge thread). This architectural change might not cause a big speedup with this single purge thread, but it lays the groundwork to tune other bottlenecks related to purge operations, so that in the future multiple purge threads could provide a bigger performance gain. The configuration option innodb_purge_batch_size can be set from 1 to 5000, with default of 20, although typical users should not need to change that setting. Read more about improved purge scheduling.
- InnoDB Stats in Performance Schema
- The Performance Schema has been part of MySQL 5.5 for a while now. InnoDB 1.1 is instrumented for the first time for Performance Schema monitoring, with statistics available for InnoDB-specific mutexes, rw-locks, threads, and I/O operations. The data is structured so that you can see everything, or filter to see just the InnoDB items. The information in the performance_schema tables lets you see how these items factor into overall database performance, which ones are the “hottest” under various workloads and system configurations, and trace issues back to the relevant file and line in the source code so you can really see what’s happening behind the scenes. Read more about InnoDB integration with Performance Schema.
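The filtering described above might look like the following sketch, which narrows the output to InnoDB mutexes by instrument name. It assumes the Performance Schema is enabled on the server and that the InnoDB mutex instruments use the wait/synch/mutex/innodb/ name prefix.

```sql
-- List the InnoDB mutex instruments and whether they are enabled and timed.
SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'wait/synch/mutex/innodb/%';

-- Recent InnoDB mutex waits; the SOURCE column gives the source file
-- and line number where each wait was instrumented.
SELECT THREAD_ID, EVENT_NAME, SOURCE, TIMER_WAIT
FROM performance_schema.events_waits_history
WHERE EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY TIMER_WAIT DESC
LIMIT 10;
```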
After you import the data, execute the following queries to delete all the customers, orders, wishlist info, logs, reports, and stored carts.
This script will start you over, so remember to back up first!
SET FOREIGN_KEY_CHECKS=0; TRUNCATE `sales_order`; TRUNCATE `sales_order_datetime`; TRUNCATE `sales_order_decimal`; TRUNCATE `sales_order_entity`; TRUNCATE `sales_order_entity_datetime`; TRUNCATE `sales_order_entity_decimal`; TRUNCATE `sales_order_entity_int`; TRUNCATE `sales_order_entity_text`; TRUNCATE `sales_order_entity_varchar`; TRUNCATE `sales_order_int`; TRUNCATE `sales_order_text`; TRUNCATE `sales_order_varchar`; TRUNCATE `sales_flat_quote`; TRUNCATE `sales_flat_quote_address`; TRUNCATE `sales_flat_quote_address_item`; TRUNCATE `sales_flat_quote_item`; TRUNCATE `sales_flat_quote_item_option`; TRUNCATE `sales_flat_order_item`; TRUNCATE `sendfriend_log`; TRUNCATE `tag`; TRUNCATE `tag_relation`; TRUNCATE `tag_summary`; TRUNCATE `wishlist`; TRUNCATE `log_quote`; TRUNCATE `report_event`; ALTER TABLE `sales_order` AUTO_INCREMENT=1; ALTER TABLE `sales_order_datetime` AUTO_INCREMENT=1; ALTER TABLE `sales_order_decimal` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity_datetime` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity_decimal` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity_int` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity_text` AUTO_INCREMENT=1; ALTER TABLE `sales_order_entity_varchar` AUTO_INCREMENT=1; ALTER TABLE `sales_order_int` AUTO_INCREMENT=1; ALTER TABLE `sales_order_text` AUTO_INCREMENT=1; ALTER TABLE `sales_order_varchar` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_quote` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_quote_address` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_quote_address_item` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_quote_item` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_quote_item_option` AUTO_INCREMENT=1; ALTER TABLE `sales_flat_order_item` AUTO_INCREMENT=1; ALTER TABLE `sendfriend_log` AUTO_INCREMENT=1; ALTER TABLE `tag` AUTO_INCREMENT=1; ALTER TABLE `tag_relation` AUTO_INCREMENT=1; ALTER TABLE `tag_summary` AUTO_INCREMENT=1; ALTER TABLE `wishlist` AUTO_INCREMENT=1; ALTER 
TABLE `log_quote` AUTO_INCREMENT=1; ALTER TABLE `report_event` AUTO_INCREMENT=1; -- reset customers TRUNCATE `customer_address_entity`; TRUNCATE `customer_address_entity_datetime`; TRUNCATE `customer_address_entity_decimal`; TRUNCATE `customer_address_entity_int`; TRUNCATE `customer_address_entity_text`; TRUNCATE `customer_address_entity_varchar`; TRUNCATE `customer_entity`; TRUNCATE `customer_entity_datetime`; TRUNCATE `customer_entity_decimal`; TRUNCATE `customer_entity_int`; TRUNCATE `customer_entity_text`; TRUNCATE `customer_entity_varchar`; TRUNCATE `log_customer`; TRUNCATE `log_visitor`; TRUNCATE `log_visitor_info`; ALTER TABLE `customer_address_entity` AUTO_INCREMENT=1; ALTER TABLE `customer_address_entity_datetime` AUTO_INCREMENT=1; ALTER TABLE `customer_address_entity_decimal` AUTO_INCREMENT=1; ALTER TABLE `customer_address_entity_int` AUTO_INCREMENT=1; ALTER TABLE `customer_address_entity_text` AUTO_INCREMENT=1; ALTER TABLE `customer_address_entity_varchar` AUTO_INCREMENT=1; ALTER TABLE `customer_entity` AUTO_INCREMENT=1; ALTER TABLE `customer_entity_datetime` AUTO_INCREMENT=1; ALTER TABLE `customer_entity_decimal` AUTO_INCREMENT=1; ALTER TABLE `customer_entity_int` AUTO_INCREMENT=1; ALTER TABLE `customer_entity_text` AUTO_INCREMENT=1; ALTER TABLE `customer_entity_varchar` AUTO_INCREMENT=1; ALTER TABLE `log_customer` AUTO_INCREMENT=1; ALTER TABLE `log_visitor` AUTO_INCREMENT=1; ALTER TABLE `log_visitor_info` AUTO_INCREMENT=1; TRUNCATE `sales_payment_transaction`; ALTER TABLE `sales_payment_transaction` AUTO_INCREMENT=1; TRUNCATE sales_invoiced_aggregated; TRUNCATE sales_refunded_aggregated; TRUNCATE sales_shipping_aggregated; -- Reset all ID counters TRUNCATE `eav_entity_store`; ALTER TABLE `eav_entity_store` AUTO_INCREMENT=1; SET FOREIGN_KEY_CHECKS=1;
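After running the reset, a quick sanity check along the following lines can confirm that the tables are empty. This is only a sketch: it samples a handful of the tables named in the script, assumes the Magento database is the currently selected one, and relies on information_schema row counts, which are estimates for InnoDB tables.

```sql
-- Estimated row counts and next auto-increment values for a sample
-- of the tables that the reset script truncates.
SELECT TABLE_NAME, TABLE_ROWS, AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('sales_order', 'sales_flat_quote',
                     'customer_entity', 'wishlist', 'log_visitor')
ORDER BY TABLE_NAME;

-- Exact count for one table, since TABLE_ROWS is only an estimate.
SELECT COUNT(*) AS remaining_customers FROM customer_entity;
```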
If I missed anything, please point it out.
This script works perfectly in my tests on Magento.
Configuration Differences in MySQL 5.4 from MySQL 5.1
Several variables and options are new or changed in MySQL 5.4 to provide more flexible runtime configuration and better “out of box” default values for MySQL operation on up to 16-way x86 servers and 64-way CMT servers with 4GB or more memory.
The changes to InnoDB configuration values may cause issues if you upgrade to MySQL 5.4 from an older version of MySQL, or if you upgrade from MySQL 5.4.0 through 5.4.2 to 5.4.3 or higher. See the section “Upgrading from MySQL 5.1 to 5.4”.
These system variables are new:
innodb_adaptive_flushing: Controls adaptive flushing of dirty pages. Default:
innodb_change_buffering: Controls insert buffering. Default: inserts (buffer insert operations).
innodb_file_format: The format for new InnoDB tables. Default:
innodb_file_format_check: Whether to perform file format compatibility checking. Default:
innodb_io_capacity: The limit on the maximum number of I/O operations per second (IOPS) the server can perform. Default: 200.
innodb_read_ahead_threshold: Controls sensitivity of linear read-ahead. Default: 56.
innodb_replication_delay: The replication thread delay (in ms) on the slave server if innodb_thread_concurrency is reached. Default: 0.
innodb_write_io_threads: The number of background I/O threads to use for read prefetch requests and for writing dirty pages from the buffer cache to disk. Default: 4.
innodb_spin_wait_delay: Maximum delay between polls for a spin lock. Default: 6.
innodb_stats_sample_pages: How many index pages to sample for statistics calculations. Default: 8.
innodb_strict_mode: Whether InnoDB returns errors rather than warnings for certain exceptional conditions (analogous to strict SQL mode). Default:
innodb_use_sys_malloc: Whether InnoDB uses the OS (system) or its own memory allocator. Default:
innodb_version: The version of InnoDB.
More information about the new system variables can be found in the InnoDB Plugin Manual at http://www.innodb.com/products/innodb_plugin/plugin-documentation.
The default or minimum value of these existing system variables has changed:
innodb_additional_mem_pool_size: Default increased from 1MB to 8MB.
innodb_buffer_pool_size: Default increased from 8MB to 128MB. Minimum increased from 1MB to 5MB.
innodb_file_io_threads: Removed (replaced by
innodb_log_buffer_size: Default increased from 1MB to 8MB.
innodb_max_dirty_pages_pct: Default decreased from 90 to 75. Maximum decreased from 100 to 99 to never allow a completely dirty buffer pool.
innodb_sync_spin_loops: Default changed from 20 to 30.
innodb_thread_concurrency: Default changed from 8 to 0. In effect, this changes concurrency from 8 to “infinite”.
table_definition_cache: Default and minimum increased from 256 to 400.
table_open_cache: Default increased from 64 to 400.
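When upgrading, it may help to compare the values actually in effect against the defaults listed above. A minimal sketch using only the variable names from that list:

```sql
-- Show the values currently in effect for the variables whose
-- defaults or minimums changed. Variables that a given server
-- version does not know about are simply absent from the result.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN (
  'innodb_additional_mem_pool_size',
  'innodb_buffer_pool_size',
  'innodb_log_buffer_size',
  'innodb_max_dirty_pages_pct',
  'innodb_sync_spin_loops',
  'innodb_thread_concurrency',
  'table_definition_cache',
  'table_open_cache'
);
```

Values set explicitly in the option file override the new defaults, so a difference here usually points to an existing setting rather than a problem with the upgrade.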
These system variables have been made dynamic and can be modified at runtime:
This status variable is new:
Innodb_have_atomic_builtins: Indicates whether the server was built with atomic instructions.
This server option is new:
--super-large-pages: Boolean option. Large page support is enhanced for recent SPARC platforms. Standard use of large pages in MySQL attempts to use the largest size supported, up to 4MB. Under Solaris, a “super large pages” feature enables uses of pages up to 256MB. This feature can be enabled or disabled by using the
How to Set a Different Port Number for the MySQL Database in Magento
Open the configuration file where the database properties are set up, i.e. app/etc/local.xml.
Replace PORTNO in the XML below with your port number.
<default_setup>
    <connection>
        <host><![CDATA[HOSTADDRESS]]></host>
        <username><![CDATA[USERNAME]]></username>
        <password><![CDATA[PASSWORD]]></password>
        <dbname><![CDATA[DBNAME]]></dbname>
        <port><![CDATA[PORTNO]]></port>
        <active>1</active>
    </connection>
</default_setup>
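If you are unsure which port number to enter, one simple check is to ask the MySQL server itself. This sketch assumes you can already connect to that server with some client:

```sql
-- Shows the TCP port the MySQL server is listening on.
SHOW VARIABLES LIKE 'port';

-- The same value as a single-column result.
SELECT @@port AS mysql_port;
```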