Amazon Archives | Datafloq

The Most Pervasive Mistakes to Dodge Throughout a Cloud Migration

Daniel Jacob — Wed, 06 Oct 2021 10:43:05 +0000

Cloud migration Overview

During a cloud migration, a business transfers part or all of its data center capabilities to the cloud, often to operate on cloud-based infrastructure supplied by a cloud service provider such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure, among others. Migration to the Cloud The process of migrating workloads to the cloud is a time-consuming one for many organizations. Therefore, plan your migration carefully since there may be some mistakes throughout the process.

What are the most significant advantages of moving to the cloud?

Following are just a few of the advantages that have compelled businesses to move resources to the public cloud:

In terms of scalability, cloud computing is
considerably more flexible than on-premises infrastructure, allowing it to
accommodate bigger workloads and more users with much less effort. When it
came to scaling out business services in conventional IT settings,
businesses had to buy and set up physical servers, software licenses,
storage, and network equipment from scratch.
Cost – Because cloud providers take care of
maintenance and updates, businesses who migrate to the cloud may expect to
spend considerably less money on information technology operations. They
will be able to dedicate more resources to innovation, whether it is the
development of new goods or the improvement of current ones.
Increased performance and a better end-user
experience are two benefits of moving to the cloud. In addition,
cloud-hosted applications and websites may quickly expand to serve more
users or to handle greater throughput, and they can be located in
geographical proximity to end-users, reducing network latency.
Users may access cloud services and data from
anywhere, regardless of whether they are workers or customers, thanks to
the digital experience. This helps to the digital transformation, allows
for a better customer experience, and gives workers contemporary, flexible
tools to do their jobs.

What is the benefit of moving to the cloud?

The cause for an industry's migration differs from one to the next. Some of them opt to transfer their data to the cloud only based on the actions of others.

Without addressing the why question, cloud migration may result in the loss of important data, increased vulnerability to cyber security risks, and increased expenses for no apparent reason. This relates mostly to small and occasionally medium-sized enterprises that are just getting started in their company growth.

Additionally, you have the option of using a SQL database that is cloud-ready for your apps. Consider your options carefully before agreeing to utilize them since the modification of a database without doing a thorough inspection may result in a slew of problems.

How to Overcome Your Cloud Migration Apprehension?

You must understand how a third-party provider will handle your data before entrusting them with your personal information.

The following are examples of questions you may ask yourself:

Will they be able to keep your information safe?
Will they be able to meet your uptime
requirements?
Will you still have access to all of your
information if something happens?

Given the hundreds of questions that are likely running around in your mind, one of the most difficult cloud migration difficulties is conquering your concerns regarding cloud migration. Companies must do certain preliminary preparation and efforts to achieve an effective cloud migration. Before you begin the cloud migration procedure, it is critical that you have a full awareness of what is required and that you make the necessary preparations.

Selecting the best Cloud Service Provider

Each company has its unique set of needs when it comes to cloud computing. A large amount of immediate flexibility may be required by certain companies while more customized information management choices may be advantageous to others.

2. Method to cloud migration

Begin by developing a thorough cloud migration strategy. In this instance, evaluating your current infrastructure as well as your company goals can assist you in determining the most appropriate approach and determining where you should incorporate.

3. Migrate Your Applications

When properly planned, the implementation of your app and the transfer of your data to the cloud should proceed easily. Transferring data from local data centers to the public cloud may be accomplished in a variety of ways.

4. Examine the infrastructure

Consider the existing infrastructure of your company carefully, and choose how and where to integrate without losing sight of any important aspects.

Realize a successful cloud migration with little effect on the company

When contemplating cloud migration, it is critical for businesses to decide whether a cloud service, private network or hybrid cloud strategy is the most appropriate ‘as well as the sequence in which applications and settings should be moved to the cloud ‘before proceeding.

Cloud Migration Services have extensive experience in providing better client experiences as well as data analytics solutions. Using its cloud migration services, Cloud Migration Services guarantees that applications run as quickly as possible, while also increasing availability via the use of third-party tools and pre-defined templates that are tailored to particular workloads. Cloud Migration Services‘ migration factory method streamlines and coordinates complicated migrations, making them less time-consuming and less costly. They enable a cloud application migration that is quicker, more cost-effective and has no negative influence on the company.

Bottom Line

Transferring current business operations to the cloud is not a problem, but rather a chance to make them more creative and flexible. As a first stage, you must examine your systems and operations as well as all of the network elements that are currently accessible to you. Then, as part of your cloud migration adventure, develop a plan that takes into account all of the needs of your company organization.

The post The Most Pervasive Mistakes to Dodge Throughout a Cloud Migration appeared first on Datafloq.

Webinar: Build a customer-centric business with external consumer data

Sam Makad — Sun, 26 Sep 2021 22:00:00 +0000

AWS Data Exchange Webinar: How to use external consumer insights and marketing data to build a customer-centric business

You are invited! Learn how external consumer and marketing insights can result in higher customer satisfaction, better retention, and a stronger overall bottom line.'

September 27th | 11:00 AM PT (2:00 PM ET)'

In this virtual session, AWS Data Exchange will host a discussion with thought leaders from companies such as Acxiom and BlastPoint. They will share how organizations from big box retailers to automotive brands are using consumer insights data to reach new customers, drive real business change, and increase longevity.'

Key takeaways include:

How marketers can leverage consumer level data to reap insights in a dynamically changing marketplace
Integrating external consumer insights data into personalized customer experiences with AI
Visualizing data workflows using AWS Analytics tools to drive business intelligence
How AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud

The post Webinar: Build a customer-centric business with external consumer data appeared first on Datafloq.

A Lesson In Effective Innovation From Amazon

Bill Franks — Thu, 19 Aug 2021 13:33:06 +0000

I recently hosted a private event put on by Amazon Web Services for CDOs and CAOs. I interviewed Jeff Carter, a former Teradata executive and colleague who has been at Amazon for several years now in various senior leadership roles, first at Amazon.com and now at Amazon Web Services. It was a lively discussion and one of the topics, which I'll discuss here, has stuck with me ever since. It is the 2 > 0 and 1 > 2 philosophy that Amazon follows when pursuing innovation. This concept has been mentioned publicly in various forums and I validated that it is safe for me to discuss. Let's dig into it!

Defining 2 > 0

Innovation is hard in any environment. In a large corporation, innovation can often get caught in a tug of war between various stakeholders who want to get credit for the work. As a result, many organizations will specifically force teams to pause work when it appears that fully or partially redundant efforts might be underway by different teams. The logic makes sense on the surface it seems to be a waste of money to do things twice, so let's get everyone together, form a committee, decide a single path forward, assign a single team to pursue that path, and then continue the effort.

The problem with that approach is that, at best, the innovation is greatly delayed. At worst, nothing happens and instead of having two competing solutions that work, the company will have none. At Amazon, they empower teams to solve their problems. When two teams are both pursuing a similar goal, they are each encouraged to continue rather than being blocked. The situation is turned into a competition of sorts to see which team can get there first and have a better solution.

It is that approach that Amazon refers to as 2 > 0 . It means that having two solutions to a problem is better than having none. They let some of their best minds battle it out to see who achieves the better innovation.

Defining 1 > 2

If Amazon took the 2 > 0 concept too far, however, then there could be mayhem and massive redundancy. So, they layer in the other side of the philosophy which goes by 1 > 2 . The concept is simple. Once the competing teams each have a solution, the solutions are evaluated to determine which is the overall best and most scalable. That solution is then targeted for deployment. However, there is an important twist! As part of the deployment, the winning team must incorporate any additional functionality that the other team required above its own. Thus, when it comes time to deploy, there will be a single solution that meets everyone's needs.

That is the core point behind 1 > 2 . It is better to get to two solutions instead of one to consider. At the same time, it is better to deploy and scale one solution than to deploy and scale multiple similar solutions. When crunch time arrives, Amazon forces the teams to reconcile their differences and move forward with a single, unified, agreed upon solution. They take the best of the innovative solutions while ensuring all needs are met.

The Ramifications

The brilliance of Amazon's approach to me is that it encourages innovation, and even makes a competition out of it, during the prototyping and development stages. However, when it comes time to deploy a real solution, the innovations are combined to ensure that there isn't a lot of redundancy present. By taking the best from each independent solution, Amazon ends up with a final product that is better than either of the individual solutions would have been. At the same time, they get that final solution faster than they would have if they made everyone stop and line up on a single path before starting any prototyping and development. Sure, it can cost more allowing some redundancy of effort, but the speed and quality impacts must be making the extra costs pay out or the philosophy would have been abandoned.

One of the challenges I've talked about with large organizations for many years is the balance between letting people freely experiment and explore in a development environment while still protecting the integrity and scalability of a production environment. Finding the right balance isn't easy, to say the least. However, Amazon's 2 > 0 and 1 > 2 philosophy is one of the best approaches I have come across. I would recommend that you consider incorporating it into your own organization‘s culture and processes.

Originally published by the International Institute for Analytics

The post A Lesson In Effective Innovation From Amazon appeared first on Datafloq.

How Big Data Helps Drive Amazon Sales

Datafloq Sponsored — Thu, 11 Jun 2020 19:18:05 +0000

The reliability of businesses on big data is becoming increasingly high every day. Businesses now fully rely on data, from the time it is generated, up to the moment it delivers valuable insight to online users. Hence, collecting, storing, processing, and analyzing data within a short period of time, has become necessary in order for a business to stay ahead of the competition.

Amazon, as the leading eCommerce platform, has achieved all its success by putting in hard work to remain at the top of the charts. It makes use of big data analysis to persuade customers to make more shopping choices that are pleasing. This stimulates more purchases from them, and thus more profits.

How Does Amazon Use Big Data?

The secret to Amazon's success lies within their adopted model, which has left nothing untouched. Its technologies enable them to provide customers with great options that make them feel informed. Customers become knowledgeable about a variety of choices, giving them the ability to make wise and better shopping decisions. The dependence of a product's sales rank on the number of its sales is also interesting. The smaller the rank, the more sales there are. Therefore, this tool will be useful to sellers because it uses data and the latest algorithms to convert the rank into sales.

Amazon analyzes big data collected from different customers who shop on their platform, to accurately build an efficient recommendation engine for its users. Amazon tries to learn as much as possible about you, in order to make accurate predictions of what you might want to buy. They recommend various items which you might be interested in buying thereby saving you the stress of searching through their website to find products on your own.

Amazon's recommendation system is built by picturing who you are and then offering you products that other customers with the same or similar profile as yours, have bought in the past.

Implementing Dynamic Pricing System to Stay Ahead of Competition

Amazon uses big data to assess the willingness of a customer to purchase a product on its platform. This is why the price of items on Amazon's platform changes frequently.

The proactive approach used by Amazon to implement big data algorithms has enabled them to meet the precise needs of customers. Once a customer shows a lot of interest in purchasing a product, the price of the product changes to a slightly higher price. Unlike platforms that make use of big data, other websites that don't, find it difficult to frequently change the price of items in their stores. Therefore, this results in them making less profit as compared to the Amazon platform.

Amazon uses big data to change product prices every 10 minutes, which means that product prices change about 2.5 million times a day. Things that are being analyzed include:

Competitor pricing
Inventory that is available

Analyzing this data enables Amazon to set unbeatable prices for products on its platform.

Amazon Encourage Users to Buy More Products with Every Order

The product recommendation system used by Amazon has been the most popular big data application on the site. It presents online shoppers with items that are similar to products they've purchased in the past, or products related to items in their shopping carts. This amazing recommendation system helps persuade customers to buy more items, even if they didn't originally plan on doing so.

Amazon is said to have generated about a 35% increase in annual sales using its recommendation system. Amazon designed its recommendation system in such a way that it is appealing to customers so that they will buy more and more products willingly.

SinceAmazon has generally become accepted by customers all over the world as a great place for purchasing any item, the company's profits increase day by day.

Amazon personalize, which was released by Amazon, has paved the way for developers to make use of its highly scalable and easy-to-use platform, to recommend products for users on any domain. Amazon's technology has enabled other companies to show merchandise options such as food, clothing, jewellery, etc. to their customers. This innovative idea from Amazon shows that they will continue to hold on to the use of big data in moving their business forward.

Without big data, Amazon wouldn't have come this far with its eCommerce business. Big data has propelled Amazon as an eCommerce platform, to the highest level within the eCommerce industry. Through big data, Amazon has been able to easily connect with manufacturers to track their inventory, so as to ensure quick delivery of orders. This has also reduced the cost of shipping items by 10 40%.

Amazon is one of the leading platforms that has used big data to move its business forward, and it has no intention of slowing down. Implementing technologies has proven to be the most reliable and promising way of staying ahead of the competition for the long-term future.

The post How Big Data Helps Drive Amazon Sales appeared first on Datafloq.

ScyllaDB Trends How Users Deploy The Real-Time Big Data Database

Dharshan Rangegowda — Wed, 27 Nov 2019 11:45:30 +0000

ScyllaDB is an open-source distributed NoSQL data store, reimplemented from the popular Apache Cassandra database. Released just four years ago in 2015, Scylla has averaged over 220% year-over-year growth in popularity according to DB-Engines. We've heard a lot about this rising database from the DBA community and our users, and decided to become a sponsor for this years Scylla Summit to learn more about the deployment trends from its users. In this ScyllaDB Trends Report, we break down ScyllaDB cloud vs. on-premise deployments, most popular cloud providers, SQL and NoSQL databases used with ScyllaDB, most time-consuming management tasks, and why you should use ScyllaDB vs. Cassandra.

ScyllaDB vs. Cassandra Which Is Better?

Wondering which wide-column store to use for your deployments? While Cassandra is still the most popular, ScyllaDB is gaining fast as the 7th most popular wide column store according to DB-Engines. So what are some of the reasons why users would pick ScyllaDB vs. Cassandra?

ScyllaDB offers significantly lower latency which allows you to process a high volume of data with minimal delay. In fact, according to ScyllaDB's performance benchmark report, their 99.9 percentile latency is up to 11X better than Cassandra on AWS EC2 bare metal. So this type of performance has to come at a cost, right? It does, but they claim in this report that it's a 2.5X cost reduction compared to running Cassandra, as they can achieve this performance with only 10% of the nodes.

There are dozens of quality articles on ScyllaDB vs. Cassandra, so we'll stop short here so we can get to the real purpose of this article, breaking down the ScyllaDB user data.

ScyllaDB Cloud vs. ScyllaDB On-Premises

ScyllaDB can be run in both in the public cloud and on-premises. In fact, ScyllaDB is most popularly deployed in both public cloud and on-premise environments within a single organization. The 44% of ScyllaDB deployments leveraging both cloud and on-premise computing could be through either a hybrid cloud environment leveraging both for a specific application, or using these environments separately to manage different applications.

ScyllaDB on-premise deployments and ScyllaDB cloud deployments were dead-even at 28% each. You can run both the free open source ScyllaDB and ScyllaDB Enterprise in the cloud or on-premise, and ScyllaDB Enterprise license starts at $28.8k/year for a total of 48 cores.

Most Popular Cloud Providers for ScyllaDB

With 28% of ScyllaDB cluster exclusively being deployed in the cloud, and 72% using the cloud in some capacity, we were interested to see which cloud providers are most popular for ScyllaDB workloads.

#1. AWS

We found that 39.1% of all ScyllaDB cloud deployments are running on AWS from our survey participants. While we expected AWS to be the #1 cloud provider for ScyllaDB, the percentage was considerably lower than the responses from all cloud database types in this survey that reported 55% were deploying on AWS. This number is more in line with our recent 2019 Open Source Database Trends Report where 56.9% of cloud deployments were reported running on AWS. This may be because AWS does not support ScyllaDB through their Relational Database Services (RDS), so we could hypothesize that as more organizations continue to migrate their data to ScyllaDB, AWS may experience a decline in their customer base.

#2. Google Cloud

Google Cloud Platform (GCP) was the second most popular cloud provider for ScyllaDB, coming in at 30.4% of all cloud deployments. Google Cloud does offer its own wide column store and big data database called Bigtable which is actually ranked #111, one under ScyllaDB at #110 on DB-Engines. ScyllaDB's low cost and high-performance capabilities make it an attractive option to GCP users, especially since it is open-source compared to Bigtable which is only commercially available on GCP.

#3. Azure

Azure followed in third place representing 17.4% of all ScyllaDB deployments in the cloud from our survey respondents. Azure is an attractive cloud provider for organizations leveraging the Microsoft suite of services.

The remaining 13.0% of ScyllaDB cloud deployments were found to be running on DigitalOcean, Alibaba, and Tencent cloud computing services.

Their managed service, Scylla Cloud, is currently only available on AWS, and you must use the ScyllaDB Enterprise version to leverage their DBaaS. Scylla Cloud plans to add support for GCP and Azure in the future, but with only 39% reporting on AWS, we can assume over 60% of ScyllaDB deployments are being self-managed in the cloud.

Databases Most Commonly Used with ScyllaDB

As we also found from the 2019 Open Source Database Report, organizations on average leverage 3.1 different database types. But, in this survey, organizations using ScyllaDB reported only using 2.3 different database types on average, a 26% reduction compared to our results from all open source database users. We also found that 39% of ScyllaDB deployments are only using ScyllaDB, and not leveraging any other database type in their applications.

So which databases are most commonly used in conjunction with ScyllaDB? We found that ScyllaDB users are also using SQL databases MySQL 20% of the time and PostgreSQL 20% of the time as well. The second most commonly used database with ScyllaDB was Cassandra represented in 16% of the deployments, and we could assume this is by organizations testing ScyllaDB as an alternative to Cassandra in their applications as both database types are wide column stores.

MongoDB was the fourth most popularly deployed database with ScyllaDB at 12%. Redis and Elasticsearch were tied in fifth place, both being leverage 8% of the time with ScyllaDB deployments.

We also found 20% of Scylla deployments are leveraging other database types, including Oracle, Aerospike, Kafka (which is now transforming into an event streaming database), DB2 and Tarantool.

Most Time-Consuming ScyllaDB Management Tasks

We know that ScyllaDB is widely powerful, but how easy it is to use? We asked ScyllaDB users what their most time-consuming management task was, and heard from 28% that Scylla Repair was the longest management task. Scylla Repair is a synchronization process that runs in the background to ensure all replicas eventually hold the same data. Users must run the nodetool repair command on a regular basis, as there is no way to automate repairs in the ScyllaDB open-source or ScyllaDB Enterprise versions, but you can setup a repair schedule through Scylla Manager.

ScyllaDB slow query analysis tied ScyllaDB backups and recoveries for second place at 14% each for the most time-consuming management task. It does not look like ScyllaDB currently has a query analyzer available to identify queries that need optimizing, but users can use their Slow Query Logging to see which queries have the longest response time. ScyllaDB backups are also unable to be automated through the open-source and enterprise versions, but they state that recurrent backups will be available in future editions of Scylla Manager. There is also no automated way to restore a ScyllaDB backup, as these must be performed manually in all versions.

10% of ScyllaDB users reported that adding, removing or replacing nodes was the most time-consuming task, coming in at a fourth place. These are manual processes that can take quite a bit of time, especially if you are dealing with large data size. Adding nodes is used to scale out a deployment while removing them scales your deployment down. Nodes must be replaced if they are down, or dead, though a cluster can still be available when more than one node is down.

Tied for fifth place at 7% was upgrades and troubleshooting. ScyllaDB Enterprise and open source both require extensive steps to upgrade a cluster. The recommended methods are through a rolling procedure so there is no downtime, but this is a manual process so the user must take one node down at a time, perform all of the upgrade steps, restart and validate the node before moving on to performing the same steps for the remaining nodes in the cluster. Time-consuming indeed, but fortunately not a daily task! Troubleshooting is, of course, a deep rabbit hole to dive into, but ScyllaDB Enterprise customers receive 24/7 mission-critical support, and open-source users have access to a plethora of resources, including documentation, mailing lists, Scylla University and a slack channel for user discussions.

The remaining 21% of time-consuming tasks reported by ScyllaDB users include monitoring, migrations, provisioning, balancing shards, compaction and patching.

So, how do these results compare to your ScyllaDB deployments? Are you looking for a way to automate these time-consuming management tasks? While we support MySQL, PostgreSQL, Redis and MongoDB' Database today, we're always looking for feedback on which database to add support for next through our DBaaS plans. Let us know in the comments or on Twitter at @scalegridio if you are looking for an easier way to manage your ScyllaDB clusters in the cloud or on-premises!

The post ScyllaDB Trends
How Users Deploy The Real-Time Big Data Database appeared first on Datafloq.

How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost

Dharshan Rangegowda — Mon, 28 Oct 2019 11:00:13 +0000

AWS is the #1 cloud provider for open-source database hosting, and the go-to cloud for MySQL deployments. As organizations continue to migrate to the cloud, it's important to get in front of performance issues, such as high latency, low throughput, and replication lag with higher distances between your users and cloud infrastructure. While many AWS users default to their managed database solution, Amazon RDS, there are alternatives available that can improve your MySQL performance on AWS through advanced customization options and unlimited EC2 instance type support. ScaleGrid offers a compelling alternative to hosting MySQL on AWS that offers better performance, more control, and no cloud vendor lock-in and the same price as Amazon RDS. In this post, we compare the performance of MySQL Amazon RDS vs. MySQL Hosting at ScaleGrid on AWS High-Performance instances.

TLDR

ScaleGrid's MySQL on AWS High-Performance deployment can provide 2x-3x the throughput at half the latency of Amazon RDS for MySQL with their added advantage of having 2 read replicas as compared to 1 in RDS.

MySQL on AWS Performance Test

	ScaleGrid	Amazon RDS
Instance Type	AWS High-Performance XLarge (see system details below)	DB Instance r4.xlarge (Multi-AZ)
Deployment Type	3 Node Master-Slave Set with Semisynchronous Replication	Multi-AZ Deployment with 1 Read Replica
SSD Disk	Local SSD & General Purpose – 2TB	General Purpose – 2TB
Monthly Cost (USD)	$1,798	$1,789

As you can see from the above table, MySQL RDS pricing is within $10 of ScaleGrid's fully managed MySQL hosting solution.

What are ScaleGrid's High-Performance Replica Sets?

The ScaleGrid MySQL on AWS High-Performance replica set uses a hybrid of local SSD and EBS disk to achieve both high performance and high reliability. A typical configuration is deployed using a 3-node replica set:

The Master and Slave-1 use local SSD disks.
Slave-2 uses an EBS disk (can be General Purpose or a Provisioned IOPS disk).

What does this mean? Since the Master and the Slave-1 are running on local SSD, you get the best possible disk performance from your AWS machines. No more network-based EBS, just blazing-fast local SSD. Reads and writes to your Primary, and even reads from Slave-1 will work at SSD speed. Slave-2 uses an EBS data disk, and you can configure the amount of IOPS required for your cluster. This configuration provides complete safety for your data, even in the event you lose the local SSD disks.

ScaleGrid's MySQL AWS High-Performance XLarge replica set uses i3.xlarge (30.5 GB RAM) instances with local SSD for the Master and Slave-1, and an i3.2xlarge (61 GB RAM) instance for Slave-2.

MySQL Configuration

A similar MySQL configuration is used on both ScaleGrid and RDS deployments:

Configuration	Value
version	5.7.25 community edition
innodb_buffer_pool_size	25G
innodb_log_file_size	1G
innodb_flush_log_at_trx_commit	1
sync_binlog	1
innodb_io_capacity	3000
innodb_io_capacity_max	6000
slave_parallel_workers	30
slave_parallel_type	LOGICAL_CLOCK

MySQL Performance Benchmark Configuration

Configuration	Details
Tool	Sysbench version 1.0.17
Host	1 r4.xlarge located in the same AWS datacenter as the Master MySQL
# Tables	100
# Rows per table	5,000,000
Workload generating script	oltp_read_write.lua

MySQL Performance Test Scenarios and Results

To ensure we provide informative results for all MySQL AWS workload types, we have broken down our tests into these three scenarios so you can evaluate based on your read/write workload intensity:

Read-Intensive Workload: 80% Reads and 20% Writes
Balanced Workload: 50% Reads and 50% Writes
Write-Intensive Workload: 20% Reads and 80% Writes

Each scenario is run with varying number of sysbench client threads ranging from 50 to 400, and each test is run for a duration of 10 minutes. We measure throughput in terms of Queries Per Second (QPS) and 95th Percentile latency, and ensure that the max replication lag on the slaves does not cross 30s. For some of the tests on the ScaleGrid deployment, MySQL configuration binlog_group_commit_sync_delay is tuned so that the slave replication lag does not go beyond 30s. This technique is referred to as slowing down the master to speed up the slaves' and is explained in J-F Gagne's blog.

Scenario-1: Read-Intensive Workload with 80% Reads and 20% Writes

As we can see from the read-intensive workload tests, ScaleGrid high performance MySQL instances on AWS are able to consistently handle around 27,800 QPS anywhere from 50 up to 400 threads. This is almost a 200% increase over MySQL RDS performance which averages only 9,411 QPS across the same range of threads.

ScaleGrid also maintains 53% lower latency on average throughout the entire MySQL AWS performance tests. Both Amazon RDS and ScaleGrid latency increase steadily as the number of threads grows, where ScaleGrid maxes out at 383ms for 400 threads while Amazon RDS is at 831ms at the same level.

Scenario-2: Balanced Workload with 50% Reads and 50% Writes

In our balanced workload performance tests, ScaleGrid's MySQL High-Performance deployment on AWS outperforms again with an average of 20,605 QPS on threads ranging from 50 to 400. Amazon RDS only averaged 8,296 for the same thread count, resulting in a 148% improvement with ScaleGrid.

Both ScaleGrid and Amazon RDS latency significantly decreased in the balanced workload tests compared to the read-intensive tests covered above. Amazon RDS averaged 258ms latency in the balanced workload tests, where ScaleGrid only averaged 125ms achieving over a 52% reduction in latency over MySQL on Amazon RDS.

Scenario-3: Write-Intensive Workload with 20% Reads and 80% Writes

In our final write-intensive MySQL AWS workload scenario, ScaleGrid achieved significantly higher throughput performance with an average of 17,007 QPS over the range of 50 to 400 threads. This is a 123% improvement over Amazon RDS who only achieved 7,638 QPS over the same number of threads.

The 95th percentile latency tests also produced significantly lower latency for ScaleGrid at an average of 114ms over 50 to 400 threads. Amazon RDS achieved an average of 247ms in their latency tests, resulting in a 54% average reduction in latency when deploying ScaleGrid's High-Performance MySQL on AWS services over Amazon RDS.

Analysis

As we observed from the test results, read-intensive workloads resulted in both higher throughput and latency over balanced workloads and write-intensive workloads, regardless of how MySQL was deployed on AWS:

MySQL on AWS Throughput Performance Test Averages	ScaleGrid	Amazon RDS	ScaleGrid Improvement
Read-Intensive Throughput	27,795	9,411	195.4%
Balance Workload Throughput	20,605	8,296	148.4%
Write-Intensive Throughput	17,007	7,638	122.7%

MySQL on AWS Latency Performance Test Averages	ScaleGrid	Amazon RDS	ScaleGrid Improvement
Read-Intensive Latency	206ms	439ms	-53.0%
Balanced Workload Latency	125ms	258ms	-51.6%
Write-Intensive Latency	114ms	247ms	-53.8%

Explanation of Results

We see that the ScaleGrid MySQL on AWS deployment provided close to 3x better throughput for the read-intensive workload compared to the RDS deployment.
As the write load increased, though the absolute throughput decreased, ScaleGrid still provided close to 2.5x better throughput performance.
For write-intensive workloads, we found that the replication lag started kicking in for the EBS slave on the ScaleGrid deployment. Since our objective was to keep the slave replication lag within 30s for our runs, we introduced binlog_group_commit_sync_delay to ensure that slave could achieve better parallel execution. This controlled the delay, and resulted in lesser absolute throughput on the ScaleGrid deployment, but we could still see a 2.2x better throughput compared to RDS deployment.
For all of the read-intensive, write-intensive, and balanced workload scenarios, ScaleGrid offered 0.5X lower latency characteristics compared to RDS.

ScaleGrid High Performance' deployment can provide 2x-3x the throughput at half the latency of RDS with an added advantage of having 2 read replicas as compared to 1 in RDS. To learn more about ScaleGrid's MySQL hosting advantages over Amazon RDS for MySQL, check out our Compare MySQL Providers page or start a free 30-day trial to explore the fully managed DBaaS platform.

The post How to Improve MySQL AWS Performance 2X Over Amazon RDS at The Same Cost appeared first on Datafloq.

PostgreSQL Trends: Most Popular Cloud Providers, Languages, VACUUM, Query Management Strategies & Deployment Types

Dharshan Rangegowda — Tue, 15 Oct 2019 08:48:12 +0000

PostgreSQL popularity is skyrocketing in the enterprise space. As this open-source database continues to pull new users from expensive commercial database management systems like Oracle, DB2 and SQL Server, organizations are adopting new approaches and evolving their own to maintain the exceptional performance of their SQL deployments. We recently attended the PostgresConf event in San Jose to hear from the most active PostgreSQL user base on their database management strategies. In this latest PostgreSQL trends report, we analyze the most popular cloud providers for PostgreSQL, VACUUM strategies, query management strategies, and on-premises vs. public cloud use being leveraged by enterprise organizations.

Most Popular Cloud Providers for PostgreSQL Hosting

Let's start with the most popular cloud providers for PostgreSQL hosting. It comes as no surprise that the top three cloud providers in the world made up 100% of the PostgreSQL deployments in the crowd across this enterprise report. AWS, however, has taken a significant leap from our last report, where they now average 77.4% of PostgreSQL cloud use compared to 55.0% in April. AWS does offer a managed hosting service for PostgreSQL called Amazon RDS, but there are many other DBaaS solutions that offer PostgreSQL hosting on AWS, such as ScaleGrid, that can provide multi-cloud support so you're not locked in with a single cloud provider.

AWS was not the only cloud provider to grow – we found that 19.4% of PostgreSQL cloud deployments were hosted through Google Cloud Platform (GCP), growing 11% from April where they only averaged 17.5% of PostgreSQL hosting. This leaves our last cloud provider – Microsoft Azure, who represented 3.2% of PostgreSQL cloud deployments in this survey. This is one of the most shocking discoveries, as Azure was tied for second with GCP back in April, and is commonly a popular choice for enterprise organizations leveraging the Microsoft suite of services.

Most Used Languages with PostgreSQL

This is a new analysis we surveyed to see which languages are most popularly used with PostgreSQL. The supported programming languages for PostgreSQL include .Net, C, C++, Delphi, Java, JavaScript (Node.js), Perl, PHP, Python and Tcl, but PostgreSQL can support many server-side procedural languages through its available extensions.

We found that Java is the most popular programming language for PostgreSQL, being leveraged by 31.1% of enterprise organizations on average. PostgreSQL can be easily connected with Java programs through the popular open-source PostgreSQL Java Database Connectivity (JBDC) Driver, also known as PgJDBC.

Python was the second most popular programming language used with PostgreSQL, coming in close at an average of 28.9% use with PostgreSQL. Back in 2013, PostgreSQL surveyed their users to see which external programming languages were most often used with PostgreSQL, and found that Python only represented 10.5% of the results, showing a massive increase in popularity over the past six years.

The programming language C came in third place, averaging 20.0% use with PostgreSQL, followed by Go in fourth at 13.3%, PL/pgSQL in fifth at 11.1%, Ruby in sixth at 8.9% and both PHP and Perl in seventh at 4.4%. PHP was actually the most popular language used with PostgreSQL in 2013, representing almost half of the responses from their survey at 47.1% use. The last column, Other, was represented by C++, Node.js, Javascript, Spark, Swift, Kotlin, Typescript, C#, Scala, R, .NET, Rust and Haskell.

Most Popular PostgreSQL VACUUM Strategies

PostgreSQL VACUUM is a technique to remove tuples that have been deleted or are now obsolete from their table to reclaim storage occupied by those dead tuples, also known as Bloat. VACUUM is an important process to maintain, especially for frequently-updated tables before it starts affecting your PostgreSQL performance. In our survey, we asked enterprise PostgreSQL users how they are handling VACUUM to see what the most popular approaches are.

The most popular process for PostgreSQL VACUUM is the built-in autovacuum, being leveraged by 37.5% of enterprise organizations on average. The autovacuum daemon is optional, but highly recommended in the PostgreSQL community, at it automates both VACUUM and ANALYZE commands, continuously checking tables for deal tuples. While highly recommended, 33.3% of PostgreSQL users prefer to manually perform VACUUM in the enterprise space. Fibrevillage has a great article that outlines these common problems with autovacuum which may cause an organization to adopt a manual strategy:

autovacuum may run even when turned off to deal with transaction ID wraparound.
autovacuum is constantly running, which makes it start over every time it runs out of space, and start a new worker for each database in your cluster.
autovacuum can cause out of memory errors.
autovacuum may have trouble keeping up on a busy server.
autovacuum can easily consume too much I/O capacity.

Another surprising discovery was that 18.8% of organizations do not use VACUUM, as it is not yet needed. This may be because they are leveraging PostgreSQL in small applications or applications that are not frequently updated. 6.6% of organizations have developed a custom solution for PostgreSQL VACUUM, and 4.2% are in the process of planning their VACUUM strategy.

Most Popular PostgreSQL Slow Query Management Strategies

If you're working with PostgreSQL, you likely know that managing queries is the #1 most time-consuming task. It's a critical process with many aspects to consider, starting at developing a query plan to match your query structure with your data properties, to then analyzing slow-running queries, finally to optimizing those queries through performance tuning.

We found that 54.3% of PostgreSQL users are manually managing slow queries in enterprise organizations. This can be accomplished through their modules auto_explain and pg_stat_statements, checking pg_stat_activity for table and index activity on your server, analyzing the slow query log, or reviewing in your code.

On average, 21.7% of enterprise organizations are leveraging a monitoring tool to analyze and manage their PostgreSQL slow queries. This helps them significantly reduce the time it takes to identify which queries are running the slowest, most frequently, causing the most read or write load on your system, or queries missing an index by examining the rows.

17.4% of users, however, are not actively managing slow queries in the PostgreSQL deployments. We highly recommend adopting a query management strategy to ensure slow queries are not affecting the performance of your PostgreSQL deployments. 4.3% of users are currently in the process of planning their query management strategy, and 2.2% have developed a custom solution for managing their slow queries.

PostgreSQL Cloud vs. On-Premises Deployments

Let's end with one of the hottest topics in the PostgreSQL enterprise space – whether to deploy PostgreSQL in the cloud or on-premises. We've been actively monitoring this trend all year, and found that 59.6% of PostgreSQL deployments were strictly on-premises back in April from our 2019 PostgreSQL Trends Report and 55.8% on-premises in our 2019 Open Source Database Report just a few months ago in June.

Now, in this most recent report, we found that PostgreSQL on-premises deployments have decreased by 40% since April of 2019. On average, only 35.6% of PostgreSQL enterprise organizations are deploying exclusively on-premise. But organizations are not migrating their on-premises deployments altogether – 24.4% of PostgreSQL deployments were found to be leveraging a hybrid cloud environment. Hybrid clouds are a mix of on-premises, private cloud, and/or public cloud computing to support their applications and data. This is a significant increase from what we saw in April, jumping from 5.6% of PostgreSQL deployments up to 24.4% in September.

Hybrid cloud deployments are becoming more popular across the board – this recent report found that 57% of businesses opt for a hybrid cloud environment using both private and public clouds as places to store their data. While we see a large jump to the cloud, enterprise organizations are still leveraging on-premises environments in some capacity 60% of the time, compared to 65.2% in April. Lastly, we found that public cloud PostgreSQL deployments have grown 15% since April, now averaging 34.8% use by enterprise organizations.

It's also important to note that this survey was conducted at the PostgresConf Silicon Valley event, while our April survey was conducted in New York City. The bay area is widely known for adopting new technologies, which allows us to hypothesize that this market has a higher cloud adoption rate than the east coast.

PostgreSQL Deployment Types	Apr	Jun	Sep	Apr-Sep Growth
On-Premises	59.6%	55.8%	35.6%	-40.0%
Hybrid Cloud	5.6%	16.3%	24.4%	336%
Public Cloud	34.8%	27.9%	40.0%	15.0%

So, how do these results stack up to your PostgreSQL deployments and strategies? We'd love to hear your thoughts, leave a comment here or send us a tweet at @scalegridio.

The post PostgreSQL Trends: Most Popular Cloud Providers, Languages, VACUUM, Query Management Strategies & Deployment Types appeared first on Datafloq.

What is the Best Architecture for your Application: Monolith, Microservices or Serverless?

Stephen Soldberg — Tue, 24 Sep 2019 11:59:13 +0000

Choosing the right architecture is critical for the overall success of your product. The three most popular architectures used in the IT world are Monolith, Microservices, and Serverless. Each one offers its own advantages to create exactly the right sort of solution for your users with the best possible experience. Let's take a look at each architecture separately to see how they work and uncover potential benefits.

Monolithic Architecture

Monolithic software is self-contained and all of the various components are interconnected with each other. This means that each component and all of the components associated with it need to be present for the code to be executed and compiled. If you would like to update even one of the components, this means that you will need to rewrite the entire application. While this scares away a lot of developers from creating monolithic architectures, it offers substantial benefits as well such as:

Fewer issues affecting the entire app – These include error handling, logging and caching. All of these functionalities would only concern a single app thus making life simpler.
Simpler testing and debugging – the monolith is a big cohesive unit, thus making it possible to perform end-to-end testing faster.
Easy to deploy – Since there is only one file or directory only one deployment is necessary.

While the monolithic architecture may charm some developers with its simplicity, things could get very complicated if and when you decide to scale. As the monolith scales up, things get hairy very quickly since any complex system of code with only one app will be difficult to manage. Additionally, the following factors will need to be considered with a monolithic architecture:

Making changes – It will be difficult to make even small changes because everything is interconnected. In other words, even a small change can reverberate across the entire application.
Scalability – Independent components cannot be scaled
Obstacles for new technologies – If you would like to introduce any new technology into the monolith, you will have to rewrite the entire application.

Since developing in a monolithic architecture presents so many difficulties, companies have decided to use a microservices architecture instead. Let's take a look.

Microservices Infrastructure

The microservices architecture breaks down each individual component into separate units which carry out the application processes as a separate service. Each service has its own logic and database. Think of it as creating one application with a suite of smaller services. The functionality will be divided among these smaller services, which can be deployed independently, and communicate with each other via API. In addition to everything mentioned above, the following are additional benefits offered by the microservices infrastructure:

Flexibility – Enjoy added flexibility as far as technology is concerned and being able to add new features and services along the way
Enhanced scalability – Scale each component independently without worrying about reaching a ceiling. This means you will be able to accommodate more users quickly without making any drastic changes.
Increased agility – Any issues found in the microservices architecture will only affect one service instead of the entire app. This means that any new changes that you implement will involve less risk.

While it may seem that microservices is the ideal infrastructure, it does come with its drawbacks as well:

More complex – In a distributed system, you have to create the connection between all databases and modules,
Testing – Since the microservices architecture consists of several components each of which can be deployed separately, testing becomes much more difficult.
Cross-cutting concerns – These issues include externalized configuration, logging and other issues.

While monolithic and microservices architectures have been traditionally used for creating applications, recently a new one has emerged called serverless.

Serverless Architecture

The term Serverless was first used to describe an app that either mostly or fully used third-party, cloud-hosted apps and services to manage the back-end logic. However, this term can also refer to an instance where the logic on the server-side is still written by a human, but it is being run with stateless compute containers. A popular term that is being used to describe this architecture is Functions as a service (FaaS).

Businesses take advantage of cloud services from AWS and Microsoft Azure to replace physical servers. There is no need to provision or manage servers which saves you a lot of money on overhead costs associated with actually running the servers, storage space and salary for system administrators.

Additional benefits of serverless include

Less time to market – A new app can be created in a matter of hours. There are lots of apps on the market today that rely on vendor APIs such as OAuth, Twitter, and Mapbox.
Scalability – With a serverless architecture you do not need to provision infrastructure for emergency scenarios. Scaling is performed automatically and seamlessly.
It's on whenever you need it – If you are experiencing a sudden surge in users, you will be able to accommodate them with on-demand computing power. This means you do not have to provision any resources for unexpected situations.

The disadvantages of the serverless architecture are

Problems resulting from third-party APIs – these problems include vendor control and lock-in.
Less operational tools – You are totally dependant on the vendor for your debugging efforts. Also, to perform a thorough search for bugs and to uncover the underlying problem, you will need to have access to all kinds of metrics which might not be easy to get.
Implementation is a problem – Integration testing serverless apps is very difficult. Smaller integrational units mean that you will be relying on integration testing more so than with microservices or monolithic architectures. Now that we have an idea of each type of architecture, let's take a look at when it would be best to use each one.

Which One to Choose?

Microservices are very popular right now, but it would be a mistake to use it just because other people have adopted it. There have been many situations where companies tried to adopt microservices, but due to a lack of knowledge, ended up creating something that was neither a monolith nor microservices. They have services that are not properly encapsulated and one service was connected to another in a way that one agile team could make an independent decision on deploying a microservice.

Therefore, it is a good idea to start with the monolithic architecture, especially if you do not have any microservices experience. Also, if you are at the founding stage and have a team of five people or less you should also stick with the monolithic architecture because you will not be able to handle the high overhead of microservice. If you are developing a new product that is unproven on the market, chances are that it will evolve over time so a monolith would be well suited for you since it allows for fast product iterations.

The microservices architecture will be a good fit if you need fast and independent service delivery. However, you should know that it might not see gains in delivering your service right away. Also, if a certain part of your platform needs to be more efficient than others, microservices will be ideal because it allows for independent scalability and increased flexibility.

If you have a certain operation within your system that takes up a significant amount of computing power with unpredictable spikes in traffic, it is a good idea to use a serverless architecture. If you are already using microservices, then the system is likely prone to further decoupling, but it serverless could still make sense if you are using a monolith as well. However, if you are not currently on the cloud, you should not start with serverless. Services like AWS Lambda work best when combined with other services from AWS. If you are not ready to lock your service into a specific vendor, then serverless is not right for you.

Making the Decision

Software development itself is a process, therefore you need to clearly understand the problems that you are facing and how you plan to deal with them. This is necessary even before you start considering your choice of architecture. Think about the patterns that fit the problem and the scale you need the app to perform. Try not to get too caught up in the details such as the programming language you will use. Instead think about your estimated time to market, the size of your team and the deadlines.

It is important that you not rush the decision of your architecture and get inputs from lots of different sources. After all, transitioning to a new architecture will be time consuming and expensive.

The post What is the Best Architecture for your Application: Monolith, Microservices or Serverless? appeared first on Datafloq.

4 Best Practices for Data Security in AWS

Eddie Segal — Thu, 29 Aug 2019 12:53:50 +0000

Amazon Web Services (AWS) is currently the most widely adopted cloud service provider, with nearly a million companies using their services. With exabytes of data stored in their services, it should be no wonder that data security is a huge issue for AWS and the customers it serves.

If you are one of these customers, this article should help you understand how security is managed in AWS and teach you some best practices for ensuring that your data remains secure.

Security Responsibility in AWS

Before considering best practices for keeping your data secure in AWS, it helps to first know what you are responsible for. AWS services operate under a shared security responsibility model which states that Amazon is responsible for infrastructure and you are responsible for everything else, including access and authentication, data, operating systems, external networks, applications, and third-party integrations.

To help with this responsibility, however, Amazon does provide tools for your use, such as built-in encryption and Identity and Access Management (IAM). Some of these features are enabled by default, depending on the service you're using, but in the end, it's up to you to make sure that your configuration is appropriate and that you are making use of the resources AWS provides.

Best Practices

Effective data security requires understanding not only where your data is vulnerable but what can be done to identify security faults and how to eliminate them. Each service and data type is different and the methods that will work best for your system will depend on those variables, but the following best practices should apply to most configurations.

Duplicate Your Data

It may seem obvious but backups only work if you make them, consistently and frequently. If you are not backing up your data in a reliable way, it will be difficult if not impossible to recover regardless of whether your database gets corrupted, data gets mistakenly erased, an attacker holds your systems ransom, or a natural disaster occurs.

An additional point to remember is that if you keep your backups with the data they are duplicating, they probably won't be as useful. A better strategy is to keep copies isolated, either on different services, different networks, or different devices.

AWS Backup was recently released to help simplify and centralize this process for you. It's fully managed, allows automation through policies, and covers a range of services, including EFS, DynamoDB, RDS, EBS, and Storage Gateway.

If Backup doesn't cover the services you're using, or if you just want extra flexibility, you can still automate through Lambda or the CLI. You can see an example of this with EBS snapshots to get a better idea of how to set it up. The same process can be used with any service that can be reached through API.

Audit Your Risks

Knowing how to secure your data requires awareness of what you have and where it is stored. If you tagged your resources during configuration, this may be easy to figure out but if you didn't, now is the time to do so. Tagging will help you prioritize data security through access permissions, backup policies, monitoring, and more.

After you have an inventory of your data, you need to evaluate how your data can be accessed, what your current protections are, and how you are verifying that your data remains secure. This information will dictate how you should configure access rights and permissions, what authentication types you should be using, and how closely you need to monitor your systems.

Limit Data Access

If you focus on the principle of least privilege when configuring access rights and permissions you'll have a good start. With AWS, in general, you will be controlling access through a combination of IAM policies and Access Control Lists (ACLs).

With IAM, you can create and manage policies that separate management flow and database administration from application flow and assign them based on individual users, groups, or roles. IAM also allows resource-based policies but they only work for a limited number of services.

When creating policies, avoid the use of general permissions and root users to minimize the potential damage caused by compromised credentials and periodically audit your users and roles to eliminate ghost accounts or inactive users.

With ACLs, you can restrict network traffic and access rights by resource and by minimizing open ports by instance. If possible, you should extend these restrictions to isolate your services, known as micro-segmentation. By reducing entry points to your data and systems, you reduce your overall vulnerability.

Limiting access also involves systematically making sure that the data in your systems needs to be there and evaluating if it can't be stored more securely elsewhere. Infrequently accessed data such as compliance logs or legacy projects don't need to be stored with your production data and can likely be safely moved to cold storage. If you find that you have data that you can eliminate, make sure that it is cleanly erased to further reduce liability.

Encrypt Your Data

AWS offers tools for encrypting data both at-rest and in-transit as a built-in feature. Unless you have access to a better solution or have a very good reason not to use encryption, you should use it. The specific tools available to you depend on which services you're using and many services can be integrated with third-party security tools as well.

The primary tool used by most AWS services is the Key Management Service (KMS) which grants centralized control over your encryption keys. With KMS, you can use either an AWS defined customer master key or a key imported from your own encryption infrastructure. KMS can automatically rotate master keys once a year, without needing to re-encrypt, to further secure your data. It can be used to manage both server and client-side encryption, both of which you should use if possible.

Conclusion

Arguably, AWS cloud services provide more security than most organizations would be able to accomplish on their own, if only for the sheer amount of security expertise that the provider employs. Nevertheless, enabling the security features that AWS offers, verifying that your configuration is correct, and monitoring your system is all up to you.

To make sure that your data is kept safe and your liabilities are minimized, ensure that you are meeting these best practices and set aside some time to stay updated on the newest security tools, features, and vulnerabilities as they arise. The OWASP Cloud Security Project is a great resource to start with.

The post 4 Best Practices for Data Security in AWS appeared first on Datafloq.

Amazon Archives | Datafloq

The Most Pervasive Mistakes to Dodge Throughout a Cloud Migration

Cloud migration Overview

What are the most significant advantages of moving to the cloud?

What is the benefit of moving to the cloud?

How to Overcome Your Cloud Migration Apprehension?

Webinar: Build a customer-centric business with external consumer data

A Lesson In Effective Innovation From Amazon

Defining 2 > 0

Defining 1 > 2

The Ramifications

How Big Data Helps Drive Amazon Sales

How Does Amazon Use Big Data?

Implementing Dynamic Pricing System to Stay Ahead of Competition

Amazon Encourage Users to Buy More Products with Every Order

Top Azure Interview Questions of 2020

1. What is Azure Cloud Service?

2. What are the roles implemented in Windows Azure?

3. What are the three principal segments of the Windows Azure platform?

4. Define Windows Azure AppFabric.

5. What is the distinction between Windows Azure Queues and Windows Azure Service Bus Queues?

6. What is table storage in Windows Azure?

7. What is autoscaling in Azure?

8. What are the features of Windows Azure?

9. What are the differences between a public cloud and a private cloud?

10. What is table storage in Windows Azure?

11. What is Windows Azure Portal?

12. Explain Azure Fabric.

13. What do you comprehend about Hybrid Cloud?

14. What is the storage key?

15. What is Windows Azure Traffic Manager?

16. What is a federation in SQL Azure?

17. What is the SQL Azure database?

18. What are the different types of Storage areas in Windows Azure?

19. What is the concept of the table in Windows Azure?

20. What is TFS build system in Azure?

21. What is Azure App Service?

22. What is profiling in Azure?

23. What is cmdlet in Azure?

24. What is Windows Azure Scheduler?

25. How can you create an HDInsight Cluster in Azure?

26. What is Text Analytics API in Azure Machine?

27. What is the Migration Assistant tool in Azure Websites?

28. What is the distinction between Public Cloud and Private Cloud?

29. What is Azure Service Level Agreement (SLA)?