Lessons Learned from Scaling FarmVille

February 10, 2010, 9:33 PM  |  Category: architecture

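1. Interactive games are write-heavy

A typical web application does far more reads than writes, so many general-purpose architectures are a poor fit for interactive games. A read-heavy workload can be handled by adding caches. A write-heavy workload calls for data partitioning and a memory-based architecture.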
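
As a rough illustration of partitioning writes across in-memory shards, here is a minimal sketch. The class name `PartitionedStore`, the shard count, and the choice of CRC32 as the partitioning hash are assumptions made for this example; the original post does not say how FarmVille partitions its data.

```python
import zlib
from collections import defaultdict


class PartitionedStore:
    """Toy in-memory store that spreads writes across a fixed number of shards."""

    def __init__(self, num_shards: int = 4) -> None:
        self.num_shards = num_shards
        # One dict per shard; in a real deployment each shard would live
        # in its own process or on its own host (assumption for this sketch).
        self.shards = [defaultdict(dict) for _ in range(num_shards)]

    def shard_for(self, user_id: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(user_id.encode("utf-8")) % self.num_shards

    def write(self, user_id: str, field: str, value: object) -> None:
        self.shards[self.shard_for(user_id)][user_id][field] = value

    def read(self, user_id: str, field: str) -> object:
        return self.shards[self.shard_for(user_id)][user_id].get(field)


store = PartitionedStore(num_shards=4)
store.write("player-1", "coins", 120)
store.write("player-2", "coins", 75)
```

Because every write for a given user lands on exactly one shard, adding shards spreads the write load instead of funnelling it through a single database.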

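2. Design every module as an independent service

Isolating each module lowers the odds that one module drags down the others. When necessary, some features can be switched off to relieve pressure on the system.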
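
One way to picture this is a page that is assembled from a required core plus optional modules, each guarded by a runtime switch. This is only a sketch: `FeatureSwitches`, `render_page`, and the module names `gifts` and `news_feed` are hypothetical and do not come from the original post.

```python
from typing import Any, Callable, Dict, Iterable


class FeatureSwitches:
    """Runtime on/off flags, one per optional module (names are hypothetical)."""

    def __init__(self, features: Iterable[str]) -> None:
        self._enabled: Dict[str, bool] = {name: True for name in features}

    def disable(self, name: str) -> None:
        self._enabled[name] = False

    def is_enabled(self, name: str) -> bool:
        # Unknown features count as disabled.
        return self._enabled.get(name, False)


def render_page(
    switches: FeatureSwitches,
    load_core: Callable[[], Any],
    optional_modules: Dict[str, Callable[[], Any]],
) -> Dict[str, Any]:
    page = {"core": load_core()}
    for name, load in optional_modules.items():
        if not switches.is_enabled(name):
            continue  # switched off to shed load
        try:
            page[name] = load()
        except Exception:
            page[name] = None  # one failing module must not take the page down
    return page


def broken_news_feed() -> Any:
    raise TimeoutError("news feed service is not responding")


switches = FeatureSwitches(["gifts", "news_feed"])
modules = {"gifts": lambda: ["tractor"], "news_feed": broken_news_feed}
page_normal = render_page(switches, lambda: "farm state", modules)
switches.disable("gifts")  # relieve pressure by turning one feature off
page_degraded = render_page(switches, lambda: "farm state", modules)
```

In both calls the core content is still served: the failing module degrades to an empty slot, and the disabled module is skipped entirely.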

3. Cache Facebook data

When you depend heavily on an external module, consider caching that module's data to reduce latency.
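
A minimal sketch of the idea is a time-to-live cache in front of the external call. Everything named here is an assumption for illustration: `TTLCache`, `fetch_profile` (a stand-in for a slow external request, not a real Facebook API), and the 60-second lifetime.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Caches results of a slow external lookup for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no external call
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


calls: List[str] = []


def fetch_profile(user_id: str) -> Dict[str, str]:
    # Hypothetical stand-in for a slow call to an external service.
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


fake_now = [0.0]  # injectable clock so the example does not need to sleep
cache = TTLCache(fetch_profile, ttl_seconds=60.0, clock=lambda: fake_now[0])
cache.get("42")
cache.get("42")      # served from the cache
fake_now[0] = 61.0
cache.get("42")      # entry expired, fetched again
```

The trade-off is staleness: for up to `ttl_seconds` the cached copy may lag behind the external source.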

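4. Plan ahead for the usage spikes that come with new releases

5. Sample the data

When analyzing massive amounts of data, you can find problems by sampling rather than processing everything. A sample surfaces the same problems with far less work.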
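
The post does not say which sampling method is used. One common choice for streams too large to hold in memory is reservoir sampling (Algorithm R), sketched below; the synthetic latency data and the 90 ms threshold are made up for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length."""
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = item     # replace with probability k / (i + 1)
    return sample


# Synthetic request latencies in milliseconds: exactly 10% are 90 ms or slower.
latencies = (i % 100 for i in range(100_000))
sample = reservoir_sample(latencies, k=1_000, rng=random.Random(7))
slow_fraction = sum(1 for ms in sample if ms >= 90) / len(sample)
```

Here the estimate of the slow fraction comes from 1,000 items rather than all 100,000, and the stream is read only once.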

I’m Going To Scale My Foot Up Your Ass

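August 27, 2009, 9:15 PM  |  Category: architecture

Ted Dziuba, one of the founders of Milo, recently went on another angry rant about scalability, berating founders and developers for obsessing over how their systems will scale. Ted makes three points: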

1. People Who Talk Big About Scalability Don’t Need To Worry About It

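Plenty of assholes blog about scalable architecture while writing terrible code themselves. Architecture alone does not make a system scale: code full of N-squared loops will not scale no matter what. So before you hold forth on Memcached, learn about cache expiration policies first!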
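
To make the N-squared point concrete, here is a small sketch of the same task written both ways. The function names and the sample data are invented for the example.

```python
from typing import List


def common_ids_quadratic(a: List[int], b: List[int]) -> List[int]:
    # `x in b` scans the whole list, so this is O(len(a) * len(b)).
    return [x for x in a if x in b]


def common_ids_linear(a: List[int], b: List[int]) -> List[int]:
    # Building a set once makes each membership test O(1) on average,
    # so the whole function is O(len(a) + len(b)).
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(0, 2_000))
b = list(range(1_000, 3_000))
slow = common_ids_quadratic(a, b)
fast = common_ids_linear(a, b)
```

Both return the same result, but only the second keeps up as the inputs grow, and no amount of caching or extra hardware changes that.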

2. If You Haven’t Discussed Capacity Planning, You Can’t Discuss Scalability

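You don't need to worry about how to scale your MySQL-based application, because nobody is going to use it anyway. Even if you have 1,000 users, perhaps only 1% of them will use your system every day. Scaling the system is not what matters; what matters is getting people to show up. Of course, a system that supports a million users is architected differently from one that supports ten million, but if the system you are designing will never have a million users, don't worry too much about scalability.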
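
A back-of-the-envelope capacity estimate shows why. The 1,000 users and the 1% daily ratio come from the text above; the 200 requests per active user per day and the peak-to-average factor of 3 are assumptions picked only for illustration.

```python
SECONDS_PER_DAY = 86_400


def peak_requests_per_second(
    registered_users: int,
    daily_active_ratio: float,
    requests_per_active_user_per_day: float,
    peak_to_average: float = 3.0,  # assumed burst factor
) -> float:
    """Rough peak load estimate from user counts."""
    daily_active = registered_users * daily_active_ratio
    average_rps = daily_active * requests_per_active_user_per_day / SECONDS_PER_DAY
    return average_rps * peak_to_average


small_site = peak_requests_per_second(1_000, 0.01, 200)
large_site = peak_requests_per_second(1_000_000, 0.01, 200)
```

Under these assumptions the 1,000-user site peaks at well under one request per second, while the million-user site sees a thousand times that load.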

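Engineers also fantasize about "linear scaling": if there are too many users, just add machines. There is no such thing. Go try Amazon's SimpleDB, and once you have enough users that data storage latency becomes the bottleneck, feel free to curse "those shitty Amazon datacenters."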

3. Choosing Technology Don’t Mean Shit If You Don’t Know How To Use It

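Choosing the right technology will not fully solve a scalability problem. Using it correctly is what counts.

To close with Ted's own words: Shut up about scalability, no one is using your app anyway.

Note: all profanity and offensive language in this post is quoted directly from the original.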

Real World Web: Performance & Scalability

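August 24, 2009, 12:06 AM  |  Category: architecture

Bjorn Hansen's advice on website performance and scalability. Most of it is already well known. It looks like the projects we have worked on are world-class in this respect.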

Think Horizontally at every point in your architecture, not just at the web tier.

This sentence is important!

  • Benchmarking
  • Vertical scaling sucks.
  • Horizontal scaling rocks.
  • Run many application servers
  • Don’t keep state in the app server
  • Be stateless
  • Optimization is necessary, but is different than scalability.
  • Cache things you hit all the time.
  • Measure, don’t assume, check.
  • Make pages static.
  • Caching is a trade-off.
  • Cache full pages.
  • Cache partial pages.
  • Cache complex data.
  • MySQL query cache is flushed on update.
  • Cache invalidation is hard.
  • Replication scales reads, not writes.
  • Partition to scale writes. 96% of applications can skip this step.
  • Master-master setup facilitates on-line schema changes.
  • Create summary tables and summary databases rather than do COUNT and GROUP-BY at runtime.
  • Make code idempotent. If it fails you should just be able to run it again.
  • Load data asynchronously. Aggregate updates into batches.
  • Move processing to application and out of the database as much as possible.
  • Stored procedures are dangerous.
  • Add more memory.
  • Enable query logging and take a look at what your app is doing.
  • Run different MySQL instances for different work loads.
  • Config tuning helps, query tuning works.
  • Reconsider persistent DB connections.
  • Don’t overwork the database. It’s hard to scale.
  • Work in parallel.
  • Use a job queuing system.
  • Log http requests.
  • Use light processes for light tasks.
  • Build on APIs internally. Clean loosely coupled APIs are easy to scale.
  • Don’t incur technical debt.
  • Automatically handle failures.
  • Make services that always work.
  • Load balancing is the key to horizontal scaling.
  • Redundancy is not load-balancing. Always have n+1 capacity.
  • Plan for disasters.
  • Make backups.
  • Keep software deployments easy.
  • Have everything scripted.
  • Monitor everything. Graph everything.
  • Run one service per server.
  • Don’t ever swap memory for disk.
  • Run memcached if you have extra memory.
  • Use memory to save CPU or IO. Balance memory vs CPU vs IO.
  • Netboot your application servers.
  • There are lots of good slides on what to graph.
  • Use a CDN.
  • Use YSlow to find client side problems.
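
Three items from the list above (summary tables, idempotent code, and batched updates) fit together naturally. The sketch below uses Python's built-in `sqlite3` module to stay self-contained; the table names, the event format, and the use of an event-id table for deduplication are assumptions for illustration, not something the list prescribes.

```python
import sqlite3
from typing import Iterable, Tuple


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Remembers which events were already applied, so reruns are harmless.
    db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    # Summary table: read this instead of running COUNT/GROUP BY at request time.
    db.execute(
        "CREATE TABLE page_view_summary (page TEXT PRIMARY KEY, views INTEGER NOT NULL)"
    )
    return db


def apply_batch(db: sqlite3.Connection, events: Iterable[Tuple[str, str]]) -> None:
    """Apply a batch of (event_id, page) pairs; safe to run more than once."""
    with db:  # one transaction per batch: commits on success, rolls back on error
        for event_id, page in events:
            cur = db.execute(
                "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
                (event_id,),
            )
            if cur.rowcount == 0:
                continue  # this event was already counted
            db.execute(
                "INSERT OR IGNORE INTO page_view_summary (page, views) VALUES (?, 0)",
                (page,),
            )
            db.execute(
                "UPDATE page_view_summary SET views = views + 1 WHERE page = ?",
                (page,),
            )


db = make_db()
batch = [("e1", "/home"), ("e2", "/home"), ("e3", "/about")]
apply_batch(db, batch)
apply_batch(db, batch)  # rerun after a suspected failure; counts stay the same
views = dict(db.execute("SELECT page, views FROM page_view_summary"))
```

If the job dies halfway, the transaction rolls back and the whole batch can simply be submitted again.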

How Google Serves Data from Multiple Datacenters

August 23, 2009, 11:02 PM  |  Category: architecture

Ryan Barrett, Google App Engine datastore lead, gave the talk "Transactions Across Datacenters (and Other Weekend Projects)" at the Google I/O 2009 conference.

While the talk doesn’t necessarily break new technical ground, Ryan does an excellent job explaining and evaluating the different options you have when architecting a system to work across multiple datacenters. This is called multihoming, operating from multiple datacenters simultaneously.

As multihoming is one of the most challenging tasks in all computing, Ryan’s clear and thoughtful style comfortably leads you through the various options. On the trip you learn:

  • The different multihoming options are: Backups, Master-Slave, Multi-Master, 2PC, and Paxos. You’ll also learn how they each fare on support for consistency, transactions, latency, throughput, data loss, and failover.
  • Google App Engine uses master/slave replication between datacenters. They chose this approach in order to provide:
    - lowish latency writes
    - datacenter failure survival
    - strong consistency guarantees.
  • No solution is all win, so a compromise must be made depending on what you think is important. A major Google App Engine goal was to provide a strong consistency model for programmers. They also wanted to be able to survive datacenter failures. And they wanted write performance that wasn’t too far behind a typical relational database. These priorities guided their architectural choices.
  • In the future they hope to offer optional models so you can select Paxos, 2PC, etc for your particular problem requirements (Yahoo’s PNUTS does something like this).
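
To give a feel for the master/slave option named in the list above, here is a toy model in which the master acknowledges a write immediately and ships its log to the slave later. It is a sketch under stated assumptions, not App Engine's implementation: the class names, the in-process "datacenters", and the purely asynchronous log shipping are all invented for this example.

```python
from typing import Any, Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.applied = 0  # number of log entries applied so far


class MasterSlavePair:
    """Toy model: one master takes all writes, one slave replays its log."""

    def __init__(self) -> None:
        self.master = Replica()
        self.slave = Replica()
        self.log: List[Tuple[str, Any]] = []  # ordered writes accepted by the master

    def write(self, key: str, value: Any) -> None:
        # Acknowledged as soon as the master has it, which keeps write latency low.
        self.log.append((key, value))
        self.master.data[key] = value
        self.master.applied = len(self.log)

    def replicate(self) -> None:
        # Ship every pending log entry to the slave, in order.
        for key, value in self.log[self.slave.applied:]:
            self.slave.data[key] = value
        self.slave.applied = len(self.log)

    def failover(self) -> List[Tuple[str, Any]]:
        """Promote the slave to master; return the writes that never reached it."""
        lost = self.log[self.slave.applied:]
        self.master = self.slave
        self.slave = Replica()
        self.log = self.log[: self.master.applied]
        return lost


pair = MasterSlavePair()
pair.write("a", 1)
pair.write("b", 2)
pair.replicate()
pair.write("c", 3)        # acknowledged, but not yet shipped to the slave
lost = pair.failover()    # the master's datacenter goes down here
```

In this model the promoted slave keeps serving a consistent, ordered prefix of the data, and the cost of the fast acknowledgement is that writes accepted after the last replication pass are gone.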

There’s still a lot more to learn. Here’s my gloss on the talk:
