Papers, Prototypes, and Production – Developing...

Bruce Spang
November 18, 2014

Slides from Velocity Barcelona

Transcript

  1. What is a CDN? You probably already know what a

    CDN is, but bear with me. A CDN is a “Content Delivery Network”. It’s a globally-distributed network of servers, and at its core the point is to make the internet better for everyone who doesn’t live across the street from your datacenter. You might use it for images, APIs, …
  2. Or websites. For instance, this website about how much GitHub

    loves Fastly… (Don’t worry, this is the last slide that is anything at all resembling a sales pitch.)
  3. — well-known personality in community Or even this tweet of

    terrible advice. This tweet becomes more relevant as we go along…
  4. So, our goal is to deliver whatever your users are

    requesting as quickly as possible. To do this, we have a network of servers all over the world which cache content.
  5. Normally, you would go directly to this site halfway

    around the world, and it would take some time. Note that this is greatly simplified, as your request would likely bounce between 20 or 30 routers and intermediaries before getting to the actual server.
  6. With Fastly, you would instead go to one of our

    servers in, say, Sydney. Normally, a copy of the website would already be on that server, and it would be much faster.
  7. But ultimately, if it’s a new piece of content, you

    may still have to make a request to New York.
  8. However, next time you or someone else visits the site,

    it would be stored on the server in Sydney, and would be much faster.
  9. Cache Invalidation However, once a site is stored on a

    server, you might want to remove it for some reason; we call this a purge. For example, you might get a DMCA notice and have to legally take it down. Or it could be something as simple as your CSS or an image changing.
  10. New Customer Use One of the points of Fastly though,

    from the very beginning, was making it possible to purge content quickly. For instance, The Guardian is caching their entire homepage on Fastly. When a news story breaks, they post a new article, and need to update their homepage as quickly as possible. That purge needs to get around the world to all of our servers quickly and reliably.
  11. E D F C A B Z So, here’s how

    it works. We have a bunch of edge nodes spread around the world. A might be in New Zealand. F could be in Paris.
  12. E D F C A B Z PURGE A purge

    request comes in to A. The purge could be for any individual piece of content.
  13. E D F C A B Z PURGE A forwards

    it back to our central rsyslog “broker” of sorts, Z, which might be in, say, Washington DC.
  14. E D F C A B Z PURGE And the

    broker sends it to each edge node. It also probably looks pretty familiar. It’s really the simplest possible way of solving this problem. And for a little while it worked for us.
  15. Easy to reason about The way Rsyslog works is trivial

    to reason about. That also means that it’s really easy to see why this system is ill-suited for the problem we’re trying to solve. At its core, it’s a way to send messages via TCP to another node in a relatively reliable fashion.
  16. High latency Two servers sitting right next to each other,

    would still need to bounce the message through a central node in order to communicate with each other.
  17. Wrong consistency model This system has stronger consistency guarantees than

    we actually need. For instance, this system uses TCP and thus guarantees us in-order delivery. How does that actually affect the behavior in production?
  18. A B 200ms Let’s say we’re sending 1000 messages per

    second. One message every millisecond. Let’s say the node we’re sending to is 200ms away.
  19. [Diagram: packets 1 to 11 in flight from A to B]

    That means that at any time there are ~200 messages on the wire.
  20. [Diagram: packets 2 to 11 in flight from A to B; packet 1 is missing]

    Let’s say a packet gets dropped at the last hop. Instead of having one message be delayed, what actually happens is the rest of the packets get through but are buffered in the kernel at the destination server and don’t actually make it to your application yet.
  21. [Diagram: packets 12 to 22 in flight from A to B, with a SACK travelling back]

    The destination server then sends a SACK (which means “Selective Acknowledgement”) packet back to the origin, which effectively says, “Hey I got everything from packet #2 to packet #400, but I’m missing #1.” While that is happening, the origin is still sending new packets, which are still being buffered in the kernel.
  22. [Diagram: the SACK reaches A, and packet 1 is sent again] Then finally, the origin receives the

    SACK and realizes the packet was lost, and retransmits it. So, what we end up having is 400ms of latency added to 600 messages: 240,000ms of unnecessary delay. Each of those could have been delivered as they were received. We and our customers would have been just as happy with that. But instead they were delayed. Thus, this is the wrong consistency model.
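
The arithmetic behind those figures can be checked in a few lines. This is only a back-of-envelope sketch: the rate, the one-way latency and the count of 600 delayed messages are taken from the slides, while the function and variable names are made up for illustration.

```python
# Back-of-envelope numbers from the slides; all names here are illustrative.
MESSAGES_PER_SECOND = 1000
ONE_WAY_LATENCY_MS = 200
DELAYED_MESSAGES = 600  # figure quoted in the talk


def in_flight(rate_per_s: int, one_way_ms: int) -> int:
    """Messages on the wire at any instant: rate multiplied by one-way latency."""
    return rate_per_s * one_way_ms // 1000


on_wire = in_flight(MESSAGES_PER_SECOND, ONE_WAY_LATENCY_MS)

# The SACK has to travel back to the origin and the retransmission has to
# travel forward again, so recovery costs one extra round trip.
added_ms = 2 * ONE_WAY_LATENCY_MS

# Total delay if every buffered message waits for the full recovery time.
total_delay_ms = DELAYED_MESSAGES * added_ms

print(on_wire, added_ms, total_delay_ms)
```
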
  23. Atomic Broadcast We read papers on Atomic Broadcast, because it seemed

    like the closest fit to what we’re trying to do.
  24. Thought Real Hard “Distributed systems, don't read the literature. Most

    of it is outdated and unimaginative. Invent and reinvent. The field is fertile. Really.”
  25. E D F C A B Graph of Responsibility What

    we do is define a “graph of responsibility”. This defines which nodes are responsible for making sure each other stay up to date. So in this case, A is responsible for both B and D.
  26. E D F C A B Graph of Responsibility B

    is responsible for D and E.
  27. E D F C A B PURGE So, let’s follow

    a purge through this system. A purge request comes in to A.
  28. E D F C A B PURGE A immediately forwards

    it via simple UDP messages to every other server.
  29. E D F C A B PURGE Each of the

    servers that receives a message then sends a “confirmation” to the server that is responsible for it.
  30. E D F C A B PURGE What is more

    interesting is what happens when a message fails to reach a server. If a server receives a purge but does *not* get a confirmation from one of its “children”, it will send “reminders” to it.
  31. E D F C A B PURGE So, in this

    case D and B will start sending reminders to E until it confirms receipt. You can think of this as a primitive form of “active anti-entropy”, a mechanism in which servers actively make sure that each other are up to date.
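
The confirmation and reminder bookkeeping described in these slides can be sketched roughly as follows. This is a toy model, not the production code: the edges A to B, D and B to D, E come from the slides, the edge D to E is an assumption made so that both D and B remind E as in the example, and confirmations are assumed never to be lost.

```python
# Hypothetical graph of responsibility: parent -> children it keeps up to date.
# A -> B, D and B -> D, E are from the slides; D -> E is an assumption.
RESPONSIBLE_FOR = {"A": ["B", "D"], "B": ["D", "E"], "D": ["E"]}
NODES = ["A", "B", "C", "D", "E", "F"]


def broadcast(origin, lost):
    """Origin sends the purge to every node; nodes in `lost` never receive it.

    Returns (received, confirmed). `confirmed` holds (child, parent) pairs,
    one for each confirmation a receiving child sends to a responsible parent.
    """
    received = {n for n in NODES if n == origin or n not in lost}
    confirmed = {
        (child, parent)
        for parent, children in RESPONSIBLE_FOR.items()
        for child in children
        if child in received
    }
    return received, confirmed


def pending_reminders(received, confirmed):
    """List (parent, child) pairs where the parent still owes a reminder."""
    reminders = []
    for parent, children in RESPONSIBLE_FOR.items():
        if parent not in received:
            continue  # a parent that never saw the purge has nothing to remind about
        for child in children:
            if (child, parent) not in confirmed:
                reminders.append((parent, child))
    return reminders


# The purge from A is lost on its way to E, so B and D both need to remind E.
received, confirmed = broadcast("A", lost={"E"})
reminders = pending_reminders(received, confirmed)
print(reminders)
```

In a real system each parent would have to remember the purge until the child confirms, which is where the queueing problems discussed later come from.
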
  32. This also worked. We ran a system designed this way

    for quite some time. And once again, it worked.
  33. Way faster!! This system is much faster. It gets us

    close to the theoretical minimal latency in the happy path. However, there are problems with it.
  34. Arbitrary Partitions The graph of responsibility must be designed very

    carefully to avoid having common network partitions cause the graph to become completely split. Additionally, even if it is carefully designed it can’t handle *arbitrary* partitions. The best way to get close to fixing them is by increasing the number of nodes that are responsible for each other. Which of course increases load on the system.
  35. Unbounded Queues Because every node is responsible for keeping other

    nodes up to date, it needs to know what each of its dependents have seen. Which means if a node is offline for a while, that queue grows arbitrarily large.
  36. Failure Dependence And the end result of that is Failure

    Dependence. One node failing means that multiple other nodes have to spend more time remembering messages and trying to send reminders to the failed node. So, under duress this system is prone to having a single node failure become a multi-node failure, and a multi-node failure become a whole-system failure.
  37. The problem with thinking real hard… So, I said that

    we designed this system by thinking really hard. The problem with that is that we didn’t manage to find the existing research on this problem. It turns out that this type of system…
  38. … was actually described in papers in the 1980s, when

    Devo was popular. The problems that we found with it are thus well-known. Luckily around that time, the venerable Bruce Spang started working with us.
  39. Step Three Make it Scale This is where I came

    in, and started working on building a system that scaled better and solved some of the problems with the previous one.
  40. I am Lazy Inventing distributed algorithms is hard As Tyler

    showed just now, it turns out that inventing distributed algorithms is really hard. Even though Tyler came up with an awesome idea and implemented it well, it still had a bunch of problems that have been known since the eighties. I didn’t want to think equally as hard, just to come up with something from five years later.
  41. Read Papers Instead, I decided to read papers and see

    if I could find something that we could use. Because we had a system in production that was working well enough, I had enough time to dig into the problem. But why would you read papers?
  42. Impress your friends! Papers are super cool and if you

    read them, you will also be cool.
  43. Understand Problems Get a better sense of the problem you

    are trying to solve, and learn about other ways people have tried to solve the same problem.
  44. Learn what is impossible Lots of papers prove that something

    is impossible, or show a bunch of problems with a system. By reading these papers, you can avoid a bunch of time trying to build a system that does something impossible and debugging it in production.
  45. Find solutions to your problem Finally, some papers may describe

    solutions to your problem. Not only will you be able to re-use the result from the paper, but you will also have a better chance of predicting how the thing will work in the future (since papers have graphs and shit). You may even find solutions to future problems along the way.
  46. Read Papers So I started reading papers by searching for

    maybe relevant things on Google Scholar.
  47. Reliable Broadcast The first class of papers that I came

    across attempted to solve the problem of reliable message broadcast. This is the problem of sending a message to a bunch of servers, and guaranteeing its delivery, which is a lot like our purging problem.
  48. Reliable Broadcast As it turns out, these papers were a

    lot like the last version of the system. They tended to use retransmissions, with clever ways of building the retransmission graphs. This means that they had similar problems, so I kept looking for new papers by looking at other papers that cited these ones, and at other work by good authors.
  49. Gossip Protocols Eventually, I came across a class of protocols

    called gossip protocols that were written from the late 90s up until now
  50. “Designed for Scale” The main difference between these papers and

    reliable broadcast papers was that they were designed to be much more scalable: tens of thousands of servers, and hundreds of thousands or millions of messages per second.
  51. Probabilistic Guarantees To get this higher scale, these systems usually

    provide probabilistic guarantees about whether a message will be delivered, instead of guaranteeing that all messages will always be delivered.
  52. Bimodal Multicast

    Two phases: broadcast and gossip.
    - Quickly broadcast message to all servers
    - Gossip to recover lost messages
  53. Send the message to all other servers as quickly as possible.

    It doesn’t matter if it’s actually delivered here. You can use IP multicast if it’s available, UDP in a for loop like us, a carrier pigeon, whatever…
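
A minimal sketch of what “UDP in a for loop” could look like is shown below. The function names, the peer list and the payload format are assumptions for illustration; only the standard `socket` calls are real. Send errors are deliberately ignored, since the gossip phase is what recovers missing messages.

```python
import socket


def open_udp_socket():
    """Create a plain IPv4 UDP socket (no connection, no delivery guarantee)."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def broadcast_purge(sock, peers, payload):
    """Best-effort fan-out: one datagram per peer, no retries, no acks.

    `sock` is anything with a sendto(bytes, address) method, `peers` is an
    iterable of (host, port) tuples. Returns how many sends did not raise.
    """
    sent = 0
    for addr in peers:
        try:
            sock.sendto(payload, addr)
            sent += 1
        except OSError:
            # Loss is acceptable in this phase, so the error is dropped.
            pass
    return sent
```

Usage would be along the lines of `broadcast_purge(open_udp_socket(), [("192.0.2.10", 9000)], b"purge:42")`, where the address and payload are placeholders.
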
  54. every server picks another server at random and sends a

    digest of all the messages they know about - a picks b, b picks c, … a server looks at the digest it received, and checks if it has any messages missing - b is missing 3, c is missing 2
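
One gossip round as described on this slide might be sketched like this. It is a simplification under stated assumptions: the digest is a plain set of message ids, nothing is lost, and the target obtains its missing messages immediately rather than requesting a retransmission. All names are hypothetical.

```python
import random


def missing_from_digest(digest, have):
    """Message ids the sender advertises that the receiver does not have."""
    return set(digest) - set(have)


def gossip_round(known, rng):
    """Run one round: every server sends its digest to one random other server.

    `known` maps server name -> set of message ids and is updated in place.
    Digests are snapshotted first, so a server only gossips what it knew at
    the start of the round.
    """
    servers = list(known)
    digests = {s: frozenset(ids) for s, ids in known.items()}
    for sender in servers:
        target = rng.choice([s for s in servers if s != sender])
        missing = missing_from_digest(digests[sender], known[target])
        # Stand-in for the target asking the sender to resend what it lacks.
        known[target] |= missing
    return known


known = {"a": {1, 2, 3}, "b": {1, 2}}
gossip_round(known, random.Random(0))
print(known)
```
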
  55. after reading the paper, we wanted more intuition about how

    this algorithm would actually work on many servers. we decided to implement a small simulation to figure it out.
  56. - we still wanted a better guarantee before deploying it

    into production. - the paper includes a bunch of math to predict the expected % of servers receiving a message after some number of rounds of gossip - describe graph - after 10 rounds, 97% of servers have message. - turns out to be independent of the number of servers - good enough for us
  57. Throw away messages It needs to keep enough messages to

    recover for another server, but has to throw away messages to bound resource usage.
  58. - paper throws messages away after 10 rounds (97%) -

    this makes sense during normal operation where there is low packet loss - however, we often see more packet loss. we don’t deal with theory, we deal with real computers…
  59. - same graph as before, this time with 50% packet

    loss - 40% of servers isn’t good enough - we’ll probably lose purges during network outages, get calls from customers, etc…
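
To get a feel for how packet loss slows gossip down, a toy simulation along these lines can help. It is not the paper’s model and not the simulation mentioned in the talk, so its numbers should not be expected to match the 97% or 40% figures. The parameters (`initial_fraction`, `loss`, the seed) and the rule that a lost digest simply has no effect are assumptions.

```python
import random


def simulate(n_servers, rounds, loss, initial_fraction, seed=0):
    """Track what fraction of servers hold a single message after each round.

    Every round, each server sends a digest to one other server chosen at
    random. With probability `loss` the digest is dropped. Otherwise, if the
    sender held the message at the start of the round, the target gets it.
    Requires n_servers >= 2.
    """
    rng = random.Random(seed)
    n_initial = max(1, int(n_servers * initial_fraction))
    has_msg = [i < n_initial for i in range(n_servers)]
    coverage = [sum(has_msg) / n_servers]
    for _ in range(rounds):
        snapshot = list(has_msg)
        for sender in range(n_servers):
            # Pick a uniformly random server other than the sender.
            target = rng.randrange(n_servers - 1)
            if target >= sender:
                target += 1
            if rng.random() < loss:
                continue  # digest lost on the wire
            if snapshot[sender]:
                has_msg[target] = True
        coverage.append(sum(has_msg) / n_servers)
    return coverage


for loss in (0.0, 0.5):
    print(loss, [round(c, 2) for c in simulate(50, 10, loss, 0.1)])
```

Comparing the two printed rows shows how many more rounds are needed under loss, which is the reason for keeping messages around longer than the paper does.
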
  60. The Digest “I have 1, 2, 3, …” Why would

    the paper throw away after 10 rounds? The digest is a list, which is limited by bandwidth, so the size of the digest has to be limited.
  61. The Digest Doesn’t Have to be a List it can

    be any data structure we want, as long as another node can understand it.
  62. The Digest Send ranges of ids of known messages “messages

    1 to 3 and 5 to 1,000,000” - normally just a few integers to represent millions of messages - we keep messages around for a day, or about 80k rounds
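
A range-based digest like the one on this slide could be sketched as follows, assuming message ids are integers. The function names and the inclusive `(start, end)` tuple representation are illustrative choices, not the wire format used in production.

```python
def to_ranges(ids):
    """Compress integer ids into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(ids):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current run
        else:
            ranges.append((i, i))  # start a new run
    return ranges


def missing_ranges(remote, local):
    """Ranges covered by `remote` but not by `local`.

    Both arguments are sorted lists of non-overlapping inclusive ranges.
    """
    missing = []
    for start, end in remote:
        cursor = start
        for l_start, l_end in local:
            if l_end < cursor:
                continue  # local range lies entirely before the cursor
            if l_start > end:
                break  # local range lies entirely after this remote range
            if l_start > cursor:
                missing.append((cursor, l_start - 1))
            cursor = l_end + 1
            if cursor > end:
                break
        if cursor <= end:
            missing.append((cursor, end))
    return missing


# "messages 1 to 3 and 5 to 1,000,000" fits in two ranges.
local = [(1, 3), (5, 1_000_000)]
remote = [(1, 1_000_000)]
print(missing_ranges(remote, local))
```

The digest size depends on the number of gaps rather than the number of messages, which is what makes it practical to keep a day of messages around.
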
  63. End-to-End Latency

    [Figure: density plots of latency (ms) by server location, labelled London 74ms, San Jose 83ms, Tokyo 133ms] - usually < 0.1% packet loss on a link - 95th percentile delivery latency is network latency
  64. End-to-End Latency

    [Figure: density plot and 95th percentile of purge latency (ms) by server location, labelled New York 42ms, London 74ms, San Jose 83ms, Tokyo 133ms] Most purges are sent from the US
  65. Firewall Partition firewall misconfiguration prevented two servers (B and D)

    from communicating with servers outside the datacenter. A and C were unaffected.
  66. APAC Packet Loss extended packet loss in APAC region for

    multiple hours, up to 30% at some points; no noticeable difference in throughput
  67. So what? CONCLUSION - this is the system we implemented

    - but why does it matter how well it works? why should you care?
  68. Good systems are boring BRUCE We can go home at

    night, and don’t need to worry about this thing failing due to network problems. We don’t have to debug distributed systems algorithms at two in the morning. We’ve been able to grow the number of purges by an order of magnitude without having to rewrite parts of the system. etc...
  69. What did we learn? so this is great for us,

    but why do you care about the history of how we built our purging system? Handoff to Tyler.
  70. — well-known personality in community So, this was supposed to

    be a sponsored talk, but instead of trying to sell you on Fastly, the reason we give this talk is actually as a sort of Public Service Announcement. Don’t heed advice like this. Certainly spend time inventing and thinking, but don’t ignore the research. It would have taken us quite a lot more trial and error to come to a system that we’re as happy with now and long-term if we hadn’t based it on solid research. And because we did, we now have a good foundation to invent new, and actually original, ideas on top of.
  71. One weird trick… So, essentially, if you take away one

    thing from this talk, remember this one weird trick to save yourself 20 or 30 years worth of research work…