DEVOPS MEETUP #19

Accenture

21st January 2020

Community

Volunteers (usually on EngineersMY slack)

JOIN US

Monthly meetup announced on meetup.com
Ping #meetups to volunteer / speak

OTHER MEETUPS

DevKami curated meetups:
devkami.com/meetups

House Rules

  • minimal bikeshedding

  • respect opinions — agree to disagree

  • participate

  • thank the sponsors & volunteers

BUZZCORNER

Containers

Kuma.io

Kuma: Build, Secure and Observe your modern Service Mesh

Security

267 million FB data leak

The Elasticsearch cluster contained user IDs, phone numbers, and names of mostly US-based users.
… the data likely came from “an illegal scraping operation or Facebook API abuse by criminals in Vietnam.”

WebAuthn

  • Pronounced "Web-Auth-En", is an API for managing public key credentials

  • 81% of hacking-related breaches from stolen or weak passwords

  • 28% of users using 2FA :(

  • Demo

SMS 2FA Insecure

Yes we know, but here is an actual study by folks from Princeton

Articles

DO’s tale of tech debt

From 15,000 database connections to under 100

Figure 1. DB polling
While infinite loops and giving each server a direct connection to the database may have been rudimentary in terms of system design, it was simple and it worked – especially for a short-staffed technical team facing tight deadlines and a rapidly increasing user base.
It’s important to note that simply because something is “legacy” does not mean it is dysfunctional and should be replaced. Bloomberg and IBM have legacy services written in Fortran and COBOL that generate more revenue than entire companies.

On the other hand, every system has a scaling limit.
  • By the start of 2016, the db had > 15k direct connections, each querying for new events every 1-5s

  • The SQL query that each hypervisor used had also grown in complexity

  • It had become a colossus over 150 lines long and JOINed across 18 tables
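To put the polling figures above in perspective, here is a rough back-of-envelope calculation. It is only a sketch: it assumes exactly 15,000 connections (the slide says "more than 15k") and that every connection polls at a steady rate somewhere in the quoted 1-5 s range. The function name is made up for illustration.

```python
# Back-of-envelope load from direct polling, using the figures quoted above.
CONNECTIONS = 15_000        # direct DB connections (slide says "> 15k")
POLL_INTERVALS_S = (1, 5)   # each connection polls every 1-5 seconds


def polls_per_second(connections: int, interval_s: float) -> float:
    """Steady-state query rate if every connection polls once per interval."""
    return connections / interval_s


high = polls_per_second(CONNECTIONS, POLL_INTERVALS_S[0])  # everyone polls every 1 s
low = polls_per_second(CONNECTIONS, POLL_INTERVALS_S[1])   # everyone polls every 5 s

print(f"{low:,.0f} to {high:,.0f} polling queries per second")
```

Under these assumptions the database answers thousands of polling queries every second, most of which presumably find nothing new, before doing any real work.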

Refactored: Event Router

Figure 2. 15k → less than 100 direct connections
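The idea of funnelling many pollers through a few routers can be sketched as follows. This is a toy illustration, not DigitalOcean's implementation: the `events` table schema, the `EventRouter` class, and the hypervisor names are all assumptions, and an in-memory SQLite database stands in for MySQL. The point is only that one component holds the database connection and fans events out, so subscribers never touch the database directly.

```python
import queue
import sqlite3


class EventRouter:
    """Hypothetical router: the only component that reads the events table."""

    def __init__(self, conn):
        self.conn = conn
        self.last_id = 0        # highest event id already dispatched
        self.subscribers = {}   # hypervisor name -> in-memory queue

    def subscribe(self, hypervisor):
        """Register a hypervisor and return the queue it should read from."""
        q = queue.Queue()
        self.subscribers[hypervisor] = q
        return q

    def poll_once(self):
        """Fetch events newer than last_id and fan them out to subscribers."""
        rows = self.conn.execute(
            "SELECT id, hypervisor, payload FROM events WHERE id > ? ORDER BY id",
            (self.last_id,),
        ).fetchall()
        for event_id, hypervisor, payload in rows:
            q = self.subscribers.get(hypervisor)
            if q is not None:
                q.put(payload)
            self.last_id = event_id
        return len(rows)


# Demo with an in-memory database and made-up events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, hypervisor TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (hypervisor, payload) VALUES (?, ?)",
    [("hv-1", "create droplet"), ("hv-2", "resize droplet"), ("hv-1", "destroy droplet")],
)

router = EventRouter(conn)
hv1 = router.subscribe("hv-1")
hv2 = router.subscribe("hv-2")
delivered = router.poll_once()
print(delivered, "events routed over a single DB connection")
```

With this shape, the number of database connections scales with the number of routers rather than the number of hypervisors.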

Refactored: Scheduler V2

(the thing that decides which hypervisor will host new droplets)

  • Single instance for the entire fleet → multiple

  • Single threaded → concurrent

Refactored: Message Queue

  • By early 2017, the centralized MySQL message queue was still in use

  • (that’s ≈5 years!)

  • It was handling up to 400,000 new records/day, and 20 updates/s.

  • Harpoon was born to interface between services and DB via API

2019 DO Arch

DO Arch

Lessons: Gracefully manage internal friction

Getting buy-in from the other teams proved more difficult. ...teams would have to give up their database access, rewrite portions of their codebase, and ultimately change how they had always done things. That wasn’t an easy sell.

Team by team and service by service, the Harpoon engineers were able to migrate the entire codebase onto their new platform. ...by end of 2017, Harpoon became the sole publisher to the database message queue.

Lessons: Don’t overengineer

We looked at using a queue service initially, but decided against it. It would be one more service that we needed to maintain...

It was definitely less overhead for the initial build, and the fact that it scaled for so long was quite impressive given how much DO grew between when we first wrote the event queue in late 2012...

I think when you are building something, it's important to not overengineer it. ...you can guess where some of the bottlenecks may be, but without actual usage often those guesses can be wrong.

Lessons: Don’t overengineer

…We basically built things to accommodate how people wanted to work rather than future-scaling. In this case we decided to stick with a MySQL queue for events because it made Jeff’s life a lot easier, and he built the entire backend…
- Moisey Uretsky, DigitalOcean Co-Founder

Misc

TLD Benchmarks by BunnyCDN

The biggest shockers were the .info and .org domains, which showed really poor performance, especially at the 85th percentile, despite being among the oldest and most well-established TLDs, with millions of registered domains each. It appears 4 out of 6 of their nameservers are performing extremely poorly, which is the reason for the poor results.

  • Another interesting thing to see was the performance for .co, .biz and .in domains that ended up way ahead of the rest.

  • So is your fancy new domain hurting your performance?

  • It actually might be, but probably not enough to worry about that too much.

Why Caddy is better than HAProxy

Matt Holt, creator of Caddy, lists why he thinks Caddy is better

User-Agent getting phased out

Example: Device Pixel Ratio

Response (server opts in to the hint)
Accept-CH: DPR
Accept-CH-Lifetime: 86400
Subsequent request (browser sends the hint)
DPR: 1.0
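In the Client Hints flow, the server advertises which hints it wants via `Accept-CH` in a response, and the browser then includes those hints (here `DPR`) in later requests. A minimal server-side sketch of both halves, using plain dictionaries for headers; the function names are hypothetical and not part of any framework, and header lookup is assumed to be exact-case for simplicity.

```python
import math


def advertise_client_hints(hints, lifetime_s=None):
    """Build the response headers that ask the browser to send the given hints."""
    headers = {"Accept-CH": ", ".join(hints)}
    if lifetime_s is not None:
        # How long (in seconds) the browser should remember the opt-in.
        headers["Accept-CH-Lifetime"] = str(lifetime_s)
    return headers


def parse_dpr(request_headers, default=1.0):
    """Read the DPR hint from a request, falling back to a default if absent or invalid."""
    raw = request_headers.get("DPR")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    if not math.isfinite(value) or value <= 0:
        return default
    return value


response_headers = advertise_client_hints(["DPR"], lifetime_s=86400)
dpr = parse_dpr({"DPR": "2.0"})
print(response_headers, dpr)
```

A server could use the parsed value to pick an image variant instead of guessing the device from the User-Agent string.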

Google kills, again

Google plans to kill 3rd-party cookies within 2 years.