DevOps Meetup #15

AWS

17th September 2019

Slides: devops-my.github.io/meetup

Who we are

Volunteers who hang around the EngineersMY Slack: engineers.my

Join us!

Monthly meetup announced on meetup.com
Slack us in #meetups to volunteer / speak

Other meetups

DevKami curated meetups: devkami.com/meetups

House rules

  • Minimal bikeshedding
  • Participate!
  • Respect opinions — agree to disagree!
  • Thank the organizers & sponsors!

Buzz Corner

news

Teletext Holidays data leak

200k call recordings stored in insecure S3 bucket

Article
news

AWS US-EAST-1 Power Outage

AWS celebrates Labor Day weekend by roasting customer data in US-East-1 BBQ

When the power went out, and backup generators subsequently failed, some virtual server instances evaporated – and some cloud-hosted volumes were destroyed and had to be restored from backups, where possible

Article
news

@PragmaticAndy:
AWS had a power failure, their backup generators failed, which killed their EBS servers, which took all of our data with it. Then it took them four days to figure this out and tell us about it.
Reminder: The cloud is just a computer in Reston with a bad power supply.
news

Sunsetting Mercurial support in Bitbucket

Article
news

KLIA systems disruption

20 flights delayed

TheStar
news

Ransomware disrupts 22 Texas Govt Departments

Article
news

CircleCI Security Incident

Article
news

FB users' phone numbers exposed

Article
news

State of DevOps report 2019

Article
article

6 Lessons we learned when debugging a scaling problem on GitLab.com

Intermittent ssh connection errors

at an average of 300 connections/sec

Article
article
  • Lesson #1: In Wireshark, the Statistics menu has a wealth of useful tools that I’d never really noticed until this endeavor.
  • Lesson #2: Apparently a lot of people have time synchronization (via NTP or otherwise) set up properly. (The graph makes this clear: connection errors cluster in the first 10 seconds of each minute, which suggests cron jobs or other scheduled scripts.)
article
  • Lesson #3: It is polite to log interesting information at default levels and deliberately dropping a connection for any reason is definitely interesting to system administrators. (They had to enable the debug log level to learn that the MaxStartups limit was being exceeded and that connections were being dropped because of it.)
  • Lesson #4: When you choose specific non-default settings, leave a comment or link to documentation/issues as to why, future people will thank you. (The HAProxy load balancer had become unbalanced because it was configured with balance source, with no comment explaining why. SSH did not need session stickiness, so there was no reason to keep that setting, and they changed it to balance leastconn.)
article
  • Lesson #5: As scary as it looks, MaxStartups appears to have very little performance impact even if it’s raised much higher than the default. (Raising MaxStartups to 250 and increasing the rate limit drove the connection error rate down to 0.001%. After the balance leastconn change was deployed, all connection errors went away.)
  • Lesson #6: Measure the actual rate of your errors as early as possible. (In hindsight, they think they could have identified this specific failure from their initial logs, which showed termination state SD (S = aborted/refused by server, D = session in DATA phase) and 0 bytes read.)
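Lesson #6 can be tried out with a few lines of shell. This is only a sketch, not what GitLab ran: it assumes the logs are in HAProxy's default TCP log format, where the last four fields are bytes read, termination state, connection counts and queue lengths. The function name sd_error_rate and the log path in the usage comment are made up for illustration.

```shell
# Sketch: percentage of connections that ended in termination state SD
# with 0 bytes read. Assumes HAProxy's default TCP log format on stdin.
# sd_error_rate is a hypothetical helper name, not part of HAProxy.
sd_error_rate() {
  awk '
    NF >= 4 {
      total++
      # Counting back from the end of the line:
      # $NF = queues, $(NF-1) = connection counts,
      # $(NF-2) = termination state, $(NF-3) = bytes read.
      if ($(NF-2) == "SD" && $(NF-3) == 0) errors++
    }
    END {
      if (total == 0) { print "0.000"; exit }
      printf "%.3f\n", 100 * errors / total
    }
  '
}

# Usage (hypothetical log path):
# sd_error_rate < /var/log/haproxy.log
```

Counting fields from the end of the line avoids depending on the syslog prefix, which differs between setups.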
article
But the biggest lesson is that when large numbers of people schedule jobs at round numbers on the clock, it leads to really interesting scaling problems for centralized service providers like GitLab.
If you’re one of them, you might like to consider putting in a random sleep of maybe 30 seconds at the start, or pick a random time during the hour *and* put in the random sleep, just to be polite and fight the tyranny of the clock.
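The random sleep suggested above might look like this in a scheduled script. It is a minimal sketch: the function names jitter_seconds and run_with_jitter are made up, the 30-second cap comes from the quote, and the job in the usage comment is only an example. It reads /dev/urandom rather than seeding from the clock, so hosts with synchronized clocks do not all pick the same delay.

```shell
# Sketch: wait a random number of seconds before running a scheduled job,
# so it does not hit the remote service exactly on the minute.
# jitter_seconds and run_with_jitter are hypothetical helper names.

# Print a random whole number of seconds in the range [0, max).
jitter_seconds() {
  max="${1:-30}"
  # Read two random bytes as one unsigned decimal number (0..65535).
  n="$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')"
  echo $(( n % max ))
}

# Sleep for a random delay below the given cap, then run the command.
run_with_jitter() {
  cap="$1"
  shift
  sleep "$(jitter_seconds "$cap")"
  "$@"
}

# Usage (example job): wait up to 30 seconds, then fetch.
# run_with_jitter 30 git fetch --all
```

Picking a random minute for the cron entry itself spreads the load across the hour, and the sleep spreads it further within that minute.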
article

What happens when you open a new install of the following browsers for the 1st time

Tweet
aws

R53 public zones get query volume stats

Site
aws

Interactive DC Map


Site
aws

Lambda 2019 vs 2018 cold starts

Site
aws

Charts (images not included): Coldstarts; Coldstarts Avg in ms; Init in ms; 2018 vs 2019; Improvement 2018 vs 2019; Warm in ms

aws

$64,944 to support 25k customers in August

A full breakdown of ConvertKit’s AWS bill

Site
aws

QLDB now available

Ledger database

Site
aws

omerh/awsctl

Control AWS infrastructure easily from the CLI


    # Get all EC2 events from all regions
    awsctl get ec2 events -r all

    # Delete all unused EBS volumes in all regions
    awsctl delete ebs --region all --yes

    # Set CloudWatch log groups with no expiry to a 14-day retention
    awsctl set cloudwatch --region all --retention 14 --yes
            
Site
cloud

A Manager's Guide to Kubernetes Adoption

Site
cloud

Cloud Design Patterns

Azure Architecture Center

Site
cloud

DigitalOcean Adds Managed MySQL and Redis Services

Site
cloud

HashiCorp announces fully managed service mesh on Azure

Site
cloud

CloudFlare Learning Center

Site
tools

CUElang

CUE is an open source language, with a rich set of APIs and tooling, for defining, generating, and validating all kinds of data: configuration, APIs, database schemas, code, … you name it.

Site
tools

  • Configuration: Managing text-based files to define a desired state of a system.
  • Data Validation: Validate text-based or programmatic data.
  • Schema Definition: Defining schema to communicate an API or standard.
  • Code Generation and Extraction: Converting CUE constraints to and from definitions in other languages.
  • Querying: Find data matching certain criteria.
  • Scripting: Make static data come to life.
            
tools

goaccess

GoAccess - Visual Web Log Analyzer

Live Demo | Site
misc

TabDB

Site

/buzz