Site Reliability Engineer

  • HR
  • Friday, Feb 12, 2021

Our Site Reliability Engineers are a hybrid of software and systems engineers. Our current mission is to design the next version of SGD’s core infrastructure. We code our way out of operational problems. We are responsible for reliability, scalability, and automation while keeping an eye on latency, performance, and capacity.

Responsibilities


  • Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of SGD’s services, incorporating third-party open-source tools when available.
  • Create new designs for a growing number of distributed systems.
  • Design and implement the tools and processes used for deployment and change management.
  • Plan and execute configuration management.
  • Own, maintain, and continuously improve all systems provided as a service, such as monitoring and datastores.
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks.
  • Automate resource provisioning and allocation processes.
  • Run software performance analysis and system tuning.
  • Plan and execute disaster recovery drills.
  • Participate in rotating on-call duties.

Must-Have Qualifications

If you don’t think you meet all of the criteria below but are still interested in the job, please apply. Nobody checks every box, and we’re looking for someone excited to join the team.

  • Fluent in one or more of: Bash, Python, Go, Ruby
  • Minimum of 4 years of industry experience in engineering
  • Familiarity with algorithms, data structures, and complexity analysis
  • In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
  • Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols
  • Experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
  • Experience with Puppet, Ansible, or another configuration management tool
  • Experience with monitoring systems using tools such as Prometheus or Zabbix, and writing health checks
  • Knowledge of continuous integration, testing methodologies, TDD and agile development methodologies
  • Systematic problem-solving approach
  • Strong sense of ownership and drive


  • Technical BS/MS degree or equivalent work experience
  • Experience with virtualization using VMware ESX, vSphere, or vCloud
  • Experience with container technologies such as Docker and Kubernetes
  • Knowledge of server hardware

Nice-to-Have Qualifications

  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Experience with Google Cloud Platform, Microsoft Azure, or Amazon Web Services
  • Experience with SQL (MySQL or MSSQL) tuning and performance
  • Experience with NoSQL (MongoDB, Elasticsearch) tuning and performance
  • Experience with Microsoft Windows Server
  • Experience with Terraform or another Infrastructure as Code tool
  • Understanding of distributed system concepts, including the CAP theorem, microservices, and the Twelve-Factor App
  • Architect, author and deliver software to improve the availability, scalability and security of SGD’s internal cloud infrastructure.
  • Build and manage systems, infrastructure and applications through automation
  • Deploy, support and monitor new and existing services, platforms, and application stacks
  • Use scale testing to measure, tune and optimize system performance

Apply Now

Please send your CV with a recent photo in Word or PDF format (.doc, .pdf) to our email.

📧 Email:

☎️ Tel: 02-2880506