Site Reliability Engineer - Remote

PayNearMe - Santa Clara

new offer (28/06/2024)

Job Description
What you'll be responsible for:
Infrastructure Management:
Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code, ensuring high availability and performance.
Kubernetes and Containers:
Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker. Implement best practices for container orchestration and management.
Systems and Application Monitoring/Observability:
Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health.
SLO and SLA Management:
Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure reliable and consistent service delivery.
Incident Response and Troubleshooting:
Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence. Participate in post-incident reviews and contribute to blameless postmortems.
Reliability and Production Environment Management:
Ensure the reliability and stability of our production environments. Continuously assess and improve system reliability, identifying and addressing potential points of failure.
Automation and Scripting:
Develop automation scripts and tools to reduce manual intervention and improve system reliability using Python, Bash, or Go. Implement and improve CI/CD pipelines.
CI/CD Pipeline Management:
Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI. Ensure seamless and reliable deployment processes.
Capacity Planning and Scaling:
Assist in capacity planning and ensure that systems are scalable to meet future demands. Implement auto-scaling strategies where applicable.
Security and Compliance:
Implement security best practices and ensure compliance with industry standards. Regularly review and update security policies and procedures.
Collaboration and Support:
Work closely with development teams to ensure reliability and scalability of new features and services. Provide technical support and guidance on infrastructure-related issues.
Software Engineering for Operations:
Develop and maintain internal tools and services that enhance the efficiency and reliability of our operations.
On-Call Rotation:
Participate in an on-call rotation to address production issues and collaborate in incident response efforts.
