Job Details

Site Reliability Engineer

2025-07-18 Float LLC all cities, AK

Description:

Who We Are

Float is the leading resource management software for professional services teams. Since 2012, we've grown every year—independently, self-funded, and profitably. We're rated #1 for resource management on G2 and trusted by 4,500+ customers worldwide.

As a certified B Corporation, we're committed to making a positive impact on our team, customers, the environment, and the remote community. Our 50+ person team works 100% remotely across the globe, with perks and benefits designed to support us in living our Best Work Life. You'll collaborate with teammates across Australia, Mexico, the UK, Nigeria, Canada, and the US. Learn more about our data security practices for employment or service contracts here. Browse our blog to get a glimpse of life at Float and check out our Glassdoor employer reviews. See why our customers love Float on G2.

We're on a scale-up journey, and we're seeking people who thrive in this stage. We want Float to be the place where you have the autonomy and opportunity to do the best work of your career.

Why We're Hiring For This Role

Float's infrastructure has grown rapidly, meaning more customers, more complex systems, and more opportunities to build for scale. As the scale of our systems increases, we're growing our SRE team to match. You'll be the third site reliability engineer, and will be working alongside our QA team. This role is about stepping into a high-impact space: helping us automate smarter, improve visibility across engineering, and ensure reliability as we scale. You'll join a team that's laying the groundwork for stronger SLAs and an even better experience for our customers.

This role will report to Chris, our Team Lead for SRE & QA. Check out this video, where he explains the important role you will play within our SRE team.

You'll be working asynchronously with a bright, dedicated team from across the globe, with a strong focus on taking complex problems and creating solutions that feel simple and intuitive for our customers.

What You'll Be Responsible For

Early on, you'll jump right into:

Upgrade paths: Maintain and validate the processes that keep our Kubernetes infrastructure up to date, ensuring upgrades happen smoothly, safely, and regularly.
Service hygiene: Remove noisy, unused, or misfiring boot alerts and improve the team's ability to trust alerts as meaningful signals.
Service integration: Partner with engineers to configure services within our clusters and support service migrations where possible.
Kubernetes optimisation: Review and optimise usage across Kubernetes services, including right-sizing scale node specifications.

Once you are a bit more settled, we expect that you will jump into the following projects:

Service mesh & ingress security: Lead our exploration and implementation of service mesh options and harden ingress layers to defend against spam and abuse.
Incident response playbooks: Define and roll out standardised playbooks to improve clarity and speed during production incidents.
CDC layer support: Build deep familiarity with our next-gen data layer (CDC) to support new teams building on top of it.
SLO coaching & support: Help teams define, measure, and meet reliability goals—enabling engineering to own quality into production and drive better outcomes for customers.

What You'll Need To Be Successful

We want you to love your work, and we believe these skills will help you succeed in the role:

Bash + programming language: Confident writing scripts in Bash and proficient in at least one go-to language (ideally PHP, NodeJS, or Python).
Kubernetes: Strong production experience managing and optimising Kubernetes clusters.
Terraform: Solid understanding of infrastructure as code using Terraform.
GCP: Familiarity with Google Cloud Platform, or eagerness to get up to speed quickly.
Iteration mindset: You believe in shipping value early and improving over time, not chasing one-shot perfection.
Written communication: You write clearly and concisely, whether it's documenting infrastructure, proposing changes, or sharing learnings across teams.

Our SRE growth framework details the key competencies and expectations needed for this role. Take a look at the Level 2 column to learn more about what you'll need to be successful in the role, in addition to the technical skills outlined above.

As a fully remote team, we're looking for someone comfortable with asynchronous communication as the default, which means you have previous remote experience and are comfortable using tools like Slack, Loom, and Linear to communicate as needed. Don't worry—you will have significant deep work time since we have very few meetings.

Why Join Us

Pay for this role is US $133,000 (Level 2). Here's a blog post with more information on how we determine our salaries.

We're a global async remote company with a diverse team of people from all over the world who share a common belief in living our best work life. We believe deeply in transparency and share our Float Handbook publicly so potential new team members can see firsthand our perks and benefits as well as our ways of working. If you feel you can thrive at Float and do your best work here, we would love to hear from you.

Hiring Process For This Role

You'll find a lot of useful information about our interview process and what it's like to join our global team on the Float careers page. We also wrote a blog post with 10 tips for applying to a role at Float, and we highly recommend reading it before you apply!

The hiring process for this role looks like this:

Initial First Meet (20 min): You'll meet with Julia, our Talent Manager, to discuss your interest in the role and review your questions about working at Float.

Manager Interview (45 min): You'll meet with Chris, our SRE Team Lead, to discuss how your background and experience make you a great fit for this role.

Co-Worker Interview (30 min): You'll meet with Bogdan, our Site Reliability Engineer, to dive deeper into your goals and to learn more about your alignment with our values and ways of working.

Take-home assignment (2 hours, paid): You'll complete a take-home technical assignment that the hiring team will review. You will be paid an honorarium after completion of your take-home assignment, and will receive feedback on your assignment regardless of the outcome.

Founder Interview (30 min): You'll meet with Lars, our CTO and Co-Founder, so he can get to know you and see whether you have the potential to be a great addition to the team.

Note: Industry research shows that women and those in traditionally underrepresented groups generally don't apply to jobs unless they check all the boxes for the role. If you feel strongly that you have what it takes for this role but don't check 100% of the boxes—that's okay—we encourage you to apply anyway and highlight what you can bring to the table.
