Lead Site Reliability Administrator

March 11 2025
Industries IT: Services
Categories Systems architect, Design
Richmond Hill, ON | Waterloo, ON • Full time

OPENTEXT
OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of digital transformation.


YOUR IMPACT
The role of the Site Reliability Administrator (1-year Contract) is to build solutions to enhance the availability, performance, and stability of OpenText services as well as automate away repetitive work as part of a cloud DevOps organization.
This role would be a great fit for someone with creative and innovative problem-solving skills. You will develop and implement solutions that operate at scale. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers.

WHAT THE ROLE OFFERS
• Use technical knowledge, creativity, and company practices to reduce the occurrence of incidents by developing proactive monitoring and alerting.
• Provide attention to incidents according to Service Level Agreements.
• Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
• Develop runbooks and patterns to sustain applications in a production environment
• Participate in technical discussions and drive transition to sustain activities with the development teams
• Work with IT business and development partners to gather input to develop new capabilities in displaying/monitoring/alerting on key performance indicators (KPIs) by tracking business transactions (BT) in real-time
• Partner with application owners to develop creative and effective solutions to mitigate risk and successfully remediate any audit issues, providing quality and timely responses
• Take ownership and accountability for the incident resolution process, participating in RCA and SWAT investigations.
• Participate in day-to-day real-time advanced-level technical support and troubleshooting on issues reported from the user/customer base.
• Provide guidance in resolving performance-related issues and design solutions for technical issues faced by the application
• Work rotating shifts as needed.
• Participate in the on-call rotation, as the role provides 7x24x365 support.

WHAT YOU NEED TO SUCCEED
• Ability to understand and maintain scripting software, with proficiency in PowerShell expected.
• Strong experience with IIS web servers, WCF services, and Microsoft Windows Server technologies (2016/2019/2025)
• Strong experience with AD, DNS, and F5 load-balancing technologies.
• Good understanding of OCR technology.
• Good understanding of performance and fine-tuning of Windows and web servers.
• Hands-on experience with cloud infrastructure (Google, AWS, or Azure) is a plus.
• Experience with PaaS technologies such as Cloud Foundry, Kubernetes, and BOSH.
• Good understanding of and operational experience with container technologies such as Docker, rkt, and Mesos.
• Good understanding of and working experience with microservices and RESTful architecture.
• Experience with CI/CD and IaC tools like Ansible, Rundeck, Terraform, and GitOps to set up pipelines and provision infrastructure as needed.
• Strong working knowledge of PaaS or application operations best practices.
• Operational understanding of or experience with message brokers such as Apache Kafka or RabbitMQ.
• Operational understanding of or experience with search technologies such as Solr or Elasticsearch.
• Experience with at least one scripting language such as shell, Perl, Python, or JavaScript.
• Experience with installing and configuring Apache, Tomcat, and IIS.
• Experience in supporting Java and .NET applications in large-scale infrastructure.
• Experience with and knowledge of RDBMS and NoSQL databases such as Oracle, Postgres, MariaDB, and Cassandra.
• Experience with APM tools such as New Relic, Dynatrace, or AppDynamics.
• Experience with monitoring tools such as Zabbix or check_mk.
• Knowledge and familiarity with centralized logging systems such as Graylog, Kibana, and cloud logging.
• Strong understanding of ITIL principles; certification is a plus.
• Passion for "getting under the hood" of systems and technologies to understand their inner workings and fix what needs fixing, including diagnosing and troubleshooting user-facing service incidents and outages.
• Knowledge of and familiarity with API gateways such as Apigee and the OAuth 2.0 standard.
• Experience diagnosing and resolving problems in high-throughput web applications and network services.


OpenText's efforts to build an inclusive work environment go beyond simply complying with applicable laws. Our Employment Equity and Diversity Policy provides direction on maintaining a working environment that is inclusive of everyone, regardless of culture, national origin, race, color, gender, gender identification, sexual orientation, family status, age, veteran status, disability, religion, or other basis protected by applicable laws.

If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please contact us at hr@opentext.com. Our proactive approach fosters collaboration, innovation, and personal growth, enriching OpenText's vibrant workplace.

