AI/HPC Data Cluster System Administrator
Join our state-of-the-art AI datacenter group to enable future HPC systems research.
What you will do
System architecture innovations are key to positioning imec for success in the fast-evolving workloads of tomorrow and to differentiating imec’s process technology innovations with system-level value propositions. The Compute System Architecture (CSA) group at imec leads research on futuristic hardware-software co-design for compute systems to extend imec’s semiconductor research leadership deep into the next decade.
The CSA unit is building its own high-performance compute cluster for system research experimentation. We rely on this HPC/AI cluster to drive our research and succeed with consumers all over the world.
We are seeking an experienced Unix System Administrator who is eager to use and grow his or her technological skills on a worldwide stage. In this role, the successful candidate builds, maintains, and manages the HPC/AI cluster, works out solutions to support the research, provides training, and assists in developing an overall research IT strategy. From early on, you help empower our breakthrough innovations. You can expect to be given challenging assignments, to follow up on and research state-of-the-art cluster management techniques, and to take ownership of and responsibility for the HPC/AI cluster.
Objectives of this Role
- Linux HPC sysadmin
- Hardware maintenance of the HPC system
  - Maintain complex setups involving InfiniBand, Lustre, and various hardware accelerators in servers and workstations (GPU, FPGA, …).
  - Monitor datacenter health using preexisting management tools and respond to hardware issues as they arise; help build, test, and maintain new systems as needed.
  - Help install and maintain servers and components.
- Software maintenance of the HPC system
  - Install and maintain complex, optimized software stacks across a variety of machines.
  - Install and maintain scientific software packages through configurable module systems (e.g., EasyBuild).
  - Install and maintain container and virtualization systems, and support users in creating and running containers.
  - Install and maintain job scheduling system(s) for varied workloads.
- Perform server administration tasks, including user/group administration, security permissions, group policies, print services, investigation of event-log warnings and errors, and resource monitoring, ensuring that system architecture components work together seamlessly.
- Interact closely with central ICT management.
- Perform routine/scheduled audits of the systems, including all backups.
- Proactively follow up on and experiment with the newest trends in high-performance computing systems (tools, hardware, software, …) to continuously support and improve the research.
What we do for you
We offer you the opportunity to join one of the world’s premier research centers in nanotechnology at its headquarters in Leuven, Belgium. With your talent, passion and expertise, you’ll become part of a team that makes the impossible possible. Together, we shape the technology that will determine the society of tomorrow.
We are committed to being an inclusive employer and are proud of our open, multicultural, and informal working environment, with ample opportunities to take initiative and show responsibility. We commit to supporting and guiding you in this process, not only with words but also with tangible actions. Through imec.academy, 'our corporate university', we actively invest in your development to further your technical and personal growth.
We are aware that your valuable contribution makes imec a top player in its field. Your energy and commitment are therefore rewarded with a market-appropriate salary and many fringe benefits.
Who you are
- You preferably hold a Bachelor's, Master's, or PhD degree in Computer Science, Physics, or Engineering, with 2-5 years of experience in related areas.
- Expert in Linux system administration.
- Proven work experience in IT.
- Experience with programming languages (Python), operating systems (Ubuntu Linux, Red Hat), current equipment and technologies, system performance-monitoring tools, containers, and virtualization.
- Expertise in creating, analyzing, and supporting large-scale distributed systems.
- Practical experience with Lustre, EasyBuild, NVIDIA compute software stacks, or MPI is considered a plus.
- Passion for following up on the state-of-the-art in the field of HPC systems and experimenting with new and upcoming technologies to improve efficiency at all levels.