Join Percona for Google Summer of Code 2025 – Explore, Innovate, and Contribute!
Are you passionate about open-source databases, AI/ML, and security? Do you want to work on real-world projects that impact thousands of developers and enterprises worldwide? Percona is excited to invite students to participate in Google Summer of Code 2025 (GSoC) and help advance our cutting-edge open-source database solutions!
Why Contribute to Percona?
At Percona, we believe that an open world is a better world! GSoC is an excellent opportunity to work with seasoned developers, gain hands-on experience, and contribute to powerful database tools that businesses around the globe use.
For 2025, we’re especially interested in projects that focus on AI/ML and security—two critical areas shaping the future of databases. Whether you’re passionate about automating database performance insights with AI or hardening security for mission-critical data, we have exciting challenges for you!
Project Ideas for GSoC 2025
Below are some suggested project ideas categorized by Percona software:
Percona Distribution for PostgreSQL (4)
Snapshot-based PostgreSQL backups
Database users are often very familiar with their storage provider's snapshot capabilities. Snapshots are handy and performant, hence their popularity. Backups for other databases (e.g., MongoDB) are often configured via this capability because it provides many performance benefits for large-scale data, especially in cloud deployments. Supporting this technology across the backup solutions for multiple databases makes it possible to leverage it effectively for Percona Everest via the Percona Operators. Useful background: comments from Crunchy on what needs to be glued together to get snapshots and pgBackRest working together better, and Timescale on how they use snapshots and pgBackRest together in their hosted managed service.
Deliverables:
Have an API available to Percona Distribution for PostgreSQL to effectively use storage snapshots to create backups and restore from storage snapshot-based backups. Preferably, have it added to/complementary to the currently recommended solution of pgBackRest.
Have Percona Operator for PostgreSQL expose the storage snapshot-based backup/restore so that Percona Everest can leverage it.
Required/preferred skills: C++, PostgreSQL, Kubernetes
Duration: 350 hours
Difficulty level: Hard
Mentors: @Andrew_Pogrebnoi, @Jan_Wieremjewicz
Relevant repository and resources:
pgBackRest to Barman close gap improvements
PostgreSQL has two main backup tools: Barman and pgBackRest. Both are powerful backup and restore tools, each with its own strengths. pgBackRest is generally considered more advanced in terms of parallelism, performance, and flexibility. While pgBackRest is maintained by the community, Barman is maintained mainly by one company and is less popular. Even so, Barman offers UX improvements over pgBackRest, especially for non-expert users:
- direct WAL archiving with PostgreSQL’s built-in archive_command,
- simpler backup and recovery process, especially for standby creation,
- clearer logging and monitoring for backup integrity,
- simpler configuration in small to medium deployments,
- better native tools for cloud backups
It would be beneficial for Percona, which uses pgBackRest in the Percona Distribution for PostgreSQL, to have pgBackRest close any functionality gaps relative to Barman. Percona customers sometimes use Barman and expect Percona to support it, so a way to migrate from Barman to pgBackRest with minimal friction for users would also be beneficial.
Deliverables:
Provide a set of improvements that closes the gaps listed in the description.
Required/preferred skills: C++, PostgreSQL
Duration: 350 hours
Difficulty level: Hard
Mentors: @Andrew_Pogrebnoi, @Jan_Wieremjewicz
Relevant repository and resources:
Tool to investigate PostgreSQL locks for dummies
Currently, no tool lets inexperienced users detect and understand all types of locks on their PostgreSQL database, which may lead to many issues in deployments not managed by expert PostgreSQL users. As described in the blog posts below, understanding how locks work is difficult:
Deliverables:
- Detect DDL statements that mix strong locks with other locks, i.e., let users review locked PIDs in cases where pg_locks alone will not work.
- Present all locks in a GUI.
- (Stretch goal) Integrate the GUI into PMM.
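To illustrate the kind of analysis such a tool could perform, here is a minimal Python sketch that finds root blockers and wait chains. It assumes the tool has already collected, for each session, the list of PIDs blocking it (PostgreSQL's `pg_blocking_pids()` function returns exactly that list). The function names and the sample PIDs are hypothetical, and this is not a proposed design.

```python
# Assumed input: a mapping of pid -> list of PIDs blocking it, e.g. gathered with
#   SELECT pid, pg_blocking_pids(pid) FROM pg_stat_activity;
# The sample data below is made up for illustration.
SAMPLE_BLOCKED_BY = {
    101: [],      # holds a lock, waits for nothing
    202: [101],   # waits for 101
    303: [202],   # waits for 202, which itself waits for 101
    404: [101],   # also waits for 101
}


def root_blockers(blocked_by):
    """Return PIDs that block other sessions but are not waiting themselves."""
    waiting = {pid for pid, blockers in blocked_by.items() if blockers}
    blocking = {b for blockers in blocked_by.values() for b in blockers}
    return sorted(blocking - waiting)


def wait_chain(pid, blocked_by):
    """Follow the first blocker of each session, starting at `pid`."""
    chain = [pid]
    seen = {pid}
    while blocked_by.get(chain[-1]):
        nxt = blocked_by[chain[-1]][0]
        chain.append(nxt)
        if nxt in seen:  # cycle guard, e.g. a deadlock not yet resolved
            break
        seen.add(nxt)
    return chain
```

Calling `root_blockers(SAMPLE_BLOCKED_BY)` points at session 101 as the one to investigate first, and `wait_chain(303, SAMPLE_BLOCKED_BY)` shows how session 303 ends up waiting behind it.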
Required/preferred skills: C++, PostgreSQL
Duration: 350 hours
Difficulty level: Medium
Mentors: Kai Wagner, @Jan_Wieremjewicz
Relevant repository and resources:
Session continuity for PgBouncer for the zero downtime upgrades
Percona is looking to introduce a zero-downtime upgrade capability to the Percona Operator and, later on, to Percona Everest. The plan is to build on PgBouncer and our HA solution, using a replica that runs the new database version and switching from the previous version to the new one.
Such a solution provides both zero-downtime upgrades and a rollback capability. To provide true zero-downtime major upgrades for the current Percona Distribution for PostgreSQL, an improvement is needed that handles switching sessions between the two databases: the previous version and the new version.
In the future, this tool should also make zero-downtime migration to Everest possible.
Deliverables:
Extend PgBouncer so that sessions can be switched between the databases with no downtime for users, only a potential performance drop.
Required/preferred skills: C++, Kubernetes
Duration: 175 hours
Difficulty level: Hard
Mentors: Kai Wagner, @Jan_Wieremjewicz
Relevant repository and resources:
Percona Software for MongoDB (6)
Interactive Shell Installer for Percona Software for MongoDB
This project aims to develop an interactive shell-based installer for Percona Server for MongoDB and Percona Backup for MongoDB. The installer will simplify the installation, configuration, and initial setup process, making it easy for users to deploy these open-source enterprise solutions efficiently. The primary goal is to enhance the user experience by reducing manual setup steps and ensuring proper configuration through guided prompts and automation.
Deliverables:
- A command-line-based interactive installer script.
- Automated dependency checks and installation.
- Interactive prompts for configuration choices (e.g., authentication, replication, sharding).
- Seamless installation of both Percona Server for MongoDB and Percona Backup for MongoDB.
- Integration with package managers for major Linux distributions (Debian, Ubuntu, RHEL, CentOS).
- Logging and validation mechanisms to ensure correct setup.
- Documentation and user guide for the installer.
Required/preferred skills: C++ or Go
Duration: 350 hours
Difficulty level: Medium
Mentors: @radoslaw.szulgo, Anastasia_Alexandrov
Relevant repository and resources:
Percona Backup for MongoDB backup speed throttling
On large-scale deployments, backups may significantly impact network performance. Specifically, if the backup storage is fast, network bandwidth may be heavily utilized, degrading the performance of the database itself. Database reliability engineers would like to reduce the network load by slowing down physical backups through Percona Backup for MongoDB (PBM) configuration. The scope of the project is to implement a network bandwidth rate limiter and perform load testing that shows the impact of rate limiting on backup time.
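One common way to build a bandwidth limiter is a token bucket. The sketch below shows the idea in Python for brevity (PBM itself is written in Go); it is an illustration under assumptions, not PBM code. The names `TokenBucket` and `throttled_copy`, the chunk size, and the injectable clock are all hypothetical choices made for this example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` bytes, then at most `rate` bytes/second."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so tests need not wait
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def consume(self, nbytes):
        """Block until `nbytes` tokens are available, then spend them."""
        if nbytes > self.capacity:
            raise ValueError("chunk is larger than the bucket capacity")
        self._refill()
        while self.tokens < nbytes:
            self.sleep((nbytes - self.tokens) / self.rate)
            self._refill()
        self.tokens -= nbytes


def throttled_copy(src, dst, bucket, chunk_size=64 * 1024):
    """Copy `src` to `dst` in chunks, paying the bucket before each write."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        bucket.consume(len(chunk))
        dst.write(chunk)
        total += len(chunk)
```

With a bucket of 1000 bytes and a rate of 1000 bytes per second, the first chunk of a 5000-byte copy goes through immediately and each following 1000-byte chunk waits about one second. A load test could vary `rate` and measure the resulting backup time against the network headroom left for the database.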
Deliverables:
The expected outcome of this project is assurance that backups will not consume network bandwidth to the point of impacting the database. The participant needs to provide the proposed code changes in the form of a fork of PBM and create a report with load test results.
Required/preferred skills: Go
Duration: 175 hours
Difficulty level: Easy
Mentors: @radoslaw.szulgo, @Boris_Ilijic
Relevant repository and resources:
Percona Backup for MongoDB Golang SDK
The project aims to extend capabilities and reduce maintenance in monitoring, managing, and automating MongoDB backups and restores from the Percona Monitoring and Management (PMM) tool. The project’s scope includes migrating from Percona Backup for MongoDB CLI to a dedicated PBM Golang client library in PMM. The client library (aka SDK) must be implemented to map all current CLI operations to Go API functions. The project can be extended as a stretch goal to implement backup progress reporting using the created SDK.
Deliverables:
The project’s expected outcome is reduced maintenance of the backup integration in the PMM project and easier extension of backup management. As a result of the project, a new open-source SDK should be created.
Required/preferred skills: Go
Duration: 350 hours
Difficulty level: Easy
Mentors: @radoslaw.szulgo, Jakub_Vecera
Relevant repository and resources:
Ceph storage support in Percona Backup for MongoDB
Ceph is open-source software-defined storage (SDS) that is massively scalable and reliable. It’s one of the most popular storage technologies in Kubernetes and OpenShift. The project aims to enable users to store their Percona Backup for MongoDB data on Ceph storage, which would be very convenient because they wouldn’t need to manage additional storage systems. The scope of the project includes building a workspace setup on Kubernetes and Percona Operator for MongoDB, researching current challenges with Ceph storage, and implementing the changes needed to make it work in a performant way. One possible approach is to use the existing Ceph S3 API (called RGW in Ceph) as remote storage for PBM. At the end, the solution should be documented.
Deliverables:
The project’s deliverables are technical documentation, Percona Operator for MongoDB changes, and instructions on setting up an environment with Ceph storage.
Required/preferred skills: Go, Kubernetes
Duration: 175 hours
Difficulty level: Easy
Mentors: @radoslaw.szulgo, Ricardo Dias, Ege_Gunes
Relevant repository and resources:
BoostFS storage support in Percona Backup for MongoDB
Dell Data Domain Boost File System (BoostFS) allows third-party backup and restore applications to take advantage of client-side deduplication of backups and compression of network data during restore via a file-system wrapper over the PowerProtect DDBoost client. In this project, we’d like to extend our open-source Percona Backup for MongoDB to leverage that storage technology to reduce backup and restore time and, at the same time, help users reduce their Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The scope of the project includes preparing a workspace setup with Percona Server for MongoDB, Percona Backup for MongoDB, and mounted BoostFS disk volumes on Google Cloud Platform, and documenting the architecture of the environment. Additionally, the project includes researching how PBM works with that storage and implementing the necessary changes to PBM to make it work. Finally, a simple benchmark should be performed to demonstrate the performance boost.
Deliverables:
The project is expected to deliver an architecture diagram of the testing environment in Google Cloud Platform, the changes required to support BoostFS in PBM, and a report including performance benchmark results and a comparison with other storage systems.
Required/preferred skills: Go, GCP
Duration: 350 hours
Difficulty level: Hard
Mentors: @radoslaw.szulgo, @Boris_Ilijic
Relevant repository and resources:
- GitHub - percona/percona-backup-mongodb: Percona Backup for MongoDB
- Backup procedure | Dell APEX Block Storage for AWS: Backup and Recovery using DDVE and DD Boost Oracle RMAN Agent | Dell Technologies Info Hub
- https://www.dell.com/support/manuals/pl-pl/dd-virtual-edition/dd_p_ddve-gcp_ig/purpose-of-this-guide?guid=guid-015a004c-0518-4a23-a043-39c97ed165f0&lang=en-us
OpenStack Swift storage support in Percona Backup for MongoDB
The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. It’s built for scale and optimized for durability, availability, and concurrency across the entire data set. Swift is ideal for storing unstructured data that can grow without bound. Swift is very convenient as backup storage for MongoDB workloads running on the OpenStack platform. The scope of the project includes building a workspace environment on Google Cloud Platform with OpenStack clusters and running Percona Server for MongoDB there, then implementing the required changes in Percona Backup for MongoDB to support Swift storage, and finally performing benchmark tests and a comparison with GCP native storage.
Deliverables:
The project is expected to deliver an architecture diagram of the testing environment in Google Cloud Platform, the changes required to support OpenStack Swift in PBM, and a report including performance benchmark results and a comparison with other storage systems.
Required/preferred skills: Go, GCP, OpenStack
Duration: 175 hours
Difficulty level: Medium
Mentors: @radoslaw.szulgo, @Boris_Ilijic
Relevant repositories and resources:
- GitHub - percona/percona-backup-mongodb: Percona Backup for MongoDB
- GitHub - openstack/swift: OpenStack Storage (Swift). Mirror of code maintained at opendev.org.
- GitHub - ncw/swift: Go language interface to Swift / Openstack Object Storage / Rackspace cloud files (golang)
- Configure your clusters to use OpenStack | Google Distributed Cloud (software only) for bare metal | Google Cloud
Percona Server for MySQL and Percona XtraDB Cluster (1)
Automating Code Merges with AI
Regularly merging changes from Oracle’s GitHub repository by hand is time-consuming, complex, and prone to errors, particularly due to merge conflicts. Careful attention is required to avoid introducing regressions into Percona’s open-source products. While this project is specific to Percona’s needs, it addresses a common challenge in open-source software development, as many projects rely on upstream repositories for their code. Therefore, the solution can be generalized and could benefit other open-source projects with similar code integration needs.
This GSoC project aims to develop an intelligent system using Artificial Intelligence to automate the MySQL fork merge process. Percona has been performing these merges for 18 years, accumulating a wealth of historical data (code changes, merge resolutions, conflict histories, test results) that can be leveraged to train an AI model.
The core objective is to create a tool that can:
- Analyze upstream changes: Process and understand the changes introduced by Oracle in their MySQL repository.
- Identify merge conflicts: Identify conflicts between upstream changes and Percona’s modifications.
- Suggest merge resolutions: Propose solutions for resolving identified conflicts, drawing on patterns from historical merge data.
- Automate merges: Automatically apply upstream changes with the suggested merge resolutions.
- Learn and adapt: Continuously improve its performance and accuracy by learning from new merge data and feedback.
Deliverables:
- Reduced merge time and effort: Automating the merge process will free up developer time for other critical tasks.
- Improved merge accuracy: AI can potentially identify subtle conflicts that might be missed by manual review.
- Faster release cycles: Streamlining the merge process will enable quicker releases of updated Percona products.
- Open-source contribution: The resulting tool will be open-sourced, benefiting other projects that maintain forks of MySQL or similar databases. This problem is not unique to Percona; other open-source projects facing similar merging challenges can utilize this solution.
As a result of this project, you’re expected to deliver:
- A working prototype of the AI-powered merge tool.
- Well-documented code and training data.
- Comprehensive test suite and evaluation results.
- A report detailing the project’s methodology, findings, and future directions.
Required/preferred skills: Python, Machine Learning libraries and frameworks (e.g., TensorFlow, PyTorch, scikit-learn), C++, Git, database systems
Duration: 350 hours
Difficulty level: Hard
Mentors: Julia Vural, Oleksiy Lukin
Relevant repositories and resources:
Percona Everest (5)
Easier Troubleshooting on database clusters in Percona Everest
The main goal is to provide tools to Percona Everest users to troubleshoot database clusters. This project will require the implementation of log collection, rotation, UI, and possibly an AI helper to analyze those logs. If users have a centralized log collection implemented, this tool needs to be able to integrate with it.
Deliverables:
Full user flow to support database cluster troubleshooting process (UI, backend, API, integrations).
- Log Collection & Rotation System:
  - Implement a mechanism to collect logs from Percona Everest-managed database clusters.
  - Ensure efficient log rotation to manage storage and performance impact.
  - Enable compatibility with external log aggregation tools (e.g., Elasticsearch, Grafana Loki, or OpenTelemetry).
- User Interface for Log Access:
  - Develop a UI within Percona Everest to allow users to view and analyze logs.
  - Include search, filtering, and visualization options for better troubleshooting.
- AI-Powered Log Analysis (stretch goal):
  - Explore AI-driven log analysis to provide users with insights, anomaly detection, and recommendations.
  - Implement basic AI-assisted troubleshooting if feasible within the project timeline.
- Documentation & Testing:
  - Deliver user and developer documentation covering installation, usage, and troubleshooting.
  - Include test cases and automation scripts to ensure system reliability.
Required/preferred skills: Kubernetes, Go, CI/CD
Duration: 350 hours
Difficulty level: Medium
Mentors: @Diogo_Recharte, @Mayank_Shah
Relevant repository and resources: https://github.com/percona/everest
Percona Everest RBAC policies management UI
Create a user interface to create and manage role-based access control policies.
Deliverables:
- Role-Based Access Control (RBAC) UI:
  - Develop a user-friendly interface in Percona Everest to create, update, and manage RBAC policies.
  - Implement role assignment and permission configuration for database clusters.
- Documentation & Testing:
  - Deliver comprehensive user and developer documentation.
  - Include test cases and automation scripts to ensure reliability.
Required/preferred skills: Front-end, CI/CD tools
Duration: 90 hours
Difficulty level: Medium
Mentors: @Diogo_Recharte, Peter Szczepaniak
Relevant repository and resources: https://github.com/percona/everest
Context sensitive help
The Percona Everest documentation contains valuable information, hints, and tips, but we lack a way to present relevant information to our users. This project aims to work with the UX and Docs teams to solve this problem.
Deliverables:
- Implement a mechanism to display relevant documentation, hints, and tips based on the user’s current action or screen within Percona Everest.
- Ensure seamless integration with the existing UI for a non-intrusive experience.
- Enable contextual tooltips, pop-ups, or side panels that present relevant documentation without requiring users to leave the interface.
- Support links to full documentation pages when needed.
- Optionally, explore AI-driven suggestions based on user behavior and past queries.
- Allow users to control the level of help they receive (e.g., enable/disable tips, adjust verbosity).
- Provide user and developer documentation on how the system works and how to extend it.
- Ensure thorough testing to validate the accuracy and relevance of displayed help content.
Required/preferred skills: Front-end, CI/CD tools
Duration: 175 hours
Difficulty level: Medium
Mentors: @Diogo_Recharte, Peter Szczepaniak
Relevant repository and resources: https://github.com/percona/everest
Backups and restore timeline visualization
Databases are usually long-living services, and investigating issues with them is easier when you can see events like backups and restores of this service on a timeline.
Deliverables:
- Develop a visual timeline within Percona Everest to display backup and restore events for database clusters.
- Ensure the timeline is intuitive, zoomable, and supports different time ranges (e.g., last 24 hours, 7 days, custom range).
- Retrieve and display backup and restore events from Percona Everest’s database and logs.
- Include metadata such as timestamps, duration, status (success, failure), and associated users or processes.
- Allow users to filter events by type (full backup, incremental backup, restore, etc.).
- Enable color-coding or icons to differentiate event types at a glance.
- Deliver comprehensive user and developer documentation.
- Ensure automated tests for data accuracy, UI performance, and usability.
Required/preferred skills: Front-end, CI/CD tools
Duration: 175 hours
Difficulty level: Medium
Mentors: @Diogo_Recharte, Peter Szczepaniak
Relevant repository and resources: https://github.com/percona/everest
Refactor test automation using page object model
Our project currently has a functional end-to-end (E2E) UI test suite that ensures the stability and correctness of our application. However, the test suite does not follow the Page Object Model (POM) design pattern, making it harder to maintain, scale, and debug.
Deliverables:
- Restructure existing test automation to follow the Page Object Model (POM) design pattern.
- Ensure better separation of test logic and UI elements for improved maintainability.
- Implement modular and reusable page object classes for different UI components and workflows.
- Standardize naming conventions and best practices for test scripts.
- Improve error handling and logging to make test failures easier to diagnose.
- Ensure the refactored test suite runs efficiently in CI/CD pipelines.
- Validate test performance improvements and maintain test coverage.
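As a rough illustration of the pattern, the Python sketch below separates test logic from UI details. The listed skills suggest the real suite uses Playwright with TypeScript, so this shows only the idea, not the project's code. `FakePage` is a stand-in for a browser driver, and the page class, selectors, and cluster name are hypothetical.

```python
class FakePage:
    """Stand-in for a browser driver: records actions instead of driving a browser."""

    def __init__(self):
        self.actions = []

    def fill(self, selector, value):
        self.actions.append(("fill", selector, value))

    def click(self, selector):
        self.actions.append(("click", selector))


class CreateClusterPage:
    """Page object: the only place that knows this screen's selectors."""

    NAME_INPUT = "[data-testid='cluster-name']"      # hypothetical selector
    SUBMIT_BUTTON = "[data-testid='create-cluster']"  # hypothetical selector

    def __init__(self, page):
        self.page = page

    def create(self, name):
        """Expose a user-level action so tests never touch selectors."""
        self.page.fill(self.NAME_INPUT, name)
        self.page.click(self.SUBMIT_BUTTON)


def run_create_cluster_scenario():
    """Test logic reads as user intent; UI details stay in the page object."""
    page = FakePage()
    CreateClusterPage(page).create("demo")
    return page.actions
```

If a selector changes, only `CreateClusterPage` needs an update, while every test that calls `create()` stays untouched. That is the maintainability gain the refactoring targets.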
Required/preferred skills: Playwright, Typescript, Kubernetes
Duration: 175 hours
Difficulty level: Medium
Mentors: @Diogo_Recharte, Tomislav_Plavcic
Relevant repository and resources: https://github.com/percona/everest
Percona Build Engineering (3)
Evolving CI/CD: Automating Build, Test, and Release for Robust Software Delivery
Continuous Integration and Continuous Deployment (CI/CD) pipelines are the backbone of modern software development, ensuring rapid, reliable, and repeatable delivery. However, many pipelines still operate in fragmented stages, where builds and tests are automated, but releases remain a manual or semi-automated process. This project aims to transform our CI/CD pipelines into a true end-to-end automated system, seamlessly integrating build, test, and release stages. By implementing best practices in CI/CD automation, we will ensure that only thoroughly tested software progresses to release, minimizing human intervention and reducing the risk of deployment failures.
Deliverables:
The successful completion of this project will result in a fully automated and robust CI/CD pipeline that seamlessly integrates build, test, and release processes. The key outcomes will include:
- Fully Automated CI/CD Pipeline:
  - A redesigned pipeline where builds, testing, and releases are interconnected and automated.
  - Code changes will automatically trigger builds, run tests, and, if successful, deploy releases without manual intervention.
- Comprehensive Test Integration:
  - The pipeline will incorporate unit tests, integration tests, security scans, and other quality assurance mechanisms.
  - Faulty builds will be kept out of production by enforcing test-driven deployment.
- Automated Release Process:
  - A mechanism that automatically releases software only if all tests pass.
  - Versioning, tagging, and artifact management will be streamlined.
  - The release process will be documented and configurable for different environments (e.g., staging, production).
- Infrastructure as Code (IaC) & Deployment Automation
- Documentation & Guides:
  - Clear technical documentation detailing the new pipeline’s workflow and configuration.
  - A step-by-step guide for developers and DevOps engineers on how to use and extend the pipeline.
Required/preferred skills: CI/CD tools like Jenkins, GitHub Actions, GitLab CI, or similar; Docker; testing frameworks and automated deployment strategies; experience with infrastructure as code (IaC) and cloud environments is a plus.
Duration: 350 hours
Difficulty level: Medium
Mentors: @Evgeniy_Patlan, @Vadim_Yalovets
Relevant repository: GitHub - Percona-Lab/jenkins-pipelines
Build Automation for Open-Source Databases
Building and maintaining multiple database forks—such as MySQL, MongoDB, and PostgreSQL—often involves redundant build scripts, leading to inefficiencies, inconsistencies, and maintenance overhead. Currently, each database has its own set of build scripts despite sharing many common steps.
This project aims to develop a modular, extensible build system that allows for streamlined compilation and packaging of different database forks. The system will provide a flexible framework where users can select required modules, specify target OS distributions, and automate the build process with minimal configuration.
By implementing a plugin-based architecture, this modular builder will simplify cross-database maintenance, reduce duplication, and improve consistency across different builds.
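A plugin-based builder can be pictured as a registry of named steps that a build configuration selects from. The Python sketch below is one possible shape under assumptions, not the design Percona has in mind: the step names, the `plan_build` function, and the strings it returns are hypothetical. The steps only describe what they would do instead of running real builds.

```python
BUILD_STEPS = {}  # step name -> function that plans one build step


def build_step(name):
    """Decorator that registers a function as a named, selectable build step."""
    def register(func):
        BUILD_STEPS[name] = func
        return func
    return register


@build_step("fetch-source")
def fetch_source(ctx):
    return f"fetch {ctx['product']} {ctx['version']}"


@build_step("compile")
def compile_source(ctx):
    return f"compile {ctx['product']} on {ctx['target_os']}"


@build_step("package-deb")
def package_deb(ctx):
    return f"package {ctx['product']}_{ctx['version']}.deb"


@build_step("package-rpm")
def package_rpm(ctx):
    return f"package {ctx['product']}-{ctx['version']}.rpm"


def plan_build(product, version, target_os, steps):
    """Return the planned actions for the selected steps, in the given order."""
    unknown = [s for s in steps if s not in BUILD_STEPS]
    if unknown:
        raise KeyError(f"unknown build steps: {unknown}")
    ctx = {"product": product, "version": version, "target_os": target_os}
    return [BUILD_STEPS[s](ctx) for s in steps]
```

Shared steps such as fetching and compiling are written once, and a new database fork or package format only adds another registered function. For example, `plan_build("postgresql", "17.0", "debian", ["fetch-source", "compile", "package-deb"])` yields the three planned actions in order.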
Deliverables:
- Modular Build Framework – A reusable, pluggable system that dynamically selects required modules for MySQL, MongoDB, and PostgreSQL builds.
- Multi-OS Support – Automated builds for multiple Linux distributions (Debian, Ubuntu, CentOS, RHEL) with configurable OS selection.
- Automated Package Creation – DEB and RPM package generation with standardized versioning and tagging.
- Configurable & Scalable Builds – Easy customization of build parameters, allowing extension to new database forks or patches.
- CI/CD Integration – Optional support for Jenkins, GitHub Actions, or GitLab CI to enable fully automated builds.
- Comprehensive Documentation – User and developer guides with example configurations for quick adoption and extension.
Required/preferred skills: Bash/Python, CMake, Makefiles, Autotools, Linux and packaging (DEB/RPM), dependency management; CI/CD tools are a plus
Duration: 175 hours
Difficulty level: Medium
Mentors: @Evgeniy_Patlan, @Vadim_Yalovets
Relevant repositories:
SBOMs for Percona database software - MySQL, PostgreSQL, and MongoDB
A “software bill of materials” (SBOM) has emerged as a key building block in software security and software supply chain risk management. An SBOM is a nested inventory, a list of ingredients that comprise software components. The project aims to adapt Percona’s build pipelines to generate SBOMs for Percona Software for MySQL, PostgreSQL, and MongoDB. This will help organizations using Percona software be more secure and avoid software supply chain vulnerabilities such as those exposed by the SolarWinds cyberattack discovered in late 2020 and, later, the Log4j security flaw.
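To make the "nested inventory" idea concrete, here is a small Python sketch that walks a nested component list and flattens it into one line per ingredient. The dictionary layout loosely follows the nested-components style used by SBOM formats such as CycloneDX, but it is not a valid document in any SBOM format, and all package names and versions are made up.

```python
def flatten_components(component, depth=0):
    """Yield (depth, name, version) for a component and everything nested in it."""
    yield depth, component["name"], component.get("version", "")
    for child in component.get("components", []):
        yield from flatten_components(child, depth + 1)


# Hypothetical inventory: a product, its direct dependencies, and a transitive one.
EXAMPLE_SBOM = {
    "name": "example-database-server",
    "version": "1.0.0",
    "components": [
        {
            "name": "libexample-crypto",
            "version": "3.0.1",
            "components": [{"name": "libexample-math", "version": "0.9"}],
        },
        {"name": "libexample-compress", "version": "1.3"},
    ],
}


def render(sbom):
    """Render the inventory as indented 'name version' lines."""
    return ["  " * depth + f"{name} {version}"
            for depth, name, version in flatten_components(sbom)]
```

A flattened list like this is what makes it practical to answer "is any product shipping the affected library version?" when a new vulnerability is announced.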
Deliverables:
At the end of the project, a running staging pipeline in Jenkins and Trivy should produce complete SBOMs for Percona Server for MySQL, PostgreSQL, and MongoDB, Percona Backup for MongoDB, and Percona XtraBackup for MySQL. SBOMs should be uploaded automatically to the Percona repository and be publicly downloadable. Additionally, technical documentation on how the process works is expected to be created.
Required/preferred skills: Jenkins, Trivy
Duration: 175 hours
Difficulty level: Easy
Mentors: @radoslaw.szulgo, @Evgeniy_Patlan, @Jan_Wieremjewicz
Relevant repository and resources:
See additional project ideas for Percona Monitoring and Management below in the “solution post”.
GSoC isn’t just about working on predefined ideas—it’s about innovation! If you have a project idea that aligns with Percona software, AI/ML, security, or database performance, submit your proposal, and our mentors will be happy to discuss it with you.
Do you have questions? Visit our Community Forum or join our chat channels to connect with potential mentors.
Ready to get started? See our Google Summer of Code 2025: Contribution guide.
See you in GSoC 2025!