CSEP Gitlab


SCEC's CSEP project is coordinating an earthquake and ground-motion forecasting project with national and international partners. Project members want to use GitLab to support the CSEP open-source scientific software community by providing code development, distributed version control, and an automated software testing environment. The expected system requirements and preliminary estimates of data creation and transfer are given below.

CSEP CI/CD System Overview Diagram

"Gitlab Server and Runner"

Data Usage Estimates

Data usage estimates combine the two use cases for the GitLab repository: (1) code storage and (2) storage of large data files and experiment results. They are based on data from the current CSEP project, such as the number of project members and the storage used in current and old repositories, together with rough estimates of expected data usage for past and future experiments.

  1. Amount of data generated internally by the system per year, most of which can be left on AWS storage: 5 TB/year
  2. Amount of data transferred into AWS: 1 TB/year
  3. Amount of data transferred out of AWS: 2.5 TB/year

Data Estimates Breakdown

Project Info:
CSEP Project Members: 20
CSEP1 Repo Size: 1.2 GB
Storage per experiment: 10 GB

(1) Code Storage
Total data stored: 25 GB (high-end estimate including models, codes, and benchmark data sets)
Data transferred: [50-500] GB

(2) Data/Catalog Storage
Total storage: 250 GB
Data transferred: [100 GB - 1 TB]

(3) Experiment Results
Total storage: 4 TB
Data transferred: [100 GB - 1 TB]

Overall Estimates:
Total stored: ~4.5 TB
Transferred: [250 GB - 2.5 TB]
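
The overall figures can be reproduced by summing the three categories. The following is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB), which this page does not state explicitly; the dictionary keys and the `summarize` helper are illustrative names only.

```python
# Storage and transfer figures copied from the breakdown above, in GB.
# Assumption: 1 TB = 1000 GB (decimal units).
GB_PER_TB = 1000

categories = {
    "code": {"stored": 25, "transfer": (50, 500)},
    "data_catalogs": {"stored": 250, "transfer": (100, 1 * GB_PER_TB)},
    "experiment_results": {"stored": 4 * GB_PER_TB, "transfer": (100, 1 * GB_PER_TB)},
}


def summarize(cats):
    """Return (total stored, low transfer, high transfer), all in GB."""
    total_stored = sum(c["stored"] for c in cats.values())
    low = sum(c["transfer"][0] for c in cats.values())
    high = sum(c["transfer"][1] for c in cats.values())
    return total_stored, low, high


stored_gb, low_gb, high_gb = summarize(categories)
print(f"Total stored: {stored_gb / GB_PER_TB:.3f} TB")
print(f"Transferred: {low_gb} GB to {high_gb / GB_PER_TB:.1f} TB")
```

Under these assumptions the stored total comes to 4.275 TB, which the page rounds up to ~4.5 TB, and the transfer range matches the 250 GB to 2.5 TB quoted above.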

AWS Estimation Webpage

Gitlab Basic System Requirements

  • Recent Linux distribution (Ubuntu, CentOS, ...)
  • Generate SSH keys
  • Configure SMTP Server
  • 8 GB RAM is the recommended minimum memory size for all installations and supports up to 100 users. 16 GB RAM supports up to 500 users.
  • Databases: PostgreSQL
  • Redis/Sidekiq: Redis stores all user sessions and the background task queue; Sidekiq processes the background jobs with a multithreaded process
  • Prometheus and its exporters
  • Avoid installing GitLab Runner on the same machine where GitLab is installed.
  • GitLab needs JavaScript enabled in browsers to support features such as Issue Boards.
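
As a rough illustration of the memory guidance in the list above, the sketch below maps a user count to the smallest RAM size quoted there. The `RAM_TIERS` table and the `recommended_ram_gb` helper are hypothetical names, and the function only encodes the two tiers stated on this page; it is not a GitLab sizing tool.

```python
# RAM tiers quoted in the requirements list: (maximum users, RAM in GB).
RAM_TIERS = [(100, 8), (500, 16)]


def recommended_ram_gb(users):
    """Return the smallest quoted RAM size covering `users`.

    Returns None when the user count exceeds the tiers quoted on this page.
    """
    for max_users, ram_gb in RAM_TIERS:
        if users <= max_users:
            return ram_gb
    return None


# 20 is the CSEP project member count given in the data estimates.
print(recommended_ram_gb(20))
```

With the 20 project members listed in the data estimates, the 8 GB minimum tier applies.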

CSEP Computer and Storage Inventory

Overview and Gitlab Installation

AWS Estimate

The follow-up information below gives a better idea of the setup, general background, and pricing of the services.

  1. Running GitLab on AWS:
    • GitLab has fantastic documentation to get you set up and running. Using their Omnibus package or a Marketplace listing is a quick and easy way to get started. GitLab has different Marketplace listings depending on which license is used; the community version is linked.
    • It does require setup and maintenance on your end; it is not a fully managed service. You would provision the GitLab instance and the Runner instance and complete their respective software installation and setup.
  2. Pricing GitLab on AWS:
    Factors to consider:
    • Storage:
      • S3 (Simple Storage Service) is a great service for storing your data. It is highly scalable, durable, and available object storage where you don’t need to provision anything. It’s significantly cheaper than holding the data in volumes attached to your instances.
      • The EBS (Elastic Block Store) volumes are charged based on their provisioned size and are resizable, so you can start off smaller and resize if needed.
    • Compute:
      • EC2 instance savings plans can greatly reduce your compute cost, ranging from around 30% to 60% compute savings over the on-demand instances shown in the price estimates. It varies based on the commitment time (1 or 3 years) and payment options (no upfront to full upfront).
      • GitLab has a guide on autoscaling Runners on AWS: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
    Pricing estimates:
  3. Serverless Managed CI/CD Solution, CodePipeline:
    • Fully managed CI/CD pipeline solution where you would just manage the users.
      • Source Control: AWS CodeCommit is a private Git service that can securely store your source code, binaries, and application assets.
      • Build: AWS CodeBuild allows you to build and test your application in preconfigured or custom environments.
      • Deploy: AWS CodeDeploy automates software deployments to your AWS or on-premises servers.
    • Usage-based pricing; there are no servers to provision or maintain.
    • Free-tier eligible, so you can test out the services for free.
    • Natively integrates with other AWS services and control plane.
    • More information about CodePipeline: https://aws.amazon.com/codepipeline/
    Pricing estimates:
    • $2500/year
    • I’ve attached a spreadsheet whose input numbers (shown in red) can be modified to get an idea of how much this solution will cost.
    • The 5TB of storage on S3 with 2.5TB of data transfer out per year ($1821/year) would be in addition to CodePipeline costs. For example, if the spreadsheet outputs $650/year, the total with S3 data storage, transfer, and CI/CD would be $2471/year.
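
The pricing arithmetic in this section can be sketched as follows. The first helper reproduces the S3-plus-CI/CD total from the last bullet; the second applies the "around 30% to 60%" savings-plan range from the Compute notes to an on-demand figure. Both function names are illustrative, and the 1000 USD/year on-demand cost is a made-up input, since this page gives no on-demand price.

```python
# Figure quoted above: 5 TB stored on S3 plus 2.5 TB transferred out per year.
S3_USD_PER_YEAR = 1821


def total_annual_cost(cicd_usd_per_year, s3_usd_per_year=S3_USD_PER_YEAR):
    """Add the S3 storage/transfer cost to a CI/CD cost from the spreadsheet."""
    return cicd_usd_per_year + s3_usd_per_year


def savings_plan_range(on_demand_usd_per_year, low_pct=30, high_pct=60):
    """Return (lowest, highest) yearly compute cost under the quoted savings range."""
    lowest = on_demand_usd_per_year * (100 - high_pct) / 100
    highest = on_demand_usd_per_year * (100 - low_pct) / 100
    return lowest, highest


# The page's own example: a spreadsheet output of $650/year.
print(total_annual_cost(650))  # 2471

# Hypothetical on-demand compute cost of 1000 USD/year (not from this page).
print(savings_plan_range(1000))
```

Actual savings depend on the commitment term and payment option, so the range is only a bracket around whatever on-demand estimate is used.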

Configurable Excel Spreadsheet

Related Entries