Remote Operations and Cache Warming
Git-pandas provides methods for safely fetching from remote repositories and for warming the cache ahead of analysis. Together they keep your local repositories up to date and make repeated analysis calls faster, because results are served from the cache.
Safe Remote Fetch
The safe_fetch_remote method allows you to safely fetch changes from remote repositories without modifying your working directory or current branch.
Repository.safe_fetch_remote()
Basic Usage
from gitpandas import Repository
from gitpandas.cache import EphemeralCache
# Create repository with caching
cache = EphemeralCache(max_keys=100)
repo = Repository('/path/to/repo', cache_backend=cache)
# Perform a dry run to see what would be fetched
dry_result = repo.safe_fetch_remote(dry_run=True)
print(f"Dry run: {dry_result['message']}")
# Safely fetch changes, but only if the remote exists
if dry_result['remote_exists']:
    result = repo.safe_fetch_remote()
    if result['success']:
        print(f"Fetch completed: {result['message']}")
        if result['changes_available']:
            print("New changes are available!")
    else:
        print(f"Fetch failed: {result['error']}")
Advanced Options
# Fetch from a specific remote
result = repo.safe_fetch_remote(remote_name='upstream')
# Fetch and prune deleted remote branches
result = repo.safe_fetch_remote(prune=True)
# Perform dry run to preview without fetching
result = repo.safe_fetch_remote(dry_run=True)
Safety Features
Non-destructive: Never modifies your working directory or current branch; a fetch only updates remote-tracking references
Error handling: Gracefully handles network errors and missing remotes
Validation: Checks for remote existence before attempting fetch
Dry run support: Preview operations without making changes
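Because every call returns the same documented keys, it can help to condense a result into a single status line for logs. The helper below, describe_fetch, is a hypothetical sketch and not part of gitpandas; it assumes only the keys listed in the Return Value Reference.

```python
from typing import Any, Mapping


def describe_fetch(result: Mapping[str, Any]) -> str:
    """Summarize a safe_fetch_remote() result dict as one status line.

    Hypothetical helper: relies only on the documented keys success,
    message, remote_exists, changes_available and error.
    """
    # No remote configured: nothing was attempted.
    if not result.get("remote_exists", False):
        return f"skipped (no remote): {result.get('message', '')}"
    # Remote exists but the fetch did not succeed.
    if not result.get("success", False):
        return f"failed: {result.get('error') or 'unknown error'}"
    # Successful fetch; report whether anything new arrived.
    if result.get("changes_available", False):
        return "fetched: new changes available"
    return "fetched: no new changes"
```

In a script you might call print(describe_fetch(repo.safe_fetch_remote())) after each fetch.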
Cache Warming
Cache warming pre-populates the cache with commonly used data so that subsequent analysis calls are served from the cache instead of being recomputed from the repository.
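To illustrate the idea (this is not gitpandas' own cache implementation), the toy cache below computes values ahead of time in warm(), so the later lookup is a cache hit and the expensive function runs only once. WarmableCache and slow_commit_history are hypothetical names used only for this sketch.

```python
class WarmableCache:
    """Toy key-value cache illustrating cache warming (not gitpandas' cache)."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, key, compute):
        # Cache hit: return the stored value without recomputing it.
        if key in self._store:
            return self._store[key]
        # Cache miss: compute, store, and return the value.
        value = compute()
        self._store[key] = value
        return value

    def warm(self, jobs):
        """Pre-populate the cache; jobs maps keys to zero-argument callables.

        Returns the number of new entries created.
        """
        created = 0
        for key, compute in jobs.items():
            if key not in self._store:
                self._store[key] = compute()
                created += 1
        return created


calls = []


def slow_commit_history():
    # Stand-in for an expensive repository scan (hypothetical workload).
    calls.append("commit_history")
    return ["c3", "c2", "c1"]


cache = WarmableCache()
created = cache.warm({"commit_history": slow_commit_history})
# This lookup is now a hit, so slow_commit_history is not called again.
history = cache.get_or_compute("commit_history", slow_commit_history)
```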
Repository.warm_cache()
Basic Usage
from gitpandas import Repository
from gitpandas.cache import DiskCache
# Create repository with persistent cache
cache = DiskCache('/tmp/my_cache.gz', max_keys=200)
repo = Repository('/path/to/repo', cache_backend=cache)
# Warm cache with default methods
result = repo.warm_cache()
print(f"Cache warming completed in {result['execution_time']:.2f} seconds")
print(f"Created {result['cache_entries_created']} cache entries")
print(f"Methods executed: {result['methods_executed']}")
Custom Cache Warming
# Warm specific methods with custom parameters
result = repo.warm_cache(
    methods=['commit_history', 'blame', 'file_detail'],
    limit=100,
    branch='main',
    ignore_globs=['*.log', '*.tmp']
)
# Check results
if result['success']:
    print(f"Successfully warmed {len(result['methods_executed'])} methods")
else:
    print(f"Errors occurred: {result['errors']}")
Available Methods
The following methods can be warmed:
commit_history: Load commit history
branches: Load branch information
tags: Load tag information
blame: Load blame information
file_detail: Load file details
list_files: Load file listing
file_change_rates: Load file change statistics
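If method names come from configuration, a typo may only surface when warm_cache runs. A minimal pre-flight check could look like this, assuming the list above is complete for your gitpandas version; check_warm_methods and AVAILABLE_WARM_METHODS are hypothetical names, not part of the library.

```python
# Mirrors the documented list above (assumed complete for your version).
AVAILABLE_WARM_METHODS = frozenset({
    "commit_history", "branches", "tags", "blame",
    "file_detail", "list_files", "file_change_rates",
})


def check_warm_methods(methods):
    """Return the requested names as a list, or raise on unknown names."""
    unknown = sorted(set(methods) - AVAILABLE_WARM_METHODS)
    if unknown:
        raise ValueError(f"Unknown cache-warming methods: {unknown}")
    return list(methods)
```

It would then be used as repo.warm_cache(methods=check_warm_methods(['commit_history', 'blame'])).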
Performance Benefits
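Cache warming can substantially reduce the time of repeated calls, because they are answered from the cache. The following snippet compares an uncached call with the same call after warming: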
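import time
# Measure a cold call (assumes the cache does not already hold this entry);
# perf_counter is a monotonic, high-resolution clock suited to benchmarking
start = time.perf_counter()
history_cold = repo.commit_history(limit=100)
cold_time = time.perf_counter() - start
# Warm the cache
repo.warm_cache(methods=['commit_history'], limit=100)
# Measure the same call again, now served from the cache
start = time.perf_counter()
history_warm = repo.commit_history(limit=100)
warm_time = time.perf_counter() - start
# Guard against division by zero for extremely fast cache hits
speedup = cold_time / warm_time if warm_time > 0 else float('inf')
print(f"Cache warming provided {speedup:.1f}x speedup!")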
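The timing pattern itself does not depend on gitpandas. This self-contained sketch uses functools.lru_cache as a stand-in for a cache backend and a short sleep as a stand-in for a repository scan; timed and expensive_history are hypothetical names.

```python
import functools
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


@functools.lru_cache(maxsize=None)
def expensive_history(limit):
    # Stand-in for an uncached repository scan (hypothetical workload).
    time.sleep(0.05)
    return tuple(range(limit))


_, cold = timed(expensive_history, 100)  # cache miss: does the work
_, warm = timed(expensive_history, 100)  # cache hit: no sleep
speedup = cold / warm if warm > 0 else float("inf")
print(f"{speedup:.1f}x speedup")
```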
Bulk Operations
For projects with multiple repositories, bulk operations allow you to efficiently fetch and warm caches across all repositories.
ProjectDirectory.bulk_fetch_and_warm()
Basic Usage
from gitpandas import ProjectDirectory
from gitpandas.cache import DiskCache
# Create project directory with shared cache
cache = DiskCache('/tmp/project_cache.gz', max_keys=500)
project = ProjectDirectory('/path/to/repos', cache_backend=cache)
# Perform bulk operations
result = project.bulk_fetch_and_warm(
    fetch_remote=True,
    warm_cache=True,
    parallel=True
)
print(f"Processed {result['repositories_processed']} repositories")
print(f"Fetch summary: {result['summary']['fetch_successful']} successful")
print(f"Cache summary: {result['summary']['cache_successful']} successful")
Advanced Bulk Operations
# Customize bulk operations
result = project.bulk_fetch_and_warm(
    fetch_remote=True,
    warm_cache=True,
    parallel=True,
    remote_name='upstream',
    prune=True,
    dry_run=False,
    cache_methods=['commit_history', 'blame'],
    limit=200,
    ignore_globs=['*.log']
)
# Check individual repository results
for repo_name, fetch_result in result['fetch_results'].items():
    if not fetch_result['success']:
        print(f"Fetch failed for {repo_name}: {fetch_result['error']}")
for repo_name, cache_result in result['cache_results'].items():
    print(f"{repo_name}: {cache_result['cache_entries_created']} cache entries")
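The summary block is already computed for you, but if you filter fetch_results or cache_results (for example, to the repositories you care about) you may want to recompute the counts. A sketch, assuming each per-repository value has the documented success key and cache results carry cache_entries_created; summarize_bulk is a hypothetical helper, and it omits repositories_with_remotes.

```python
def summarize_bulk(fetch_results, cache_results):
    """Recompute summary-style counts from per-repository result dicts.

    Hypothetical helper; does not produce repositories_with_remotes.
    """
    fetch_ok = sum(1 for r in fetch_results.values() if r.get("success"))
    cache_ok = sum(1 for r in cache_results.values() if r.get("success"))
    return {
        "fetch_successful": fetch_ok,
        "fetch_failed": len(fetch_results) - fetch_ok,
        "cache_successful": cache_ok,
        "cache_failed": len(cache_results) - cache_ok,
        # Missing counts are treated as zero.
        "total_cache_entries_created": sum(
            r.get("cache_entries_created", 0) for r in cache_results.values()
        ),
    }
```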
Parallel Processing
Bulk operations support parallel processing when joblib is available:
# Enable parallel processing (default when joblib available)
result = project.bulk_fetch_and_warm(
    fetch_remote=True,
    warm_cache=True,
    parallel=True  # Uses all available CPU cores
)
# Disable parallel processing for sequential execution
result = project.bulk_fetch_and_warm(
    fetch_remote=True,
    warm_cache=True,
    parallel=False
)
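Using joblib when it is installed and otherwise running sequentially is a common optional-dependency pattern. Below is a generic sketch of it, not gitpandas' internal code: run_all and process_repo are hypothetical, and prefer="threads" is a choice made here because fetching is mostly network-bound. In joblib, n_jobs=-1 means "use all CPUs".

```python
try:
    from joblib import Parallel, delayed
except ImportError:  # joblib is an optional dependency
    Parallel = None


def process_repo(name):
    # Hypothetical per-repository task; a real one might call
    # repo.safe_fetch_remote() and repo.warm_cache().
    return name, len(name)


def run_all(names, parallel=True):
    """Run process_repo over names, in parallel only when joblib is installed."""
    if parallel and Parallel is not None:
        # Results come back in input order, as a list.
        return Parallel(n_jobs=-1, prefer="threads")(
            delayed(process_repo)(n) for n in names
        )
    # Sequential fallback.
    return [process_repo(n) for n in names]
```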
Best Practices
Regular Fetching: Use safe_fetch_remote regularly to keep repositories current
Dry Run First: Use dry runs to preview fetch operations
Error Handling: Always check return values for errors
Remote Validation: Verify remotes exist before fetching
Persistent Caching: Use DiskCache for long-term cache persistence
Appropriate Cache Size: Set a reasonable max_keys value based on your usage
Selective Warming: Only warm methods you actually use
Regular Warming: Re-warm caches when data becomes stale
Shared Caches: Use shared cache backends across repositories
Parallel Processing: Enable parallel processing for multiple repositories
Custom Parameters: Tailor operations to your specific needs
Error Isolation: Handle errors at the repository level
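Error isolation means one failing repository should not abort work on the others. A minimal sketch of that idea, using a hypothetical run_isolated helper that records success and error per name in the same spirit as the per-repository result dictionaries:

```python
def run_isolated(items, task):
    """Apply task to each value in items, recording failures per name.

    Hypothetical helper: a failure for one name does not stop the rest.
    """
    results = {}
    for name, item in items.items():
        try:
            results[name] = {"success": True, "value": task(item), "error": None}
        except Exception as exc:  # isolate any per-repository failure
            results[name] = {"success": False, "value": None, "error": str(exc)}
    return results
```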
Error Handling
All remote operations and cache warming methods report errors in the dictionaries they return:
# Safe fetch error handling
result = repo.safe_fetch_remote()
if not result['success']:
    if result['remote_exists']:
        print(f"Fetch failed: {result['error']}")
    else:
        print(f"No remote configured: {result['message']}")
# Cache warming error handling
result = repo.warm_cache()
if not result['success']:
    print(f"Failed methods: {result['methods_failed']}")
    for error in result['errors']:
        print(f"Error: {error}")
# Bulk operation error handling
result = project.bulk_fetch_and_warm(fetch_remote=True, warm_cache=True)
for repo_name, repo_result in result['fetch_results'].items():
    if not repo_result['success']:
        print(f"Repository {repo_name} failed: {repo_result.get('error', 'Unknown error')}")
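For scheduled jobs it is often easier to act on one flat list of problems. The hypothetical collect_errors helper below walks a bulk_fetch_and_warm result; it assumes each per-repository cache result carries the errors list documented for warm_cache.

```python
def collect_errors(bulk_result):
    """Flatten failures from a bulk_fetch_and_warm() result.

    Returns (repository, stage, message) tuples. Hypothetical helper;
    assumes cache results have the same 'errors' list as warm_cache().
    """
    problems = []
    for name, r in bulk_result.get("fetch_results", {}).items():
        if not r.get("success", False):
            problems.append((name, "fetch", r.get("error") or "Unknown error"))
    for name, r in bulk_result.get("cache_results", {}).items():
        for message in r.get("errors", []):
            problems.append((name, "cache", message))
    return problems
```

A cron script could, for example, finish with sys.exit(1 if collect_errors(result) else 0).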
Examples
Complete examples demonstrating these features can be found in the examples/ directory:
examples/remote_fetch_and_cache_warming.py: Comprehensive demonstration of all features
examples/cache_timestamps.py: Cache timestamp and metadata examples
Return Value Reference
The safe_fetch_remote method returns a dictionary with these keys:
success (bool): Whether the fetch was successful
message (str): Status message or description
remote_exists (bool): Whether the specified remote exists
changes_available (bool): Whether new changes were fetched
error (str or None): Error message if the fetch failed
The warm_cache method returns a dictionary with these keys:
success (bool): Whether cache warming was successful
methods_executed (list): List of methods that were executed
methods_failed (list): List of methods that failed
cache_entries_created (int): Number of cache entries created
execution_time (float): Total execution time in seconds
errors (list): List of error messages for failed methods
The bulk_fetch_and_warm method returns a dictionary with these keys:
success (bool): Whether the overall operation was successful
repositories_processed (int): Number of repositories processed
fetch_results (dict): Per-repository fetch results
cache_results (dict): Per-repository cache warming results
execution_time (float): Total execution time in seconds
summary (dict): Summary statistics including:
    fetch_successful (int): Number of successful fetches
    fetch_failed (int): Number of failed fetches
    cache_successful (int): Number of successful cache warming operations
    cache_failed (int): Number of failed cache warming operations
    repositories_with_remotes (int): Number of repositories with remotes
    total_cache_entries_created (int): Total cache entries created across all repositories
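If you type-check code that consumes these dictionaries, the shapes above can be written as typing.TypedDict classes. These classes are a sketch for your own code and are not exported by gitpandas; the element types of the lists and the value types of fetch_results and cache_results are assumptions.

```python
from typing import Dict, List, Optional, TypedDict


class SafeFetchResult(TypedDict):
    success: bool
    message: str
    remote_exists: bool
    changes_available: bool
    error: Optional[str]


class WarmCacheResult(TypedDict):
    success: bool
    methods_executed: List[str]  # assumed: method names
    methods_failed: List[str]    # assumed: method names
    cache_entries_created: int
    execution_time: float
    errors: List[str]


class BulkSummary(TypedDict):
    fetch_successful: int
    fetch_failed: int
    cache_successful: int
    cache_failed: int
    repositories_with_remotes: int
    total_cache_entries_created: int


class BulkFetchAndWarmResult(TypedDict):
    success: bool
    repositories_processed: int
    # Assumed: per-repository values match the single-repository results.
    fetch_results: Dict[str, SafeFetchResult]
    cache_results: Dict[str, WarmCacheResult]
    execution_time: float
    summary: BulkSummary
```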