Project
GBM Investments
Serverless pipeline that turns GBM PDF statements into portfolio analytics on AWS.
Goal
A system to manage and analyze investment portfolios held in accounts at GBM, a Mexican brokerage. It automates statement ingestion, transaction parsing, and portfolio analytics, and presents the results in a web dashboard. The system turns PDF statements into structured, queryable data, tracking movements across equities, funds, debt, FX, dividends, and tax withholdings in both the domestic (BMV) and foreign (SIC) markets.
How it works
- Statement ingestion: A script downloads the PDF statements from Google Drive using OAuth2 credentials with refresh tokens.
- Parsing the GBM format: A custom parser extracts movements, holdings, and tax data from GBM’s Mexican statement format.
- Persistence in DynamoDB: Movements and holdings are stored in DynamoDB tables organized by account and date.
- Analytics export to S3: Data is exported to S3 as Parquet files partitioned by account, year, and month, ready for SQL queries via Athena.
- Portfolio dashboard: An Express API queries DynamoDB and Athena and serves a React frontend that displays portfolio analytics and reconciliation.
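As an illustration of the export step above, here is a minimal sketch of how an S3 object key partitioned by account, year, and month could be built. The Hive-style `key=value` layout, the `exports` prefix, the file name, and the `parquetKey` helper are all assumptions made for this example, not the project's confirmed scheme.

```typescript
// Hypothetical helper: the prefix, file name, and key=value layout are
// assumptions for illustration, not the project's actual key scheme.
interface ExportPartition {
  account: string;
  year: number;
  month: number; // 1-12
}

function parquetKey(
  prefix: string,
  p: ExportPartition,
  fileName: string = "movements.parquet"
): string {
  if (Math.floor(p.month) !== p.month || p.month < 1 || p.month > 12) {
    throw new RangeError("month must be an integer from 1 to 12, got " + p.month);
  }
  // Zero-pad the month so keys sort lexicographically in date order.
  const month = p.month < 10 ? "0" + p.month : String(p.month);
  return `${prefix}/account=${p.account}/year=${p.year}/month=${month}/${fileName}`;
}

console.log(parquetKey("exports", { account: "ACC-001", year: 2024, month: 3 }));
// prints "exports/account=ACC-001/year=2024/month=03/movements.parquet"
```

With a layout like this, a query engine that understands partitioned paths only needs to read the folders matching the requested account and period.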
Data model
The system tracks three kinds of records: movements (buy, sell, deposit, dividend, tax), ticker holdings (periodic snapshots by account, year, and month), and monthly aggregates (cash flows and performance metrics). Movements cover equities, funds, debt, FX, dividends, and tax withholdings.
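The sketch below shows one way the movement and monthly aggregate records described above could be typed, and how movements might be rolled up into a monthly net cash flow. Every field name, the signed `amount` convention, and the `aggregateMonthly` function are assumptions made for illustration. The project's actual schema may differ.

```typescript
// Illustrative types only: field names are assumptions, not the real schema.
type MovementType = "buy" | "sell" | "deposit" | "dividend" | "tax";

interface Movement {
  account: string;
  date: string; // ISO date, e.g. "2024-03-15"
  type: MovementType;
  ticker?: string; // absent for cash-only movements such as deposits
  amount: number; // signed cash effect: negative means cash leaves the account
}

interface MonthlyAggregate {
  account: string;
  year: number;
  month: number;
  netCashFlow: number;
  movementCount: number;
}

// Groups movements by account and calendar month and sums their cash effect.
function aggregateMonthly(movements: Movement[]): MonthlyAggregate[] {
  const byKey: { [key: string]: MonthlyAggregate | undefined } = {};
  const result: MonthlyAggregate[] = [];
  for (const m of movements) {
    const key = m.account + "#" + m.date.slice(0, 7); // e.g. "ACC-001#2024-03"
    let agg = byKey[key];
    if (!agg) {
      agg = {
        account: m.account,
        year: Number(m.date.slice(0, 4)),
        month: Number(m.date.slice(5, 7)),
        netCashFlow: 0,
        movementCount: 0,
      };
      byKey[key] = agg;
      result.push(agg);
    }
    agg.netCashFlow += m.amount;
    agg.movementCount += 1;
  }
  // Order by account, then chronologically.
  return result.sort(
    (a, b) => a.account.localeCompare(b.account) || a.year - b.year || a.month - b.month
  );
}

// Made-up sample data, for demonstration only.
const sampleMovements: Movement[] = [
  { account: "ACC-001", date: "2024-03-01", type: "deposit", amount: 10000 },
  { account: "ACC-001", date: "2024-03-15", type: "buy", ticker: "XYZ", amount: -2500 },
  { account: "ACC-001", date: "2024-04-02", type: "dividend", ticker: "XYZ", amount: 120 },
];
console.log(aggregateMonthly(sampleMovements));
```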
AWS services
- DynamoDB: Five tables: movements, file metadata, monthly summaries, tax records, and holdings snapshots.
- S3: Raw PDF storage and partitioned Parquet exports for analytics.
- Amazon Athena: SQL engine querying the Parquet files via the Glue Data Catalog.
- Secrets Manager / SSM: OAuth2 credential management plus account configuration and ticker classification.
- CloudFormation / CDK: Infrastructure as code in TypeScript with CDK v2 for stack deployment.
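To make the Athena role more concrete, here is a hedged sketch of how the API layer might assemble a SQL query that filters on partition columns so only the relevant Parquet files are scanned. The `movements` table name, the `type` and `amount` columns, the integer-typed `year` partition, and the `monthlyTotalsQuery` helper are hypothetical. Only the partitioning by account, year, and month comes from the description above.

```typescript
// Hypothetical query builder: table and column names are assumptions.
// Inputs are validated because they are interpolated into the SQL text.
function monthlyTotalsQuery(account: string, year: number): string {
  if (!/^[A-Za-z0-9_-]+$/.test(account)) {
    throw new Error("invalid account id");
  }
  if (Math.floor(year) !== year || year < 1900 || year > 9999) {
    throw new Error("invalid year");
  }
  return [
    "SELECT month, type, SUM(amount) AS total",
    "FROM movements",
    // Filtering on partition columns limits the scan to matching partitions.
    `WHERE account = '${account}' AND year = ${year}`,
    "GROUP BY month, type",
    "ORDER BY month, type",
  ].join("\n");
}

console.log(monthlyTotalsQuery("ACC-001", 2024));
```

The resulting string would then be submitted to Athena by whatever client the API uses. That call is omitted here to keep the sketch free of SDK details.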
External services
- Google Drive API: PDF statement retrieval using OAuth2 refresh tokens.
- Banxico SIE API: Inflation rate used to adjust and contextualize portfolio performance.
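One common way to adjust a return for inflation is the Fisher relation, (1 + real) = (1 + nominal) / (1 + inflation). The sketch below applies it. Whether the project uses this exact formula is an assumption, the `realReturn` function is a hypothetical name, and the rates are made-up sample values rather than Banxico data.

```typescript
// Inflation-adjusted (real) return via the Fisher relation.
// Rates are decimal fractions over the same period: 0.10 means 10%.
function realReturn(nominal: number, inflation: number): number {
  if (inflation <= -1) {
    throw new RangeError("inflation must be greater than -100%");
  }
  return (1 + nominal) / (1 + inflation) - 1;
}

// Made-up sample values: a 10% nominal return with 5% inflation.
const real = realReturn(0.10, 0.05);
console.log((real * 100).toFixed(2) + "%"); // prints "4.76%"
```

Note that the real return is slightly lower than the simple difference of the two rates (5%), because inflation also erodes the gains themselves.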
Stack