Using Spark to load DynamoDB tables with the AWS emr-dynamodb-hadoop connector
Monitoring DynamoDB Capacity

Getting a true picture of DynamoDB WCU/RCU capacity is difficult because the default monitors automatically aggregate WCU/RCU metrics by the minute. This hides spikes and obscures the true pattern of WCU/RCU consumption (CloudWatch performs the same per-minute aggregation). To get a more accurate picture of these metrics, we decided to use the Grafana/InfluxDB stack described in my other post to capture second-level metrics for WCU/RCU consumption.

Our Use Case

- 200 TB dataset stored in Parquet on S3
- Ingest into 20 DynamoDB tables using Spark
- S3 -> Spark -> DynamoDB using the AWS Labs emr-dynamodb-hadoop connector

To ingest a dataset this large in a reasonable amount of time, we need to make sure DynamoDB is using all possible capacity across all tables, so good monitoring is critical.

The AWS Labs emr-dynamodb-hadoop connector has params which let you configure what percentage of your DynamoDB provisioned capacity should be ...
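As a rough sketch of how that write pipeline and its throughput knob fit together, a Scala job might look like the snippet below. The table name, region, S3 path, and two-column row schema are placeholders for illustration; the `dynamodb.*` config keys are the ones used in AWS's published examples for this connector.

```scala
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import org.apache.spark.sql.SparkSession

import scala.collection.JavaConverters._

object ParquetToDynamoDB {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-to-dynamodb").getOrCreate()

    // Connector configuration; table name and region are placeholders.
    val jobConf = new JobConf(spark.sparkContext.hadoopConfiguration)
    jobConf.set("dynamodb.output.tableName", "my-table")
    jobConf.set("dynamodb.regionid", "us-east-1")
    // Let this job consume at most 80% of the table's provisioned WCU.
    jobConf.set("dynamodb.throughput.write.percent", "0.8")
    jobConf.set("mapred.output.format.class",
      "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")

    // Map each Parquet row to a DynamoDB item (hypothetical schema).
    val items = spark.read.parquet("s3://my-bucket/dataset/").rdd.map { row =>
      val attrs = Map(
        "id"      -> new AttributeValue().withS(row.getAs[String]("id")),
        "payload" -> new AttributeValue().withS(row.getAs[String]("payload"))
      )
      val item = new DynamoDBItemWritable()
      item.setItem(attrs.asJava)
      (new Text(""), item)
    }

    // Write through the connector's old mapred API output format.
    items.saveAsHadoopDataset(jobConf)
  }
}
```

With second-level dashboards in Grafana, you can then verify that the consumed WCU actually tracks the configured percentage across all 20 tables, rather than trusting the per-minute averages.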