18/02/2021

3 main steps for working with an app that processes a large number of files and uses geocoding

We have mentioned this project before. In this case study we focus on tips and tricks for optimizing the map services that display this data, and on the way companies use our platform to plan their projects.

Storage

S3

▸ Amazon Simple Storage Service (Amazon S3) is an object storage service

▸ Amazon S3 is designed for 99.999999999% (11 9’s) of durability

▸ S3 is the only object storage service that allows you to block public access to all of your objects at the bucket or the account level with S3 Block Public Access

▸ S3 gives you robust capabilities to manage access, cost, replication, and data protection

S3 Data Encryption

▸ Amazon S3 default encryption provides a way to set the default encryption behavior for an S3 bucket.

▸ The objects are encrypted using server-side encryption with either Amazon S3-managed keys (SSE-S3) or customer master keys (CMKs) stored in AWS Key Management Service (AWS KMS).

▸ Client-side encryption – Encrypt data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools.
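As an illustration of the default encryption options listed above, here is a minimal Python sketch that builds the configuration for either SSE-S3 or SSE-KMS. The function name and the bucket name in the comment are hypothetical; applying the configuration assumes boto3 and valid AWS credentials.

```python
from typing import Optional


def default_encryption_config(kms_key_id: Optional[str] = None) -> dict:
    """Build a default-encryption configuration for an S3 bucket.

    Without a key ID the bucket defaults to SSE-S3 (AES-256);
    with a key ID it defaults to SSE-KMS using that key.
    """
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


# Applying it (requires boto3 and credentials; bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-project-bucket",
#       ServerSideEncryptionConfiguration=default_encryption_config(),
#   )
```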

Restricting access to a specific HTTP referer

Example of a bucket policy that restricts access to S3 objects per domain. It is useful when someone wants to be sure that only users coming from a certain website have access to the data (images, archives).
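The restriction described above can be sketched as a bucket policy with a condition on the aws:Referer key. The bucket name, the domain, and the function name below are placeholders chosen for illustration, not the values used in the project.

```python
import json
from typing import List


def referer_policy(bucket: str, allowed_referers: List[str]) -> str:
    """Return a bucket policy (as JSON) that allows GetObject only when the
    request's Referer header matches one of the given patterns."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromListedSites",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"StringLike": {"aws:Referer": allowed_referers}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket and domain:
print(referer_policy("example-project-bucket", ["https://www.example.com/*"]))
```

Note that the Referer header is set by the client, so a policy like this discourages hotlinking from other sites rather than acting as strong access control.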

Enable password authentication for AWS Transfer for SFTP using AWS Secrets Manager

▸ AWS SFTP supports password authentication when you plug in an identity provider to authenticate and authorize your users. This mode supports both forms of authentication – passwords and SSH keys.

▸ You can’t directly connect AWS SFTP to Secrets Manager today, so you will use a Lambda function that provides the logic to connect them. This Lambda function is responsible for validating the user credentials against the stored ones and returning access information.
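A minimal sketch of the validation logic such a Lambda function might contain is shown below. It covers only the password path. The secret naming scheme (SFTP/<username>) and the secret fields (Password, Role, HomeDirectory) are assumptions made for this example, and the secret lookup is passed in as a function so the logic can be tried without AWS access.

```python
import hmac
from typing import Callable, Optional


def authenticate(event: dict, get_secret: Callable[[str], Optional[dict]]) -> dict:
    """Validate the password in the event against the stored secret.

    Returns access information on success and an empty dict on failure,
    which is expected to be treated as a rejected login.
    """
    username = event.get("username", "")
    password = event.get("password", "")
    secret = get_secret(f"SFTP/{username}")  # naming scheme is an assumption
    if not secret or not password:
        return {}
    stored = str(secret.get("Password", ""))
    # Constant-time comparison avoids leaking information through timing.
    if not hmac.compare_digest(password.encode(), stored.encode()):
        return {}
    return {"Role": secret["Role"], "HomeDirectory": secret.get("HomeDirectory", "/")}


# In a real handler, get_secret would wrap Secrets Manager, for example:
#   raw = boto3.client("secretsmanager").get_secret_value(SecretId=name)
#   secret = json.loads(raw["SecretString"])
```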

Project creation process

The example shows a Lambda function that runs inside a VPC (private subnet) communicating with S3 (Simple Storage Service). Communication goes through a VPC endpoint, a virtual device that provides a private connection to the service, so no public IP address is needed. The main advantage is that all communication takes place within Amazon’s internal networks.
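As a rough sketch of how such an endpoint could be defined, the helper below builds the parameters for a gateway-type VPC endpoint for S3. The text does not say which endpoint type the project uses, so the gateway type is an assumption here, and all IDs are placeholders.

```python
from typing import Iterable


def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: Iterable[str]) -> dict:
    """Build keyword arguments for creating an S3 gateway endpoint.

    A gateway endpoint adds a route for S3 to the given route tables, so
    traffic from the private subnet reaches S3 without leaving AWS's network.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Creating it (requires boto3 and credentials; IDs are placeholders):
#   import boto3
#   params = s3_gateway_endpoint_params("vpc-0123", "eu-west-1", ["rtb-0456"])
#   boto3.client("ec2").create_vpc_endpoint(**params)
```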

Data processing – (Import/Export)

Import/Update

The example shows how data is read and written to a database. The data is read from a .csv file in an S3 (Simple Storage Service) bucket. The reading/parsing process is triggered by a CloudWatch Rule (CRON), which starts the Lambda function findNewFiles at a set time. Its task is to find new .csv files and write information about them to SQS (Amazon Simple Queue Service). SQS serves as a buffer that holds the information about new files and passes it on to the next stage at a controlled pace.

The next Lambda function uses this information to find and read each new file and then insert/update its data in the database. When the insert/update is finished, the file is moved from its initial location to another one, which depends on whether the process succeeded. If an error occurs, the file is stored in a separate location so that it is easier to determine what interrupted the insert/update for that file, and a message is sent to the support team via SES (Simple Email Service).
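The flow described above can be sketched in Python as follows. This is not the project's code: the prefixes (incoming/, processed/, failed/), the message format, and the helper names other than findNewFiles are assumptions, and the S3, database, and SES operations are passed in as plain functions so the logic can be run without AWS access.

```python
import csv
import io
import json
from typing import Callable, Iterable, List


def find_new_files(keys: Iterable[str], prefix: str = "incoming/") -> List[str]:
    """Step 1 (findNewFiles): pick the .csv objects waiting under the prefix."""
    return [k for k in keys if k.startswith(prefix) and k.lower().endswith(".csv")]


def queue_message(bucket: str, key: str) -> str:
    """Step 2: the message body written to SQS for each new file."""
    return json.dumps({"bucket": bucket, "key": key})


def destination_key(key: str, succeeded: bool) -> str:
    """Where the file is moved afterwards, depending on the outcome."""
    folder = "processed/" if succeeded else "failed/"
    return folder + key.rsplit("/", 1)[-1]


def process_message(
    body: str,
    read_object: Callable[[str, str], str],
    upsert_rows: Callable[[List[dict]], None],
    move_object: Callable[[str, str, str], None],
    notify_support: Callable[[str], None],
) -> bool:
    """Step 3: read the file named in the message, write its rows to the
    database, move the file, and notify support if anything failed."""
    ref = json.loads(body)
    bucket, key = ref["bucket"], ref["key"]
    try:
        rows = list(csv.DictReader(io.StringIO(read_object(bucket, key))))
        upsert_rows(rows)
        succeeded = True
    except Exception as error:
        succeeded = False
        notify_support(f"Import of {key} failed: {error}")
    move_object(bucket, key, destination_key(key, succeeded))
    return succeeded
```

Keeping the AWS calls behind small functions also makes the success and failure paths easy to test locally with fake implementations.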

Amazon QuickSight

▸ Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.

▸ QuickSight lets you easily create and publish interactive BI dashboards that include Machine Learning-powered insights.

It is used to generate interactive reports that are sent to customers daily, making it easier for them to track the status of certain datasets stored in our system. People often ask how this service differs from Kibana. The main difference is that QuickSight helps customers better understand the data contained in the system, while Kibana helps developers monitor the state of the system itself.

Mapbox geocoding

▸ APIs, SDKs, and live-updating map data give developers tools to build better mapping, navigation and search experiences across platforms.

▸ Mapbox Studio is like Photoshop for maps. We give designers control over everything from colors and fonts to 3D features and camera angles, to the pitch of the map as the car enters a turn.

▸ Search and geocoding are tied to everything we build (maps, navigation, AR) and underlie nearly every map app that helps humans explore their world.

▸ Our data is powered by hundreds of data sources and a distributed global network of more than half a billion monthly active users.

The reason we use Mapbox to display data on a map is that the library can display millions of data points at once without the need for clustering.

Mapbox Studio

Data visualization

You can import your custom data to make choropleths, scaled point maps and data-driven lines with the data visualization component.

You have an option to add depth by converting building footprints to 3D building models.

www.mapbox.com

Geocoding API

The Mapbox Geocoding API does two things: forward geocoding and reverse geocoding.

▸ Forward geocoding converts location text into geographic coordinates, turning 2 Lincoln Memorial Circle NW into -77.050, 38.889

▸ Reverse geocoding turns geographic coordinates into place names, turning -77.050, 38.889 into 2 Lincoln Memorial Circle NW
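To make the two request types concrete, here is a small Python sketch that builds the request URLs for the v5 Geocoding API using the mapbox.places endpoint. The function names are made up for this example and the access token is a placeholder. Note that coordinates are given in longitude, latitude order.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.mapbox.com/geocoding/v5/mapbox.places"


def forward_geocoding_url(address: str, access_token: str) -> str:
    """URL that converts location text into geographic coordinates."""
    query = urlencode({"access_token": access_token})
    return f"{BASE_URL}/{quote(address, safe='')}.json?{query}"


def reverse_geocoding_url(longitude: float, latitude: float, access_token: str) -> str:
    """URL that converts coordinates (longitude first) into place names."""
    query = urlencode({"access_token": access_token})
    return f"{BASE_URL}/{longitude},{latitude}.json?{query}"


# "YOUR_TOKEN" is a placeholder for a real Mapbox access token.
print(forward_geocoding_url("2 Lincoln Memorial Circle NW", "YOUR_TOKEN"))
print(reverse_geocoding_url(-77.050, 38.889, "YOUR_TOKEN"))
```

Both requests return a GeoJSON response, so the results can be read with any JSON parser.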

Mapbox Data file

GeoJSON (.geojson) is the file format in which we store all the data displayed on the map. The file itself is stored in S3 (Simple Storage Service) and is loaded directly into Mapbox via URL.
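As an illustration of what such a file contains, the sketch below converts rows with coordinates into a GeoJSON FeatureCollection of points. The row fields (lon, lat, name) and the function name are assumptions made for this example; the project's real data has its own structure.

```python
import json
from typing import Iterable


def to_feature_collection(rows: Iterable[dict]) -> dict:
    """Turn rows with 'lon' and 'lat' fields into a GeoJSON FeatureCollection.

    GeoJSON stores coordinates as [longitude, latitude]; all other fields
    of a row become the feature's properties.
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {k: v for k, v in row.items() if k not in ("lon", "lat")},
        }
        for row in rows
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical row; the resulting text is what would be saved as a .geojson file.
rows = [{"name": "Lincoln Memorial", "lon": -77.050, "lat": 38.889}]
print(json.dumps(to_feature_collection(rows), indent=2))
```

On the map side, Mapbox GL JS accepts a URL as the data of a geojson source, which is how a file stored in S3 can be loaded without an intermediate service.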