00 Intro
Remote APIs
Remote APIs are used to access services over a network. Several kinds of remote APIs have been developed over time:
- RPC (Remote Procedure Call)
- Message-based APIs (synchronous and asynchronous)
- Shared repositories (e.g. tuple spaces)
Challenges
Remote APIs are harder to work with than local ones, because the network and the systems connected to it introduce several problems.
Heterogeneity
Different systems have different data representations, communication protocols, and programming languages.
- Byte order (big-endian, little-endian), line endings, character encodings, …
- Computer hardware
- Operating systems
- Programming languages
Solutions:
- Internet protocols mask differences in the underlying networks.
- Middleware provides a common interface to different systems (also provides language interoperability).
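To make the byte-order problem concrete, here is a minimal sketch using Python's standard `struct` module. The helper names `encode_u32` and `decode_u32` are hypothetical and only serve the illustration; the idea is that both sides agree on one wire format (here network byte order, i.e. big-endian) regardless of their native byte order.

```python
import struct

def encode_u32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in network byte order (big-endian)."""
    return struct.pack("!I", value)

def decode_u32(data: bytes) -> int:
    """Decode an unsigned 32-bit integer that was sent in network byte order."""
    return struct.unpack("!I", data)[0]

wire = encode_u32(1025)                      # b'\x00\x00\x04\x01'
correct = decode_u32(wire)                   # 1025
# A receiver that wrongly assumes little-endian reads a different number:
misread = struct.unpack("<I", wire)[0]       # 17039360
```

Middleware typically hides this kind of conversion behind generated or library code, so application code never has to deal with it directly.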
Latency
Networks have latency, which can be significant.
- Remote invocation can be orders of magnitude slower than local invocation.
Solutions:
- Chunking: Split large messages into smaller chunks so that the receiver can start processing before the whole message has arrived.
- Caching: Store results of remote invocations locally.
- Asynchronous communication: Send a request and continue working, receive the result later.
Error Handling
Remote APIs can fail in different ways.
- Overloads, timeouts, network failures, …
- Graceful disconnections
Solutions:
- Corrupted messages can be detected using checksums.
- Sequence numbers can be used to detect lost messages.
- Idempotent operations can be retried safely, which simplifies error handling.
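The first two solutions can be sketched with a tiny message frame. The frame layout (4-byte sequence number, payload, 4-byte CRC-32) and the function names are assumptions made for this example, not a standard protocol.

```python
import struct
import zlib

def encode_frame(seq: int, payload: bytes) -> bytes:
    """Build a frame: sequence number + payload + CRC-32 over both."""
    body = struct.pack("!I", seq) + payload
    return body + struct.pack("!I", zlib.crc32(body))

def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Verify the checksum and return (sequence number, payload)."""
    if len(frame) < 8:
        raise ValueError("frame too short")
    body, (checksum,) = frame[:-4], struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != checksum:
        raise ValueError("corrupted frame")
    (seq,) = struct.unpack("!I", body[:4])
    return seq, body[4:]

def missing_sequence_numbers(received: list[int]) -> list[int]:
    """Report gaps between the lowest and highest sequence number seen."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]
```

A receiver that detects a corrupted frame or a gap in the sequence numbers can ask for a retransmission; this is safe without further bookkeeping only if the operation is idempotent.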
Security
Remote APIs are exposed to the network and can be attacked.
- Confidentiality: Data should not be accessible to unauthorized parties.
- Integrity: Data should not be tampered with.
- Authenticity: Data should be from a trusted source.
Solutions:
- Authentication: Prove the identity of the sender.
- Authorization: Determine what the sender is allowed to do.
- Encryption: Protect data from unauthorized access.
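As one possible illustration of integrity and authenticity, the sketch below uses a message authentication code from Python's standard `hmac` module. It assumes sender and receiver already share a secret key (the key value and the function names `sign` and `verify` are hypothetical). It does not provide confidentiality; that would additionally require encryption.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)

shared_key = b"example-shared-secret"   # assumption: distributed out of band
tag = sign(shared_key, b"transfer 10 to bob")
```

A message whose content was changed in transit, or that was produced without the key, fails verification.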
Scalability
Remote APIs should be able to handle a large number of clients.
- Stability: The system should remain stable under heavy load.
- Cost: The system should be cost-effective to scale (cost should scale at most linearly with users).
Solutions:
- Load balancing: Distribute requests across multiple servers.
- Clustering: Multiple servers work together to provide a single service.
- Efficient algorithms: Use efficient algorithms to handle large amounts of data.
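Load balancing can follow many strategies; the simplest is round robin, sketched below. The class name and server names are hypothetical, and real load balancers usually also track server health and load.

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._cycle)

balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.pick() for _ in range(5)]
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b']
```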
Concurrency
Distributed systems are concurrent by nature.
- Multiple clients can access the same resource at the same time.
- Access to shared resources must be synchronized.
Solutions:
- Locking: Prevent multiple clients from accessing a resource at the same time.
- Lock-free algorithms: Algorithms that do not require locks.
- Actor model: A model of concurrent computation in which actors are the universal primitives. Each actor has private state and interacts with other actors only by exchanging messages.
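Locking can be sketched with threads standing in for concurrent clients. This is a local, single-process illustration (the `Counter` class and `run_clients` helper are hypothetical); a distributed lock needs additional machinery, but the principle of serializing access is the same.

```python
import threading

class Counter:
    """A shared resource whose updates are protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may execute the read-modify-write step.
        with self._lock:
            self.value += 1

def run_clients(counter: Counter, clients: int = 4, increments: int = 1000) -> None:
    """Start several 'clients' that all update the same counter."""
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the lock in place, the final value always equals `clients * increments`; without it, concurrent read-modify-write steps could overwrite each other's updates.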
Consistency
Distributed systems can have inconsistent states.
- Update consistency: Several processes access and update the same data.
- Replication consistency: Data is replicated across multiple nodes.
- Cache consistency: Data is cached in multiple locations.
- Clock consistency: Different nodes have slightly different clocks/times.
Solutions:
- Transactions: A sequence of operations executed as a single unit: either all of them take effect or none do.
- Consensus algorithms: Algorithms that allow a group of nodes to agree on a single value.
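The all-or-nothing property of a transaction can be illustrated with Python's built-in `sqlite3` module. This is a local, single-database sketch, not a distributed transaction; the `accounts` table and the `transfer` function are assumptions made for the example.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between accounts; either both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

If the transfer fails halfway, the rollback undoes the first update, so no other reader ever observes money that has left one account without arriving in the other.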
Distributed Algorithms
Example: a distributed algorithm that computes the greatest common divisor (GCD) of the numbers held by the nodes of a network.
Steps:
- Each node sends its number to its neighbors.
- When a node with number n receives a number x, it compares x to n. If x is smaller than n, the node replaces n with ((n - 1) mod x) + 1.
- Repeat until no node changes its number. In a connected network, every node then holds the GCD.
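The steps above can be simulated in a single process. The sketch below assumes synchronous rounds (every node sends its current number, then all nodes process what they received) and a connected network; real nodes would exchange messages asynchronously. The function name and the adjacency-list representation are choices made for this example.

```python
import math

def distributed_gcd(values: list[int], neighbors: dict[int, list[int]]) -> list[int]:
    """Simulate the algorithm; returns the final number held by each node."""
    values = list(values)
    changed = True
    while changed:
        changed = False
        # Step 1: each node sends its current number to its neighbors.
        inbox = {i: [values[j] for j in neighbors[i]] for i in range(len(values))}
        # Step 2: each node processes the numbers it received.
        for i, received in inbox.items():
            for x in received:
                if x < values[i]:
                    values[i] = (values[i] - 1) % x + 1
                    changed = True
    return values

# Three fully connected nodes holding 12, 18 and 30.
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
result = distributed_gcd([12, 18, 30], ring)   # [6, 6, 6]
```

The update rule is a variant of Euclid's algorithm: `(n - 1) % x + 1` yields `n mod x`, except that a multiple of `x` becomes `x` instead of 0, so numbers stay positive. Each update subtracts a multiple of another node's number, which leaves the overall GCD unchanged, and the numbers can only shrink, so the process terminates.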