Remote APIs

Remote APIs give programs access to services over a network. Several styles have been developed over time:

  • RPC (Remote Procedure Call)
  • Message-based APIs (synchronous and asynchronous)
  • Shared repositories (e.g. tuple spaces)

Challenges

Compared with local calls, remote APIs introduce a number of challenges. Each one is listed below together with common solutions.

Heterogeneity

Different systems have different data representations, communication protocols, and programming languages.

  • Byte order (big-endian, little-endian), line endings, character encodings, …
  • Computer hardware
  • Operating systems
  • Programming languages

Solutions:

  • Internet protocols mask differences in the underlying networks.
  • Middleware provides a common interface to different systems (also provides language interoperability).
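The byte-order item above can be made concrete with a small sketch. It uses Python's standard `struct` module; the value 1027 is an arbitrary example, not something from the notes. It shows why both sides must agree on one representation on the wire (big-endian is the conventional "network byte order").

```python
import struct

value = 1027  # arbitrary example value, 0x00000403

# The same 32-bit unsigned integer laid out in both byte orders.
big = struct.pack(">I", value)     # big-endian ("network byte order")
little = struct.pack("<I", value)  # little-endian

print(big.hex())     # 00000403
print(little.hex())  # 03040000

# Decoding big-endian bytes as little-endian silently yields a different number.
misread = struct.unpack("<I", big)[0]
correct = struct.unpack(">I", big)[0]
print(misread, correct)
```

A receiver that decodes with the wrong byte order gets no error, only a wrong value, which is why the wire format is usually fixed by the protocol or the middleware rather than left to each machine.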

Latency

Every message sent over a network incurs latency, and the delay can be significant.

  • Remote invocation can be orders of magnitude slower than local invocation.

Solutions:

  • Chunking: Split large messages into smaller chunks.
  • Caching: Store results of remote invocations locally.
  • Asynchronous communication: Send a request and continue working, receive the result later.
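A minimal sketch of two of the solutions above, caching and asynchronous communication. The functions `remote_lookup` and `slow_remote_call` are hypothetical stand-ins for real remote invocations; the delay is simulated with `time.sleep`.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor

remote_calls = 0  # counts how often the simulated remote service is hit

def remote_lookup(key):
    """Hypothetical stand-in for a remote invocation."""
    global remote_calls
    remote_calls += 1
    return key.upper()

@functools.lru_cache(maxsize=128)
def cached_lookup(key):
    """Caching: repeated lookups of the same key are answered locally."""
    return remote_lookup(key)

def slow_remote_call(x):
    """Hypothetical remote call with simulated network delay."""
    time.sleep(0.05)
    return x * 2

first = cached_lookup("alice")   # goes to the simulated remote service
second = cached_lookup("alice")  # served from the local cache

# Asynchronous communication: send the request, keep working, collect later.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_remote_call, 21)  # request is now in flight
    local_work = sum(range(10))                 # continue working meanwhile
    remote_result = future.result()             # wait only when the result is needed
```

Caching only helps when results stay valid long enough to be reused; stale cache entries are one source of the consistency problems discussed further down.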

Error Handling

Remote APIs can fail in different ways.

  • Overloads, timeouts, network failures, …
  • Graceful disconnections

Solutions:

  • Corrupted messages can be detected using checksums.
  • Sequence numbers can be used to detect lost messages.
  • Idempotent operations can be retried safely, which simplifies error handling.
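The three solutions above can be sketched as follows. This is an illustration under assumptions, not a real protocol: the message layout `(seq, payload, checksum)`, the helper names, and the simulated lost reply are all made up for the example. CRC-32 from the standard `zlib` module serves as the checksum.

```python
import zlib

def frame(seq, payload):
    """Attach a sequence number and a CRC-32 checksum to a payload."""
    return seq, payload, zlib.crc32(payload)

def is_intact(message):
    """Recompute the checksum and compare it with the transmitted one."""
    _, payload, checksum = message
    return zlib.crc32(payload) == checksum

def missing_sequence_numbers(received):
    """Sequence numbers absent between the lowest and highest one seen."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

def call_with_retries(operation, attempts=3):
    """Retry on ConnectionError; only safe if the operation is idempotent."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise

store = {"balance": 0}
tries = 0

def set_balance():
    """Idempotent update whose first reply is 'lost' (simulated failure)."""
    global tries
    tries += 1
    store["balance"] = 100  # repeating this assignment changes nothing
    if tries == 1:
        raise ConnectionError("reply lost")
    return "ok"

good = frame(1, b"hello")
corrupted = (good[0], b"hellp", good[2])  # one byte altered in transit
gaps = missing_sequence_numbers([1, 2, 4, 5, 7])
reply = call_with_retries(set_balance)
```

The retry is harmless here because the update sets an absolute value. An operation such as "add 100 to the balance" would be applied twice in the same scenario, since the first attempt took effect even though its reply was lost.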

Security

Remote APIs are exposed to the network and can be attacked.

  • Confidentiality: Data should not be accessible to unauthorized parties.
  • Integrity: Data should not be tampered with.
  • Authenticity: Data should be from a trusted source.

Solutions:

  • Authentication: Prove the identity of the sender.
  • Authorization: Determine what the sender is allowed to do.
  • Encryption: Protect data from unauthorized access.
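As one illustration of integrity and authenticity, the sketch below attaches a message authentication code (HMAC-SHA256 from Python's standard library) to a message. The shared key and the message text are hypothetical. Note that this does not provide confidentiality; that would additionally require encryption, for example via TLS.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical shared secret

def sign(message):
    """Compute an authentication tag over the message with the shared key."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def verify(message, tag):
    """Accept the message only if the tag matches (constant-time comparison)."""
    return hmac.compare_digest(sign(message), tag)

message = b"transfer 10 to bob"
tag = sign(message)

accepted = verify(message, tag)                 # untouched message
tampered = verify(b"transfer 1000 to bob", tag) # modified in transit
```

Only a party that knows the key can produce a valid tag, so a matching tag shows both that the message was not altered and that it came from a key holder.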

Scalability

Remote APIs should be able to handle a large number of clients.

  • Stability: The system should remain stable under heavy load.
  • Cost: The system should be cost-effective to scale (cost should grow at most linearly with the number of users).

Solutions:

  • Load balancing: Distribute requests across multiple servers.
  • Clustering: Multiple servers work together to provide a single service.
  • Efficient algorithms: Keep the cost per request low so that large amounts of data can be handled.
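Load balancing can be sketched with the simplest strategy, round-robin, where requests are handed to the servers in turn. The server names and the `route` function are hypothetical; real load balancers usually also take server health and current load into account.

```python
import itertools
from collections import Counter

servers = ["server-a", "server-b", "server-c"]  # hypothetical backend names
next_server = itertools.cycle(servers)          # endless a, b, c, a, b, c, ...

def route(request_id):
    """Pick the server for a request; round-robin ignores the request itself."""
    return next(next_server)

assignments = [route(i) for i in range(9)]
load = Counter(assignments)  # how many requests each server received
```

With nine requests and three servers, each server receives three requests, so the load is spread evenly as long as the requests are of similar cost.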

Concurrency

Distributed systems are concurrent by nature.

  • Multiple clients can access the same resource at the same time.
  • Access to shared resources must be synchronized.

Solutions:

  • Locking: Prevent multiple clients from accessing a resource at the same time.
  • Lock-free algorithms: Algorithms that do not require locks.
  • Actor model: A model of concurrent computation in which actors are the basic building blocks. Each actor keeps its state private and interacts with other actors only by exchanging messages.
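The locking solution can be illustrated within a single process, with threads standing in for concurrent clients. This is a sketch only: in a distributed system the lock itself would have to be a network service, which is not shown here.

```python
import threading

counter = 0              # shared resource
lock = threading.Lock()  # guards every access to the counter

def deposit(times):
    """Increment the shared counter; the lock makes each update atomic."""
    global counter
    for _ in range(times):
        with lock:
            counter += 1

# Four concurrent "clients", each performing 10,000 updates.
threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could read the same old value and both write back old value + 1, losing one of the updates.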

Consistency

Distributed systems can have inconsistent states.

  • Update consistency: Several processes access and update the same data.
  • Replication consistency: Data is replicated across multiple nodes.
  • Cache consistency: Data is cached in multiple locations.
  • Clock consistency: Different nodes have slightly different clocks/times.

Solutions:

  • Transactions: A sequence of operations that is executed as a single unit (either all of them take effect or none).
  • Consensus algorithms: Algorithms that allow a group of nodes to agree on a single value.
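The all-or-nothing behaviour of a transaction can be sketched with Python's built-in `sqlite3` module. A local in-memory database stands in for a shared data store here; the table, the account names, and the `transfer` helper are assumptions made for the example, and distributed transactions need additional coordination that is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 50), ("bob", 0)])
conn.commit()

def transfer(sender, receiver, amount):
    """Move money as one unit: either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

too_much = transfer("alice", "bob", 80)  # would overdraw alice: rolled back
after_failed = balances()
enough = transfer("alice", "bob", 30)    # both updates are committed
after_ok = balances()
```

In the failed transfer the credit to bob has already been executed when the debit violates the balance check; the rollback undoes it, so no money appears out of nowhere.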

Distributed Algorithms

Example: a distributed algorithm that calculates the GCD (greatest common divisor) of the numbers held by the nodes of a network.

Steps:

  1. Each node sends its number to its neighbors.
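  2. When a node with number n receives a number x, it compares the two. If x is smaller than n, the node sets n = ((n - 1) % x) + 1. The result lies between 1 and x, so a node's number never drops to 0.
  3. Repeat from step 1 with the updated numbers until no number changes. In a connected network, every node then holds the GCD of all initial numbers.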
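The steps above can be simulated in a single process. The sketch below is one possible reading of the algorithm, under these assumptions: all numbers are positive integers, the network is connected, messages are delivered through one shared FIFO queue, and a node re-sends its number only after it has changed. The ring topology and the start values are made up for the example.

```python
from collections import deque
from functools import reduce
from math import gcd

def distributed_gcd(values, neighbors):
    """Simulate the message-passing GCD algorithm.

    values:    initial positive number of each node
    neighbors: neighbors[i] lists the nodes adjacent to node i
    """
    n = list(values)
    # Step 1: every node sends its number to each of its neighbors.
    inbox = deque(
        (dst, n[src]) for src in range(len(n)) for dst in neighbors[src]
    )
    while inbox:
        node, x = inbox.popleft()
        if x < n[node]:
            # Step 2: reduce the own number; the result lies in 1..x.
            n[node] = (n[node] - 1) % x + 1
            # Step 3: announce the updated number to all neighbors.
            for dst in neighbors[node]:
                inbox.append((dst, n[node]))
    return n

start = [108, 76, 12, 60, 36]
k = len(start)
ring = {i: [(i - 1) % k, (i + 1) % k] for i in range(k)}  # each node has two neighbors

result = distributed_gcd(start, ring)
expected = reduce(gcd, start)
print(result, expected)
```

The simulation stops on its own: every update strictly decreases a positive integer, so only finitely many messages can be sent. No node ever sees the complete list of numbers; each one only reacts to what its direct neighbors send.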