Wednesday, October 12, 2016

Jackson - Json Formatter

  1. Basic Idea about Jackson’s classes and other important things:
    1. class ObjectMapper:
      This mapper (or, data binder, or codec) provides functionality for converting between Java objects (instances of JDK provided core classes, beans), and matching JSON constructs. It will use instances of JsonParser and JsonGenerator for implementing actual reading/writing of JSON.
      <e.g.> mapper.writeValueAsString(object)
                 mapper.readValue(json, Object.class)
    2. class JsonGenerator and JsonParser:
      JsonGenerator is responsible to write json text, while JsonParser is responsible to parse the 就json text.
    3. class StdSerializer and StdDeserializer:
      When JsonGenerator wants to generate json text, it needs to call all kinds of serializers to convert objects to corresponding json text. StdSerializer is the base class used by all standard serializers, and can also be used for custom serializers.
      StdDeserializer plays similar role in deserializing process.
    4. class JsonNode:
      Jackson has a built-in tree model which can be used to represent a JSON object. Jackson's tree model is useful if you don't know how the JSON you will receive looks, or if you for some reason cannot (or just don't want to) create a class to represent it.
      JSON tree model is represented by JsonNode.
    5. Jackson Annotations:
      The Jackson JSON toolkit contains a set of Java annotations which you can use to influence how JSON is read into objects, or what JSON is generated from the objects.
      <e.g.,> @JsonProperty, @JsonCreator
  2. Serialization: convert an object to a Json text:
    ———————————————————————————————————
    ObjectMapper mapper = new ObjectMapper();
    // write json to file: mapper.writeValue(File resultFile, Object value);
    // generate a string: String json = mapper.writeValueAsString(value);
    // write json to stream: mapper.writeValue(OutputStream out, Object value);
    ———————————————————————————————————
    Let first take a look at a simplified process to serialize a standard java class.

    For a user-defined class, how can JsonGenerator knows which field should be included in json text, and what should be its external property name (the key), there are four ways: 
    1. Use annotation @JsonProperty(“key_name”):@JsonPropery (also indicates that property is to be included) is used to indicate external property name, name used in data format (JSON or one of other supported data formats)
      The annotation should be put in front of fields.e.g.,
      public class PersonValue {
          @JsonProperty(“id”)
          public long   personId = 0;

          @JsonProperty(“name”)
          public String name = null;
      }
      //The Json text will be {“id”:0, “name”:null}
    2. Use annotation @JsonGetter(“key_name”):
      @JsonGetter is used to tell Jackson that a certain field value should be obtained from calling a getter method instead of via direct field access.
      The annotation put in front of a method. The method’s return value is the value of json.e.g.,
      public class PersonValue {
          @JsonProperty(“id”)
          public long   personId = 0;
          public String name = null;

          @JsonGetter(“name”)
          public String getName(){
              return this.name;
          }
      }
      //The Json text will be {“id”:0, “name”:null}
      [Attention] @JsonProperty and @JsonGetter can be used together. 
    3. Use annotation @JsonValue:
      The Jackson annotation @JsonValue tells Jackson that Jackson should not attempt to serialize the object itself, but rather call a method on the object which serializes the object to a JSON string.Deserialization: convert an Json text to an object:
      e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonValue
          public String toJson(){
              return “{\”id\”:” + this.personId + "," + “\”name\”:” +this.name +”}";
          }
      }
    4. Self-define serializer
      e.g.,
      @JsonSerialize(using = PersonValueSerializer.class)
      public class PersonValue {
          public long   personId = 0;
          public String name = null;
      }

      class PersonValueSerializer extends StdSerializer<PersonValue>{
          public PersonValueSerializer(){
              super(PersonValue.class, true);
          }

          @Override
          public void serialize(PersonValue person, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException, JsonGenerationException {
              //method-1: write person as an json array
              jsonGenerator.writeStartArray();
              jsonGenerator.writeNumber(this.personId);
              jsonGenerator.writeString(this.name);
              jsonGenerator.writeEndArray();

              //method-2: write person as an json object        jsonGenerator.writeStartObject();
              jsonGenerator.writeNumberField(“id”, this.personId);
              jsonGenerator.writeStringField(“name”, this.name);
              jsonGenerator.writeEndObject();
          }
      }
      When ObjectMapper calls the function “writeValueAsString(person)”, it knows it should use the PersonValueSerializer to serialize the object.
      If we use the method-1, the json text will be like “[271, “Hongkai Wu"]”.
      If we use the method-2, the json text will be like “{“id”:271, “name”:”Hongkai Wu"}
  3. Deserialization: convert a Json text to an object:
    Similarly, let's take a look at simplified process of deserialization

    There are three ways to construct a custom defined class' object from Json text
    1. Use annotation @JsonCreator and @JsonProperty in a constructor
      @JsonCreator: is used to help Jackson construct the Java object from Json text.
      The annotation should be put in front of a class constructor.e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonCreator
          public PersonValue(@JsonProperty(“id”) id,
                                         @JsonProperty(“name”) name){
               this.id = id;
               this.name = name;
          }
      }
      When deserializing the json text, ObjectMapper knows it should call the constructor with the annotation @JsonCreator. It will map the @JsonProperty with the field name and read the field value.
    2. Use annotation @JsonCreator and @JsonSetter(“key_name”)
      @JsonSettoer is used to tell Jackson that is should match the name of this setter method to a property name in the JSON data, when reading JSON into objects.
      The annotation should be put in front of a method
      e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonCreator
          public PersonValue(    //@JsonProperty(“id”) id,
                                                @JsonProperty(“name”) name){
              this.id = id;
              this.name = name;
          }

          @JsonSetter(“id”)
          public void setID(long id){
               this.id = 0;
          }
      }
      When deserializing the json text, ObjectMapper knows it should call the constructor and the function with the annotation @JsonSetter to construct the object. One thing worthy attention here is that when @JsonProperty(“id") in the constructor and @JsonSetter(“id") exist simultaneously, as shown in the above example, the ObjectMapper will ignore the @JsonSetter and use @JsonProperty to construct the field value.
    3. Custom Defined Deserializer
      For a custom defined deserializer, the deserialization process is shown in the graph below:

      important API:
          jsonParser.nextToken(): return the next JsonToken object
          jsonParser.getCurrentName(): get the field name
          jsonParser.readValueAs(ObjClass.class): read the value

          jsonToken.equals(JsonToken.FIELD_NAME): return true if the token is a field
      Let's read the following code example to understand the process:
      e.g.,
      //@JsonSerialize(using = PersonValueSerializer.class)
      @JsonDeserialize(using = PersonValueDeserializer.class)
      public class PersonValue {
          @JsonProperty(“id”) public long personId = 0;
          @JsonProperty(“name”) public String name = null;
          @JsonProperty(“friends”) public List<String> friends = new ArrayList<>();
      }

      class PersonValueDeserializer extends Deserializer<PersonValue>{
          public PersonValueDeserializer(){
              super(PersonValue.class);
          }

          @Override
          public PersonValue deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException, JsonProcessingException {
               PersonValue p = new PersonValue();
               //build an empty object, fill the fields later using JsonParser

               while(!jsonParser.isClosed()){
                   JsonToken token = jsonParser.nextToken();
                   //JsonParser is like an iterator. The deserializing process is to read the tokens one by one.            

                   if(token.equals(JsonToken.FIELD_NAME)){
                       String fieldName = jsonParser.getCurrentName();
                       //now we know the field name, next we need to know the field value
                       switch(fieldName){
                           case “id”:
                               jsonParser.nextToken();      //now we move the iterator to the field value
                               p.id = jsonParser.readValueAs(Long.class);
                               break;
                           case “name”:
                               jsonParser.nextToken();
                               p.name = jsonParser.readValueAs(String.class);
                               break;
                           case “friends”:
                               //the value of this field is an array, so the first token is JsonToken.START_ARRAY "[" 
                              jsonParser.nextToken();
                              while(jsonParser.nextToken() != JsonToken.END_ARRAY){
                                  //the last token for the field “friends” should be JsonToken.END_ARRAY ]
                                  p.friends.add(jsonParser.readValueAs(String.class);
                              }
                              //read all friends name until jsonParser’s token is END_ARRAY
                              break;
                       }                            
               }
               return p;
          }
      }
  4. Other important annotations:
    1. @JsonInclude:
      tells Jackson only to include properties under certain circumstances.
      put in front of class defn.
      e.g.,
      @JsonInclude(JsonInclude.Include.NON_EMPTY)
      public class PersonValue {
             …...
      }
    2. @JsonIgnore:
      is used to tell Jackson to ignore a certain property (field) of a Java object.
      put in front of fields.e.g.,
      public class PersonValue {
          @JsonIgnore
          public long   personId = 0;

          public String name = null;
      }
  5. Jackson TreeModel

Sunday, September 4, 2016

Locking : Mutex vs Spinlocks

http://algorithmsandme.in/2014/01/locking-mutex-vs-spinlocks/



Difference between mutex vs spinlocks

I would start with a very basic technical interview question, that is : What is the difference between mutex and spinlock?
First of all, let’s understand what is a mutex?
Mutex is kind of a key to use any resources in the system. If you have mutex, you can use the resource, if you don’t wait till the mutex is released. The process goes to wait queue for that particular resource. 
What is spin-lock? Spin lock is a mechanism in which the process needing a resource poll the lock on resource till it gets it. It is also called as busy-waiting. Process will be busy in a loop till it gets the resource.
So we get the first difference there itself, the way both wait for the resource lock. In case of mutex, process goes in wait state and does not use processor while in spin-lock, process keeps on looping and hence uses the processor even while waiting.
Second difference comes from : when should we use spin-lock or mutex?
Spin-locks are best used when a piece of code cannot go to sleep state. Best example of such code would be interrupts request handlers.
Mutexes are best used in user space program where sleeping of process does not affect the system in catastrophic way.
Spin locks make sense when we have multi-processor systems or we have uni-processor system with preemption enabled, spin locks used in uni-processor systems without preemption enabled, may lead to deadlock where process goes on spinning forever. Following table summarizes the above points.

Mutex Vs Spinlock

Criteria
Mutex
Spinlock
Mechanism
Test for lock.
If available, use the resource
If not, go to the wait queue
Test for lock.
If available, use the resource.
If not, loop again and test the lock till you get the lock
When to use
Used when putting process is not harmful like user space programs.
Use when there will be considerable time before the process gets the lock.
Used when the process should not be put to sleep like Interrupt service routines.
Use when lock will be granted in reasonably short time.
Drawbacks
Incur process context switch and scheduling cost.
Processor is busy doing nothing till lock is granted, wasting CPU cycles.
In next post we would list some the APIs which are used for using mutex and spinlocks in Linux kernel.

Saturday, September 3, 2016

System Design: BigTable

1.4S analysis

scenario: read / write
service: client - BigTable - GFS
store: NoSQL
scale:

2.A workable solution

1.Read:

2.Write:


3.A better and detailed solution

1.Read:

[Explain for details]
  1. Distributed lock server
    1. provide lock service
      (Chubby in Google: http://bit.ly/1Pukiyt
      Zookeeper in Apache: http://bit.ly/1TOWIsR)
    2. manage metadata (the consistent hashmap)
  2. Master, lock server and user
    1. Master: in this framework, master does not maintain the consistent hashmap anymore, it 1) manage the servers health and 2) shard the files
    2. Lock Server: 1) maintain the metadata 2)lock the key
    3. Communication:
      Communication (user<->lock server): lock, slave server id
      Communication (lock server<->master): when a slave server fail, the master needs to assign a new one, copy the data and update the consistent table in the lock server.
      No communication (user<->master)
  3. Tablet
  4. SSTable
  5. Optimize read from GFS
    1. index
    2. bloom filter


2.Write:


  1. The writing process in tablet server:
    1. write-ahead log: http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
    2. write memory
    3. asynchronously write the SSTable in memory to GFS
      1. sort
      2. index
      3. write bloom filter


Thursday, August 25, 2016

System Design: Twitter

Standard for evaluation
可行解 Work Solution 25%
特定问题 Special Case 20%
分析能力 Analysis 25%
权衡 Tradeoff 15%
知识储备 Knowledge Base 15%
4S analysis
Scenario
  1. Features: list all the features and sort them by priority
  2. DAU -> QPS: based on daily active user (DAU), calculate the query per second (QPS, read QPS, write QPS), including avg QPS and peak QPS (usually about 2 or 3 times of avg QPS)
    • DAU: assuming ~100M
    • QPS: DAU * query estimation per day / 86400 (Note:86400s = 24h)
      QPS = 100M*5/86400
  3. Interfaces
Service: design the service based on QPS
  1. Split the system into several subsystems
Storage
  1. How to store/access data:  SQL / NoSQL / File System
    • SQL: transaction, complex table with multiple columns
    • NoSQL: graph relationship (e.g., friendship)
    • File System: picture, video, etc.
  2. Schema: details the table structure
Scale
  1. Sharding / Optimize / Special Case
Design Twitter
  1. Scenario:
    • Register / Login
    • User Profile Display / Edit
    • Upload Image / Video
    • Search
    • Post / Share a tweet*
    • Timeline / News Feed*
    • Follow / Unfollow a user*
  2. Service
    • User service: register/login/user profile display/edit
    • Search service: search
    • Media service: upload image/video
    • Tweet service: post/share/timeline/newsfeed
    • Friendship service: follow/unfollow
  3. Storage:
Decide where to store:
  • User service: SQL
  • Search service: index -> File System
  • Media service: File System
  • Tweet service: NoSQL (tweet content -> NoSQL)
  • Friendship service: NoSQL (it is also OK to put it in SQL, but NoSQL is better)
Details the schema of the table

READ MORE: when to use SQL/NoSQL????
Details the service
Tweet service: post/share/timeline/newsfeed
Friendship service: follow/unfollow
1.How to store and get "News Feed"
1)Pull Model
    [Idea] Once the user checks his news feed, pull all followings’ timelines and merge them.
    [Process]

    [Problem] We need to do n times DB read (n = # of followings), which is very slow and not acceptable.
    [Solution] Cache all user’s timeline (memcached)

    [Follow Up]: Thundering herd problem: millions requests coming in at once, the DB and server cannot handle it.   

    [Follow Up]: 点赞,转发,评论,都会修改这条 Tweet 的基本信息,如何更新? •
    [Solution] Keywords: write through, write back, look aside

    [Follow Up]: Cache 失效如何破? 因为内存不够或者Cache决策错误,热点信息被踢出了Cache,会发生什么? 
一大波僵尸袭来——DB会瞬间收到一大波对该数据请求,然后就可以安心的挂了
    [Solution] Facebook Lease Get(from Facebook Paper) • Read more: http://bit.ly/1jDzKZK

2)Push Model
    [Idea] Once a user posts a new tweet, push this tweet to all his follower’s newsfeed.
    [Process] 

    [Problem] A lot of followers problem: If the following has hundreds of thousands of followers, this process will take a very long time and the user might not be able to see his/her post until several days later. 
    [Solution] For the following who owns a huge number of fans, when his/her fans check newsfeed, use pull model to update the newsfeed. But do not write the following’s timeline into follower’s newsfeed. Let push model to finish this part.
同类型的类似问题
Design Facebook
Design Instagram
Design Friend Circle (朋友圈)
Design Google Reader(RSS Reader)


课后作业:对比MySQL 和 Memcached 的 QPS • Memcached QPS / MySQL QPS ~ 100

How to check mutual friends 




Related Reading:
Scaling Twitter: Making Twitter 10000 Percent Faster

Wednesday, August 24, 2016

Advance Tree Algorithm 1 Summary - Leetcode


1. Serialization

[Method 1] pre-order+in-order or in-order+post-order
[Problem] same nodes will cause problems in deserialization.
e.g., pre-oder = [5,5,5,5], in-order = [5,5,5,5] ->no way to figure out what exactly the tree looks like.

[Method 2] level-order serialization
    
[1,2,3,4,null,5,6,null,null,null,null]
(You can also trim all the "null" in the end)

    

2. Binary Search Tree 

1. Predecessor and Successor





java.util.TreeMap support API ceiling(value) and floor(value)
[Question]
270. Closest Binary Search Tree Value

2. other BST questions

222. Count Complete Tree Nodes
230. Kth Smallest Element in a BST

Tuesday, August 23, 2016

Compare DFS and BFS


BFS Summary - Leetcode

0.Basic Concepts

Breadth First Search (BFS) is a method for traversing or searching tree or graph data structures.


1.When do we apply BFS?

It is frequently used to
  1) explore some features/characteristics in the tree/graph.
     [typical examples]
        [101] Symmetric Tree
        [261] Graph Valid Tree

  2) search for nodes/level/connected components
     [typical examples] search for nodes/connected components
        [133] Clone Graph
        [200] Number of Islands
        [323] Number of Connected Components in an Undirected Graph
        [286] Walls and Gates

     [typical examples] level related problem
        [102] Binary Tree Level Order Traversal
        [103] Binary Tree Zigzag Level Order Traversal
        [107] Binary Tree Level Order Traversal II
        [199] Binary Tree Right Side View

  3) shortest steps(path) 

     Utilizing BFS to find the shortest path involves a lot of details, so let's use an example to discuss it.


Example:



The solution needs to:
1)Utilize a matrix (int[][] distance) to record the shortest distance from node A to node (i, j). If we cannot reach the node (i, j), marked it as -1. 

2)Apply BFS to calc distance[][]. All distance[i][j] except the the start point should be initialized as -1; the start point distance is initialized as 0.




3)When updating the distance matrix using BFS, for each dequeued node, we will enqueue its unvisited grass neighbor (which means distance of an obstacle will always be -1. This also makes sense because -1 means unreachable):

Here we show the pseudo codes for this BFS process (which is also classical BFS codes):

Follow-up 1:

What if we can move in 8 directions? (up, upright, right, ...)

If you run the above code, you will meet a bug like this:

Why?
In the round 0, you dequeue A, and enqueue all nodes around A, which are (0,0), (0,1), (1,1), (2,1), (2,0)

In the round 1, you first dequeue (0,0), as we can see, for (0,0), its unvisited neighbors are (0,1), (1,1), so you will enqueue these two nodes.

This means in the round 2, you will pop (0,1) and (1,1) again, and set their distances value as 2.



Now we see our problem:
If the nodes in the queue can be each other’s neighbors, the neighbor can be added to the queue again, which will cause wrong distance.

How to solve it?
distance[i][j] == -1 means this node is unvisited. So before we update node (i,j)’s distance value, check whether distance[i][j] == -1

So we new code is:





Follow-up 2:

If we want to find the minimum path, rather than only find the minimum steps to from one node to another, how to achieve this?
There are two ways:

1. We can store Tuple<Node, Path> into the queue. -> O(n^2)

2. We can use BFS to find how long is the shortest path (assume k). Then we apply DFS to find the path (because we know path length = k, for a path longer than k, we can stop earlier). -> O(n), but DFS has stack overflow problem.

[typical questions]
        [111] Minimum Depth of Binary Tree
        [126] Word Ladder II
        [310] Minimum Height Trees

        [317] Shortest Distance from All Buildings