Wednesday, October 12, 2016

Jackson - Json Formatter

  1. Basic Idea about Jackson’s classes and other important things:
    1. class ObjectMapper:
      This mapper (or, data binder, or codec) provides functionality for converting between Java objects (instances of JDK provided core classes, beans), and matching JSON constructs. It will use instances of JsonParser and JsonGenerator for implementing actual reading/writing of JSON.
      <e.g.> mapper.writeValueAsString(object)
                 mapper.readValue(json, Object.class)
    2. class JsonGenerator and JsonParser:
      JsonGenerator is responsible to write json text, while JsonParser is responsible to parse the 就json text.
    3. class StdSerializer and StdDeserializer:
      When JsonGenerator wants to generate json text, it needs to call all kinds of serializers to convert objects to corresponding json text. StdSerializer is the base class used by all standard serializers, and can also be used for custom serializers.
      StdDeserializer plays similar role in deserializing process.
    4. class JsonNode:
      Jackson has a built-in tree model which can be used to represent a JSON object. Jackson's tree model is useful if you don't know how the JSON you will receive looks, or if you for some reason cannot (or just don't want to) create a class to represent it.
      JSON tree model is represented by JsonNode.
    5. Jackson Annotations:
      The Jackson JSON toolkit contains a set of Java annotations which you can use to influence how JSON is read into objects, or what JSON is generated from the objects.
      <e.g.,> @JsonProperty, @JsonCreator
  2. Serialization: convert an object to a Json text:
    ———————————————————————————————————
    ObjectMapper mapper = new ObjectMapper();
    // write json to file: mapper.writeValue(File resultFile, Object value);
    // generate a string: String json = mapper.writeValueAsString(value);
    // write json to stream: mapper.writeValue(OutputStream out, Object value);
    ———————————————————————————————————
    Let first take a look at a simplified process to serialize a standard java class.

    For a user-defined class, how can JsonGenerator knows which field should be included in json text, and what should be its external property name (the key), there are four ways: 
    1. Use annotation @JsonProperty(“key_name”):@JsonPropery (also indicates that property is to be included) is used to indicate external property name, name used in data format (JSON or one of other supported data formats)
      The annotation should be put in front of fields.e.g.,
      public class PersonValue {
          @JsonProperty(“id”)
          public long   personId = 0;

          @JsonProperty(“name”)
          public String name = null;
      }
      //The Json text will be {“id”:0, “name”:null}
    2. Use annotation @JsonGetter(“key_name”):
      @JsonGetter is used to tell Jackson that a certain field value should be obtained from calling a getter method instead of via direct field access.
      The annotation put in front of a method. The method’s return value is the value of json.e.g.,
      public class PersonValue {
          @JsonProperty(“id”)
          public long   personId = 0;
          public String name = null;

          @JsonGetter(“name”)
          public String getName(){
              return this.name;
          }
      }
      //The Json text will be {“id”:0, “name”:null}
      [Attention] @JsonProperty and @JsonGetter can be used together. 
    3. Use annotation @JsonValue:
      The Jackson annotation @JsonValue tells Jackson that Jackson should not attempt to serialize the object itself, but rather call a method on the object which serializes the object to a JSON string.Deserialization: convert an Json text to an object:
      e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonValue
          public String toJson(){
              return “{\”id\”:” + this.personId + "," + “\”name\”:” +this.name +”}";
          }
      }
    4. Self-define serializer
      e.g.,
      @JsonSerialize(using = PersonValueSerializer.class)
      public class PersonValue {
          public long   personId = 0;
          public String name = null;
      }

      class PersonValueSerializer extends StdSerializer<PersonValue>{
          public PersonValueSerializer(){
              super(PersonValue.class, true);
          }

          @Override
          public void serialize(PersonValue person, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException, JsonGenerationException {
              //method-1: write person as an json array
              jsonGenerator.writeStartArray();
              jsonGenerator.writeNumber(this.personId);
              jsonGenerator.writeString(this.name);
              jsonGenerator.writeEndArray();

              //method-2: write person as an json object        jsonGenerator.writeStartObject();
              jsonGenerator.writeNumberField(“id”, this.personId);
              jsonGenerator.writeStringField(“name”, this.name);
              jsonGenerator.writeEndObject();
          }
      }
      When ObjectMapper calls the function “writeValueAsString(person)”, it knows it should use the PersonValueSerializer to serialize the object.
      If we use the method-1, the json text will be like “[271, “Hongkai Wu"]”.
      If we use the method-2, the json text will be like “{“id”:271, “name”:”Hongkai Wu"}
  3. Deserialization: convert a Json text to an object:
    Similarly, let's take a look at simplified process of deserialization

    There are three ways to construct a custom defined class' object from Json text
    1. Use annotation @JsonCreator and @JsonProperty in a constructor
      @JsonCreator: is used to help Jackson construct the Java object from Json text.
      The annotation should be put in front of a class constructor.e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonCreator
          public PersonValue(@JsonProperty(“id”) id,
                                         @JsonProperty(“name”) name){
               this.id = id;
               this.name = name;
          }
      }
      When deserializing the json text, ObjectMapper knows it should call the constructor with the annotation @JsonCreator. It will map the @JsonProperty with the field name and read the field value.
    2. Use annotation @JsonCreator and @JsonSetter(“key_name”)
      @JsonSettoer is used to tell Jackson that is should match the name of this setter method to a property name in the JSON data, when reading JSON into objects.
      The annotation should be put in front of a method
      e.g.,
      public class PersonValue {

          public long   personId = 0;
          public String name = null;

          @JsonCreator
          public PersonValue(    //@JsonProperty(“id”) id,
                                                @JsonProperty(“name”) name){
              this.id = id;
              this.name = name;
          }

          @JsonSetter(“id”)
          public void setID(long id){
               this.id = 0;
          }
      }
      When deserializing the json text, ObjectMapper knows it should call the constructor and the function with the annotation @JsonSetter to construct the object. One thing worthy attention here is that when @JsonProperty(“id") in the constructor and @JsonSetter(“id") exist simultaneously, as shown in the above example, the ObjectMapper will ignore the @JsonSetter and use @JsonProperty to construct the field value.
    3. Custom Defined Deserializer
      For a custom defined deserializer, the deserialization process is shown in the graph below:

      important API:
          jsonParser.nextToken(): return the next JsonToken object
          jsonParser.getCurrentName(): get the field name
          jsonParser.readValueAs(ObjClass.class): read the value

          jsonToken.equals(JsonToken.FIELD_NAME): return true if the token is a field
      Let's read the following code example to understand the process:
      e.g.,
      //@JsonSerialize(using = PersonValueSerializer.class)
      @JsonDeserialize(using = PersonValueDeserializer.class)
      public class PersonValue {
          @JsonProperty(“id”) public long personId = 0;
          @JsonProperty(“name”) public String name = null;
          @JsonProperty(“friends”) public List<String> friends = new ArrayList<>();
      }

      class PersonValueDeserializer extends Deserializer<PersonValue>{
          public PersonValueDeserializer(){
              super(PersonValue.class);
          }

          @Override
          public PersonValue deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException, JsonProcessingException {
               PersonValue p = new PersonValue();
               //build an empty object, fill the fields later using JsonParser

               while(!jsonParser.isClosed()){
                   JsonToken token = jsonParser.nextToken();
                   //JsonParser is like an iterator. The deserializing process is to read the tokens one by one.            

                   if(token.equals(JsonToken.FIELD_NAME)){
                       String fieldName = jsonParser.getCurrentName();
                       //now we know the field name, next we need to know the field value
                       switch(fieldName){
                           case “id”:
                               jsonParser.nextToken();      //now we move the iterator to the field value
                               p.id = jsonParser.readValueAs(Long.class);
                               break;
                           case “name”:
                               jsonParser.nextToken();
                               p.name = jsonParser.readValueAs(String.class);
                               break;
                           case “friends”:
                               //the value of this field is an array, so the first token is JsonToken.START_ARRAY "[" 
                              jsonParser.nextToken();
                              while(jsonParser.nextToken() != JsonToken.END_ARRAY){
                                  //the last token for the field “friends” should be JsonToken.END_ARRAY ]
                                  p.friends.add(jsonParser.readValueAs(String.class);
                              }
                              //read all friends name until jsonParser’s token is END_ARRAY
                              break;
                       }                            
               }
               return p;
          }
      }
  4. Other important annotations:
    1. @JsonInclude:
      tells Jackson only to include properties under certain circumstances.
      put in front of class defn.
      e.g.,
      @JsonInclude(JsonInclude.Include.NON_EMPTY)
      public class PersonValue {
             …...
      }
    2. @JsonIgnore:
      is used to tell Jackson to ignore a certain property (field) of a Java object.
      put in front of fields.e.g.,
      public class PersonValue {
          @JsonIgnore
          public long   personId = 0;

          public String name = null;
      }
  5. Jackson TreeModel

Sunday, September 4, 2016

Locking : Mutex vs Spinlocks

http://algorithmsandme.in/2014/01/locking-mutex-vs-spinlocks/



Difference between mutex vs spinlocks

I would start with a very basic technical interview question, that is : What is the difference between mutex and spinlock?
First of all, let’s understand what is a mutex?
Mutex is kind of a key to use any resources in the system. If you have mutex, you can use the resource, if you don’t wait till the mutex is released. The process goes to wait queue for that particular resource. 
What is spin-lock? Spin lock is a mechanism in which the process needing a resource poll the lock on resource till it gets it. It is also called as busy-waiting. Process will be busy in a loop till it gets the resource.
So we get the first difference there itself, the way both wait for the resource lock. In case of mutex, process goes in wait state and does not use processor while in spin-lock, process keeps on looping and hence uses the processor even while waiting.
Second difference comes from : when should we use spin-lock or mutex?
Spin-locks are best used when a piece of code cannot go to sleep state. Best example of such code would be interrupts request handlers.
Mutexes are best used in user space program where sleeping of process does not affect the system in catastrophic way.
Spin locks make sense when we have multi-processor systems or we have uni-processor system with preemption enabled, spin locks used in uni-processor systems without preemption enabled, may lead to deadlock where process goes on spinning forever. Following table summarizes the above points.

Mutex Vs Spinlock

Criteria
Mutex
Spinlock
Mechanism
Test for lock.
If available, use the resource
If not, go to the wait queue
Test for lock.
If available, use the resource.
If not, loop again and test the lock till you get the lock
When to use
Used when putting process is not harmful like user space programs.
Use when there will be considerable time before the process gets the lock.
Used when the process should not be put to sleep like Interrupt service routines.
Use when lock will be granted in reasonably short time.
Drawbacks
Incur process context switch and scheduling cost.
Processor is busy doing nothing till lock is granted, wasting CPU cycles.
In next post we would list some the APIs which are used for using mutex and spinlocks in Linux kernel.

Saturday, September 3, 2016

System Design: BigTable

1.4S analysis

scenario: read / write
service: client - BigTable - GFS
store: NoSQL
scale:

2.A workable solution

1.Read:

2.Write:


3.A better and detailed solution

1.Read:

[Explain for details]
  1. Distributed lock server
    1. provide lock service
      (Chubby in Google: http://bit.ly/1Pukiyt
      Zookeeper in Apache: http://bit.ly/1TOWIsR)
    2. manage metadata (the consistent hashmap)
  2. Master, lock server and user
    1. Master: in this framework, master does not maintain the consistent hashmap anymore, it 1) manage the servers health and 2) shard the files
    2. Lock Server: 1) maintain the metadata 2)lock the key
    3. Communication:
      Communication (user<->lock server): lock, slave server id
      Communication (lock server<->master): when a slave server fail, the master needs to assign a new one, copy the data and update the consistent table in the lock server.
      No communication (user<->master)
  3. Tablet
  4. SSTable
  5. Optimize read from GFS
    1. index
    2. bloom filter


2.Write:


  1. The writing process in tablet server:
    1. write-ahead log: http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
    2. write memory
    3. asynchronously write the SSTable in memory to GFS
      1. sort
      2. index
      3. write bloom filter


Thursday, August 25, 2016

System Design: Twitter

Standard for evaluation
可行解 Work Solution 25%
特定问题 Special Case 20%
分析能力 Analysis 25%
权衡 Tradeoff 15%
知识储备 Knowledge Base 15%
4S analysis
Scenario
  1. Features: list all the features and sort them by priority
  2. DAU -> QPS: based on daily active user (DAU), calculate the query per second (QPS, read QPS, write QPS), including avg QPS and peak QPS (usually about 2 or 3 times of avg QPS)
    • DAU: assuming ~100M
    • QPS: DAU * query estimation per day / 86400 (Note:86400s = 24h)
      QPS = 100M*5/86400
  3. Interfaces
Service: design the service based on QPS
  1. Split the system into several subsystems
Storage
  1. How to store/access data:  SQL / NoSQL / File System
    • SQL: transaction, complex table with multiple columns
    • NoSQL: graph relationship (e.g., friendship)
    • File System: picture, video, etc.
  2. Schema: details the table structure
Scale
  1. Sharding / Optimize / Special Case
Design Twitter
  1. Scenario:
    • Register / Login
    • User Profile Display / Edit
    • Upload Image / Video
    • Search
    • Post / Share a tweet*
    • Timeline / News Feed*
    • Follow / Unfollow a user*
  2. Service
    • User service: register/login/user profile display/edit
    • Search service: search
    • Media service: upload image/video
    • Tweet service: post/share/timeline/newsfeed
    • Friendship service: follow/unfollow
  3. Storage:
Decide where to store:
  • User service: SQL
  • Search service: index -> File System
  • Media service: File System
  • Tweet service: NoSQL (tweet content -> NoSQL)
  • Friendship service: NoSQL (it is also OK to put it in SQL, but NoSQL is better)
Details the schema of the table

READ MORE: when to use SQL/NoSQL????
Details the service
Tweet service: post/share/timeline/newsfeed
Friendship service: follow/unfollow
1.How to store and get "News Feed"
1)Pull Model
    [Idea] Once the user checks his news feed, pull all followings’ timelines and merge them.
    [Process]

    [Problem] We need to do n times DB read (n = # of followings), which is very slow and not acceptable.
    [Solution] Cache all user’s timeline (memcached)

    [Follow Up]: Thundering herd problem: millions requests coming in at once, the DB and server cannot handle it.   

    [Follow Up]: 点赞,转发,评论,都会修改这条 Tweet 的基本信息,如何更新? •
    [Solution] Keywords: write through, write back, look aside

    [Follow Up]: Cache 失效如何破? 因为内存不够或者Cache决策错误,热点信息被踢出了Cache,会发生什么? 
一大波僵尸袭来——DB会瞬间收到一大波对该数据请求,然后就可以安心的挂了
    [Solution] Facebook Lease Get(from Facebook Paper) • Read more: http://bit.ly/1jDzKZK

2)Push Model
    [Idea] Once a user posts a new tweet, push this tweet to all his follower’s newsfeed.
    [Process] 

    [Problem] A lot of followers problem: If the following has hundreds of thousands of followers, this process will take a very long time and the user might not be able to see his/her post until several days later. 
    [Solution] For the following who owns a huge number of fans, when his/her fans check newsfeed, use pull model to update the newsfeed. But do not write the following’s timeline into follower’s newsfeed. Let push model to finish this part.
同类型的类似问题
Design Facebook
Design Instagram
Design Friend Circle (朋友圈)
Design Google Reader(RSS Reader)


课后作业:对比MySQL 和 Memcached 的 QPS • Memcached QPS / MySQL QPS ~ 100

How to check mutual friends 




Related Reading:
Scaling Twitter: Making Twitter 10000 Percent Faster

Wednesday, August 24, 2016

Advance Tree Algorithm 1 Summary - Leetcode


1. Serialization

[Method 1] pre-order+in-order or in-order+post-order
[Problem] same nodes will cause problems in deserialization.
e.g., pre-oder = [5,5,5,5], in-order = [5,5,5,5] ->no way to figure out what exactly the tree looks like.

[Method 2] level-order serialization
    
[1,2,3,4,null,5,6,null,null,null,null]
(You can also trim all the "null" in the end)

    

2. Binary Search Tree 

1. Predecessor and Successor





java.util.TreeMap support API ceiling(value) and floor(value)
[Question]
270. Closest Binary Search Tree Value

2. other BST questions

222. Count Complete Tree Nodes
230. Kth Smallest Element in a BST

Tuesday, August 23, 2016

Compare DFS and BFS


BFS Summary - Leetcode

0.Basic Concepts

Breadth First Search (BFS) is a method for traversing or searching tree or graph data structures.


1.When do we apply BFS?

It is frequently used to
  1) explore some features/characteristics in the tree/graph.
     [typical examples]
        [101] Symmetric Tree
        [261] Graph Valid Tree

  2) search for nodes/level/connected components
     [typical examples] search for nodes/connected components
        [133] Clone Graph
        [200] Number of Islands
        [323] Number of Connected Components in an Undirected Graph
        [286] Walls and Gates

     [typical examples] level related problem
        [102] Binary Tree Level Order Traversal
        [103] Binary Tree Zigzag Level Order Traversal
        [107] Binary Tree Level Order Traversal II
        [199] Binary Tree Right Side View

  3) shortest steps(path) 

     Utilizing BFS to find the shortest path involves a lot of details, so let's use an example to discuss it.


Example:



The solution needs to:
1)Utilize a matrix (int[][] distance) to record the shortest distance from node A to node (i, j). If we cannot reach the node (i, j), marked it as -1. 

2)Apply BFS to calc distance[][]. All distance[i][j] except the the start point should be initialized as -1; the start point distance is initialized as 0.




3)When updating the distance matrix using BFS, for each dequeued node, we will enqueue its unvisited grass neighbor (which means distance of an obstacle will always be -1. This also makes sense because -1 means unreachable):

Here we show the pseudo codes for this BFS process (which is also classical BFS codes):

Follow-up 1:

What if we can move in 8 directions? (up, upright, right, ...)

If you run the above code, you will meet a bug like this:

Why?
In the round 0, you dequeue A, and enqueue all nodes around A, which are (0,0), (0,1), (1,1), (2,1), (2,0)

In the round 1, you first dequeue (0,0), as we can see, for (0,0), its unvisited neighbors are (0,1), (1,1), so you will enqueue these two nodes.

This means in the round 2, you will pop (0,1) and (1,1) again, and set their distances value as 2.



Now we see our problem:
If the nodes in the queue can be each other’s neighbors, the neighbor can be added to the queue again, which will cause wrong distance.

How to solve it?
distance[i][j] == -1 means this node is unvisited. So before we update node (i,j)’s distance value, check whether distance[i][j] == -1

So we new code is:





Follow-up 2:

If we want to find the minimum path, rather than only find the minimum steps to from one node to another, how to achieve this?
There are two ways:

1. We can store Tuple<Node, Path> into the queue. -> O(n^2)

2. We can use BFS to find how long is the shortest path (assume k). Then we apply DFS to find the path (because we know path length = k, for a path longer than k, we can stop earlier). -> O(n), but DFS has stack overflow problem.

[typical questions]
        [111] Minimum Depth of Binary Tree
        [126] Word Ladder II
        [310] Minimum Height Trees

        [317] Shortest Distance from All Buildings







Wednesday, August 17, 2016

Object Oriented Design - Lecture 3: Use Cases

This post is the note of lecture 2 of the course "Object Oriented System Analysis and Design". Videos can be viewed at https://www.youtube.com/watch?v=ODeQ0rF59kI&index=3&list=PL6XklZATqYx9dj72MKG6wLYjljeB2odra.

Lecture 3

1.Use cases definition

  1. Definition: use case is an activity that system performs, usually in response to a request by a user.
  2. Use cases define functional requirements.
  3. Two techniques to identify use cases:
    1. user goal technique
    2. event decomposition

2.User goal technique

Seldom use. 

3.Event decomposition technique


  1. Type of Events
    1. external event: an event that occurs outside the system, usually initialized by an external agent/actor.
    2. internal event:
      1. temporal event: ana event that occurs as a result of reaching a point in time (e.g., 
  2. The decomposition process:
    1. Identify the event that occurs to which the system must respond.
    2. For each event name a use case (verb-noun) that describes what the system should do
    3. CRUD decomposition: Create, Read/Report, Update, Delete
      Steps:
      1. Identify all domain classes (e.g., Customer, Seller, etc)
      2. Check whether we need CRUD functions
      3. When integrating applications, make sure it is clear which application is responsible for adding and maintaining the data, and which system merely uses the data.
  3. Use case diagram















Saturday, August 13, 2016

Object Oriented Design - Lecture 2: Strategic Planning and Related Design

This post is the note of lecture 2 of the course "Object Oriented System Analysis and Design". Videos can be viewed at https://www.youtube.com/watch?v=ODeQ0rF59kI&index=3&list=PL6XklZATqYx9dj72MKG6wLYjljeB2odra.

Lecture 2

1. Planning method - strategic planning

  1. Definition of the technology architecture (infrastructure)
    1. the set of computing hardware
    2. network hardware/topology
    3. system software employed by the organization
  2. Application architecture
    1. information system, subsystem, supporting technology
      e.g. update the RMO system due to customer expectation, modern technological capability, etc.
      [What we have]
      1. Supply Chain Management System (SCM)
        (5 years old, Java/Oracle, tradeshow system will interface with SCM)
      2. Phone/Mail Order System
        (12 years old, Visual Studio/MS SQL, reached capacity, minimal integration)
      3. Retail Store System
        (order package solution, minimal integration)
      4. Customer Support System (CSS)
        (web-based system, etc)
    2. Information Diagram

2.Analysis, understand the details of the problem


  1. Gather detailed information (interviews, questionnaires, etc.)
  2. Define system requirements: modeling functional/non-functional requirements.
  3. Prioritize requirements.
  4. Develop user-interface dialogs.
  5. Evaluate requirements with users.

 Define system requirements (FURPS+)

  1. Functional requirements: APIs that should be implemented in the system.
  2. Non-functional requirements:
    1. Usability requirements: how easy it is to use the system (UI). (Need to consider whether the user can be trained.)
    2. Reliability requirements: reliable for the day and night? or just day?
    3. Performance requirements: response time, etc.
    4. Secure requirements: encryption, etc.
    5. + more other categories.

 Models and Modeling

  1. After collecting information, create models which can be:
    1. Textual models
    2. Graphical models -  diagram, schematic
      1. use case diagram
      2. class diagram
      3. sequence diagram
      4. common diagram
      5. state machine diagram
    3. Mathematical models

Friday, August 12, 2016

Binary Search Summary - Leetcode

1.Definition

Binary Search is a search algorithm that finds the position of a target value within a sorted array (or an array arranged in a special way).

2.Methods

The key point of binary search is to maintain a range and make sure the target number is in the range. The search process is to shrink the range in multiple iterations until its size is 1 or find it.
In each iteration, half (or 1/n) the range's size. Use the middle element's value to decide it is the left half range or the right half range.

3 key steps:
1) identify the range (sometimes we might need to construct the range)
2) decide the appropriate way to shrink the range
3) check potential endless loops

e.g., find the index of 5 in the array 0-9

Note: We can use two pointers (lo, hi) to represent the range.

3.When do we apply binary search? 

1.Index-based search problem: 3 basic BS problems and corresponding methods to shrink the range. 

[question] Given a sorted array/matrix, try to find a specific element.

[identify the range] In this kind of question, the range is the index range [0, arr.length-1].

[methods of shrinking the range]
We will discuss how to shrink the range in the 3 types of BS problem. Let's assume that the current range is [lo, hi], and mid = lo+(hi-lo)/2.

[type 1] Find the only/any one problem
[Typical example]: find the index of 5 in the array [0,1,2,3,4,5,6]
[Explain]:
if array[mid] < target, shrink the range as [lo, mid-1];
else array[mid] > target, shrink the range as [mid+1, hi];
else return mid;

[Questions]
74. Search a 2D Matrix
240. Search a 2D Matrix II
374. Guess Number Higher or Lower

[type 2] Find the first one
[Typical example]: find the index of the first 5 in the array [0,1,1,1,2,3,4,5,5,6]
[Explain]:
if array[mid] == target, shrink the range as [lo, mid]; //*
else if array[mid] < target, shrink the range as [mid+1, hi];
else shrink the range as [lo, mid-1];

//* It is possible to trigger an endless loop. So we need to check whether lo==mid, if it is, break and jump out of the loop.

[Questions]
153 Find Minimum in Rotated Sorted Array
35 Search Insert Position
275. H-Index II
278. First Bad Version

[type 3] Find the last one
[Typical example]: find the index of the last 5 in the array [0,1,1,1,2,3,4,5,5,6]
[Explain]:
if array[mid] == target, shrink the range as [mid, hi];
else if array[mid] < target, shrink the range as [lo, mid-1];
else shrink the range as [mid+1, hi];

[type 4 - combination type] questions involves two or three types above

153 Find Minimum in Rotated Sorted Array
154. Find Minimum in Rotated Sorted Array II
33. Search in Rotated Sorted Array
81. Search in Rotated Sorted Array II
34. Search for a Range


2.Advanced binary search: shrink the range in a specific way

[calculation problem] 

[Identify the range]: given a number n, the range is [0,n]
tips: >>1 <<1 corresponds to /2, *2, which can be used in binary operations.
[questions]
367. Valid Perfect Square
29. Divide Two Integers
69 Sqrt(x)
50. Pow(x, n)
287. Find the Duplicate Number


[Other typical array/matrix related binary search problem]

302. Smallest Rectangle Enclosing Black Pixels
[key point]
1. [identify the range]: contstruct an array, array[i] represents in the column i of the matrix, there is at least one 1. the array will be like [00011111000].
2. [methods of shrinking the range]: find the first 1's and last 1's index.

354. Russian Doll Envelopes

4. Median of Two Sorted Arrays
1.[identify the range] the range contains two sorted array, [arr1] and [arr2]
2.[methods of shrinking the range]:
   

363. Max Sum of Rectangle No Larger Than K

162. Find Peak Element


[Other related questions]

300. Longest Increasing Subsequence
209. Minimum Size Subarray Sum
378. Kth Smallest Element in a Sorted Matrix
349. Intersection of Two Arrays
350. Intersection of Two Arrays II



Object Oriented Design - Lecture 1: Introduction to SDLC

This post is the note of lecture 1 of the course "Object Oriented System Analysis and Design". Videos can be viewed at https://www.youtube.com/watch?v=ODeQ0rF59kI&index=3&list=PL6XklZATqYx9dj72MKG6wLYjljeB2odra.

Lecture 1

  1. System Design Methodology

  1. System Design Lifecycle (SDLC)

    1. Identify the problem and obtain approval
    2. Plan the project
    3. Understand the details of the problem or need
    4. Design the system components (Draw a diagram)
    5. Build, test, and integrate system components
    6. Deploy the system
2. Actual approaches used to develop a particular information system
  1. Unified process
  2. Extreme programming
  3. Scrum (suitable for team work)
  4. Agile development (emphasizes flexibility to anticipate new requirement during development)
3.RMO example to illustrate SDLC
  1. Need/requirement
    1. Small information system
    2. Being added to large supply chain management system
    3. Demonstrate one iteration of the small project, assuming there are more
  2. Look through all six phases in SDLC
    1. Phase 1: Identify the problem
      • Problem: purchasing agents attend appareal and fabric trade show around the whold to order new products from supplier.
      • Need: build a new information system (app) to track information about supplier and new products while at trade show.
    2. Phase 2: Plan 
      • System vision document (problem description + system capability + business benefits).
      • Work breakdown structure (tasks, whose responsibility, estimated working hours)
    3. Phase 3: Analysis (understand the details of the problem or need)
      • Basic ideas:   supplier subsystem + product subsystem
      • Use cases (一般包括增删查改) + a use case diagram:
         
        Supplier: look up suppliers; enter/update/delete supplier information; look up contact; enter/update/delete contact information, etc
          Product: look up product information; enter/update/delete product information; upload product image, etc
      • Information diagram (for supplier/product, what information we should have)
    4. Phase 4: Design
      • Design databases: table design, key-index identification, attribute types, referential integrity
      • Design the system's high level structure
        • Browser, window, smart phone
        • Architectural configuration (components)
        • Class diagram
        • Subsystem architecture
    5. Phase 5: Implementation
    6. Phase 6: Maintenance


Wednesday, August 3, 2016

Advance Tree Algorithm 2 Summary - Leetcode

Union-Find with Path Compression


Problem Description: Find All Disjoint Sets

Idea:

1.Build roots array:

If we can build the root array as shown in the fig, we can find the root of a node. Two nodes belong to the same set if they share the same root.
(Note: The root's parent is the root itself. e.g., node 5's parent is node 1. Node 1's parent is node 0. Node 0's parent is 0. So node 0 is the root of node 5.) 

2.Path compression
Because we only care about the root of a node, so rather than store its parent, it is better to store its root in the array. To achieve this, we need to implement path compression.


Key Points:

1.Use Array to represent a tree/graph.
2.Path compression

A very good tutorial:
https://www.youtube.com/watch?v=ID00PMy0-vE

Basic Algorithm:

-------------------------------------------------------------
public union_find(int[] edges, int n){
  //n indicates node 0~node n-1
  int[] roots = new int[n];
  for(int i=0;i<n;i++) roots[i] = i;    //i indicates node i
    //use "int[] roots" to construct a tree. roots[i] represents node i's parent (root if using path compression)

  for(int[] edge: edges){
    int root1 = find(edge[0], roots);                        
    int root2 = find(edge[1], roots);                        
    if(root1!=root2) merge(root1, root2, roots);             
  }
}

private int find(int node, int[] roots){
  if(node==roots[node]) return node;
  else{
    roots[node] = find(roots[node],roots); //path compression
  }
return roots[node];
}

private void merge(int root1, int root2, int[] roots){
  roots[root1]=root2;
}

-------------------------------------------------------------
Questions:
[323] Number of Connected Components in an Undirected Graph
[261] Graph Valid Tree

Advanced

If nodes cannot be represented as 0,1,2,3 ... (a continuous sequence), we can use a map, giving each node an id.

Questions:
[128] Longest Consecutive Sequence

Topological Sorting

Problem Description

Input: a directed graph
Output: an ordered array of nodes. (Requirement: If there is an path (0->1) in the graph, the result array should put node 0 in front of 1.)


1.Topological sorting based on DFS

1)Idea



Start from any node and do DFS. When the node has not been visited yet, its status is 0. When the node is being visited (or its descent is being visited), the node's status is 1. When the node's all descent have been visited, the node's status is 2. And it is the time to put the node into the stack.

Considering the node's descents will always be put into stack before itself, the output ordered array is the sequence of the nodes in the stack from the top to the bottom.

2) How about the loop?

When you meet a node which is being visited (status = 1), what does it mean? YOU MEET A LOOP in the graph. Under such situations, there won't be a valid output. You might want to throw an exception now!

3) Key points

Hashmap: (int[] array) stores nodes' status.
Stack: stores the visited nodes.
DFS

4) Basic Algorithms

-----------------------------------------------------------
public int[] potologicalSort(int[][] edges, int n) throws Exception{
    //n indicates node 0~node n-1
    List<List<Integer>> graph = buildGraph(edges, n);
    int[] visited = new int[n];
    Stack<Integer> stack = new Stack<>();

    for(int i=0; i<n; i++){
        //If the node is in the loop (being visited)
        //If the node has not been visited, do dfs
        if(visited[i]==1) throw new Exception("There is a loop in the graph");
        else if(visited[i]==0) dfs(i, graph, visited, stack));
    }

    int[] result = new int[n];
    int i=0;
    while(i<n) result[i++]=stack.pop();
    return result;
}

private void dfs(int node, List<List<Integer>> graph, int[] visited, Stack<Integer> stack) throws Exception{
    visited[node] = 1;
    for(int i:graph.get(node)){
        //do dfs on all children of the node
        if(visited[i]==1) throw new Exception("There is a loop in the graph");
        else if(visited[i]==0) dfs(i, graph, visited, stack));
    }
    visited[node]=2;
    stack.push(node);
}

private List<List<Integer>> buildGraph(int[][] edges, int n){
//build the adjacency list of a graph
}
-----------------------------------------------------------

2.Topological sorting based on BFS

1) Idea

A node's fan-in: is the number of the node's parents (source node). e.g., in the graph, node 2's fan-in is 1, because it has a source node 0. Node 3'2 fan-in is 2 because it has 2 source nodes (node 1 and 2). Node A's fan-in = 0 means there is no node pointing to A.


The algorithm initializes the queue with all the node with fan-in=0. After we poll a node from queue, we update its children's fan-in table. Each child's fan-in -= 1. If the fan-in = 0, we add the child into the queue. So the queue maintains node collections whose fan-in = 0. The algorithm stops when there is no node in the queue.

2) How about the loop

We compare the number of nodes we visited (BFS) and the total number of nodes. If they are equal, there is no loop. Otherwise, there is a loop.

3) Key points

Queue: stores nodes with fan-in=0
HashMap: stores all nodes' fan-in number
BFS

4) Basic Algorithms

-----------------------------------------------------------
public int[] topologicalSort(int[][] edges, int n) throws Exception{
    Map<Integer, List<Integer>> graph = new HashMap<>();

    Queue<Integer> fanin0 = new LinkedList<>();
    Map<Integer, Integer> status = new HashMap<>();
 
    init(edges, n, graph, fanin0, status);

    List<Integer> result = new ArrayList<>();

    while(!queue.isEmpty()){
        int len = queue.size();

        //for all nodes whose fanin=0
        for(int i=0;i<len;i++){    
            int node = queue.poll();
            result.add(node);
         
            //for each child, update its fanin number, and if it is 0, add to queue fanin0
            for(int child:graph.get(node)){
                int fanin = status.get(child)-1;
                if(fanin==0) fanin0.add(child);
                status.put(child, fanin);
            }
        }
    }

    if(result.size()!=n) throw new Exception("there is a loop!");
    return toArray(result);
}

private void init(...){ ... }
private int[] toArray(...) { ... }
-----------------------------------------------------------

Compare topological sorting based on BFS and DFS

The BFS method has one advantage, compared to DFS. It can sort nodes without path order constraints, according to other ordering rules. (It is difficult to understand this, let's see the following example)

Given a directed graph (all nodes are represented by int), output an ordered array of nodes.

The requirement is:
Constraint1: If there is a path (0->1) in the graph, the result array should put node 0 in front of 1.
Constraint2: For two nodes which do not have constraint 1, the smaller node should be in front of the larger node in the output array

The output should only be: [0,1,2,4,3]
We cannot output [0,2,1,4,3]

Solution:
in the bfs method
---------------------------------------------------
    while(!queue.isEmpty()){
        int len = queue.size();
        Collections.sort(queue);
        //for all nodes whose fanin=0
        for(int i=0;i<len;i++){
---------------------------------------------------
Add one line, sort the queue. So all nodes with fan-in = k are sorted.

Questions:
[332] Reconstruct Itinerary
[207] Course Schedule
[269] Alien Dictionary


Index Tree
Segment Tree