Query Video With A Keyword

The video above consists of several multiple-choice questions on various topics, each presented in a rectangular box. Each question contains a keyword in its body, and different questions have different keywords. The task I define is to retrieve all frames that display a wanted question, given a keyword as input. After that, I select one concrete frame that shows the wanted question together with its answers. For example, if I use Wine as the keyword, I expect the output below.

Output video:

Output image:


My notebook is on GitHub (https://github.com/dongchirua/dongchirua.github.io/blob/master/_notebooks/VideoQueryExplore.ipynb)

I used pre-trained models for text extraction: CRAFT for text detection and Tesseract for text recognition. This was a quick attempt, so the pipeline is not optimized for running time: it processes images one by one on the CPU. To keep things manageable, I used a sampling strategy and binary search to find the relevant video segment. With that, the running time is still acceptable (less than 15 minutes).
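The binary-search step is not spelled out here, so below is a minimal sketch built on my own assumptions. Frames are sampled every k frames, and the frames showing the keyword form one contiguous run. `has_keyword` is a hypothetical predicate standing in for the CRAFT + Tesseract check. Given one sampled frame that contains the keyword and a neighbouring sampled frame that does not, binary search finds the exact boundary with about log2(k) OCR calls instead of k.

```python
from typing import Callable


def find_boundary(has_keyword: Callable[[int], bool], hit: int, miss: int) -> int:
    """Binary-search between a frame that shows the keyword (`hit`) and one
    that does not (`miss`); return the last keyword frame on the way to `miss`.

    Assumes the keyword frames form one contiguous run, so the predicate
    flips exactly once between `hit` and `miss`.
    """
    # Invariant: has_keyword(hit) is True and has_keyword(miss) is False.
    while abs(miss - hit) > 1:
        mid = (hit + miss) // 2
        if has_keyword(mid):
            hit = mid
        else:
            miss = mid
    return hit


# Hypothetical predicate in place of the expensive OCR check:
# the keyword is visible in frames 37..58, and frames were sampled every 10.
present = lambda i: 37 <= i <= 58
left = find_boundary(present, hit=40, miss=30)   # 37
right = find_boundary(present, hit=50, miss=60)  # 58
```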


Some observations that can help:

  1. a segment can be divided into two phases, one of which is when no answer has been given yet. There are short periods where the box containing the question barely changes in size or color, giving viewers time to respond.
  2. detecting the rectangle and its size can help track the phases and potentially help pick the final frame
  3. the question box is semi-transparent, so its color is strongly influenced by the background; this makes it hard to detect with image-processing techniques such as the Canny edge detector or the Hough transform.
  4. following point #1, we can leverage the difference between consecutive frames
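Observation #4 suggests comparing consecutive frames. Below is a minimal sketch of a `hist_diff` signal, assuming each frame is a grayscale `uint8` NumPy array. The 32 bins and the L1 distance are my choices for illustration, not necessarily what the notebook uses. A value near zero means the frame barely changed, which matches the quiet periods described in point #1.

```python
import numpy as np


def hist_diff(frames, bins=32):
    """L1 distance between normalized grayscale histograms of consecutive
    frames; returns len(frames) - 1 values."""
    hists = []
    for frame in frames:
        counts, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hists.append(counts / counts.sum())  # normalize to a distribution
    return np.array([np.abs(b - a).sum() for a, b in zip(hists, hists[1:])])


# Two identical dark frames followed by a bright one.
dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 255, dtype=np.uint8)
print(hist_diff([dark, dark, bright]))  # [0. 2.]
```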


My pipeline:

  1. Sample the video
  2. For each image in the sampled frames
    • Pass the frame to the text detector + text recognizer (treating the image as a single word)
    • If the frame contains the input keyword, append its index to target_indexes
  3. Find the leftmost and rightmost indexes in target_indexes
  4. Expand left and right to cover all frames that contain the keyword
  5. Compute hist_diff and find its local minima
  6. Return the frame of interest and the corresponding part of the video
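For step 5, a plain "smaller than both neighbours" test misses flat stretches, and identical frames produce exactly such runs of equal differences. The sketch below is one plateau-aware way to find local minima in a `hist_diff` series. It is my own illustration and may not match the notebook's method. Which minimum becomes the returned frame is left open here.

```python
def local_minima(values):
    """Return the first index of every valley in `values`.

    A valley is a run of equal values whose left and right neighbours are
    both strictly larger, so a flat stretch of near-identical frames counts once.
    """
    minima = []
    n = len(values)
    i = 1
    while i < n - 1:
        if values[i] < values[i - 1]:
            j = i
            # walk to the end of a run of equal values
            while j + 1 < n and values[j + 1] == values[i]:
                j += 1
            if j + 1 < n and values[j + 1] > values[i]:
                minima.append(i)
            i = j + 1
        else:
            i += 1
    return minima


print(local_minima([3, 1, 1, 1, 2, 0.5, 0.5, 0.8]))  # [1, 5]
```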

Dalat Ultra Trail (21K) Statistics

Last March, I ran an ultra trail race in Dalat, Lam Dong, Vietnam. I felt good about my performance but wondered how well others did. So I crawled the results and visualized DUT's data.

Generating the figure above was quite easy; check out my notebook here or the snippet below.

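import matplotlib.pyplot as plt

# male, female: tables of finishers (e.g. pandas DataFrames) with a `time` column in minutes
# bins: histogram bin edges shared by both groups so the bars line up
plt.hist(male.time, bins, alpha=0.5, label='male')
plt.hist(female.time, bins, alpha=0.5, label='female')
plt.legend(loc='upper right')
plt.xlabel('duration (minutes)')
plt.title('Dalat Ultra Trail 2019 - 21K')
plt.axvline(x=365, color='r', ls='--')  # dashed red reference line at 365 minutes
plt.show()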
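To answer "how well did others do" with a single number, here is a small sketch that computes the share of finishers faster than a given time. It assumes `times` is a sequence of durations in minutes, like `male.time` above. The function name `share_faster` is hypothetical, and the numbers in the usage line are toy values, not race results.

```python
from bisect import bisect_left


def share_faster(times, my_time):
    """Fraction of finishers whose time is strictly below my_time."""
    ordered = sorted(times)
    return bisect_left(ordered, my_time) / len(ordered)


# Toy durations in minutes (not the real race results).
print(share_faster([300, 320, 365, 400, 410], 365))  # 0.4
```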

Incremental Average

Recently, I have been working with time series in a real-time production system. In my case, new data points keep being appended, so the series grows longer and longer. Eventually its length will exceed the computer's memory. At that point, any computation that takes the entire array as input becomes a problem. I found an alternative approach on Wikipedia and Stack Exchange; below is a note elaborating on it.

Say I need to compute

\[\hat{x}_n = \frac{x_1 + \dots + x_{n}} {n}\]

Computed directly, \(\hat{x}_n\) needs every value from the beginning up to the current time point. It is worth switching to an incremental form in which the result is updated at each new point. To get there, let me rewrite the formula slightly.

\[\begin{align} \hat{x}_n &= \frac{x_1 + \dots + x_{n-1} + x_n} {n} \\ &= \frac{n-1}{n-1} \times \frac{x_1 + \dots + x_{n-1} + x_n} {n} \\ &= \frac{(n-1)\times(x_1 + \dots + x_{n-1}) + (n-1) \times x_n} {(n-1) \times n} \\ &= \frac{(n-1)\times(x_1 + \dots + x_{n-1})} {(n-1) \times n} + \frac{(n-1) \times x_n} {(n-1) \times n} \\ &= \frac{(n-1) \times \hat{x}_{n-1}} {n} + \frac{x_n} {n} \end{align}\]
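A quick numeric check of the last line, using the toy series \(x = 2, 4, 9\):

\[\begin{align} \hat{x}_1 &= 2 \\ \hat{x}_2 &= \frac{1 \times 2}{2} + \frac{4}{2} = 3 \\ \hat{x}_3 &= \frac{2 \times 3}{3} + \frac{9}{3} = 5 = \frac{2 + 4 + 9}{3} \end{align}\]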

So the average at time point \(n\) can be computed from just the new value \(x_n\), the previous average \(\hat{x}_{n-1}\), and the count \(n\). In other words, I only need to store two numbers instead of the whole series, which saves a great amount of memory and is very convenient. :+1:
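A minimal sketch of the update rule, storing only the count and the current mean. The class name `RunningMean` is hypothetical. The update line is the last line of the derivation. It is algebraically the same as `mean + (x - mean) / n`, a form that is also commonly used.

```python
class RunningMean:
    """Incremental average: stores only n and the current mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        # x̂_n = (n - 1) / n * x̂_{n-1} + x_n / n,
        # algebraically the same as mean + (x - mean) / n
        self.mean = (self.n - 1) / self.n * self.mean + x / self.n
        return self.mean


avg = RunningMean()
for x in [2, 4, 9]:
    avg.update(x)
print(avg.mean)  # 5.0, same as (2 + 4 + 9) / 3
```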

Dockerize My Blog

I felt a little reluctant 😱 to install Jekyll again after a long break. Honestly, I remember neither the steps nor the dependencies for different environments. Since I use Docker daily anyway 🤔, I decided to dockerize the blog. Let me share mine.


  1. First, build the image and name it jekyll: docker build -t jekyll .
  2. Run a container named blog from the jekyll image
  • docker run -d -p 4000:4000 -v [path/to/blog]:/src:rw --name blog jekyll
    • e.g. docker run -d -p 4000:4000 -v /Users/quy/dongchirua:/src:rw --name blog jekyll

Once the container is running, Jekyll won't bother me anymore 😋. Besides, every change is synced automatically to the container through the mounted volume. I just refresh my browser at http://localhost:4000 (I use Docker for Mac). The drawback is that the image takes 879 MB on disk 🤓.

Updated on 20th June 2019: upgraded the Dockerfile

Install New Plugins

I have installed new plugins, e.g. jekyll-gist; this post is just a test :+1:!

Multiply Matrices in Python

Multiplying matrices is a good way to practice what you know about Python.

Formula: \(c_{ij} = \sum_{k=1}^{n}{a_{ik}b_{kj}}\), where \(n\) is the number of columns of \(A\), which must equal the number of rows of \(B\).

Example: \(\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \\ \end{bmatrix} \times \begin{bmatrix} 7 & 8\\ 9 & 10 \\ 11 & 12 \\ \end{bmatrix} = \begin{bmatrix} 58 & 64\\ 139 & 154 \\ \end{bmatrix}\)

Approach #1

def multipleMatrixes(A, B):
    B = list(zip(*B))
    return [[sum(ai * bj for ai, bj in zip(Ai, Bj)) for Bj in B] for Ai in A]
#[[58, 64], [139, 154]]
multipleMatrixes([[1,2,3],[4,5,6]], [[7,8],[9,10],[11, 12]]) 

§ zip([iterable, ...]) [1] returns an iterator of tuples (a list in Python 2), where the i-th tuple contains the i-th element from each of the argument sequences or iterables

§ * [2] is the unpacking operator: with def test(A, B): print(A, B), calling test(*[[1,2],[3,4]]) passes [1,2] as A and [3,4] as B
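Putting the two notes together: `zip(*B)` passes each row of `B` as a separate argument, so `zip` pairs up their i-th elements. In other words, it yields the columns of `B`, which is why Approach #1 transposes `B` first. A small demonstration with the example matrices:

```python
B = [[7, 8], [9, 10], [11, 12]]

# zip(*B) is zip([7, 8], [9, 10], [11, 12]): it pairs up the i-th elements
# of every row, i.e. it yields the columns of B.
columns = list(zip(*B))
print(columns)  # [(7, 9, 11), (8, 10, 12)]

# c_00 = row 0 of A dotted with column 0 of B
A = [[1, 2, 3], [4, 5, 6]]
c00 = sum(a * b for a, b in zip(A[0], columns[0]))
print(c00)  # 58
```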

Approach #2

def multipleMatrixes(A, B):
    return [[sum(x * B[i][col] for i, x in enumerate(row))
             for col in range(len(B[0]))] for row in A]
#[[58, 64], [139, 154]]
multipleMatrixes([[1,2,3],[4,5,6]], [[7,8],[9,10],[11, 12]]) 

§ enumerate(sequence, start=0) [3] returns an enumerate object that yields (index, element) pairs

Approach #3

def multipleMatrixes(A, B):
    # result has len(A) rows and len(B[0]) columns
    result = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                result[i][j] += A[i][k] * B[k][j]
    return result
#[[58, 64], [139, 154]]
multipleMatrixes([[1,2,3],[4,5,6]], [[7,8],[9,10],[11, 12]])

§ [0] * 3 becomes [0,0,0]
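`[0] * 3` is safe for a flat list of numbers. Repeating a list of lists with `*` is different: it copies references, not rows. That is why Approach #3 builds each row separately with a comprehension. A short demonstration:

```python
rows, cols = 2, 3

aliased = [[0] * cols] * rows  # the same inner list repeated `rows` times
aliased[0][0] = 1
print(aliased)  # [[1, 0, 0], [1, 0, 0]]

independent = [[0] * cols for _ in range(rows)]  # a fresh list per row
independent[0][0] = 1
print(independent)  # [[1, 0, 0], [0, 0, 0]]
```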

Approach #4: Using Numpy

import numpy as np
#array([[ 58,  64],
#       [139, 154]])
np.dot([[1,2,3],[4,5,6]], [[7,8],[9,10],[11, 12]]) 
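Since Python 3.5, NumPy arrays also support the `@` operator for matrix multiplication, which maps to `np.matmul`. For 2-D arrays it gives the same result as `np.dot`, and some find it more readable:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
C = A @ B  # same as np.matmul(A, B); equals np.dot(A, B) for 2-D arrays
print(C)
# [[ 58  64]
#  [139 154]]
```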

Source here


How to build a Docker image with Node.js, Ruby, and Python on Ubuntu 16.04

Before starting, please install Docker on your machine. If you use macOS, I highly recommend Docker for Mac [1] (which I call the native version) instead of docker-machine, because I use the native version in this demonstration. In case you wonder why, please read here.

Step 1 - Preparing

A Dockerfile is a set of instructions for building an image. The Dockerfile syntax [2] isn't complicated, so I'll show mine and explain it step by step.

I start from Ubuntu 16.04:

FROM ubuntu:16.04

Then I install the components I need:

# duplicate packages removed; no trailing backslash after the last package
RUN apt-get update && \
    apt-get install -y --force-yes --no-install-recommends --fix-missing \
    apt-transport-https \
    ssh-client \
    build-essential \
    curl \
    ca-certificates \
    git \
    libicu-dev \
    'libicu[0-9][0-9].*' \
    lsb-release \
    python-all \
    rlwrap \
    apt-utils \
    libssl-dev \
    graphicsmagick \
    imagemagick

Finally, I set the default command for my image with CMD [3]:

CMD ["bash"]

Step 2 - Run it

Next, change to the directory containing the Dockerfile and run docker build -t demo . Then run docker images to check that your demo image is there.

What I Learnt

Getting the instructions for an image right is usually annoying, because some lines in the Dockerfile may fail and each trial takes time. For example, suppose my Dockerfile had 3 lines and an error happened at the last line. I would have to fix it, rebuild, and wait again. As I mentioned, this is frustrating, so I looked for a way to speed things up.

Docker caches preceding layers [4], so I already follow the suggestions from the Docker best practices. But I wanted more, so my idea is to build on top of an image that is known to be correct. I split the Dockerfile into two parts. The lines from the beginning that build without errors form the correct part. The remaining lines go into a working Dockerfile. The two parts are connected by FROM my_image:previous_correct.

For instance, the correct part is:

FROM ubuntu:16.04
RUN apt-get update && apt-get install -y

Then I build it and tag it as the base image with docker build -t mine:based . Next, I focus only on the working part, like:

FROM mine:based
RUN apt-get install ... # line may get error

Once everything works, I merge the parts into a single final Dockerfile.

One more thing: inside an Ubuntu 16.04 container you don't need sudo at the beginning of commands [5], because they run as root by default, e.g. RUN apt-get update && apt-get install -y