Background and Traditional SOTA
Approximate Nearest Neighbor (ANN) search aims to efficiently retrieve data points from a high-dimensional space that are closest to a given query point. Formally, for a dataset \( \mathcal{X} = \{x_i \in \mathbb{R}^d\}_{i=1}^N \) and a query \( q \in \mathbb{R}^d \), the task is to find \( x^* \in \mathcal{X} \) such that:
\[
x^* = \arg\min_{x \in \mathcal{X}} \|x - q\|_2
\]
However, due to the curse of dimensionality, exact search is often impractical at scale. ANN allows small inaccuracies in return for drastic improvements in speed.
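For reference, the exact search that ANN methods approximate can be sketched as a linear scan. This is only an illustrative baseline: the function name `nearest_neighbor` and the sample points are assumptions made for this example, not part of any library.

```python
import math


def nearest_neighbor(points, query):
    """Return the point with the smallest Euclidean (L2) distance to query.

    Exact brute-force search: every point is compared against the query,
    so one query costs O(N * d) for N points of dimension d.
    """
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda x: math.dist(x, query))


# Hypothetical toy data, chosen only to illustrate the call.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
query = (0.9, 1.2)
best = nearest_neighbor(points, query)
print(best)
```

The scan touches every point for every query, which is the cost that ANN indexes try to avoid by accepting a slightly worse answer.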
Dynamic Programming is a method for solving complex problems by breaking them down into simpler subproblems. It is particularly useful when a problem has overlapping subproblems and optimal substructure, meaning the optimal solution to the problem can be constructed from optimal solutions to its subproblems.
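A small sketch can make both properties concrete. The coin-change problem below is my own choice of illustration (the function name `min_coins` is hypothetical): the best answer for an amount is built from the best answers for smaller amounts (optimal substructure), and those smaller answers are reused many times (overlapping subproblems), so each one is stored in a table and computed once.

```python
def min_coins(coins, amount):
    """Fewest coins needed to make `amount`, or None if it cannot be made."""
    inf = float("inf")
    # best[a] holds the fewest coins needed to make amount a.
    best = [0] + [inf] * amount
    for a in range(1, amount + 1):
        for c in coins:
            # Extend the optimal solution for the smaller amount a - c by one coin.
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return None if best[amount] == inf else best[amount]


print(min_coins([1, 3, 4], 6))
```

With coins 1, 3 and 4, a greedy choice of the largest coin first would use three coins for 6 (4 + 1 + 1), while the table finds the two-coin answer 3 + 3.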
While Linux is a great operating system, many applications are not available in its ecosystem, such as iTunes, OneNote, or Sony Digital Paper. One solution is Wine, though I haven't gotten every app to work smoothly with it; another is running VirtualBox within Linux, which gives the same user experience as the original apps but uses more computational resources. This tutorial covers setting up a Mac OS virtual machine with VirtualBox in Ubuntu (18.01); my guide follows this useful article.
While rereading James Gleick's Chaos: Making a New Science, I came across the famous and striking Li-Yorke theorem:
Let \(f: \mathbf{R} \rightarrow \mathbf{R}\) be a continuous function.
If \(f\) has a point of period 3 (i.e. \(f^3(x) = x\) and \(f(x), f^2(x) \neq x\)), then
For every \(k = 1,2,...\) there is a periodic point having period \(k\).
There is an uncountable set \(S\) containing no periodic points, which satisfies
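The first claim can be checked by hand on a concrete map. The sketch below uses the tent map \(T(x) = 1 - |2x - 1|\), which is my own choice of example rather than anything from the book; it is continuous on all of \(\mathbf{R}\) and has the period 3 orbit \(2/7 \rightarrow 4/7 \rightarrow 6/7 \rightarrow 2/7\). Exact rational arithmetic avoids floating-point drift, and the helper name `minimal_period` is hypothetical.

```python
from fractions import Fraction


def tent(x):
    """Tent map T(x) = 1 - |2x - 1|, continuous on the whole real line."""
    return 1 - abs(2 * x - 1)


def minimal_period(f, x, max_period=64):
    """Smallest k >= 1 with f^k(x) == x, or None if none is found up to max_period."""
    y = x
    for k in range(1, max_period + 1):
        y = f(y)
        if y == x:
            return k
    return None


# Starting points picked by hand for this illustration.
for start in (Fraction(2, 3), Fraction(2, 5), Fraction(2, 7),
              Fraction(2, 15), Fraction(2, 31)):
    print(start, minimal_period(tent, start))
```

This only exhibits periodic points of periods 1 through 5 for one map; it illustrates the statement and is not a proof of it.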
Bring beautiful natural scenery to every new tab in Chrome! Bling turns the plain default tab background into Bing's daily photos. Minimal permissions required.
For example, Windows and Mac use the Control and Command keys differently. This becomes annoying because Microsoft Remote Desktop on Mac doesn't provide a self-contained working environment, so it easily jumps to other Mac apps. Mapping the Mac Command key to the Control key sorts out the problem.
My weekend Chrome extension project: Enlighten, a handy syntax highlighting tool based on highlight.js. Try it on the Chrome Web Store or check out the source code. Any feedback will be appreciated!
Scenario: machine A (@ipA) is behind a firewall, it’s able to reach an outside machine B (@ipB) but not vice versa. We’d like to make B able to reach A.
Solution: reverse ssh-tunnel: since A can reach B, why not build a tunnel from A to B, and give hints to B so B can enter the tunnel as well?
On A: ssh -R 1234:localhost:22 userB@ipB
On B: ssh userA@localhost -p 1234
Run automatically on reboot:
Install autossh: sudo apt-get install autossh.
Create a new public/private key pair as root with ssh-keygen, saved to /root/.ssh/id_rsa.
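As a rough sketch, the tunnel command from machine A can be assembled in Python before handing it to a startup script. The helper name `reverse_tunnel_command` and the keepalive values (30 seconds, 3 retries) are assumptions made for this example; `-M 0` turns off autossh's own monitoring port so it relies on the ssh keepalive options instead, and `-N` tells ssh not to run a remote command.

```python
import shlex


def reverse_tunnel_command(remote_user, remote_host, remote_port=1234,
                           local_port=22, use_autossh=True):
    """Build the argument list for a reverse tunnel from A to B.

    The port defaults match the ssh -R example above; the keepalive
    settings are illustrative values, not recommendations.
    """
    program = ["autossh", "-M", "0"] if use_autossh else ["ssh"]
    return program + [
        "-N",                               # no remote command, tunnel only
        "-o", "ServerAliveInterval=30",     # assumed keepalive interval (seconds)
        "-o", "ServerAliveCountMax=3",      # assumed number of missed keepalives
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{remote_user}@{remote_host}",
    ]


cmd = reverse_tunnel_command("userB", "ipB")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd)` as is, which avoids shell quoting problems; the printed string is only for inspection.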
Recently I was playing around with the powerful front-end automation testing tool Selenium. Here are some examples I created to automate some simple routine work.
First, we need a testing browser with its path registered; ChromeDriver or Firefox are common ones. Typically, put the executable chromedriver in /usr/local/bin/chromedriver (or chromedriver.exe in C:/Users/%USERNAME%/AppData/Local/Google/Chrome/Application/). Don't forget to register this path or specify it when using the driver.
    import requests
    from lxml import html

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    xpath_product = '//h1//span[@id="productTitle"]//text()'
    xpath_brand = '//div[@id="mbc"]/@data-brand'

    def getBrandName(url):
        page = requests.get(url, headers=headers)
        parsed = html.fromstring(page.content)
        return parsed.xpath(xpath_brand)
I started listening to WBUR/NPR's Modern Love podcast when I was in NYC, and it has become one of my favorite podcasts. I often cry when hearing stories of love, pain, struggle, death, and youth (while driving and cooking). Well done, authors, host Meghna Chakrabarti, and the New York Times!
So many people walk around with a meaningless life. They seem half-asleep, even when they’re busy doing things they think are important. This is because they’re chasing the wrong things. The way you get meaning into your life is to devote yourself to loving others, devote yourself to your community around you, and devote yourself to creating something that gives you purpose and meaning.
The most important thing in life is to learn how to give out love, and to let it come in.
This is a simple example demonstrating how to run a Python script with different inputs in parallel and merge the results. The application here is to compute aggregate statistics for different dates through computationally intensive queries, then merge the results across all dates.
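    from multiprocessing import Process
    import glob
    import os

    import pandas as pd


    def get_days(start, num_of_days):
        '''Generate a list of dates starting from the starting date
        to the starting date + num_of_days.
        '''
        date_range = pd.date_range(start, periods=num_of_days, freq='1D')
        return [dt.strftime("%Y-%m-%d") for dt in date_range]


    def f(x):
        # run the script for a single date
        return os.system("python foo.py --date %s" % x)


    if __name__ == "__main__":
        # start and num_of_days are left undefined in the original snippet;
        # set them before running
        children = []
        for date in get_days(start, num_of_days):
            p = Process(target=f, args=(date,))
            p.start()
            children.append(p)
        # wait for every child process to finish
        for x in children:
            x.join()

        # merge results: read every CSV in the current directory
        all_df = (pd.read_csv(filename) for filename in glob.glob("*.csv"))
        merge = pd.concat(all_df, ignore_index=True)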
Having worked with Python for several years, here are some useful tricks and tools I'd like to share:
ipython notebook extension
Usage: several useful tools on top of ipython notebook
First, install the extension:
git clone https://github.com/ipython-contrib/IPython-notebook-extensions.git
cd IPython-notebook-extensions
python setup.py install
Then go to http://localhost:8888/nbextension/ to choose which extensions you'd like to use:
Personally, I like the scratchpad very much: pressing Ctrl+B pops up a scratchpad, which is a good place to check current variables, make a quick plot, or run a few lines of code without inserting a cell and deleting it afterwards. A demo looks like this:
This blog, based on Jekyll, is the fourth website I have built recently, and I learned something new with every attempt. This blog aims to share my thoughts on (but not limited to) technology and provide a place for discussions.