基于 ProtonBase 实现房源查找与评价功能
经过前面章节(基于 ProtonBase 构建一个租房系统)的阅读,相信大家已经成功启动一个 ProtonBase 实例,并且实现了民宿平台的一些基本功能,可以进行房东端房源的发布、修改、删除和租客端房源的搜索和评论功能。在本文中,基于 Airbnb 的测试数据集,我们进一步介绍如何基于房源基本信息和评价来查找目标房源以及对订单房源进行评价。
场景分析
在一个用户确定出行的目的地后,接下来要做的就是确定自己的住宿。我们通过一些示例场景来看民宿平台是如何实现客户从房源筛选、预订到订单评价的一系列操作。
- 
首先通过一系列条件查找满足需求的目标房源,例如:
- 房源的关键字,可能包含地点与酒店名称
 - 房源的可预订日期
 
 - 
对查找完的房源列表,根据某些条件进行筛选,例如:
- 房源的评分高于某个阈值
 - 房源的价格区间
 - 房源提供的床的个数
 
 - 
在拿到一系列房源列表后,会挑选一些感兴趣的房源来查看之前住宿客人对此房源的评价,来进行房源选择的辅助判定。
 - 
住宿完成后,会对房源进行评价,填写自己的住宿感受。
 
表结构设计
房源表
listings 表包含了房源的信息。每个房源都有一个唯一的 id,还有其他的相关属性如名称、房型、价格等。
CREATE TABLE listings (
        ID TEXT PRIMARY KEY,
        listing_url TEXT,
        scrape_id TEXT,
        last_scraped DATE,
        SOURCE TEXT,
        NAME TEXT,
        description TEXT,
        neighborhood_overview TEXT,
        picture_url TEXT,
        host_id BIGINT,
        host_url TEXT,
        host_name TEXT,
        host_since DATE,
        host_location TEXT,
        host_about TEXT,
        host_response_time TEXT,
        host_response_rate TEXT,
        host_acceptance_rate TEXT,
        host_is_superhost BOOLEAN,
        host_thumbnail_url TEXT,
        host_picture_url TEXT,
        host_neighbourhood TEXT,
        host_listings_count INTEGER,
        host_total_listings_count INTEGER,
        host_verifications TEXT,
        host_has_profile_pic BOOLEAN,
        host_identity_verified BOOLEAN,
        neighbourhood TEXT,
        neighbourhood_cleansed TEXT,
        neighbourhood_group_cleansed TEXT,
        latitude NUMERIC ( 9, 6 ),
        longitude NUMERIC ( 9, 6 ),
        property_type TEXT,
        room_type TEXT,
        accommodates SMALLINT,
        bathrooms TEXT,
        bathrooms_text TEXT,
        bedrooms TEXT,
        beds SMALLINT,
        amenities JSONB,
        price MONEY,
        minimum_nights INTEGER,
        maximum_nights INTEGER,
        minimum_minimum_nights INTEGER,
        maximum_minimum_nights INTEGER,
        minimum_maximum_nights INTEGER,
        maximum_maximum_nights INTEGER,
        minimum_nights_avg_ntm NUMERIC ( 10, 2 ),
        maximum_nights_avg_ntm NUMERIC ( 10, 2 ),
        calendar_updated TEXT,
        has_availability BOOLEAN,
        availability_30 SMALLINT,
        availability_60 SMALLINT,
        availability_90 SMALLINT,
        availability_365 SMALLINT,
        calendar_last_scraped DATE,
        number_of_reviews INTEGER,
        number_of_reviews_ltm INTEGER,
        number_of_reviews_l30d INTEGER,
        first_review DATE,
        last_review DATE,
        review_scores_rating NUMERIC ( 10, 2 ),
        review_scores_accuracy NUMERIC ( 10, 2 ),
        review_scores_cleanliness NUMERIC ( 10, 2 ),
        review_scores_checkin NUMERIC ( 10, 2 ),
        review_scores_communication NUMERIC ( 10, 2 ),
        review_scores_location NUMERIC ( 10, 2 ),
        review_scores_value NUMERIC ( 10, 2 ),
        license TEXT,
        instant_bookable BOOLEAN,
        calculated_host_listings_count TEXT,
        calculated_host_listings_count_entire_homes SMALLINT,
        calculated_host_listings_count_private_rooms SMALLINT,
        calculated_host_listings_count_shared_rooms SMALLINT,
        reviews_per_month NUMERIC ( 10, 2 )
);评论表
这张表存储了房源的评论。每条评论都与一个 listing_id 关联。
-- reviews.sql
CREATE TABLE reviews (
    listing_id TEXT,
    id BIGINT PRIMARY KEY,
    date DATE,
    reviewer_id BIGINT,
    reviewer_name TEXT,
    comments TEXT
);日历表
这张表存储了房源的可预订日期信息,每个记录都与一个 listing_id 和日期关联,用于表示该日期下房源是否可以被预订。
-- calendar.sql
CREATE TABLE calendar (
    listing_id TEXT,
    date DATE,
    available BOOLEAN,
    price MONEY,
    adjusted_price MONEY,
    minimum_nights SMALLINT,
    maximum_nights SMALLINT
);数据导入
首先确保三个表都已经创建完成,然后使用 copy 方式进行数据导入,这里以香港地区的样例数据为例子(不同时间下载的样例数据行数可能略有差异),下载之后请自行解压到本地路径。更多数据导入说明,参考文档 数据导入导出
protonbase=> \d
                       关联列表
 架构模式 |   名称   |  类型  |         拥有者
----------+----------+--------+------------------------
 public   | calendar | table | test@protonbase.io
 public   | listings | table | test@protonbase.io
 public   | reviews  | table | test@protonbase.io
(3 行记录)
protonbase=> \COPY calendar FROM ~/data/calendar.csv WITH CSV HEADER;
COPY 2458275
protonbase=> \COPY reviews FROM ~/data/reviews.csv WITH CSV HEADER;
COPY 95386
protonbase=> \COPY listings FROM ~/data/listings.csv WITH CSV HEADER;
COPY 6735为了能对 name 字段进行全文检索,我们修改表结构,额外添加了一列 tsvector 类型的列。
protonbase=> ALTER TABLE listings ADD COLUMN name_tsvector TSVECTOR;
ALTER TABLE
protonbase=> UPDATE listings set name_tsvector=to_tsvector(name);
UPDATE 6735搜索目标房源
预订住宿的第一步通常是根据一些关键字或入住时间确定要入住的房源,这里对这两个场景分别举例说明。虽然关键字和入住时间也可同时设定进行搜索,这里不再对组合情况进行说明。后续章节同样。
根据酒店关键字搜索目标房源
ProtonBase 通过 tsvector 支持全文检索能力,tsvector 的用法参考全文检索。
比如用户想搜索 Condo
protonbase=> select name,name_tsvector from listings where name_tsvector @@ to_tsquery('Condo');
                             name                             |                                             name_tsvector
--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------
 Condo in Sai Wan · ★4.13 · 2 bedrooms · 2 beds · 2 baths     | '2':8,11,14 'bath':15 'bed':12 'bedroom':9 'condo':1 'sai':3 'wan':4 '·':5,7,10,13 '★4.13':6
 Condo in Kennedy Town · ★4.73 · 2 bedrooms · 2 beds · 1 bath | '1':14 '2':8,11 'bath':15 'bed':12 'bedroom':9 'condo':1 'kennedi':3 'town':4 '·':5,7,10,13 '★4.73':6
 Condo in Sheung Wan · 1 bedroom · 1 bed · 1 bath             | '1':6,9,12 'bath':13 'bed':10 'bedroom':7 'condo':1 'sheung':3 'wan':4 '·':5,8,11
 Condo in Sheung Wan · ★4.63 · 1 bedroom · 1 bed · 1 bath     | '1':8,11,14 'bath':15 'bed':12 'bedroom':9 'condo':1 'sheung':3 'wan':4 '·':5,7,10,13 '★4.63':6
(4 行记录)比如想搜索 Sai Wan 附近的 Condo
protonbase=> select name,name_tsvector from listings where name_tsvector @@ to_tsquery('Condo&Sai&Wan');
                           name                           |                                        name_tsvector
----------------------------------------------------------+----------------------------------------------------------------------------------------------
 Condo in Sai Wan · ★4.13 · 2 bedrooms · 2 beds · 2 baths | '2':8,11,14 'bath':15 'bed':12 'bedroom':9 'condo':1 'sai':3 'wan':4 '·':5,7,10,13 '★4.13':6
(1 行记录)
 
-- 也可以使用plainto_tsquery直接传入原始的文本输入,注意这个function只能表达分词后的AND关系
protonbase=> select name,name_tsvector from listings where name_tsvector @@ plainto_tsquery('Condo Sai Wan');
                           name                           |                                        name_tsvector
----------------------------------------------------------+----------------------------------------------------------------------------------------------
 Condo in Sai Wan · ★4.13 · 2 bedrooms · 2 beds · 2 baths | '2':8,11,14 'bath':15 'bed':12 'bedroom':9 'condo':1 'sai':3 'wan':4 '·':5,7,10,13 '★4.13':6
(1 行记录)
 
根据入住时间搜索目标房源
在使用一个民宿平台的第一步,我们会根据入住时间先过滤下满足时间需求的房源。需从 calendar 表确认房源是否可预订状态。
SELECT listing_id
FROM (SELECT listing_id, COUNT(1) AS available_count
      FROM calendar
      WHERE date >= '2023-10-11'
        AND date <= '2023-10-13'
        AND available = 't'
      GROUP BY listing_id) AS a
WHERE a.available_count = 3;
筛选目标房源
在根据时间和房源可预订状态找到满足需求的房源列表后,我们往往需要添加一些过滤条件进一步缩小筛选范围。比如我们会根据一些感兴趣的条件进行筛选,这里我们列举几个场景:
- 评分大于某个阈值
 - 按照床的个数进行过滤
 - 指定价格区间
 

查找所有评分大于 4 分的房源
SELECT
        B.listing_id,
        C.review_scores_rating
FROM
        (
        SELECT
                listing_id
        FROM
                ( SELECT listing_id, COUNT ( 1 ) AS available_count FROM calendar WHERE DATE >= '2023-10-11' AND DATE <= '2023-10-13' AND available = 't' GROUP BY listing_id ) AS A
        WHERE
                A.available_count = 3
        ) AS B
        JOIN listings AS C ON B.listing_id = C.ID
WHERE
        C.review_scores_rating >= 4.0;查找双床的房源
SELECT
        B.listing_id,
        C.name,
        C.review_scores_rating
FROM
        (
        SELECT
                listing_id
        FROM
                ( SELECT listing_id, COUNT ( 1 ) AS available_count FROM calendar WHERE DATE >= '2023-10-11' AND DATE <= '2023-10-13' AND available = 't' GROUP BY listing_id ) AS A
        WHERE
                A.available_count = 3
        ) AS B
        JOIN listings AS C ON B.listing_id = C.ID
WHERE
        C.beds = 2;
查找价格在 800 到 1000 的房源
SELECT
	B.listing_id,
	C.review_scores_rating
FROM
	(
	SELECT
		listing_id
	FROM
		( SELECT listing_id, COUNT ( 1 ) AS available_count FROM calendar WHERE DATE >= '2023-10-11' AND DATE <= '2023-10-13' AND available = 't' GROUP BY listing_id ) AS A
	WHERE
		A.available_count = 3
	) AS B
	JOIN listings AS C ON B.listing_id = C.ID
WHERE
	C.price >= '$800'
	AND C.price <= '$1000';查看房源评价
在根据一些房源的基本信息过滤完后,用户有了一些目标房源。在真正下单前,用户往往会点开某个具体的房源查看过去客人的评价。尤其是对一些过去的差评信息会重点留意,防止踩坑。
查看房源所有评价
SELECT id, date, reviewer_id, reviewer_name, comments, listing_id
FROM reviews
WHERE (listing_id = 17891::text);
订单房源评价
发布
发布对某个房源的评价
INSERT INTO reviews (listing_id, id, date, reviewer_id, reviewer_name, comments)
VALUES (17891, 37352, '2010-04-23', 76132, 'Tamara',
        $$The apartment on Holly wood Rd was exactly as described. It was a comfortable and very convenient base for our two week holiday. Our host was very forthcoming in answering queries and giving us some local tips. I would definitely recommend Candace'S place TO friends AND would have NO hesitation IN booking it again WHEN planning another trip TO HK.$$);

编辑
对已经发布的房源评价进行编辑修改
UPDATE reviews
SET comments=$$The apartment on Holly wood Rd was exactly as described. It was a comfortable and very convenient base for our two week holiday. Our host was very forthcoming in answering queries and giving us some local tips. I would definitely recommend Candace'S place TO friends AND would have NO hesitation IN booking it again WHEN planning another trip TO HK.$$ WHERE id=17891;

删除
对已经发布房源评价进行删除
DELETE
FROM reviews
WHERE id = 17891;