
Commit 03bf586

Add PDSH Q2-9 for cudf.pandas (rapidsai#20418)
Closes rapidsai#19765; xref rapidsai#20170.

I also realized that `cudf.pandas` needed to be loaded in `pdsh.py` itself, which I discovered when Q7 accessed an API from the `pandas` module. I had thought loading `cudf.pandas` in `utils.py` was sufficient.

<details>
<summary>Outputs of Q0-9 SF0.1</summary>

```bash
~/cudf$ python /cudf/python/cudf/cudf/pandas/_benchmarks/pdsh.py 0,1,2,3,4,5,6,7,8,9 --scale=0.1 --iterations=1 --path "/cudf/sf0.1"
Empty DataFrame
Columns: []
Index: []
Query 0 - Iteration 0 finished in 0.0003s
  l_returnflag l_linestatus sum_qty sum_base_price sum_disc_price sum_charge avg_qty avg_price avg_disc count_order
0 A F 3774200.00 5320753880.69 -5.054096e+09 -5.256751e+09 25.537587 36002.123829 0.050145 147790
1 N F 95257.00 133737795.84 -1.271324e+08 -1.322863e+08 25.300664 35521.326916 0.049394 3765
2 N O 7459297.00 10512270008.90 -9.986238e+09 -1.038558e+10 25.545538 36000.924688 0.050096 292000
3 R F 3785523.00 5337950526.47 -5.071819e+09 -5.274406e+09 25.525944 35994.029214 0.049989 148301
Query 1 - Iteration 0 finished in 2.3091s
   s_acctbal s_name n_name p_partkey p_mfgr s_address s_phone s_comment
29 9828.21 Supplier#000000647 UNITED KINGDOM 13120 Manufacturer#5 x5U7MBZmwfG9 33-258-202-4782 s the slyly even ideas poach fluffily
5 9508.37 Supplier#000000070 FRANCE 3563 Manufacturer#1 INWNH2w,OOWgNDq0BRCcBwOMQc6PdFDc4 16-821-608-1166 ests sleep quickly express ideas. ironic ideas...
39 9508.37 Supplier#000000070 FRANCE 17268 Manufacturer#4 INWNH2w,OOWgNDq0BRCcBwOMQc6PdFDc4 16-821-608-1166 ests sleep quickly express ideas. ironic ideas...
21 9453.01 Supplier#000000802 ROMANIA 10021 Manufacturer#5 ,6HYXb4uaHITmtMBj4Ak57Pd 29-342-882-6463 gular frets. permanently special multipliers b...
31 9453.01 Supplier#000000802 ROMANIA 13275 Manufacturer#4 ,6HYXb4uaHITmtMBj4Ak57Pd 29-342-882-6463 gular frets. permanently special multipliers b...
32 9192.10 Supplier#000000115 UNITED KINGDOM 13325 Manufacturer#1 nJ 2t0f7Ve,wL1,6WzGBJLNBUCKlsV 33-597-248-1220 es across the carefully express accounts boost...
10 9032.15 Supplier#000000959 GERMANY 4958 Manufacturer#4 8grA EHBnwOZhO 17-108-642-3106 nding dependencies nag furiou
27 8702.02 Supplier#000000333 RUSSIA 11810 Manufacturer#3 MaVf XgwPdkiX4nfJGOis8Uu2zKiIZH 32-508-202-6136 oss the deposits cajole carefully even pinto b...
22 8615.50 Supplier#000000812 FRANCE 10551 Manufacturer#2 8qh4tezyScl5bidLAysvutB,,ZI2dn6xP 16-585-724-6633 y quickly regular deposits? quickly pending pa...
34 8615.50 Supplier#000000812 FRANCE 13811 Manufacturer#4 8qh4tezyScl5bidLAysvutB,,ZI2dn6xP 16-585-724-6633 y quickly regular deposits? quickly pending pa...
14 8488.53 Supplier#000000367 RUSSIA 6854 Manufacturer#4 E Sv9brQVf43Mzz 32-458-198-9557 ages. carefully final excuses nag finally. car...
26 8430.52 Supplier#000000646 FRANCE 11384 Manufacturer#3 IUzsmT,2oBgjhWP2TlXTL6IkJH,4h,1SJRt 16-601-220-5489 ites among the always final ideas kindle accor...
7 8271.39 Supplier#000000146 RUSSIA 4637 Manufacturer#5 rBDNgCr04x0sfdzD5,gFOutCiG2 32-792-619-3155 s cajole quickly special requests. quickly ent...
0 8096.98 Supplier#000000574 RUSSIA 323 Manufacturer#4 2O8 sy9g2mlBOuEjzj0pA2pevk, 32-866-246-8752 ully after the regular requests. slyly final d...
16 7392.78 Supplier#000000170 UNITED KINGDOM 7655 Manufacturer#2 RtsXQ,SunkA XHy9 33-803-340-5398 ake carefully across the quickly
24 7205.20 Supplier#000000477 GERMANY 10956 Manufacturer#5 VtaNKN5Mqui5yh7j2ldd5waf 17-180-144-7991 excuses wake express deposits. furiously care...
30 6820.35 Supplier#000000007 UNITED KINGDOM 13217 Manufacturer#5 s,4TicNGB4uO6PaSqNBUq 33-990-965-2201 s unwind silently furiously regular courts. fi...
6 6721.70 Supplier#000000954 FRANCE 4191 Manufacturer#3 P3O5p UFz1QsLmZX 16-537-341-8517 ect blithely blithely final acco
23 6329.90 Supplier#000000996 GERMANY 10735 Manufacturer#2 Wx4dQwOAwWjfSCGupfrM 17-447-811-3282 ironic forges cajole blithely agai
41 6173.87 Supplier#000000408 RUSSIA 18139 Manufacturer#1 qcor1u,vJXAokjnL5,dilyYNmh 32-858-724-2950 blithely pending packages cajole furiously sly...
33 5364.99 Supplier#000000785 RUSSIA 13784 Manufacturer#4 W VkHBpQyD3qjQjWGpWicOpmILFehmEdWy67kUGY 32-297-653-2203 packages boost carefully. express ideas along
36 5069.27 Supplier#000000328 GERMANY 16327 Manufacturer#1 SMm24d WG62 17-231-513-5721 he unusual ideas. slyly final packages a
15 4941.88 Supplier#000000321 ROMANIA 7320 Manufacturer#5 pLngFl5yeMcHyov 29-573-279-1406 y final requests impress s
28 4672.25 Supplier#000000239 RUSSIA 12238 Manufacturer#1 XO101kgHrJagK2FL1U6QCaTE ncCsMbeuTgK6o8 32-396-654-6826 arls wake furiously deposits. even, regular depen
11 4586.49 Supplier#000000680 RUSSIA 5679 Manufacturer#3 UhvDfdEfJh,Qbe7VZb8uSGO2TU 0jEa6nXZXE 32-522-382-1620 the regularly regular dependencies. carefully...
42 4518.31 Supplier#000000149 FRANCE 18344 Manufacturer#5 pVyWsjOidpHKp4NfKU4yLeym 16-660-553-2456 ts detect along the foxes. final Tiresias are....
43 4315.15 Supplier#000000509 FRANCE 18972 Manufacturer#2 SF7dR8V5pK 16-298-154-3365 ronic orbits are furiously across the requests...
17 3526.53 Supplier#000000553 FRANCE 8036 Manufacturer#4 a,liVofXbCJ 16-599-552-3755 lar dinos nag slyly brave
37 3526.53 Supplier#000000553 FRANCE 17018 Manufacturer#3 a,liVofXbCJ 16-599-552-3755 lar dinos nag slyly brave
8 3294.68 Supplier#000000350 GERMANY 4841 Manufacturer#4 KIFxV73eovmwhh 17-113-181-4017 e slyly special foxes. furiously unusual depos...
1 2972.26 Supplier#000000016 RUSSIA 1015 Manufacturer#4 YjP5C55zHDXL7LalK27zfQnwejdpin4AMpvh 32-822-502-4215 ously express ideas haggle quickly dugouts? fu
4 2963.09 Supplier#000000840 ROMANIA 3080 Manufacturer#2 iYzUIypKhC0Y 29-781-337-5584 eep blithely regular dependencies. blithely re...
35 2221.25 Supplier#000000771 ROMANIA 13981 Manufacturer#2 lwZ I15rq9kmZXUNhl 29-986-304-9006 nal foxes eat slyly about the fluffily permane...
40 1381.97 Supplier#000000104 FRANCE 18103 Manufacturer#3 Dcl4yGrzqv3OPeRO49bKh78XmQEDR7PBXIs0m 16-434-972-6922 gular ideas. bravely bold deposits haggle thro...
18 906.07 Supplier#000000138 ROMANIA 8363 Manufacturer#4 utbplAm g7RmxVfYoNdhcrQGWuzRqPe0qHSwbKw 29-533-434-6776 ickly unusual requests cajole. accounts above ...
25 765.69 Supplier#000000799 RUSSIA 11276 Manufacturer#2 jwFN7ZB3T9sMF 32-579-339-1495 nusual requests. furiously unusual epitaphs in...
13 727.89 Supplier#000000470 ROMANIA 6213 Manufacturer#3 XckbzsAgBLbUkdfjgJEPjmUMTM8ebSMEvI 29-165-289-1523 gular excuses. furiously regular excuses sleep...
9 683.07 Supplier#000000651 RUSSIA 4888 Manufacturer#4 oWekiBV6s,1g 32-181-426-4490 ly regular requests cajole abou
2 167.56 Supplier#000000290 FRANCE 2037 Manufacturer#1 6Bk06GVtwZaKqg01 16-675-286-5102 the theodolites. ironic, ironic deposits above
19 91.39 Supplier#000000949 UNITED KINGDOM 9430 Manufacturer#2 a,UE,6nRVl2fCphkOoetR1ajIzAEJ1Aa1G1HV 33-332-697-2768 pinto beans. carefully express requests hagg
38 -314.06 Supplier#000000510 ROMANIA 17242 Manufacturer#4 VmXQl ,vY8JiEseo8Mv4zscvNCfsY 29-207-852-3454 bold deposits. carefully even d
3 -820.89 Supplier#000000409 GERMANY 2156 Manufacturer#5 LyXUYFz7aXrvy65kKAbTatGzGS,NDBcdtD 17-719-517-9836 y final, slow theodolites. furiously regular req
20 -845.44 Supplier#000000704 ROMANIA 9926 Manufacturer#5 hQvlBqbqqnA5Dgo1BffRBX78tkkRu 29-300-896-5991 ctions. carefully sly requ
12 -942.73 Supplier#000000563 GERMANY 5797 Manufacturer#1 Rc7U1cRUhYs03JD 17-108-537-2691 slyly furiously final decoys; silent, special ...
Query 2 - Iteration 0 finished in 0.3807s
     l_orderkey revenue o_orderdate o_shippriority
435 223140 355369.0698 1995-03-14 0
1175 584291 354494.7318 1995-02-21 0
796 405063 353125.4577 1995-03-03 0
1150 573861 351238.2770 1995-03-09 0
1113 554757 349181.7426 1995-03-14 0
1019 506021 321075.5810 1995-03-10 0
218 121604 318576.4154 1995-03-07 0
197 108514 314967.0754 1995-02-20 0
928 462502 312604.5420 1995-03-08 0
346 178727 309728.9306 1995-02-25 0
Query 3 - Iteration 0 finished in 0.1659s
  o_orderpriority order_count
0 1-URGENT 999
1 2-HIGH 997
2 3-MEDIUM 1031
3 4-NOT SPECIFIED 989
4 5-LOW 1077
Query 4 - Iteration 0 finished in 0.1455s
  n_name revenue
4 VIETNAM -4.497841e+06
2 INDONESIA -5.580475e+06
3 JAPAN -6.000077e+06
1 INDIA -6.376122e+06
0 CHINA -7.822103e+06
Query 5 - Iteration 0 finished in 0.2009s
  revenue
0 11803420.2534
Query 6 - Iteration 0 finished in 0.0596s
  supp_nation cust_nation l_year revenue
0 FRANCE GERMANY 1995 -4.637235e+06
1 FRANCE GERMANY 1996 -5.224780e+06
2 GERMANY FRANCE 1995 -6.232819e+06
3 GERMANY FRANCE 1996 -5.557312e+06
Query 7 - Iteration 0 finished in 0.2901s
  o_year mkt_share
0 1995 0.03
1 1996 0.02
Query 8 - Iteration 0 finished in 0.6331s
    nation o_year sum_profit
0 ALGERIA 1998 4.716279e+06
1 ALGERIA 1997 8.071240e+06
2 ALGERIA 1996 9.273503e+06
3 ALGERIA 1995 8.472341e+06
4 ALGERIA 1994 8.718336e+06
.. ... ... ...
170 VIETNAM 1996 8.576511e+06
171 VIETNAM 1995 8.890273e+06
172 VIETNAM 1994 8.934413e+06
173 VIETNAM 1993 6.282243e+06
174 VIETNAM 1992 8.378368e+06

[175 rows x 3 columns]
Query 9 - Iteration 0 finished in 0.4285s

Iteration Summary
=======================================
query: 0
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.0003
max time : 0.0003
mean time: 0.0003
=======================================
query: 1
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 2.3091
max time : 2.3091
mean time: 2.3091
=======================================
query: 2
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.3807
max time : 0.3807
mean time: 0.3807
=======================================
query: 3
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.1659
max time : 0.1659
mean time: 0.1659
=======================================
query: 4
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.1455
max time : 0.1455
mean time: 0.1455
=======================================
query: 5
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.2009
max time : 0.2009
mean time: 0.2009
=======================================
query: 6
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.0596
max time : 0.0596
mean time: 0.0596
=======================================
query: 7
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.2901
max time : 0.2901
mean time: 0.2901
=======================================
query: 8
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.6331
max time : 0.6331
mean time: 0.6331
=======================================
query: 9
path: /cudf/sf0.1
scale_factor: 0.1
executor: in-memory
iterations: 1
---------------------------------------
min time : 0.4285
max time : 0.4285
mean time: 0.4285
=======================================
Total mean time across all queries: 4.6137 seconds
```

</details>

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#20418
1 parent 5f8ee01 commit 03bf586

1 file changed (+336, -2 lines)

python/cudf/cudf/pandas/_benchmarks/pdsh.py

Lines changed: 336 additions & 2 deletions
```diff
@@ -16,9 +16,13 @@
 from datetime import date
 from typing import TYPE_CHECKING
 
-import pandas as pd
+import cudf.pandas
 
-from cudf.pandas._benchmarks.utils import (
+cudf.pandas.install()
+
+import pandas as pd  # noqa: E402
+
+from cudf.pandas._benchmarks.utils import (  # noqa: E402
     get_data,
     run_pandas,
 )
```
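Why the order in this hunk matters: `import pandas as pd` binds whatever module object is registered at that moment, so a name bound before the accelerator is set up keeps pointing at the original module. The sketch below is only a stdlib analogy, not how `cudf.pandas` is implemented (its real mechanism is more involved): a hypothetical `install_accelerator()` swaps the `sys.modules` entry for a made-up module `fakelib`, and only the import that runs afterwards sees the replacement.

```python
import sys
import types

# Hypothetical stand-in for a library such as pandas (not a real package).
original = types.ModuleType("fakelib")
original.backend = "cpu"
sys.modules["fakelib"] = original

import fakelib as early  # noqa: E402  bound BEFORE the accelerator is installed


def install_accelerator() -> None:
    """Hypothetical analogue of an accelerator install step.

    It simply replaces the registered module object; this only models the
    import-order issue and is not the cudf.pandas mechanism.
    """
    proxy = types.ModuleType("fakelib")
    proxy.backend = "gpu"
    sys.modules["fakelib"] = proxy


install_accelerator()

import fakelib as late  # noqa: E402  bound AFTER the accelerator is installed

print(early.backend)  # cpu: the early binding still sees the original module
print(late.backend)  # gpu: the later import sees the replacement
```

Calling the install step first and only then importing, as the hunk now does, avoids the stale early binding.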
```diff
@@ -72,6 +76,336 @@ def q1(run_config: RunConfig) -> pd.DataFrame:
 
         return agg.sort_values(["l_returnflag", "l_linestatus"])
 
```
```diff
+    @staticmethod
+    def q2(run_config: RunConfig) -> pd.DataFrame:
+        """Query 2."""
+        nation = get_data(run_config.dataset_path, "nation", run_config.suffix)
+        part = get_data(run_config.dataset_path, "part", run_config.suffix)
+        partsupp = get_data(
+            run_config.dataset_path, "partsupp", run_config.suffix
+        )
+        region = get_data(run_config.dataset_path, "region", run_config.suffix)
+        supplier = get_data(
+            run_config.dataset_path, "supplier", run_config.suffix
+        )
+
+        var1 = 15
+        var2 = "BRASS"
+        var3 = "EUROPE"
+
+        jn = (
+            part.merge(partsupp, left_on="p_partkey", right_on="ps_partkey")
+            .merge(supplier, left_on="ps_suppkey", right_on="s_suppkey")
+            .merge(nation, left_on="s_nationkey", right_on="n_nationkey")
+            .merge(region, left_on="n_regionkey", right_on="r_regionkey")
+        )
+
+        jn = jn[jn["p_size"] == var1]
+        jn = jn[jn["p_type"].str.endswith(var2)]
+        jn = jn[jn["r_name"] == var3]
+
+        gb = jn.groupby("p_partkey", as_index=False)
+        agg = gb["ps_supplycost"].min()
+        jn2 = agg.merge(jn, on=["p_partkey", "ps_supplycost"])
+
+        sel = jn2.loc[
+            :,
+            [
+                "s_acctbal",
+                "s_name",
+                "n_name",
+                "p_partkey",
+                "p_mfgr",
+                "s_address",
+                "s_phone",
+                "s_comment",
+            ],
+        ]
+
+        sort = sel.sort_values(
+            by=["s_acctbal", "n_name", "s_name", "p_partkey"],
+            ascending=[False, True, True, True],
+        )
+        return sort.head(100)
+
```
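In Q2, `groupby(...).min()` followed by a merge back on `["p_partkey", "ps_supplycost"]` keeps every row whose supply cost equals the per-part minimum, so ties survive just as they would with the correlated `min(ps_supplycost)` subquery in the TPC-H definition. A stdlib-only sketch of that pattern on invented toy rows; `rows_at_group_min` is a hypothetical helper, not part of the benchmark.

```python
def rows_at_group_min(rows, key, value):
    """Keep every row whose `value` equals the minimum of its `key` group.

    Mirrors `df.groupby(key)[value].min()` merged back onto `df` on
    `[key, value]`: rows tied at the minimum are all kept.
    """
    minimum = {}
    for row in rows:
        k = row[key]
        if k not in minimum or row[value] < minimum[k]:
            minimum[k] = row[value]
    return [row for row in rows if row[value] == minimum[row[key]]]


# Invented toy rows, not benchmark data.
rows = [
    {"p_partkey": 1, "s_name": "A", "ps_supplycost": 5.0},
    {"p_partkey": 1, "s_name": "B", "ps_supplycost": 3.0},
    {"p_partkey": 1, "s_name": "C", "ps_supplycost": 3.0},
    {"p_partkey": 2, "s_name": "A", "ps_supplycost": 7.0},
]
kept = rows_at_group_min(rows, "p_partkey", "ps_supplycost")
print([(r["p_partkey"], r["s_name"]) for r in kept])
# [(1, 'B'), (1, 'C'), (2, 'A')]
```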
```diff
+    @staticmethod
+    def q3(run_config: RunConfig) -> pd.DataFrame:
+        """Query 3."""
+        customer = get_data(
+            run_config.dataset_path, "customer", run_config.suffix
+        )
+        lineitem = get_data(
+            run_config.dataset_path, "lineitem", run_config.suffix
+        )
+        orders = get_data(run_config.dataset_path, "orders", run_config.suffix)
+
+        var1 = "BUILDING"
+        var2 = date(1995, 3, 15)
+
+        fcustomer = customer[customer["c_mktsegment"] == var1]
+
+        jn1 = fcustomer.merge(
+            orders, left_on="c_custkey", right_on="o_custkey"
+        )
+        jn2 = jn1.merge(lineitem, left_on="o_orderkey", right_on="l_orderkey")
+
+        jn2 = jn2[jn2["o_orderdate"] < var2]
+        jn2 = jn2[jn2["l_shipdate"] > var2]
+        jn2["revenue"] = jn2.l_extendedprice * (1 - jn2.l_discount)
+
+        gb = jn2.groupby(
+            ["o_orderkey", "o_orderdate", "o_shippriority"], as_index=False
+        )
+        agg = gb["revenue"].sum()
+
+        sel = agg.loc[
+            :, ["o_orderkey", "revenue", "o_orderdate", "o_shippriority"]
+        ]
+        sel = sel.rename(columns={"o_orderkey": "l_orderkey"})
+
+        sorted_df = sel.sort_values(
+            by=["revenue", "o_orderdate"], ascending=[False, True]
+        )
+        return sorted_df.head(10)
+
```
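Q3 derives `revenue = l_extendedprice * (1 - l_discount)`, sums it per `(o_orderkey, o_orderdate, o_shippriority)`, and then ranks by revenue descending with order date ascending as the tie-breaker before taking the first 10 rows. A stdlib-only sketch of that aggregate-then-rank step on invented rows; `top_orders` and `row` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date


def top_orders(lineitems, n):
    """Sum l_extendedprice * (1 - l_discount) per order, then rank."""
    revenue = defaultdict(float)
    for li in lineitems:
        key = (li["o_orderkey"], li["o_orderdate"], li["o_shippriority"])
        revenue[key] += li["l_extendedprice"] * (1 - li["l_discount"])
    # Revenue descending, then order date ascending to break ties.
    ranked = sorted(revenue.items(), key=lambda kv: (-kv[1], kv[0][1]))
    return [(key[0], total) for key, total in ranked[:n]]


def row(orderkey, orderdate, price, discount):
    return {"o_orderkey": orderkey, "o_orderdate": orderdate,
            "o_shippriority": 0, "l_extendedprice": price,
            "l_discount": discount}


# Invented toy rows, not benchmark data: orders 10 and 20 tie on revenue.
lineitems = [
    row(10, date(1995, 3, 1), 80.0, 0.25),
    row(10, date(1995, 3, 1), 80.0, 0.0),
    row(20, date(1995, 2, 1), 140.0, 0.0),
    row(30, date(1995, 1, 1), 20.0, 0.5),
]
print(top_orders(lineitems, 2))
# [(20, 140.0), (10, 140.0)]
```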
```diff
+    @staticmethod
+    def q4(run_config: RunConfig) -> pd.DataFrame:
+        """Query 4."""
+        lineitem = get_data(
+            run_config.dataset_path, "lineitem", run_config.suffix
+        )
+        orders = get_data(run_config.dataset_path, "orders", run_config.suffix)
+
+        var1 = date(1993, 7, 1)
+        var2 = date(1993, 10, 1)
+
+        jn = lineitem.merge(
+            orders, left_on="l_orderkey", right_on="o_orderkey"
+        )
+
+        jn = jn[(jn["o_orderdate"] >= var1) & (jn["o_orderdate"] < var2)]
+        jn = jn[jn["l_commitdate"] < jn["l_receiptdate"]]
+
+        jn = jn.drop_duplicates(subset=["o_orderpriority", "l_orderkey"])
+
+        gb = jn.groupby("o_orderpriority", as_index=False)
+        agg = gb.agg(
+            order_count=pd.NamedAgg(column="o_orderkey", aggfunc="count")
+        )
+
+        return agg.sort_values(["o_orderpriority"])
+
```
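In Q4, `drop_duplicates(subset=["o_orderpriority", "l_orderkey"])` collapses the join back to one row per qualifying order, so the count follows the SQL `EXISTS` semantics: an order with several late line items is counted once. A stdlib-only sketch of that idea on invented rows, where a set of `(priority, order)` pairs plays the role of the de-duplication; `count_late_orders` and `li` are hypothetical helpers.

```python
from collections import Counter
from datetime import date


def count_late_orders(joined, start, end):
    """Count distinct orders per priority with at least one late line item."""
    seen = set()
    for row in joined:
        in_window = start <= row["o_orderdate"] < end
        late = row["l_commitdate"] < row["l_receiptdate"]
        if in_window and late:
            # One entry per (priority, order), like drop_duplicates(subset=...).
            seen.add((row["o_orderpriority"], row["l_orderkey"]))
    return dict(sorted(Counter(priority for priority, _ in seen).items()))


def li(order, priority, commit, receipt):
    return {"l_orderkey": order, "o_orderpriority": priority,
            "o_orderdate": date(1993, 8, 1),
            "l_commitdate": commit, "l_receiptdate": receipt}


# Invented toy rows, not benchmark data: order 1 has two late items.
joined = [
    li(1, "1-URGENT", date(1993, 8, 5), date(1993, 8, 9)),
    li(1, "1-URGENT", date(1993, 8, 6), date(1993, 8, 9)),
    li(2, "1-URGENT", date(1993, 8, 9), date(1993, 8, 5)),  # on time
    li(3, "5-LOW", date(1993, 8, 1), date(1993, 8, 2)),
]
print(count_late_orders(joined, date(1993, 7, 1), date(1993, 10, 1)))
# {'1-URGENT': 1, '5-LOW': 1}
```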
```diff
+    @staticmethod
+    def q5(run_config: RunConfig) -> pd.DataFrame:
+        """Query 5."""
+        path = run_config.dataset_path
+        suffix = run_config.suffix
+        customer = get_data(path, "customer", suffix)
+        lineitem = get_data(path, "lineitem", suffix)
+        nation = get_data(path, "nation", suffix)
+        orders = get_data(path, "orders", suffix)
+        region = get_data(path, "region", suffix)
+        supplier = get_data(path, "supplier", suffix)
+
+        var1 = "ASIA"
+        var2 = date(1994, 1, 1)
+        var3 = date(1995, 1, 1)
+
+        jn1 = region.merge(
+            nation, left_on="r_regionkey", right_on="n_regionkey"
+        )
+        jn2 = jn1.merge(
+            customer, left_on="n_nationkey", right_on="c_nationkey"
+        )
+        jn3 = jn2.merge(orders, left_on="c_custkey", right_on="o_custkey")
+        jn4 = jn3.merge(lineitem, left_on="o_orderkey", right_on="l_orderkey")
+        jn5 = jn4.merge(
+            supplier,
+            left_on=["l_suppkey", "n_nationkey"],
+            right_on=["s_suppkey", "s_nationkey"],
+        )
+
+        jn5 = jn5[jn5["r_name"] == var1]
+        jn5 = jn5[(jn5["o_orderdate"] >= var2) & (jn5["o_orderdate"] < var3)]
+        jn5["revenue"] = jn5.l_extendedprice * (1.0 - jn5.l_discount)
+
+        gb = jn5.groupby("n_name", as_index=False)["revenue"].sum()
+        return gb.sort_values("revenue", ascending=False)
+
```
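Q5 joins `supplier` on both `l_suppkey` and `n_nationkey`, which enforces the TPC-H condition that the supplier sits in the same nation as the customer (at that point `n_nationkey` came from the customer side of the join chain). A stdlib-only sketch of that composite-key inner join on invented rows: a tuple key in a dict plays the role of `left_on=[...]` / `right_on=[...]`. `join_on_supplier_and_nation` is a hypothetical helper.

```python
def join_on_supplier_and_nation(lineitems, suppliers):
    """Inner join on (l_suppkey, n_nationkey) == (s_suppkey, s_nationkey)."""
    index = {}
    for s in suppliers:
        index.setdefault((s["s_suppkey"], s["s_nationkey"]), []).append(s)
    out = []
    for li in lineitems:
        for s in index.get((li["l_suppkey"], li["n_nationkey"]), []):
            out.append({**li, **s})
    return out


# Invented toy rows, not benchmark data; n_nationkey is the customer's nation.
lineitems = [
    {"l_suppkey": 7, "n_nationkey": 1, "revenue": 10.0},  # same nation: kept
    {"l_suppkey": 7, "n_nationkey": 2, "revenue": 99.0},  # different: dropped
]
suppliers = [{"s_suppkey": 7, "s_nationkey": 1}]
joined = join_on_supplier_and_nation(lineitems, suppliers)
print([row["revenue"] for row in joined])  # [10.0]
```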
```diff
+    @staticmethod
+    def q6(run_config: RunConfig) -> pd.DataFrame:
+        """Query 6."""
+        path = run_config.dataset_path
+        suffix = run_config.suffix
+        lineitem = get_data(path, "lineitem", suffix)
+
+        var1 = date(1994, 1, 1)
+        var2 = date(1995, 1, 1)
+        var3 = 0.05
+        var4 = 0.07
+        var5 = 24
+
+        filt = lineitem[
+            (lineitem["l_shipdate"] >= var1) & (lineitem["l_shipdate"] < var2)
+        ]
+        filt = filt[
+            (filt["l_discount"] >= var3) & (filt["l_discount"] <= var4)
+        ]
+        filt = filt[filt["l_quantity"] < var5]
+        result_value = (filt["l_extendedprice"] * filt["l_discount"]).sum()
+        return pd.DataFrame({"revenue": [result_value]})
+
```
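Q6 is a single scan: three range filters on `lineitem`, then `sum(l_extendedprice * l_discount)`. The discount bounds `var3 = 0.05` and `var4 = 0.07` are inclusive, matching the TPC-H `l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01`, while the ship-date upper bound and the quantity bound are exclusive. A stdlib-only sketch of those boundary semantics on invented rows; `forecast_revenue` and `row` are hypothetical helpers.

```python
from datetime import date


def forecast_revenue(lineitems, start, end, lo, hi, max_qty):
    """Sum l_extendedprice * l_discount over rows passing Q6-style filters."""
    return sum(
        li["l_extendedprice"] * li["l_discount"]
        for li in lineitems
        if start <= li["l_shipdate"] < end  # half-open date range
        and lo <= li["l_discount"] <= hi  # inclusive, like SQL BETWEEN
        and li["l_quantity"] < max_qty  # strict bound
    )


def row(shipdate, discount, quantity, price):
    return {"l_shipdate": shipdate, "l_discount": discount,
            "l_quantity": quantity, "l_extendedprice": price}


# Invented toy rows, not benchmark data.
lineitems = [
    row(date(1994, 6, 1), 0.05, 10, 100.0),   # kept: lower discount edge
    row(date(1994, 6, 1), 0.07, 23, 200.0),   # kept: upper discount edge
    row(date(1994, 6, 1), 0.08, 10, 1000.0),  # dropped: discount
    row(date(1995, 1, 1), 0.06, 10, 1000.0),  # dropped: end date is exclusive
    row(date(1994, 6, 1), 0.06, 24, 1000.0),  # dropped: quantity
]
total = forecast_revenue(
    lineitems, date(1994, 1, 1), date(1995, 1, 1), 0.05, 0.07, 24
)
print(round(total, 2))  # 19.0
```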
```diff
+    @staticmethod
+    def q7(run_config: RunConfig) -> pd.DataFrame:
+        """Query 7."""
+        customer = get_data(
+            run_config.dataset_path, "customer", run_config.suffix
+        )
+        lineitem = get_data(
+            run_config.dataset_path, "lineitem", run_config.suffix
+        )
+        nation = get_data(run_config.dataset_path, "nation", run_config.suffix)
+        orders = get_data(run_config.dataset_path, "orders", run_config.suffix)
+        supplier = get_data(
+            run_config.dataset_path, "supplier", run_config.suffix
+        )
+
+        var1 = "FRANCE"
+        var2 = "GERMANY"
+        var3 = date(1995, 1, 1)
+        var4 = date(1996, 12, 31)
+
+        n1 = nation[(nation["n_name"] == var1)]
+        n2 = nation[(nation["n_name"] == var2)]
+
+        # Part 1
+        jn1 = customer.merge(n1, left_on="c_nationkey", right_on="n_nationkey")
+        jn2 = jn1.merge(orders, left_on="c_custkey", right_on="o_custkey")
+        jn2 = jn2.rename(columns={"n_name": "cust_nation"})
+        jn3 = jn2.merge(lineitem, left_on="o_orderkey", right_on="l_orderkey")
+        jn4 = jn3.merge(supplier, left_on="l_suppkey", right_on="s_suppkey")
+        jn5 = jn4.merge(n2, left_on="s_nationkey", right_on="n_nationkey")
+        df1 = jn5.rename(columns={"n_name": "supp_nation"})
+
+        # Part 2
+        jn1 = customer.merge(n2, left_on="c_nationkey", right_on="n_nationkey")
+        jn2 = jn1.merge(orders, left_on="c_custkey", right_on="o_custkey")
+        jn2 = jn2.rename(columns={"n_name": "cust_nation"})
+        jn3 = jn2.merge(lineitem, left_on="o_orderkey", right_on="l_orderkey")
+        jn4 = jn3.merge(supplier, left_on="l_suppkey", right_on="s_suppkey")
+        jn5 = jn4.merge(n1, left_on="s_nationkey", right_on="n_nationkey")
+        df2 = jn5.rename(columns={"n_name": "supp_nation"})
+
+        # Combine
+        total = pd.concat([df1, df2])
+
+        total = total[
+            (total["l_shipdate"] >= var3) & (total["l_shipdate"] <= var4)
+        ]
+        total["volume"] = total["l_extendedprice"] * (
+            1.0 - total["l_discount"]
+        )
+        total["l_year"] = total["l_shipdate"].dt.year
+
+        gb = total.groupby(
+            ["supp_nation", "cust_nation", "l_year"], as_index=False
+        )
+        agg = gb.agg(revenue=pd.NamedAgg(column="volume", aggfunc="sum"))
+
+        return agg.sort_values(by=["supp_nation", "cust_nation", "l_year"])
+
```
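Q7 builds the two directions (customer in one nation, supplier in the other) as separate join chains and stacks them with `pd.concat`. Logically, that keeps any row whose `(supp_nation, cust_nation)` pair is one of the two ordered pairs. The stdlib-only sketch below expresses that alternative formulation as a set-membership filter followed by the `(supp_nation, cust_nation, l_year)` aggregation; it is an illustration on invented rows, not a copy of the benchmark code, and `shipping_volume` and `row` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date


def shipping_volume(rows, nation_a, nation_b, start, end):
    """Sum volume per (supp_nation, cust_nation, year) for both nation pairs."""
    pairs = {(nation_a, nation_b), (nation_b, nation_a)}
    totals = defaultdict(float)
    for r in rows:
        if (r["supp_nation"], r["cust_nation"]) not in pairs:
            continue
        if not start <= r["l_shipdate"] <= end:  # inclusive on both ends
            continue
        volume = r["l_extendedprice"] * (1.0 - r["l_discount"])
        key = (r["supp_nation"], r["cust_nation"], r["l_shipdate"].year)
        totals[key] += volume
    return sorted(totals.items())


def row(supp, cust, shipdate, price, discount):
    return {"supp_nation": supp, "cust_nation": cust, "l_shipdate": shipdate,
            "l_extendedprice": price, "l_discount": discount}


# Invented toy rows, not benchmark data.
rows = [
    row("FRANCE", "GERMANY", date(1995, 5, 1), 100.0, 0.5),
    row("FRANCE", "GERMANY", date(1995, 7, 1), 10.0, 0.0),
    row("GERMANY", "FRANCE", date(1996, 2, 1), 80.0, 0.25),
    row("FRANCE", "FRANCE", date(1995, 5, 1), 999.0, 0.0),  # not a pair
    row("FRANCE", "GERMANY", date(1997, 1, 1), 999.0, 0.0),  # out of range
]
result = shipping_volume(
    rows, "FRANCE", "GERMANY", date(1995, 1, 1), date(1996, 12, 31)
)
print(result)
# [(('FRANCE', 'GERMANY', 1995), 60.0), (('GERMANY', 'FRANCE', 1996), 60.0)]
```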
```diff
+    @staticmethod
+    def q8(run_config: RunConfig) -> pd.DataFrame:
+        """Query 8."""
+        customer = get_data(
+            run_config.dataset_path, "customer", run_config.suffix
+        )
+        lineitem = get_data(
+            run_config.dataset_path, "lineitem", run_config.suffix
+        )
+        nation = get_data(run_config.dataset_path, "nation", run_config.suffix)
+        orders = get_data(run_config.dataset_path, "orders", run_config.suffix)
+        part = get_data(run_config.dataset_path, "part", run_config.suffix)
+        region = get_data(run_config.dataset_path, "region", run_config.suffix)
+        supplier = get_data(
+            run_config.dataset_path, "supplier", run_config.suffix
+        )
+
+        var1 = "BRAZIL"
+        var2 = "AMERICA"
+        var3 = "ECONOMY ANODIZED STEEL"
+        var4 = date(1995, 1, 1)
+        var5 = date(1996, 12, 31)
+
+        n1 = nation.loc[:, ["n_nationkey", "n_regionkey"]]
+        n2 = nation.loc[:, ["n_nationkey", "n_name"]]
+
+        jn1 = part.merge(lineitem, left_on="p_partkey", right_on="l_partkey")
+        jn2 = jn1.merge(supplier, left_on="l_suppkey", right_on="s_suppkey")
+        jn3 = jn2.merge(orders, left_on="l_orderkey", right_on="o_orderkey")
+        jn4 = jn3.merge(customer, left_on="o_custkey", right_on="c_custkey")
+        jn5 = jn4.merge(n1, left_on="c_nationkey", right_on="n_nationkey")
+        jn6 = jn5.merge(region, left_on="n_regionkey", right_on="r_regionkey")
+
+        jn6 = jn6[(jn6["r_name"] == var2)]
+
+        jn7 = jn6.merge(n2, left_on="s_nationkey", right_on="n_nationkey")
+
+        jn7 = jn7[(jn7["o_orderdate"] >= var4) & (jn7["o_orderdate"] <= var5)]
+        jn7 = jn7[jn7["p_type"] == var3]
+
+        jn7["o_year"] = jn7["o_orderdate"].dt.year
+        jn7["volume"] = jn7["l_extendedprice"] * (1.0 - jn7["l_discount"])
+        jn7 = jn7.rename(columns={"n_name": "nation"})
+
+        def udf(df: pd.DataFrame) -> float:
+            demonimator: float = df["volume"].sum()
+            df = df[df["nation"] == var1]
+            numerator: float = df["volume"].sum()
+            return round(numerator / demonimator, 2)
+
+        gb = jn7.groupby("o_year", as_index=False)
+        agg = gb.apply(udf, include_groups=False)
+        agg.columns = ["o_year", "mkt_share"]
+        return agg.sort_values("o_year")
+
```
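The `udf` in Q8 computes, per `o_year`, the share of volume whose supplier nation is `var1` (BRAZIL): `sum(volume where nation == var1) / sum(volume)`, rounded to two decimals. A stdlib-only sketch of the same per-group ratio on invented rows, accumulating numerator and denominator in one pass; `market_share` is a hypothetical helper.

```python
from collections import defaultdict


def market_share(rows, nation):
    """Per o_year: volume supplied from `nation` divided by total volume."""
    total = defaultdict(float)
    from_nation = defaultdict(float)
    for r in rows:
        total[r["o_year"]] += r["volume"]
        if r["nation"] == nation:
            from_nation[r["o_year"]] += r["volume"]
    # Both sums come from a single pass; round to 2 places like the udf.
    return {
        year: round(from_nation[year] / total[year], 2)
        for year in sorted(total)
    }


# Invented toy rows, not benchmark data.
rows = [
    {"o_year": 1995, "nation": "BRAZIL", "volume": 30.0},
    {"o_year": 1995, "nation": "PERU", "volume": 70.0},
    {"o_year": 1996, "nation": "BRAZIL", "volume": 25.0},
    {"o_year": 1996, "nation": "CANADA", "volume": 75.0},
]
print(market_share(rows, "BRAZIL"))  # {1995: 0.3, 1996: 0.25}
```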
```diff
+    @staticmethod
+    def q9(run_config: RunConfig) -> pd.DataFrame:
+        """Query 9."""
+        path = run_config.dataset_path
+        suffix = run_config.suffix
+        lineitem = get_data(path, "lineitem", suffix)
+        nation = get_data(path, "nation", suffix)
+        orders = get_data(path, "orders", suffix)
+        part = get_data(path, "part", suffix)
+        partsupp = get_data(path, "partsupp", suffix)
+        supplier = get_data(path, "supplier", suffix)
+
+        jn1 = part.merge(partsupp, left_on="p_partkey", right_on="ps_partkey")
+        jn2 = jn1.merge(supplier, left_on="ps_suppkey", right_on="s_suppkey")
+        jn3 = jn2.merge(
+            lineitem,
+            left_on=["p_partkey", "ps_suppkey"],
+            right_on=["l_partkey", "l_suppkey"],
+        )
+        jn4 = jn3.merge(orders, left_on="l_orderkey", right_on="o_orderkey")
+        jn5 = jn4.merge(nation, left_on="s_nationkey", right_on="n_nationkey")
+
+        jn5 = jn5[jn5["p_name"].str.contains("green", regex=False)]
+
+        jn5["o_year"] = jn5["o_orderdate"].dt.year
+        jn5["amount"] = jn5["l_extendedprice"] * (1.0 - jn5["l_discount"]) - (
+            jn5["ps_supplycost"] * jn5["l_quantity"]
+        )
+        jn5 = jn5.rename(columns={"n_name": "nation"})
+
+        gb = jn5.groupby(["nation", "o_year"], as_index=False, sort=False)
+        agg = gb.agg(sum_profit=pd.NamedAgg(column="amount", aggfunc="sum"))
+        sorted_df = agg.sort_values(
+            by=["nation", "o_year"], ascending=[True, False]
+        )
+        return sorted_df.reset_index(drop=True)
+
```
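Q9's per-line profit is `l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity`, summed per `(nation, o_year)` and ordered by nation ascending and year descending (`ascending=[True, False]`). With plain `sorted`, that mixed direction can be expressed by negating the numeric year in the sort key, as in this stdlib-only sketch on invented rows; `profit_by_nation_year` and `row` are hypothetical helpers.

```python
from collections import defaultdict


def profit_by_nation_year(rows):
    """Sum profit per (nation, o_year); nation ascending, year descending."""
    totals = defaultdict(float)
    for r in rows:
        amount = r["l_extendedprice"] * (1.0 - r["l_discount"]) - (
            r["ps_supplycost"] * r["l_quantity"]
        )
        totals[(r["nation"], r["o_year"])] += amount
    # Negating the numeric year flips its direction inside one ascending sort.
    return sorted(totals.items(), key=lambda kv: (kv[0][0], -kv[0][1]))


def row(nation, year, price, discount, supplycost, quantity):
    return {"nation": nation, "o_year": year, "l_extendedprice": price,
            "l_discount": discount, "ps_supplycost": supplycost,
            "l_quantity": quantity}


# Invented toy rows, not benchmark data.
rows = [
    row("ALGERIA", 1997, 100.0, 0.5, 10.0, 2),
    row("ALGERIA", 1998, 40.0, 0.0, 5.0, 4),
    row("BRAZIL", 1995, 10.0, 0.0, 1.0, 1),
]
print(profit_by_nation_year(rows))
# [(('ALGERIA', 1998), 20.0), (('ALGERIA', 1997), 30.0), (('BRAZIL', 1995), 9.0)]
```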
```diff
 
 if __name__ == "__main__":
     run_pandas(PDSHQueries)
```
